Tuesday, March 5, 2024

Local LLM deployment framework - Ollama

Are you looking for an easy way to run LLMs in your local environment? Ollama seems like a straightforward option. Open-WebUI (formerly known as Ollama-webui) even gives you a ChatGPT-like web interface! To get Ollama up and running, you just need to follow a few simple steps.

Step 1: Install Ollama locally and deploy your preferred LLM. For example, you can run the command

 $ ollama run llama2 

to download and run the llama2 model. Ollama supports a variety of models, listed at ollama.com/library.
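If you want to sanity-check the setup from the terminal before adding a web UI, you can list the downloaded models and send a quick test prompt to Ollama's REST API (this assumes Ollama is listening on its default port, 11434):

 $ ollama list
 $ curl http://localhost:11434/api/generate -d '{
     "model": "llama2",
     "prompt": "Why is the sky blue?",
     "stream": false
   }'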

Step 2: Run Open-WebUI to get a web interface. To do this, first start the Podman machine:

 $ podman machine start

and then run the Open-WebUI container:

 $ podman run -d -p 3000:8080 --network slirp4netns:allow_host_loopback=true -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

This will start the Open-WebUI container and map port 3000 on the host to port 8080 in the container, allowing incoming HTTP traffic on port 3000 to be forwarded to the container. Now, open your browser and navigate to http://localhost:3000 to access the web interface. 
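If the page doesn't come up, a few quick checks usually narrow things down (the container name open-webui matches the one used above):

 $ podman ps                      # is the open-webui container running?
 $ podman logs -f open-webui      # watch the container logs for startup errors
 $ curl -I http://localhost:3000  # does the host port answer at all?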


To validate my local LLM environment, I used the above podman run command as an input to see whether the model could explain it. So my simple prompt was:

Input: 

Explain me the following command.

podman run -d -p 3000:8080 --network slirp4netns:allow_host_loopback=true -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main 


And here is the response! What would be your reaction to it?

Response: 

The podman run command is a Docker-like command for running containers in the Podman environment. Here's a breakdown of the options used in the command you provided:
  • -d: Run the container in detached mode, meaning it will run in the background and not interact with the terminal. 
  • -p 3000:8080: Map port 3000 on the host to port 8080 in the container, allowing incoming HTTP traffic on port 3000 to be forwarded to the container.
  • --network slirp4netns:allow_host_loopback=true: Use the slirp4netns network plugin, which allows for host-to-container loopback connections. This is useful when developing applications that need to communicate with other containers or the host machine.
  • -v open-webui:/app/backend/data: Mount a volume from the open-webui directory inside the container to the /app/backend/data directory on the host. This allows data to be shared between the container and the host.
  • --name open-webui: Set the name of the container.
  • -restart always ghcr.io/open-webui/open-webui:main: Restart the container if it crashes or is terminated, using the ghcr.io/open-webui/open-webui:main image. The :main part of the image name indicates that this is the main image for the application. 
The ghcr.io domain is a container registry service provided by Google Cloud. It allows you to store and manage Docker images, including the open-webui image used in the command. 
In summary, the command runs a container using the slirp4netns network plugin, maps port 3000 on the host to port 8080 in the container, mounts a volume from the open-webui directory inside the container to the host's /app/backend/data directory, sets the name of the container, and restarts the container if it crashes or is terminated using the ghcr.io/open-webui/open-webui:main image.

Pretty cool, right?

  • Ollama - https://github.com/ollama/ollama
  • Open-WebUI (formerly known as Ollama-webui) - https://github.com/open-webui/open-webui
Btw, I also came across LM Studio, another local tool to "discover, download, and run local LLMs": https://lmstudio.ai/. Fast-moving space, huh?


Some commands that you may need for debugging (a cleanup sketch follows the list):
  • podman machine list
  • podman machine stop
  • podman machine start
  • podman container list
  • podman container rm open-webui
  • ollama serve
  • lsof -i:3000
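
And if you ever want to wipe the setup and start over, something along these lines should do it (the volume and model names match the ones used above, so adjust them to yours):

 $ podman container rm -f open-webui   # remove the Open-WebUI container
 $ podman volume rm open-webui         # drop its persisted data volume
 $ ollama rm llama2                    # delete the downloaded model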