LLM Stuff

Notes on large language models (LLMs) to get up and running with local development.

Updated on 10.05.2024 https://rehborn.dev/notes/llm/

Running generative pre-trained transformer (GPT) large language models (LLMs) locally and using the inference Application Programming Interface (API).

Running Local LLMs using Ollama

Run Ollama Docker Container

Run with CPU Support

docker run -d -v /mnt/data/ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Run with GPU Support (requires the NVIDIA Container Toolkit on the host)

docker run -d --gpus=all -v /mnt/data/ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
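
Either way, a quick sanity check that the server is reachable (the version endpoint is part of the Ollama HTTP API):

curl http://localhost:11434/api/version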

Pull llama3

docker exec -it ollama ollama pull llama3
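
Verify the download by listing the locally available models:

docker exec -it ollama ollama list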

Open-WebUI

docker run -d --network=host \
    -v /mnt/data/ollama-webui:/app/backend/data \
    -e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
    --name open-webui --restart always ghcr.io/open-webui/open-webui:main
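
With --network=host the interface should come up on Open WebUI's default port, 8080; a quick check (assuming no other service holds that port):

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080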

Ollama API Usage Examples

curl

curl http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "stream": false,
  "messages": [
    {"role": "system", "content": "you are a dolphin"},
    {"role": "user", "content": "what are you?"}
  ]
}'
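
For single-turn completions without a message history there is also the /api/generate endpoint, which takes a plain prompt (same model and host as above):

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "stream": false,
  "prompt": "what are you?"
}'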

LiteLLM

install

poetry add litellm
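
or, outside a Poetry project:

pip install litellm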

basic usage

from litellm import completion

# chat completion against the local Ollama server started above
response = completion(
    model="ollama/llama3",
    messages=[
        {"role": "system", "content": "you are a dolphin"},
        {"role": "user", "content": "what are you?"},
    ],
    api_base="http://127.0.0.1:11434",
)
print(response.choices[0].message.content)
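
Streaming works the same way; a minimal sketch, assuming the same local endpoint (LiteLLM normalizes chunks to the OpenAI delta format):

from litellm import completion

response = completion(
    model="ollama/llama3",
    messages=[{"role": "user", "content": "what are you?"}],
    api_base="http://127.0.0.1:11434",
    stream=True,
)
for chunk in response:
    # each chunk carries an incremental piece of the reply
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)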

List of LLM Providers

Pricing Pages

Model Vendors

Third-Party Hosting (Inference API / Multiple Models)

GPU Instance Providers