LLM Stuff
Notes on getting up and running with large language models for local development
Updated on 10.05.2024 https://rehborn.dev/notes/llm/
Running local Generative Pre-trained Transformer (GPT) large language models (LLMs) and using the inference Application Programming Interface (API)
Running Local LLMs using Ollama
Run Ollama Docker Container
Run with CPU support
docker run -d -v /mnt/data/ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
Run with GPU support
docker run -d --gpus=all -v /mnt/data/ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
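Note: --gpus=all requires the NVIDIA Container Toolkit on the host for NVIDIA GPUs; without it, Docker cannot expose the GPU to the container.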
Pull llama3
docker exec -it ollama ollama pull llama3
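To verify the container is serving and the model was pulled, you can query Ollama's /api/tags endpoint, which lists the locally available models. A minimal sketch, assuming the requests package is installed:

import requests

# ask the local Ollama instance which models it has pulled
r = requests.get("http://127.0.0.1:11434/api/tags")
r.raise_for_status()
for model in r.json().get("models", []):
    print(model["name"])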
Open-WebUI
docker run -d --network=host \
-v /mnt/data/ollama-webui:/app/backend/data \
-e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
--name open-webui --restart always ghcr.io/open-webui/open-webui:main
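With --network=host the container shares the host's network stack; Open-WebUI listens on port 8080 by default, so the interface should be reachable at http://localhost:8080.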
Ollama API Usage Examples
curl
curl http://localhost:11434/api/chat -d '{
"model": "llama3",
"stream": false,
"messages": [
{"role": "system", "content": "you are a dolphin"},
{"role": "user", "content": "what are you?"}
]
}'
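The same endpoint can also stream: with "stream": true (Ollama's default), the response arrives as newline-delimited JSON chunks instead of a single object. A minimal Python sketch reading the stream, assuming the requests package is installed:

import json
import requests

payload = {
    "model": "llama3",
    "messages": [
        {"role": "system", "content": "you are a dolphin"},
        {"role": "user", "content": "what are you?"},
    ],
}

# stream=True keeps the connection open; Ollama emits one JSON object per line
with requests.post("http://127.0.0.1:11434/api/chat", json=payload, stream=True) as r:
    r.raise_for_status()
    for line in r.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("message", {}).get("content", ""), end="", flush=True)
        if chunk.get("done"):
            break
print()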
LiteLLM
Install
poetry add litellm
Basic usage
from litellm import completion

response = completion(
    model="ollama/llama3",
    messages=[
        {"role": "system", "content": "you are a dolphin"},
        {"role": "user", "content": "what are you?"},
    ],
    api_base="http://127.0.0.1:11434",
)
# LiteLLM returns an OpenAI-style response object
print(response.choices[0].message.content)
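LiteLLM can also stream: passing stream=True returns an iterator of OpenAI-style delta chunks. A sketch under the same local Ollama setup:

from litellm import completion

# stream=True yields chunks as the model generates tokens
response = completion(
    model="ollama/llama3",
    messages=[{"role": "user", "content": "what are you?"}],
    api_base="http://127.0.0.1:11434",
    stream=True,
)
for chunk in response:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()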
List of LLM Provider Pricing Pages