LlamaFarm is a comprehensive, modular AI framework that gives you complete control over your AI stack. Build powerful AI applications locally with production-ready components including RAG systems, vector databases, model management, prompt engineering, and fine-tuning - all designed to work seamlessly together or independently.
Features:
- Local-First Development - Build and test entirely on your machine
- Production-Ready Components - Battle-tested modules that scale from laptop to cluster
- Strategy-Based Configuration - Smart defaults with infinite customization
- Deploy Anywhere - Same code runs locally, on-premises, or in any cloud
- Multi-Provider Support - Use cloud LLM providers or run your own models locally
- Complete RAG Pipeline - Document processing, embedding, and retrieval
🦙 LlamaFarm - Run your own AI anywhere
> Build powerful AI locally, extend anywhere.
LlamaFarm is an open-source framework for building retrieval-augmented and agentic AI applications. It ships with opinionated defaults (Ollama for local models, Chroma for vector storage) while staying 100% extendable: swap in vLLM, remote OpenAI-compatible hosts, new parsers, or custom stores without rewriting your app.
- Local-first developer experience with a single CLI (lf) that manages projects, datasets, and chat sessions.
- Production-ready architecture that mirrors server endpoints and enforces schema-based configuration.
- Composable RAG pipelines you can tailor through YAML, not bespoke code.
- Extendable everything: runtimes, embedders, databases, extractors, and CLI tooling.
Need the full walkthrough with dataset ingestion and troubleshooting tips? Jump to the Quickstart guide.
> Prefer building from source? Clone the repo and follow the steps in Development & Testing.
Run services manually (without Docker auto-start):
git clone https://github.com/llama-farm/llamafarm.git
cd llamafarm
# Install Nx globally and bootstrap the workspace
npm install -g nx
nx init --useDotNxInstallation --interactive=false
# Option 1: start both server and RAG worker with one command
nx dev
# Option 2: start services in separate terminals
# Terminal 1
nx start rag
# Terminal 2
nx start server
Open another terminal to run lf commands (installed or built from source). This is equivalent to what lf start orchestrates automatically.
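Once both services are running, the CLI talks to the local server. For example, assuming a project has already been initialized (via lf start or the Quickstart):

lf models list    # list the models configured in llamafarm.yaml
lf chat "Hello"   # send a quick prompt to the default model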
Why LlamaFarm
- Own your stack - Run small local models today and swap to hosted vLLM, Together, or custom APIs tomorrow by changing llamafarm.yaml.
- Battle-tested RAG - Configure parsers, extractors, embedding strategies, and databases without touching orchestration code.
- Config over code - Every project is defined by YAML schemas that are validated at runtime and easy to version control.
A typical dataset workflow with the lf CLI:
- Create a dataset - the CLI validates the chosen strategy and database against the project config.
- Upload files - lf datasets upload research-notes ./docs/*.pdf (supports globs and directories).
- Process the dataset - lf datasets process research-notes (streams heartbeat dots during long processing runs).
- Run a semantic query - lf rag query --database main_db "What did the 2024 FDA letters require?" (supports --filter, --include-metadata, and more).
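Put together, a minimal end-to-end session might look like this (reusing the example dataset name and paths from above; the dataset itself is assumed to already exist in the project):

lf datasets upload research-notes ./docs/*.pdf
lf datasets process research-notes
lf rag query --database main_db "What did the 2024 FDA letters require?"
lf chat "Summarize the 2024 FDA letters"   # optional: chat with the default model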
See the CLI reference for full command details and troubleshooting advice.
REST API
LlamaFarm provides a comprehensive REST API (compatible with OpenAI's format) for integrating with your applications. The API runs at http://localhost:8000.
Key Endpoints
Chat Completions (OpenAI-compatible)
curl -X POST http://localhost:8000/v1/projects/{namespace}/{project}/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What are the FDA requirements?"}
    ],
    "stream": false,
    "rag_enabled": true,
    "database": "main_db"
  }'

# Upload file
curl -X POST http://localhost:8000/v1/projects/{namespace}/{project}/datasets/{dataset}/data \
  -F "file=@document.pdf"

# Process dataset
curl -X POST http://localhost:8000/v1/projects/{namespace}/{project}/datasets/{dataset}/process
Finding Your Namespace and Project
Check your llamafarm.yaml:
name: my-project # Your project name
namespace: my-org # Your namespace
Or inspect the file system: ~/.llamafarm/projects/{namespace}/{project}/
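For example, with the values above, the chat completions request from earlier would target the following URL (database name as in the earlier example):

curl -X POST http://localhost:8000/v1/projects/my-org/my-project/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "What are the FDA requirements?"}],
    "stream": false,
    "rag_enabled": true,
    "database": "main_db"
  }'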
See the complete API Reference for all endpoints, request/response formats, Python/TypeScript clients, and examples.
Configuration Snapshot
llamafarm.yaml is the source of truth for each project. The schema enforces required fields and documents every extension point.
Multi-Model Configuration (Recommended)
version: v1
name: fda-assistant
namespace: default

runtime:
  default_model: fast # Which model to use by default
  models:
    fast:
      description: "Fast Ollama model"
      provider: ollama
      model: gemma3:1b
    powerful:
      description: "More capable Ollama model"
      provider: ollama
      model: qwen3:8b
    lemon:
      description: "Lemonade local model with NPU/GPU"
      provider: lemonade
      model: user.Qwen3-4B
      base_url: "http://127.0.0.1:11534/v1"
      lemonade:
        backend: llamacpp
        port: 11534
        context_size: 32768

prompts:
  - role: system
    content: >-
      You are an FDA specialist. Answer using short paragraphs and cite document titles when available.

rag:
  databases:
    - name: main_db
      type: ChromaStore
      default_embedding_strategy: default_embeddings
      default_retrieval_strategy: filtered_search
  embedding_strategies:
    - name: default_embeddings
      type: OllamaEmbedder
      config:
        model: nomic-embed-text:latest
  retrieval_strategies:
    - name: filtered_search
      type: MetadataFilteredStrategy
      config:
        top_k: 5
  data_processing_strategies:
    - name: pdf_ingest
      parsers:
        - type: PDFParser_LlamaIndex
          config:
            chunk_size: 1500
            chunk_overlap: 200
      extractors:
        - type: HeadingExtractor
        - type: ContentStatisticsExtractor

datasets:
  - name: research-notes
    data_processing_strategy: pdf_ingest
    database: main_db
Using your models:
lf models list # See all configured models
lf chat "Question" # Uses default model (fast)
lf chat --model powerful "Complex question" # Use specific model
lf chat --model lemon "Local GGUF model" # Use Lemonade model
> Note: Lemonade models require manual startup with nx start lemonade from the project root; the command automatically picks up its configuration from your llamafarm.yaml. In the future, Lemonade will run as a container and be auto-started. See the Lemonade Quickstart for setup.
Extension points:
- Swap runtimes by pointing to any OpenAI-compatible endpoint (vLLM, Mistral, Anyscale): update runtime.provider, base_url, and api_key, and regenerate schema types if you add a new provider enum (see the sketch after this list).
- Bring your own vector store by implementing a store backend, adding it to rag/schema.yaml, and updating the server service registry.
- Add parsers/extractors to support new file formats or metadata pipelines; register the implementations and extend the schema definitions.
- Extend the CLI with new Cobra commands under cli/cmd; the docs include guidance on adding dataset utilities or project tooling.
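As a sketch of the first point, a hosted OpenAI-compatible endpoint could be registered as another entry under runtime.models. The provider value, placeholder URL, and api_key handling below are assumptions; check the schema for the exact field names and enum values:

runtime:
  models:
    hosted:
      description: "Remote vLLM endpoint"       # hypothetical model entry
      provider: openai                          # assumption: consult the schema's provider enum
      model: my-served-model                    # whatever model the endpoint serves
      base_url: "https://vllm.example.com/v1"   # placeholder URL
      api_key: "sk-..."                         # supply your key per your deployment's conventions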
Run lf datasets and lf rag query commands from each example folder to reproduce the flows demonstrated in the docs.
Development & Testing
# Python server + RAG tests
cd server
uv sync
uv run --group test python -m pytest
# CLI tests
cd ../cli
go test ./...
# RAG tooling smoke tests
cd ../rag
uv sync
uv run python cli.py test
# Docs build (ensures navigation/link integrity)
cd ..
nx build docs
Linting: uv run ruff check --fix . (Python), go fmt ./... and go vet ./... (Go).
Community & Support
Discord - chat with the team, share feedback, and find collaborators.