Ollama Operator
winget install --id=nekomeowww.OllamaOperator -e
While Ollama is a powerful tool for running large language models locally, and its CLI offers much the same user experience as the Docker CLI, that experience cannot yet be replicated on Kubernetes, especially when running multiple models on the same cluster with many different resources and configurations. That's where the Ollama Operator comes in:
- Install the operator on your Kubernetes cluster
- Apply the needed CRDs
- Create your models
- Wait for the models to be fetched and loaded, and that's it! (A sketch of these steps follows this list.)
Thanks to the great work of llama.cpp, there is no more need to worry about Python environments or CUDA drivers. The journey to large language models, AIGC, localized agents, 🦜🔗 LangChain and more is just a few steps away!
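A minimal sketch of those steps, assuming the release manifest URL and the ollama.ayaka.io/v1 Model resource shown in the project's documentation (both may differ between versions; check the upstream README for the exact values):

# Install the operator and its CRDs onto the cluster
# (the manifest URL is an assumption; use the one documented for your release):
kubectl apply --server-side=true -f https://github.com/nekomeowww/ollama-operator/releases/latest/download/install.yaml

# Create a Model resource; the operator fetches and serves the named model:
kubectl apply -f - <<EOF
apiVersion: ollama.ayaka.io/v1
kind: Model
metadata:
  name: phi
spec:
  image: phi
EOF

# Watch until the model has been fetched and loaded:
kubectl get models --watch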
Ollama Operator is a Kubernetes operator designed to simplify the deployment and management of large language models at scale. It enables users to run multiple models efficiently on a single cluster with minimal resource overhead and configuration complexity.
Key Features:
- Kubernetes Integration: Install the operator directly on your Kubernetes cluster for seamless integration with existing infrastructure; the winget package above provides the local companion CLI (see the first sketch after this list).
- CRD Support: Use custom resource definitions (CRDs) for fine-grained control over model parameters and configurations (a manifest sketch follows this list).
- Leverage llama.cpp: Eliminate compatibility issues with Python environments and CUDA drivers through native support for llama.cpp.
- OpenAI API Compatibility: Access familiar endpoints for consistent integration with existing applications, with no code changes needed (see the curl sketch after this list).
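Assuming the winget package installs the operator's companion kollama CLI on your workstation (verify with kollama --help), deploying a model from the command line can be as short as:

# Deploy the phi model and expose it as a Service
# (subcommand and flag follow the project's documented examples;
# treat them as assumptions and confirm against your installed version):
kollama deploy phi --expose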
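For finer control than the CLI exposes, the Model resource can be written by hand. In this sketch, everything beyond image is an assumption modeled on common operator patterns; verify field names against the installed schema with kubectl explain models.spec:

kubectl apply -f - <<EOF
apiVersion: ollama.ayaka.io/v1
kind: Model
metadata:
  name: llama3
spec:
  image: llama3
  replicas: 2                 # assumed field: number of model server replicas
  storageClassName: fast-ssd  # assumed field: storage class for cached model weights
EOF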
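Because the served endpoints follow Ollama's OpenAI-compatible API, existing OpenAI clients only need a new base URL. A sketch, assuming the operator names the model's Service ollama-model-phi and serves on Ollama's default port 11434 (check kubectl get svc for the actual name):

# Forward the model's Service to localhost, then call the
# OpenAI-compatible chat completions endpoint:
kubectl port-forward svc/ollama-model-phi 11434:11434 &

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'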