BitLlama Desktop imonoonoko

ai desktop gui inference llm machine-learning

Use this command to install BitLlama Desktop:

winget install --id=imonoonoko.BitLlamaDesktop -e

BitLlama Desktop is a desktop application built on Tauri 2.0 and Svelte, designed for local inference of large language models (LLMs). It provides an intuitive graphical user interface to streamline AI-driven tasks and experimentation.

Key Features:

Streaming Chat: Engage in real-time conversations with LLMs through interactive chat interfaces.
Model Browser: Access a catalog of pre-trained models, complete with download functionality for easy local deployment.
Soul Learning: Enhance model performance with drag-and-drop capabilities and correction learning, enabling users to fine-tune outputs directly.
TTT Adaptive Inference: Optimize resource usage with adaptive inference techniques that balance speed and accuracy.
Hardware Auto-Detection: Automatically identify and utilize compatible hardware (e.g., GPUs) for accelerated processing.
Multilingual Support (i18n): Available in English and Japanese, catering to a global user base.

Audience & Benefit: Ideal for data scientists, machine learning engineers, and AI enthusiasts seeking tools to experiment with LLMs locally. BitLlama Desktop empowers users to accelerate their workflow while maintaining control over their data and experiments.

The application can be installed via winget, making it accessible for integration into various development environments.

BitLlama

Pure Rust LLM inference engine with Soul learning and hierarchical memory.

> Status: v1.0.0 — Development Complete. This project is fully functional and no longer under active development.

What is BitLlama?

A local LLM inference engine written entirely in Rust. It runs GGUF and safetensors models on your PC, with a unique Soul system that lets the AI learn and remember across conversations.

Key features:

GGUF + safetensors model inference (Llama 2/3, Gemma 2/3, Qwen2.5, Mistral, BitNet)
Soul learning — teach the AI via LoRA fine-tuning from conversations
Memory system — 4-layer hierarchical memory (Episodes/Facts/Concepts/Worldview)
Sleep consolidation — background memory organization (Tidy/Fold/Merge/Elevate/Dream)
Desktop GUI (Tauri 2.0 + Svelte 5) with Japanese/English i18n
CLI with chat, learning, API server, RAG, and MCP support
CUDA acceleration + Q8 KV Cache
1096+ tests, Pure Rust single binary

Quick Start

Install

# Homebrew (macOS / Linux)
brew tap imonoonoko/bitllama &amp;&amp; brew install bitllama

# Windows (winget)
winget install imonoonoko.BitLlama

# Or download from GitHub Releases

Run

bitllama pull bartowski/gemma-2-2b-it-GGUF
bitllama run ~/.bitllama/models/gemma-2-2b-it-Q4_K_M.gguf

Teach

bitllama learn "My name is Onoko" --model model.gguf --save onoko.soul
bitllama run model.gguf --soul onoko.soul

API Server

bitllama serve model.gguf --port 8000
# OpenAI-compatible: POST /v1/chat/completions

Desktop GUI

BitLlama Desktop — built with Tauri 2.0 + Svelte 5.

Model download, management, and auto-recommendation
Streaming chat with conversation history
Soul learning (chat, drag & drop, correction)

Model	Format	Chat Template
Llama-2 7B/13B	GGUF	llama2
Llama-3 8B	GGUF	llama3
Gemma-2 2B/9B	GGUF	gemma
Gemma-3	GGUF	gemma
Qwen2.5 0.5B-7B	GGUF	chatml
Mistral 7B	GGUF	mistral
BitNet 2B4T	safetensors	bitnet

Model	Speed	vs llama.cpp
Llama-2 7B	45.4 tok/s	90%
Mistral 7B	42.1 tok/s	89%
Gemma-2 2B	75.1 tok/s	74%

BitLlama Desktop imonoonoko

README

BitLlama

What is BitLlama?

Quick Start

Install

Run

Teach

API Server

Desktop GUI

Supported Models

Performance

Architecture

Soul & Memory Architecture

Build from Source

What Was Built

Acknowledgments

License