LLM Wiki llmwiki

agent agentic ai chatbot knowledge-base large-language-model llm wiki

Use this command to install LLM Wiki:

winget install --id=nashsu.LLMWiki -e

LLM Wiki is a cross-platform desktop application that turns your documents into an organized, interlinked knowledge base, automatically. Instead of traditional RAG (retrieve-and-answer from scratch every time), the LLM incrementally builds and maintains a persistent wiki from your sources. Knowledge is compiled once and kept current, not re-derived on every query.

This project is based on Karpathy's LLM Wiki pattern, a methodology for building personal knowledge bases using LLMs. We implemented the core ideas as a full desktop application with significant enhancements.

Features:

- Two-Step Chain-of-Thought Ingest: LLM analyzes first, then generates wiki pages with source traceability and incremental cache
- 4-Signal Knowledge Graph: relevance model with direct links, source overlap, Adamic-Adar, and type affinity
- Louvain Community Detection: automatic knowledge cluster discovery with cohesion scoring
- Graph Insights: surprising connections and knowledge gaps with one-click Deep Research
- Vector Semantic Search: optional embedding-based retrieval via LanceDB, supports any OpenAI-compatible endpoint
- Persistent Ingest Queue: serial processing with crash recovery, cancel, retry, and progress visualization
- Folder Import: recursive folder import preserving directory structure, folder context as LLM classification hint
- Deep Research: LLM-optimized search topics, multi-query web search, auto-ingest results into wiki
- Async Review System: LLM flags items for human judgment, predefined actions, pre-generated search queries
- Chrome Web Clipper: one-click web page capture with auto-ingest into knowledge base
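"Compiled once and kept current" depends on not re-running the LLM for sources that have not changed. The following is a minimal sketch of such an incremental cache, assuming each source is keyed by a SHA-256 hash of its content. The `IngestCache` class and its methods are hypothetical names for illustration, not LLM Wiki's actual API.

```python
import hashlib
from typing import Dict


class IngestCache:
    """Hypothetical incremental cache: remembers one content hash per source path."""

    def __init__(self) -> None:
        self._hashes: Dict[str, str] = {}

    @staticmethod
    def digest(content: bytes) -> str:
        # The SHA-256 of the raw bytes identifies one specific version of a source.
        return hashlib.sha256(content).hexdigest()

    def needs_ingest(self, path: str, content: bytes) -> bool:
        # A source needs (re-)ingest if it is new or its content hash changed.
        return self._hashes.get(path) != self.digest(content)

    def mark_ingested(self, path: str, content: bytes) -> None:
        # Record the hash only after wiki pages were generated successfully.
        self._hashes[path] = self.digest(content)


cache = IngestCache()
doc = b"# Notes on transformers"
print(cache.needs_ingest("raw/sources/notes.md", doc))  # True: never seen
cache.mark_ingested("raw/sources/notes.md", doc)
print(cache.needs_ingest("raw/sources/notes.md", doc))  # False: unchanged
print(cache.needs_ingest("raw/sources/notes.md", doc + b"\nmore"))  # True: edited
```

Hashing content rather than comparing timestamps means that a file which is touched but not edited does not trigger a new LLM pass.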

README

LLM Wiki

A personal knowledge base that builds itself. LLM reads your documents, builds a structured wiki, and keeps it current.

Features

Two-Step Chain-of-Thought Ingest — LLM analyzes first, then generates wiki pages with source traceability and incremental cache
Multimodal Image Ingestion — extract embedded images from PDFs, generate factual captions with a vision LLM, surface them in image-aware search results with lightbox preview and jump-to-source
4-Signal Knowledge Graph — relevance model with direct links, source overlap, Adamic-Adar, and type affinity
Louvain Community Detection — automatic knowledge cluster discovery with cohesion scoring
Graph Insights — surprising connections and knowledge gaps with one-click Deep Research
Vector Semantic Search — optional embedding-based retrieval via LanceDB, supports any OpenAI-compatible endpoint
Persistent Ingest Queue — serial processing with crash recovery, cancel, retry, and progress visualization
Folder Import — recursive folder import preserving directory structure, folder context as LLM classification hint
Source Folder Auto-Watch — detects external changes in raw/sources/ and keeps ingest/delete cleanup in sync
Deep Research — LLM-optimized search topics, multi-query web search via Tavily, SerpApi, or SearXNG, auto-ingest results into wiki
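Keeping ingest and delete cleanup in sync with raw/sources/ comes down to comparing two snapshots of that folder. The following is a minimal sketch, assuming each snapshot maps a relative path to a fingerprint such as a content hash. `diff_snapshots` and `FolderChanges` are hypothetical names. How the app actually detects changes (OS file events or polling) is not stated here.

```python
from typing import Dict, List, NamedTuple


class FolderChanges(NamedTuple):
    added: List[str]     # new files: queue for ingest
    modified: List[str]  # changed files: queue for re-ingest
    deleted: List[str]   # removed files: clean up the wiki pages derived from them


def diff_snapshots(old: Dict[str, str], new: Dict[str, str]) -> FolderChanges:
    """Compare two {path: fingerprint} snapshots of the source folder."""
    added = sorted(p for p in new if p not in old)
    deleted = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return FolderChanges(added, modified, deleted)


before = {"a.pdf": "h1", "b.md": "h2", "c.docx": "h3"}
after = {"a.pdf": "h1", "b.md": "h2-edited", "d.xlsx": "h4"}
print(diff_snapshots(before, after))
# FolderChanges(added=['d.xlsx'], modified=['b.md'], deleted=['c.docx'])
```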

Knowledge Graph with Relevance Model

Signal	Weight	Description
Direct link	×3.0	Pages linked via `[[wikilinks]]`
Source overlap	×4.0	Pages sharing the same raw source (via frontmatter `sources[]`)
Adamic-Adar	×1.5	Pages sharing common neighbors (weighted by neighbor degree)
Type affinity	×1.0	Bonus for same page type (entity↔entity, concept↔concept)
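The table lists each signal's weight but not how its raw value is measured. The following is a minimal sketch of the weighted sum. It makes several assumptions: a direct link counts as 1 if either page links to the other, source overlap is the number of shared `sources[]` entries, Adamic-Adar is the standard sum of 1/log(degree) over common neighbors, and type affinity is 1 for matching page types. The `Page` type and the `relevance` function are hypothetical, and the app may normalize signals differently.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Set

# Weights from the relevance table; how each raw signal is measured is assumed.
W_DIRECT, W_SOURCE, W_ADAMIC_ADAR, W_TYPE = 3.0, 4.0, 1.5, 1.0


@dataclass
class Page:
    name: str
    page_type: str                                  # e.g. "entity" or "concept"
    links: Set[str] = field(default_factory=set)    # outgoing [[wikilinks]]
    sources: Set[str] = field(default_factory=set)  # frontmatter sources[]


def neighbors(pages: Dict[str, Page]) -> Dict[str, Set[str]]:
    # Treat wikilinks as undirected edges for the neighborhood-based signals.
    adj: Dict[str, Set[str]] = {name: set() for name in pages}
    for page in pages.values():
        for target in page.links:
            if target in pages and target != page.name:
                adj[page.name].add(target)
                adj[target].add(page.name)
    return adj


def relevance(a: str, b: str, pages: Dict[str, Page]) -> float:
    adj = neighbors(pages)
    pa, pb = pages[a], pages[b]
    direct = 1.0 if b in adj[a] else 0.0
    overlap = float(len(pa.sources & pb.sources))
    # Adamic-Adar: rare shared neighbors (low degree) count more than hubs.
    # A common neighbor of two distinct pages has degree >= 2, so log(deg) > 0.
    adamic_adar = sum(1.0 / math.log(len(adj[z])) for z in adj[a] & adj[b])
    affinity = 1.0 if pa.page_type == pb.page_type else 0.0
    return (W_DIRECT * direct + W_SOURCE * overlap
            + W_ADAMIC_ADAR * adamic_adar + W_TYPE * affinity)


pages = {
    "Transformer": Page("Transformer", "concept", {"Attention"}, {"paper.pdf"}),
    "BERT": Page("BERT", "concept", {"Attention", "Transformer"}, {"paper.pdf"}),
    "Attention": Page("Attention", "concept"),
    "Paris": Page("Paris", "entity"),
}
print(round(relevance("Transformer", "BERT", pages), 3))  # 10.164
```

In this example, the shared source contributes 4.0, the direct link 3.0, the same type 1.0, and the single shared neighbor "Attention" (degree 2) adds 1.5 / ln 2 ≈ 2.164.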

Multi-format Document Support

Format	Method
PDF	pdf-extract (Rust) with file caching
DOCX	docx-rs — headings, bold/italic, lists, tables → structured Markdown
PPTX	ZIP + XML — slide-by-slide extraction with heading/list structure
XLSX/XLS/ODS	calamine — proper cell types, multi-sheet support, Markdown tables
Images	Native preview (png, jpg, gif, webp, svg, etc.)
Video/Audio	Built-in player
Web clips	Readability.js + Turndown.js → clean Markdown

Tech Stack

Layer	Technology
Desktop	Tauri v2 (Rust backend)
Frontend	React 19 + TypeScript + Vite
UI	shadcn/ui + Tailwind CSS v4
Editor	Milkdown (ProseMirror-based WYSIWYG)
Graph	sigma.js + graphology + ForceAtlas2
Search	Tokenized search + graph relevance + optional vector (LanceDB)
Vector DB	LanceDB (Rust, embedded, optional)
PDF	pdf-extract
Office	docx-rs + calamine
i18n	react-i18next
State	Zustand
LLM	Streaming fetch (OpenAI, Anthropic, Google, Ollama, Custom)
Web Search	Tavily, SerpApi, SearXNG JSON API
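The Search row combines tokenized search, graph relevance, and optional vector retrieval, but it does not say how the result lists are merged. One illustrative option is reciprocal rank fusion (RRF), a swapped-in technique that LLM Wiki does not necessarily use. RRF merges ranked lists using only rank positions, which avoids comparing keyword scores with embedding similarities on incompatible scales. `fuse_rankings` is a hypothetical name, and k=60 is the constant conventionally used with RRF.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def fuse_rankings(rankings: Sequence[List[str]], k: int = 60) -> List[str]:
    """Reciprocal rank fusion: score(page) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, page in enumerate(ranking, start=1):
            scores[page] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))


keyword_hits = ["attention.md", "transformer.md", "rnn.md"]  # tokenized search
vector_hits = ["transformer.md", "bert.md", "attention.md"]  # optional vector search
print(fuse_rankings([keyword_hits, vector_hits]))
# ['transformer.md', 'attention.md', 'bert.md', 'rnn.md']
```

If vector search is disabled, the function can receive the keyword list alone. A graph-relevance ranking could be passed as a third list in the same way.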

Features

What is this?

Credits

What We Kept from the Original

What We Changed & Added

1. From CLI to Desktop Application

2. Purpose.md — The Wiki's Soul

3. Two-Step Chain-of-Thought Ingest

4. Knowledge Graph with Relevance Model

5. Louvain Community Detection

6. Graph Insights — Surprising Connections & Knowledge Gaps

7. Optimized Query Retrieval Pipeline

8. Multi-Conversation Chat with Persistence

9. Thinking / Reasoning Display

10. KaTeX Math Rendering

11. Review System (Async Human-in-the-Loop)

12. Deep Research

13. Browser Extension (Web Clipper)

14. Multi-format Document Support

15. File Deletion with Cascade Cleanup

16. Configurable Context Window

17. Cross-Platform Compatibility

18. Other Additions

Tech Stack

Installation

Pre-built Binaries

Build from Source

Chrome Extension

Quick Start

Project Structure

Star History

License