PaperRAG

Local-first Retrieval-Augmented Generation for academic PDF collections.

PaperRAG indexes academic PDFs and lets you query them in natural language using a local LLM backend. Once the required models are available locally, it runs entirely offline.

Key Features

  • Structured PDF parsing via Docling with adaptive OCR

  • Section-aware chunking that respects document structure

  • FAISS vector store with deterministic SHA-256 hashing

  • Local LLM backends via Ollama or llama.cpp

  • Interactive REPL with command history and live settings

  • Focused review command for index-and-open workflows

  • Parallel indexing with RAM-aware worker auto-detection

  • Fully offline-capable and reproducible
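
To illustrate why SHA-256 hashing supports reproducibility, here is a minimal shell sketch. It only demonstrates the general property (identical bytes always produce the same digest) and does not show how PaperRAG applies hashing internally, which is not described here. The helper `content_hash` and the temporary file names are hypothetical, and the sketch assumes GNU coreutils `sha256sum` is installed.

```shell
#!/usr/bin/env bash
# Illustration only: content hashing yields stable identifiers.
# content_hash is a hypothetical helper, not part of PaperRAG.
set -euo pipefail

# Print the hex SHA-256 digest of a file's contents.
content_hash() {
  sha256sum "$1" | cut -d' ' -f1
}

tmp=$(mktemp -d)
printf 'same bytes' > "$tmp/a.pdf"
printf 'same bytes' > "$tmp/b.pdf"
printf 'different bytes' > "$tmp/c.pdf"

hash_a=$(content_hash "$tmp/a.pdf")   # same content as b.pdf
hash_b=$(content_hash "$tmp/b.pdf")   # so the digest matches hash_a
hash_c=$(content_hash "$tmp/c.pdf")   # different content, different digest

rm -rf "$tmp"
```

Because the digest depends only on the file's bytes, renaming or moving a file leaves it unchanged, while any edit to the content produces a new digest.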

Quick Example

# Index your PDFs
paperrag index --input-dir ~/papers

# Start an interactive session from the generated index
paperrag --index-dir ~/papers/.paperrag-index -m qwen2.5:1.5b

# Or review one paper directly
paperrag review ~/papers/paper.pdf
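
The first two commands can be combined into a small wrapper, sketched below. The function name `paperrag_session` is hypothetical. The sketch assumes the index is written to `.paperrag-index` inside the input directory, as in the example above, and it skips indexing whenever that directory already exists. Whether you need to re-run `paperrag index` after adding new PDFs is not covered here.

```shell
#!/usr/bin/env bash
# Sketch of a convenience wrapper; paperrag_session is a hypothetical name.
set -euo pipefail

# Usage: paperrag_session PAPERS_DIR [MODEL]
paperrag_session() {
  local papers_dir="$1"
  local model="${2:-qwen2.5:1.5b}"   # model name taken from the example above
  # Assumed index location, matching the example above.
  local index_dir="$papers_dir/.paperrag-index"

  # Build the index only if it does not exist yet.
  if [ ! -d "$index_dir" ]; then
    paperrag index --input-dir "$papers_dir"
  fi

  # Start the interactive session against the index.
  paperrag --index-dir "$index_dir" -m "$model"
}
```

Calling `paperrag_session ~/papers` would then index on the first run and go straight to the interactive session on later runs.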