PaperRAG
Local-first Retrieval-Augmented Generation for academic PDF collections.
PaperRAG lets you index academic PDFs and query them using natural language, powered by local LLM backends. It runs entirely offline once models are available locally.
Key Features
Structured PDF parsing via Docling with adaptive OCR
Section-aware chunking that respects document structure
FAISS vector store with deterministic SHA-256 hashing
Local LLM backends via Ollama or llama.cpp
Interactive REPL with command history and live settings
Focused review command for index-and-open workflows
Parallel indexing with RAM-aware worker auto-detection
Fully offline-capable and reproducible
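To make "section-aware chunking" and "deterministic SHA-256 hashing" concrete, here is a minimal Python sketch of the general idea. It is not PaperRAG's actual implementation: the function names `chunk_id` and `chunk_by_section`, the `max_chars` parameter, and the dictionary layout are all hypothetical, chosen only for illustration. The sketch assumes a document has already been parsed into (section title, body) pairs.

```python
import hashlib


def chunk_id(doc_name: str, section: str, text: str) -> str:
    """Hypothetical helper: derive a stable ID from a chunk's content."""
    # Collapse whitespace so trivial formatting differences do not change the ID.
    normalized = " ".join(text.split())
    payload = "\x1f".join([doc_name, section, normalized]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def chunk_by_section(doc_name, sections, max_chars=500):
    """Group paragraphs into chunks that never cross a section boundary.

    `sections` is a list of (title, body) pairs; this input shape is an assumption.
    A single paragraph longer than `max_chars` is kept whole rather than split.
    """
    chunks = []

    def emit(title, text):
        chunks.append(
            {"id": chunk_id(doc_name, title, text), "section": title, "text": text}
        )

    for title, body in sections:
        paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
        current = ""
        for para in paragraphs:
            candidate = f"{current}\n\n{para}" if current else para
            if current and len(candidate) > max_chars:
                # Adding this paragraph would exceed the budget: close the chunk.
                emit(title, current)
                current = para
            else:
                current = candidate
        if current:
            emit(title, current)  # Flush the remainder before the next section.
    return chunks
```

Because each ID depends only on the document name, section title, and normalized text, re-running the chunker on unchanged input yields the same IDs, which is what makes an index reproducible and lets unchanged chunks be detected cheaply.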
Quick Example
# Index your PDFs
paperrag index --input-dir ~/papers
# Start an interactive session from the generated index
paperrag --index-dir ~/papers/.paperrag-index -m qwen2.5:1.5b
# Or review one paper directly
paperrag review ~/papers/paper.pdf