Getting Started
Installation
Prerequisites
Python 3.11 or later
uv for environment management
One local LLM backend:
Ollama for model names like qwen2.5:1.5b
llama-server from llama.cpp for local .gguf models or HuggingFace GGUF repos
Install with uv (recommended, CPU-only default)
uv sync
To run commands without activating the environment:
uv run paperrag --help
If you prefer an editable install with uv pip:
uv venv --python 3.11
source .venv/bin/activate
uv pip install --index-url https://download.pytorch.org/whl/cpu --extra-index-url https://pypi.org/simple -e .
Install with pip
python -m venv .venv
source .venv/bin/activate
pip install --index-url https://download.pytorch.org/whl/cpu --extra-index-url https://pypi.org/simple -e .
Optional LLM extras
If you want HuggingFace GGUF download support, install:
uv pip install huggingface-hub
If you want additional Transformers-based tooling outside the default PaperRAG flow:
uv pip install transformers accelerate
Indexing PDFs
Before querying, you must index your PDF collection:
paperrag index --input-dir /path/to/pdfs
This will:
Discover all PDFs in the directory
Parse each PDF using Docling (with adaptive OCR)
Chunk the text into sections
Embed chunks and store them in a FAISS index
The index is saved to <input-dir>/.paperrag-index/ by default. Use --index-dir to specify a different location.
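The discovery step and the default index location can be sketched in a few lines. This is a minimal illustration, not PaperRAG's actual code: the function names `discover_pdfs` and `default_index_dir` are hypothetical, and searching subdirectories recursively is an assumption.

```python
from pathlib import Path

# Default index folder name, as stated in the text above.
INDEX_DIR_NAME = ".paperrag-index"


def default_index_dir(input_dir: str) -> Path:
    """Return <input-dir>/.paperrag-index, the documented default location."""
    return Path(input_dir) / INDEX_DIR_NAME


def discover_pdfs(input_dir: str) -> list[Path]:
    """List PDF files under input_dir.

    Assumption: the search is recursive and the extension match is
    case-insensitive. The real tool may behave differently.
    """
    root = Path(input_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

Parsing, chunking, and embedding would then run over the returned paths, with the result written to the directory from `default_index_dir` unless --index-dir is given.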
Adaptive OCR
PaperRAG automatically detects whether each PDF needs OCR:
Text-based PDFs skip OCR (2-3x faster)
Scanned PDFs enable OCR for accurate extraction
Override with --ocr always or --ocr never
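One way to picture the decision is a check on how much text can be extracted from each page without OCR. The sketch below is an assumption about the general approach, not PaperRAG's detection logic: the function `needs_ocr`, the "auto" mode name, and the 50-character threshold are all hypothetical.

```python
def needs_ocr(page_texts: list[str], mode: str = "auto",
              min_chars_per_page: int = 50) -> bool:
    """Decide whether a PDF should go through OCR.

    page_texts holds the text extracted from each page without OCR.
    mode mirrors the --ocr override: "always" or "never" force the answer;
    "auto" (a hypothetical name for the default) uses a simple heuristic.
    """
    if mode == "always":
        return True
    if mode == "never":
        return False
    if mode != "auto":
        raise ValueError(f"unknown OCR mode: {mode!r}")
    if not page_texts:
        # Nothing extractable at all: treat the file as scanned.
        return True
    avg_chars = sum(len(t.strip()) for t in page_texts) / len(page_texts)
    # Assumed threshold: pages with very little text are probably images.
    return avg_chars < min_chars_per_page
```

With this heuristic a text-based paper skips OCR, while a scan whose pages yield almost no text gets OCR enabled.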
Parallel Workers
Worker count is auto-detected based on available RAM (~2 GB per worker). Override with --workers N:
# For low-RAM systems
paperrag index --workers 2
# Speed up on high-RAM systems
paperrag index --workers 10
Running Queries
Interactive REPL (recommended)
The REPL loads the index once and lets you ask multiple questions:
paperrag --index-dir /path/to/index -m qwen2.5:1.5b
Inside the REPL, type any question to query your papers. Use /help to see all available commands.
Focused Review Mode
For a single paper or a single directory, the review command runs indexing first and then opens the REPL:
paperrag review /path/to/paper.pdf
paperrag review /path/to/papers --max-tokens 512
If the file hash is unchanged in the target index, PaperRAG skips re-indexing automatically.
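The hash check can be illustrated with a small sketch. It assumes a SHA-256 content hash and a simple path-to-hash mapping; the actual hash algorithm and index metadata format used by PaperRAG are not stated here, and the names `file_sha256` and `needs_reindex` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's contents in chunks so large PDFs are not read at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def needs_reindex(path: Path, manifest: dict[str, str]) -> bool:
    """Return True if the file is new or its hash differs from the stored one.

    manifest maps a file path to the hash recorded at the last indexing run
    (a hypothetical stand-in for whatever the index actually stores).
    """
    return manifest.get(str(path)) != file_sha256(path)
```

A file that is renamed but otherwise unchanged would still be re-indexed in this sketch, because the lookup is keyed by path.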
One-off Query
For single questions or scripting:
paperrag query "what is speech chain?" --index-dir /path/to/index -m qwen2.5:1.5b
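For scripting, the same command can be driven from Python. The sketch below only uses the flags shown above (`query`, `--index-dir`, `-m`). The helper names are hypothetical, and it assumes the answer is written to standard output, which the text above does not confirm.

```python
import subprocess


def build_query_command(question: str, index_dir: str, model: str) -> list[str]:
    """Build the argument list for a one-off query.

    Passing a list (rather than a shell string) avoids quoting problems
    when the question contains spaces or punctuation.
    """
    return ["paperrag", "query", question, "--index-dir", index_dir, "-m", model]


def ask(question: str, index_dir: str, model: str) -> str:
    """Run the query and return whatever the CLI prints.

    Assumption: the answer goes to stdout and a non-zero exit code means
    failure (check=True raises CalledProcessError in that case).
    """
    result = subprocess.run(
        build_query_command(question, index_dir, model),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

A call such as `ask("what is speech chain?", "/path/to/index", "qwen2.5:1.5b")` requires paperrag on the PATH and an existing index.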
REPL Commands
Once inside the REPL, these commands are available:
| Command | Description |
|---|---|
| (any question) | Query the indexed papers |
| | Re-index the current PDF directory or file |
| | Re-index a specific PDF file or directory |
| | Focus queries on a specific paper |
| | Set top-k for retrieval |
| | Set similarity threshold (0.0-1.0) |
| | Set LLM temperature (0.0-2.0) |
| | Set max output tokens |
| | Set LLM context window size |
| | Set the system prompt |
| | Switch the active model/backend |
| | Show current configuration |
| | Show loaded |
| /help | Show help |
| | Exit the REPL |
LLM Setup
Ollama backend
Install Ollama from https://ollama.com, then:
ollama pull qwen2.5:1.5b
paperrag --index-dir /path/to/index -m qwen2.5:1.5b
llama.cpp backend
Install llama-server from llama.cpp, for example with Homebrew:
brew install llama-cpp
Then either use a local GGUF file:
paperrag --index-dir /path/to/index -m /path/to/model.gguf
Or a HuggingFace GGUF repo ID:
paperrag --index-dir /path/to/index -m Qwen/Qwen3-1.7B-GGUF
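The three forms of the -m value shown above (Ollama model name, local GGUF path, HuggingFace repo ID) can be told apart with a simple heuristic. This is an illustrative guess at how such dispatch could work, not PaperRAG's actual logic. The function `guess_backend` and its return labels are hypothetical.

```python
def guess_backend(model: str) -> str:
    """Classify a -m value into one of the backends described above.

    Heuristic (an assumption, not the tool's documented behavior):
      - a path ending in .gguf        -> llama.cpp with a local file
      - an "owner/repo" style string  -> llama.cpp with a HuggingFace repo
      - anything else, e.g. name:tag  -> Ollama
    """
    if model.lower().endswith(".gguf"):
        return "llama.cpp:file"
    if "/" in model:
        return "llama.cpp:hf-repo"
    return "ollama"
```

The heuristic is imperfect: some Ollama model names contain a slash, so a real implementation would need a more careful rule or an explicit backend option.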