Getting Started

Installation

Prerequisites

  • Python 3.11 or later

  • uv for environment management

  • One local LLM backend:

    • Ollama for model names like qwen2.5:1.5b

    • llama-server from llama.cpp for local .gguf models or HuggingFace GGUF repos

Install with pip

Run these commands from the repository root. The PyTorch index URL makes CPU-only PyTorch wheels available to pip:

python -m venv .venv
source .venv/bin/activate
pip install --index-url https://download.pytorch.org/whl/cpu --extra-index-url https://pypi.org/simple -e .

Optional LLM extras

If you want HuggingFace GGUF download support, install:

uv pip install huggingface-hub

If you want additional Transformers-based tooling outside the default PaperRAG flow, install:

uv pip install transformers accelerate

Indexing PDFs

Before querying, you must index your PDF collection:

paperrag index --input-dir /path/to/pdfs

This will:

  1. Discover all PDFs in the directory

  2. Parse each PDF using Docling (with adaptive OCR)

  3. Chunk the text into sections

  4. Embed chunks and store them in a FAISS index

The index is saved to <input-dir>/.paperrag-index/ by default. Use --index-dir to specify a different location.
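
The discovery and chunking steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not PaperRAG's actual code: the function names discover_pdfs and chunk_text are hypothetical, and a fixed-size character window stands in for the section-aware chunking that PaperRAG performs. Parsing (Docling) and embedding (FAISS) are left out.

```python
from pathlib import Path


def discover_pdfs(input_dir: str) -> list[Path]:
    """Return every .pdf file under input_dir (recursively), in a stable order."""
    return sorted(
        p for p in Path(input_dir).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows.

    Assumption: a simple stand-in for section-based chunking; the sizes are
    illustrative defaults, not PaperRAG settings.
    """
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    if not text:
        return []
    step = max_chars - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + max_chars] for i in range(0, max(len(text) - overlap, 1), step)]
```

Each chunk would then be embedded and added to the vector index, which is the step that makes the papers searchable.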

Adaptive OCR

PaperRAG automatically detects whether each PDF needs OCR:

  • Text-based PDFs skip OCR (2-3x faster)

  • Scanned PDFs enable OCR for accurate extraction

  • Override with --ocr always or --ocr never
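
One plausible way to make this decision is to check how much text each page's text layer already contains. The sketch below is a guess at such a heuristic, not PaperRAG's detection logic: the function names, the thresholds, and the "auto" mode name are all assumptions. Only the always and never overrides come from the text above.

```python
def needs_ocr(chars_per_page: list[int], min_chars: int = 50,
              min_text_ratio: float = 0.5) -> bool:
    """Guess whether a PDF is scanned from the size of each page's text layer.

    chars_per_page holds the number of extractable characters on each page.
    Thresholds are illustrative assumptions.
    """
    if not chars_per_page:
        return True  # nothing extractable at all, so fall back to OCR
    text_pages = sum(1 for n in chars_per_page if n >= min_chars)
    return text_pages / len(chars_per_page) < min_text_ratio


def resolve_ocr(mode: str, chars_per_page: list[int]) -> bool:
    """Apply an explicit override, or use the heuristic in the assumed 'auto' mode."""
    if mode == "always":
        return True
    if mode == "never":
        return False
    if mode == "auto":  # hypothetical name for the default behaviour
        return needs_ocr(chars_per_page)
    raise ValueError(f"unknown OCR mode: {mode!r}")
```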

Parallel Workers

Worker count is auto-detected based on available RAM (~2 GB per worker). Override with --workers N:

# For low-RAM systems
paperrag index --input-dir /path/to/pdfs --workers 2

# Speed up on high-RAM systems
paperrag index --input-dir /path/to/pdfs --workers 10

Running Queries

Focused Review Mode

For one paper or one directory, review runs indexing first and then opens the REPL:

paperrag review /path/to/paper.pdf
paperrag review /path/to/papers --max-tokens 512

If the file hash is unchanged in the target index, PaperRAG skips re-indexing automatically.

One-off Query

For single questions or scripting:

paperrag query "what is speech chain?" --index-dir /path/to/index -m qwen2.5:1.5b
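
For scripting, the same command can be driven from Python with subprocess. The sketch below uses only the arguments shown above (query, --index-dir, -m). It assumes the answer is written to standard output, which the source does not confirm, and the helper names are hypothetical.

```python
import subprocess


def build_query_cmd(question: str, index_dir: str, model: str) -> list[str]:
    """Build the argument list for one paperrag query call."""
    return ["paperrag", "query", question, "--index-dir", index_dir, "-m", model]


def run_queries(questions, index_dir, model, runner=subprocess.run):
    """Run one query per question and collect the captured output.

    Assumption: the answer is printed to stdout. `runner` is injectable so the
    loop can be exercised without paperrag installed.
    """
    answers = {}
    for question in questions:
        result = runner(
            build_query_cmd(question, index_dir, model),
            capture_output=True, text=True, check=True,
        )
        answers[question] = result.stdout.strip()
    return answers
```

A call such as run_queries(["what is speech chain?"], "/path/to/index", "qwen2.5:1.5b") would then return a dictionary keyed by question. Passing the arguments as a list avoids shell quoting problems with questions that contain spaces or quotes.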

REPL Commands

Once inside the REPL, these commands are available:

Command             Description
<any text>          Query the indexed papers
/index              Re-index the current PDF directory or file
/index <path>       Re-index a specific PDF file or directory
/focus <name>       Focus queries on a specific paper
/topk <n>           Set top-k for retrieval
/threshold <n>      Set similarity threshold (0.0-1.0)
/temperature <n>    Set LLM temperature (0.0-2.0)
/max-tokens <n>     Set max output tokens
/ctx-size <n>       Set LLM context window size
/prompt <text>      Set the system prompt
/model <name>       Switch the active model/backend
/config             Show current configuration
/rc                 Show loaded .paperragrc files and values
/help               Show help
/exit, /quit        Exit the REPL
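
To make the /topk and /threshold settings concrete, the sketch below shows how the two values typically interact during retrieval: chunks scoring below the threshold are dropped first, then the best top-k survivors are kept. This is a generic illustration, not PaperRAG's retrieval code; select_chunks and the default values are hypothetical.

```python
def select_chunks(scored: list[tuple[str, float]], top_k: int = 5,
                  threshold: float = 0.0) -> list[tuple[str, float]]:
    """Keep the top_k highest-scoring chunks whose similarity is at least threshold.

    scored is a list of (chunk_text, similarity) pairs with similarity in 0.0-1.0.
    """
    kept = [(chunk, score) for chunk, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]
```

With this behaviour, raising the threshold can return fewer than top-k chunks, which is usually preferable to padding the prompt with weakly related text.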

LLM Setup

Ollama backend

Install Ollama from https://ollama.com, then pull a model and pass its name to PaperRAG with -m:

ollama pull qwen2.5:1.5b
paperrag --index-dir /path/to/index -m qwen2.5:1.5b

llama.cpp backend

Install llama-server from llama.cpp, for example with Homebrew:

brew install llama.cpp

Then either use a local GGUF file:

paperrag --index-dir /path/to/index -m /path/to/model.gguf

Or a HuggingFace GGUF repo ID:

paperrag --index-dir /path/to/index -m Qwen/Qwen3-1.7B-GGUF
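
The three forms of the -m value shown in this section (an Ollama model name, a local .gguf path, and a HuggingFace repo ID) can be told apart by shape. The source does not say how PaperRAG chooses a backend, so the rules below are only an assumption that happens to match the three examples, and classify_model is a hypothetical name.

```python
def classify_model(spec: str) -> str:
    """Guess which backend a -m value refers to, based on its shape alone."""
    if spec.lower().endswith(".gguf"):
        # e.g. /path/to/model.gguf
        return "llama.cpp (local GGUF file)"
    if "/" in spec and ":" not in spec:
        # e.g. Qwen/Qwen3-1.7B-GGUF, an owner/repo style identifier
        return "llama.cpp (HuggingFace GGUF repo)"
    # e.g. qwen2.5:1.5b, an Ollama name with an optional tag
    return "ollama"
```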