API Reference

This page documents PaperRAG’s Python modules. All public classes and functions are listed below.

Configuration

Configuration module using Pydantic models.

class paperrag.config.ChunkerConfig(*, chunk_size: Annotated[int, Ge(ge=100)] = 1000, chunk_overlap: Annotated[int, Ge(ge=0)] = 200)[source]

Bases: BaseModel

Chunking configuration.

chunk_overlap: int
chunk_size: int
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class paperrag.config.EmbedderConfig(*, model_name: str = 'sentence-transformers/all-MiniLM-L6-v2', batch_size: Annotated[int, Ge(ge=1)] = 64, device: str | None = None, normalize: bool = True, seed: int = 42)[source]

Bases: BaseModel

Embedding model configuration.

batch_size: int
device: str | None
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_name: str
normalize: bool
seed: int
class paperrag.config.IndexingConfig(*, checkpoint_interval: Annotated[int, Ge(ge=0)] = 50, n_workers: Annotated[int, Ge(ge=0)] = 0, pdf_timeout: Annotated[int, Ge(ge=0)] = 300, enable_gc_per_batch: bool = True, log_memory_usage: bool = False, continue_on_error: bool = True, max_failures: int = -1)[source]

Bases: BaseModel

Indexing configuration.

checkpoint_interval: int
continue_on_error: bool
enable_gc_per_batch: bool
get_n_workers() int[source]

Get actual worker count, auto-detecting if needed.

Uses RAM-aware calculation to prevent OOM kills: - Each worker needs ~2GB during peak Docling usage - Formula: min(cpu_cores - 1, available_ram_gb // 2)

log_memory_usage: bool
max_failures: int
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

n_workers: int
pdf_timeout: int
class paperrag.config.LLMConfig(*, model_name: str = 'qwen2.5:1.5b', system_prompt: str = 'You are a helpful research assistant. Answer based on the provided context. If the context does not contain relevant information, say so. Be concise and cite sources.', temperature: float = 0.0, max_tokens: int = 1024, ctx_size: Annotated[int, Ge(ge=512)] = 4096, n_gpu_layers: Annotated[int, Ge(ge=0)] = 0, n_threads: Annotated[int, Ge(ge=0)] = 0, think: bool = False)[source]

Bases: BaseModel

LLM configuration.

ctx_size: int
max_tokens: int
model_config = {'extra': 'ignore'}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_name: str
n_gpu_layers: int
n_threads: int
system_prompt: str
temperature: float
think: bool
class paperrag.config.PaperRAGConfig(*, input_dir: str = <factory>, parser: ParserConfig = <factory>, chunker: ChunkerConfig = <factory>, embedder: EmbedderConfig = <factory>, retriever: RetrieverConfig = <factory>, indexing: IndexingConfig = <factory>, llm: LLMConfig = <factory>)[source]

Bases: BaseModel

Top-level configuration.

chunker: ChunkerConfig
embedder: EmbedderConfig
classmethod expand_input_dir(v: str) str[source]
property index_dir: str

Return index directory - custom path if set, otherwise input_dir/.paperrag-index.

indexing: IndexingConfig
input_dir: str
llm: LLMConfig
classmethod load_snapshot(path: Path) PaperRAGConfig[source]
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(context: Any, /) None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self – The BaseModel instance.

  • context – The context.

parser: ParserConfig
retriever: RetrieverConfig
save_snapshot(path: Path) None[source]
snapshot() dict[source]

Return a JSON-serialisable config snapshot for index metadata.

class paperrag.config.ParserConfig(*, extract_tables: bool = False, fallback_to_raw: bool = True, ocr_mode: Literal['auto', 'always', 'never'] = 'auto', manifest_file: str | None = None)[source]

Bases: BaseModel

PDF parsing configuration.

extract_tables: bool
fallback_to_raw: bool
manifest_file: str | None
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

ocr_mode: Literal['auto', 'always', 'never']
class paperrag.config.RetrieverConfig(*, top_k: Annotated[int, Ge(ge=1)] = 5, score_threshold: Annotated[float, Ge(ge=0.0), Le(le=1.0)] = 0.1, use_mmr: bool = False, mmr_lambda: Annotated[float, Ge(ge=0.0), Le(le=1.0)] = 0.5, max_results_per_paper: Annotated[int, Ge(ge=1)] = 2)[source]

Bases: BaseModel

Retrieval configuration.

max_results_per_paper: int
mmr_lambda: float
model_config = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

score_threshold: float
top_k: int
use_mmr: bool
paperrag.config.apply_rc(cfg: PaperRAGConfig, overrides: dict) None[source]

Apply .paperragrc overrides to a PaperRAGConfig instance.

paperrag.config.load_rc(path: Path) dict[source]

Load a .paperragrc TOML file, returning a flat dict of overrides.

PDF Parser

PDF parsing module using Docling.

class paperrag.parser.ParsedPaper(file_path: str, file_hash: str, title: str, authors: str, sections: list[ParsedSection] = <factory>, raw_text: str = '', abstract: str = '', doi: str = '')[source]

Structured representation of a parsed PDF.

abstract: str = ''
authors: str
doi: str = ''
file_hash: str
file_path: str
raw_text: str = ''
sections: list[ParsedSection]
title: str
class paperrag.parser.ParsedSection(name: str, text: str)[source]

A single section extracted from a PDF.

name: str
text: str
paperrag.parser.compute_file_hash(path: Path) str[source]

Compute SHA256 hash of a file.

paperrag.parser.compute_file_hashes_parallel(pdf_paths: list[Path], n_workers: int = 4) dict[str, str][source]

Compute hashes for multiple PDFs in parallel.

Parameters:
  • pdf_paths – List of PDF file paths to hash

  • n_workers – Number of parallel worker threads

Returns:

Dictionary mapping str(pdf_path) to hash string

paperrag.parser.discover_pdfs(input_dir: Path) list[Path][source]

Recursively find all PDF files under input_dir, or return a single PDF file.

paperrag.parser.has_text_layer(pdf_path: Path, min_chars: int = 100) bool[source]

Detect if PDF has extractable text (i.e., not a scanned image).

Parameters:
  • pdf_path – Path to PDF file

  • min_chars – Minimum characters on first page to consider text-based

Returns:

True if PDF has text layer, False if likely scanned/image-based

paperrag.parser.load_manifest(manifest_path: Path) dict[str, dict[str, str]][source]

Load CSV manifest with metadata to skip parsing.

Expected columns: filename, title, authors, abstract (optional), doi (optional) Returns dict: {filename: {title, authors, abstract, doi}}

paperrag.parser.parse_pdf(path: Path, config: ParserConfig | None = None, manifest: dict[str, dict[str, str]] | None = None) ParsedPaper[source]

Parse a single PDF using Docling and return structured output.

Parameters:
  • path – Path to PDF file

  • config – Parser configuration

  • manifest – Optional manifest dict for fast metadata lookup

Chunker

Section-aware deterministic chunking module.

class paperrag.chunker.Chunk(chunk_id: int, hash_id: str, text: str, paper_title: str, section_name: str, file_path: str, file_hash: str)[source]

A single text chunk with full provenance metadata.

chunk_id: int
file_hash: str
file_path: str
classmethod from_dict(d: dict) Chunk[source]
hash_id: str
paper_title: str
section_name: str
text: str
to_dict() dict[source]
paperrag.chunker.chunk_paper(paper: ParsedPaper, config: ChunkerConfig | None = None) list[Chunk][source]

Chunk a parsed paper into a list of Chunk objects.

Chunking is section-aware: each section is chunked independently and chunks carry the section name in their metadata. Ordering is deterministic.

paperrag.chunker.chunk_text(text: str, chunk_size: int, chunk_overlap: int) list[str][source]

Split text into overlapping windows of characters.

Deterministic: same input always produces the same output list.

Embedder

Embedding module using sentence-transformers.

class paperrag.embedder.Embedder(config: EmbedderConfig | None = None)[source]

Wrapper around a SentenceTransformer model with batched encoding and deterministic seed control.

embed(texts: Sequence[str]) ndarray[source]

Encode texts and return an (N, D) float32 array.

Uses batched encoding with the configured batch size.

Vector Store

FAISS-backed vector store with persistence and metadata tracking.

class paperrag.vectorstore.VectorStore(index_dir: Path, dimension: int)[source]

Bases: object

Manages a FAISS IndexFlatIP index plus chunk metadata on disk.

add(embeddings: ndarray, chunks: list[Chunk]) None[source]

Add vectors and their corresponding chunk metadata.

classmethod exists(index_dir: Path) bool[source]
get_file_hash(file_path: str) str | None[source]
classmethod load(index_dir: Path) VectorStore[source]

Load an existing index from disk.

remove_by_file(file_path: str) None[source]

Remove all vectors belonging to file_path.

Because FAISS IndexFlatIP does not support selective removal we rebuild the index from the remaining vectors.

save(config: PaperRAGConfig | None = None) None[source]

Write index, metadata, hashes, version with atomic operations.

search(query_vec: ndarray, top_k: int = 3, file_path: str | None = None) list[tuple[dict, float]][source]

Return top-k (chunk_metadata, score) pairs, optionally filtered by file_path.

set_file_hash(file_path: str, file_hash: str) None[source]

Retriever

Retriever: ties embedder + vector store for query-time retrieval.

class paperrag.retriever.RetrievalResult(text: str, score: float, paper_title: str, section_name: str, file_path: str, chunk_id: int)[source]

A single retrieval hit.

chunk_id: int
file_path: str
paper_title: str
score: float
section_name: str
text: str
class paperrag.retriever.Retriever(config: PaperRAGConfig, store: VectorStore | None = None)[source]

High-level retriever that loads an existing index and answers queries.

get_all_chunks_for_file(file_path: str) list[RetrievalResult][source]

Return all chunks for a given file, ordered by chunk_id.

Used for full-document context mode where the entire paper is sent to the LLM instead of just top-k retrieval hits.

retrieve(query: str, top_k: int | None = None, file_path: str | None = None) list[RetrievalResult][source]

Embed query and return the top-k results from the vector store.

Results are filtered by score_threshold - only results with similarity scores above the threshold are returned.

If use_mmr=True, uses Maximal Marginal Relevance for diversity.

retrieve_file_paths(query: str, top_k: int | None = None) list[str][source]

Return list of file_path strings (useful for evaluation).

LLM

LLM module for local inference via Ollama or llama.cpp (GGUF / HuggingFace models).

Backend selection rules

  • Local *.gguf file path → llama-server (from brew install llama-cpp)

  • HuggingFace repo ID → download GGUF + llama-server (e.g. Qwen/Qwen3-1.7B-GGUF)

  • All other names → Ollama (unchanged)

Example usage

paperrag query "What is X?" --model qwen2.5:1.5b         # Ollama
paperrag query "What is X?" --model Qwen/Qwen3-1.7B-GGUF # HF download + llama-server
paperrag query "What is X?" --model /path/to/model.gguf  # local GGUF + llama-server
paperrag.llm.describe_llm_error(exc: Exception, model_name: str) tuple[str, str | None][source]

Return (short_error, optional_hint) for a human-readable LLM error message.

The hint is non-None when there’s a concrete remediation action.

paperrag.llm.generate_answer(question: str, context_chunks: list[str], config: LLMConfig | None = None, conversation_history: list[dict] | None = None) str[source]

Generate an answer using the configured LLM backend (blocking).

Backend selection:

  • HuggingFace repo IDs (org/repo) and local .gguf file paths use llama.cpp via llama-server (install: brew install llama-cpp).

  • All other model names delegate to Ollama.

Parameters:
  • conversation_history (list[dict] | None) – Optional list of previous messages (role/content dicts) to provide context for follow-up questions.

  • Examples::

    # Ollama (unchanged) paperrag query “What is X?” –model qwen2.5:1.5b

    # llama.cpp — download Qwen3 GGUF from HuggingFace automatically paperrag query “What is X?” –model Qwen/Qwen3-1.7B-GGUF

    # llama.cpp — use a local GGUF file paperrag query “What is X?” –model /path/to/model.gguf

paperrag.llm.prewarm_ollama(config: LLMConfig) bool[source]

Send a minimal 1-token request to load the Ollama model into memory.

Returns True if successful, False if Ollama is unreachable or llama-server backend. Only applies to the Ollama backend; llama-server has its own startup mechanism.

paperrag.llm.stream_answer(question: str, context_chunks: list[str], config: LLMConfig | None = None, source_files: list[str] | None = None, conversation_history: list[dict] | None = None) Iterator[str][source]

Yield text chunks as they arrive from the LLM (streaming).

Backend selection:

  • HuggingFace repo IDs (org/repo) and local .gguf file paths use llama.cpp via llama-server.

  • All other model names delegate to Ollama.

Parameters:
  • conversation_history (list[dict] | None) – Optional list of previous messages (role/content dicts) to provide context for follow-up questions.

  • Usage::

    for chunk in stream_answer(question, chunks, cfg.llm):

    sys.stdout.write(chunk) sys.stdout.flush()

paperrag.llm.stream_followup(question: str, conversation_history: list[dict], config: LLMConfig | None = None) Iterator[str][source]

Yield text chunks for a follow-up question using conversation history only.

This is used when retrieval returns no results but conversation history exists, allowing the LLM to answer based on previously discussed context.

Parallel Processing

Parallel PDF processing utilities.

paperrag.parallel.parallel_process_pdfs(pdf_paths: list[Path], parser_config: ParserConfig, chunker_config: ChunkerConfig, n_workers: int, timeout: int = 0, manifest: dict[str, dict[str, str]] | None = None) list[tuple[Path, str | None, list[Chunk] | None, str | None]][source]

Process PDFs in parallel, return parsed results.

Parameters:
  • pdf_paths – List of PDF paths to process

  • parser_config – Parser configuration

  • chunker_config – Chunker configuration

  • n_workers – Number of worker processes

  • timeout – Timeout in seconds per PDF (0 = no timeout)

  • manifest – Optional manifest dict for fast metadata lookup

Returns:

(pdf_path, file_hash, chunks, error_message)

Return type:

List of tuples

paperrag.parallel.process_single_pdf(pdf_path: Path, parser_config: ParserConfig, chunker_config: ChunkerConfig, manifest: dict[str, dict[str, str]] | None = None) tuple[Path, str | None, list[Chunk] | None, str | None][source]

Process one PDF: parse + chunk (NOT embed yet).

Parameters:
  • pdf_path – Path to PDF file

  • parser_config – Parser configuration

  • chunker_config – Chunker configuration

  • manifest – Optional manifest dict for fast metadata lookup

Returns:

Tuple of (pdf_path, file_hash, chunks, error_message) If successful: (pdf_path, file_hash, chunks, None) If failed: (pdf_path, None, None, error_message)

CLI

Typer CLI for PaperRAG.

paperrag.cli.entrypoint(ctx: Context, version: bool = <typer.models.OptionInfo object>, input_dir: str = <typer.models.OptionInfo object>, index_dir: str = <typer.models.OptionInfo object>, topk: int = <typer.models.OptionInfo object>, model: str = <typer.models.OptionInfo object>, threshold: float = <typer.models.OptionInfo object>, temperature: float = <typer.models.OptionInfo object>, max_tokens: int = <typer.models.OptionInfo object>, ctx_size: int = <typer.models.OptionInfo object>, system_prompt: str = <typer.models.OptionInfo object>, think: bool = <typer.models.OptionInfo object>) None[source]

PaperRAG - local RAG for academic PDFs.

Starts an interactive REPL session using an existing index.

paperrag.cli.evaluate(benchmark_file: str = <typer.models.ArgumentInfo object>, top_k: int = <typer.models.OptionInfo object>, input_dir: str = <typer.models.OptionInfo object>, index_dir: str = <typer.models.OptionInfo object>) None[source]

Evaluate retrieval quality using a JSONL benchmark.

Each line: {“question”: “…”, “relevant_documents”: [“path1”, …]}

paperrag.cli.export(query: str = <typer.models.OptionInfo object>, output_path: str = <typer.models.OptionInfo object>, format: str = <typer.models.OptionInfo object>, top_k: int = <typer.models.OptionInfo object>, threshold: float = <typer.models.OptionInfo object>, input_dir: str = <typer.models.OptionInfo object>, index_dir: str = <typer.models.OptionInfo object>) None[source]

Export query results to a file.

Retrieves and saves results in the specified format.

paperrag.cli.index(input_dir: str = <typer.models.OptionInfo object>, index_dir: str = <typer.models.OptionInfo object>, force: bool = <typer.models.OptionInfo object>, checkpoint_interval: int = <typer.models.OptionInfo object>, workers: int = <typer.models.OptionInfo object>, ocr: str = <typer.models.OptionInfo object>, manifest: str = <typer.models.OptionInfo object>, embed_model: str = <typer.models.OptionInfo object>) None[source]

Index PDF files into the FAISS vector store.

paperrag.cli.main() None[source]
paperrag.cli.query(question: str = <typer.models.ArgumentInfo object>, top_k: int = <typer.models.OptionInfo object>, threshold: float = <typer.models.OptionInfo object>, temperature: float = <typer.models.OptionInfo object>, max_tokens: int = <typer.models.OptionInfo object>, ctx_size: int = <typer.models.OptionInfo object>, system_prompt: str = <typer.models.OptionInfo object>, input_dir: str = <typer.models.OptionInfo object>, index_dir: str = <typer.models.OptionInfo object>, model: str = <typer.models.OptionInfo object>, no_llm: bool = <typer.models.OptionInfo object>, think: bool = <typer.models.OptionInfo object>) None[source]

Query the indexed papers.

paperrag.cli.review(input_path: str = <typer.models.ArgumentInfo object>, index_dir: str = <typer.models.OptionInfo object>, model: str = <typer.models.OptionInfo object>, topk: int = <typer.models.OptionInfo object>, threshold: float = <typer.models.OptionInfo object>, temperature: float = <typer.models.OptionInfo object>, max_tokens: int = <typer.models.OptionInfo object>, ctx_size: int = <typer.models.OptionInfo object>, system_prompt: str = <typer.models.OptionInfo object>, preset: str = <typer.models.OptionInfo object>, n_gpu_layers: int = <typer.models.OptionInfo object>, output: str = <typer.models.OptionInfo object>, think: bool = <typer.models.OptionInfo object>) None[source]

Index a PDF file (or directory) and start an interactive review session.

Convenience command for focused paper review — equivalent to running:

paperrag index –input-dir <path> && paperrag –index-dir <auto>

Examples

paperrag review paper.pdf

paperrag review paper.pdf –preset reviewer

paperrag review paper.pdf –preset reviewer –output review.md

paperrag review ./papers/ –topk 5

paperrag review paper.pdf –index-dir /tmp/my-index

paperrag.cli.status(index_dir: str = <typer.models.OptionInfo object>) None[source]

Show index health information.

paperrag.cli.version_callback(value: bool) None[source]

Display version and license information.

REPL

Interactive REPL for PaperRAG. REPL: Read, Evaluate, Print, Loop! This mode is first class in PaperRAG

paperrag.repl.start_repl(cfg: PaperRAGConfig | None = None, *, auto_focus: Path | None = None, review_mode: bool = False, output_path: Path | None = None) None[source]

Launch the interactive REPL session.