API Reference¶

This page documents PaperRAG’s Python modules. All public classes and functions are listed below.

Configuration¶

Configuration module using Pydantic models.

class paperrag.config.ChunkerConfig(*, chunk_size: Annotated[int, Ge(ge=100)] = 1000, chunk_overlap: Annotated[int, Ge(ge=0)] = 200)[source]¶

Bases: BaseModel

Chunking configuration.

chunk_overlap: int¶

chunk_size: int¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class paperrag.config.EmbedderConfig(*, model_name: str = 'sentence-transformers/all-MiniLM-L6-v2', batch_size: Annotated[int, Ge(ge=1)] = 64, device: str | None = None, normalize: bool = True, seed: int = 42)[source]¶

Bases: BaseModel

Embedding model configuration.

batch_size: int¶

device: str | None¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_name: str¶

normalize: bool¶

seed: int¶

class paperrag.config.IndexingConfig(*, checkpoint_interval: Annotated[int, Ge(ge=0)] = 50, n_workers: Annotated[int, Ge(ge=0)] = 0, pdf_timeout: Annotated[int, Ge(ge=0)] = 300, enable_gc_per_batch: bool = True, log_memory_usage: bool = False, continue_on_error: bool = True, max_failures: int = -1)[source]¶

Bases: BaseModel

Indexing configuration.

checkpoint_interval: int¶

continue_on_error: bool¶

enable_gc_per_batch: bool¶

get_n_workers() → int[source]¶

Get actual worker count, auto-detecting if needed.

Uses RAM-aware calculation to prevent OOM kills: - Each worker needs ~2GB during peak Docling usage - Formula: min(cpu_cores - 1, available_ram_gb // 2)

log_memory_usage: bool¶

max_failures: int¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

n_workers: int¶

pdf_timeout: int¶

class paperrag.config.LLMConfig(*, model_name: str = 'qwen2.5:1.5b', system_prompt: str = 'You are a helpful research assistant. Answer based on the provided context. If the context does not contain relevant information, say so. Be concise and cite sources.', temperature: float = 0.0, max_tokens: int = 1024, ctx_size: Annotated[int, Ge(ge=512)] = 4096, n_gpu_layers: Annotated[int, Ge(ge=0)] = 0, n_threads: Annotated[int, Ge(ge=0)] = 0, think: bool = False)[source]¶

Bases: BaseModel

LLM configuration.

ctx_size: int¶

max_tokens: int¶

model_config = {'extra': 'ignore'}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_name: str¶

n_gpu_layers: int¶

n_threads: int¶

system_prompt: str¶

temperature: float¶

think: bool¶

class paperrag.config.PaperRAGConfig(*, input_dir: str = <factory>, parser: ParserConfig = <factory>, chunker: ChunkerConfig = <factory>, embedder: EmbedderConfig = <factory>, retriever: RetrieverConfig = <factory>, indexing: IndexingConfig = <factory>, llm: LLMConfig = <factory>)[source]¶

Bases: BaseModel

Top-level configuration.

chunker: ChunkerConfig¶

embedder: EmbedderConfig¶

classmethod expand_input_dir(v: str) → str[source]¶

property index_dir: str¶: Return index directory - custom path if set, otherwise input_dir/.paperrag-index.

indexing: IndexingConfig¶

input_dir: str¶

llm: LLMConfig¶

classmethod load_snapshot(path: Path) → PaperRAGConfig[source]¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_post_init(context: Any, /) → None¶

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:

self – The BaseModel instance.
context – The context.

parser: ParserConfig¶

retriever: RetrieverConfig¶

save_snapshot(path: Path) → None[source]¶

snapshot() → dict[source]¶: Return a JSON-serialisable config snapshot for index metadata.

class paperrag.config.ParserConfig(*, extract_tables: bool = False, fallback_to_raw: bool = True, ocr_mode: Literal['auto', 'always', 'never'] = 'auto', manifest_file: str | None = None)[source]¶

Bases: BaseModel

PDF parsing configuration.

extract_tables: bool¶

fallback_to_raw: bool¶

manifest_file: str | None¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

ocr_mode: Literal['auto', 'always', 'never']¶

class paperrag.config.RetrieverConfig(*, top_k: Annotated[int, Ge(ge=1)] = 5, score_threshold: Annotated[float, Ge(ge=0.0), Le(le=1.0)] = 0.1, use_mmr: bool = False, mmr_lambda: Annotated[float, Ge(ge=0.0), Le(le=1.0)] = 0.5, max_results_per_paper: Annotated[int, Ge(ge=1)] = 2)[source]¶

Bases: BaseModel

Retrieval configuration.

max_results_per_paper: int¶

mmr_lambda: float¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

score_threshold: float¶

top_k: int¶

use_mmr: bool¶

paperrag.config.apply_rc(cfg: PaperRAGConfig, overrides: dict) → None[source]¶: Apply .paperragrc overrides to a PaperRAGConfig instance.

paperrag.config.load_rc(path: Path) → dict[source]¶: Load a .paperragrc TOML file, returning a flat dict of overrides.

PDF Parser¶

PDF parsing module using Docling.

class paperrag.parser.ParsedPaper(file_path: str, file_hash: str, title: str, authors: str, sections: list[ParsedSection] = <factory>, raw_text: str = '', abstract: str = '', doi: str = '')[source]¶

Structured representation of a parsed PDF.

abstract: str = ''¶

authors: str¶

doi: str = ''¶

file_hash: str¶

file_path: str¶

raw_text: str = ''¶

sections: list[ParsedSection]¶

title: str¶

class paperrag.parser.ParsedSection(name: str, text: str)[source]¶

A single section extracted from a PDF.

name: str¶

text: str¶

paperrag.parser.compute_file_hash(path: Path) → str[source]¶: Compute SHA256 hash of a file.

paperrag.parser.compute_file_hashes_parallel(pdf_paths: list[Path], n_workers: int = 4) → dict[str, str][source]¶

Compute hashes for multiple PDFs in parallel.

Parameters:

pdf_paths – List of PDF file paths to hash
n_workers – Number of parallel worker threads

Returns:

Dictionary mapping str(pdf_path) to hash string

paperrag.parser.discover_pdfs(input_dir: Path) → list[Path][source]¶: Recursively find all PDF files under input_dir, or return a single PDF file.

paperrag.parser.has_text_layer(pdf_path: Path, min_chars: int = 100) → bool[source]¶

Detect if PDF has extractable text (i.e., not a scanned image).

Parameters:

pdf_path – Path to PDF file
min_chars – Minimum characters on first page to consider text-based

Returns:

True if PDF has text layer, False if likely scanned/image-based

paperrag.parser.load_manifest(manifest_path: Path) → dict[str, dict[str, str]][source]¶

Load CSV manifest with metadata to skip parsing.

Expected columns: filename, title, authors, abstract (optional), doi (optional) Returns dict: {filename: {title, authors, abstract, doi}}

paperrag.parser.parse_pdf(path: Path, config: ParserConfig | None = None, manifest: dict[str, dict[str, str]] | None = None) → ParsedPaper[source]¶

Parse a single PDF using Docling and return structured output.

Parameters:

path – Path to PDF file
config – Parser configuration
manifest – Optional manifest dict for fast metadata lookup

Chunker¶

Section-aware deterministic chunking module.

class paperrag.chunker.Chunk(chunk_id: int, hash_id: str, text: str, paper_title: str, section_name: str, file_path: str, file_hash: str)[source]¶

A single text chunk with full provenance metadata.

chunk_id: int¶

file_hash: str¶

file_path: str¶

classmethod from_dict(d: dict) → Chunk[source]¶

hash_id: str¶

paper_title: str¶

section_name: str¶

text: str¶

to_dict() → dict[source]¶

paperrag.chunker.chunk_paper(paper: ParsedPaper, config: ChunkerConfig | None = None) → list[Chunk][source]¶

Chunk a parsed paper into a list of Chunk objects.

Chunking is section-aware: each section is chunked independently and chunks carry the section name in their metadata. Ordering is deterministic.

paperrag.chunker.chunk_text(text: str, chunk_size: int, chunk_overlap: int) → list[str][source]¶

Split text into overlapping windows of characters.

Deterministic: same input always produces the same output list.

Embedder¶

Embedding module using sentence-transformers.

class paperrag.embedder.Embedder(config: EmbedderConfig | None = None)[source]¶

Wrapper around a SentenceTransformer model with batched encoding and deterministic seed control.

embed(texts: Sequence[str]) → ndarray[source]¶

Encode texts and return an (N, D) float32 array.

Uses batched encoding with the configured batch size.

Vector Store¶

FAISS-backed vector store with persistence and metadata tracking.

class paperrag.vectorstore.VectorStore(index_dir: Path, dimension: int)[source]¶

Bases: object

Manages a FAISS IndexFlatIP index plus chunk metadata on disk.

add(embeddings: ndarray, chunks: list[Chunk]) → None[source]¶: Add vectors and their corresponding chunk metadata.

classmethod exists(index_dir: Path) → bool[source]¶

get_file_hash(file_path: str) → str | None[source]¶

classmethod load(index_dir: Path) → VectorStore[source]¶: Load an existing index from disk.

remove_by_file(file_path: str) → None[source]¶

Remove all vectors belonging to file_path.

Because FAISS IndexFlatIP does not support selective removal we rebuild the index from the remaining vectors.

save(config: PaperRAGConfig | None = None) → None[source]¶: Write index, metadata, hashes, version with atomic operations.

search(query_vec: ndarray, top_k: int = 3, file_path: str | None = None) → list[tuple[dict, float]][source]¶: Return top-k (chunk_metadata, score) pairs, optionally filtered by file_path.

set_file_hash(file_path: str, file_hash: str) → None[source]¶

Retriever¶

Retriever: ties embedder + vector store for query-time retrieval.

class paperrag.retriever.RetrievalResult(text: str, score: float, paper_title: str, section_name: str, file_path: str, chunk_id: int)[source]¶

A single retrieval hit.

chunk_id: int¶

file_path: str¶

paper_title: str¶

score: float¶

section_name: str¶

text: str¶

class paperrag.retriever.Retriever(config: PaperRAGConfig, store: VectorStore | None = None)[source]¶

High-level retriever that loads an existing index and answers queries.

get_all_chunks_for_file(file_path: str) → list[RetrievalResult][source]¶

Return all chunks for a given file, ordered by chunk_id.

Used for full-document context mode where the entire paper is sent to the LLM instead of just top-k retrieval hits.

retrieve(query: str, top_k: int | None = None, file_path: str | None = None) → list[RetrievalResult][source]¶

Embed query and return the top-k results from the vector store.

Results are filtered by score_threshold - only results with similarity scores above the threshold are returned.

If use_mmr=True, uses Maximal Marginal Relevance for diversity.

retrieve_file_paths(query: str, top_k: int | None = None) → list[str][source]¶: Return list of file_path strings (useful for evaluation).

LLM¶

LLM module for local inference via Ollama or llama.cpp (GGUF / HuggingFace models).

Backend selection rules¶

Local *.gguf file path → llama-server (from brew install llama-cpp)
HuggingFace repo ID → download GGUF + llama-server (e.g. Qwen/Qwen3-1.7B-GGUF)
All other names → Ollama (unchanged)

Example usage¶

paperrag query "What is X?" --model qwen2.5:1.5b         # Ollama
paperrag query "What is X?" --model Qwen/Qwen3-1.7B-GGUF # HF download + llama-server
paperrag query "What is X?" --model /path/to/model.gguf  # local GGUF + llama-server

paperrag.llm.describe_llm_error(exc: Exception, model_name: str) → tuple[str, str | None][source]¶

Return (short_error, optional_hint) for a human-readable LLM error message.

The hint is non-None when there’s a concrete remediation action.

paperrag.llm.generate_answer(question: str, context_chunks: list[str], config: LLMConfig | None = None, conversation_history: list[dict] | None = None) → str[source]¶

Generate an answer using the configured LLM backend (blocking).

Backend selection:

HuggingFace repo IDs (org/repo) and local .gguf file paths use llama.cpp via llama-server (install: brew install llama-cpp).
All other model names delegate to Ollama.

Parameters:

conversation_history (list[dict] | None) – Optional list of previous messages (role/content dicts) to provide context for follow-up questions.
Examples:: –
# Ollama (unchanged) paperrag query “What is X?” –model qwen2.5:1.5b

# llama.cpp — download Qwen3 GGUF from HuggingFace automatically paperrag query “What is X?” –model Qwen/Qwen3-1.7B-GGUF

# llama.cpp — use a local GGUF file paperrag query “What is X?” –model /path/to/model.gguf

paperrag.llm.prewarm_ollama(config: LLMConfig) → bool[source]¶

Send a minimal 1-token request to load the Ollama model into memory.

Returns True if successful, False if Ollama is unreachable or llama-server backend. Only applies to the Ollama backend; llama-server has its own startup mechanism.

paperrag.llm.stream_answer(question: str, context_chunks: list[str], config: LLMConfig | None = None, source_files: list[str] | None = None, conversation_history: list[dict] | None = None) → Iterator[str][source]¶

Yield text chunks as they arrive from the LLM (streaming).

Backend selection:

HuggingFace repo IDs (org/repo) and local .gguf file paths use llama.cpp via llama-server.
All other model names delegate to Ollama.

Parameters:

conversation_history (list[dict] | None) – Optional list of previous messages (role/content dicts) to provide context for follow-up questions.
Usage:: –

for chunk in stream_answer(question, chunks, cfg.llm):
sys.stdout.write(chunk) sys.stdout.flush()

paperrag.llm.stream_followup(question: str, conversation_history: list[dict], config: LLMConfig | None = None) → Iterator[str][source]¶

Yield text chunks for a follow-up question using conversation history only.

This is used when retrieval returns no results but conversation history exists, allowing the LLM to answer based on previously discussed context.

Parallel Processing¶

Parallel PDF processing utilities.

paperrag.parallel.parallel_process_pdfs(pdf_paths: list[Path], parser_config: ParserConfig, chunker_config: ChunkerConfig, n_workers: int, timeout: int = 0, manifest: dict[str, dict[str, str]] | None = None) → list[tuple[Path, str | None, list[Chunk] | None, str | None]][source]¶

Process PDFs in parallel, return parsed results.

Parameters:

pdf_paths – List of PDF paths to process
parser_config – Parser configuration
chunker_config – Chunker configuration
n_workers – Number of worker processes
timeout – Timeout in seconds per PDF (0 = no timeout)
manifest – Optional manifest dict for fast metadata lookup

Returns:

(pdf_path, file_hash, chunks, error_message)

Return type:

List of tuples

paperrag.parallel.process_single_pdf(pdf_path: Path, parser_config: ParserConfig, chunker_config: ChunkerConfig, manifest: dict[str, dict[str, str]] | None = None) → tuple[Path, str | None, list[Chunk] | None, str | None][source]¶

Process one PDF: parse + chunk (NOT embed yet).

Parameters:

pdf_path – Path to PDF file
parser_config – Parser configuration
chunker_config – Chunker configuration
manifest – Optional manifest dict for fast metadata lookup

Returns:

Tuple of (pdf_path, file_hash, chunks, error_message) If successful: (pdf_path, file_hash, chunks, None) If failed: (pdf_path, None, None, error_message)

CLI¶

Typer CLI for PaperRAG.

paperrag.cli.entrypoint(ctx: Context, version: bool = <typer.models.OptionInfo object>, input_dir: str = <typer.models.OptionInfo object>, index_dir: str = <typer.models.OptionInfo object>, topk: int = <typer.models.OptionInfo object>, model: str = <typer.models.OptionInfo object>, threshold: float = <typer.models.OptionInfo object>, temperature: float = <typer.models.OptionInfo object>, max_tokens: int = <typer.models.OptionInfo object>, ctx_size: int = <typer.models.OptionInfo object>, system_prompt: str = <typer.models.OptionInfo object>, think: bool = <typer.models.OptionInfo object>) → None[source]¶

PaperRAG - local RAG for academic PDFs.

Starts an interactive REPL session using an existing index.

paperrag.cli.evaluate(benchmark_file: str = <typer.models.ArgumentInfo object>, top_k: int = <typer.models.OptionInfo object>, input_dir: str = <typer.models.OptionInfo object>, index_dir: str = <typer.models.OptionInfo object>) → None[source]¶

Evaluate retrieval quality using a JSONL benchmark.

Each line: {“question”: “…”, “relevant_documents”: [“path1”, …]}

paperrag.cli.export(query: str = <typer.models.OptionInfo object>, output_path: str = <typer.models.OptionInfo object>, format: str = <typer.models.OptionInfo object>, top_k: int = <typer.models.OptionInfo object>, threshold: float = <typer.models.OptionInfo object>, input_dir: str = <typer.models.OptionInfo object>, index_dir: str = <typer.models.OptionInfo object>) → None[source]¶

Export query results to a file.

Retrieves and saves results in the specified format.

paperrag.cli.index(input_dir: str = <typer.models.OptionInfo object>, index_dir: str = <typer.models.OptionInfo object>, force: bool = <typer.models.OptionInfo object>, checkpoint_interval: int = <typer.models.OptionInfo object>, workers: int = <typer.models.OptionInfo object>, ocr: str = <typer.models.OptionInfo object>, manifest: str = <typer.models.OptionInfo object>, embed_model: str = <typer.models.OptionInfo object>) → None[source]¶: Index PDF files into the FAISS vector store.

paperrag.cli.main() → None[source]¶

paperrag.cli.query(question: str = <typer.models.ArgumentInfo object>, top_k: int = <typer.models.OptionInfo object>, threshold: float = <typer.models.OptionInfo object>, temperature: float = <typer.models.OptionInfo object>, max_tokens: int = <typer.models.OptionInfo object>, ctx_size: int = <typer.models.OptionInfo object>, system_prompt: str = <typer.models.OptionInfo object>, input_dir: str = <typer.models.OptionInfo object>, index_dir: str = <typer.models.OptionInfo object>, model: str = <typer.models.OptionInfo object>, no_llm: bool = <typer.models.OptionInfo object>, think: bool = <typer.models.OptionInfo object>) → None[source]¶: Query the indexed papers.

paperrag.cli.review(input_path: str = <typer.models.ArgumentInfo object>, index_dir: str = <typer.models.OptionInfo object>, model: str = <typer.models.OptionInfo object>, topk: int = <typer.models.OptionInfo object>, threshold: float = <typer.models.OptionInfo object>, temperature: float = <typer.models.OptionInfo object>, max_tokens: int = <typer.models.OptionInfo object>, ctx_size: int = <typer.models.OptionInfo object>, system_prompt: str = <typer.models.OptionInfo object>, preset: str = <typer.models.OptionInfo object>, n_gpu_layers: int = <typer.models.OptionInfo object>, output: str = <typer.models.OptionInfo object>, think: bool = <typer.models.OptionInfo object>) → None[source]¶

Index a PDF file (or directory) and start an interactive review session.

Convenience command for focused paper review — equivalent to running:

paperrag index –input-dir <path> && paperrag –index-dir <auto>

Examples

paperrag review paper.pdf

paperrag review paper.pdf –preset reviewer

paperrag review paper.pdf –preset reviewer –output review.md

paperrag review ./papers/ –topk 5

paperrag review paper.pdf –index-dir /tmp/my-index

paperrag.cli.status(index_dir: str = <typer.models.OptionInfo object>) → None[source]¶: Show index health information.

paperrag.cli.version_callback(value: bool) → None[source]¶: Display version and license information.

REPL¶

Interactive REPL for PaperRAG. REPL: Read, Evaluate, Print, Loop! This mode is first class in PaperRAG

paperrag.repl.start_repl(cfg: PaperRAGConfig | None = None, *, auto_focus: Path | None = None, review_mode: bool = False, output_path: Path | None = None) → None[source]¶: Launch the interactive REPL session.