API Reference¶
This page documents PaperRAG’s Python modules. All public classes and functions are listed below.
Configuration¶
Configuration module using Pydantic models.
- class paperrag.config.ChunkerConfig(*, chunk_size: Annotated[int, Ge(ge=100)] = 1000, chunk_overlap: Annotated[int, Ge(ge=0)] = 200)[source]¶
Bases: BaseModel
Chunking configuration.
- chunk_overlap: int¶
- chunk_size: int¶
- model_config = {}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class paperrag.config.EmbedderConfig(*, model_name: str = 'sentence-transformers/all-MiniLM-L6-v2', batch_size: Annotated[int, Ge(ge=1)] = 64, device: str | None = None, normalize: bool = True, seed: int = 42)[source]¶
Bases: BaseModel
Embedding model configuration.
- batch_size: int¶
- device: str | None¶
- model_config = {}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_name: str¶
- normalize: bool¶
- seed: int¶
- class paperrag.config.IndexingConfig(*, checkpoint_interval: Annotated[int, Ge(ge=0)] = 50, n_workers: Annotated[int, Ge(ge=0)] = 0, pdf_timeout: Annotated[int, Ge(ge=0)] = 300, enable_gc_per_batch: bool = True, log_memory_usage: bool = False, continue_on_error: bool = True, max_failures: int = -1)[source]¶
Bases: BaseModel
Indexing configuration.
- checkpoint_interval: int¶
- continue_on_error: bool¶
- enable_gc_per_batch: bool¶
- get_n_workers() int[source]¶
Get actual worker count, auto-detecting if needed.
Uses a RAM-aware calculation to prevent OOM kills. Each worker needs ~2GB during peak Docling usage, so the worker count follows the formula min(cpu_cores - 1, available_ram_gb // 2).
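The formula above can be sketched as a small pure function. This is only an illustration under stated assumptions: estimate_workers and its parameters are hypothetical names, and the floor of one worker is an assumption that the docstring does not confirm.

```python
def estimate_workers(cpu_cores: int, available_ram_gb: int,
                     ram_per_worker_gb: int = 2) -> int:
    """Hypothetical helper mirroring min(cpu_cores - 1, available_ram_gb // 2)."""
    by_cpu = cpu_cores - 1                          # CPU-based limit from the formula
    by_ram = available_ram_gb // ram_per_worker_gb  # ~2 GB per worker at peak
    # Assumption: never report fewer than one worker.
    return max(1, min(by_cpu, by_ram))


if __name__ == "__main__":
    print(estimate_workers(cpu_cores=8, available_ram_gb=16))  # limited by CPU
    print(estimate_workers(cpu_cores=8, available_ram_gb=6))   # limited by RAM
```

On a machine with plenty of cores but little memory, the RAM term is the binding one, which is the situation the OOM note describes.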
- log_memory_usage: bool¶
- max_failures: int¶
- model_config = {}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- n_workers: int¶
- pdf_timeout: int¶
- class paperrag.config.LLMConfig(*, model_name: str = 'qwen2.5:1.5b', system_prompt: str = 'You are a helpful research assistant. Answer based on the provided context. If the context does not contain relevant information, say so. Be concise and cite sources.', temperature: float = 0.0, max_tokens: int = 1024, ctx_size: Annotated[int, Ge(ge=512)] = 4096, n_gpu_layers: Annotated[int, Ge(ge=0)] = 0, n_threads: Annotated[int, Ge(ge=0)] = 0, think: bool = False)[source]¶
Bases: BaseModel
LLM configuration.
- ctx_size: int¶
- max_tokens: int¶
- model_config = {'extra': 'ignore'}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_name: str¶
- n_gpu_layers: int¶
- n_threads: int¶
- system_prompt: str¶
- temperature: float¶
- think: bool¶
- class paperrag.config.PaperRAGConfig(*, input_dir: str = <factory>, parser: ParserConfig = <factory>, chunker: ChunkerConfig = <factory>, embedder: EmbedderConfig = <factory>, retriever: RetrieverConfig = <factory>, indexing: IndexingConfig = <factory>, llm: LLMConfig = <factory>)[source]¶
Bases: BaseModel
Top-level configuration.
- chunker: ChunkerConfig¶
- embedder: EmbedderConfig¶
- property index_dir: str¶
Return the index directory: the custom path if one is set, otherwise input_dir/.paperrag-index.
- indexing: IndexingConfig¶
- input_dir: str¶
- classmethod load_snapshot(path: Path) PaperRAGConfig[source]¶
- model_config = {}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_post_init(context: Any, /) None¶
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self – The BaseModel instance.
context – The context.
- parser: ParserConfig¶
- retriever: RetrieverConfig¶
- class paperrag.config.ParserConfig(*, extract_tables: bool = False, fallback_to_raw: bool = True, ocr_mode: Literal['auto', 'always', 'never'] = 'auto', manifest_file: str | None = None)[source]¶
Bases: BaseModel
PDF parsing configuration.
- extract_tables: bool¶
- fallback_to_raw: bool¶
- manifest_file: str | None¶
- model_config = {}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- ocr_mode: Literal['auto', 'always', 'never']¶
- class paperrag.config.RetrieverConfig(*, top_k: Annotated[int, Ge(ge=1)] = 5, score_threshold: Annotated[float, Ge(ge=0.0), Le(le=1.0)] = 0.1, use_mmr: bool = False, mmr_lambda: Annotated[float, Ge(ge=0.0), Le(le=1.0)] = 0.5, max_results_per_paper: Annotated[int, Ge(ge=1)] = 2)[source]¶
Bases: BaseModel
Retrieval configuration.
- max_results_per_paper: int¶
- mmr_lambda: float¶
- model_config = {}¶
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- score_threshold: float¶
- top_k: int¶
- use_mmr: bool¶
- paperrag.config.apply_rc(cfg: PaperRAGConfig, overrides: dict) None[source]¶
Apply .paperragrc overrides to a PaperRAGConfig instance.
PDF Parser¶
PDF parsing module using Docling.
- class paperrag.parser.ParsedPaper(file_path: str, file_hash: str, title: str, authors: str, sections: list[ParsedSection] = <factory>, raw_text: str = '', abstract: str = '', doi: str = '')[source]¶
Structured representation of a parsed PDF.
- abstract: str = ''¶
- authors: str¶
- doi: str = ''¶
- file_hash: str¶
- file_path: str¶
- raw_text: str = ''¶
- sections: list[ParsedSection]¶
- title: str¶
- class paperrag.parser.ParsedSection(name: str, text: str)[source]¶
A single section extracted from a PDF.
- name: str¶
- text: str¶
- paperrag.parser.compute_file_hashes_parallel(pdf_paths: list[Path], n_workers: int = 4) dict[str, str][source]¶
Compute hashes for multiple PDFs in parallel.
- Parameters:
pdf_paths – List of PDF file paths to hash
n_workers – Number of parallel worker threads
- Returns:
Dictionary mapping str(pdf_path) to hash string
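A minimal sketch of parallel hashing with a thread pool, returning the same {str(path): hash} shape described above. The helper names are hypothetical, and SHA-256 is an assumption, since the reference does not say which hash algorithm PaperRAG uses.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def hash_file(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file block by block so large PDFs are never fully loaded."""
    digest = hashlib.sha256()  # assumption: the real algorithm is undocumented
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def hash_files_parallel(pdf_paths: list[Path], n_workers: int = 4) -> dict[str, str]:
    """Return {str(path): hex digest}, hashing files on worker threads."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        digests = list(pool.map(hash_file, pdf_paths))  # preserves input order
    return {str(p): d for p, d in zip(pdf_paths, digests)}
```

Threads suit this job because the work is mostly file I/O and hashlib releases the GIL while hashing large buffers.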
- paperrag.parser.discover_pdfs(input_dir: Path) list[Path][source]¶
Recursively find all PDF files under input_dir. If input_dir is itself a single PDF file, return just that file.
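The behaviour described for discover_pdfs can be approximated with pathlib. This is a sketch under assumptions: find_pdfs is a hypothetical name, and the case-insensitive extension match and sorted output are illustrative choices rather than documented behaviour.

```python
from pathlib import Path


def find_pdfs(input_path: Path) -> list[Path]:
    """Return PDFs under a directory, or a one-item list for a single PDF."""
    if input_path.is_file():
        # A single file is accepted only if it looks like a PDF.
        return [input_path] if input_path.suffix.lower() == ".pdf" else []
    # Walk the tree recursively; sort so the result order is stable.
    return sorted(
        p for p in input_path.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```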
- paperrag.parser.has_text_layer(pdf_path: Path, min_chars: int = 100) bool[source]¶
Detect if PDF has extractable text (i.e., not a scanned image).
- Parameters:
pdf_path – Path to PDF file
min_chars – Minimum characters on first page to consider text-based
- Returns:
True if PDF has text layer, False if likely scanned/image-based
- paperrag.parser.load_manifest(manifest_path: Path) dict[str, dict[str, str]][source]¶
Load CSV manifest with metadata to skip parsing.
Expected columns: filename, title, authors, abstract (optional), doi (optional)
Returns dict: {filename: {title, authors, abstract, doi}}
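A minimal sketch of reading such a manifest with the standard csv module, producing the {filename: {title, authors, abstract, doi}} shape described above. read_manifest is a hypothetical name, and filling missing optional columns with empty strings is an assumption.

```python
import csv
from pathlib import Path


def read_manifest(manifest_path: Path) -> dict[str, dict[str, str]]:
    """Map each filename to its metadata; optional columns default to ''."""
    manifest: dict[str, dict[str, str]] = {}
    with manifest_path.open(newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            manifest[row["filename"]] = {
                "title": row["title"],
                "authors": row["authors"],
                # abstract and doi are optional columns
                "abstract": row.get("abstract") or "",
                "doi": row.get("doi") or "",
            }
    return manifest
```

A manifest row might look like: paper.pdf,A Title,"Doe, J.",10.1/x (with a header line naming the columns).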
- paperrag.parser.parse_pdf(path: Path, config: ParserConfig | None = None, manifest: dict[str, dict[str, str]] | None = None) ParsedPaper[source]¶
Parse a single PDF using Docling and return structured output.
- Parameters:
path – Path to PDF file
config – Parser configuration
manifest – Optional manifest dict for fast metadata lookup
Chunker¶
Section-aware deterministic chunking module.
- class paperrag.chunker.Chunk(chunk_id: int, hash_id: str, text: str, paper_title: str, section_name: str, file_path: str, file_hash: str)[source]¶
A single text chunk with full provenance metadata.
- chunk_id: int¶
- file_hash: str¶
- file_path: str¶
- hash_id: str¶
- paper_title: str¶
- section_name: str¶
- text: str¶
- paperrag.chunker.chunk_paper(paper: ParsedPaper, config: ChunkerConfig | None = None) list[Chunk][source]¶
Chunk a parsed paper into a list of Chunk objects.
Chunking is section-aware: each section is chunked independently and chunks carry the section name in their metadata. Ordering is deterministic.
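To illustrate what section-aware, deterministic chunking can look like, here is a sketch that splits each section independently with a sliding window. It assumes chunk_size and chunk_overlap are measured in characters, which the reference does not state; split_text and chunk_sections are hypothetical names, and plain dicts stand in for Chunk objects.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Slide a fixed-size window over text, overlapping consecutive pieces."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    pieces, start = [], 0
    while start < len(text):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return pieces


def chunk_sections(sections: list[tuple[str, str]],
                   chunk_size: int = 1000, chunk_overlap: int = 200) -> list[dict]:
    """Chunk each (name, text) section on its own; IDs follow input order."""
    chunks: list[dict] = []
    for name, text in sections:
        for piece in split_text(text, chunk_size, chunk_overlap):
            chunks.append({"chunk_id": len(chunks), "section_name": name, "text": piece})
    return chunks
```

Because no chunk spans two sections and IDs are assigned in input order, the same paper always yields the same chunk list.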
Embedder¶
Embedding module using sentence-transformers.
- class paperrag.embedder.Embedder(config: EmbedderConfig | None = None)[source]¶
Wrapper around a SentenceTransformer model with batched encoding and deterministic seed control.
Vector Store¶
FAISS-backed vector store with persistence and metadata tracking.
- class paperrag.vectorstore.VectorStore(index_dir: Path, dimension: int)[source]¶
Bases: object
Manages a FAISS IndexFlatIP index plus chunk metadata on disk.
- add(embeddings: ndarray, chunks: list[Chunk]) None[source]¶
Add vectors and their corresponding chunk metadata.
- classmethod load(index_dir: Path) VectorStore[source]¶
Load an existing index from disk.
- remove_by_file(file_path: str) None[source]¶
Remove all vectors belonging to file_path.
Because FAISS IndexFlatIP does not support selective removal, the index is rebuilt from the remaining vectors.
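The rebuild-on-remove idea can be sketched without FAISS. In this illustration plain Python lists stand in for the index and its metadata; rebuild_without_file is a hypothetical name, and the metadata layout (a dict with a "file_path" key) is an assumption.

```python
def rebuild_without_file(vectors: list[list[float]], metadata: list[dict],
                         file_path: str) -> tuple[list[list[float]], list[dict]]:
    """Keep only rows whose metadata does not belong to file_path."""
    keep = [i for i, meta in enumerate(metadata) if meta["file_path"] != file_path]
    # A real store would create a fresh index here and add the kept vectors.
    return [vectors[i] for i in keep], [metadata[i] for i in keep]
```

Vectors and metadata are filtered with the same index list so that row i of the rebuilt index still matches row i of the metadata.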
- save(config: PaperRAGConfig | None = None) None[source]¶
Write the index, metadata, hashes, and version to disk using atomic operations.
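One common way to make a write atomic is to write to a temporary file in the same directory and then rename it over the target. The sketch below shows that pattern for a JSON file; atomic_write_json is a hypothetical name, and whether PaperRAG's save() uses this exact approach is an assumption.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write payload so readers see either the old file or the new one."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh)
            fh.flush()
            os.fsync(fh.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, path)  # atomic replacement of the target
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # do not leave a partial temp file behind
        raise
```

If the process dies mid-write, the previous file is left untouched, which matters when an index is checkpointed during long indexing runs.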
Retriever¶
Retriever: ties embedder + vector store for query-time retrieval.
- class paperrag.retriever.RetrievalResult(text: str, score: float, paper_title: str, section_name: str, file_path: str, chunk_id: int)[source]¶
A single retrieval hit.
- chunk_id: int¶
- file_path: str¶
- paper_title: str¶
- score: float¶
- section_name: str¶
- text: str¶
- class paperrag.retriever.Retriever(config: PaperRAGConfig, store: VectorStore | None = None)[source]¶
High-level retriever that loads an existing index and answers queries.
- get_all_chunks_for_file(file_path: str) list[RetrievalResult][source]¶
Return all chunks for a given file, ordered by chunk_id.
Used for full-document context mode where the entire paper is sent to the LLM instead of just top-k retrieval hits.
- retrieve(query: str, top_k: int | None = None, file_path: str | None = None) list[RetrievalResult][source]¶
Embed query and return the top-k results from the vector store.
Results are filtered by score_threshold: only results with similarity scores above the threshold are returned.
If use_mmr=True, Maximal Marginal Relevance (MMR) is used to diversify the results.
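The interaction of score_threshold, top_k, and mmr_lambda can be sketched in pure Python. This uses the textbook MMR score, lambda * relevance - (1 - lambda) * max similarity to already selected items; that PaperRAG applies exactly this form is an assumption, and cosine and mmr_select are hypothetical names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec: list[float], cand_vecs: list[list[float]],
               top_k: int = 5, score_threshold: float = 0.1,
               mmr_lambda: float = 0.5) -> list[int]:
    """Return candidate indices chosen by threshold filtering plus MMR."""
    relevance = [cosine(query_vec, v) for v in cand_vecs]
    # Keep only candidates scoring above the threshold.
    remaining = [i for i, s in enumerate(relevance) if s > score_threshold]
    selected: list[int] = []
    while remaining and len(selected) < top_k:
        def mmr_score(i: int) -> float:
            # Redundancy is the closest match among already selected items.
            redundancy = max(
                (cosine(cand_vecs[i], cand_vecs[j]) for j in selected), default=0.0
            )
            return mmr_lambda * relevance[i] - (1 - mmr_lambda) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With mmr_lambda=1.0 the selection reduces to plain relevance ranking; lower values penalise near-duplicate chunks more strongly.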
LLM¶
LLM module for local inference via Ollama or llama.cpp (GGUF / HuggingFace models).
Backend selection rules¶
- Local *.gguf file path → llama-server (from brew install llama-cpp)
- HuggingFace repo ID → download GGUF + llama-server (e.g. Qwen/Qwen3-1.7B-GGUF)
- All other names → Ollama (unchanged)
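The selection rules above can be sketched as a small classifier. This is illustrative only: select_backend is a hypothetical name, and the pattern used to recognise a HuggingFace repo ID (a single org/repo pair) is an assumption about how the check is made.

```python
import re

# Assumption: a HuggingFace repo ID looks like "org/repo" with exactly one slash.
HF_REPO_ID = re.compile(r"^[\w.-]+/[\w.-]+$")


def select_backend(model_name: str) -> str:
    """Classify a model name according to the three documented rules."""
    if model_name.lower().endswith(".gguf"):
        return "llama-server (local GGUF)"
    if HF_REPO_ID.match(model_name):
        return "llama-server (HuggingFace download)"
    return "ollama"


if __name__ == "__main__":
    for name in ("qwen2.5:1.5b", "Qwen/Qwen3-1.7B-GGUF", "/path/to/model.gguf"):
        print(name, "->", select_backend(name))
```

Checking the .gguf suffix first matters, because a relative path such as models/model.gguf would otherwise also match the org/repo pattern.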
Example usage¶
paperrag query "What is X?" --model qwen2.5:1.5b # Ollama
paperrag query "What is X?" --model Qwen/Qwen3-1.7B-GGUF # HF download + llama-server
paperrag query "What is X?" --model /path/to/model.gguf # local GGUF + llama-server
- paperrag.llm.describe_llm_error(exc: Exception, model_name: str) tuple[str, str | None][source]¶
Return (short_error, optional_hint) for a human-readable LLM error message.
The hint is non-None when there’s a concrete remediation action.
- paperrag.llm.generate_answer(question: str, context_chunks: list[str], config: LLMConfig | None = None, conversation_history: list[dict] | None = None) str[source]¶
Generate an answer using the configured LLM backend (blocking).
Backend selection:
- HuggingFace repo IDs (org/repo) and local .gguf file paths use llama.cpp via llama-server (install: brew install llama-cpp).
- All other model names delegate to Ollama.
- Parameters:
conversation_history (list[dict] | None) – Optional list of previous messages (role/content dicts) to provide context for follow-up questions.
Examples:
# Ollama (unchanged)
paperrag query "What is X?" --model qwen2.5:1.5b
# llama.cpp: download Qwen3 GGUF from HuggingFace automatically
paperrag query "What is X?" --model Qwen/Qwen3-1.7B-GGUF
# llama.cpp: use a local GGUF file
paperrag query "What is X?" --model /path/to/model.gguf
- paperrag.llm.prewarm_ollama(config: LLMConfig) bool[source]¶
Send a minimal 1-token request to load the Ollama model into memory.
Returns True if successful, False if Ollama is unreachable or the configured backend is llama-server. Prewarming only applies to the Ollama backend; llama-server has its own startup mechanism.
- paperrag.llm.stream_answer(question: str, context_chunks: list[str], config: LLMConfig | None = None, source_files: list[str] | None = None, conversation_history: list[dict] | None = None) Iterator[str][source]¶
Yield text chunks as they arrive from the LLM (streaming).
Backend selection:
- HuggingFace repo IDs (org/repo) and local .gguf file paths use llama.cpp via llama-server.
- All other model names delegate to Ollama.
- Parameters:
conversation_history (list[dict] | None) – Optional list of previous messages (role/content dicts) to provide context for follow-up questions.
Usage:
import sys
for chunk in stream_answer(question, chunks, cfg.llm):
    sys.stdout.write(chunk)
    sys.stdout.flush()
- paperrag.llm.stream_followup(question: str, conversation_history: list[dict], config: LLMConfig | None = None) Iterator[str][source]¶
Yield text chunks for a follow-up question using conversation history only.
This is used when retrieval returns no results but conversation history exists, allowing the LLM to answer based on previously discussed context.
Parallel Processing¶
Parallel PDF processing utilities.
- paperrag.parallel.parallel_process_pdfs(pdf_paths: list[Path], parser_config: ParserConfig, chunker_config: ChunkerConfig, n_workers: int, timeout: int = 0, manifest: dict[str, dict[str, str]] | None = None) list[tuple[Path, str | None, list[Chunk] | None, str | None]][source]¶
Process PDFs in parallel and return the parsed results.
- Parameters:
pdf_paths – List of PDF paths to process
parser_config – Parser configuration
chunker_config – Chunker configuration
n_workers – Number of worker processes
timeout – Timeout in seconds per PDF (0 = no timeout)
manifest – Optional manifest dict for fast metadata lookup
- Returns:
List of tuples (pdf_path, file_hash, chunks, error_message)
- paperrag.parallel.process_single_pdf(pdf_path: Path, parser_config: ParserConfig, chunker_config: ChunkerConfig, manifest: dict[str, dict[str, str]] | None = None) tuple[Path, str | None, list[Chunk] | None, str | None][source]¶
Process one PDF: parse + chunk (NOT embed yet).
- Parameters:
pdf_path – Path to PDF file
parser_config – Parser configuration
chunker_config – Chunker configuration
manifest – Optional manifest dict for fast metadata lookup
- Returns:
Tuple of (pdf_path, file_hash, chunks, error_message)
If successful: (pdf_path, file_hash, chunks, None)
If failed: (pdf_path, None, None, error_message)
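Since every result is a 4-tuple whose last element signals failure, callers will typically partition the list before embedding. The sketch below shows one way to do that; split_results is a hypothetical helper that is not part of the documented API, and chunks are treated as opaque objects.

```python
from pathlib import Path
from typing import Optional

# Shape documented above: (pdf_path, file_hash, chunks, error_message)
Result = tuple[Path, Optional[str], Optional[list], Optional[str]]


def split_results(results: list[Result]) -> tuple[dict, dict]:
    """Separate successful results from failures, keyed by path string."""
    succeeded: dict[str, tuple] = {}
    failed: dict[str, str] = {}
    for pdf_path, file_hash, chunks, error in results:
        if error is None:
            succeeded[str(pdf_path)] = (file_hash, chunks)
        else:
            failed[str(pdf_path)] = error
    return succeeded, failed
```

The failure dictionary can then be logged or counted, while only the successful entries go on to the embedding step.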
CLI¶
Typer CLI for PaperRAG.
- paperrag.cli.entrypoint(ctx: Context, version: bool = <typer.models.OptionInfo object>, input_dir: str = <typer.models.OptionInfo object>, index_dir: str = <typer.models.OptionInfo object>, topk: int = <typer.models.OptionInfo object>, model: str = <typer.models.OptionInfo object>, threshold: float = <typer.models.OptionInfo object>, temperature: float = <typer.models.OptionInfo object>, max_tokens: int = <typer.models.OptionInfo object>, ctx_size: int = <typer.models.OptionInfo object>, system_prompt: str = <typer.models.OptionInfo object>, think: bool = <typer.models.OptionInfo object>) None[source]¶
PaperRAG - local RAG for academic PDFs.
Starts an interactive REPL session using an existing index.
- paperrag.cli.evaluate(benchmark_file: str = <typer.models.ArgumentInfo object>, top_k: int = <typer.models.OptionInfo object>, input_dir: str = <typer.models.OptionInfo object>, index_dir: str = <typer.models.OptionInfo object>) None[source]¶
Evaluate retrieval quality using a JSONL benchmark.
Each line: {"question": "...", "relevant_documents": ["path1", ...]}
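As an illustration of the benchmark format, the sketch below parses JSONL lines and computes a hit rate at k: the fraction of questions for which at least one relevant document appears in the top-k retrieved paths. The metric choice is an assumption, since the metrics the evaluate command reports are not listed here; load_benchmark, hit_rate, and the retrieve callable are hypothetical.

```python
import json
from typing import Callable, Iterable


def load_benchmark(lines: Iterable[str]) -> list[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def hit_rate(benchmark: list[dict], retrieve: Callable[[str], list[str]],
             top_k: int = 5) -> float:
    """Fraction of questions with at least one relevant path in the top-k."""
    if not benchmark:
        return 0.0
    hits = 0
    for item in benchmark:
        retrieved = set(retrieve(item["question"])[:top_k])
        if retrieved & set(item["relevant_documents"]):
            hits += 1
    return hits / len(benchmark)
```

Here retrieve stands in for any function that maps a question to a ranked list of file paths.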
- paperrag.cli.export(query: str = <typer.models.OptionInfo object>, output_path: str = <typer.models.OptionInfo object>, format: str = <typer.models.OptionInfo object>, top_k: int = <typer.models.OptionInfo object>, threshold: float = <typer.models.OptionInfo object>, input_dir: str = <typer.models.OptionInfo object>, index_dir: str = <typer.models.OptionInfo object>) None[source]¶
Export query results to a file.
Retrieves and saves results in the specified format.
- paperrag.cli.index(input_dir: str = <typer.models.OptionInfo object>, index_dir: str = <typer.models.OptionInfo object>, force: bool = <typer.models.OptionInfo object>, checkpoint_interval: int = <typer.models.OptionInfo object>, workers: int = <typer.models.OptionInfo object>, ocr: str = <typer.models.OptionInfo object>, manifest: str = <typer.models.OptionInfo object>, embed_model: str = <typer.models.OptionInfo object>) None[source]¶
Index PDF files into the FAISS vector store.
- paperrag.cli.query(question: str = <typer.models.ArgumentInfo object>, top_k: int = <typer.models.OptionInfo object>, threshold: float = <typer.models.OptionInfo object>, temperature: float = <typer.models.OptionInfo object>, max_tokens: int = <typer.models.OptionInfo object>, ctx_size: int = <typer.models.OptionInfo object>, system_prompt: str = <typer.models.OptionInfo object>, input_dir: str = <typer.models.OptionInfo object>, index_dir: str = <typer.models.OptionInfo object>, model: str = <typer.models.OptionInfo object>, no_llm: bool = <typer.models.OptionInfo object>, think: bool = <typer.models.OptionInfo object>) None[source]¶
Query the indexed papers.
- paperrag.cli.review(input_path: str = <typer.models.ArgumentInfo object>, index_dir: str = <typer.models.OptionInfo object>, model: str = <typer.models.OptionInfo object>, topk: int = <typer.models.OptionInfo object>, threshold: float = <typer.models.OptionInfo object>, temperature: float = <typer.models.OptionInfo object>, max_tokens: int = <typer.models.OptionInfo object>, ctx_size: int = <typer.models.OptionInfo object>, system_prompt: str = <typer.models.OptionInfo object>, preset: str = <typer.models.OptionInfo object>, n_gpu_layers: int = <typer.models.OptionInfo object>, output: str = <typer.models.OptionInfo object>, think: bool = <typer.models.OptionInfo object>) None[source]¶
Index a PDF file (or directory) and start an interactive review session.
Convenience command for focused paper review, equivalent to running:
paperrag index --input-dir <path> && paperrag --index-dir <auto>
Examples
paperrag review paper.pdf
paperrag review paper.pdf --preset reviewer
paperrag review paper.pdf --preset reviewer --output review.md
paperrag review ./papers/ --topk 5
paperrag review paper.pdf --index-dir /tmp/my-index
REPL¶
Interactive REPL (read-eval-print loop) for PaperRAG. The REPL is a first-class mode in PaperRAG.
- paperrag.repl.start_repl(cfg: PaperRAGConfig | None = None, *, auto_focus: Path | None = None, review_mode: bool = False, output_path: Path | None = None) None[source]¶
Launch the interactive REPL session.