Configuration

PaperRAG uses Pydantic models for configuration. Settings can be provided via CLI options, .paperragrc run control files, config snapshots (saved with the index), or code defaults.

Config Hierarchy

Settings are applied in this order (later overrides earlier):

  1. Defaults – Pydantic model defaults

  2. Global .paperragrc~/.paperragrc

  3. Local .paperragrc.paperragrc in current directory

  4. Config snapshot – loaded from <index-dir>/config_snapshot.json when an existing index is opened

  5. CLI options – highest priority

.paperragrc Run Control File

Create a .paperragrc file to set persistent defaults so you don’t have to pass CLI arguments every time. The file uses TOML format (parsed with Python’s built-in tomllib – no extra dependencies).

Scopes:

  • Global: ~/.paperragrc – applies to all PaperRAG invocations

  • Local: .paperragrc in the current working directory – project-specific overrides

Local values override global values. CLI arguments override both.

Example ~/.paperragrc:

# PaperRAG defaults
model = "qwen2.5:1.5b"
topk = 3
max-tokens = 256
temperature = 0.0
threshold = 0.1
index-dir = "/home/user/papers/.paperrag-index"
input-dir = "/home/user/papers"

Supported keys:

RC Key

Maps To

Type

model

LLM model name

str

topk

Number of chunks to retrieve

int

max-tokens

LLM max output tokens

int

temperature

LLM temperature

float

threshold

Minimum similarity score

float

index-dir

Index directory path

str

input-dir

PDF input directory path

str

ctx-size

LLM context window size

int

system-prompt

Override the system prompt

str

Unknown keys produce a warning but do not cause errors. Invalid TOML files are skipped with a warning.

Use rc in the REPL to see which RC files are loaded and their values.

Top-level: PaperRAGConfig

Field

Type

Default

Description

input_dir

str

~/Documents/Mendeley Desktop

PDF directory

index_dir

str

<input_dir>/.paperrag-index

Index storage directory (property)

parser

ParserConfig

PDF parsing settings

chunker

ChunkerConfig

Chunking settings

embedder

EmbedderConfig

Embedding model settings

retriever

RetrieverConfig

Retrieval settings

indexing

IndexingConfig

Indexing pipeline settings

llm

LLMConfig

LLM settings

ParserConfig

Field

Type

Default

Description

extract_tables

bool

False

Extract tables from PDFs

fallback_to_raw

bool

True

Fall back to raw text extraction on parse failure

ocr_mode

"auto" | "always" | "never"

"auto"

OCR strategy per PDF

manifest_file

str | None

None

CSV manifest path with paper metadata

ChunkerConfig

Field

Type

Default

Description

chunk_size

int

1000

Maximum chunk size in characters (min: 100)

chunk_overlap

int

200

Overlap between consecutive chunks (min: 0)

EmbedderConfig

Field

Type

Default

Description

model_name

str

sentence-transformers/all-MiniLM-L6-v2

Sentence-transformer model

batch_size

int

64

Embedding batch size

device

str | None

None

Device (cuda, cpu, or auto-detect)

normalize

bool

True

L2-normalize embeddings

seed

int

42

Random seed for reproducibility

RetrieverConfig

Field

Type

Default

Description

top_k

int

3

Number of results to return

score_threshold

float

0.1

Minimum similarity score (0.0 = no filtering)

use_mmr

bool

False

Use Maximal Marginal Relevance for diversity

mmr_lambda

float

0.5

MMR lambda (0 = max diversity, 1 = max relevance)

max_results_per_paper

int

2

Maximum results from the same paper

IndexingConfig

Field

Type

Default

Description

checkpoint_interval

int

50

Save index every N PDFs (0 = disabled)

n_workers

int

0

Parallel workers (0 = auto-detect)

pdf_timeout

int

300

Timeout per PDF in seconds (0 = no timeout)

enable_gc_per_batch

bool

True

Run garbage collection after each batch

log_memory_usage

bool

False

Log memory usage during indexing

continue_on_error

bool

True

Continue if individual PDFs fail

max_failures

int

-1

Stop after N failures (-1 = unlimited)

Worker Auto-detection

When n_workers is 0, PaperRAG calculates a safe worker count:

workers = min(cpu_cores - 1, (available_ram_gb - 2) / 2)

Each worker requires approximately 2 GB of RAM during peak Docling usage.

LLMConfig

Field

Type

Default

Description

model_name

str

qwen2.5:1.5b

LLM model name

system_prompt

str

(default researcher prompt)

System persona

temperature

float

0.0

Sampling temperature

max_tokens

int

256

Maximum response tokens

ctx_size

int

2048

Context window size