PromptConfig Reference¶

This is the exhaustive reference for all PromptConfig fields. For a tutorial introduction with examples, see Full Evaluation.

PromptConfig is a Pydantic model with 7 fields that inject custom user instructions into specific pipeline LLM calls. All fields are optional strings defaulting to None.

Import path: from karenina.schemas.verification import PromptConfig

Fields¶

Field	Type	Default	Pipeline Stage	Description
`generation`	`str \\| None`	`None`	Answer generation	Custom instructions for the answering LLM call. Appended to the system prompt when the answering model generates a response to the benchmark question.
`parsing`	`str \\| None`	`None`	Template parsing	Custom instructions for the judge LLM parsing call. Appended to the system prompt when the parsing model extracts structured attributes from raw responses into the answer template schema.
`abstention_detection`	`str \\| None`	`None`	Abstention check	Custom instructions for the abstention detection LLM call. Guides the model when determining whether the answering model refused to answer, evaded the question, or deflected.
`sufficiency_detection`	`str \\| None`	`None`	Sufficiency check	Custom instructions for the sufficiency detection LLM call. Guides the model when assessing whether the response contains enough information to populate the answer template.
`rubric_evaluation`	`str \\| None`	`None`	Rubric evaluation	Fallback instructions for all `rubric_*` sub-tasks (see Covered Sub-Tasks below). Applied when no direct field matches the specific rubric task.
`agentic_parsing`	`str \\| None`	`None`	Agentic template parsing	Fallback instructions for all `agentic_parsing_*` sub-tasks (see Covered Sub-Tasks below). Applied when agentic parsing is enabled and no direct field matches.
`deep_judgment`	`str \\| None`	`None`	Deep judgment	Fallback instructions for all `dj_*` sub-tasks (see Covered Sub-Tasks below). Applied when no direct field matches the specific deep judgment task.

Configuration: extra="forbid" — PromptConfig rejects any fields not listed above.

Direct vs Fallback Fields¶

The 7 fields fall into two categories:

Direct fields (4) — each maps to exactly one pipeline LLM call type:

generation — answer generation
parsing — template parsing
abstention_detection — abstention check
sufficiency_detection — sufficiency check

Fallback fields (3) — each covers multiple sub-tasks within a pipeline stage:

rubric_evaluation — covers 7 rubric evaluation sub-tasks
agentic_parsing — covers 2 agentic template parsing sub-tasks
deep_judgment — covers 7 deep judgment sub-tasks

Rubric Evaluation Sub-Tasks¶

The rubric_evaluation field serves as a fallback for these 7 PromptTask values:

Sub-Task	Description
`rubric_llm_trait_batch`	Evaluates all boolean/score LLM rubric traits in a single batched call
`rubric_llm_trait_single`	Evaluates a single boolean/score LLM rubric trait sequentially
`rubric_literal_trait_batch`	Evaluates all literal (categorical) rubric traits in a single batched call
`rubric_literal_trait_single`	Evaluates a single literal (categorical) rubric trait sequentially
`rubric_metric_trait`	Evaluates a metric rubric trait via confusion matrix extraction
`rubric_agentic_trait_investigation`	Agent investigates response/workspace to evaluate an agentic rubric trait
`rubric_agentic_trait_extraction`	Extracts the final score from an agentic rubric investigation trace

Which sub-tasks run depends on the rubric_evaluation_strategy setting in VerificationConfig:

"batch" (default) — uses *_batch sub-tasks
"sequential" — uses *_single sub-tasks

Agentic Parsing Sub-Tasks¶

The agentic_parsing field serves as a fallback for these 2 PromptTask values:

Sub-Task	Description
`agentic_parsing_investigation`	Investigation agent examines workspace/trace to evaluate answer template attributes
`agentic_parsing_extraction`	Extracts structured answer from the agentic investigation trace

These sub-tasks run only when agentic_parsing=True on VerificationConfig. The investigation step uses an AgentPort with tool access to verify workspace artifacts; the extraction step uses a ParserPort to produce the final structured answer.

Deep Judgment Sub-Tasks¶

The deep_judgment field serves as a fallback for these 7 PromptTask values:

Sub-Task	Flow	Description
`dj_template_excerpt_extraction`	Template	Extracts verbatim excerpts from the response per template attribute
`dj_template_hallucination`	Template	Assesses hallucination risk for extracted excerpts via web search
`dj_template_reasoning`	Template	Generates reasoning mapping excerpts to template attributes
`dj_rubric_excerpt_extraction`	Rubric	Extracts excerpts supporting deep-judgment-enabled rubric traits
`dj_rubric_hallucination`	Rubric	Assesses per-excerpt hallucination risk using search results
`dj_rubric_reasoning`	Rubric	Generates reasoning explaining trait evaluation based on excerpts
`dj_rubric_score_extraction`	Rubric	Extracts the final score from deep judgment reasoning

Resolution Logic¶

The get_for_task(task_value) method resolves instructions in this order:

Direct match — if a PromptConfig field name matches task_value exactly, return that field's value
Category fallback — if task_value starts with rubric_, return rubric_evaluation; if it starts with agentic_parsing_, return agentic_parsing; if it starts with dj_, return deep_judgment
None — no custom instructions for this task

get_for_task("parsing")                       → self.parsing              (direct match)
get_for_task("rubric_llm_trait_batch")        → self.rubric_evaluation    (fallback)
get_for_task("agentic_parsing_investigation") → self.agentic_parsing      (fallback)
get_for_task("dj_rubric_reasoning")           → self.deep_judgment        (fallback)
get_for_task("unknown_task")                  → None                      (no match)

All PromptTask Values¶

The complete list of 21 PromptTask enum values recognized by the pipeline, grouped by the PromptConfig field that covers them:

PromptConfig Field	PromptTask Values
`generation`	`generation`
`parsing`	`parsing`
`abstention_detection`	`abstention_detection`
`sufficiency_detection`	`sufficiency_detection`
`rubric_evaluation`	`rubric_llm_trait_batch`, `rubric_llm_trait_single`, `rubric_literal_trait_batch`, `rubric_literal_trait_single`, `rubric_metric_trait`, `rubric_agentic_trait_investigation`, `rubric_agentic_trait_extraction`
`agentic_parsing`	`agentic_parsing_investigation`, `agentic_parsing_extraction`
`deep_judgment`	`dj_template_excerpt_extraction`, `dj_template_hallucination`, `dj_template_reasoning`, `dj_rubric_excerpt_extraction`, `dj_rubric_hallucination`, `dj_rubric_reasoning`, `dj_rubric_score_extraction`

Source: karenina.benchmark.verification.prompts.task_types.PromptTask

How Instructions Are Applied¶

PromptConfig participates in the tri-section prompt assembly pattern. When the pipeline builds a prompt for any LLM call:

┌─────────────────────────────┐
│  1. Task instructions       │  ← Built-in prompts for each stage
├─────────────────────────────┤
│  2. Adapter instructions    │  ← Per-interface adjustments (registered per adapter)
├─────────────────────────────┤
│  3. User instructions       │  ← Your PromptConfig text (appended last)
└─────────────────────────────┘

Your custom text is appended to the system prompt after all built-in and adapter instructions:

You cannot override built-in instructions, only supplement them
Your instructions are seen by the LLM as additional guidance
The order gives your instructions the highest positional priority in the prompt

See Prompt Assembly System for implementation details.

Usage in VerificationConfig¶

PromptConfig is passed as a field on VerificationConfig:

from karenina.schemas.verification import VerificationConfig, PromptConfig

config = VerificationConfig(
    prompt_config=PromptConfig(
        parsing="Normalize gene names to official HGNC symbols.",
        rubric_evaluation="Apply strict grading standards.",
    ),
    # ... other config fields
)

In Presets¶

PromptConfig serializes as a nested JSON object within the preset "config" key:

{
  "name": "strict-scoring",
  "config": {
    "prompt_config": {
      "parsing": "Normalize gene names to official HGNC symbols.",
      "rubric_evaluation": "Apply strict grading standards."
    }
  }
}

Only non-null fields need to be included — omitted fields default to None.