
VerificationConfig Reference

This is the exhaustive reference for all VerificationConfig fields. For a tutorial introduction with examples, see Basic Verification.

VerificationConfig is a Pydantic model with 39 fields organized into the 12 categories below.


Models

| Field | Type | Default | Description |
|---|---|---|---|
| answering_models | list[ModelConfig] | [] | List of answering model configurations. Each defines a model that generates responses to benchmark questions. Default system prompt applied automatically if not set. |
| parsing_models | list[ModelConfig] | (required) | List of parsing (judge) model configurations. At least one is required. Each defines a model that parses LLM responses into structured templates. Default system prompt applied automatically if not set. |

Default system prompts (applied when a model has no explicit system_prompt):

  • Answering: "You are an expert assistant. Answer the question accurately and concisely."
  • Parsing: "You are a validation assistant. Parse and validate responses against the given Pydantic template."

See ModelConfig Reference for all ModelConfig fields.


Execution

| Field | Type | Default | Description |
|---|---|---|---|
| replicate_count | int | 1 | Number of times to run each question/model combination. Higher values allow measuring variance across runs. Must be >= 1. |
| parsing_only | bool | False | When True, only parsing models are required (no answering models needed). Used for TaskEval and similar use cases where answers are pre-generated. |
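
As a rough illustration of how replicate_count scales the workload, the sketch below enumerates the runs with itertools.product. The plan_runs helper is hypothetical, and pairing every answering model with every parsing model is an assumption made for the example rather than a statement about Karenina's scheduler.

```python
from itertools import product


def plan_runs(question_ids, answering_ids, parsing_ids, replicate_count=1):
    """Enumerate (question, answering, parsing, replicate) tuples.

    Hypothetical helper: assumes a full cross product of models.
    """
    if replicate_count < 1:
        raise ValueError("replicate_count must be >= 1")
    replicates = range(1, replicate_count + 1)
    return list(product(question_ids, answering_ids, parsing_ids, replicates))


runs = plan_runs(["q1", "q2"], ["my-answering"], ["my-parsing"], replicate_count=3)
print(len(runs))  # 6: 2 questions x 1 answering x 1 parsing x 3 replicates
```

Under these assumptions, raising replicate_count multiplies the number of model calls linearly, which is worth keeping in mind when estimating cost.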

Evaluation Mode

| Field | Type | Default | Description |
|---|---|---|---|
| evaluation_mode | Literal["template_only", "template_and_rubric", "rubric_only"] | "template_only" | Determines which pipeline stages run. template_only: template verification only. template_and_rubric: both template and rubric evaluation. rubric_only: skip template verification, evaluate rubrics on raw response. When set to template_and_rubric or rubric_only, rubric evaluation is automatically enabled. |
| rubric_trait_names | list[str] \| None | None | Optional filter to evaluate only specific rubric traits by name. When None, all traits are evaluated. |
| rubric_evaluation_strategy | Literal["batch", "sequential"] \| None | "batch" | How LLM rubric traits are evaluated. batch: all LLM traits in a single call (efficient, requires JSON output). sequential: traits evaluated one-by-one (more reliable, higher cost). |
| agentic_rubric_strategy | Literal["individual", "shared"] | "individual" | How agentic rubric traits are evaluated. individual: one agent per trait (default, most reliable). shared: one agent evaluates all traits that share a model (efficient, but falls back to individual when models differ). |
| agentic_rubric_parallel | bool | False | Reserved for future use. When implemented, will allow parallel evaluation of independent agentic traits. |
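
The difference between the individual and shared values of agentic_rubric_strategy can be pictured as a grouping problem. The sketch below is only an illustration: plan_agent_calls and the trait-to-model mapping are hypothetical, and the real pipeline may group traits differently.

```python
from collections import defaultdict


def plan_agent_calls(trait_models, strategy="individual"):
    """Return a list of trait-name groups, one group per agent invocation.

    trait_models maps a trait name to the id of the model that judges it
    (a hypothetical structure used only for this sketch).
    """
    if strategy == "individual":
        return [[name] for name in trait_models]
    if strategy == "shared":
        groups = defaultdict(list)
        for name, model_id in trait_models.items():
            groups[model_id].append(name)
        return list(groups.values())
    raise ValueError(f"unknown strategy: {strategy!r}")


traits = {"safety": "judge-a", "tone": "judge-a", "citations": "judge-b"}
shared_plan = plan_agent_calls(traits, strategy="shared")
individual_plan = plan_agent_calls(traits, strategy="individual")
```

In this sketch, a trait whose model is used by no other trait ends up alone in its group, which matches the documented fallback to individual evaluation when models differ.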

Trace Filtering

| Field | Type | Default | Description |
|---|---|---|---|
| use_full_trace_for_template | bool | False | If True, pass full agent trace to template parsing. If False, extract only the final AI message. The full trace is always captured in raw_llm_response regardless. |
| use_full_trace_for_rubric | bool | True | If True, pass full agent trace to rubric evaluation. If False, extract only the final AI message. The full trace is always captured in raw_llm_response regardless. |

Note

If use_full_trace_for_template=False and the trace doesn't end with an AI message, the trace validation stage will fail with an error.
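
A minimal sketch of the selection logic described above follows. The message shape (dicts with role and content keys), the "ai" role label, and the select_trace_input helper are assumptions made for illustration and do not reflect Karenina's internal trace format.

```python
def select_trace_input(messages, use_full_trace):
    """Pick the text handed to a later stage from a list of trace messages.

    Each message is a dict with "role" and "content" keys; this shape is an
    assumption for the sketch.
    """
    if use_full_trace:
        return "\n".join(f"{m['role']}: {m['content']}" for m in messages)
    # Final-message mode only works when the trace ends with an AI message.
    if not messages or messages[-1]["role"] != "ai":
        raise ValueError("trace does not end with an AI message")
    return messages[-1]["content"]


trace = [
    {"role": "human", "content": "2+2?"},
    {"role": "ai", "content": "4"},
]
final_only = select_trace_input(trace, use_full_trace=False)
full_text = select_trace_input(trace, use_full_trace=True)
```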


Pre-Parsing Checks

| Field | Type | Default | Description |
|---|---|---|---|
| abstention_enabled | bool | False | Enable abstention/refusal detection. When the model refuses to answer, parsing is skipped and the result is auto-failed. |
| sufficiency_enabled | bool | False | Enable response sufficiency detection. When the response lacks enough information to fill the template, parsing is skipped and the result is auto-failed. |

See Full Evaluation for usage examples.
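
The gating behavior of the two pre-parsing checks can be sketched as a small decision function. The pre_parsing_gate helper, its boolean detector inputs, and the order in which the checks run are assumptions for illustration only.

```python
def pre_parsing_gate(abstained, sufficient,
                     abstention_enabled=False, sufficiency_enabled=False):
    """Return (should_parse, failure_reason) for one response.

    abstained and sufficient stand in for detector outputs (hypothetical).
    A disabled check never blocks parsing.
    """
    if abstention_enabled and abstained:
        return False, "abstention"
    if sufficiency_enabled and not sufficient:
        return False, "insufficient"
    return True, None


# With both checks off (the defaults), a refusal still reaches the parser.
default_outcome = pre_parsing_gate(abstained=True, sufficient=False)
gated_outcome = pre_parsing_gate(abstained=True, sufficient=False,
                                 abstention_enabled=True)
```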


Embedding Check

| Field | Type | Default | Env Var | Description |
|---|---|---|---|---|
| embedding_check_enabled | bool | False | EMBEDDING_CHECK | Enable semantic similarity verification as a fallback after template verify(). |
| embedding_check_model | str | "all-MiniLM-L6-v2" | EMBEDDING_CHECK_MODEL | SentenceTransformer model name for computing embeddings. |
| embedding_check_threshold | float | 0.85 | EMBEDDING_CHECK_THRESHOLD | Cosine similarity threshold. Constrained to [0.0, 1.0]. Values above this threshold are considered semantically matching. |

Environment variable precedence: Env vars are applied only when the field is not explicitly set. Explicit arguments always take priority over env vars.
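
To make the role of embedding_check_threshold concrete, here is a plain-Python sketch of a cosine similarity comparison. It uses hand-written vectors instead of SentenceTransformer embeddings, and the is_semantic_match helper is hypothetical; the strict "greater than" comparison follows the wording of the table above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def is_semantic_match(a, b, threshold=0.85):
    """Return True when similarity is strictly above the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be within [0.0, 1.0]")
    return cosine_similarity(a, b) > threshold


same_direction = is_semantic_match([1.0, 0.0], [2.0, 0.0])
diagonal = is_semantic_match([1.0, 0.0], [1.0, 1.0])  # similarity is about 0.707
```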


Async Execution

| Field | Type | Default | Env Var | Description |
|---|---|---|---|---|
| async_enabled | bool | True | KARENINA_ASYNC_ENABLED | Enable parallel execution of verification across questions. |
| async_max_workers | int | 2 | KARENINA_ASYNC_MAX_WORKERS | Maximum number of concurrent verification workers when async is enabled. Must be >= 1. |

Both sequential and parallel execution modes collect per-question errors without aborting. If any questions fail (or the parallel batch exceeds its timeout), VerificationBatchError is raised with partial_results and errors attributes so callers can recover partial progress. See Basic Verification: Error Handling for usage examples.
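
The collect-then-raise pattern described above can be sketched with the standard library. BatchError below is a stand-in for VerificationBatchError that models only the two documented attributes; storing them as dicts keyed by question id, and the verify_all and fake_verify helpers, are assumptions for this example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


class BatchError(Exception):
    """Stand-in for VerificationBatchError (hypothetical, for illustration)."""

    def __init__(self, partial_results, errors):
        super().__init__(f"{len(errors)} question(s) failed")
        self.partial_results = partial_results
        self.errors = errors


def verify_all(question_ids, verify_one, max_workers=2):
    """Run verify_one per question, collecting failures instead of aborting."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(verify_one, qid): qid for qid in question_ids}
        for future in as_completed(futures):
            qid = futures[future]
            try:
                results[qid] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[qid] = exc
    if errors:
        raise BatchError(results, errors)
    return results


def fake_verify(qid):
    """Pretend verifier that fails for one question."""
    if qid == "q2":
        raise RuntimeError("model timeout")
    return f"{qid}: passed"


try:
    verify_all(["q1", "q2", "q3"], fake_verify)
except BatchError as err:
    recovered = err.partial_results  # results for q1 and q3 survive
    failed = sorted(err.errors)      # ["q2"]
```

A caller following this pattern can persist the recovered results and retry only the failed question ids.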


Deep Judgment — Templates

| Field | Type | Default | Description |
|---|---|---|---|
| deep_judgment_enabled | bool | False | Enable multi-stage deep judgment analysis for template verification. Adds excerpt extraction, fuzzy matching, and reasoning to parsed results. |
| deep_judgment_max_excerpts_per_attribute | int | 3 | Maximum number of excerpts to extract per template attribute during deep judgment. |
| deep_judgment_fuzzy_match_threshold | float | 0.80 | Fuzzy match similarity threshold for validating excerpts against the original trace. |
| deep_judgment_excerpt_retry_attempts | int | 2 | Number of retry attempts for excerpt extraction when fuzzy matching fails. |
| deep_judgment_search_enabled | bool | False | Enable search-enhanced excerpt validation. When enabled, excerpts are verified against external evidence to detect hallucination. |
| deep_judgment_search_tool | str \| Callable | "tavily" | Search tool for excerpt validation. Built-in: "tavily". Can also be any callable with signature (str \| list[str]) -> (str \| list[str]). Requires TAVILY_API_KEY for built-in tool. |
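
For deep_judgment_search_tool, a custom callable only has to honor the documented signature. The sketch below shows one possible shape: offline_search and its in-memory corpus are hypothetical, and returning a string for a string input and a list for a list input is an assumption about how the return type is meant to mirror the argument.

```python
from __future__ import annotations

# Hypothetical in-memory corpus standing in for a real search backend.
_CORPUS = {
    "aspirin": "Aspirin is a nonsteroidal anti-inflammatory drug.",
    "ibuprofen": "Ibuprofen is a nonsteroidal anti-inflammatory drug.",
}


def _lookup(query: str) -> str:
    """Return the concatenated corpus entries whose key appears in the query."""
    hits = [text for key, text in _CORPUS.items() if key in query.lower()]
    return " ".join(hits) if hits else "No results found."


def offline_search(query: str | list[str]) -> str | list[str]:
    """Search callable matching (str | list[str]) -> (str | list[str])."""
    if isinstance(query, str):
        return _lookup(query)
    return [_lookup(q) for q in query]


single = offline_search("Is aspirin an NSAID?")
batch = offline_search(["aspirin dosage", "unrelated topic"])
```

A callable like this could be supplied in place of "tavily", which also avoids the TAVILY_API_KEY requirement of the built-in tool.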

Deep Judgment — Rubrics

| Field | Type | Default | Description |
|---|---|---|---|
| deep_judgment_rubric_mode | Literal["disabled", "enable_all", "use_checkpoint", "custom"] | "disabled" | Controls how deep judgment is applied to rubric traits. disabled: off. enable_all: apply to all LLM traits. use_checkpoint: use settings saved in checkpoint. custom: use per-trait configuration from deep_judgment_rubric_config. |
| deep_judgment_rubric_global_excerpts | bool | True | For enable_all mode: globally enable or disable excerpt extraction for all traits. |
| deep_judgment_rubric_config | dict[str, Any] \| None | None | Per-trait configuration for custom mode. See structure below. |
| deep_judgment_rubric_max_excerpts_default | int | 7 | Default maximum excerpts per rubric trait (used as fallback when per-trait config omits this setting). |
| deep_judgment_rubric_fuzzy_match_threshold_default | float | 0.80 | Default fuzzy match threshold for rubric excerpt validation. |
| deep_judgment_rubric_excerpt_retry_attempts_default | int | 2 | Default retry attempts for rubric excerpt extraction. |
| deep_judgment_rubric_search_tool | str \| Callable | "tavily" | Search tool for rubric hallucination detection. Same options as deep_judgment_search_tool. |

Custom Mode Config Structure

The deep_judgment_rubric_config dict (for custom mode) expects:

{
  "global": {
    "TraitName": {
      "enabled": true,
      "excerpt_enabled": true,
      "max_excerpts": 5,
      "fuzzy_match_threshold": 0.80,
      "excerpt_retry_attempts": 2,
      "search_enabled": false
    }
  },
  "question_specific": {
    "question-id": {
      "TraitName": {
        "enabled": true,
        "excerpt_enabled": false
      }
    }
  }
}

Each trait entry is validated as a DeepJudgmentTraitConfig with these fields:

| Field | Type | Default | Description |
|---|---|---|---|
| enabled | bool | True | Whether deep judgment is enabled for this trait. |
| excerpt_enabled | bool | True | Whether to extract excerpts for this trait. |
| max_excerpts | int \| None | None | Max excerpts (falls back to deep_judgment_rubric_max_excerpts_default). |
| fuzzy_match_threshold | float \| None | None | Fuzzy threshold (falls back to global default). |
| excerpt_retry_attempts | int \| None | None | Retry attempts (falls back to global default). |
| search_enabled | bool | False | Enable search validation for this trait's excerpts. |
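
The fallback rules above can be sketched as a small resolver over the config dict. Everything here is illustrative: resolve_trait_config is a hypothetical helper, and the choice to let a question-specific entry replace (rather than merge with) the global entry is an assumption, as is returning None for a trait with no entry.

```python
# Defaults taken from the deep_judgment_rubric_*_default fields above.
RUBRIC_DEFAULTS = {
    "max_excerpts": 7,
    "fuzzy_match_threshold": 0.80,
    "excerpt_retry_attempts": 2,
}

# Per-trait field defaults from the DeepJudgmentTraitConfig table.
TRAIT_FIELD_DEFAULTS = {
    "enabled": True,
    "excerpt_enabled": True,
    "max_excerpts": None,
    "fuzzy_match_threshold": None,
    "excerpt_retry_attempts": None,
    "search_enabled": False,
}


def resolve_trait_config(config, trait, question_id=None):
    """Return effective settings for one trait, or None if it has no entry."""
    entry = None
    if question_id is not None:
        entry = config.get("question_specific", {}).get(question_id, {}).get(trait)
    if entry is None:
        entry = config.get("global", {}).get(trait)
    if entry is None:
        return None
    resolved = {**TRAIT_FIELD_DEFAULTS, **entry}
    # Fill any setting left as None from the config-level defaults.
    for key, fallback in RUBRIC_DEFAULTS.items():
        if resolved[key] is None:
            resolved[key] = fallback
    return resolved


example_config = {
    "global": {"TraitName": {"enabled": True, "max_excerpts": 5}},
    "question_specific": {
        "question-id": {"TraitName": {"enabled": True, "excerpt_enabled": False}}
    },
}
global_view = resolve_trait_config(example_config, "TraitName")
question_view = resolve_trait_config(example_config, "TraitName", "question-id")
```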

Agentic Parsing

| Field | Type | Default | Description |
|---|---|---|---|
| agentic_parsing | bool | False | Enable agentic parsing (Stage 7b). The judge uses tools to independently verify artifacts before extracting structured data. Requires a parsing model with agent_tier='deep_agent'. |
| agentic_judge_context | Literal["workspace_only", "trace_and_workspace", "trace_only"] | "workspace_only" | What context the investigation agent receives. workspace_only: question + workspace path. trace_and_workspace: answering agent trace + workspace path. trace_only: equivalent to classical Stage 7a parsing. |
| agentic_parsing_max_turns | int | 15 | Max turns for the investigation agent. Must be >= 1. |
| agentic_parsing_timeout | float | 120.0 | Timeout in seconds for the investigation agent. Must be >= 0.0. |

Scenario Execution

| Field | Type | Default | Description |
|---|---|---|---|
| scenario_turn_limit | int | 20 | Maximum turns before forced termination in scenario execution. Must be >= 1. |

Additional Configuration

| Field | Type | Default | Description |
|---|---|---|---|
| few_shot_config | FewShotConfig \| None | None | Few-shot prompting configuration. Controls example injection into prompts. See Few-Shot Configuration. |
| prompt_config | PromptConfig \| None | None | Per-task prompt instruction overrides. Injects custom instructions into specific pipeline stages. See Full Evaluation for usage and PromptConfig Reference for all fields. |
| db_config | DBConfig \| None | None | DBConfig instance for automatic result persistence to a database. When set, results are saved after each verification run. See DBConfig fields below. |

DBConfig Fields

DBConfig controls the database connection for auto-saving verification results. Import from karenina.storage:

from karenina.storage import DBConfig

db_config = DBConfig(storage_url="sqlite:///results.db")

| Field | Type | Default | Description |
|---|---|---|---|
| storage_url | str | (required) | SQLAlchemy database URL (e.g. sqlite:///results.db, postgresql://user:pass@host/db) |
| auto_create | bool | True | Automatically create tables and views if missing |
| auto_commit | bool | True | Commit transactions automatically after operations |
| echo | bool | False | Log all SQL statements (useful for debugging) |
| pool_size | int | 5 | Connection pool size (non-SQLite only) |
| max_overflow | int | 10 | Max connections beyond pool_size (non-SQLite only) |
| pool_recycle | int | 3600 | Recycle connections after N seconds (-1 to disable) |
| pool_pre_ping | bool | True | Test connections before use |
pool_pre_ping bool True Test connections before use

SQLite databases automatically set pool_size=1 and max_overflow=0.

Auto-save is controlled by the AUTOSAVE_DATABASE environment variable (true/false, default true). Auto-save only runs when db_config is set — without it, no database writes occur. Auto-save is non-blocking: failures are logged but do not raise exceptions.
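
The auto-save rules above amount to a gate plus a guarded save. The sketch below is an assumption-laden illustration: should_autosave, autosave, and the save callback are hypothetical names, and accepting only the literal string "true" (case-insensitive) is a guess at how the variable is parsed.

```python
import logging
import os

logger = logging.getLogger("autosave_sketch")


def should_autosave(db_config, environ=None):
    """Auto-save needs a db_config and AUTOSAVE_DATABASE not set to false."""
    env = os.environ if environ is None else environ
    if db_config is None:
        return False
    flag = env.get("AUTOSAVE_DATABASE", "true").strip().lower()
    return flag == "true"


def autosave(results, db_config, save, environ=None):
    """Try to persist results; log failures instead of raising."""
    if not should_autosave(db_config, environ):
        return False
    try:
        save(results, db_config)  # save is a caller-supplied callback
    except Exception:
        logger.exception("auto-save failed; continuing without persistence")
        return False
    return True


def broken_save(results, db_config):
    raise ConnectionError("database unreachable")


# A failing save is logged and reported as False, never raised.
outcome = autosave({"q1": "passed"}, db_config=object(), save=broken_save, environ={})
```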


Convenience Methods

from_overrides()

Create a VerificationConfig by applying selective overrides to an optional base config. This is the canonical way to construct configs programmatically.

config = VerificationConfig.from_overrides(
    answering_model="claude-haiku-4-5",
    answering_provider="anthropic",
    answering_id="my-answering",
    parsing_model="claude-haiku-4-5",
    parsing_provider="anthropic",
    parsing_id="my-parsing",
    evaluation_mode="template_and_rubric",
    abstention=True,
)

| Parameter | Maps To | Description |
|---|---|---|
| answering_model | answering_models[0].model_name | Answering model name |
| answering_provider | answering_models[0].model_provider | Answering model provider |
| answering_id | answering_models[0].id | Answering model identifier |
| answering_interface | answering_models[0].interface | Answering adapter interface |
| parsing_model | parsing_models[0].model_name | Parsing model name |
| parsing_provider | parsing_models[0].model_provider | Parsing model provider |
| parsing_id | parsing_models[0].id | Parsing model identifier |
| parsing_interface | parsing_models[0].interface | Parsing adapter interface |
| temperature | Both models' temperature | Shared temperature override |
| manual_traces | answering_models[0].manual_traces | Pre-recorded traces (sets interface to manual) |
| replicate_count | replicate_count | Number of replicates |
| abstention | abstention_enabled | Enable abstention detection |
| sufficiency | sufficiency_enabled | Enable sufficiency detection |
| embedding_check | embedding_check_enabled | Enable embedding check |
| deep_judgment | deep_judgment_enabled | Enable template deep judgment |
| evaluation_mode | evaluation_mode | Sets the evaluation mode |
| embedding_threshold | embedding_check_threshold | Embedding similarity threshold |
| embedding_model | embedding_check_model | Embedding model name |
| async_execution | async_enabled | Enable async execution |
| async_workers | async_max_workers | Number of async workers |
| use_full_trace_for_template | use_full_trace_for_template | Trace filtering for templates |
| use_full_trace_for_rubric | use_full_trace_for_rubric | Trace filtering for rubrics |
| deep_judgment_rubric_mode | deep_judgment_rubric_mode | Rubric deep judgment mode |
| deep_judgment_rubric_excerpts | deep_judgment_rubric_global_excerpts | Global excerpt toggle |
| deep_judgment_rubric_max_excerpts | deep_judgment_rubric_max_excerpts_default | Max excerpts per trait |
| deep_judgment_rubric_fuzzy_threshold | deep_judgment_rubric_fuzzy_match_threshold_default | Fuzzy match threshold |
| deep_judgment_rubric_retry_attempts | deep_judgment_rubric_excerpt_retry_attempts_default | Retry attempts |
| deep_judgment_rubric_search_tool | deep_judgment_rubric_search_tool | Rubric search tool |
| deep_judgment_rubric_config | deep_judgment_rubric_config | Custom per-trait config |

Preset Methods

| Method | Description |
|---|---|
| save_preset(name, description, presets_dir) | Save config as a preset JSON file |
| from_preset(filepath) | Load a VerificationConfig from a preset file |
| sanitize_preset_name(name) | Convert preset name to safe filename |
| validate_preset_metadata(name, description) | Validate preset name and description |

See Presets for usage details.

Inspection Methods

| Method | Returns | Description |
|---|---|---|
| get_few_shot_config() | FewShotConfig \| None | Get the active few-shot configuration |
| is_few_shot_enabled() | bool | Check if few-shot prompting is enabled |

Configuration Precedence

Fields are resolved in this order (highest priority first):

  1. Explicit arguments passed to the constructor or from_overrides()
  2. Environment variables (only for fields that support them — embedding and async settings)
  3. Field defaults defined on the class

See Configuration Hierarchy for the full precedence model including presets and CLI arguments.
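
The three-level order listed above can be sketched as a single lookup function. This is a generic illustration rather than Karenina's code: resolve_field, parse_bool, and the sentinel are hypothetical, and the set of spellings that parse_bool accepts is an assumption.

```python
import os

_UNSET = object()  # sentinel: distinguishes "not passed" from an explicit None


def parse_bool(text):
    """Interpret common truthy spellings (assumed, not confirmed)."""
    return text.strip().lower() in {"1", "true", "yes"}


def resolve_field(explicit=_UNSET, env_var=None, default=None, cast=str, environ=None):
    """Resolve a value: explicit argument, then env var, then field default."""
    env = os.environ if environ is None else environ
    if explicit is not _UNSET:
        return explicit
    if env_var is not None and env_var in env:
        return cast(env[env_var])
    return default


fake_env = {"KARENINA_ASYNC_MAX_WORKERS": "8", "KARENINA_ASYNC_ENABLED": "false"}

from_env = resolve_field(env_var="KARENINA_ASYNC_MAX_WORKERS", default=2,
                         cast=int, environ=fake_env)          # env var wins over default
from_explicit = resolve_field(explicit=4, env_var="KARENINA_ASYNC_MAX_WORKERS",
                              default=2, cast=int, environ=fake_env)  # explicit wins
from_default = resolve_field(env_var="KARENINA_ASYNC_MAX_WORKERS", default=2,
                             cast=int, environ={})            # nothing set
```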