Skip to content

Karenina Documentation

Home

Initializing search

biocypher/karenina

Home
Getting Started
Core Concepts
Workflows
Reference
Advanced
Contributing

Karenina Documentation

biocypher/karenina

Home
Home
- Philosophy
- Reading Paths
Getting Started
Getting Started
Core Concepts
Core Concepts
- Questions & Benchmarks
  Questions & Benchmarks
- TaskEval
- Answer Templates
- Verification Primitives
- Rubric Traits
  Rubric Traits
- Scenarios
  Scenarios
- Templates vs Rubrics
- Evaluation Modes
- Results and Scoring
- Question Context
  Question Context
  - ADeLe: Question Classification
  - Few-Shot Examples
- Pipeline
  Pipeline
- Infrastructure
  Infrastructure
Workflows
Workflows
- Evaluating with TaskEval
- Configuration
  Configuration
  - Environment Variables
  - Presets
- Creating Benchmarks
  Creating Benchmarks
- Running Verification
  Running Verification
- Analyzing Results
  Analyzing Results
- Scenarios
  Scenarios
  - Sycophancy Detection: A Scenario Tutorial
Reference
Reference
- Configuration Reference
  Configuration Reference
- CLI Reference
  CLI Reference
  - verify
  - verify-status
  - preset
  - serve
  - init
  - optimize
- API Reference
  API Reference
  - Core API
    Core API
    
    karenina
    karenina
    
    karenina
    
    adapters
    adapters
    
    karenina.adapters
    
    claude_agent_sdk
    
    claude_tool
    
    factory
    
    langchain
    
    langchain_deep_agents
    
    llm_parallel
    
    manual
    
    registry
    
    benchmark
    benchmark
    
    karenina.benchmark
    
    authoring
    
    benchmark
    
    benchmark_helpers
    
    core
    
    task_eval
    
    verification
    
    cli
    
    exceptions
    
    integrations
    
    ports
    ports
    
    karenina.ports
    
    adapter_instruction
    
    agent
    
    capabilities
    
    errors
    
    llm
    
    messages
    
    parser
    
    usage
    
    scenario
    
    schemas
    schemas
    
    karenina.schemas
    
    checkpoint
    
    config
    config
    
    karenina.schemas.config
    
    models
    
    dataframes
    dataframes
    
    karenina.schemas.dataframes
    
    judgment
    
    rubric
    
    template
    
    entities
    entities
    
    karenina.schemas.entities
    
    answer
    
    composition
    
    question
    
    rubric
    
    template_spec
    
    verified_field
    
    outputs
    outputs
    
    karenina.schemas.outputs
    
    rubric
    
    primitives
    
    results
    results
    
    karenina.schemas.results
    
    aggregation
    
    judgment
    
    rubric
    
    rubric_judgment
    
    template
    
    verification_result_set
    
    scenario
    
    shared
    
    verification
    verification
    
    karenina.schemas.verification
    
    api_models
    
    config
    
    config_presets
    
    job
    
    model_identity
    
    prompt_config
    
    result
    
    result_components
    
    storage
    
    utils
  - Internals
    Internals
    
    karenina
    karenina
    
    adapters
    adapters
    
    claude_agent_sdk
    claude_agent_sdk
    
    agent
    
    availability
    
    errors
    
    llm
    
    mcp
    
    messages
    
    parser
    
    prompts
    prompts
    
    karenina.adapters.claude_agent_sdk.prompts
    
    deep_judgment
    
    parsing
    
    rubric
    
    registration
    
    trace
    
    usage
    
    claude_tool
    claude_tool
    
    agent
    
    llm
    
    mcp
    
    messages
    
    parser
    
    prompts
    prompts
    
    karenina.adapters.claude_tool.prompts
    
    deep_judgment
    
    parsing
    
    rubric
    
    registration
    
    tools
    
    trace
    
    usage
    
    langchain
    langchain
    
    agent
    
    initialization
    
    llm
    
    mcp
    
    messages
    
    middleware
    
    models
    
    parser
    
    prompts
    prompts
    
    karenina.adapters.langchain.prompts
    
    deep_judgment
    
    feedback_format
    
    feedback_null
    
    format_instructions
    
    parsing
    
    rubric
    
    summarization
    
    registration
    
    trace
    
    usage
    
    langchain_deep_agents
    langchain_deep_agents
    
    agent
    
    availability
    
    errors
    
    initialization
    
    llm
    
    mcp
    
    messages
    
    parser
    
    prompts
    prompts
    
    karenina.adapters.langchain_deep_agents.prompts
    
    deep_judgment
    
    parsing
    
    rubric
    
    registration
    
    trace
    
    usage
    
    manual
    manual
    
    exceptions
    
    helpers
    
    manager
    
    message_utils
    
    registration
    
    traces
    
    benchmark
    benchmark
    
    authoring
    authoring
    
    answers
    answers
    
    karenina.benchmark.authoring.answers
    
    builder
    
    generator
    
    generator_code
    
    generator_prompts
    
    reader
    
    questions
    questions
    
    karenina.benchmark.authoring.questions
    
    extractor
    
    reader
    
    core
    core
    
    base
    
    exports
    
    metadata
    
    question_query
    
    questions
    
    results
    
    results_io
    
    rubrics
    
    templates
    
    verification_manager
    
    task_eval
    task_eval
    
    helpers
    
    models
    
    task_eval
    
    verification
    verification
    
    batch_runner
    
    evaluators
    evaluators
    
    karenina.benchmark.verification.evaluators
    
    rubric
    rubric
    
    karenina.benchmark.verification.evaluators.rubric
    
    agentic_trait
    
    deep_judgment
    
    evaluator
    
    llm_trait
    
    metric_trait
    
    template
    template
    
    karenina.benchmark.verification.evaluators.template
    
    deep_judgment
    
    evaluator
    
    results
    
    trace
    trace
    
    karenina.benchmark.verification.evaluators.trace
    
    abstention
    
    sufficiency
    
    executor
    
    prompts
    prompts
    
    karenina.benchmark.verification.prompts
    
    assembler
    
    deep_judgment
    deep_judgment
    
    karenina.benchmark.verification.prompts.deep_judgment
    
    rubric
    rubric
    
    karenina.benchmark.verification.prompts.deep_judgment.rubric
    
    deep_judgment
    
    template
    template
    
    karenina.benchmark.verification.prompts.deep_judgment.template
    
    excerpt
    
    hallucination
    
    reasoning
    
    parsing
    parsing
    
    karenina.benchmark.verification.prompts.parsing
    
    parsing_instructions
    
    rubric
    rubric
    
    karenina.benchmark.verification.prompts.rubric
    
    literal_trait
    
    llm_trait
    
    metric_trait
    
    presence_check
    
    task_types
    
    trace
    trace
    
    karenina.benchmark.verification.prompts.trace
    
    abstention
    
    sufficiency
    
    runner
    
    stages
    stages
    
    karenina.benchmark.verification.stages
    
    core
    core
    
    karenina.benchmark.verification.stages.core
    
    autofail_stage_base
    
    base
    
    check_stage_base
    
    orchestrator
    
    helpers
    helpers
    
    karenina.benchmark.verification.stages.helpers
    
    deep_judgment_helpers
    
    results_exporter
    
    pipeline
    pipeline
    
    karenina.benchmark.verification.stages.pipeline
    
    abstention_check
    
    agentic_parse_template
    
    agentic_rubric_evaluation
    
    deep_judgment_autofail
    
    deep_judgment_rubric_auto_fail
    
    embedding_check
    
    finalize_result
    
    generate_answer
    
    parse_template
    
    recursion_limit_autofail
    
    rubric_evaluation
    
    sufficiency_check
    
    trace_validation_autofail
    
    validate_template
    
    verify_template
    
    utils
    utils
    
    karenina.benchmark.verification.utils
    
    cache_helpers
    
    class_discovery
    
    embedding_check
    
    error_helpers
    
    llm_invocation
    
    llm_judge_helpers
    
    resource_helpers
    
    schema_builder
    
    search_helpers
    
    search_provider
    
    search_tavily
    
    storage_helpers
    
    task_helpers
    
    template_converter
    
    template_parsing_helpers
    
    template_validation
    
    trace_agent_metrics
    
    trace_formatting
    
    trace_fuzzy_match
    
    trace_parsing
    
    trace_usage_tracker
    
    cli
    cli
    
    interactive
    
    optimize
    
    preset
    
    serve
    
    status
    
    utils
    
    verify
    
    verify_config
    
    verify_output
    
    integrations
    integrations
    
    adele
    adele
    
    karenina.integrations.adele
    
    classifier
    
    parser
    
    prompts
    prompts
    
    karenina.integrations.adele.prompts
    
    system
    
    user
    
    schemas
    
    traits
    
    gepa
    gepa
    
    karenina.integrations.gepa
    
    adapter
    
    config
    
    data_types
    
    export
    
    feedback
    
    prompts
    prompts
    
    karenina.integrations.gepa.prompts
    
    defaults
    
    feedback
    
    scoring
    
    splitting
    
    tracking
    
    verbose
    
    scenario
    scenario
    
    builder
    
    checkpoint
    
    edge_resolution
    
    manager
    
    outcome_evaluation
    
    source_extraction
    
    sugar
    
    validation
    
    schemas
    schemas
    
    primitives
    primitives
    
    comparisons
    
    composition
    
    normalizers
    
    registry
    
    scope
    
    trace
    
    scenario
    scenario
    
    checks
    
    definition
    
    state
    
    types
    
    shared
    shared
    
    search
    
    storage
    storage
    
    auto_mapper
    
    base
    
    converters
    
    db_config
    
    engine
    
    generated_models
    
    migrate_question_keywords
    
    migrate_template_id
    
    models
    
    operations
    
    queries
    
    rubric_serializer
    
    utils
    
    views
    views
    
    karenina.storage.views
    
    combination_info
    
    deep_judgment_rubric_traits
    
    models_used
    
    question_attributes
    
    raw_llm_answers
    
    result_mcp_servers
    
    result_tools_used
    
    results_metadata
    
    rubric_traits
    
    template_attributes
    
    template_results
    
    utils
    
    utils
    utils
    
    answer_cache
    
    checkpoint
    
    checkpoint_trait_converters
    
    code
    
    errors
    
    file_ops
    
    json_extraction
    
    mcp
    mcp
    
    karenina.utils.mcp
    
    client
    
    tools
    
    messages
    
    progressive_save
    
    retry
    
    testing
    
    version
Advanced
Advanced
- Advanced
- Pipeline
  Pipeline
- Adapters
  Adapters
Contributing
Contributing
- Contributing

Karenina Documentation¶

Redirecting to What is Karenina?...

Made with Material for MkDocs