Karenina Documentation
Redirecting to What is Karenina? ...