# Workflows
This section is about doing — step-by-step guides that take you from a starting point to a concrete outcome. Each page assumes you understand the relevant concepts (or links to them) and focuses on how to accomplish a specific task. For the underlying mental models, see Core Concepts. For exhaustive option tables, see Reference.
## In This Section
| Workflow | What You'll Do |
|---|---|
| Configuration | Set up the configuration hierarchy: CLI args, presets, environment variables, defaults |
| Evaluating with TaskEval | Evaluate pre-recorded agent traces against templates and rubrics |
| Scenarios | Build and run a multi-turn scenario benchmark with branching paths and outcome criteria |
| Creating Benchmarks | Author questions, write templates, define rubrics, and save checkpoints |
| Running Verification | Configure and execute evaluation via Python API or CLI |
| Analyzing Results | Inspect results, build DataFrames, export data, and iterate |
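The configuration hierarchy in the first row resolves in precedence order: CLI args override presets, which override environment variables, which override defaults. A minimal sketch of that precedence chain (the `resolve_config` helper and `MYAPP_` prefix are hypothetical, not this library's API):

```python
import os

def resolve_config(cli_args, preset, env_prefix="MYAPP_", defaults=None):
    """Merge configuration sources, lowest precedence first.

    Illustrative only: defaults < environment variables < presets < CLI args.
    """
    merged = dict(defaults or {})
    # Environment variables override defaults.
    for key in merged:
        env_value = os.environ.get(env_prefix + key.upper())
        if env_value is not None:
            merged[key] = env_value
    # Presets override environment variables.
    merged.update({k: v for k, v in (preset or {}).items() if v is not None})
    # Explicit CLI arguments win over everything.
    merged.update({k: v for k, v in (cli_args or {}).items() if v is not None})
    return merged

config = resolve_config(
    cli_args={"model": "gpt-4o"},
    preset={"model": "preset-model", "temperature": 0.0},
    defaults={"model": "fallback", "temperature": 1.0, "retries": 3},
)
```

Here the CLI wins for `model`, the preset wins for `temperature`, and `retries` falls through to the default.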
## End-to-End Flows
### Benchmark Flow (closed-loop)
| Configure → | Create Benchmark → | Run Verification → | Analyze Results |
|---|---|---|---|
| Set up env vars | Create checkpoint | Load benchmark | Explore result structure |
| Configure presets | Add questions | Configure models | Filter and group |
| | Write templates | Choose eval mode | Build DataFrames |
| | Define rubrics | Execute pipeline | Export and iterate |
| | Save (`.jsonld`) | Collect results | |
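The benchmark flow above can be sketched end to end with toy stand-ins. Every class and function here is illustrative, not this library's API; only the vocabulary (checkpoint, question, template, rubric) comes from the flow:

```python
from dataclasses import dataclass, field

@dataclass
class Question:
    text: str
    template: str                               # expected-answer template
    rubric: list[str] = field(default_factory=list)

@dataclass
class Checkpoint:
    questions: list[Question] = field(default_factory=list)

    def add_question(self, q: Question) -> None:
        self.questions.append(q)

def run_verification(checkpoint: Checkpoint, model) -> list[dict]:
    """Execute the pipeline: ask the model, score against the template."""
    results = []
    for q in checkpoint.questions:
        answer = model(q.text)
        results.append({
            "question": q.text,
            "answer": answer,
            "correct": answer.strip().lower() == q.template.strip().lower(),
        })
    return results

checkpoint = Checkpoint()
checkpoint.add_question(Question("What is 2 + 2?", template="4"))
mock_model = lambda prompt: "4"  # stands in for a configured LLM
results = run_verification(checkpoint, mock_model)
```

The real pipeline does far more (eval modes, model configuration, serialization), but the closed-loop shape is the same: build a benchmark, run models against it, collect structured results.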
### Scenario Flow (multi-turn, closed-loop)
| Configure → | Build Scenario Graph → | Run Verification → | Inspect Outcomes |
|---|---|---|---|
| Set up env vars | Define nodes (questions) | Load benchmark | Evaluate outcome criteria |
| Configure models | Define edges (conditions) | Configure models | Inspect per-turn results |
| | Add outcome criteria | Execute pipeline | Export results |
| | `add_scenario()` | Collect results | |
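A scenario graph pairs nodes (questions) with edges (conditions on the answer that pick the next node). A toy traversal sketch, assuming a plain-dict graph and string-matching conditions, neither of which is this library's actual representation:

```python
# Nodes are questions; each edge is (condition, next_node). The first
# edge whose condition appears in the answer selects the branch.
scenario = {
    "start": {
        "question": "Is the service reachable?",
        "edges": [("yes", "check_auth"), ("no", "report_outage")],
    },
    "check_auth": {"question": "Did login succeed?", "edges": []},
    "report_outage": {"question": "Which region failed?", "edges": []},
}

def run_scenario(graph, start, agent):
    """Walk the graph turn by turn, branching on the agent's answers."""
    path, node = [], start
    while node is not None:
        answer = agent(graph[node]["question"])
        path.append((node, answer))
        node = next(
            (dst for cond, dst in graph[node]["edges"] if cond in answer.lower()),
            None,  # no edge matched: the scenario ends here
        )
    return path

agent = lambda q: "yes" if "reachable" in q else "yes, via SSO"
path = run_scenario(scenario, "start", agent)
```

The per-turn `path` is what outcome criteria would then be evaluated against; the branching means different agents can traverse different routes through the same benchmark.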
### TaskEval Flow (open-loop)
| Configure → | Log Outputs → | Attach Criteria → | Evaluate & Inspect |
|---|---|---|---|
| Set up env vars | `log()` plain text | `add_template(Answer)` | `evaluate(config)` |
| Configure models | `log_trace()` messages | `add_rubric(Rubric)` | Inspect results |
| | Scope by `step_id` | Scope by `step_id` | Export JSON/Markdown |
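The open-loop shape can be mimicked with a toy class. The method names (`log`, `add_template`, `evaluate`) and `step_id` scoping come from the flow above; the signatures and the string-comparison scoring are invented for illustration and do not match the real API:

```python
class ToyTaskEval:
    """Illustrative stand-in: log pre-recorded outputs, attach criteria
    per step, then evaluate each step against its criterion."""

    def __init__(self):
        self.outputs = {}     # step_id -> logged output text
        self.templates = {}   # step_id -> expected answer

    def log(self, text, step_id="default"):
        self.outputs[step_id] = text

    def add_template(self, answer, step_id="default"):
        self.templates[step_id] = answer

    def evaluate(self):
        """Score each logged output against its template."""
        return {
            step: self.outputs.get(step, "").strip() == expected
            for step, expected in self.templates.items()
        }

task = ToyTaskEval()
task.log("Paris", step_id="capital")           # pre-recorded agent output
task.add_template("Paris", step_id="capital")  # criterion for that step
verdicts = task.evaluate()
```

The key contrast with the closed-loop flows: nothing here calls a model to produce answers. The outputs already exist, and evaluation only judges them.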
Each workflow section has an overview page with a visual diagram, followed by dedicated pages for each step. Pages include executable notebook examples where applicable.
## Prerequisites
Before starting these workflows, make sure you've completed the Getting Started section — particularly Installation. Configuration is covered in the first subsection below.