Workflows

This section is about doing — step-by-step guides that take you from a starting point to a concrete outcome. Each page assumes you understand the relevant concepts (or links to them) and focuses on how to accomplish a specific task. For the underlying mental models, see Core Concepts. For exhaustive option tables, see Reference.


In This Section

| Workflow | What You'll Do |
| --- | --- |
| Configuration | Set up the configuration hierarchy: CLI args, presets, environment variables, defaults |
| Evaluating with TaskEval | Evaluate pre-recorded agent traces against templates and rubrics |
| Scenarios | Build and run a multi-turn scenario benchmark with branching paths and outcome criteria |
| Creating Benchmarks | Author questions, write templates, define rubrics, and save checkpoints |
| Running Verification | Configure and execute evaluation via Python API or CLI |
| Analyzing Results | Inspect results, build DataFrames, export data, and iterate |
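The configuration hierarchy above (CLI args override presets, which override environment variables, which override defaults) can be sketched as a simple precedence merge. This is an illustrative toy, not this library's API; the function and key names here are hypothetical:

```python
def resolve_config(cli_args=None, preset=None, env=None, defaults=None):
    """Merge configuration sources by precedence.

    Precedence (highest first): CLI args > presets > environment variables > defaults.
    """
    merged = {}
    # Apply lowest-precedence sources first so higher-precedence ones overwrite them.
    for source in (defaults, env, preset, cli_args):
        if source:
            merged.update(source)
    return merged


# The CLI value wins over the env var and the default; keys set only
# lower in the hierarchy (timeout, api_key) still come through.
config = resolve_config(
    cli_args={"model": "gpt-4"},
    env={"model": "from-env", "api_key": "sk-..."},
    defaults={"model": "default-model", "timeout": 30},
)
```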

End-to-End Flows

Benchmark Flow (closed-loop)

Configure            Create Benchmark          Run Verification          Analyze Results
──────────────  →    ─────────────────    →    ─────────────────    →    ─────────────────
Set up env vars      Create checkpoint         Load benchmark            Explore result structure
Configure presets    Add questions             Configure models          Filter and group
                     Write templates           Choose eval mode          Build DataFrames
                     Define rubrics            Execute pipeline          Export and iterate
                     Save (.jsonld)            Collect results
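The "Create Benchmark" stage of this flow can be mocked in a few lines. Everything below is an illustrative stand-in (the class, method names, and the plain-JSON save are toy substitutes, not the library's actual checkpoint format or API):

```python
import json


class Benchmark:
    """Toy stand-in for a benchmark checkpoint."""

    def __init__(self, name):
        self.name = name
        self.questions = []

    def add_question(self, text, template, rubric):
        # Each question carries a parsed-answer template and rubric criteria.
        self.questions.append({"text": text, "template": template, "rubric": rubric})

    def save(self, path):
        # The real format is JSON-LD; plain JSON keeps this sketch simple.
        with open(path, "w") as f:
            json.dump({"name": self.name, "questions": self.questions}, f)


bench = Benchmark("capitals")
bench.add_question(
    "What is the capital of France?",
    template={"answer": "Paris"},
    rubric=["mentions Paris", "names no other city"],
)
bench.save("capitals.jsonld")
```

Run Verification then loads the saved file, and Analyze Results inspects the per-question outcomes.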

Scenario Flow (multi-turn, closed-loop)

Configure            Build Scenario Graph      Run Verification          Inspect Outcomes
──────────────  →    ─────────────────    →    ─────────────────    →    ─────────────────
Set up env vars      Define nodes (questions)  Load benchmark            Evaluate outcome criteria
Configure models     Define edges (conditions) Configure models          Inspect per-turn results
                     Add outcome criteria      Execute pipeline          Export results
                     add_scenario()            Collect results
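The scenario graph above (nodes as questions, edges as branching conditions, plus outcome criteria) can be sketched as a small traversal. All names here are hypothetical illustrations of the idea, not the library's `add_scenario()` API:

```python
class Scenario:
    """Toy multi-turn scenario graph: nodes are questions, edges carry conditions."""

    def __init__(self):
        self.nodes = {}   # name -> question text
        self.edges = {}   # name -> list of (condition, next_node)

    def add_node(self, name, question):
        self.nodes[name] = question
        self.edges.setdefault(name, [])

    def add_edge(self, src, dst, condition):
        self.edges[src].append((condition, dst))

    def run(self, start, agent):
        """Walk the graph, branching on the agent's answers; return the transcript."""
        transcript, node = [], start
        while node is not None:
            answer = agent(self.nodes[node])
            transcript.append((node, answer))
            # Follow the first edge whose condition matches the answer, else stop.
            node = next((dst for cond, dst in self.edges[node] if cond(answer)), None)
        return transcript


scenario = Scenario()
scenario.add_node("greet", "Can you help me book a flight?")
scenario.add_node("confirm", "Please confirm the booking.")
scenario.add_edge("greet", "confirm", lambda ans: "yes" in ans.lower())

transcript = scenario.run("greet", agent=lambda q: "Yes, I can help with that.")

# Outcome criterion: did the conversation reach the confirmation turn?
reached_confirm = any(node == "confirm" for node, _ in transcript)
```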

TaskEval Flow (open-loop)

Configure            Log Outputs               Attach Criteria           Evaluate & Inspect
──────────────  →    ─────────────────    →    ─────────────────    →    ─────────────────
Set up env vars      log() plain text          add_template(Answer)      evaluate(config)
Configure models     log_trace() messages      add_rubric(Rubric)        Inspect results
                     Scope by step_id          Scope by step_id          Export JSON/Markdown
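The open-loop shape of this flow (log pre-recorded outputs, attach criteria scoped by step, evaluate) can be mocked as follows. This is a toy: the real evaluator judges against templates and rubrics via configured models, whereas the substring checks here just keep the sketch runnable, and the class and method signatures are assumptions, not the actual TaskEval API:

```python
class ToyTaskEval:
    """Toy open-loop evaluator: log outputs, attach criteria, evaluate."""

    def __init__(self):
        self.logs = {}      # step_id -> logged text
        self.rubrics = {}   # step_id -> list of checks

    def log(self, text, step_id):
        self.logs[step_id] = text

    def add_rubric(self, checks, step_id):
        self.rubrics[step_id] = checks

    def evaluate(self):
        # Pass a step only if every rubric check appears in its logged output.
        return {
            step: all(check in self.logs.get(step, "") for check in checks)
            for step, checks in self.rubrics.items()
        }


te = ToyTaskEval()
te.log("The refund was processed for order 1234.", step_id="step-1")
te.add_rubric(["refund", "1234"], step_id="step-1")
results = te.evaluate()
```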

Each workflow section has an overview page with a visual diagram, followed by dedicated pages for each step. Pages include executable notebook examples where applicable.


Prerequisites

Before starting these workflows, make sure you've completed the Getting Started section — particularly Installation. Configuration is covered in the first subsection below.