# Getting Started
This section covers everything you need to go from zero to a working evaluation: how to install Karenina, hands-on quickstarts for all three evaluation modes, and how to set up your workspace.
## In This Section
| Page | What You'll Learn |
|---|---|
| Installation | Requirements, install commands, optional dependencies |
| Quick Start: Q/A Benchmark | Hands-on walkthrough from zero to a working benchmark |
| Quick Start: Scenarios | Build a multi-turn evaluation with branching and outcome criteria |
| Quick Start: TaskEval | Evaluate pre-recorded outputs (agent traces, external text) |
| Workspace Init | Set up a project directory with `karenina init` |
## Recommended Reading Order
If you're new to Karenina, read these pages in order:
- Installation: Install Karenina and set up API keys
- Quick Start: Q/A Benchmark: Run your first single-turn benchmark end-to-end
- Workspace Init: Set up a project directory with `karenina init`
If your goal is multi-turn evaluation (sycophancy testing, error correction, progressive disclosure), start with Quick Start: Scenarios after installation.
If your goal is evaluating existing outputs (agent traces, external text) rather than creating benchmarks, start with Quick Start: TaskEval after installation.
Once you're comfortable, move on to Core Concepts for a deeper understanding of checkpoints, templates, rubrics, and adapters.