Few-Shot Example Configuration¶
This tutorial shows how to configure few-shot examples for verification runs. Few-shot examples are prepended to the answering model's prompt, showing it the expected response format before it generates its own answer. The judge LLM never sees these examples. Use few-shot when your answering model produces poorly formatted responses, or when you want to demonstrate the expected output structure.
What you'll learn:
- Add few-shot examples when creating questions
- Configure FewShotConfig with global modes (all, k-shot, custom, none)
- Override per-question with QuestionFewShotConfig and inherit mode
- Add global external examples
- Select examples by index with from_index_selections()
- Resolve final examples with resolve_examples_for_question()
- Attach FewShotConfig to VerificationConfig
How Few-Shot Works¶
Few-shot examples are injected into the answering model's prompt only. The judge, rubric evaluators, and all other pipeline stages never see them:
Few-shot examples + Question --> Answering model prompt
Response only (no examples) --> Judge model prompt
This means few-shot examples influence how the model responds without biasing evaluation.
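As a rough illustration only (this is not the library's actual prompt template; build_answering_prompt is a hypothetical helper), prepending examples can be pictured as:

```python
def build_answering_prompt(examples: list[dict], question: str) -> str:
    # Hypothetical sketch of few-shot prompt assembly: each example
    # is rendered as a Q/A pair ahead of the real question. The judge
    # prompt never receives this text, only the model's response.
    blocks = [f"Q: {ex['question']}\nA: {ex['answer']}" for ex in examples]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)
```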
Add Examples to Questions¶
Each example is a dict with "question" and "answer" keys, passed via benchmark.add_question():
benchmark = Benchmark.create(name="Drug QA", description="Few-shot demo", version="1.0.0")
examples = [
{"question": "What is the target of imatinib?", "answer": "BCR-ABL tyrosine kinase"},
{"question": "What is the target of trastuzumab?", "answer": "HER2 (ErbB2) receptor"},
{"question": "What is the target of rituximab?", "answer": "CD20 protein on B cells"},
]
q_id = benchmark.add_question(
question="What is the target of venetoclax?",
raw_answer="BCL2",
few_shot_examples=examples,
)
q_data = benchmark.get_question(q_id)
print(f"Question: {q_data['question'][:50]}")
print(f"Few-shot examples: {len(q_data['few_shot_examples'])}")
Question: What is the target of venetoclax?
Few-shot examples: 3
Global Mode: All¶
The all mode includes every stored example for the question:
config_all = FewShotConfig(global_mode="all")
resolved = config_all.resolve_examples_for_question(
question_id=_q2_id, available_examples=_examples_by_qid[_q2_id],
)
print(f"Mode: {config_all.global_mode}")
print(f"Available: {len(_examples_by_qid[_q2_id])}, Resolved: {len(resolved)}")
for ex in resolved:
print(f" Q: {ex['question'][:45]} A: {ex['answer']}")
Mode: all
Available: 4, Resolved: 4
  Q: What is the half-life of aspirin? A: 15 to 20 minutes
  Q: What is the half-life of metformin? A: Approximately 6.2 hours
  Q: What is the half-life of warfarin? A: 20 to 60 hours
  Q: What is the half-life of amoxicillin? A: About 1 hour
This works well when you have a small, curated set (2 to 5 examples per question).
Global Mode: K-Shot¶
K-shot mode randomly samples k examples per question, using the question ID as the random seed so selections are reproducible across runs:
config_kshot = FewShotConfig(global_mode="k-shot", global_k=2)
resolved = config_kshot.resolve_examples_for_question(
question_id=_q2_id, available_examples=_examples_by_qid[_q2_id],
)
print(f"Mode: {config_kshot.global_mode}, k={config_kshot.global_k}")
print(f"Available: {len(_examples_by_qid[_q2_id])}, Resolved: {len(resolved)}")
for ex in resolved:
print(f" Q: {ex['question'][:45]} A: {ex['answer']}")
Mode: k-shot, k=2
Available: 4, Resolved: 2
  Q: What is the half-life of aspirin? A: 15 to 20 minutes
  Q: What is the half-life of metformin? A: Approximately 6.2 hours
If a question has fewer examples than k, all examples are used (no error).
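The seeding behavior can be sketched in plain Python (sample_k_shot is a hypothetical helper; the library's internal sampling may differ):

```python
import random

def sample_k_shot(question_id: str, examples: list, k: int) -> list:
    # Hypothetical sketch: seed the RNG with the question ID so the
    # same question always yields the same sample across runs.
    if len(examples) <= k:
        return list(examples)  # fewer than k available: use them all
    rng = random.Random(question_id)
    return rng.sample(examples, k)
```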
Global Mode: Custom¶
Custom mode selects specific examples by index:
config_custom = FewShotConfig.from_index_selections({
_q1_id: [0, 2], # First and third examples
_q2_id: [1], # Second example only
_q4_id: [0, 1, 3], # Skip third example
})
resolved_q1 = config_custom.resolve_examples_for_question(
question_id=_q1_id, available_examples=_examples_by_qid[_q1_id],
)
print(f"Mode: {config_custom.global_mode}")
print(f"Q1 resolved ({len(resolved_q1)} examples):")
for ex in resolved_q1:
print(f" Q: {ex['question'][:45]} A: {ex['answer']}")
Mode: custom
Q1 resolved (2 examples):
  Q: What is the target of imatinib? A: BCR-ABL tyrosine kinase
  Q: What is the target of rituximab? A: CD20 protein on B cells
Global Mode: None¶
The none mode disables few-shot examples entirely:
config_none = FewShotConfig(global_mode="none")
resolved = config_none.resolve_examples_for_question(
question_id=_q1_id, available_examples=_examples_by_qid[_q1_id],
)
print(f"Mode: {config_none.global_mode}")
print(f"Resolved examples: {len(resolved)}")
Mode: none
Resolved examples: 0
Use none to establish a zero-shot baseline for comparison.
Per-Question Overrides¶
Each question can override the global mode via QuestionFewShotConfig. Questions without an explicit config inherit the global settings:
config_mixed = FewShotConfig(
global_mode="all",
question_configs={
_q1_id: QuestionFewShotConfig(mode="k-shot", k=1), # Sample 1 example
_q3_id: QuestionFewShotConfig(mode="none"), # Disable for q3
# q2, q4: inherit global "all" mode
},
)
for qid, label in [(_q1_id, "q1"), (_q2_id, "q2"), (_q3_id, "q3"), (_q4_id, "q4")]:
effective = config_mixed.get_effective_config(qid)
resolved = config_mixed.resolve_examples_for_question(
question_id=qid, available_examples=_examples_by_qid[qid],
)
print(f"{label}: mode={effective.mode}, resolved={len(resolved)}")
q1: mode=k-shot, resolved=1
q2: mode=all, resolved=4
q3: mode=none, resolved=0
q4: mode=all, resolved=4
The inherit mode (the default) delegates to the global mode and k value. Override it to customize specific questions while leaving the rest unchanged.
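The fallback logic can be sketched as follows (effective_settings is a hypothetical helper operating on plain dicts, not the library's implementation):

```python
def effective_settings(global_mode, global_k, question_config=None):
    # Hypothetical sketch of inherit resolution: a question without
    # a config, or one whose mode is "inherit", falls back to the
    # global mode and k; any other mode overrides them.
    if question_config is None or question_config.get("mode", "inherit") == "inherit":
        return global_mode, global_k
    return question_config["mode"], question_config.get("k", global_k)
```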
Global External Examples¶
Global external examples are appended to every question's resolved examples, regardless of mode:
config_external = FewShotConfig(
global_mode="k-shot", global_k=1,
global_external_examples=[
{"question": "What is the capital of France?", "answer": "Paris"},
{"question": "What is 2 + 2?", "answer": "4"},
],
)
resolved = config_external.resolve_examples_for_question(
question_id=_q1_id, available_examples=_examples_by_qid[_q1_id],
)
print(f"Total resolved: {len(resolved)} (1 from k-shot + 2 global external)")
for ex in resolved:
print(f" Q: {ex['question'][:45]} A: {ex['answer']}")
Total resolved: 3 (1 from k-shot + 2 global external)
  Q: What is the target of trastuzumab? A: HER2 (ErbB2) receptor
  Q: What is the capital of France? A: Paris
  Q: What is 2 + 2? A: 4
Resolution order: stored examples first, then question-specific external, then global external. Add question-specific external examples via QuestionFewShotConfig.external_examples.
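That ordering amounts to a simple concatenation, sketched here with a hypothetical combine_examples helper:

```python
def combine_examples(stored, question_external, global_external):
    # Hypothetical sketch of the documented resolution order:
    # stored examples first, then question-specific external
    # examples, then global external examples.
    return list(stored) + list(question_external) + list(global_external)
```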
Bulk Selection¶
from_index_selections() builds custom selections for multiple questions. k_shot_for_questions() creates per-question k values in one call:
config_bulk = FewShotConfig.from_index_selections({
_q1_id: [0, 1], _q2_id: [0, 2, 3], _q3_id: [1], _q4_id: [0, 3],
})
print(f"Custom bulk (mode={config_bulk.global_mode}):")
for qid, label in [(_q1_id, "q1"), (_q2_id, "q2"), (_q3_id, "q3"), (_q4_id, "q4")]:
resolved = config_bulk.resolve_examples_for_question(
question_id=qid, available_examples=_examples_by_qid[qid],
)
print(f" {label}: {len(resolved)} examples selected")
Custom bulk (mode=custom):
  q1: 2 examples selected
  q2: 3 examples selected
  q3: 1 examples selected
  q4: 2 examples selected
config_varied_k = FewShotConfig.k_shot_for_questions(
question_k_mapping={_q1_id: 1, _q2_id: 3, _q4_id: 2},
global_k=2,
)
print(f"Varied k-shot (mode={config_varied_k.global_mode}):")
for qid, label in [(_q1_id, "q1"), (_q2_id, "q2"), (_q3_id, "q3"), (_q4_id, "q4")]:
effective = config_varied_k.get_effective_config(qid)
print(f" {label}: k={effective.k}")
Varied k-shot (mode=k-shot):
  q1: k=1
  q2: k=3
  q3: k=2
  q4: k=2
Resolve Examples¶
Call resolve_examples_for_question() to preview exactly what the answering model will see. This combines the global mode, per-question overrides, and external examples into a final list:
config_preview = FewShotConfig(
global_mode="all",
global_external_examples=[
{"question": "Format example", "answer": "Short, precise answer"},
],
question_configs={_q2_id: QuestionFewShotConfig(mode="k-shot", k=2)},
)
resolved_q1 = config_preview.resolve_examples_for_question(
question_id=_q1_id, available_examples=_examples_by_qid[_q1_id],
)
print(f"Q1 (inherits 'all'): {len(resolved_q1)} examples (3 stored + 1 external)")
resolved_q2 = config_preview.resolve_examples_for_question(
question_id=_q2_id, available_examples=_examples_by_qid[_q2_id],
)
print(f"Q2 (k-shot, k=2): {len(resolved_q2)} examples (2 sampled + 1 external)")
Q1 (inherits 'all'): 4 examples (3 stored + 1 external)
Q2 (k-shot, k=2): 3 examples (2 sampled + 1 external)
Use this to verify your configuration before running a full verification pass.
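To sanity-check an entire benchmark in one pass, you could loop the same call over every question. This is a sketch: preview_counts is a hypothetical helper, and examples_by_qid is assumed to map question IDs to their stored example lists.

```python
def preview_counts(config, examples_by_qid):
    # Sketch: dry-run resolution for every question and report how
    # many examples each prompt would receive, before spending any
    # model calls on a full verification pass.
    return {
        qid: len(config.resolve_examples_for_question(
            question_id=qid, available_examples=examples,
        ))
        for qid, examples in examples_by_qid.items()
    }
```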
Attach to Verification¶
Pass the FewShotConfig to VerificationConfig via the few_shot_config field:
few_shot = FewShotConfig(
global_mode="k-shot",
global_k=2,
question_configs={_q3_id: QuestionFewShotConfig(mode="none")},
)
config = VerificationConfig(
answering_models=[
ModelConfig(id="haiku", model_name="claude-haiku-4-5",
model_provider="anthropic", interface="langchain")
],
parsing_models=[
ModelConfig(id="haiku-parser", model_name="claude-haiku-4-5",
model_provider="anthropic", interface="langchain",
temperature=0.0)
],
few_shot_config=few_shot,
)
print(f"Few-shot enabled: {config.few_shot_config.enabled}")
print(f"Global mode: {config.few_shot_config.global_mode}")
print(f"Global k: {config.few_shot_config.global_k}")
print(f"Per-question: {len(config.few_shot_config.question_configs)} overrides")
Few-shot enabled: True
Global mode: k-shot
Global k: 2
Per-question: 1 overrides
When few_shot_config is None (the default) or enabled=False, no examples are prepended.
Tuning Strategy¶
- Start with global_mode="none" to establish a zero-shot baseline
- If the answering model produces poorly formatted responses, add 2 to 3 examples per question
- Use resolve_examples_for_question() to preview before running full verification
- Increase k incrementally; more examples increase prompt cost without guaranteed improvement
- Use per-question overrides for questions where the global strategy underperforms
- Compare zero-shot and few-shot results side by side to confirm examples actually help
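The side-by-side comparison in the last point can be as simple as averaging scores per condition. A sketch with hypothetical score lists; real runs would pull these values from verification results:

```python
def compare_conditions(zero_shot_scores, few_shot_scores):
    # Sketch: mean score per condition plus the delta, to check
    # whether few-shot examples actually moved the needle.
    zs = sum(zero_shot_scores) / len(zero_shot_scores)
    fs = sum(few_shot_scores) / len(few_shot_scores)
    return {"zero_shot": zs, "few_shot": fs, "delta": fs - zs}
```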
Next Steps¶
- Few-Shot Concepts: Detailed explanation of modes, resolution, and edge cases
- Prompt Assembly: How few-shot examples are injected into the answering prompt
- VerificationConfig Reference: All configuration fields
- Basic Verification: Simplest verification workflow
- Full Evaluation: Template and rubric evaluation with quality checks