Few-Shot Example Configuration¶
This tutorial shows how to configure few-shot examples for verification runs. Few-shot examples are prepended to the answering model's prompt, showing it the expected response format before it generates its own answer. The judge LLM never sees these examples. Use few-shot when your answering model produces poorly formatted responses, or when you want to demonstrate the expected output structure.
What you'll learn:
- Add few-shot examples when creating questions
- Configure FewShotConfig with global modes (all, k-shot, custom, none)
- Override per-question with QuestionFewShotConfig and inherit mode
- Add global external examples
- Select examples by index with from_index_selections()
- Resolve final examples with resolve_examples_for_question()
- Attach FewShotConfig to VerificationConfig
How Few-Shot Works¶
Few-shot examples are injected into the answering model's prompt only. The judge, rubric evaluators, and all other pipeline stages never see them:
Few-shot examples + Question --> Answering model prompt
Response only (no examples) --> Judge model prompt
This means few-shot examples influence how the model responds without biasing evaluation.
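As a rough illustration only (this is not the library's actual prompt template; build_answering_prompt is a hypothetical helper), prepending examples can be pictured as:

```python
def build_answering_prompt(examples: list[dict], question: str) -> str:
    # Hypothetical sketch of few-shot prompt assembly: each example
    # is rendered as a Q/A pair ahead of the real question. The judge
    # prompt never receives this text, only the model's response.
    blocks = [f"Q: {ex['question']}\nA: {ex['answer']}" for ex in examples]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)
```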
Add Examples to Questions¶
Each example is a dict with "question" and "answer" keys, passed via benchmark.add_question():
benchmark = Benchmark.create(name="Drug QA", description="Few-shot demo", version="1.0.0")
examples = [
{"question": "What is the target of imatinib?", "answer": "BCR-ABL tyrosine kinase"},
{"question": "What is the target of trastuzumab?", "answer": "HER2 (ErbB2) receptor"},
{"question": "What is the target of rituximab?", "answer": "CD20 protein on B cells"},
]
q_id = benchmark.add_question(
question="What is the target of venetoclax?",
raw_answer="BCL2",
few_shot_examples=examples,
)
q_data = benchmark.get_question(q_id)
print(f"Question: {q_data['question'][:50]}")
print(f"Few-shot examples: {len(q_data['few_shot_examples'])}")
Question: What is the target of venetoclax?
Few-shot examples: 3
Global Mode: All¶
The all mode includes every stored example for the question:
config_all = FewShotConfig(global_mode="all")
resolved = config_all.resolve_examples_for_question(
question_id=_q2_id, available_examples=_examples_by_qid[_q2_id],
)
print(f"Mode: {config_all.global_mode}")
print(f"Available: {len(_examples_by_qid[_q2_id])}, Resolved: {len(resolved)}")
for ex in resolved:
print(f" Q: {ex['question'][:45]} A: {ex['answer']}")
Mode: all
Available: 4, Resolved: 4
  Q: What is the half-life of aspirin? A: 15 to 20 minutes
  Q: What is the half-life of metformin? A: Approximately 6.2 hours
  Q: What is the half-life of warfarin? A: 20 to 60 hours
  Q: What is the half-life of amoxicillin? A: About 1 hour
This works well when you have a small, curated set (2 to 5 examples per question).
Global Mode: K-Shot¶
K-shot mode randomly samples k examples per question, using the question ID as the random seed so selections are reproducible across runs:
config_kshot = FewShotConfig(global_mode="k-shot", global_k=2)
resolved = config_kshot.resolve_examples_for_question(
question_id=_q2_id, available_examples=_examples_by_qid[_q2_id],
)
print(f"Mode: {config_kshot.global_mode}, k={config_kshot.global_k}")
print(f"Available: {len(_examples_by_qid[_q2_id])}, Resolved: {len(resolved)}")
for ex in resolved:
print(f" Q: {ex['question'][:45]} A: {ex['answer']}")
Mode: k-shot, k=2
Available: 4, Resolved: 2
  Q: What is the half-life of aspirin? A: 15 to 20 minutes
  Q: What is the half-life of metformin? A: Approximately 6.2 hours
If a question has fewer examples than k, all examples are used (no error).
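The seeding behavior can be sketched in plain Python (sample_k_shot is a hypothetical helper; the library's internal sampling may differ):

```python
import random

def sample_k_shot(question_id: str, examples: list, k: int) -> list:
    # Hypothetical sketch: seed the RNG with the question ID so the
    # same question always yields the same sample across runs.
    if len(examples) <= k:
        return list(examples)  # fewer than k available: use them all
    rng = random.Random(question_id)
    return rng.sample(examples, k)
```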
Global Mode: Custom¶
Custom mode selects specific examples by index:
config_custom = FewShotConfig.from_index_selections({
_q1_id: [0, 2], # First and third examples
_q2_id: [1], # Second example only
_q4_id: [0, 1, 3], # Skip third example
})
resolved_q1 = config_custom.resolve_examples_for_question(
question_id=_q1_id, available_examples=_examples_by_qid[_q1_id],
)
print(f"Mode: {config_custom.global_mode}")
print(f"Q1 resolved ({len(resolved_q1)} examples):")
for ex in resolved_q1:
print(f" Q: {ex['question'][:45]} A: {ex['answer']}")
Mode: custom
Q1 resolved (2 examples):
  Q: What is the target of imatinib? A: BCR-ABL tyrosine kinase
  Q: What is the target of rituximab? A: CD20 protein on B cells
Global Mode: None¶
The none mode disables few-shot examples entirely:
config_none = FewShotConfig(global_mode="none")
resolved = config_none.resolve_examples_for_question(
question_id=_q1_id, available_examples=_examples_by_qid[_q1_id],
)
print(f"Mode: {config_none.global_mode}")
print(f"Resolved examples: {len(resolved)}")
Mode: none
Resolved examples: 0
Use none to establish a zero-shot baseline for comparison.
Per-Question Overrides¶
Each question can override the global mode via QuestionFewShotConfig. Questions without an explicit config inherit the global settings:
config_mixed = FewShotConfig(
global_mode="all",
question_configs={
_q1_id: QuestionFewShotConfig(mode="k-shot", k=1), # Sample 1 example
_q3_id: QuestionFewShotConfig(mode="none"), # Disable for q3
# q2, q4: inherit global "all" mode
},
)
for qid, label in [(_q1_id, "q1"), (_q2_id, "q2"), (_q3_id, "q3"), (_q4_id, "q4")]:
effective = config_mixed.get_effective_config(qid)
resolved = config_mixed.resolve_examples_for_question(
question_id=qid, available_examples=_examples_by_qid[qid],
)
print(f"{label}: mode={effective.mode}, resolved={len(resolved)}")
q1: mode=k-shot, resolved=1
q2: mode=all, resolved=4
q3: mode=none, resolved=0
q4: mode=all, resolved=4
The inherit mode (the default) delegates to the global mode and k value. Override it to customize specific questions while leaving the rest unchanged.
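The fallback logic can be sketched as follows (effective_settings is a hypothetical helper operating on plain dicts, not the library's implementation):

```python
def effective_settings(global_mode, global_k, question_config=None):
    # Hypothetical sketch of inherit resolution: a question without
    # a config, or one whose mode is "inherit", falls back to the
    # global mode and k; any other mode overrides them.
    if question_config is None or question_config.get("mode", "inherit") == "inherit":
        return global_mode, global_k
    return question_config["mode"], question_config.get("k", global_k)
```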
Global External Examples¶
Global external examples are appended to every question's resolved examples, regardless of mode:
config_external = FewShotConfig(
global_mode="k-shot", global_k=1,
global_external_examples=[
{"question": "What is the capital of France?", "answer": "Paris"},
{"question": "What is 2 + 2?", "answer": "4"},
],
)
resolved = config_external.resolve_examples_for_question(
question_id=_q1_id, available_examples=_examples_by_qid[_q1_id],
)
print(f"Total resolved: {len(resolved)} (1 from k-shot + 2 global external)")
for ex in resolved:
print(f" Q: {ex['question'][:45]} A: {ex['answer']}")
Total resolved: 3 (1 from k-shot + 2 global external)
  Q: What is the target of trastuzumab? A: HER2 (ErbB2) receptor
  Q: What is the capital of France? A: Paris
  Q: What is 2 + 2? A: 4
Resolution order: stored examples first, then question-specific external, then global external. Add question-specific external examples via QuestionFewShotConfig.external_examples.
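That ordering amounts to a simple concatenation, sketched here with a hypothetical combine_examples helper:

```python
def combine_examples(stored, question_external, global_external):
    # Hypothetical sketch of the documented resolution order:
    # stored examples first, then question-specific external
    # examples, then global external examples.
    return list(stored) + list(question_external) + list(global_external)
```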
Bulk Selection¶
from_index_selections() builds custom selections for multiple questions. k_shot_for_questions() creates per-question k values in one call:
config_bulk = FewShotConfig.from_index_selections({
_q1_id: [0, 1], _q2_id: [0, 2, 3], _q3_id: [1], _q4_id: [0, 3],
})
print(f"Custom bulk (mode={config_bulk.global_mode}):")
for qid, label in [(_q1_id, "q1"), (_q2_id, "q2"), (_q3_id, "q3"), (_q4_id, "q4")]:
resolved = config_bulk.resolve_examples_for_question(
question_id=qid, available_examples=_examples_by_qid[qid],
)
print(f" {label}: {len(resolved)} examples selected")
Custom bulk (mode=custom):
  q1: 2 examples selected
  q2: 3 examples selected
  q3: 1 examples selected
  q4: 2 examples selected
config_varied_k = FewShotConfig.k_shot_for_questions(
question_k_mapping={_q1_id: 1, _q2_id: 3, _q4_id: 2},
global_k=2,
)
print(f"Varied k-shot (mode={config_varied_k.global_mode}):")
for qid, label in [(_q1_id, "q1"), (_q2_id, "q2"), (_q3_id, "q3"), (_q4_id, "q4")]:
effective = config_varied_k.get_effective_config(qid)
print(f" {label}: k={effective.k}")
Varied k-shot (mode=k-shot):
  q1: k=1
  q2: k=3
  q3: k=2
  q4: k=2
Resolve Examples¶
Call resolve_examples_for_question() to preview exactly what the answering model will see. This combines the global mode, per-question overrides, and external examples into a final list:
config_preview = FewShotConfig(
global_mode="all",
global_external_examples=[
{"question": "Format example", "answer": "Short, precise answer"},
],
question_configs={_q2_id: QuestionFewShotConfig(mode="k-shot", k=2)},
)
resolved_q1 = config_preview.resolve_examples_for_question(
question_id=_q1_id, available_examples=_examples_by_qid[_q1_id],
)
print(f"Q1 (inherits 'all'): {len(resolved_q1)} examples (3 stored + 1 external)")
resolved_q2 = config_preview.resolve_examples_for_question(
question_id=_q2_id, available_examples=_examples_by_qid[_q2_id],
)
print(f"Q2 (k-shot, k=2): {len(resolved_q2)} examples (2 sampled + 1 external)")
Q1 (inherits 'all'): 4 examples (3 stored + 1 external)
Q2 (k-shot, k=2): 3 examples (2 sampled + 1 external)
Use this to verify your configuration before running a full verification pass.
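To sanity-check an entire benchmark in one pass, you could loop the same call over every question. This is a sketch: preview_counts is a hypothetical helper, and examples_by_qid is assumed to map question IDs to their stored example lists.

```python
def preview_counts(config, examples_by_qid):
    # Sketch: dry-run resolution for every question and report how
    # many examples each prompt would receive, before spending any
    # model calls on a full verification pass.
    return {
        qid: len(config.resolve_examples_for_question(
            question_id=qid, available_examples=examples,
        ))
        for qid, examples in examples_by_qid.items()
    }
```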
Attach to Verification¶
Pass the FewShotConfig to VerificationConfig via the few_shot_config field:
few_shot = FewShotConfig(
global_mode="k-shot",
global_k=2,
question_configs={_q3_id: QuestionFewShotConfig(mode="none")},
)
config = VerificationConfig(
answering_models=[
ModelConfig(id="haiku", model_name="claude-haiku-4-5",
model_provider="anthropic", interface="langchain")
],
parsing_models=[
ModelConfig(id="haiku-parser", model_name="claude-haiku-4-5",
model_provider="anthropic", interface="langchain",
temperature=0.0)
],
few_shot_config=few_shot,
)
print(f"Few-shot enabled: {config.few_shot_config.enabled}")
print(f"Global mode: {config.few_shot_config.global_mode}")
print(f"Global k: {config.few_shot_config.global_k}")
print(f"Per-question: {len(config.few_shot_config.question_configs)} overrides")
Few-shot enabled: True
Global mode: k-shot
Global k: 2
Per-question: 1 overrides
When few_shot_config is None (the default) or enabled=False, no examples are prepended.
Tuning Strategy¶
- Start with global_mode="none" to establish a zero-shot baseline
- If the answering model produces poorly formatted responses, add 2 to 3 examples per question
- Use resolve_examples_for_question() to preview before running full verification
- Increase k incrementally; more examples increase prompt cost without guaranteed improvement
- Use per-question overrides for questions where the global strategy underperforms
- Compare zero-shot and few-shot results side by side to confirm examples actually help
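The side-by-side comparison in the last point can be as simple as averaging scores per condition. A sketch with hypothetical score lists; real runs would pull these values from verification results:

```python
def compare_conditions(zero_shot_scores, few_shot_scores):
    # Sketch: mean score per condition plus the delta, to check
    # whether few-shot examples actually moved the needle.
    zs = sum(zero_shot_scores) / len(zero_shot_scores)
    fs = sum(few_shot_scores) / len(few_shot_scores)
    return {"zero_shot": zs, "few_shot": fs, "delta": fs - zs}
```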
Next Steps¶
- Few-Shot Concepts: Detailed explanation of modes, resolution, and edge cases
- Prompt Assembly: How few-shot examples are injected into the answering prompt
- VerificationConfig Reference: All configuration fields
- Basic Verification: Simplest verification workflow
- Full Evaluation: Template and rubric evaluation with quality checks