# Prompt Assembly System
Every LLM call in the verification pipeline uses a tri-section prompt pattern. Three independent instruction sources are combined into the final prompt messages:
- Task instructions --- the base system and user text for a specific pipeline call (e.g., parsing, abstention detection, rubric evaluation)
- Adapter instructions --- text appended based on the LLM backend (e.g., LangChain appends JSON schema formatting; Claude Tool appends extraction directives)
- User instructions --- optional per-task text from `PromptConfig`
This separation keeps each concern isolated: task prompt builders don't know about adapters, adapters don't know about user overrides, and user instructions don't depend on either.
## How It Works

```text
Task Instructions           Adapter Instructions      User Instructions
(TemplatePromptBuilder,     (AdapterInstruction-      (PromptConfig)
 AbstentionPrompt, etc.)     Registry)

system_text ──┐
              ├──► + adapter.system_addition ──► + user_instructions ──► Final system text
user_text ────┤
              └──► + adapter.user_addition ─────────────────────────────► Final user text
```
All instruction sources are append-only: adapter text is appended first, then user instructions. The final texts are either wrapped in Message objects or returned as raw strings depending on the call site.
## PromptAssembler

`PromptAssembler` is the single entry point that combines all three instruction sections. It is a dataclass with three fields:

| Field | Type | Description |
|---|---|---|
| `task` | `PromptTask` | Identifies which pipeline LLM call this is for |
| `interface` | `str` | The adapter interface name (e.g., `"langchain"`, `"claude_tool"`) |
| `capabilities` | `PortCapabilities` | The adapter's declared capabilities |
### Methods

`assemble(system_text, user_text, user_instructions, instruction_context)` --- returns `list[Message]`

The primary method. Applies adapter and user instructions, then builds `Message` objects. If the adapter does not support system prompts (`capabilities.supports_system_prompt == False`), the system text is prepended to the user text as a single user message.

`assemble_text(system_text, user_text, user_instructions, instruction_context)` --- returns `tuple[str, str]`

Same tri-section logic, but returns raw `(system_text, user_text)` strings instead of `Message` objects. Used by multi-stage flows (e.g., deep judgment) that need intermediate text processing before final message construction.
### Assembly Order

Both methods follow the same internal sequence:

1. **Look up adapter instructions** from the `AdapterInstructionRegistry` for the `(interface, task)` pair
2. **Append adapter additions** --- each registered factory produces an `AdapterInstruction` with `system_addition` and `user_addition` properties; non-empty additions are appended to their respective texts
3. **Append user instructions** --- if `user_instructions` is provided (from `PromptConfig`), it is appended to the system text
4. **Build messages** (for `assemble()` only) --- wrap the final texts in `Message.system()` and `Message.user()`, or combine into a single `Message.user()` if system prompts are not supported
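The sequence above can be sketched in plain Python. All names here (`SketchInstruction`, `REGISTRY`, `assemble_texts`) are illustrative stand-ins for the real classes, not the karenina API:

```python
from dataclasses import dataclass


@dataclass
class SketchInstruction:
    """Stand-in for an AdapterInstruction with two addition properties."""
    system_addition: str = ""
    user_addition: str = ""


# Stand-in for AdapterInstructionRegistry: (interface, task) -> factories
REGISTRY = {
    ("langchain", "parsing"): [
        lambda **kw: SketchInstruction(system_addition="Return JSON matching the schema.")
    ],
}


def assemble_texts(interface, task, system_text, user_text,
                   user_instructions=None, context=None):
    # 1. Look up adapter instruction factories for the (interface, task) pair
    for factory in REGISTRY.get((interface, task), []):
        instr = factory(**(context or {}))
        # 2. Append non-empty adapter additions to their respective texts
        if instr.system_addition:
            system_text += "\n\n" + instr.system_addition
        if instr.user_addition:
            user_text += "\n\n" + instr.user_addition
    # 3. Append user instructions (from PromptConfig) to the system text
    if user_instructions:
        system_text += "\n\n" + user_instructions
    return system_text, user_text


sys_t, usr_t = assemble_texts(
    "langchain", "parsing",
    system_text="You are a parser.",
    user_text="Parse this response.",
    user_instructions="Normalize gene names.",
)
```

Note the ordering guarantee: adapter text lands before user instructions, so user overrides always read as the last word in the final system text.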
### Usage Example

This is how the template parsing evaluator uses `PromptAssembler`:

```python
from karenina.benchmark.verification.prompts.assembler import PromptAssembler
from karenina.benchmark.verification.prompts.task_types import PromptTask

# 1. Build task-specific prompts
builder = TemplatePromptBuilder(answer_class=answer_class)
system_prompt = builder.build_system_prompt(has_tool_traces=has_tools)
user_prompt = builder.build_user_prompt(
    question_text=question_text,
    response_to_parse=raw_response,
)

# 2. Resolve user instructions from PromptConfig
user_instructions = (
    prompt_config.get_for_task(PromptTask.PARSING.value)
    if prompt_config else None
)

# 3. Assemble all three sections
assembler = PromptAssembler(
    task=PromptTask.PARSING,
    interface=model_config.interface,
    capabilities=parser.capabilities,
)
messages = assembler.assemble(
    system_text=system_prompt,
    user_text=user_prompt,
    user_instructions=user_instructions,
    instruction_context={"json_schema": schema, "format_instructions": fmt},
)
```
The `instruction_context` dict is passed to adapter instruction factories. Each factory extracts the parameters it needs (e.g., the LangChain parsing instruction uses `json_schema` and `format_instructions`; Claude Tool ignores them).
## AdapterInstructionRegistry
The registry is a class-level mapping from (interface, task) pairs to lists of instruction factories. It provides a global, shared mechanism for adapters to inject prompt modifications without coupling to specific pipeline stages.
### API

| Method | Description |
|---|---|
| `register(interface, task, factory)` | Register a factory for an `(interface, task)` pair |
| `get_instructions(interface, task)` | Retrieve factories for a pair (empty list if none) |
| `clear()` | Clear all registrations (for testing) |
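The class-level mapping behavior can be sketched as follows; `SketchRegistry` is an illustrative stand-in, not the real implementation:

```python
from collections import defaultdict


class SketchRegistry:
    """Minimal stand-in for AdapterInstructionRegistry."""
    # Class-level mapping shared by all callers: (interface, task) -> factories
    _factories: dict = defaultdict(list)

    @classmethod
    def register(cls, interface, task, factory):
        cls._factories[(interface, task)].append(factory)

    @classmethod
    def get_instructions(cls, interface, task):
        # Empty list when nothing is registered for the pair
        return list(cls._factories.get((interface, task), []))

    @classmethod
    def clear(cls):
        # Intended for test isolation
        cls._factories.clear()


SketchRegistry.register("langchain", "parsing", lambda **kw: None)
```

Because the mapping lives on the class rather than an instance, any module that imports the registry sees the same registrations, which is what lets adapters self-register at import time.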
### AdapterInstruction Protocol

Each instruction factory must return an object implementing the `AdapterInstruction` protocol:

```python
class AdapterInstruction(Protocol):
    @property
    def system_addition(self) -> str:
        """Text to append to the system prompt (empty string for no addition)."""
        ...

    @property
    def user_addition(self) -> str:
        """Text to append to the user prompt (empty string for no addition)."""
        ...
```
### Instruction Registration

Adapters register their instructions in `adapters/<name>/prompts/*.py` files, which are imported at the bottom of each adapter's `registration.py`. This ensures instructions are registered when the adapter is loaded.
Example from the LangChain adapter (parsing):
```python
from karenina.ports.adapter_instruction import AdapterInstructionRegistry

def _langchain_format_instruction_factory(**kwargs):
    return _LangChainFormatInstruction(
        json_schema=kwargs.get("json_schema"),
        format_instructions=kwargs.get("format_instructions", ""),
    )

AdapterInstructionRegistry.register(
    "langchain", "parsing", _langchain_format_instruction_factory
)

# Also register for interfaces that route through LangChain
AdapterInstructionRegistry.register(
    "openrouter", "parsing", _langchain_format_instruction_factory
)
AdapterInstructionRegistry.register(
    "openai_endpoint", "parsing", _langchain_format_instruction_factory
)
```
## Registered Adapter Instructions

The following table shows all registered (interface, task) pairs across the codebase:

| Interface | Task Categories | What It Adds |
|---|---|---|
| `langchain` | parsing, rubric (`*_batch`, `*_single`, metric), deep judgment (`dj_*`) | JSON schema, format instructions, parsing notes |
| `openrouter` | parsing, rubric, deep judgment | Same as `langchain` (shared factories) |
| `openai_endpoint` | parsing, rubric, deep judgment | Same as `langchain` (shared factories) |
| `claude_tool` | parsing, rubric, deep judgment | Minimal extraction directives (native structured output) |
| `claude_agent_sdk` | parsing, rubric, deep judgment | Minimal best-interpretation directive (native structured output) |
| `manual` | (none) | No registered instructions |
The key difference: LangChain-based adapters need explicit JSON schema and format instructions because they lack native structured output. Claude-based adapters (Claude Tool and Claude Agent SDK) use native structured output, so their instructions are minimal --- just extraction or interpretation directives.
## PromptTask Values

Each `PromptTask` enum value identifies a distinct LLM call in the pipeline. The task value is used to look up both adapter instructions (via the registry) and user instructions (via `PromptConfig.get_for_task()`).
| Task | Pipeline Stage | Description |
|---|---|---|
| `generation` | GenerateAnswer | LLM generates a response to the question |
| `parsing` | ParseTemplate | Judge LLM parses response into template schema |
| `agentic_parsing_investigation` | AgenticParseTemplate | Investigation agent examines workspace/trace |
| `agentic_parsing_extraction` | AgenticParseTemplate | Extracts structured answer from investigation trace |
| `abstention_detection` | AbstentionCheck | Detects model refusal |
| `sufficiency_detection` | SufficiencyCheck | Checks response completeness |
| `rubric_llm_trait_batch` | RubricEvaluation | Batched boolean/score LLM traits |
| `rubric_llm_trait_single` | RubricEvaluation | Sequential single LLM trait |
| `rubric_literal_trait_batch` | RubricEvaluation | Batched literal (categorical) traits |
| `rubric_literal_trait_single` | RubricEvaluation | Sequential single literal trait |
| `rubric_metric_trait` | RubricEvaluation | Metric trait (confusion matrix) |
| `rubric_agentic_trait_investigation` | AgenticRubricEvaluation | Agent investigates response/workspace for rubric trait |
| `rubric_agentic_trait_extraction` | AgenticRubricEvaluation | Extracts score from agentic investigation trace |
| `dj_template_excerpt_extraction` | DeepJudgmentAutoFail | Extract verbatim excerpts per attribute |
| `dj_template_hallucination` | DeepJudgmentAutoFail | Assess hallucination risk via search |
| `dj_template_reasoning` | DeepJudgmentAutoFail | Generate reasoning for excerpt-to-attribute mapping |
| `dj_rubric_excerpt_extraction` | DeepJudgmentRubricAutoFail | Extract excerpts for rubric traits |
| `dj_rubric_hallucination` | DeepJudgmentRubricAutoFail | Assess per-excerpt hallucination risk |
| `dj_rubric_reasoning` | DeepJudgmentRubricAutoFail | Generate trait evaluation reasoning |
| `dj_rubric_score_extraction` | DeepJudgmentRubricAutoFail | Extract final score from reasoning |
| `rubric_dynamic_presence_check` | RubricEvaluation (pre-processing) | Batch concept presence check for DynamicRubric traits |
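The dual lookup keyed by the task value can be sketched like this; the enum subset and the two dict stand-ins are illustrative, not the real classes:

```python
from enum import Enum


class PromptTask(Enum):
    """Illustrative subset of the real enum."""
    PARSING = "parsing"
    ABSTENTION_DETECTION = "abstention_detection"


# Stand-in for PromptConfig: per-task user instructions keyed by task value
user_instructions_by_task = {"parsing": "Normalize gene names."}

# Stand-in for AdapterInstructionRegistry: keyed by (interface, task value)
adapter_registry = {("langchain", "parsing"): ["<factory>"]}

task = PromptTask.PARSING
# The same string value drives both lookups
ui = user_instructions_by_task.get(task.value)
factories = adapter_registry.get(("langchain", task.value), [])
```

Using the enum's string value as the shared key is what keeps `PromptConfig` and the registry decoupled while still addressing the same pipeline call.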
## PortCapabilities

`PortCapabilities` declares what prompt features an adapter supports. The assembler uses these to decide message formatting:

| Field | Type | Default | Effect |
|---|---|---|---|
| `supports_system_prompt` | `bool` | `True` | If `False`, system text is prepended to user text as a single message |
| `supports_structured_output` | `bool` | `False` | Used by adapters to signal native structured output support |
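The `supports_system_prompt` fallback can be sketched as follows; `build_messages` and the tuple-based message representation are illustrative stand-ins (the real code builds `Message` objects):

```python
from dataclasses import dataclass


@dataclass
class PortCapabilities:
    """Mirror of the fields in the table above."""
    supports_system_prompt: bool = True
    supports_structured_output: bool = False


def build_messages(caps, system_text, user_text):
    if caps.supports_system_prompt:
        return [("system", system_text), ("user", user_text)]
    # Fallback: prepend the system text to the user text as one user message
    return [("user", system_text + "\n\n" + user_text)]


msgs = build_messages(
    PortCapabilities(supports_system_prompt=False),
    "You are a judge.",
    "Grade this answer.",
)
```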
## Customizing Prompts

### Via PromptConfig (User Instructions)

The most common customization point. Add instructions to `PromptConfig` fields to influence specific pipeline calls:
```python
from karenina.schemas.verification import VerificationConfig, PromptConfig

config = VerificationConfig(
    prompt_config=PromptConfig(
        parsing="Focus on gene symbols. Normalize all gene names to HGNC format.",
        rubric_evaluation="Grade strictly. Deduct points for missing citations.",
    ),
    # ...
)
```
User instructions are appended to the system text after adapter instructions. See PromptConfig for details on injection points and fallback logic.
### Via Adapter Instructions (For Adapter Authors)

To register custom instructions for a new adapter:

1. Create a dataclass implementing the `AdapterInstruction` protocol
2. Write a factory function that accepts `**kwargs` and returns the instruction instance
3. Register with `AdapterInstructionRegistry.register(interface, task, factory)`
4. Import the module from your adapter's `registration.py`
The factory receives the `instruction_context` dict passed to `PromptAssembler.assemble()`. Common context keys include:

| Key | Type | Provided By |
|---|---|---|
| `json_schema` | `dict` | Template parsing evaluator |
| `format_instructions` | `str` | Template parsing evaluator |
Factories should use `kwargs.get()` with defaults so they work even when keys are absent.
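For example, a hypothetical factory that tolerates missing context keys (the dict return value is a stand-in for a real instruction object):

```python
def tolerant_factory(**kwargs):
    """Hypothetical factory: every context key is optional."""
    return {
        "json_schema": kwargs.get("json_schema"),           # None when absent
        "format_instructions": kwargs.get("format_instructions", ""),
    }


# Called with no context keys at all -- still works, using the defaults
instr = tolerant_factory()
```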
## Key Source Files

| File | Purpose |
|---|---|
| `benchmark/verification/prompts/assembler.py` | `PromptAssembler` |
| `benchmark/verification/prompts/task_types.py` | `PromptTask` enum |
| `ports/adapter_instruction.py` | `AdapterInstructionRegistry`, `AdapterInstruction` protocol |
| `ports/capabilities.py` | `PortCapabilities` |
| `schemas/verification/prompt_config.py` | `PromptConfig` |
| `adapters/*/prompts/*.py` | Per-adapter instruction registrations |
| `benchmark/verification/prompts/parsing/parsing_instructions.py` | `TemplatePromptBuilder` |
## Next Steps

- Prompt Config --- configure user instructions per task
- 13 Stages in Detail --- which stages make LLM calls and use the assembler
- Available Adapters --- adapter-specific prompt behavior
- Verification Config Reference --- `prompt_config` field in `VerificationConfig`
- Pipeline Overview --- how stages execute and interact