`karenina.integrations.adele`

ADeLe (Annotated Demand Levels) integration for Karenina.

This module provides 18 pre-defined LLMRubricTrait objects based on the ADeLe evaluation framework. Each trait uses kind="literal" with 6 ordinal classes (levels 0-5) for evaluating various cognitive and processing dimensions.

ADeLe was introduced in:

Zhou, L., Pacchiardi, L., Martínez-Plumed, F., Collins, K.M., et al. (2025). "General Scales Unlock AI Evaluation with Explanatory and Predictive Power." arXiv:2503.06378. https://arxiv.org/abs/2503.06378

Project: https://kinds-of-intelligence-cfi.github.io/ADELE/

Available Traits (by snake_case name):

- attention_and_scan (AS): Attention and scanning requirements
- atypicality (AT): Novelty/uniqueness of the task
- comprehension_complexity (CEc): Comprehension difficulty
- comprehension_evaluation (CEe): Comprehension evaluation difficulty
- conceptualization_and_learning (CL): Learning requirements
- knowledge_applied_sciences (KNa): Applied sciences knowledge
- knowledge_cultural (KNc): Cultural knowledge
- knowledge_formal_sciences (KNf): Formal sciences knowledge
- knowledge_natural_sciences (KNn): Natural sciences knowledge
- knowledge_social_sciences (KNs): Social sciences knowledge
- metacognition_relevance (MCr): Metacognitive relevance recognition
- metacognition_task_planning (MCt): Metacognitive task planning
- metacognition_uncertainty (MCu): Metacognitive uncertainty handling
- mind_modelling (MS): Social cognition/mind modelling
- logical_reasoning_logic (QLl): Logical reasoning
- logical_reasoning_quantitative (QLq): Quantitative reasoning
- spatial_physical_understanding (SNs): Spatial/physical understanding
- volume (VO): Time/effort required

Usage

Get a single trait

from karenina.integrations.adele import get_adele_trait
trait = get_adele_trait("attention_and_scan")

Get all traits

from karenina.integrations.adele import get_all_adele_traits
traits = get_all_adele_traits()

Create a Rubric with ADeLe traits

from karenina.integrations.adele import create_adele_rubric
rubric = create_adele_rubric()  # All 18 traits
rubric = create_adele_rubric(["attention_and_scan", "mind_modelling"])  # Selected

List available trait names

from karenina.integrations.adele import ADELE_TRAIT_NAMES
print(ADELE_TRAIT_NAMES)

Classify questions using ADeLe dimensions

from karenina.integrations.adele import QuestionClassifier
classifier = QuestionClassifier()
result = classifier.classify_single("What is the capital of France?")
print(result.scores)  # {"attention_and_scan": 0, "volume": 1, ...}

Classes

AdeleLevel `dataclass`

A single level in an ADeLe rubric (0-5).

Source code in src/karenina/integrations/adele/parser.py

@dataclass
class AdeleLevel:
    """A single level in an ADeLe rubric (0-5)."""

    index: int
    label: str
    description: str
    examples: list[str] = field(default_factory=list)

    def to_class_description(self) -> str:
        """Format level as a single class description string for LLMRubricTrait.

        Format: "Level N: Label. Description\nExamples:\n* example1\n* example2"
        """
        parts = [f"Level {self.index}: {self.label}. {self.description}"]
        if self.examples:
            parts.append("Examples:")
            for example in self.examples:
                parts.append(f"* {example}")
        return "\n".join(parts)

Functions

to_class_description

to_class_description() -> str

Format level as a single class description string for LLMRubricTrait.

Format: "Level N: Label. Description\nExamples:\n* example1\n* example2"

Source code in src/karenina/integrations/adele/parser.py

def to_class_description(self) -> str:
    """Format level as a single class description string for LLMRubricTrait.

    Format: "Level N: Label. Description\nExamples:\n* example1\n* example2"
    """
    parts = [f"Level {self.index}: {self.label}. {self.description}"]
    if self.examples:
        parts.append("Examples:")
        for example in self.examples:
            parts.append(f"* {example}")
    return "\n".join(parts)
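To make the output format concrete, here is a minimal sketch that re-declares the `AdeleLevel` dataclass as listed above, so that it runs without Karenina installed, and formats one level. The label, description, and example strings are invented for illustration and are not taken from the actual ADeLe rubric files.

```python
from dataclasses import dataclass, field


@dataclass
class AdeleLevel:
    """Local copy of the dataclass shown above, for illustration only."""

    index: int
    label: str
    description: str
    examples: list[str] = field(default_factory=list)

    def to_class_description(self) -> str:
        parts = [f"Level {self.index}: {self.label}. {self.description}"]
        if self.examples:
            parts.append("Examples:")
            for example in self.examples:
                parts.append(f"* {example}")
        return "\n".join(parts)


# Hypothetical level content, not from the real rubric files.
level = AdeleLevel(
    index=1,
    label="Very low",
    description="Requires minimal effort.",
    examples=["Name a colour.", "Add two digits."],
)
text = level.to_class_description()
print(text)

# A level without examples yields only the first line.
bare = AdeleLevel(index=0, label="None", description="No demand.")
bare_text = bare.to_class_description()
print(bare_text)
```

The "Examples:" header and bullet lines are emitted only when `examples` is non-empty, so a level without examples produces a single line.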

AdeleRubric `dataclass`

A parsed ADeLe rubric with optional header and 6 levels.

Source code in src/karenina/integrations/adele/parser.py

@dataclass
class AdeleRubric:
    """A parsed ADeLe rubric with optional header and 6 levels."""

    code: str
    header: str | None
    levels: list[AdeleLevel]

    def __post_init__(self) -> None:
        """Validate rubric structure."""
        if len(self.levels) != 6:
            raise ValueError(f"ADeLe rubric must have exactly 6 levels, got {len(self.levels)}")

        for i, level in enumerate(self.levels):
            if level.index != i:
                raise ValueError(f"Level at position {i} has index {level.index}, expected {i}")
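The two checks in `__post_init__` can be exercised with a small self-contained sketch. It re-declares both dataclasses locally (fields as in the listings above) and uses a hypothetical `make_levels` helper with placeholder labels; neither the helper nor the placeholder content is part of Karenina.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AdeleLevel:
    index: int
    label: str
    description: str
    examples: list[str] = field(default_factory=list)


@dataclass
class AdeleRubric:
    code: str
    header: Optional[str]  # equivalent to `str | None` in the listing above
    levels: list[AdeleLevel]

    def __post_init__(self) -> None:
        if len(self.levels) != 6:
            raise ValueError(f"ADeLe rubric must have exactly 6 levels, got {len(self.levels)}")
        for i, level in enumerate(self.levels):
            if level.index != i:
                raise ValueError(f"Level at position {i} has index {level.index}, expected {i}")


def make_levels(indices: list[int]) -> list[AdeleLevel]:
    """Hypothetical helper: build placeholder levels with the given indices."""
    return [AdeleLevel(index=i, label=f"L{i}", description="placeholder") for i in indices]


# Six levels indexed 0..5 in order are accepted.
ok = AdeleRubric(code="AS", header=None, levels=make_levels([0, 1, 2, 3, 4, 5]))

# Too few levels, then six levels in the wrong order: both are rejected.
errors: list[str] = []
for indices in ([0, 1, 2], [0, 1, 2, 3, 5, 4]):
    try:
        AdeleRubric(code="AS", header=None, levels=make_levels(indices))
    except ValueError as exc:
        errors.append(str(exc))
print(errors)
```

Because validation runs at construction time, a malformed rubric cannot exist as an `AdeleRubric` instance; the position check also means the list order itself encodes the level number.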

AdeleTraitInfo

Bases: BaseModel

Information about an ADeLe trait for API responses.

Source code in src/karenina/integrations/adele/schemas.py

class AdeleTraitInfo(BaseModel):
    """Information about an ADeLe trait for API responses."""

    name: str = Field(description="Snake_case trait name")
    code: str = Field(description="Original ADeLe code (e.g., 'AS', 'AT')")
    description: str | None = Field(
        default=None,
        description="Trait description/header from the rubric",
    )
    classes: dict[str, str] = Field(
        default_factory=dict,
        description="Mapping from class name to class description",
    )
    class_names: list[str] = Field(
        default_factory=list,
        description="Ordered list of class names (from level 0 to 5)",
    )

QuestionClassificationResult

Bases: BaseModel

Result of classifying a single question against ADeLe dimensions.

Source code in src/karenina/integrations/adele/schemas.py

class QuestionClassificationResult(BaseModel):
    """Result of classifying a single question against ADeLe dimensions."""

    question_id: str | None = Field(
        default=None,
        description="Unique identifier for the question",
    )
    question_text: str = Field(
        description="The question text that was classified",
    )
    scores: dict[str, int] = Field(
        default_factory=dict,
        description=("Mapping from trait name to integer score (0-5). -1 indicates classification error."),
    )
    labels: dict[str, str] = Field(
        default_factory=dict,
        description=("Mapping from trait name to class label (none, very_low, low, intermediate, high, very_high)"),
    )
    model: str = Field(
        default="unknown",
        description="The model used for classification",
    )
    classified_at: str = Field(
        default="",
        description="ISO timestamp of when classification was performed",
    )
    usage_metadata: dict[str, Any] = Field(
        default_factory=dict,
        description="Token usage and other metadata from the LLM call",
    )

    def to_checkpoint_metadata(self) -> dict[str, Any]:
        """
        Convert to format suitable for storing in checkpoint custom_metadata.

        Returns:
            Dictionary with adele_classification key containing scores, labels,
            timestamp, and model.
        """
        return {
            "adele_classification": {
                "scores": self.scores,
                "labels": self.labels,
                "classified_at": self.classified_at,
                "model": self.model,
            }
        }

    @classmethod
    def from_checkpoint_metadata(
        cls,
        metadata: dict[str, Any],
        question_id: str | None = None,
        question_text: str = "",
    ) -> QuestionClassificationResult | None:
        """
        Create from checkpoint custom_metadata format.

        Args:
            metadata: The custom_metadata dict from a question
            question_id: Optional question ID
            question_text: Optional question text

        Returns:
            QuestionClassificationResult if adele_classification exists, else None
        """
        adele_data = metadata.get("adele_classification")
        if adele_data is None:
            return None

        return cls(
            question_id=question_id,
            question_text=question_text,
            scores=adele_data.get("scores", {}),
            labels=adele_data.get("labels", {}),
            model=adele_data.get("model", "unknown"),
            classified_at=adele_data.get("classified_at", ""),
            usage_metadata={},  # Not stored in checkpoint
        )

    def get_summary(self) -> dict[str, str]:
        """
        Get a summary of classifications as trait -> "label (score)" pairs.

        Returns:
            Dictionary mapping trait names to "label (score)" strings.
        """
        summary: dict[str, str] = {}
        for trait_name, score in self.scores.items():
            label = self.labels.get(trait_name, "unknown")
            if score == -1:
                summary[trait_name] = f"error: {label}"
            else:
                summary[trait_name] = f"{label} ({score})"
        return summary

Functions

from_checkpoint_metadata `classmethod`

from_checkpoint_metadata(
    metadata: dict[str, Any],
    question_id: str | None = None,
    question_text: str = "",
) -> QuestionClassificationResult | None

Create from checkpoint custom_metadata format.

Parameters:

Name	Type	Description	Default
`metadata`	`dict[str, Any]`	The custom_metadata dict from a question	required
`question_id`	`str | None`	Optional question ID	`None`
`question_text`	`str`	Optional question text	`''`

Returns:

Type	Description
`QuestionClassificationResult | None`	QuestionClassificationResult if adele_classification exists, else None

Source code in src/karenina/integrations/adele/schemas.py

@classmethod
def from_checkpoint_metadata(
    cls,
    metadata: dict[str, Any],
    question_id: str | None = None,
    question_text: str = "",
) -> QuestionClassificationResult | None:
    """
    Create from checkpoint custom_metadata format.

    Args:
        metadata: The custom_metadata dict from a question
        question_id: Optional question ID
        question_text: Optional question text

    Returns:
        QuestionClassificationResult if adele_classification exists, else None
    """
    adele_data = metadata.get("adele_classification")
    if adele_data is None:
        return None

    return cls(
        question_id=question_id,
        question_text=question_text,
        scores=adele_data.get("scores", {}),
        labels=adele_data.get("labels", {}),
        model=adele_data.get("model", "unknown"),
        classified_at=adele_data.get("classified_at", ""),
        usage_metadata={},  # Not stored in checkpoint
    )

get_summary

get_summary() -> dict[str, str]

Get a summary of classifications as trait -> "label (score)" pairs.

Returns:

Type	Description
`dict[str, str]`	Dictionary mapping trait names to "label (score)" strings.

Source code in src/karenina/integrations/adele/schemas.py

def get_summary(self) -> dict[str, str]:
    """
    Get a summary of classifications as trait -> "label (score)" pairs.

    Returns:
        Dictionary mapping trait names to "label (score)" strings.
    """
    summary: dict[str, str] = {}
    for trait_name, score in self.scores.items():
        label = self.labels.get(trait_name, "unknown")
        if score == -1:
            summary[trait_name] = f"error: {label}"
        else:
            summary[trait_name] = f"{label} ({score})"
    return summary
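As an illustration of the three cases `get_summary` distinguishes, the sketch below uses a plain-dataclass stand-in (`ResultStandIn`, a hypothetical name) that carries only the two fields the method reads. It is not the real Pydantic model, and the scores and labels are made up.

```python
from dataclasses import dataclass, field


@dataclass
class ResultStandIn:
    """Hypothetical stand-in with only the fields get_summary reads."""

    scores: dict[str, int] = field(default_factory=dict)
    labels: dict[str, str] = field(default_factory=dict)

    def get_summary(self) -> dict[str, str]:
        # Same logic as the listing above.
        summary: dict[str, str] = {}
        for trait_name, score in self.scores.items():
            label = self.labels.get(trait_name, "unknown")
            if score == -1:
                summary[trait_name] = f"error: {label}"
            else:
                summary[trait_name] = f"{label} ({score})"
        return summary


result = ResultStandIn(
    # -1 marks a failed classification; mind_modelling has no label entry.
    scores={"volume": 1, "atypicality": -1, "mind_modelling": 0},
    labels={"volume": "very_low", "atypicality": "[MISSING_FROM_RESPONSE]"},
)
summary = result.get_summary()
print(summary)
```

A score of -1 turns the label into an error message, and a trait with a score but no label falls back to "unknown".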

to_checkpoint_metadata

to_checkpoint_metadata() -> dict[str, Any]

Convert to format suitable for storing in checkpoint custom_metadata.

Returns:

Type	Description
`dict[str, Any]`	Dictionary with adele_classification key containing scores, labels, timestamp, and model.

Source code in src/karenina/integrations/adele/schemas.py

def to_checkpoint_metadata(self) -> dict[str, Any]:
    """
    Convert to format suitable for storing in checkpoint custom_metadata.

    Returns:
        Dictionary with adele_classification key containing scores, labels,
        timestamp, and model.
    """
    return {
        "adele_classification": {
            "scores": self.scores,
            "labels": self.labels,
            "classified_at": self.classified_at,
            "model": self.model,
        }
    }
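The round trip through checkpoint metadata can be sketched as follows. `ResultStandIn` is a hypothetical plain-dataclass substitute for the Pydantic model so that the snippet runs without Karenina or Pydantic; the method bodies follow the listings above, and the model name and timestamp are placeholder values.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ResultStandIn:
    """Hypothetical stand-in mirroring the fields of the model above."""

    question_text: str
    question_id: Optional[str] = None
    scores: dict[str, int] = field(default_factory=dict)
    labels: dict[str, str] = field(default_factory=dict)
    model: str = "unknown"
    classified_at: str = ""
    usage_metadata: dict[str, Any] = field(default_factory=dict)

    def to_checkpoint_metadata(self) -> dict[str, Any]:
        return {
            "adele_classification": {
                "scores": self.scores,
                "labels": self.labels,
                "classified_at": self.classified_at,
                "model": self.model,
            }
        }

    @classmethod
    def from_checkpoint_metadata(
        cls,
        metadata: dict[str, Any],
        question_id: Optional[str] = None,
        question_text: str = "",
    ) -> Optional["ResultStandIn"]:
        adele_data = metadata.get("adele_classification")
        if adele_data is None:
            return None
        return cls(
            question_id=question_id,
            question_text=question_text,
            scores=adele_data.get("scores", {}),
            labels=adele_data.get("labels", {}),
            model=adele_data.get("model", "unknown"),
            classified_at=adele_data.get("classified_at", ""),
        )


original = ResultStandIn(
    question_text="What is 2+2?",
    question_id="q1",
    scores={"volume": 1},
    labels={"volume": "very_low"},
    model="example-model",  # placeholder model name
    classified_at="2025-01-01T00:00:00+00:00",  # placeholder timestamp
    usage_metadata={"total_tokens": 123},
)
metadata = original.to_checkpoint_metadata()
restored = ResultStandIn.from_checkpoint_metadata(
    metadata, question_id="q1", question_text="What is 2+2?"
)
missing = ResultStandIn.from_checkpoint_metadata({})
```

Two details are worth noting: the question ID and text are not part of the stored metadata and must be supplied again on load, and `usage_metadata` is not stored, so the restored object has an empty dict there.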

QuestionClassifier

Classifies questions using ADeLe rubrics via LLM-as-judge.

This classifier evaluates questions against the ADeLe (Annotated Demand Levels) dimensions to characterize their cognitive complexity. Each dimension produces a score from 0 to 5, corresponding to the levels none, very_low, low, intermediate, high, and very_high.

Example usage

classifier = QuestionClassifier()
result = classifier.classify_single("What is 2+2?")
print(result.scores)  # {"attention_and_scan": 0, "volume": 1, ...}
print(result.labels)  # {"attention_and_scan": "none", "volume": "very_low", ...}
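Besides `classify_single`, the class offers `classify_batch` (see the source listing below), which isolates per-question failures and reports progress through a callback. The sketch below reproduces only that control flow with a fake classifier function, so it runs without an LLM; `classify_batch_sketch` and `fake_classifier` are hypothetical names, and the result is a plain dict rather than a `QuestionClassificationResult`.

```python
from typing import Any, Callable, Optional


def classify_batch_sketch(
    questions: list[tuple[str, str]],
    classify_one: Callable[[str], dict[str, int]],
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> dict[str, dict[str, Any]]:
    """Mirror the loop in classify_batch: one failure does not stop the batch."""
    results: dict[str, dict[str, Any]] = {}
    total = len(questions)
    for i, (question_id, question_text) in enumerate(questions):
        try:
            results[question_id] = {
                "scores": classify_one(question_text),
                "usage_metadata": {},
            }
        except Exception as exc:
            # Failed questions get empty scores and the error text in usage_metadata.
            results[question_id] = {"scores": {}, "usage_metadata": {"error": str(exc)}}
        if on_progress:
            on_progress(i + 1, total)
    return results


def fake_classifier(text: str) -> dict[str, int]:
    """Hypothetical stand-in for an LLM call; fails on empty input."""
    if not text:
        raise ValueError("empty question")
    return {"volume": min(len(text) // 20, 5)}


progress: list[tuple[int, int]] = []
results = classify_batch_sketch(
    [("q1", "What is the capital of France?"), ("q2", "")],
    fake_classifier,
    on_progress=lambda done, total: progress.append((done, total)),
)
print(results)
print(progress)
```

In the real method a failed question is recognisable by its empty `scores` dict and an "error" entry in `usage_metadata`, which differs from a per-trait failure, where the trait's score is -1.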

Source code in src/karenina/integrations/adele/classifier.py

class QuestionClassifier:
    """
    Classifies questions using ADeLe rubrics via LLM-as-judge.

    This classifier evaluates questions against ADeLe (Annotated Demand
    Levels) dimensions to characterize their cognitive complexity. Each
    dimension produces a score from 0-5 corresponding to levels: none,
    very_low, low, intermediate, high, very_high.

    Example usage:
        classifier = QuestionClassifier()
        result = classifier.classify_single("What is 2+2?")
        print(result.scores)  # {"attention_and_scan": 0, "volume": 1, ...}
        print(result.labels)  # {"attention_and_scan": "none", "volume": "very_low", ...}
    """

    def __init__(
        self,
        llm: LLMPort | None = None,
        model_name: str = "claude-haiku-4-5",
        provider: str = "anthropic",
        temperature: float = 0.0,
        interface: str = "langchain",
        endpoint_base_url: str | None = None,
        endpoint_api_key: str | None = None,
        trait_eval_mode: str = "batch",
        async_enabled: bool | None = None,
        async_max_workers: int | None = None,
        *,
        model_config: ModelConfig | None = None,
    ):
        """
        Initialize the question classifier.

        Args:
            llm: Optional pre-initialized LLMPort instance. If not provided,
                 one will be created using model_config or individual params.
            model_name: Model name to use if llm not provided.
                       Defaults to claude-haiku-4-5 for efficiency.
            provider: Model provider to use if llm not provided.
            temperature: Temperature for LLM calls. Defaults to 0.0 for
                        deterministic classifications.
            interface: The interface to use for model initialization.
                      Supported values: "langchain", "openrouter", "openai_endpoint".
                      Defaults to "langchain".
            endpoint_base_url: Custom base URL for openai_endpoint interface.
                              Required when interface="openai_endpoint".
            endpoint_api_key: API key for openai_endpoint interface.
                             Required when interface="openai_endpoint".
            trait_eval_mode: How to evaluate traits for a single question.
                            "batch" - all traits in one LLM call (faster, cheaper)
                            "sequential" - each trait in separate call (potentially more accurate)
                            Defaults to "batch".
            async_enabled: Whether to run sequential trait evaluations in parallel.
                          If None, reads from KARENINA_ASYNC_ENABLED env var (default: True).
            async_max_workers: Max concurrent workers for parallel execution.
                              If None, reads from KARENINA_ASYNC_MAX_WORKERS env var (default: 2).
            model_config: Optional ModelConfig to use for creating the LLM.
                         Takes precedence over individual model params.
        """
        from karenina.adapters.llm_parallel import read_async_config

        self._llm = llm
        self._model_config = model_config
        self._model_name = model_name
        self._provider = provider
        self._temperature = temperature
        self._interface = interface
        self._endpoint_base_url = endpoint_base_url
        self._endpoint_api_key = endpoint_api_key
        self._trait_eval_mode = trait_eval_mode

        # Read async config with env var fallbacks
        default_enabled, default_workers = read_async_config()
        self._async_enabled = async_enabled if async_enabled is not None else default_enabled
        self._async_max_workers = async_max_workers if async_max_workers is not None else default_workers

    @property
    def llm(self) -> LLMPort:
        """Lazily initialize and return the LLM instance."""
        if self._llm is None:
            from pydantic import SecretStr

            from karenina.adapters.factory import get_llm
            from karenina.schemas.config import ModelConfig

            # Use provided model_config or create one from individual params
            if self._model_config is not None:
                config = self._model_config
            else:
                # Convert endpoint_api_key to SecretStr if provided
                api_key = SecretStr(self._endpoint_api_key) if self._endpoint_api_key else None
                config = ModelConfig(
                    id="adele-classifier",
                    model_name=self._model_name,
                    model_provider=self._provider,
                    temperature=self._temperature,
                    interface=self._interface,
                    endpoint_base_url=self._endpoint_base_url,
                    endpoint_api_key=api_key,
                )
            self._llm = get_llm(config)
        return self._llm

    def classify_single(
        self,
        question_text: str,
        trait_names: list[str] | None = None,
        question_id: str | None = None,
    ) -> QuestionClassificationResult:
        """
        Classify a single question against ADeLe dimensions.

        Args:
            question_text: The question text to classify.
            trait_names: Optional list of ADeLe trait names to evaluate.
                        If None, evaluates all 18 ADeLe traits.
            question_id: Optional ID for the question.

        Returns:
            QuestionClassificationResult with scores, labels, and metadata.
        """
        # Get traits to evaluate
        if trait_names is None:
            trait_names = ADELE_TRAIT_NAMES
        traits = [get_adele_trait(name) for name in trait_names]

        # Choose evaluation mode
        if self._trait_eval_mode == "sequential":
            return self._classify_single_sequential(question_text, traits, question_id)
        else:
            return self._classify_single_batch(question_text, traits, question_id)

    def _classify_single_batch(
        self,
        question_text: str,
        traits: list[Any],
        question_id: str | None = None,
    ) -> QuestionClassificationResult:
        """
        Classify a question by evaluating all traits in a single LLM call.

        This is faster and cheaper but may be less accurate for complex questions.
        """
        from karenina.schemas.outputs import BatchLiteralClassifications

        # Build prompts for question classification
        system_prompt = self._build_system_prompt()
        user_prompt = self._build_user_prompt(question_text, traits)

        messages = [
            Message.system(system_prompt),
            Message.user(user_prompt),
        ]

        # Invoke with structured output using LLMPort
        structured_llm = self.llm.with_structured_output(BatchLiteralClassifications)
        response = structured_llm.invoke(messages)

        # response.raw is the validated Pydantic model
        parsed_result = response.raw
        usage_metadata = asdict(response.usage) if response.usage else {}

        # Validate and convert classifications (use to_dict() to convert list format to dict)
        scores, labels = self._validate_classifications(parsed_result.to_dict(), traits)

        return QuestionClassificationResult(
            question_id=question_id,
            question_text=question_text,
            scores=scores,
            labels=labels,
            model=self._model_name,
            classified_at=datetime.now(UTC).isoformat(),
            usage_metadata=usage_metadata,
        )

    def _classify_single_sequential(
        self,
        question_text: str,
        traits: list[Any],
        question_id: str | None = None,
    ) -> QuestionClassificationResult:
        """
        Classify a question by evaluating each trait in a separate LLM call.

        When async_enabled is True, the LLM calls run in parallel using
        LLMParallelInvoker for significant speedup. Otherwise, calls run
        sequentially (legacy behavior).
        """
        from karenina.schemas.outputs import SingleLiteralClassification

        scores: dict[str, int] = {}
        labels: dict[str, str] = {}
        combined_usage: dict[str, Any] = {"calls": 0, "total_tokens": 0}

        # Build all tasks upfront using Message types
        tasks: list[tuple[list[Message], type[SingleLiteralClassification]]] = []
        for trait in traits:
            system_prompt = self._build_system_prompt_single_trait()
            user_prompt = self._build_user_prompt_single_trait(question_text, trait)
            messages = [
                Message.system(system_prompt),
                Message.user(user_prompt),
            ]
            tasks.append((messages, SingleLiteralClassification))

        if self._async_enabled:
            # Execute in parallel
            scores, labels, combined_usage = self._execute_parallel_classification(tasks, traits)
        else:
            # Fall back to sequential execution
            scores, labels, combined_usage = self._execute_sequential_classification(tasks, traits)

        return QuestionClassificationResult(
            question_id=question_id,
            question_text=question_text,
            scores=scores,
            labels=labels,
            model=self._model_name,
            classified_at=datetime.now(UTC).isoformat(),
            usage_metadata=combined_usage,
        )

    def _execute_parallel_classification(
        self,
        tasks: list[tuple[list[Message], Any]],
        traits: list[Any],
    ) -> tuple[dict[str, int], dict[str, str], dict[str, Any]]:
        """Execute classification tasks in parallel using LLMParallelInvoker."""
        from karenina.adapters.llm_parallel import LLMParallelInvoker

        invoker = LLMParallelInvoker(self.llm, max_workers=self._async_max_workers)
        results = invoker.invoke_batch_structured(tasks)

        scores: dict[str, int] = {}
        labels: dict[str, str] = {}
        combined_usage: dict[str, Any] = {"calls": 0, "total_tokens": 0, "input_tokens": 0, "output_tokens": 0}

        for i, (result, usage, error) in enumerate(results):
            trait = traits[i]
            if error:
                logger.error(f"Failed to classify trait {trait.name}: {error}")
                scores[trait.name] = -1
                labels[trait.name] = f"[ERROR: {error!s}]"
            else:
                assert result is not None  # mypy: no error implies result is not None
                score, label = self._validate_single_classification(trait, result.classification)
                scores[trait.name] = score
                labels[trait.name] = label
                combined_usage["calls"] += 1
                if usage:
                    combined_usage["total_tokens"] += usage.get("total_tokens", 0)
                    combined_usage["input_tokens"] += usage.get("input_tokens", 0)
                    combined_usage["output_tokens"] += usage.get("output_tokens", 0)

        return scores, labels, combined_usage

    def _execute_sequential_classification(
        self,
        tasks: list[tuple[list[Message], Any]],
        traits: list[Any],
    ) -> tuple[dict[str, int], dict[str, str], dict[str, Any]]:
        """Execute classification tasks sequentially (legacy behavior)."""
        scores: dict[str, int] = {}
        labels: dict[str, str] = {}
        combined_usage: dict[str, Any] = {"calls": 0, "total_tokens": 0, "input_tokens": 0, "output_tokens": 0}

        for i, (messages, model_class) in enumerate(tasks):
            trait = traits[i]
            try:
                # Use LLMPort.with_structured_output() pattern
                structured_llm = self.llm.with_structured_output(model_class)
                response = structured_llm.invoke(messages)

                # response.raw is the validated Pydantic model
                parsed_result = response.raw
                usage_metadata = asdict(response.usage) if response.usage else {}

                # Validate the classification
                score, label = self._validate_single_classification(trait, parsed_result.classification)
                scores[trait.name] = score
                labels[trait.name] = label

                # Accumulate usage metadata
                combined_usage["calls"] += 1
                if usage_metadata:
                    combined_usage["total_tokens"] += usage_metadata.get("total_tokens", 0)
                    combined_usage["input_tokens"] += usage_metadata.get("input_tokens", 0)
                    combined_usage["output_tokens"] += usage_metadata.get("output_tokens", 0)

            except Exception as e:
                logger.error(f"Failed to classify trait {trait.name}: {e}")
                scores[trait.name] = -1
                labels[trait.name] = f"[ERROR: {e!s}]"

        return scores, labels, combined_usage

    def classify_batch(
        self,
        questions: list[tuple[str, str]],
        trait_names: list[str] | None = None,
        on_progress: Callable[[int, int], None] | None = None,
    ) -> dict[str, QuestionClassificationResult]:
        """
        Classify multiple questions against ADeLe dimensions.

        Args:
            questions: List of (question_id, question_text) tuples.
            trait_names: Optional list of ADeLe trait names to evaluate.
                        If None, evaluates all 18 ADeLe traits.
            on_progress: Optional callback function(completed, total) for
                        progress updates.

        Returns:
            Dictionary mapping question_id to QuestionClassificationResult.
        """
        results: dict[str, QuestionClassificationResult] = {}
        total = len(questions)

        for i, (question_id, question_text) in enumerate(questions):
            try:
                result = self.classify_single(
                    question_text=question_text,
                    trait_names=trait_names,
                    question_id=question_id,
                )
                results[question_id] = result
            except Exception as e:
                logger.error(f"Failed to classify question {question_id}: {e}")
                # Create error result
                results[question_id] = QuestionClassificationResult(
                    question_id=question_id,
                    question_text=question_text,
                    scores={},
                    labels={},
                    model=self._model_name,
                    classified_at=datetime.now(UTC).isoformat(),
                    usage_metadata={"error": str(e)},
                )

            if on_progress:
                on_progress(i + 1, total)

        return results

    def _build_system_prompt_single_trait(self) -> str:
        """Build system prompt for single-trait question classification."""
        return SYSTEM_PROMPT_SINGLE_TRAIT

    def _build_user_prompt_single_trait(self, question_text: str, trait: Any) -> str:
        """Build user prompt for single-trait question classification."""
        from karenina.schemas.outputs import SingleLiteralClassification

        if trait.kind != "literal" or trait.classes is None:
            raise ValueError(f"Trait {trait.name} is not a literal trait with classes")

        class_names = list(trait.classes.keys())
        class_details = []
        for name, description in trait.classes.items():
            desc_preview = description[:400] + "..." if len(description) > 400 else description
            class_details.append(f"  - **{name}**: {desc_preview}")

        json_schema = json.dumps(SingleLiteralClassification.model_json_schema(), indent=2)
        mid_idx = len(class_names) // 2
        example_json = json.dumps({"classification": class_names[mid_idx]}, indent=2)

        return USER_PROMPT_SINGLE_TRAIT_TEMPLATE.format(
            trait_name=trait.name,
            question_text=question_text,
            trait_description=trait.description or "Classification dimension",
            class_names=", ".join(class_names),
            class_details="\n".join(class_details),
            json_schema=json_schema,
            example_json=example_json,
        )

    def _build_system_prompt(self) -> str:
        """Build system prompt for batch question classification."""
        return SYSTEM_PROMPT_BATCH

    def _build_user_prompt(self, question_text: str, traits: list[Any]) -> str:
        """Build user prompt for question classification."""
        from karenina.schemas.outputs import BatchLiteralClassifications

        traits_description = []
        example_classifications: list[dict[str, str]] = []

        for trait in traits:
            if trait.kind != "literal" or trait.classes is None:
                continue

            class_names = list(trait.classes.keys())
            # Build class descriptions
            class_details = []
            for name, description in trait.classes.items():
                # Truncate long descriptions for prompt efficiency
                desc_preview = description[:300] + "..." if len(description) > 300 else description
                class_details.append(f"    - **{name}**: {desc_preview}")

            trait_desc = (
                f"- **{trait.name}**: {trait.description or 'Classification trait'}\n"
                f"  Classes (in order of increasing complexity): {', '.join(class_names)}\n" + "\n".join(class_details)
            )
            traits_description.append(trait_desc)
            # Use middle class as example to avoid bias toward first/last
            mid_idx = len(class_names) // 2
            example_classifications.append({"trait_name": trait.name, "class_name": class_names[mid_idx]})

        example_json = json.dumps({"classifications": example_classifications}, indent=2)
        json_schema = json.dumps(BatchLiteralClassifications.model_json_schema(), indent=2)

        return USER_PROMPT_BATCH_TEMPLATE.format(
            question_text=question_text,
            traits_description="\n".join(traits_description),
            json_schema=json_schema,
            example_json=example_json,
        )

    def _validate_classifications(
        self,
        classifications: dict[str, str],
        traits: list[Any],
    ) -> tuple[dict[str, int], dict[str, str]]:
        """
        Validate and convert classifications to scores and labels.

        Args:
            classifications: Dict mapping trait names to class names from LLM
            traits: List of ADeLe traits being evaluated

        Returns:
            Tuple of (scores, labels) dictionaries
        """
        scores: dict[str, int] = {}
        labels: dict[str, str] = {}
        trait_map = {trait.name: trait for trait in traits}

        for trait_name, class_name in classifications.items():
            if trait_name in trait_map:
                trait = trait_map[trait_name]
                score, label = self._validate_single_classification(trait, class_name)
                scores[trait_name] = score
                labels[trait_name] = label

        # Add error state for missing traits
        for trait in traits:
            if trait.name not in scores:
                scores[trait.name] = -1
                labels[trait.name] = "[MISSING_FROM_RESPONSE]"

        return scores, labels

    def _validate_single_classification(self, trait: Any, class_name: str) -> tuple[int, str]:
        """
        Validate and convert a class name to score index and label.

        Args:
            trait: The ADeLe trait being evaluated
            class_name: The class name returned by the LLM

        Returns:
            Tuple of (score, label) where:
            - score: Int index (0 to N-1) if valid, -1 if invalid class name
            - label: The class name if valid, or the invalid value for debugging
        """
        if trait.kind != "literal" or trait.classes is None:
            return -1, f"[NOT_LITERAL_TRAIT: {class_name}]"

        # Get the index for the class name
        index = trait.get_class_index(class_name)
        if index == -1:
            # Try case-insensitive matching as fallback
            class_names_lower = {name.lower(): name for name in trait.classes}
            matched_name = class_names_lower.get(class_name.lower())
            if matched_name is not None:
                index = trait.get_class_index(matched_name)
                class_name = matched_name  # Use the canonical name
            else:
                # Invalid class name - store the invalid value for debugging
                logger.warning(
                    f"Invalid class '{class_name}' for trait '{trait.name}'. "
                    f"Valid classes: {list(trait.classes.keys())}"
                )
                return -1, class_name  # Return invalid class name for debugging

        return index, class_name
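
The validation logic above can be sketched without any karenina imports. This is a minimal illustration, not the library's implementation: `resolve_class` and `validate` are hypothetical helpers, traits are reduced to plain dicts of class name to description, and it assumes `get_class_index` returns a class's position in that ordered dict.

```python
from __future__ import annotations

# Hypothetical stand-in for one literal trait's ordered classes (name -> description).
CLASSES = {"Level 0": "none", "Level 1": "low", "Level 2": "moderate"}


def resolve_class(classes: dict[str, str], class_name: str) -> tuple[int, str]:
    """Return (index, canonical_name), or (-1, class_name) if nothing matches."""
    names = list(classes)  # ASSUMPTION: dict order defines the ordinal index
    if class_name in classes:
        return names.index(class_name), class_name
    # Case-insensitive fallback, as in the listing above
    lowered = {name.lower(): name for name in names}
    canonical = lowered.get(class_name.lower())
    if canonical is None:
        return -1, class_name  # keep the invalid value for debugging
    return names.index(canonical), canonical


def validate(
    classifications: dict[str, str],
    trait_classes: dict[str, dict[str, str]],
) -> tuple[dict[str, int], dict[str, str]]:
    """Convert LLM class names to scores and labels, flagging missing traits."""
    scores: dict[str, int] = {}
    labels: dict[str, str] = {}
    for trait_name, class_name in classifications.items():
        if trait_name in trait_classes:  # unknown trait names are ignored
            score, label = resolve_class(trait_classes[trait_name], class_name)
            scores[trait_name] = score
            labels[trait_name] = label
    for trait_name in trait_classes:
        if trait_name not in scores:  # the LLM did not answer for this trait
            scores[trait_name] = -1
            labels[trait_name] = "[MISSING_FROM_RESPONSE]"
    return scores, labels
```

A score of -1 therefore covers two different failures: an unrecognized class name (the label holds the raw value) and a trait absent from the response (the label is `[MISSING_FROM_RESPONSE]`). Callers that aggregate scores should probably filter out -1 first.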

Attributes¶

llm `property` ¶

llm: LLMPort

Lazily initialize and return the LLM instance.

Functions¶

__init__ ¶

__init__(
    llm: LLMPort | None = None,
    model_name: str = "claude-haiku-4-5",
    provider: str = "anthropic",
    temperature: float = 0.0,
    interface: str = "langchain",
    endpoint_base_url: str | None = None,
    endpoint_api_key: str | None = None,
    trait_eval_mode: str = "batch",
    async_enabled: bool | None = None,
    async_max_workers: int | None = None,
    *,
    model_config: ModelConfig | None = None,
)

Parameters:

Name	Type	Description	Default
`llm` ¶	`LLMPort \| None`	Optional pre-initialized LLMPort instance. If not provided, one will be created using model_config or individual params.	`None`
`model_name` ¶	`str`	Model name to use if llm not provided. Defaults to claude-haiku-4-5 for efficiency.	`'claude-haiku-4-5'`
`provider` ¶	`str`	Model provider to use if llm not provided.	`'anthropic'`
`temperature` ¶	`float`	Temperature for LLM calls. Defaults to 0.0 for deterministic classifications.	`0.0`
`interface` ¶	`str`	The interface to use for model initialization. Supported values: "langchain", "openrouter", "openai_endpoint". Defaults to "langchain".	`'langchain'`
`endpoint_base_url` ¶	`str \| None`	Custom base URL for openai_endpoint interface. Required when interface="openai_endpoint".	`None`
`endpoint_api_key` ¶	`str \| None`	API key for openai_endpoint interface. Required when interface="openai_endpoint".	`None`
`trait_eval_mode` ¶	`str`	How to evaluate traits for a single question: "batch" evaluates all traits in one LLM call (faster, cheaper); "sequential" evaluates each trait in a separate call (potentially more accurate). Defaults to "batch".	`'batch'`
`async_enabled` ¶	`bool \| None`	Whether to run sequential trait evaluations in parallel. If None, reads from KARENINA_ASYNC_ENABLED env var (default: True).	`None`
`async_max_workers` ¶	`int \| None`	Max concurrent workers for parallel execution. If None, reads from KARENINA_ASYNC_MAX_WORKERS env var (default: 2).	`None`
`model_config` ¶	`ModelConfig \| None`	Optional ModelConfig to use for creating the LLM. Takes precedence over individual model params.	`None`

Source code in src/karenina/integrations/adele/classifier.py

def __init__(
    self,
    llm: LLMPort | None = None,
    model_name: str = "claude-haiku-4-5",
    provider: str = "anthropic",
    temperature: float = 0.0,
    interface: str = "langchain",
    endpoint_base_url: str | None = None,
    endpoint_api_key: str | None = None,
    trait_eval_mode: str = "batch",
    async_enabled: bool | None = None,
    async_max_workers: int | None = None,
    *,
    model_config: ModelConfig | None = None,
):
    """
    Initialize the question classifier.

    Args:
        llm: Optional pre-initialized LLMPort instance. If not provided,
             one will be created using model_config or individual params.
        model_name: Model name to use if llm not provided.
                   Defaults to claude-haiku-4-5 for efficiency.
        provider: Model provider to use if llm not provided.
        temperature: Temperature for LLM calls. Defaults to 0.0 for
                    deterministic classifications.
        interface: The interface to use for model initialization.
                  Supported values: "langchain", "openrouter", "openai_endpoint".
                  Defaults to "langchain".
        endpoint_base_url: Custom base URL for openai_endpoint interface.
                          Required when interface="openai_endpoint".
        endpoint_api_key: API key for openai_endpoint interface.
                         Required when interface="openai_endpoint".
        trait_eval_mode: How to evaluate traits for a single question.
                        "batch" - all traits in one LLM call (faster, cheaper)
                        "sequential" - each trait in separate call (potentially more accurate)
                        Defaults to "batch".
        async_enabled: Whether to run sequential trait evaluations in parallel.
                      If None, reads from KARENINA_ASYNC_ENABLED env var (default: True).
        async_max_workers: Max concurrent workers for parallel execution.
                          If None, reads from KARENINA_ASYNC_MAX_WORKERS env var (default: 2).
        model_config: Optional ModelConfig to use for creating the LLM.
                     Takes precedence over individual model params.
    """
    from karenina.adapters.llm_parallel import read_async_config

    self._llm = llm
    self._model_config = model_config
    self._model_name = model_name
    self._provider = provider
    self._temperature = temperature
    self._interface = interface
    self._endpoint_base_url = endpoint_base_url
    self._endpoint_api_key = endpoint_api_key
    self._trait_eval_mode = trait_eval_mode

    # Read async config with env var fallbacks
    default_enabled, default_workers = read_async_config()
    self._async_enabled = async_enabled if async_enabled is not None else default_enabled
    self._async_max_workers = async_max_workers if async_max_workers is not None else default_workers
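
The fallback order for the async settings (explicit argument first, then environment variable, then built-in default) can be sketched as follows. `read_async_config_sketch` is a hypothetical stand-in for `read_async_config`, whose actual parsing rules are not shown here; the accepted truthy strings in particular are an assumption.

```python
from __future__ import annotations

import os


def read_async_config_sketch() -> tuple[bool, int]:
    """Hypothetical stand-in for read_async_config(); parsing rules are assumed."""
    raw_enabled = os.environ.get("KARENINA_ASYNC_ENABLED", "true")
    # ASSUMPTION: these strings count as "enabled"; the real function may differ
    enabled = raw_enabled.strip().lower() in {"1", "true", "yes", "on"}
    workers = int(os.environ.get("KARENINA_ASYNC_MAX_WORKERS", "2"))
    return enabled, workers


def resolve_async_settings(
    async_enabled: bool | None = None,
    async_max_workers: int | None = None,
) -> tuple[bool, int]:
    """Explicit arguments win; None falls back to the env-derived defaults."""
    default_enabled, default_workers = read_async_config_sketch()
    enabled = async_enabled if async_enabled is not None else default_enabled
    workers = async_max_workers if async_max_workers is not None else default_workers
    return enabled, workers
```

Testing against `None` rather than truthiness is what allows an explicit `async_enabled=False` to override an environment default of `True`.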

classify_batch ¶

classify_batch(
    questions: list[tuple[str, str]],
    trait_names: list[str] | None = None,
    on_progress: Callable[[int, int], None] | None = None,
) -> dict[str, QuestionClassificationResult]

Classify multiple questions against ADeLe dimensions.

Parameters:

Name	Type	Description	Default
`questions` ¶	`list[tuple[str, str]]`	List of (question_id, question_text) tuples.	required
`trait_names` ¶	`list[str] \| None`	Optional list of ADeLe trait names to evaluate. If None, evaluates all 18 ADeLe traits.	`None`
`on_progress` ¶	`Callable[[int, int], None] \| None`	Optional callback function(completed, total) for progress updates.	`None`

Returns:

Type	Description
`dict[str, QuestionClassificationResult]`	Dictionary mapping question_id to QuestionClassificationResult.

Source code in src/karenina/integrations/adele/classifier.py

def classify_batch(
    self,
    questions: list[tuple[str, str]],
    trait_names: list[str] | None = None,
    on_progress: Callable[[int, int], None] | None = None,
) -> dict[str, QuestionClassificationResult]:
    """
    Classify multiple questions against ADeLe dimensions.

    Args:
        questions: List of (question_id, question_text) tuples.
        trait_names: Optional list of ADeLe trait names to evaluate.
                    If None, evaluates all 18 ADeLe traits.
        on_progress: Optional callback function(completed, total) for
                    progress updates.

    Returns:
        Dictionary mapping question_id to QuestionClassificationResult.
    """
    results: dict[str, QuestionClassificationResult] = {}
    total = len(questions)

    for i, (question_id, question_text) in enumerate(questions):
        try:
            result = self.classify_single(
                question_text=question_text,
                trait_names=trait_names,
                question_id=question_id,
            )
            results[question_id] = result
        except Exception as e:
            logger.error(f"Failed to classify question {question_id}: {e}")
            # Create error result
            results[question_id] = QuestionClassificationResult(
                question_id=question_id,
                question_text=question_text,
                scores={},
                labels={},
                model=self._model_name,
                classified_at=datetime.now(UTC).isoformat(),
                usage_metadata={"error": str(e)},
            )

        if on_progress:
            on_progress(i + 1, total)

    return results
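
Because `classify_batch` records failures rather than raising, callers need to inspect the results to find them. The sketch below reproduces the loop with plain dicts and a hypothetical `fake_classify` function so it runs without an LLM; `run_batch` and `failed_ids` are illustrative names, and storing an empty `usage_metadata` on success is an assumption.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import Any


def run_batch(
    questions: list[tuple[str, str]],
    classify: Callable[[str], dict[str, int]],
    on_progress: Callable[[int, int], None] | None = None,
) -> dict[str, dict[str, Any]]:
    """Classify each question, recording failures instead of raising."""
    results: dict[str, dict[str, Any]] = {}
    total = len(questions)
    for i, (question_id, question_text) in enumerate(questions):
        try:
            scores = classify(question_text)
            results[question_id] = {"scores": scores, "usage_metadata": {}}
        except Exception as exc:
            # Same shape as the error result above: empty scores plus an "error" entry
            results[question_id] = {"scores": {}, "usage_metadata": {"error": str(exc)}}
        if on_progress:
            on_progress(i + 1, total)  # reported for successes and failures alike
    return results


def failed_ids(results: dict[str, dict[str, Any]]) -> list[str]:
    """Return the IDs of questions whose classification raised."""
    return [qid for qid, r in results.items() if "error" in r["usage_metadata"]]


def fake_classify(text: str) -> dict[str, int]:
    """Hypothetical classifier that fails on empty input."""
    if not text:
        raise ValueError("empty question")
    return {"volume": 1}


progress: list[tuple[int, int]] = []
results = run_batch(
    [("q1", "What is 2 + 2?"), ("q2", "")],
    fake_classify,
    on_progress=lambda done, total: progress.append((done, total)),
)
```

Note that the callback fires once per question regardless of outcome, so a progress count of `total` does not imply that every classification succeeded.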

classify_single ¶

classify_single(
    question_text: str,
    trait_names: list[str] | None = None,
    question_id: str | None = None,
) -> QuestionClassificationResult

Classify a single question against ADeLe dimensions.

Parameters:

Name	Type	Description	Default
`question_text` ¶	`str`	The question text to classify.	required
`trait_names` ¶	`list[str] \| None`	Optional list of ADeLe trait names to evaluate. If None, evaluates all 18 ADeLe traits.	`None`
`question_id` ¶	`str \| None`	Optional ID for the question.	`None`

Returns:

Type	Description
`QuestionClassificationResult`	QuestionClassificationResult with scores, labels, and metadata.

Source code in src/karenina/integrations/adele/classifier.py

def classify_single(
    self,
    question_text: str,
    trait_names: list[str] | None = None,
    question_id: str | None = None,
) -> QuestionClassificationResult:
    """
    Classify a single question against ADeLe dimensions.

    Args:
        question_text: The question text to classify.
        trait_names: Optional list of ADeLe trait names to evaluate.
                    If None, evaluates all 18 ADeLe traits.
        question_id: Optional ID for the question.

    Returns:
        QuestionClassificationResult with scores, labels, and metadata.
    """
    # Get traits to evaluate
    if trait_names is None:
        trait_names = ADELE_TRAIT_NAMES
    traits = [get_adele_trait(name) for name in trait_names]

    # Choose evaluation mode
    if self._trait_eval_mode == "sequential":
        return self._classify_single_sequential(question_text, traits, question_id)
    else:
        return self._classify_single_batch(question_text, traits, question_id)

Functions¶

create_adele_rubric ¶

create_adele_rubric(
    trait_names: list[str] | None = None,
) -> Rubric

Create a Rubric with specified ADeLe traits (or all if None).

Parameters:

Name	Type	Description	Default
`trait_names` ¶	`list[str] \| None`	List of snake_case trait names to include. If None, includes all 18 ADeLe traits.	`None`

Returns:

Type	Description
`Rubric`	Rubric containing the specified ADeLe traits as llm_traits

Raises:

Type	Description
`ValueError`	If any trait name is not recognized

Example

>>> # All traits
>>> rubric = create_adele_rubric()
>>> len(rubric.llm_traits)
18

>>> # Selected traits
>>> rubric = create_adele_rubric(["attention_and_scan", "mind_modelling"])
>>> len(rubric.llm_traits)
2

Source code in src/karenina/integrations/adele/traits.py

def create_adele_rubric(trait_names: list[str] | None = None) -> Rubric:
    """Create a Rubric with specified ADeLe traits (or all if None).

    Args:
        trait_names: List of snake_case trait names to include.
                    If None, includes all 18 ADeLe traits.

    Returns:
        Rubric containing the specified ADeLe traits as llm_traits

    Raises:
        ValueError: If any trait name is not recognized

    Example:
        >>> # All traits
        >>> rubric = create_adele_rubric()
        >>> len(rubric.llm_traits)
        18

        >>> # Selected traits
        >>> rubric = create_adele_rubric(["attention_and_scan", "mind_modelling"])
        >>> len(rubric.llm_traits)
        2
    """
    traits = get_all_adele_traits() if trait_names is None else [get_adele_trait(name) for name in trait_names]

    return Rubric(llm_traits=traits)

get_adele_trait ¶

get_adele_trait(name: str) -> LLMRubricTrait

Get a single ADeLe trait by snake_case name.

Parameters:

Name	Type	Description	Default
`name` ¶	`str`	Snake_case trait name (e.g., "attention_and_scan", "mind_modelling")	required

Returns:

Type	Description
`LLMRubricTrait`	LLMRubricTrait with kind="literal"

Raises:

Type	Description
`ValueError`	If the trait name is not recognized

Example

>>> trait = get_adele_trait("attention_and_scan")
>>> trait.name
'attention_and_scan'
>>> trait.kind
'literal'
>>> len(trait.classes)
6

Source code in src/karenina/integrations/adele/traits.py

def get_adele_trait(name: str) -> LLMRubricTrait:
    """Get a single ADeLe trait by snake_case name.

    Args:
        name: Snake_case trait name (e.g., "attention_and_scan", "mind_modelling")

    Returns:
        LLMRubricTrait with kind="literal"

    Raises:
        ValueError: If the trait name is not recognized

    Example:
        >>> trait = get_adele_trait("attention_and_scan")
        >>> trait.name
        'attention_and_scan'
        >>> trait.kind
        'literal'
        >>> len(trait.classes)
        6
    """
    code = ADELE_NAME_TO_CODE.get(name)
    if code is None:
        raise ValueError(f"Unknown ADeLe trait name: {name}. Available traits: {', '.join(ADELE_TRAIT_NAMES)}")

    rubric = _load_and_parse_rubric(code)
    return _adele_rubric_to_trait(rubric)

get_adele_trait_by_code ¶

get_adele_trait_by_code(code: str) -> LLMRubricTrait

Get a single ADeLe trait by its original code.

Parameters:

Name	Type	Description	Default
`code` ¶	`str`	Original ADeLe code (e.g., "AS", "AT", "CEc")	required

Returns:

Type	Description
`LLMRubricTrait`	LLMRubricTrait with kind="literal"

Raises:

Type	Description
`ValueError`	If the code is not recognized

Example

>>> trait = get_adele_trait_by_code("AS")
>>> trait.name
'attention_and_scan'

Source code in src/karenina/integrations/adele/traits.py

def get_adele_trait_by_code(code: str) -> LLMRubricTrait:
    """Get a single ADeLe trait by its original code.

    Args:
        code: Original ADeLe code (e.g., "AS", "AT", "CEc")

    Returns:
        LLMRubricTrait with kind="literal"

    Raises:
        ValueError: If the code is not recognized

    Example:
        >>> trait = get_adele_trait_by_code("AS")
        >>> trait.name
        'attention_and_scan'
    """
    if code not in ADELE_CODE_TO_NAME:
        raise ValueError(f"Unknown ADeLe code: {code}. Available codes: {', '.join(ADELE_CODES)}")

    rubric = _load_and_parse_rubric(code)
    return _adele_rubric_to_trait(rubric)
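
Both lookup functions resolve to the same rubric file through a name/code mapping. The sketch below rebuilds that mapping from the trait list in the module overview; the `_SKETCH` names and the two helper functions are illustrative stand-ins, not the module's actual `ADELE_NAME_TO_CODE` and `ADELE_CODE_TO_NAME` objects.

```python
# Name -> code pairs as listed in the module overview.
NAME_TO_CODE_SKETCH = {
    "attention_and_scan": "AS",
    "atypicality": "AT",
    "comprehension_complexity": "CEc",
    "comprehension_evaluation": "CEe",
    "conceptualization_and_learning": "CL",
    "knowledge_applied_sciences": "KNa",
    "knowledge_cultural": "KNc",
    "knowledge_formal_sciences": "KNf",
    "knowledge_natural_sciences": "KNn",
    "knowledge_social_sciences": "KNs",
    "metacognition_relevance": "MCr",
    "metacognition_task_planning": "MCt",
    "metacognition_uncertainty": "MCu",
    "mind_modelling": "MS",
    "logical_reasoning_logic": "QLl",
    "logical_reasoning_quantitative": "QLq",
    "spatial_physical_understanding": "SNs",
    "volume": "VO",
}
# The reverse mapping is derived, so the two directions cannot drift apart.
CODE_TO_NAME_SKETCH = {code: name for name, code in NAME_TO_CODE_SKETCH.items()}


def name_to_code(name: str) -> str:
    """Resolve a snake_case trait name to its ADeLe code."""
    code = NAME_TO_CODE_SKETCH.get(name)
    if code is None:
        raise ValueError(f"Unknown ADeLe trait name: {name}")
    return code


def code_to_name(code: str) -> str:
    """Resolve an ADeLe code to its snake_case trait name."""
    if code not in CODE_TO_NAME_SKETCH:
        raise ValueError(f"Unknown ADeLe code: {code}")
    return CODE_TO_NAME_SKETCH[code]
```

Codes are case-sensitive in this sketch ("CEc" and "CEe" differ only in the final letter), which matches the exact-match membership test in `get_adele_trait_by_code`.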

get_all_adele_traits ¶

get_all_adele_traits() -> list[LLMRubricTrait]

Get all 18 ADeLe traits.

Returns:

Type	Description
`list[LLMRubricTrait]`	List of 18 LLMRubricTrait objects with kind="literal"

Example

>>> traits = get_all_adele_traits()
>>> len(traits)
18
>>> all(t.kind == "literal" for t in traits)
True

Source code in src/karenina/integrations/adele/traits.py

def get_all_adele_traits() -> list[LLMRubricTrait]:
    """Get all 18 ADeLe traits.

    Returns:
        List of 18 LLMRubricTrait objects with kind="literal"

    Example:
        >>> traits = get_all_adele_traits()
        >>> len(traits)
        18
        >>> all(t.kind == "literal" for t in traits)
        True
    """
    return [get_adele_trait(name) for name in ADELE_TRAIT_NAMES]

parse_adele_file ¶

parse_adele_file(content: str, code: str) -> AdeleRubric

Parse ADeLe rubric text content into structured format.

Parameters:

Name	Type	Description	Default
`content` ¶	`str`	Raw text content of the rubric file	required
`code` ¶	`str`	File code/identifier (e.g., "AS", "AT", "CEc")	required

Returns:

Type	Description
`AdeleRubric`	Parsed AdeleRubric with header (if present) and 6 levels

Raises:

Type	Description
`ValueError`	If parsing fails or rubric structure is invalid

Source code in src/karenina/integrations/adele/parser.py

def parse_adele_file(content: str, code: str) -> AdeleRubric:
    """Parse ADeLe rubric text content into structured format.

    Args:
        content: Raw text content of the rubric file
        code: File code/identifier (e.g., "AS", "AT", "CEc")

    Returns:
        Parsed AdeleRubric with header (if present) and 6 levels

    Raises:
        ValueError: If parsing fails or rubric structure is invalid
    """
    # Normalize line endings
    content = content.replace("\r\n", "\n").replace("\r", "\n")

    # Find all level markers
    level_matches = list(LEVEL_PATTERN.finditer(content))

    if len(level_matches) != 6:
        raise ValueError(f"Expected 6 levels in rubric {code}, found {len(level_matches)}")

    # Extract header (content before Level 0)
    first_level_start = level_matches[0].start()
    header_content = content[:first_level_start].strip()
    header = header_content if header_content else None

    # Parse each level
    levels: list[AdeleLevel] = []

    for i, match in enumerate(level_matches):
        level_index = int(match.group(1))
        label = match.group(2).strip()
        first_line_desc = match.group(3).strip()

        # Determine where this level's content ends
        level_end = level_matches[i + 1].start() if i + 1 < len(level_matches) else len(content)

        # Extract full level content (after the matched line)
        level_content = content[match.end() : level_end].strip()

        # Combine first line description with continuation
        full_description, examples = _parse_level_content(first_line_desc, level_content)

        levels.append(
            AdeleLevel(
                index=level_index,
                label=label,
                description=full_description,
                examples=examples,
            )
        )

    return AdeleRubric(code=code, header=header, levels=levels)
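
The marker-based splitting can be tried in isolation with the sketch below. `LEVEL_PATTERN` is not shown in this reference, so the regular expression here is an assumption about the file format ("Level N: Label. description"); the real pattern may differ. The sketch also joins continuation lines into the description, whereas `_parse_level_content` additionally separates examples.

```python
from __future__ import annotations

import re

# ASSUMPTION: hypothetical marker format "Level N: Label. first-line description".
LEVEL_PATTERN_SKETCH = re.compile(
    r"^Level (\d+):[ \t]*([^.\n]+)\.[ \t]*(.*)$", re.MULTILINE
)


def split_levels(
    content: str, expected: int = 6
) -> tuple[str | None, list[tuple[int, str, str]]]:
    """Split rubric text into an optional header and (index, label, description) levels."""
    # Normalize line endings so "$" and slicing behave the same on all platforms
    content = content.replace("\r\n", "\n").replace("\r", "\n")
    matches = list(LEVEL_PATTERN_SKETCH.finditer(content))
    if len(matches) != expected:
        raise ValueError(f"Expected {expected} levels, found {len(matches)}")

    # Everything before the first marker is the header
    header = content[: matches[0].start()].strip() or None

    levels: list[tuple[int, str, str]] = []
    for i, match in enumerate(matches):
        # A level runs until the next marker, or to the end of the text
        end = matches[i + 1].start() if i + 1 < len(matches) else len(content)
        continuation = content[match.end() : end].strip()
        parts = [match.group(3).strip(), continuation]
        description = " ".join(part for part in parts if part)
        levels.append((int(match.group(1)), match.group(2).strip(), description))
    return header, levels
```

Counting the markers before doing any slicing, as `parse_adele_file` does, means a malformed rubric fails early with a clear message instead of yielding a partially parsed result.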
