karenina.utils.json_extraction
json_extraction
JSON extraction and parsing utilities for LLM responses.
This module provides functions for extracting JSON from LLM responses that may contain markdown fences, reasoning text, or other non-JSON content.
Functions:
| Name | Description |
|---|---|
| `strip_markdown_fences` | Remove markdown code fences and extract JSON from text |
| `extract_json_from_text` | Extract JSON objects from mixed text content |
| `extract_balanced_braces` | Extract balanced brace expressions from text |
| `extract_json_from_response` | Wrapper around `extract_json_from_text` that raises `ValueError` on failure (backwards compat) |
| `is_invalid_json_error` | Check if an error is related to invalid JSON output |
Functions
extract_balanced_braces
Extract a balanced brace expression from text, starting at a given position.
Properly handles:
- Nested braces: {"a": {"b": 1}}
- Strings containing braces: {"text": "has { and }"}
- Escaped quotes in strings: {"text": "say \"hello\""}
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| | `str` | The full text | required |
| | `int` | Position of opening brace | required |
Returns:
| Type | Description |
|---|---|
| `str \| None` | The balanced brace expression if found, None otherwise |
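
The handling rules above can be sketched as a small state machine. This is a minimal illustration, not the module's actual source: the function name `balanced_braces_sketch` and the parameter names `text` and `start` are assumptions made for this example.

```python
from __future__ import annotations


def balanced_braces_sketch(text: str, start: int) -> str | None:
    """Return the balanced {...} span beginning at text[start], or None.

    Hypothetical stand-in for illustration; names are assumptions.
    """
    if start < 0 or start >= len(text) or text[start] != "{":
        return None

    depth = 0          # current brace nesting level
    in_string = False  # True while inside a double-quoted string
    escaped = False    # True when the previous character was a backslash

    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            if escaped:
                escaped = False   # this character is escaped; ignore it
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue              # braces inside strings do not count
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start : i + 1]
    return None  # ran out of text before the braces balanced
```

Tracking the string and escape state separately from the depth counter is what keeps a brace inside `"has { and }"` from being counted as structure.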
Source code in src/karenina/utils/json_extraction.py
extract_json_from_response
extract_json_from_response(text: str) -> str
Extract JSON from a response that may be wrapped in markdown or mixed with text.
This function wraps extract_json_from_text with error handling for backwards compatibility. Prefer using extract_json_from_text directly for new code.
Attempts multiple extraction strategies in order of preference:
1. Direct JSON (starts with { or [)
2. Markdown code blocks (```json ... ``` or ``` ... ```)
3. JSON-like content search with balanced brace parsing
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str` | Raw response text that may contain JSON. | required |
Returns:
| Type | Description |
|---|---|
| `str` | Extracted JSON string, stripped of surrounding whitespace. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If no valid JSON can be extracted from the response. |
Example
    >>> extract_json_from_response('{"key": "value"}')
    '{"key": "value"}'
    >>> extract_json_from_response('```json\n{"key": "value"}\n```')
    '{"key": "value"}'
    >>> extract_json_from_response('Here is the result: {"key": "value"}')
    '{"key": "value"}'
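The three-strategy order described for this function can be sketched roughly as follows. This is an illustrative reimplementation under assumptions, not the library's code: the name `extract_json_sketch` is hypothetical, and the third step uses the standard library's `json.JSONDecoder.raw_decode` as a stand-in for the module's balanced brace parsing.

```python
import json
import re

# Matches ```json ... ``` or ``` ... ``` and captures the fenced body.
_FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL | re.IGNORECASE)


def _is_json(candidate: str) -> bool:
    """Return True if the candidate parses as JSON."""
    try:
        json.loads(candidate)
    except json.JSONDecodeError:
        return False
    return True


def extract_json_sketch(text: str) -> str:
    """Hypothetical sketch of the documented strategy order."""
    stripped = text.strip()

    # Strategy 1: the whole response is already JSON.
    if stripped.startswith(("{", "[")) and _is_json(stripped):
        return stripped

    # Strategy 2: JSON inside a markdown code block.
    for match in _FENCE_RE.finditer(text):
        body = match.group(1).strip()
        if _is_json(body):
            return body

    # Strategy 3 (stand-in): try to decode a JSON value at each "{".
    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            _, end = decoder.raw_decode(text, pos)
            return text[pos:end]
        except json.JSONDecodeError:
            pos = text.find("{", pos + 1)

    raise ValueError("No valid JSON found in response")
```

Ordering the checks from cheapest to most permissive means a clean response is returned untouched, and the text search only runs when the earlier strategies fail.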
extract_json_from_text
extract_json_from_text(text: str) -> str | None
Extract a JSON object from text that may contain reasoning/explanation.
Tries multiple strategies to find valid JSON:
1. Find last JSON object (LLMs often reason first, then output JSON)
2. Find first JSON object (fallback)
3. Handle nested braces properly
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str` | Text that may contain JSON mixed with other content | required |
Returns:
| Type | Description |
|---|---|
| `str \| None` | Extracted JSON string if found and valid, None otherwise |
Example
    >>> extract_json_from_text('The answer is {"field": "value"} as shown.')
    '{"field": "value"}'
    >>> extract_json_from_text('Processing... Output: {"a": 1, "b": {"c": 2}}')
    '{"a": 1, "b": {"c": 2}}'
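The "prefer the last object" idea can be illustrated with a short sketch. It is not the module's implementation: the name `last_json_object_sketch` is hypothetical, and the scan relies on `json.JSONDecoder.raw_decode` from the standard library instead of a hand-written brace matcher. It collects every top-level object in one pass and returns the final one, which is a simplification of the documented last-then-first fallback.

```python
from __future__ import annotations

import json


def last_json_object_sketch(text: str) -> str | None:
    """Return the last top-level JSON object found in text, or None.

    Hypothetical illustration; the real function may work differently.
    """
    decoder = json.JSONDecoder()
    found: list[str] = []
    pos = text.find("{")
    while pos != -1:
        try:
            _, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # Not valid JSON at this brace; move on to the next one.
            pos = text.find("{", pos + 1)
            continue
        found.append(text[pos:end])
        # Resume after the object so nested objects are not reported twice.
        pos = text.find("{", end)
    return found[-1] if found else None
```

Resuming the scan after each decoded object is what keeps the nested `{"c": 2}` in the second documented example from being returned in place of the enclosing object.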
is_invalid_json_error
is_invalid_json_error(error: Exception) -> bool
Check if an error is related to invalid JSON output.
Used by parser adapters to detect JSON-format errors and trigger appropriate retry strategies (e.g., format feedback).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `error` | `Exception` | The exception from a parsing attempt. | required |
Returns:
| Type | Description |
|---|---|
| `bool` | True if this is an invalid JSON error. |
Example
    >>> try:
    ...     json.loads("not json")
    ... except json.JSONDecodeError as e:
    ...     assert is_invalid_json_error(e) is True
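A retry loop with format feedback, as mentioned in the description, might look like the sketch below. Everything here is an assumption for illustration: `looks_like_json_error` is a hypothetical stand-in that only checks for `json.JSONDecodeError` (the real function may recognize more error types), and `parse_with_format_feedback` and `ask_model` are invented names, not part of the library.

```python
import json
from typing import Any, Callable


def looks_like_json_error(error: Exception) -> bool:
    """Hypothetical stand-in: treat JSONDecodeError as an invalid-JSON error."""
    return isinstance(error, json.JSONDecodeError)


def parse_with_format_feedback(
    ask_model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> Any:
    """Retry parsing, adding format feedback after each JSON failure."""
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = ask_model(current_prompt)
        try:
            return json.loads(raw)
        except Exception as error:
            if not looks_like_json_error(error):
                raise  # unrelated failure: do not retry
            # Tell the model what went wrong before asking again.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was not valid JSON "
                f"({error}). Reply with a single JSON object only."
            )
    raise ValueError(f"No valid JSON after {max_attempts} attempts")
```

Classifying the error before retrying keeps unrelated failures, such as network errors, from being retried with feedback that would not help.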
strip_markdown_fences
strip_markdown_fences(text: str | None) -> str | None
Remove markdown code fences from text and extract JSON from mixed content.
Handles multiple extraction strategies in order:
1. Triple backtick fences with optional language tags (```json ... ```)
2. JSON objects embedded in reasoning/explanation text
3. Partial fences (only opening or only closing)
The function is designed to handle cases where LLMs output reasoning text before/after the actual JSON response, such as: "Let me analyze this... the answer is { "field": "value" }"
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str \| None` | Raw text potentially containing markdown fences or mixed content (can be None or non-string) | required |
Returns:
| Type | Description |
|---|---|
| `str \| None` | Extracted JSON string, text with markdown fences removed, or original value if not a string |
Example
    >>> strip_markdown_fences("```json\n{...}\n```")
    "{...}"
    >>> strip_markdown_fences('The answer is {"field": "value"}')
    '{"field": "value"}'
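The fence-handling part of this behavior can be sketched with a few regular expressions. This is a simplified illustration under assumptions, not the library's code: the name `strip_fences_sketch` is hypothetical, and the sketch covers only complete and partial fences plus the non-string passthrough, leaving out the extraction of JSON embedded in reasoning text.

```python
from __future__ import annotations

import re

# A complete fence: opening ``` with optional language tag, body, closing ```.
_FULL_FENCE = re.compile(
    r"```[A-Za-z0-9_-]*[ \t]*\n?(.*?)\n?[ \t]*```", re.DOTALL
)
# A lone opening fence at the start, or a lone closing fence at the end.
_OPEN_ONLY = re.compile(r"^\s*```[A-Za-z0-9_-]*[ \t]*\n?")
_CLOSE_ONLY = re.compile(r"\n?[ \t]*```\s*$")


def strip_fences_sketch(text: str | None) -> str | None:
    """Hypothetical sketch: remove complete or partial markdown fences."""
    if not isinstance(text, str):
        return text  # None and other non-strings pass through unchanged

    match = _FULL_FENCE.search(text)
    if match:
        return match.group(1).strip()

    # No complete fence: drop a dangling opener or closer if present.
    text = _OPEN_ONLY.sub("", text)
    text = _CLOSE_ONLY.sub("", text)
    return text.strip()
```

Handling partial fences separately matters because truncated model output often contains an opening fence with no closing one.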