# Guardrails

Guardrails provide safety controls for AI agents, helping ensure safe and appropriate responses.
## Overview

Akordi Agents SDK supports:

- **Input Validation** - Custom validators for user input
- **AWS Bedrock Guardrails** - Native integration with Bedrock guardrails
- **Content Filtering** - Filter unsafe or inappropriate content
- **Output Validation** - Validate LLM responses
## Custom Validators

### Creating a Validator

Implement the `ValidatorInterface`:

```python
from typing import Any, Dict

from akordi_agents.core.interfaces import ValidatorInterface
from akordi_agents.models.validation_models import ValidationError, ValidationResult


class ContentValidator(ValidatorInterface):
    """Validate user input for safe content."""

    def __init__(self):
        self.prohibited_words = ["harmful", "illegal", "dangerous"]
        self.max_length = 10000

    def validate(self, data: Dict[str, Any]) -> ValidationResult:
        errors = []
        query = data.get("query", "")

        # Check length
        if len(query) > self.max_length:
            errors.append(ValidationError(
                field="query",
                message=f"Query exceeds maximum length of {self.max_length}"
            ))

        # Check for prohibited content
        for word in self.prohibited_words:
            if word.lower() in query.lower():
                errors.append(ValidationError(
                    field="query",
                    message="Query contains prohibited content"
                ))
                break

        # Check for empty query
        if not query.strip():
            errors.append(ValidationError(
                field="query",
                message="Query cannot be empty"
            ))

        return ValidationResult(
            is_valid=len(errors) == 0,
            errors=errors
        )

    def get_validator_name(self) -> str:
        return "content_validator"
```
### Using Validators

```python
from akordi_agents.core import create_langgraph_agent

validator = ContentValidator()

agent = create_langgraph_agent(
    name="safe_agent",
    llm_service=llm_service,
    validator=validator,
    config={"enable_validation": True}
)

# Invalid input will be rejected
response = agent.process_request({
    "query": "Tell me something harmful",
})

if not response["success"]:
    print("Validation failed:", response.get("validation_errors"))
```
## AWS Bedrock Guardrails

### Setting Up Guardrails

Create a guardrail in AWS Bedrock:

```python
from examples.create_guardrail import create_default_guardrail

# Create guardrail
guardrail_id = create_default_guardrail()
print(f"Created guardrail: {guardrail_id}")
```
Or run the CLI helper in the `examples` directory.
### Guardrail Configuration

Guardrails can be configured via environment variables or directly in code.
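The exact settings names are not documented on this page; as a sketch, assuming hypothetical variable names `BEDROCK_GUARDRAIL_ID` and `BEDROCK_GUARDRAIL_VERSION` (check your SDK's settings reference for the real ones), the two styles might look like:

```python
import os

# Hypothetical environment variables -- the actual names may differ
os.environ["BEDROCK_GUARDRAIL_ID"] = "your-guardrail-id"
os.environ["BEDROCK_GUARDRAIL_VERSION"] = "1"

# Equivalent in-code configuration, reading the values back from the
# environment so both styles stay in sync
guardrail_config = {
    "guardrail_id": os.environ["BEDROCK_GUARDRAIL_ID"],
    "guardrail_version": os.environ["BEDROCK_GUARDRAIL_VERSION"],
}
```

Keeping the identifiers in the environment makes it easy to point the same code at different guardrails per deployment stage.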
### Using Bedrock Guardrails

```python
from akordi_agents.guard_kit.bedrock import BedrockGuardrail

guardrail = BedrockGuardrail(
    guardrail_id="your-guardrail-id",
    guardrail_version="1",
)

# Check input
result = guardrail.validate_input("User message here")
if result.blocked:
    print("Input blocked:", result.reason)

# Check output
result = guardrail.validate_output("LLM response here")
if result.blocked:
    print("Output blocked:", result.reason)
```
### Guardrail Types

AWS Bedrock supports these guardrail types:

| Type | Description |
|---|---|
| Content Filters | Filter harmful, hateful, or sexual content |
| Denied Topics | Block specific topics |
| Word Filters | Block specific words and phrases |
| Sensitive Information | Detect or redact PII |
| Contextual Grounding | Check that responses are grounded in the source material |
### Creating Custom Guardrails

```python
import boto3

client = boto3.client("bedrock", region_name="us-east-1")

response = client.create_guardrail(
    name="my-guardrail",
    description="Custom guardrail for my application",
    # Content filters
    contentPolicyConfig={
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "SEXUAL", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "INSULTS", "inputStrength": "MEDIUM", "outputStrength": "MEDIUM"},
        ]
    },
    # Denied topics
    topicPolicyConfig={
        "topicsConfig": [
            {
                "name": "illegal_activities",
                "definition": "Topics related to illegal activities",
                "examples": ["How to hack", "How to steal"],
                "type": "DENY",
            }
        ]
    },
    # Word filters
    wordPolicyConfig={
        "wordsConfig": [
            {"text": "profanity1"},
            {"text": "profanity2"},
        ],
        "managedWordListsConfig": [
            {"type": "PROFANITY"}
        ]
    },
    # PII detection -- note the Bedrock API uses US_SOCIAL_SECURITY_NUMBER,
    # not "SSN", as the entity type
    sensitiveInformationPolicyConfig={
        "piiEntitiesConfig": [
            {"type": "EMAIL", "action": "ANONYMIZE"},
            {"type": "PHONE", "action": "ANONYMIZE"},
            {"type": "US_SOCIAL_SECURITY_NUMBER", "action": "BLOCK"},
        ]
    },
    # Messages returned when content is blocked (required by the API)
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't provide that response.",
)

guardrail_id = response["guardrailId"]
```
## Combining Validators

Use multiple validation layers:

```python
class CompositeValidator(ValidatorInterface):
    """Combine multiple validators."""

    def __init__(self, validators: list):
        self.validators = validators

    def validate(self, data: dict) -> ValidationResult:
        all_errors = []
        for validator in self.validators:
            result = validator.validate(data)
            all_errors.extend(result.errors)
        return ValidationResult(
            is_valid=len(all_errors) == 0,
            errors=all_errors
        )

    def get_validator_name(self) -> str:
        return "composite_validator"


# Usage -- LengthValidator and PIIValidator are placeholders
# for validators you define yourself
validator = CompositeValidator([
    ContentValidator(),
    LengthValidator(),
    PIIValidator(),
])
```
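The composite pattern can be exercised end to end without the SDK installed. The snippet below is a minimal, self-contained sketch: `ValidationResult` and `ValidationError` are lightweight stand-ins for the SDK models, and the two toy validators are assumptions made for the demo, not SDK classes:

```python
from dataclasses import dataclass, field
from typing import List

# Stand-ins for the SDK's validation models (assumptions, for a runnable demo)
@dataclass
class ValidationError:
    field: str
    message: str

@dataclass
class ValidationResult:
    is_valid: bool
    errors: List[ValidationError] = field(default_factory=list)

class NonEmptyValidator:
    """Toy validator: the query must not be blank."""
    def validate(self, data: dict) -> ValidationResult:
        if data.get("query", "").strip():
            return ValidationResult(is_valid=True)
        return ValidationResult(False, [ValidationError("query", "Query cannot be empty")])

class MaxLengthValidator:
    """Toy validator: the query must not exceed a length limit."""
    def __init__(self, limit: int):
        self.limit = limit

    def validate(self, data: dict) -> ValidationResult:
        if len(data.get("query", "")) <= self.limit:
            return ValidationResult(is_valid=True)
        return ValidationResult(False, [ValidationError("query", f"Query exceeds {self.limit} characters")])

class CompositeValidator:
    """Run every validator and merge their errors."""
    def __init__(self, validators: list):
        self.validators = validators

    def validate(self, data: dict) -> ValidationResult:
        all_errors: List[ValidationError] = []
        for validator in self.validators:
            all_errors.extend(validator.validate(data).errors)
        return ValidationResult(is_valid=not all_errors, errors=all_errors)

composite = CompositeValidator([NonEmptyValidator(), MaxLengthValidator(20)])
ok = composite.validate({"query": "What is AI?"})
bad = composite.validate({"query": ""})
```

Because every layer runs and their errors are merged, a single response can report all failures at once instead of stopping at the first failing validator.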
## Validation in Workflows

### ValidationNode

LangGraph workflows include a `ValidationNode`:

```python
from akordi_agents.core.langgraph import ToolUseWorkflow, WorkflowConfig

workflow = ToolUseWorkflow(
    name="validated_workflow",
    config=WorkflowConfig(enable_validation=True),
    validator=ContentValidator(),
    llm_service=llm_service,
)

# The workflow automatically validates input
result = workflow.execute({
    "query": "User input here"
})

if result.get("validation_failed"):
    print("Validation errors:", result.get("validation_errors"))
```
### Custom Validation Node

```python
from akordi_agents.core.langgraph import BaseNode, NodeResult


class CustomValidationNode(BaseNode):
    def __init__(self, validators: list):
        super().__init__("custom_validation")
        self.validators = validators

    def process(self, state: dict) -> NodeResult:
        query = state.get("query", "")
        for validator in self.validators:
            result = validator.validate({"query": query})
            if not result.is_valid:
                return NodeResult(
                    success=False,
                    data={
                        "validation_failed": True,
                        "errors": [e.message for e in result.errors]
                    },
                    next_node="end"  # Skip to end
                )
        return NodeResult(
            success=True,
            data={"validated": True},
            next_node="process"  # Continue workflow
        )
```
## Best Practices

### 1. Validate Early

Always validate input before processing:

```python
# Good: validation happens first -- with enable_validation=True,
# the agent validates internally before calling the LLM
response = agent.process_request({
    "query": user_input,
})
```
### 2. Use Multiple Layers

Combine different validation types:

```python
# Layer 1: Format validation
format_validator = FormatValidator()

# Layer 2: Content validation
content_validator = ContentValidator()

# Layer 3: Bedrock guardrails
bedrock_guardrail = BedrockGuardrail(guardrail_id="...")

# Use all layers
validator = CompositeValidator([
    format_validator,
    content_validator,
    bedrock_guardrail,
])
```
### 3. Log Validation Failures

```python
import logging

logger = logging.getLogger(__name__)


class LoggingValidator(ValidatorInterface):
    """Wrap another validator and log its failures."""

    def __init__(self, inner_validator):
        self.inner = inner_validator

    def validate(self, data: dict) -> ValidationResult:
        result = self.inner.validate(data)
        if not result.is_valid:
            logger.warning(
                "Validation failed",
                extra={
                    "errors": [e.message for e in result.errors],
                    "query_preview": data.get("query", "")[:100]
                }
            )
        return result

    def get_validator_name(self) -> str:
        return f"logging_{self.inner.get_validator_name()}"
```
### 4. Provide Clear Error Messages

```python
ValidationError(
    field="query",
    message="Query contains prohibited terms. Please rephrase your question."
)
```
### 5. Test Guardrails Thoroughly

```python
def test_content_validator():
    validator = ContentValidator()

    # Valid input passes
    result = validator.validate({"query": "What is AI?"})
    assert result.is_valid

    # Prohibited content is rejected
    result = validator.validate({"query": "Something harmful"})
    assert not result.is_valid
    assert len(result.errors) > 0

    # Empty input is rejected
    result = validator.validate({"query": ""})
    assert not result.is_valid
```
## Next Steps

- Agents - Use guardrails with agents
- Examples - Working examples
- API Reference - `ValidatorInterface` API