
Guardrails

Guardrails add safety controls to AI agents, validating both user input and model output to keep responses safe and appropriate.

Overview

The Akordi Agents SDK supports:

  • Input Validation - Custom validators for user input
  • AWS Bedrock Guardrails - Native integration with Bedrock guardrails
  • Content Filtering - Filter unsafe or inappropriate content
  • Output Validation - Validate LLM responses
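
Taken together, these layers form a request pipeline: validate the input, check it against guardrails, call the model, then validate the output. A minimal sketch of that flow with stand-in functions (none of these names are SDK APIs; the checks are illustrative):

```python
def validate_input(query: str) -> list[str]:
    """Input validation layer: reject malformed or unsafe input."""
    errors = []
    if not query.strip():
        errors.append("Query cannot be empty")
    if "harmful" in query.lower():
        errors.append("Query contains prohibited content")
    return errors

def guardrail_check(text: str) -> bool:
    """Stand-in for a guardrail check, applied to input and output."""
    return "dangerous" not in text.lower()

def process_request(query: str) -> dict:
    # 1. Input validation, 2. input guardrail
    errors = validate_input(query)
    if errors or not guardrail_check(query):
        return {"success": False,
                "validation_errors": errors or ["Blocked by guardrail"]}
    # 3. Stand-in for the LLM call
    answer = f"Answering: {query}"
    # 4. Output guardrail
    if not guardrail_check(answer):
        return {"success": False, "validation_errors": ["Response blocked"]}
    return {"success": True, "answer": answer}

print(process_request("What is AI?")["success"])  # True
print(process_request("")["success"])             # False
```

The sections below show how each layer is configured with the actual SDK types.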

Custom Validators

Creating a Validator

Implement the ValidatorInterface:

from akordi_agents.core.interfaces import ValidatorInterface
from akordi_agents.models.validation_models import ValidationResult, ValidationError
from typing import Dict, Any

class ContentValidator(ValidatorInterface):
    """Validate user input for safe content."""

    def __init__(self):
        self.prohibited_words = ["harmful", "illegal", "dangerous"]
        self.max_length = 10000

    def validate(self, data: Dict[str, Any]) -> ValidationResult:
        errors = []
        query = data.get("query", "")

        # Check length
        if len(query) > self.max_length:
            errors.append(ValidationError(
                field="query",
                message=f"Query exceeds maximum length of {self.max_length}"
            ))

        # Check for prohibited content
        for word in self.prohibited_words:
            if word.lower() in query.lower():
                errors.append(ValidationError(
                    field="query",
                    message="Query contains prohibited content"
                ))
                break

        # Check for empty query
        if not query.strip():
            errors.append(ValidationError(
                field="query",
                message="Query cannot be empty"
            ))

        return ValidationResult(
            is_valid=len(errors) == 0,
            errors=errors
        )

    def get_validator_name(self) -> str:
        return "content_validator"

Using Validators

from akordi_agents.core import create_langgraph_agent

validator = ContentValidator()

agent = create_langgraph_agent(
    name="safe_agent",
    llm_service=llm_service,
    validator=validator,
    config={"enable_validation": True}
)

# Invalid input will be rejected
response = agent.process_request({
    "query": "Tell me something harmful",
})

if not response["success"]:
    print("Validation failed:", response.get("validation_errors"))

AWS Bedrock Guardrails

Setting Up Guardrails

Create a guardrail in AWS Bedrock:

from examples.create_guardrail import create_default_guardrail

# Create guardrail
guardrail_id = create_default_guardrail()
print(f"Created guardrail: {guardrail_id}")

Or use the CLI:

poetry run python examples/create_guardrail.py --create-default

Guardrail Configuration

Configure guardrails via environment variables:

export GUARDRAIL_ID=your-guardrail-id
export GUARDRAIL_VERSION=1

Or in code:

import os

os.environ["GUARDRAIL_ID"] = "your-guardrail-id"
os.environ["GUARDRAIL_VERSION"] = "1"

Using Bedrock Guardrails

from akordi_agents.guard_kit.bedrock import BedrockGuardrail

guardrail = BedrockGuardrail(
    guardrail_id="your-guardrail-id",
    guardrail_version="1",
)

# Check input
result = guardrail.validate_input("User message here")
if result.blocked:
    print("Input blocked:", result.reason)

# Check output
result = guardrail.validate_output("LLM response here")
if result.blocked:
    print("Output blocked:", result.reason)
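
To compose a guardrail with custom validators (as in the CompositeValidator examples later on), one approach is a thin adapter that translates a blocked guardrail result into a ValidationResult. This is a sketch: the ValidationResult/ValidationError classes here are minimal stand-ins for the SDK types, and the stub guardrail stands in for a real BedrockGuardrail.

```python
from dataclasses import dataclass, field

# Stand-ins for the SDK's result types.
@dataclass
class ValidationError:
    field: str
    message: str

@dataclass
class ValidationResult:
    is_valid: bool
    errors: list = field(default_factory=list)

class GuardrailValidatorAdapter:
    """Expose a guardrail's validate_input() through the validator interface."""

    def __init__(self, guardrail):
        self.guardrail = guardrail

    def validate(self, data: dict) -> ValidationResult:
        result = self.guardrail.validate_input(data.get("query", ""))
        if result.blocked:
            return ValidationResult(
                is_valid=False,
                errors=[ValidationError("query", result.reason)],
            )
        return ValidationResult(is_valid=True)

    def get_validator_name(self) -> str:
        return "bedrock_guardrail_validator"

# Demo with a stub guardrail that blocks anything containing "attack"
@dataclass
class _StubResult:
    blocked: bool
    reason: str = ""

class _StubGuardrail:
    def validate_input(self, text):
        if "attack" in text.lower():
            return _StubResult(True, "Blocked by content policy")
        return _StubResult(False)

adapter = GuardrailValidatorAdapter(_StubGuardrail())
print(adapter.validate({"query": "plan an attack"}).is_valid)  # False
```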

Guardrail Types

AWS Bedrock supports these guardrail types:

  • Content Filters - Filter harmful, hateful, sexual, or violent content
  • Denied Topics - Block specific topics
  • Word Filters - Block specific words and phrases
  • Sensitive Information - Detect or redact PII
  • Contextual Grounding - Check that responses are grounded in source content and relevant to the query

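The create_guardrail example below covers every type except contextual grounding, which is configured through a contextualGroundingPolicyConfig block with a per-filter score threshold. A sketch of that config fragment (the threshold values are illustrative, not recommendations):

```python
# Contextual grounding config fragment for create_guardrail.
# Responses scoring below a threshold (0.0-1.0) are blocked.
contextual_grounding_policy = {
    "filtersConfig": [
        # Response must be supported by the provided source material
        {"type": "GROUNDING", "threshold": 0.75},
        # Response must be relevant to the user's query
        {"type": "RELEVANCE", "threshold": 0.5},
    ]
}
```

Pass it as contextualGroundingPolicyConfig=contextual_grounding_policy alongside the other policy blocks.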
Creating Custom Guardrails

import boto3

client = boto3.client("bedrock", region_name="us-east-1")

response = client.create_guardrail(
    name="my-guardrail",
    description="Custom guardrail for my application",

    # Messages returned when the guardrail blocks input or output
    # (required parameters for create_guardrail)
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't share that response.",

    # Content filters
    contentPolicyConfig={
        "filtersConfig": [
            {
                "type": "HATE",
                "inputStrength": "HIGH",
                "outputStrength": "HIGH"
            },
            {
                "type": "VIOLENCE",
                "inputStrength": "HIGH",
                "outputStrength": "HIGH"
            },
            {
                "type": "SEXUAL",
                "inputStrength": "HIGH",
                "outputStrength": "HIGH"
            },
            {
                "type": "INSULTS",
                "inputStrength": "MEDIUM",
                "outputStrength": "MEDIUM"
            }
        ]
    },

    # Denied topics
    topicPolicyConfig={
        "topicsConfig": [
            {
                "name": "illegal_activities",
                "definition": "Topics related to illegal activities",
                "examples": ["How to hack", "How to steal"],
                "type": "DENY"
            }
        ]
    },

    # Word filters
    wordPolicyConfig={
        "wordsConfig": [
            {"text": "profanity1"},
            {"text": "profanity2"}
        ],
        "managedWordListsConfig": [
            {"type": "PROFANITY"}
        ]
    },

    # PII detection
    sensitiveInformationPolicyConfig={
        "piiEntitiesConfig": [
            {"type": "EMAIL", "action": "ANONYMIZE"},
            {"type": "PHONE", "action": "ANONYMIZE"},
            {"type": "US_SOCIAL_SECURITY_NUMBER", "action": "BLOCK"}
        ]
    }
)

guardrail_id = response["guardrailId"]

Combining Validators

Use multiple validation layers:

class CompositeValidator(ValidatorInterface):
    """Combine multiple validators."""

    def __init__(self, validators: list):
        self.validators = validators

    def validate(self, data: dict) -> ValidationResult:
        all_errors = []

        for validator in self.validators:
            result = validator.validate(data)
            all_errors.extend(result.errors)

        return ValidationResult(
            is_valid=len(all_errors) == 0,
            errors=all_errors
        )

    def get_validator_name(self) -> str:
        return "composite_validator"

# Usage
validator = CompositeValidator([
    ContentValidator(),
    LengthValidator(),
    PIIValidator(),
])
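
To see the aggregation semantics concretely, here is a self-contained run of the composite pattern with two stub validators. The SDK types are replaced by minimal stand-ins, and NonEmptyValidator/MaxLengthValidator are illustrative, not SDK classes:

```python
from dataclasses import dataclass, field

# Minimal stand-ins for the SDK's result types.
@dataclass
class ValidationError:
    field: str
    message: str

@dataclass
class ValidationResult:
    is_valid: bool
    errors: list = field(default_factory=list)

class NonEmptyValidator:
    def validate(self, data):
        if not data.get("query", "").strip():
            return ValidationResult(False, [ValidationError("query", "empty")])
        return ValidationResult(True)

class MaxLengthValidator:
    def validate(self, data):
        if len(data.get("query", "")) > 20:
            return ValidationResult(False, [ValidationError("query", "too long")])
        return ValidationResult(True)

class CompositeValidator:
    """Run every validator and collect all errors (no short-circuit)."""
    def __init__(self, validators):
        self.validators = validators

    def validate(self, data):
        all_errors = []
        for v in self.validators:
            all_errors.extend(v.validate(data).errors)
        return ValidationResult(len(all_errors) == 0, all_errors)

composite = CompositeValidator([NonEmptyValidator(), MaxLengthValidator()])
print(composite.validate({"query": "ok"}).is_valid)   # True
print(len(composite.validate({"query": ""}).errors))  # 1
```

Because every validator runs, the caller sees all failures at once instead of fixing them one round-trip at a time.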

Validation in Workflows

ValidationNode

LangGraph workflows include a ValidationNode:

from akordi_agents.core.langgraph import ToolUseWorkflow, WorkflowConfig

workflow = ToolUseWorkflow(
    name="validated_workflow",
    config=WorkflowConfig(enable_validation=True),
    validator=ContentValidator(),
    llm_service=llm_service,
)

# Workflow automatically validates input
result = workflow.execute({
    "query": "User input here"
})

if result.get("validation_failed"):
    print("Validation errors:", result.get("validation_errors"))

Custom Validation Node

from akordi_agents.core.langgraph import BaseNode, NodeResult

class CustomValidationNode(BaseNode):
    def __init__(self, validators: list):
        super().__init__("custom_validation")
        self.validators = validators

    def process(self, state: dict) -> NodeResult:
        query = state.get("query", "")

        for validator in self.validators:
            result = validator.validate({"query": query})
            if not result.is_valid:
                return NodeResult(
                    success=False,
                    data={
                        "validation_failed": True,
                        "errors": [e.message for e in result.errors]
                    },
                    next_node="end"  # Skip to end
                )

        return NodeResult(
            success=True,
            data={"validated": True},
            next_node="process"  # Continue workflow
        )

Best Practices

1. Validate Early

Always validate input before processing:

# Good: Validate first
response = agent.process_request({
    "query": user_input,
})

# The agent validates internally with enable_validation=True

2. Use Multiple Layers

Combine different validation types:

# Layer 1: Format validation
format_validator = FormatValidator()

# Layer 2: Content validation
content_validator = ContentValidator()

# Layer 3: Bedrock guardrails (assumes BedrockGuardrail exposes the
# same validate() interface so it can be composed with the others)
bedrock_guardrail = BedrockGuardrail(guardrail_id="...")

# Use all layers
validator = CompositeValidator([
    format_validator,
    content_validator,
    bedrock_guardrail,
])

3. Log Validation Failures

import logging

logger = logging.getLogger(__name__)

class LoggingValidator(ValidatorInterface):
    def __init__(self, inner_validator):
        self.inner = inner_validator

    def validate(self, data: dict) -> ValidationResult:
        result = self.inner.validate(data)

        if not result.is_valid:
            logger.warning(
                "Validation failed",
                extra={
                    "errors": [e.message for e in result.errors],
                    "query_preview": data.get("query", "")[:100]
                }
            )

        return result

4. Provide Clear Error Messages

ValidationError(
    field="query",
    message="Query contains prohibited terms. Please rephrase your question."
)
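
When surfacing failures to users, a small helper can join the collected ValidationErrors into one readable message. A sketch with a stand-in error type (field, message); format_errors is an illustrative helper, not an SDK function:

```python
from dataclasses import dataclass

@dataclass
class ValidationError:  # stand-in for the SDK type
    field: str
    message: str

def format_errors(errors) -> str:
    """Join validation errors into a single user-facing string."""
    return "; ".join(f"{e.field}: {e.message}" for e in errors)

msg = format_errors([
    ValidationError("query", "Query cannot be empty"),
])
print(msg)  # query: Query cannot be empty
```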

5. Test Guardrails Thoroughly

import pytest

def test_content_validator():
    validator = ContentValidator()

    # Test valid input
    result = validator.validate({"query": "What is AI?"})
    assert result.is_valid

    # Test prohibited content
    result = validator.validate({"query": "Something harmful"})
    assert not result.is_valid
    assert len(result.errors) > 0

    # Test empty input
    result = validator.validate({"query": ""})
    assert not result.is_valid

Next Steps