IBM Guardrails
LiteLLM works with IBM's FMS Guardrails for content safety. You can use it to detect jailbreaks, PII, hate speech, and more.
What it doesโ
IBM's FMS Guardrails is a framework for invoking detectors on LLM inputs and outputs. To configure these detectors, you can use e.g. TrustyAI detectors, an open-source project maintained by the Red Hat's TrustyAI team that allows the user to configure detectors that are:
- regex patterns
- file type validators
- custom Python functions
- Hugging Face AutoModelForSequenceClassification, i.e. sequence classification models
Each detector outputs an API response based on the following openapi schema.
You can run these checks:
- Before sending to the LLM (on user input)
- After getting LLM response (on output)
- During the call (parallel to LLM)
Quick Startโ
1. Add to your config.yamlโ
model_list:
- model_name: gpt-3.5-turbo
litellm_params:
model: openai/gpt-3.5-turbo
api_key: os.environ/OPENAI_API_KEY
guardrails:
- guardrail_name: ibm-jailbreak-detector
litellm_params:
guardrail: ibm_guardrails
mode: pre_call
auth_token: os.environ/IBM_GUARDRAILS_AUTH_TOKEN
base_url: "https://your-detector-server.com"
detector_id: "jailbreak-detector"
is_detector_server: true
default_on: true
optional_params:
score_threshold: 0.8
block_on_detection: true
2. Set your auth tokenโ
export IBM_GUARDRAILS_AUTH_TOKEN="your-token"
3. Start the proxyโ
litellm --config config.yaml --detailed_debug
4. Make a requestโ
curl -i http://localhost:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [
{"role": "user", "content": "Hello, how are you?"}
],
"guardrails": ["ibm-jailbreak-detector"]
}'
Configurationโ
Required paramsโ
guardrail- str - Set toibm_guardrailsauth_token- str - Your IBM Guardrails auth token. Can useos.environ/IBM_GUARDRAILS_AUTH_TOKENbase_url- str - URL of your IBM Detector or Guardrails serverdetector_id- str - Which detector to use (e.g., "jailbreak-detector", "pii-detector")
Optional paramsโ
mode- str or list[str] - When to run. Options:pre_call,post_call,during_call. Default:pre_calldefault_on- bool - Run automatically without specifying in request. Default:falseis_detector_server- bool -truefor detector server,falsefor orchestrator. Default:trueverify_ssl- bool - Whether to verify SSL certificates. Default:true
optional_paramsโ
These go under optional_params:
detector_params- dict - Parameters to pass to your detectorscore_threshold- float - Only count detections above this score (0.0 to 1.0)block_on_detection- bool - Block the request when violations found. Default:true
Server Typesโ
IBM Guardrails has two APIs you can use:
Detector Server (recommended)โ
This Detectors API uses api/v1/text/contents endpoint to run a single detector; it can accept multiple text inputs within a request.
guardrails:
- guardrail_name: ibm-detector
litellm_params:
guardrail: ibm_guardrails
mode: pre_call
auth_token: os.environ/IBM_GUARDRAILS_AUTH_TOKEN
base_url: "https://your-detector-server.com"
detector_id: "jailbreak-detector"
is_detector_server: true # Use detector server
Orchestratorโ
If you're using the IBM FMS Guardrails Orchestrator, you can use FMS Orchestrator API, specifically by leveraging the api/v2/text/detection/content to potentially run multiple detectors in a single request; however, this endpoint can only accept one text input per request.
guardrails:
- guardrail_name: ibm-orchestrator
litellm_params:
guardrail: ibm_guardrails
mode: pre_call
auth_token: os.environ/IBM_GUARDRAILS_AUTH_TOKEN
base_url: "https://your-orchestrator-server.com"
detector_id: "jailbreak-detector"
is_detector_server: false # Use orchestrator
Examplesโ
Check for jailbreaks on inputโ
guardrails:
- guardrail_name: jailbreak-check
litellm_params:
guardrail: ibm_guardrails
mode: pre_call
auth_token: os.environ/IBM_GUARDRAILS_AUTH_TOKEN
base_url: "https://your-detector-server.com"
detector_id: "jailbreak-detector"
is_detector_server: true
default_on: true
optional_params:
score_threshold: 0.8
Check for PII in responsesโ
guardrails:
- guardrail_name: pii-check
litellm_params:
guardrail: ibm_guardrails
mode: post_call
auth_token: os.environ/IBM_GUARDRAILS_AUTH_TOKEN
base_url: "https://your-detector-server.com"
detector_id: "pii-detector"
is_detector_server: true
optional_params:
score_threshold: 0.5 # Lower threshold for PII
block_on_detection: true
Run multiple detectorsโ
guardrails:
- guardrail_name: jailbreak-check
litellm_params:
guardrail: ibm_guardrails
mode: pre_call
auth_token: os.environ/IBM_GUARDRAILS_AUTH_TOKEN
base_url: "https://your-detector-server.com"
detector_id: "jailbreak-detector"
is_detector_server: true
- guardrail_name: pii-check
litellm_params:
guardrail: ibm_guardrails
mode: post_call
auth_token: os.environ/IBM_GUARDRAILS_AUTH_TOKEN
base_url: "https://your-detector-server.com"
detector_id: "pii-detector"
is_detector_server: true
Then in your request:
curl -i http://localhost:4000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-1234" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [{"role": "user", "content": "Hello"}],
"guardrails": ["jailbreak-check", "pii-check"]
}'
How detection worksโ
When IBM Guardrails finds something, it returns details about what it found:
{
"start": 0,
"end": 31,
"text": "You are now in Do Anything Mode",
"detection_type": "jailbreak",
"score": 0.858
}
score- How confident it is (0.0 to 1.0)text- The specific text that triggered itdetection_type- What kind of violation
If the score is above your score_threshold, the request gets blocked (if block_on_detection is true).