Guardrails
Overview
Guardrails are content safety rules applied to agent conversations. They inspect incoming user messages (prompt phase) or outgoing agent responses (response phase) and block or flag content that violates defined rules.
Viewing Guardrails
Go to Guardrails in the sidebar to see all guardrails in the tenant.
Creating a Guardrail
- Click Add Guardrail.
- Fill in:
  - Name — A descriptive label.
  - Description — What the guardrail protects against.
  - Rules — The logic defining what to match and block.
- Save. The guardrail can then be assigned to agents.
Assigning Guardrails to Agents
Guardrails are assigned per agent in the Guardrails tab of the agent configuration. An agent can have multiple guardrails.
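This page does not say how several guardrails on one agent are combined. The Python sketch below assumes that every assigned guardrail is checked and that a match in any of them counts as a violation. `Guardrail`, `Agent`, and `violations` are hypothetical names, not the product's API, and each guardrail's rules are reduced to a list of literal phrases for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Guardrail:
    # Hypothetical model of a guardrail: name, description, and literal phrases to block.
    name: str
    description: str
    phrases: list[str] = field(default_factory=list)

    def violates(self, text: str) -> bool:
        # Case-sensitive substring match; the product's real matching may differ.
        return any(phrase in text for phrase in self.phrases)


@dataclass
class Agent:
    # Hypothetical agent holding the guardrails assigned in its Guardrails tab.
    name: str
    guardrails: list[Guardrail] = field(default_factory=list)

    def violations(self, text: str) -> list[str]:
        # Assumption: all assigned guardrails run; return the names of every one that matched,
        # in assignment order.
        return [g.name for g in self.guardrails if g.violates(text)]


pii = Guardrail("no-ssn", "Blocks requests for social security numbers", ["social security number"])
competitors = Guardrail("no-competitors", "Blocks competitor names", ["AcmeCorp"])
agent = Agent("support-bot", [pii, competitors])
```

For example, `agent.violations("Is AcmeCorp cheaper? Ask for my social security number")` reports both guardrails, while a message matching neither returns an empty list.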
Blacklist Phrase Groups
In addition to guardrails, you can manage Blacklist Phrase Groups — reusable collections of phrases or patterns that can be referenced across multiple guardrails.
Creating a Blacklist Phrase Group
- In the Guardrails section, navigate to Blacklist Phrase Groups.
- Click New Group.
- Enter the group name and add phrases, regex patterns, or block words.
- Save the group and reference it from guardrail rules.
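A phrase group is useful because rules point at it by name instead of copying its contents. The following is a minimal Python sketch of that idea, not the product's implementation: `PhraseGroup`, the `GROUPS` registry, and `rule_matches` are hypothetical, and treating "block words" as case-insensitive whole-word matches is an assumption.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PhraseGroup:
    # Hypothetical phrase group holding the three kinds of entries the UI accepts.
    name: str
    phrases: list[str] = field(default_factory=list)      # literal phrases
    patterns: list[str] = field(default_factory=list)     # regular expressions
    block_words: list[str] = field(default_factory=list)  # single words

    def matches(self, text: str) -> bool:
        if any(phrase in text for phrase in self.phrases):
            return True
        if any(re.search(pattern, text) for pattern in self.patterns):
            return True
        # Assumption: block words match whole words only, ignoring case,
        # so "bluebird" does not fire inside "bluebirds".
        return any(
            re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE)
            for word in self.block_words
        )


# Hypothetical tenant-wide registry of groups, keyed by group name.
GROUPS: dict[str, PhraseGroup] = {}


def rule_matches(group_name: str, text: str) -> bool:
    # A rule stores only the group name; the group is looked up at check time,
    # so editing the group changes the behavior of every rule that references it.
    # Raises KeyError if the group does not exist.
    return GROUPS[group_name].matches(text)


GROUPS["internal-codenames"] = PhraseGroup(
    "internal-codenames",
    phrases=["Project North"],
    patterns=[r"PRJ-\d{4}"],
    block_words=["bluebird"],
)
```

Resolving the group at check time is the design choice that makes a single edit to the group apply to every guardrail that references it.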
Rule Types
Guardrail rules can match on:
- Exact phrases — Literal string matches.
- Regex patterns — Regular expression matching for flexible pattern detection.
- Block word groups — Reusable blacklist phrase groups.
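The three rule types can be read as three ways of testing a message. The Python sketch below dispatches on a hypothetical `kind` field. It assumes that an exact phrase matches when it appears anywhere in the message (case-sensitive), and it simplifies a group to a plain list of literal phrases. `Rule` and `rule_matches` are illustrative names only.

```python
import re
from dataclasses import dataclass
from typing import Literal

RuleKind = Literal["exact", "regex", "group"]


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule: `value` is the phrase, the regex, or the name of a phrase group.
    kind: RuleKind
    value: str


def rule_matches(rule: Rule, text: str, groups: dict[str, list[str]]) -> bool:
    if rule.kind == "exact":
        # Literal, case-sensitive containment.
        return rule.value in text
    if rule.kind == "regex":
        # Pattern may match anywhere in the message.
        return re.search(rule.value, text) is not None
    if rule.kind == "group":
        # Simplification: a group is a list of literal phrases; unknown groups match nothing.
        return any(phrase in text for phrase in groups.get(rule.value, []))
    raise ValueError(f"unknown rule kind: {rule.kind}")


groups = {"finance-terms": ["wire transfer", "routing number"]}
ssn_rule = Rule("regex", r"\b\d{3}-\d{2}-\d{4}\b")
```

A regex rule such as `ssn_rule` catches formatted values like `123-45-6789` that no fixed phrase list could enumerate, which is the usual reason to prefer a pattern over exact phrases.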
Phases
| Phase | When it runs |
|---|---|
| Prompt | Checks the user’s incoming message before the agent processes it. |
| Response | Checks the agent’s outgoing response before it is sent to the user. |
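One way to picture the two phases is a single conversation turn: prompt-phase guardrails run before the agent is called, and response-phase guardrails run on the agent's reply before it is returned. The Python sketch below is illustrative only. `run_turn`, `Result`, the per-guardrail `action` field ("block" or "flag"), and the refusal text are assumptions that mirror the "block or flag" behavior described in the overview, not the product's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Literal, Optional

Phase = Literal["prompt", "response"]
Action = Literal["block", "flag"]

BLOCKED_REPLY = "Sorry, I can't help with that request."  # hypothetical refusal text


@dataclass
class Guardrail:
    # Hypothetical guardrail bound to one phase, with a block-or-flag action.
    name: str
    phase: Phase
    action: Action
    phrases: list[str]

    def matches(self, text: str) -> bool:
        return any(p.lower() in text.lower() for p in self.phrases)


@dataclass
class Result:
    reply: str
    flags: list[str] = field(default_factory=list)
    blocked_by: Optional[str] = None


def run_turn(user_msg: str, guardrails: list[Guardrail], agent: Callable[[str], str]) -> Result:
    flags: list[str] = []

    def check(phase: Phase, text: str) -> Optional[str]:
        # Return the name of the first blocking guardrail; record flag-only matches.
        for g in guardrails:
            if g.phase == phase and g.matches(text):
                if g.action == "block":
                    return g.name
                flags.append(g.name)
        return None

    # Prompt phase: a blocked message never reaches the agent.
    blocker = check("prompt", user_msg)
    if blocker is not None:
        return Result(BLOCKED_REPLY, flags, blocker)

    reply = agent(user_msg)

    # Response phase: a blocked reply is never sent to the user.
    blocker = check("response", reply)
    if blocker is not None:
        return Result(BLOCKED_REPLY, flags, blocker)
    return Result(reply, flags)
```

Checking the prompt first also avoids calling the agent at all when the incoming message is already disallowed.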