Guardrails

Overview

Guardrails are content safety rules applied to agent conversations. They inspect incoming user messages (prompt phase) or outgoing agent responses (response phase) and block or flag content that violates defined rules.

Viewing Guardrails

Go to Guardrails in the sidebar to see all guardrails in the tenant.

Creating a Guardrail

1. Click Add Guardrail.
2. Fill in the following fields:
   - Name: A descriptive label.
   - Description: What the guardrail protects against.
   - Rules: The logic defining what to match and block.
3. Save the guardrail. It can then be assigned to agents.

Assigning Guardrails to Agents

Guardrails are assigned per agent in the Guardrails tab of the agent configuration. An agent can have multiple guardrails.

Blacklist Phrase Groups

In addition to guardrails, you can manage Blacklist Phrase Groups — reusable collections of phrases or patterns that can be referenced across multiple guardrails.

Creating a Blacklist Phrase Group

1. In the Guardrails section, navigate to Blacklist Phrase Groups.
2. Click New Group.
3. Enter the group name and add phrases, regex patterns, or block words.
4. Save the group and reference it from guardrail rules.
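
The benefit of referencing a group rather than copying its phrases can be shown with a small Python sketch. This is only an illustration of the reuse idea: the dictionaries, the `group_refs` key, the `phrases_for` helper, and all names and phrases are hypothetical and do not reflect how the product stores its data.

```python
# Hypothetical in-memory model: each group is stored once and referenced by name.
phrase_groups = {
    "competitor-names": ["Acme Corp", "Globex"],
}

# Two hypothetical guardrails that both reference the same group.
guardrails = {
    "sales-agent-guardrail": {"group_refs": ["competitor-names"]},
    "support-agent-guardrail": {"group_refs": ["competitor-names"]},
}

def phrases_for(guardrail_name):
    """Resolve a guardrail's group references into a flat list of phrases."""
    refs = guardrails[guardrail_name]["group_refs"]
    return [phrase for ref in refs for phrase in phrase_groups[ref]]

# Editing the group once changes what every referencing guardrail sees.
phrase_groups["competitor-names"].append("Initech")

print(phrases_for("sales-agent-guardrail"))
print(phrases_for("support-agent-guardrail"))
```

Both calls return the same updated list, which is the point of keeping phrases in a shared group instead of duplicating them in each guardrail.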

Rule Types

Guardrail rules can match on:

- Exact phrases: Literal string matches.
- Regex patterns: Regular expressions, for flexible pattern detection.
- Block word groups: References to reusable Blacklist Phrase Groups.
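
As a rough illustration of how the three rule types differ, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `rule_matches` function, the rule dictionaries, the group contents, and the choice of case-insensitive substring matching are hypothetical and are not the product's actual matching logic.

```python
import re

# Hypothetical phrase groups, keyed by group name.
PHRASE_GROUPS = {
    "profanity": ["darn", "heck"],
}

def rule_matches(rule, text, groups=PHRASE_GROUPS):
    """Return True if the (hypothetical) rule matches the text."""
    kind = rule["type"]
    lowered = text.lower()
    if kind == "exact_phrase":
        # Literal match, assumed here to be a case-insensitive substring test.
        return rule["value"].lower() in lowered
    if kind == "regex":
        # Regular expression searched anywhere in the text.
        return re.search(rule["value"], text) is not None
    if kind == "phrase_group":
        # Matches if any phrase in the referenced group appears in the text.
        return any(p.lower() in lowered for p in groups[rule["value"]])
    raise ValueError(f"unknown rule type: {kind}")

rules = [
    {"type": "exact_phrase", "value": "internal use only"},
    {"type": "regex", "value": r"\b\d{3}-\d{2}-\d{4}\b"},  # SSN-like pattern
    {"type": "phrase_group", "value": "profanity"},
]

message = "My number is 123-45-6789."
matched = [r["type"] for r in rules if rule_matches(r, message)]
print(matched)  # ['regex']
```

In this example only the regex rule fires, because the message contains a digit pattern but neither the literal phrase nor any phrase from the group.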

Phases

Phase	When it runs
Prompt	Checks the user’s incoming message before the agent processes it.
Response	Checks the agent’s outgoing response before it is sent to the user.
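
The ordering of the two phases around an agent call can be sketched in Python as follows. This is a simplified model under stated assumptions: the `Guardrail` class, `check`, `handle_turn`, the stand-in `echo_agent`, and the example phrases are all hypothetical, and the sketch only blocks content, whereas a real guardrail may also flag it.

```python
from dataclasses import dataclass, field

@dataclass
class Guardrail:
    # Hypothetical model: one guardrail, one phase, a list of blocked phrases.
    name: str
    phase: str  # "prompt" or "response"
    blocked_phrases: list = field(default_factory=list)

    def violates(self, text):
        lowered = text.lower()
        return any(p.lower() in lowered for p in self.blocked_phrases)

def check(guardrails, phase, text):
    """Return the names of guardrails for this phase that the text violates."""
    return [g.name for g in guardrails if g.phase == phase and g.violates(text)]

def handle_turn(guardrails, user_message, agent):
    # Prompt phase: inspect the user's message before the agent processes it.
    hits = check(guardrails, "prompt", user_message)
    if hits:
        return f"[blocked in prompt phase by: {', '.join(hits)}]"
    reply = agent(user_message)
    # Response phase: inspect the agent's reply before it is sent to the user.
    hits = check(guardrails, "response", reply)
    if hits:
        return f"[blocked in response phase by: {', '.join(hits)}]"
    return reply

# An agent can have multiple guardrails assigned.
guardrails = [
    Guardrail("no-secrets-in", "prompt", ["password"]),
    Guardrail("no-pricing-out", "response", ["discount code"]),
]

def echo_agent(message):
    # Stand-in for a real agent: repeats the user's message.
    return f"You said: {message}"

print(handle_turn(guardrails, "What is my password?", echo_agent))
print(handle_turn(guardrails, "Give me a discount code", echo_agent))
print(handle_turn(guardrails, "Hello", echo_agent))
```

The first message is stopped before the agent runs, the second passes the prompt check but is stopped when the reply is inspected, and the third passes both checks.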