Safety filters ensure that your chatbot responds appropriately to sensitive or harmful messages. In this article, you will learn how to set them up.
In the Situations section you will find the button that takes you to the Safety filters.
We distinguish between:
1. Hate speech
2. Threatening hate
3. Self-harm content
4. Sexual content
5. Minor safety
6. Violence
7. Graphic Violence
It is important to set up the safety filters so that the chatbot's responses match the tone of voice and the policies of your company. How would your employees themselves react to such expressions?
Here are some examples:
Hate speech
When someone expresses hate, respond with: 'I don't like you talking to me like that! You wouldn't appreciate me talking to you that way, would you?'
Threatening hate
If someone comes across as threatening, respond with: 'I don't feel comfortable with what you're saying. Can we talk about something else?'
Self-harm content
If someone talks about self-harm or hurting themselves, respond with: 'I'm sorry to read this. I think it would be good for you to contact someone you trust about this. If you feel unsafe, find a place where you feel safer. Help is always nearby; call 113 for immediate help from a professional.'
Sexual content
If someone makes sexual comments, respond with: 'I won't get into this, I like to keep it professional and businesslike.'
Minor safety
If the conversation suggests a minor is unsafe, respond with: 'I take this very seriously. It's good to talk about this with someone you trust. In threatening situations, call 911!'
Violence
If someone makes comments that include violence, respond with: 'I don't like violence! Are you in danger yourself? If so, it's a good idea to contact someone you trust. If you feel unsafe, find a place where you feel safer. In threatening situations, call 911!'
Graphic violence
If you receive messages or images that depict violence or bodily harm in detail, respond with: 'I don't like violence. Can we talk about something else?'
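The filters themselves are configured in the Safety filters screen, not through code. Still, if you like to draft and review all of your instructions in one place first, a simple structure such as the one below can help. This is only an illustrative sketch in Python and not part of the product; the category names and instructions are the ones from this article.

# Illustrative sketch only: safety filters are set up in the Safety filters
# screen, not through code. This mapping simply mirrors the categories and
# example instructions from this article, e.g. for drafting and review.
SAFETY_FILTERS = {
    "Hate speech": (
        "When someone expresses hate, respond with: 'I don't like you "
        "talking to me like that! You wouldn't appreciate me talking to "
        "you that way, would you?'"
    ),
    "Threatening hate": (
        "If someone comes across as threatening, respond with: 'I don't "
        "feel comfortable with what you're saying. Can we talk about "
        "something else?'"
    ),
    "Self-harm content": (
        "If someone talks about self-harm or hurting themselves, respond "
        "with: 'I'm sorry to read this. I think it would be good for you "
        "to contact someone you trust about this. If you feel unsafe, "
        "find a place where you feel safer. Help is always nearby; call "
        "113 for immediate help from a professional.'"
    ),
    "Sexual content": (
        "If someone makes sexual comments, respond with: 'I won't get "
        "into this, I like to keep it professional and businesslike.'"
    ),
    "Minor safety": (
        "If the conversation suggests a minor is unsafe, respond with: "
        "'I take this very seriously. It's good to talk about this with "
        "someone you trust. In threatening situations, call 911!'"
    ),
    "Violence": (
        "If someone makes comments that include violence, respond with: "
        "'I don't like violence! Are you in danger yourself? If so, it's "
        "a good idea to contact someone you trust. If you feel unsafe, "
        "find a place where you feel safer. In threatening situations, "
        "call 911!'"
    ),
    "Graphic violence": (
        "If you receive messages or images that depict violence or bodily "
        "harm in detail, respond with: 'I don't like violence. Can we "
        "talk about something else?'"
    ),
}

# Example: print the drafted instruction for one category before pasting it
# into the Safety filters screen.
print(SAFETY_FILTERS["Self-harm content"])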
Note: write the safety filters as instructions to the chatbot, not as the literal responses you want the chatbot to give.
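For example, instead of entering only the literal sentence 'I don't like violence. Can we talk about something else?', enter the full instruction from the example above: 'If you receive messages or images that depict violence or bodily harm in detail, respond with: I don't like violence. Can we talk about something else?' That way the chatbot knows when to use the response, not just what to say.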