Skip to main content

Guardrails

Despite their amazing capabilities, LLMs can often behave in undesired ways, such as by generating offensive or toxic text, or by revealing sensitive information.

To use an LLM in production, guardrails should be put in place to avoid such behavior.

Sapling's endpoints are designed and trained from the ground up to avoid the need for guardrails. However, with the vast savings from using pre-trained models (which are then finetuned for specific tasks), training models with cleaned data from scratch is usually not feasible for most teams.

Here we describe some offerings where Sapling can assist with avoiding the generation of undesirable text.

Profanity

Profanity refers to swear words or other offensive words. Sapling offers a profanity filter utility but internally uses a more advanced filtering system. Contact us for access.

NSFW

NSFW stands for "not safe for work" or alternatively "not suitable for work". This can often refer to adult content that should not be viewed at work or in public spaces.

Sapling adapts its profanity filter to also detect NSFW content. Contact us for access.

PII

PII stands for Personally Identifiable Information. PII can include names, addresses, emails, phone numbers, and other information that allows the identity of an individual to be inferred. This is especially important to safeguard in the case where PII can be tied to other sensitive information, such as medical information about a patient.

Tone

Oftentimes businesses may want user-facing messaging to have a specific tone. Sapling detects over 25 tones grouped into those with positive and negative sentiment. Test out our tone and sentiment utilities.

Secrets

Secrets refers to programming artifacts such as API keys, encryption keys (such as SSH keys), access tokens, and passwords. These can result in major security breaches if exposed.

Numerical Expressions

Numerical expressions can include data such as SSNs, phone numbers, and financial data. There is significant overlap between numerical expressions and Sapling's PII module; you may wish to use one or the other or both.

Prompt Injection

Based on the SQL injection vulnerability, prompt injection is a way for malicious users to get an LLM to generate unintended outputs such as the internal system prompt, in some cases even allowing for data exfiltration.

Gibberish

While this is becoming less common, occasionally highly unexpected inputs or a poor decoding process can result in gibberish generated by an AI system.