AI Detector

The endpoint computes the probability that a piece of text is AI-generated, as well as the probability that each constituent sentence and token is AI-generated.

The system is trained to handle LLMs from different vendors, including OpenAI's GPT family of models, Google's Gemini models, Anthropic's Claude models, and the openly released Llama and Mistral models. It is also somewhat robust to small edits and noisy text.

Accuracy and Trade-offs

All AI detection systems have false positives and false negatives. In some cases, small modifications to AI-generated text can cause that text to no longer be flagged as AI-generated. In other cases, human-written (but perhaps rote) text can be misclassified as AI-generated. Depending on the application, false positives or false negatives may be less desirable. Contact us for ways to adjust for your use case.

Sample Code

curl -X POST https://api.sapling.ai/api/v1/aidetect \
-H "Content-Type: application/json" \
-d '{"key":"<api-key>", "text":"This is sample text."}'
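The same request can be made from Python. The sketch below builds the equivalent POST request with the standard library; the placeholder API key is a stand-in, and actually sending the request is left to the caller:

```python
import json
import urllib.request

API_URL = "https://api.sapling.ai/api/v1/aidetect"

def build_aidetect_request(key: str, text: str) -> urllib.request.Request:
    """Construct a POST request matching the curl example above."""
    body = json.dumps({"key": key, "text": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_aidetect_request("<api-key>", "This is sample text.")
# To send it:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```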
AI Detector UI integration

Sapling's JavaScript SDK provides a complete end-to-end UI integration for AI content detection capabilities. Head over to our AI Detect JavaScript Quickstart for more details.

AI Detector POST

Request Parameters

https://api.sapling.ai/api/v1/aidetect

HTTP method: POST

The AI Detector API POST endpoint takes JSON parameters documented below:

key: String
32-character API key.

text: String
Text to run detection on. The limit is currently 50,000 characters. If latency is high or requests time out, we recommend adapting this script. Please contact us if you need to run the system on longer inputs. We can also provide suggestions on how to chunk your text into smaller pieces and then combine detection results.
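One simple chunking strategy (an illustration, not an official recommendation) is to split the input on paragraph boundaries, score each chunk separately, and combine the per-chunk scores weighted by chunk length:

```python
def chunk_text(text: str, max_chars: int = 50_000) -> list[str]:
    """Greedily pack paragraphs into chunks under the character limit."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + 2 + len(para) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = current + "\n\n" + para if current else para
    if current:
        chunks.append(current)
    return chunks

def combine_scores(chunks: list[str], scores: list[float]) -> float:
    """Length-weighted average of per-chunk AI scores."""
    total = sum(len(c) for c in chunks)
    return sum(len(c) * s for c, s in zip(chunks, scores)) / total
```

Paragraphs longer than the limit would still need further splitting (e.g. on sentences), which this sketch does not handle.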

tokens: List of Strings
Returned in the response: the list of tokens from the backend tokenizer, which can be used together with token_probs to visualize the output prediction per token.

token_probs: List of Floats
Returned in the response: the probability that each token is AI-generated. This can be used with tokens to visualize the output prediction per token.
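For example, tokens and token_probs can be zipped together to flag the tokens most likely to be AI-generated; the token values and probabilities below are hypothetical, and the threshold is arbitrary:

```python
def flag_tokens(tokens, token_probs, threshold=0.9):
    """Return (token, probability) pairs at or above the threshold."""
    return [(t, p) for t, p in zip(tokens, token_probs) if p >= threshold]

# Illustrative values; real ones come from the API response.
tokens = ["This", "is", "sample", "text", "."]
token_probs = [0.95, 0.40, 0.97, 0.92, 0.10]
```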

sent_scores: Boolean
Whether to return sentence scores. Defaults to true. If speed is of the essence, you can disable this setting.

score_string: Boolean
Whether to return a string highlighting token-level scores. Defaults to false. This allows you to visualize which portions of the text are likely AI-generated, similar to the display on Sapling's AI detector page.

version: String
There are currently three versions of the detector available:

  1. 20230317
  2. 20231024
  3. 20240606 (current default)

While we have found the later versions to be more performant, you may wish to use the older version to ensure consistency in your application.

Response Parameters

A score from 0 to 1 will be returned, with 0 indicating maximum confidence that the text is human-written and 1 indicating maximum confidence that the text is AI-generated.

If score_string is set to true, a score_string field will be provided. The field contains an HTML string with a heatmap of the portions of the text that are predicted to be AI-generated. If the default score string is not what you desire, you can generate your own using tokens and token_probs.
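A minimal custom heatmap can wrap each token in a span whose background opacity tracks its probability. This is only a sketch; the markup Sapling's own score_string produces may differ:

```python
import html

def heatmap_html(tokens, token_probs):
    """Wrap each token in a span shaded by its AI probability."""
    spans = []
    for token, prob in zip(tokens, token_probs):
        spans.append(
            f'<span style="background: rgba(255, 0, 0, {prob:.2f})">'
            f"{html.escape(token)}</span>"
        )
    return " ".join(spans)
```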

A field sentence_scores containing scores for each sentence will also be returned. The per-sentence scores may not correlate with the overall score field as they're computed using a different method from the overall score.

The AI Detector POST endpoint returns JSON of the following format:

{
  "score": 0.98,
  "sentence_scores": [
    {
      "sentence": "Here is a sentence.",
      "score": 0.999
    }
  ]
}
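Given such a response, the sentence_scores list can be filtered to surface the sentences most confidently flagged. The second sentence and the threshold below are hypothetical, added only for illustration:

```python
response = {
    "score": 0.98,
    "sentence_scores": [
        {"sentence": "Here is a sentence.", "score": 0.999},
        {"sentence": "Another sentence.", "score": 0.12},  # made-up example entry
    ],
}

def flagged_sentences(resp, threshold=0.5):
    """Sentences whose per-sentence AI score meets the threshold."""
    return [s["sentence"] for s in resp["sentence_scores"] if s["score"] >= threshold]
```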

Tips

Checking Files (PDF/DOCX)

You may sometimes wish to send PDF or DOCX files to the API.

To do this, refer to the Files documentation to see how you can extract text from files before passing the text to the API. These endpoints are currently provided free of charge; however, if you plan to use them for high volumes of text, contact us and ensure you're also using one of the other endpoints; otherwise we may limit usage to reduce server load.