No current AI content detector (including Sapling's) should be used as a standalone check to determine whether text is AI-generated or written by a human. False positives and false negatives will regularly occur.
The top section will show the overall score and highlight portions of the text that appear to be AI-generated.
The bottom section will highlight sentences that appear to be AI-generated.
The detector for the entire text and the per-sentence detector use different techniques, so use them together (along with your best judgement) to make an assessment.
Recently, models such as GPT-3, GPT-3.5, ChatGPT, and GPT-4 have led to the rise of machine-generated content. This synthetic content is increasingly indistinguishable from human-written content.
Despite rapid progress, these models still have shortcomings, such as hallucinated facts, and their availability has consequences, such as enabling cheating in language courses.
This AI text detector tool provides one way of screening whether a piece of content is written by a human or machine.
The detector uses a machine learning system similar to that used to generate AI content. Instead of generating words, the detector outputs, for each word or token in the input text, the probability that it is AI-generated. The result is visualized above for both the entire text and each sentence.
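As a rough illustration of how per-token probabilities could be rolled up into sentence-level and whole-text scores, here is a minimal Python sketch. It is not Sapling's implementation: the function name `aggregate_scores`, the sample probabilities, and the use of a plain average are all assumptions made for this example. The page does not say how token probabilities are combined, and it notes that the whole-text and per-sentence detectors use different techniques.

```python
from statistics import mean


def aggregate_scores(sentences):
    """Combine per-token AI probabilities into illustrative scores.

    `sentences` is a list of sentences, each given as a list of
    per-token probabilities in [0, 1]. Averaging is an assumption
    for illustration only, not the detector's actual method.
    """
    # One score per sentence: the mean of that sentence's token probabilities.
    per_sentence = [mean(tokens) for tokens in sentences]
    # One score for the whole text: the mean over every token.
    all_tokens = [p for tokens in sentences for p in tokens]
    overall = mean(all_tokens)
    return overall, per_sentence


# Hypothetical token probabilities for a two-sentence text.
token_probs = [
    [0.9, 0.8, 0.7],       # sentence 1: tokens look machine-generated
    [0.2, 0.1, 0.3, 0.2],  # sentence 2: tokens look human-written
]
overall, per_sentence = aggregate_scores(token_probs)
print(f"overall: {overall:.2f}")
print("per sentence:", [f"{s:.2f}" for s in per_sentence])
```

In this toy input the first sentence scores much higher than the second, which is the kind of contrast the per-sentence highlighting is meant to surface.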
Accuracy must be measured on a specific test or benchmark. There are also multiple measurements of "accuracy" for detection tools. These measurements balance catching as many AI-generated texts as possible while keeping false positives low. On our internal benchmarks, Sapling catches more than 97% of AI-generated texts while keeping false positives below 3%. Please note that these benchmarks tend to use longer texts and may not be representative of your text.
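The two figures quoted above correspond to what are usually called the detection rate (true positive rate) and the false positive rate. The Python sketch below shows how both could be computed on a labeled benchmark and how they move together as the decision threshold changes. The function name `detection_rates`, the labels, the scores, and the thresholds are all hypothetical values for this example, not Sapling's benchmark data.

```python
def detection_rates(labels, scores, threshold=0.5):
    """Return (true_positive_rate, false_positive_rate).

    labels: True if the text is actually AI-generated, False if human-written.
    scores: detector probability that the text is AI-generated.
    A text is flagged as AI-generated when its score is >= threshold.
    """
    tp = fn = fp = tn = 0
    for is_ai, score in zip(labels, scores):
        flagged = score >= threshold
        if is_ai and flagged:
            tp += 1  # AI text correctly caught
        elif is_ai:
            fn += 1  # AI text missed (false negative)
        elif flagged:
            fp += 1  # human text wrongly flagged (false positive)
        else:
            tn += 1  # human text correctly passed
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical benchmark: four AI-generated texts, then four human-written texts.
labels = [True, True, True, True, False, False, False, False]
scores = [0.95, 0.80, 0.60, 0.30, 0.70, 0.20, 0.10, 0.05]

for threshold in (0.5, 0.75):
    tpr, fpr = detection_rates(labels, scores, threshold)
    print(f"threshold={threshold}: caught {tpr:.0%} of AI texts, "
          f"false positive rate {fpr:.0%}")
```

On this toy data, raising the threshold from 0.5 to 0.75 removes the one false positive but also misses one more AI-generated text, which is the balance the paragraph above describes.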
Sapling can have false positives. The shorter the text is, the more general it is, and the more essay-like it is, the more likely it is to result in a false positive. We are working on improving the system so that this occurs less frequently.
While language models are becoming more advanced, they usually share a similar machine learning architecture and are trained on similar datasets. Hence, even detectors trained on outputs of earlier versions of language models should perform significantly better than random on successive models.
That said, to get the best performance, detectors should be trained on outputs of the latest systems. Sapling regularly updates its detector after re-training it to keep it up-to-date with new systems.