Evolved AI | The problems of AI text detection

Can algorithms currently detect text produced by generative AI? The short answer is ... no, not at the moment. A recent article by European authors tested 12 publicly available tools and two commercial systems (Turnitin and PlagiarismCheck). The researchers conclude that the available detection tools are neither accurate nor reliable, and that they are mainly biased towards classifying output as human-written rather than detecting AI-generated text. Furthermore, content obfuscation techniques significantly worsen the performance of the tools. The following 14 detection tools were tested:

● Check For AI (https://checkforai.com)

● Compilatio (https://ai-detector.compilatio.net/)

● Content at Scale (https://contentatscale.ai/ai-content-detector/)

● Crossplag (https://crossplag.com/ai-content-detector/)

● DetectGPT (https://detectgpt.ericmitchell.ai/)

● Go Winston (https://gowinston.ai)

● GPT Zero (https://gptzero.me/)

● GPT-2 Output Detector Demo (https://openai-openai-detector.hf.space/)

● OpenAI Text Classifier (https://platform.openai.com/ai-text-classifier)

● PlagiarismCheck (https://plagiarismcheck.org/)

● Turnitin (https://demo-ai-writing-10.turnitin.com/home/)

● Writefull GPT Detector (https://x.writefull.com/gpt-detector)

● Writer (https://writer.com/ai-content-detector/)

● ZeroGPT (https://www.zerogpt.com/)

The article reports accuracy across a variety of tests, covering human and mixed predictions and several scoring methodologies. Suffice to say, most tools score below 70% accuracy, that is, an error rate above 30%.
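To make the link between accuracy and error rate concrete, here is a minimal Python sketch. The function name `summarize` and the counts are hypothetical and are not taken from the article; the sketch only shows how accuracy, error rate, and a bias towards the "human" label can be read off a detector's confusion matrix.

```python
def summarize(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Summarize a binary detector, treating "AI-generated" as the positive class.

    tp: AI text flagged as AI        fn: AI text passed as human
    fp: human text flagged as AI     tn: human text passed as human
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,           # accuracy and error rate sum to 1
        "missed_ai_rate": fn / (tp + fn),     # share of AI text passed as human
        "false_alarm_rate": fp / (fp + tn),   # share of human text flagged as AI
    }


# Toy counts, invented for illustration only (not results from the article).
stats = summarize(tp=40, fn=60, fp=5, tn=95)
for name, value in stats.items():
    print(f"{name}: {value:.1%}")
```

A detector that leans towards "human" shows up as a high missed-AI rate alongside a low false-alarm rate, even when its overall accuracy looks moderate.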

The type of document mattered as well. The tests covered the following:

(1) Human-generated text

(2) ChatGPT-generated text (Feb 2023 version)

(3) Human-written text that was machine translated

(4) AI-generated text with adjustments, made by hand or through prompting

(5) Consistency of identification across use cases
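A breakdown by document type can be computed by grouping each detector verdict under the type of document it was given. The sketch below assumes a simple list of (document type, true label, predicted label) records; the function name `accuracy_by_type`, the type names, and the records themselves are hypothetical and do not come from the article.

```python
from collections import defaultdict


def accuracy_by_type(results):
    """Return the share of correct verdicts for each document type.

    results: iterable of (doc_type, true_label, predicted_label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for doc_type, truth, predicted in results:
        total[doc_type] += 1
        correct[doc_type] += int(truth == predicted)
    return {doc_type: correct[doc_type] / total[doc_type] for doc_type in total}


# Toy records, invented for illustration only (not results from the article).
results = [
    ("human", "human", "human"),
    ("human", "human", "human"),
    ("ai", "ai", "ai"),
    ("ai", "ai", "human"),
    ("ai_edited", "ai", "human"),
    ("ai_edited", "ai", "human"),
]
per_type = accuracy_by_type(results)
for doc_type, accuracy in per_type.items():
    print(f"{doc_type}: {accuracy:.0%}")
```

Reporting accuracy per type rather than as one overall figure is what exposes the gap between human-written text and edited AI text.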

The article's graph shows that human-generated text is fairly easy to classify correctly, scoring about 95%, likely due to errors and problems in the text. This is not the case for anything else: simple AI-generated or translated text is about 70% detectable, and there is a rapid fall-off to 40% as soon as that text is touched or edited (through direct editing or prompt engineering). Check out the full article below.

https://arxiv.org/abs/2306.15666

Authors

Dr Michael G Kollo