The challenge of AI checkers

Roxy Paine's "Breach" juxtaposes the natural with the contrived.

Several faculty who struggled with unapproved AI-assisted work by students in the spring term asked the Center for Transformative Teaching about AI checkers. Nate Pindell, senior instructional designer, explains the challenges of AI checkers and shares additional ideas and resources.

The shadow of doubt that generative artificial intelligence has cast on student-written papers and assessments has grown larger and darker. A survey last November revealed that 56% of college students say they have used AI on assignments or exams, though the survey did not specify whether that use was something their instructors would consider cheating. Nonetheless, many instructors have expressed concern about students using AI in unauthorized ways to complete their coursework, and several have asked about using AI checkers to detect unauthorized AI use.

What are AI checkers?

AI checkers are software packages that examine a composition and return a probability, expressed as a percentage, that pieces of the work, or the whole of it, were crafted by AI. According to the companies that make them, these checkers have a false positive rate of less than .001%. So why would instructors choose not to use them? For that answer, and answers to other questions, we need to explore how AI and AI checkers work.

How AI are trained and work.

The first thing to know about AI is that they do not understand anything. AI cannot read context clues or decipher meaning from prompts. The best way to think of these systems is as overeager, hyper-literal interns. Even if a response from an AI sounds like a person, that is only because of its training. You are using, for the purpose of comparison, software that acts like a robotic parrot, learning phrases and answering requests from the echoes of billions of other inputs. It is also important to keep in mind that each of these “robotic parrots” is unique to the company that created it. A collection of proprietary algorithms is used to train each AI. Gemini, Google’s AI, will learn and behave differently than OpenAI’s ChatGPT, and there are hundreds of other generative AI in use, all of which differ in certain ways.

AI turns words and phrases into a “language” the machine can understand: numbers. After converting all the words (often across a multitude of human languages), it begins to learn connection probabilities between those numbers. For example, let us say that the word “honey” has the number 97 (the real numerical value would be much different, but for our example that is all we need). The AI looks at other numbers that have a large connection probability with it; say 12, 24, and 193, which could be “bee,” “comb,” and “nut” respectively. The AI, using the prompt as a “seed” of sorts, then chooses whichever word is probably best. If the prompt asked what type of cereal the user should have, then 193 might be selected for “Honey Nut Cheerios.” The AI then works from one word to the next, each time looking at the connections, the previously written response, and the prompt, forging a new connection, and then repeating. This is why AI take so much computational power, and actual electrical power. If you repeatedly give an AI the same prompt, you will get different results, albeit often only slightly different. If an AI always used the highest probability, should the results not all be the same? Yes, they should, except that variance is baked into each AI.
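
To make the word-to-word idea concrete, here is a minimal sketch of next-word selection in Python. The vocabulary, token numbers, and probabilities are all invented for this illustration; real models learn their connections from billions of examples.

```python
# Toy sketch of next-token prediction. Every value here is invented for
# illustration; real models learn their weights from enormous datasets.

# Each word is converted to a number (a "token ID").
token_ids = {"honey": 97, "bee": 12, "comb": 24, "nut": 193}

# Learned connection probabilities from "honey" (97) to candidate next tokens.
connections = {97: {12: 0.31, 24: 0.27, 193: 0.42}}

def next_token(current_id):
    """Pick the next token with the strongest connection probability."""
    candidates = connections[current_id]
    return max(candidates, key=candidates.get)

# Starting from "honey" (97), the strongest connection is "nut" (193),
# which a cereal-related prompt might steer toward "Honey Nut Cheerios."
print(next_token(token_ids["honey"]))  # -> 193
```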

Chris Anderson, the head of TED, explains it this way: Imagine that you are placed on a planet made of nothing but mountains and valleys. You are told that you must find the lowest valley, and you are only able to walk. You proceed to walk downhill until you arrive at the bottommost part of a valley. But is that the lowest point on the planet? Or just the lowest point in your immediate area? Well, to find that out, you would need to walk uphill to see other valleys! Or at least to get to them. That is why AI “jiggle.” An AI must be told that sometimes it must move counter to its goal (up mountains) to find its best answer (the deepest valley).

The AI “jiggles” the choices it makes. Maybe it chooses a word that is almost as strong as the top connection. This causes the work to diverge from previous attempts. Every word has a chance to “jiggle” to a new connection, and that connection can then also “jiggle,” or not. Hallucinations (when the AI fabricates information) can occur for this reason, as well as a few others.
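
Continuing the toy example above, here is one common way the “jiggle” can be implemented: sampling the next word in proportion to its connection strength rather than always taking the top choice. Again, the probabilities are invented for illustration.

```python
import random

# Toy sketch of the "jiggle": rather than always taking the strongest
# connection, sample in proportion to probability, so a slightly weaker
# word is sometimes chosen. The probabilities are invented for illustration.
candidates = {"nut": 0.42, "bee": 0.31, "comb": 0.27}

def jiggled_choice(options):
    """Sample a next word weighted by its connection probability."""
    words = list(options)
    weights = list(options.values())
    return random.choices(words, weights=weights, k=1)[0]

# The same "prompt" run repeatedly yields slightly different results.
print([jiggled_choice(candidates) for _ in range(5)])
# e.g. ['nut', 'nut', 'bee', 'nut', 'comb']
```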

There are three main points to take away from this section.

  1. There exist hundreds to thousands of generative AI, each trained differently.
  2. Generative AI write by using learned probability connections between words.
  3. A slight variance is encoded to create unique but still applicable responses.


How AI checkers work.

Probability is the name of the game for both generative AI and AI checkers. Where generative AI use probability to write, AI checkers use probability to grade. When an AI checker is fed a composition, the software looks at the probability of one word following the next. The AI checker then returns yet another probability: that a phrase, sentence, paragraph, or the entire composition was written by an AI.

A paragraph flagged with a rating of 90% or higher means that, going from word to word, the AI checker found the connection probability from one word to the next (what I will call a confidence score) so high that it is unlikely a human is the author. If a sentence has only a moderate word-to-word connection probability, the AI checker may return a confidence score of 50% or less. The AI checker observes and weighs the probability of word-to-word connections, one after the other, to assess the likelihood of human- versus AI-written work.
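
As a rough sketch, that scoring logic can be caricatured like this. The pairwise probabilities are invented, and real checkers use trained language models rather than a lookup table, but the idea of averaging word-to-word predictability into a single confidence score is the same.

```python
# Toy sketch of an AI checker's scoring. The pairwise connection
# probabilities are invented; real checkers use trained language models.
pair_probability = {
    ("the", "quick"): 0.92,
    ("quick", "brown"): 0.95,
    ("brown", "fox"): 0.97,
}

def confidence_score(words):
    """Average the word-to-word connection probabilities.

    Consistently strong connections (very predictable text) push the
    score up, which the checker reads as "likely AI-written."
    """
    pairs = zip(words, words[1:])
    probs = [pair_probability.get(p, 0.5) for p in pairs]  # 0.5 if unseen
    return sum(probs) / len(probs)

# A highly predictable phrase earns a high confidence score.
print(confidence_score(["the", "quick", "brown", "fox"]))  # -> ~0.95
```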

Why not use AI checkers?

From the above you might be thinking, “That seems pretty cut and dried! Why are we not relying on these tools to assist us in checking for academic dishonesty?”

Well, it’s more complicated than that.

High confidence scores are good for business.

TurnItIn was one of the very first companies to market an AI checker. When the tool was first released, the company confidently stated that it had a less than one percent false positive rate. In other words, fewer than 1 out of every 100 submissions would return a result that a student cheated when they did not. However, TurnItIn left out some important information in its marketing.

  1. The first omission was that the tool had only been trained on ChatGPT 3.5. The dozens of other well-known and widely used AI were not part of its training.
  2. The next omission was that the advertised rate applied only to large compositions that were completely AI generated. In other words, it was tested only on documents of hundreds to thousands of words whose word-to-word probability scores produced near-certain confidence (99%+) that the composition was written by AI. This claim is especially problematic since research on high school students’ use of AI found that only 1.45% of students at private high schools, and only 19.68% of students at public high schools, said they had used AI to “write all of a paper, project, or assignment.”


When TurnItIn’s checker began returning a higher false positive rate than advertised, and instructors often found themselves navigating low confidence scores on sentences and phrases, the truth came out about TurnItIn’s marketing tactics, prompting Chief Product Officer Annie Chechitelli to revise the reported false positive rate. But TurnItIn is not alone in its claims. Many other AI checkers that report false positive rates lower than 1% do not disclose the factors that help generate such high confidence scores.

What confidence score works for you?

Confidence scores can range from 100% all the way down to 10%, and it is up to the instructor to decide how much weight to give them. What confidence score would be high enough for you to be sure that a student cheated and to prompt you to act?

Let’s say your best student gets a confidence score of 75% for AI-written material. You’re surprised, but you know their work has always been above average. Do you let it pass? Meanwhile, another student, who writes less consistently well, gets a confidence rating of 30% on a passage. Does this affect your inclination to trust the AI checker? Inconsistent application and false positives can be a big problem for instructors.

Let’s look at how false positives might be generated.

Who is more likely to receive a high confidence score?

Those who use tools to improve their writing.

As I stared at this document in its first draft, I saw, above and below my lines of text, marks of all different shapes and colors: squiggles, double lines, reds, greens, and blues. All of these were alerting me that Microsoft Word and Grammarly had suggestions on how I could craft my composition in more coherent ways, starting with the fact that “TurnItIn,” much to my annoyance, is not in the lexicon of Word’s library.

Tools such as Grammarly use machine learning to assist with spelling, grammar, and, increasingly, composition checks. Maybe you have right-clicked on a sentence marked by one of these tools and thought, “This recommended sentence does sound much better than my own.” Congratulations! You just set that section up to be flagged with moderate to high confidence of using AI.

That is the problem many students face. Using tools and resources that exist internally in software programs like Microsoft Word or online such as Grammarly can contribute to false positives.

People learning English

¿Dónde está la biblioteca? If you have ever taken a Spanish class, this question probably resonates in some deep part of your being: a simple phrase taught early in the course. When learning a new language, simple and predictable is best. That same simplicity is exactly what AI checkers look for. Simple sentences and phrases are taught to assist a person learning English, and these foundational lessons are what learners rely on when speaking and writing the language. Common and simple words, phrases, and sentences return high confidence scores of AI cheating. And the AI checkers do not stop at phrases; they often target entire compositions and give high confidence ratings that the entire work was AI crafted.

Neurodivergent Authors

Students who are neurodivergent (autism, ADHD, dyslexia, etc.) are also prone to receive false positive ratings. There is not a “one reason fits all” diagnosis, but it is often related to a reliance on repeated phrases, terms, and words. This is a sort of “compositional masking,” in which neurodivergent individuals learn pattern recognition rather than prose. Even the voice and warmth of a message can be cause for concern, both for AI checkers and for human readers, as Purdue professor Rua Mae Williams found when they were accused of being an AI bot. Avoiding personal pronouns (I, me, we) can, depending on the AI checker and the language models it was trained on, be misconstrued as AI writing.

Word choice is incredibly important. Wordsmithing is as unique as the individuals who write. Inside those compositions are fragments learned here and there, from previous successes or borrowed from books and articles. These pieces, especially for those of us who are neurodivergent, can draw high confidence scores from AI checkers.

What do we do about cheating with AI?

While high school students admit to cheating at about the same rates as they did before AI, both students and instructors are concerned about AI cheating. Even the markets assumed AI would be used for cheating, as evidenced by Chegg’s stock dropping 50% as people began to make use of AI. Finally, there are already cases that instructors have brought to my and my colleagues’ attention.

So, what to do? In general, consider what you would have done if you suspected academic dishonesty before there was AI or TurnItIn on Canvas. After all, students have long paid others to do their work, made use of stolen test banks, or collaborated in unapproved ways. While AI has made offloading work more accessible, the processes and policies for addressing cheating are already in place.

In general, the approach the CTT currently recommends is to begin with a policy that fits your course and your assignments and assessments. In some cases, there may be a general policy in addition to clear assignment-specific policies, so that students understand what you consider cheating. For example, is it acceptable to use AI to generate ideas? That was the second most common use of AI reported in the study of high school students. The most common was using AI to explain concepts.

There are also ways to modify assignments and assessments so that AI is a less able collaborator, or so that AI is used and referenced appropriately for that class and assignment. For example, perhaps you allow AI usage but expect that the transcript of the prompts used by the student and the output produced by the AI be submitted with the assignment.

In their recent book, Teaching with AI, Bowen and Watson have several recommendations for addressing cheating, creating policies, and adapting assignments and assessments. The text is available through University Libraries as an audiobook.

Finally, use AI yourself for different types of tasks, and test your own assignments with it. If you do not want to create an account, contact an instructional designer assigned to your college. They can demonstrate how different AIs work and how they might be used on assignments or assessments in your class, and help you craft a plan for addressing AI use this fall. You can also join the “Teaching with AI” learning community offered this fall.

This article and others like it are published and archived as part of the CTT's "AI Exchange," an online blog of ideas around the use of AI in teaching and learning from those in the University of Nebraska community.

Bowen, J. A., & Watson, C. E. (2024). Teaching with AI (First edition) [Audio recording]. Ascent Audio.

Brockman, J. (Ed.) (with Anderson, C.). (2019). Possible Minds: 25 Ways of Looking at AI. Penguin Press.

Coffey, L. (n.d.). Students and Professors Believe AI Will Aid Cheating. Inside Higher Ed. Retrieved August 20, 2024

D’Agostino, S. (n.d.). Turnitin’s AI Detector: Higher-Than-Expected False Positives. Inside Higher Ed. Retrieved August 20, 2024

Developing course policies around A.I. | Center for Transformative Teaching | Nebraska. (n.d.). Retrieved August 21, 2024

Help Prevent Academic Misconduct | Student Conduct & Community Standards | Nebraska. (n.d.). Retrieved August 21, 2024

Jimenez, K. (2023, April 12). How AI detection tool spawned a false cheating case at UC Davis. USA Today.

Lee, V. R., Pope, D., Miles, S., & Zárate, R. C. (2024). Cheating in the age of generative AI: A high school survey study of cheating behaviors before and after the release of ChatGPT. Computers and Education: Artificial Intelligence, 7, 100253.

Mathewson, T. G. (2023, August 14). AI Detection Tools Falsely Accuse International Students of Cheating – The Markup.

Nam, J. (n.d.). 56% of College Students Have Used AI on Assignments or Exams | BestColleges. Retrieved August 20, 2024

Reporter, J. K. S. (2023, July 26). Prof accused of being AI bot. Purdue Exponent.

Schafer, J. (2024, April 30). Chegg stock crashes as free AI tools send online education company “spiraling.” Yahoo Finance.

University of Nebraska Student Code of Conduct | Student Conduct & Community Standards | Nebraska. (n.d.). Retrieved August 21, 2024

More details at: https://go.unl.edu/teachingandAI