Research Guides: Generative AI and Critical Evaluation: Assessing AI-based tools for accuracy

Assessing AI-based tools for accuracy

Introduction: analyzing AI-generated information

Although many responses produced by AI text generators are accurate, AI also frequently generates misinformation, and its answers are often a mixture of truth and fiction. If you are using AI-generated text for research, you must be able to verify its outputs. You can use many of the skills you already use to fact-check and think critically about human-written sources, but some of your strategies will have to change. For instance, you can’t check the information by evaluating the credibility of the source or the author, as you usually would. You must use other methods, like lateral reading, which we’ll explain below.

Remember, the AI produces what it predicts is the most likely series of words to answer your prompt, not necessarily a true one, so it might not be providing you with accurate information. When choosing to use AI, it’s wise to treat its output as a starting point and not an end point. Being able to critically analyze the outputs that AI gives you will be an increasingly crucial skill throughout your studies and your life after graduation.

When AI gets it wrong

As of summer 2023, a typical AI model isn't assessing whether the information it provides is correct. When it receives a prompt, its goal is to generate what it thinks is the most likely string of words to answer that prompt. Sometimes, this results in a correct answer, but sometimes it doesn’t – and the AI cannot interpret or distinguish between the two. It’s up to you to make the distinction.
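
To make the idea of "most likely string of words" concrete, here is a deliberately tiny sketch. It is not how ChatGPT or any real tool is built; real models are vastly larger and more sophisticated. The training text, function names, and the example fact are all invented for illustration. The sketch only counts which word most often follows another, and it shows how a frequently repeated false statement can win over a true one.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count which word follows which in the training text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def most_likely_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

# Hypothetical training text: the false statement (Sydney) appears
# more often than the true one (Canberra).
training_text = (
    "the capital of australia is sydney . "
    "the capital of australia is sydney . "
    "the capital of australia is canberra ."
)
model = train_bigrams(training_text)

# The toy model picks the most common continuation, not the correct one.
print(most_likely_next(model, "is"))  # prints: sydney
```

Nothing in this toy model checks whether "sydney" is correct; it only reflects what appeared most often in its training text.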

AI can be wrong in multiple ways:

- It can give the wrong answer
- It can omit information by mistake
- It can make up completely fake people, events, and articles
- It can mix truth and fiction

Explore each section below to learn more.

It can give a wrong or misleading answer

Sometimes, an AI will confidently return an incorrect answer. AI-generated text can include factual errors or omit important information.

It can make up false information

Sometimes, rather than simply being wrong, an AI will invent information that does not exist. Some people call this a “hallucination,” or, when the invented information is a citation, a “ghost citation.”

These inaccuracies are trickier to catch, because they often contain a mix of real and fake information.

When ChatGPT gives a URL for a source, it often makes up a fake URL, or uses a real URL that leads to something completely different. It’s key to double-check the answers AI gives you against a human-created source. You can find out how to fact-check AI text in the sections and videos at the bottom of this page.

It cannot accurately produce its sources

Currently, if you ask an AI to cite its sources, the sources it gives you are very unlikely to be where it actually got the information. In fact, neither the AI nor its programmers can truly say where in its enormous training dataset the information comes from.

As of summer 2023, even an AI that provides real footnotes isn't citing where its information came from, just an assortment of webpages and articles that are roughly related to the topic of the prompt. If you ask again, the AI can provide the exact same answer but footnote different sources.

For example, the two screenshots below are responses to the same prompt:

In the first screenshot, the user asked the AI to use only peer-reviewed sources. When you compare the two, you can see that the AI cites different sources for word-for-word identical sentences. This means that these footnotes are not where the AI sourced its information. (Also note that the sources on the right are all either not peer-reviewed or not relevant. Plus, artsy.net, history.com, and certainly theprouditalian.com are not reliable enough for you to source from in your assignments.)
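
If you want to run this kind of comparison yourself, the reasoning can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, and the sentences and source names below are invented placeholders, not the content of the screenshots. Each response is represented as a mapping from a sentence to the set of sources footnoted for it.

```python
def mismatched_citations(response_a, response_b):
    """Find sentences that appear word-for-word in both responses
    but are footnoted with different sources."""
    shared = response_a.keys() & response_b.keys()
    return {
        sentence: (response_a[sentence], response_b[sentence])
        for sentence in sorted(shared)
        if response_a[sentence] != response_b[sentence]
    }

# Invented placeholder data for two responses to the same prompt.
first = {
    "Sentence one about the topic.": {"journal-a.example"},
    "Sentence two about the topic.": {"journal-b.example"},
}
second = {
    "Sentence one about the topic.": {"blog-c.example"},
    "Sentence two about the topic.": {"journal-b.example"},
}

for sentence, (sources_a, sources_b) in mismatched_citations(first, second).items():
    print(sentence, sorted(sources_a), sorted(sources_b))
```

Any sentence this reports is identical in both responses yet attributed to different sources, which is a sign that the footnotes were attached after the fact rather than used to write the sentence.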

This matters because an important part of determining a human author’s credibility is seeing what sources they draw on for their argument. You can go to these sources to fact-check the information they provide, and you can look at their sources as a whole to get insight into the author’s process, potentially revealing a flawed or biased way of information-gathering.

You should fact-check AI outputs as you would a text that provides no sources, such as some online articles or social media posts. You’ll determine its credibility by looking to outside, human-created sources (see lateral reading below).

It can interpret your prompts in an unexpected way

AI can accidentally ignore instructions or interpret a prompt in a way you weren’t expecting. A minor example of this is ChatGPT returning a 5-paragraph response when it was prompted to give a 3-paragraph response, or ignoring a direction to include citations throughout a piece of writing. It can also misinterpret a prompt in more significant ways that are easier to miss. If you’re not familiar with the topic you’re asking an AI-based tool about, you might not even realize that it’s interpreting your prompt inaccurately.

Lateral reading: your #1 analysis tool

If you cannot take AI-cited sources at face value and neither you nor the AI’s programmers can determine where the information comes from, how are you going to assess the validity of what AI is telling you? Here, you can use a helpful evaluation method: lateral reading. Lateral reading means leaving the AI output and consulting other sources to fact-check what the AI has provided in response to your prompt. You can think of this as “tabbed reading”: moving laterally away from the AI output to sources in other tabs, rather than just proceeding “vertically” down the page and relying on the AI output alone.

What does this process look like specifically with AI-based tools? Learn more in the sections below.

Lateral reading and AI

You can use lateral reading when evaluating any online source, but you will find fewer pieces of information to assess through lateral reading when working with AI. While you can typically reach a consensus about online sources by searching for a source’s publication, funding organization, author, or title, none of these bits of information are available to you when assessing AI output. As a result, you must read several sources outside the AI tool to determine whether credible, non-AI sources can confirm the information the tool returned.

With AI, instead of asking, “Who’s behind this information?” we have to ask, “Who can confirm this information?” In the video above, lateral reading is applied to an online source with an organization name, logo, URL, and authors whose identities and motivations can be researched and fact-checked from other sources. AI content has no identifiers, and AI output is a composite of multiple unidentifiable sources. This means you must take a look at the factual claims in AI content and decide on the validity of the claims themselves rather than the source of the claims.

Since AI output is not a single source of information but is instead drawn from multiple sources, some factual and some false, you will find it useful to break AI output apart into smaller components of information which you can then evaluate.

Instructions: tackle an AI fact-check

Diagram of a fact-checking process for AI. The diagram is titled “AI-Fact Checking” and shows a linear flow chart with five steps, represented by a series of different-colored arrows. The text of the diagram reads: Step 1: Break It Down. Break down the information. Identify specific claims. Step 2: Search. Look for information supporting a specific claim. For specific info claims: try Google or Wikipedia. For confirming something exists: try Google Scholar or WorldCat. Step 3: Analyze. Consider the info discovered in light of assumptions: What did your prompt assume? What did the AI assume? What perspective or agenda do your fact-check findings hold? Step 4: Decide. What is true? What is misleading? What is factually incorrect? Can you update your prompt to address any errors? Step 5: Repeat/Conclude. Repeat this process for each of the claims identified in the

Here's how to fact-check something you got from ChatGPT or a similar tool:
1. Break down the information. Look at the response and see if you can isolate specific, searchable claims. This is called fractionation.
2. Then it’s lateral reading time! Open a new tab and look for supporting pieces of information. Here are some good sources to start with:
  - When searching for specific pieces of information: Google results or Wikipedia
  - When seeing if something exists: Google Scholar, WorldCat, or Wikipedia
  - Tip: Some things to watch out for – is the AI putting correct information in the wrong context? Is it attributing a fake article to a real author?
3. Next, think deeper about what assumptions are being made here.
  - What did your prompt assume?
  - What did the AI assume?
  - Who would know things about this topic? Would they have a different perspective than what the AI is offering? Where could you check to find out?
4. Finally, make a judgment call. What here is true, what is misleading, and what is factually incorrect? Can you re-prompt the AI to try and fix some of these errors? Can you dive deeper into one of the sources you found while fact-checking? Remember, you’re repeating this process for each of the claims the AI made – go back to your list from the first step and keep going!
For an example of this in action, view the video at the bottom of the page.
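
If you like to work systematically, the first two steps above (break down, then search) can be sketched as a small Python helper. Treat this as a rough sketch, not a recommended tool: the function names and the sample text are hypothetical, splitting on sentence-ending punctuation is only a crude stand-in for identifying claims, and the search URL patterns are assumptions about how these sites currently accept queries. The judgment in steps 3 and 4 still has to come from you.

```python
import re
from urllib.parse import quote_plus

# Assumed search URL patterns for the sources suggested in step 2.
SEARCH_TEMPLATES = {
    "Google": "https://www.google.com/search?q={query}",
    "Wikipedia": "https://en.wikipedia.org/w/index.php?search={query}",
    "Google Scholar": "https://scholar.google.com/scholar?q={query}",
    "WorldCat": "https://www.worldcat.org/search?q={query}",
}

def break_down(response):
    """Step 1 (fractionation): split a response into sentence-sized claims."""
    parts = re.split(r"(?<=[.!?])\s+", response.strip())
    return [part for part in parts if part]

def search_links(claim):
    """Step 2: build one search link per source for a single claim."""
    query = quote_plus(claim)
    return {
        name: template.format(query=query)
        for name, template in SEARCH_TEMPLATES.items()
    }

# Hypothetical AI response used only to demonstrate the helper.
ai_response = (
    "The Mona Lisa was painted by Leonardo da Vinci. "
    "It hangs in the Louvre."
)

for claim in break_down(ai_response):
    print(claim)
    for source, link in search_links(claim).items():
        print("  ", source, link)
```

Opening each link in a new tab gives you a starting point for lateral reading on every claim, one at a time.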

Beyond fact-checking: Considering bias

Critical thinking about AI responses goes beyond determining whether the specific facts in the text are true or false. We also have to think about bias and viewpoint. We keep these two things in mind when reading human authors, and, perhaps surprisingly, we must keep them in mind with AI as well.

Any text implicitly contains a point of view, influenced by the ideologies and societal factors the author lives with. When we think critically about news articles, books, or social media posts out in the wild, we consider the author’s viewpoint and how that might affect the content we’re reading. These texts that all of us produce every day are the foundation of generative AI’s training data. While AI text generators don’t have their own opinions or points of view, they are trained on datasets full of human opinions and points of view, and sometimes those viewpoints surface in their answers.

AI can be explicitly prompted to support a particular point of view (for instance, “give a 6-sentence paragraph on ramen from the perspective of someone obsessed with noodles”). But even when not prompted in any particular way, AI is not delivering a “neutral” response. For many questions, there is not one “objective” answer. This means that for an AI tool to generate an answer, it must choose which viewpoints to represent in its response. It’s also worth thinking about the fact that we can’t know exactly how the AI is determining what is worth including in its response and what is not.

AI also often replicates biases and bigotry found in its training data (as you learned on page 1 of this module). It is very difficult to get an AI tool to arrive at the fact that people in positions of authority, like doctors or professors, can be women, without explicit prompting from a human. AI image editing tools have edited users to be white when prompted to make their headshot look “professional,” and can sexualize or undress women, particularly women of color, when editing pictures of them for any purpose.

AI also replicates biases by omission. When asked for a short history of 16th-century art, ChatGPT and Bing AI invariably only include European art. This is the case even if you ask in other languages, like Chinese and Arabic, so the AI tool is not basing this response on the user’s presumed region. China and the Arabic-speaking world were certainly producing art during the 16th century, but the AI has decided that when users ask for “art history,” they mean “European art history,” and that users only want information about the rest of the world if they specifically say so.

These are more obvious examples, but they also reveal the decision-making processes that the AI is using to answer more complex or subtle questions. The associations that an AI has learned from its training data are the basis of its “worldview,” and we can’t fully know all the connections AI has made and why it has made those connections. Sometimes, these connections lead to AI-generated content that reinforces bigotry or is otherwise undesirable. When this happens in ways we can see, it prompts the question: how is this showing up in less obvious ways?

Instructions: go beyond fact checking

Now let’s try lateral reading for a second time, with a focus on the response’s perspective:

1. We can start with fractionation again, but this time we’re thinking about what claims and perspectives are being represented in the AI response.
  - Brainstorm the groups who might be invested in this issue and who might have a discrete perspective (stakeholders, corporations, governments, demographic groups, nationalities, regions, etc.)
  - Think about the argument as a whole. What perspective(s) can you find here? Which might be missing?
  - Break down the response into individual claims. What perspective(s) can you find in these claims? Which might be missing?
2. Time to start your lateral reading. Think about what sources might provide the perspectives above, both the ones in the AI’s response and the ones missing from it.
  - Try publications like newspapers or well-established magazines, like the Atlantic or Scientific American.
  - You can find perspectives you’re looking for in news articles, opinion pieces, speeches, etc. Remember to think critically about these perspectives – some may be based on incorrect facts or a distortion of the issue.
  - Check Wikipedia to get a sense of each publication’s reputation.
3. Next, think deeper about what assumptions are being made here.
  - What did your prompt assume?
  - What did the AI assume?
  - Who would know things about this topic? Would they have a different perspective than what the AI is offering? Where could you check to find out?
4. Finally, make a judgment call. What here is true, what is misleading, and what is factually incorrect? Can you re-prompt the AI to try to get a different perspective? Can you dive deeper into one of the sources you found while fact-checking?

Again, the key is remembering that the AI is not delivering you the one definitive answer to your question.

Example: let's check an AI-generated response!

Check out the videos below to see these lateral reading strategies in action.

AI Fact Checking Text and Links (Video opens in new window; please return to module after watching the video.)

AI Fact Checking Scholarly Sources (Video opens in new window; please return to module after watching the video.)

References

Buell, S. (2023, July 19). An MIT student asked AI to make her headshot more ‘professional.’ It gave her lighter skin and blue eyes. Boston Globe. Retrieved August 2, 2023, from
Cesari, L. [@bee_in_the_library]. (2022, January 12). A fav strategy of professional fact checkers #infoliteracy #librariansoftiktok #teachersoftiktok [Video]. TikTok. https://www.tiktok.com/@bee_in_the_library/video/7052430272150719790
Heikkilä, M. (2022, December 12). The viral AI avatar app Lensa undressed me—without my consent. MIT Technology Review. Retrieved August 2, 2023, from https://www.technologyreview.com/2022/12/12/1064751/the-viral-ai-avatar-app-lensa-undressed-me-without-my-consent/
Stanford History Education Group. (2020, Jan 16). Sort fact from fiction online with lateral reading [Video]. YouTube. https://www.youtube.com/watch?v=SHNprb2hgzU
Stanford History Education Group. (2020). Teaching lateral reading. Civic Online Reasoning. https://cor.stanford.edu/curriculum/collections/teaching-lateral-reading/

Now that you know how to assess the accuracy of AI-generated work, continue on to the next page of this module to learn how to cite it in your academic work!