Question

Q. How do I know what data or information was used to train AI tools?

Answered By: Amanda Suiters
Last Updated: Mar 26, 2025

The short answer: you usually don't.

Most AI tools, like ChatGPT, were trained on large, mixed datasets drawn from books, websites, and online forums. The training data isn't always made public, and it often doesn't include up-to-date or peer-reviewed academic sources. Much academic-quality research sits behind paywalls, and copyright restrictions generally keep it out of these datasets.

This means AI tools may give outdated, biased, or incorrect information, especially in scientific, legal, or technical fields.
