Finding Data
Text analysis requires a digital text file. Your text can be anything: a novel, journal article, tweets, etc. Here are some things to keep in mind:
Data Format
Be Flexible
Text analysis tools are all designed differently. You might need to experiment with a few, especially if you want to compare multiple authors or texts. Before you commit to a tool, check whether, and how, it can tell separate authors or texts apart.
Say I want to find out how many people tweeted negative statements about "climate change" in 2020. I save 100 tweets in one Excel file, where every row records a different person's tweet, and upload the file to Voyant. I discover a problem: Voyant assumes one file = one author. If I want Voyant to identify and compare different authors, I need to save every tweet (i.e., every row) in a separate Excel file and upload them all. Or, I could use a different tool, like Orange, which can distinguish multiple authors in one digital file.
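Saving 100 rows as 100 separate files by hand is tedious, so here is a minimal Python sketch of that workaround. It is an illustration only, not a feature of Voyant or Orange. It assumes the spreadsheet has been exported from Excel as CSV, that the tweet text sits in a column named "tweet", and that your tool accepts plain-text files. The function name split_rows_to_files, the column names, and the sample tweets are all made up for the example.

```python
import csv
import io
import tempfile
from pathlib import Path


def split_rows_to_files(csv_text, out_dir, text_column="tweet"):
    """Write each row's text to its own .txt file and return the file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for number, row in enumerate(reader, start=1):
        # One file per row, numbered so the files sort in spreadsheet order.
        path = out_dir / f"tweet_{number:03d}.txt"
        path.write_text(row[text_column], encoding="utf-8")
        paths.append(path)
    return paths


# Made-up sample data standing in for the exported spreadsheet.
sample = (
    "author,tweet\n"
    "user_a,Climate change is ruining our summers\n"
    "user_b,Not convinced climate change is a problem\n"
)
out_dir = tempfile.mkdtemp()  # temporary folder; use your own folder in practice
paths = split_rows_to_files(sample, out_dir)
print([p.name for p in paths])
```

To use this on a real export, read the CSV file's contents into a string (for example with Path("tweets.csv").read_text(encoding="utf-8"), where tweets.csv is a placeholder name) and pass it in place of sample.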
Clean your Data
"Cleaning" means preparing the dataset for analysis: removing extraneous rows, columns, or other information; deleting problematic icons and symbols; reformatting dates and times; and adding column headings.
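If you are comfortable with a little Python, the cleaning steps listed above can be sketched as follows. This is one possible approach under stated assumptions, not a required workflow: it assumes a CSV export with no header row, columns in the order author, date, tweet (plus one extra column to discard), and dates written as month/day/year. The column headings, the function name clean_rows, and the sample rows are hypothetical. Note that the symbol filter removes every non-ASCII character, which would also strip accented letters, so adjust it if your text needs them.

```python
import csv
import io
import re
from datetime import datetime

# Assumed column headings; the raw export in this example has none.
HEADINGS = ["author", "date", "tweet"]


def clean_rows(raw_csv):
    """Return a list of cleaned rows, each a dict keyed by HEADINGS."""
    cleaned = []
    for row in csv.reader(io.StringIO(raw_csv)):
        if not any(cell.strip() for cell in row):
            continue  # remove extraneous (blank) rows
        # Keep the first three columns and drop any extra ones.
        author, date_text, tweet = row[0], row[1], row[2]
        # Reformat dates from month/day/year to year-month-day.
        date = datetime.strptime(date_text.strip(), "%m/%d/%Y").strftime("%Y-%m-%d")
        # Delete icons and symbols (here: anything that is not plain ASCII).
        tweet = re.sub(r"[^\x00-\x7F]+", " ", tweet)
        tweet = " ".join(tweet.split())  # collapse leftover whitespace
        cleaned.append(dict(zip(HEADINGS, [author.strip(), date, tweet])))
    return cleaned


# Made-up sample data: an emoji-laden tweet, a blank row, and an extra ID column.
raw = (
    "user_a,3/5/2020,So hot today \U0001F525\U0001F525 #climatechange,internal-id-17\n"
    ",,,\n"
    "user_b,11/20/2020,Winter is late again \u2744,internal-id-18\n"
)
cleaned = clean_rows(raw)
for row in cleaned:
    print(row)
```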
Not sure where to start? Try uploading your dataset to whichever text analysis tool you chose and look at the results. They will show you what still needs to be removed: your tool might "analyze" text you're not interested in (e.g., page numbers or words like "chapter").
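Once you have spotted unwanted text like this, you can remove it before uploading again. The sketch below shows one way to do that in Python for a plain-text file. It assumes page numbers appear alone on a line and chapter headings look like "Chapter 1" or "CHAPTER ONE"; your own text may need different patterns. The function name strip_page_furniture and the sample text are invented for illustration.

```python
import re


def strip_page_furniture(text):
    """Drop lines that are only a page number or a chapter heading."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"\d+", stripped):
            continue  # a line containing only digits: treated as a page number
        if re.fullmatch(r"chapter\s+\w+", stripped, flags=re.IGNORECASE):
            continue  # a heading such as "Chapter 1" or "CHAPTER ONE"
        kept.append(line)
    return "\n".join(kept)


# Made-up sample text with a chapter heading and a stray page number.
sample_text = (
    "Chapter 1\n"
    "The storm arrived at dawn.\n"
    "12\n"
    "By noon the river had risen."
)
result = strip_page_furniture(sample_text)
print(result)
```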
The library can also help! Try an Excel workshop. There are also several librarians who can help you: Ford Fishman, Margarita Corral, and Natalie Susmann.
And remember: text analysis is an iterative process. You will likely clean your data, upload it, and then discover that more cleaning is needed. This happens to everyone!