This guide includes resources for theoretical and computational linguistics.

Linguistic Corpora

HathiTrust Research Center

JSTOR Data for Research

The JSTOR Data for Research service provides the public with data and text mining access to JSTOR content.

ScienceDirect & Scopus

Text mining access to content in our ScienceDirect and Scopus databases is available through an API.

Chronicling America

Chronicling America, a Library of Congress collection, provides information about historic United States newspapers and access to select digitized newspaper pages. The Library of Congress offers several different views of the data, all of which are publicly visible. Each uses common web protocols, and access is unrestricted: you do not need to apply for an API key.
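Because no API key is required, a keyword search can be expressed as a plain URL. The sketch below is illustrative only: the base URL and the parameter names (`q`, `fo=json`, `sp`) are assumptions based on the general loc.gov JSON interface, and the function names are hypothetical. Check the current Library of Congress API documentation before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed collection URL on loc.gov; verify against the current LoC documentation.
BASE_URL = "https://www.loc.gov/collections/chronicling-america/"


def build_search_url(query, page=1):
    """Build a search URL that asks for JSON output (hypothetical helper)."""
    # Assumed parameters: q = search terms, fo = output format, sp = results page.
    params = {"q": query, "fo": "json", "sp": page}
    return BASE_URL + "?" + urlencode(params)


def fetch_results(query, page=1, timeout=30):
    """Fetch one page of results and decode the JSON response body."""
    with urlopen(build_search_url(query, page), timeout=timeout) as response:
        return json.load(response)
```

Calling `build_search_url("women suffrage")` only constructs the URL; `fetch_results` makes the network request, so pause between calls when collecting many pages.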

Text Creation Partnership Corpora

The Text Creation Partnership has created standardized and accurate XML/SGML-encoded editions of early printed books from ProQuest’s Early English Books Online, Gale Cengage’s Eighteenth Century Collections Online, and Readex’s Evans Early American Imprints.
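Because these editions are XML-encoded, their text can be pulled out with a standard XML parser. The sketch below is a minimal illustration, not a description of the actual files: it assumes TEI-style markup in the `http://www.tei-c.org/ns/1.0` namespace with paragraphs in `<p>` elements, and the sample document and function name are invented. Real files may use different elements or an older SGML-derived schema.

```python
import xml.etree.ElementTree as ET

# Invented sample; real TCP files are far larger and may be structured differently.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p>The first paragraph.</p>
    <p>The <hi>second</hi> paragraph.</p>
  </body></text>
</TEI>"""

# Assumed namespace for TEI-encoded documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def paragraphs(xml_text):
    """Return the plain text of each <p> element, with whitespace normalized."""
    root = ET.fromstring(xml_text)
    return [
        " ".join("".join(p.itertext()).split())  # flatten nested markup such as <hi>
        for p in root.iterfind(".//tei:p", TEI_NS)
    ]
```

Running `paragraphs(SAMPLE)` yields one string per paragraph with inline markup removed, which is a convenient starting point for tokenization.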