Linguistic Data Consortium

The Linguistic Data Consortium is an open consortium of universities, companies and government research laboratories. It creates, collects and distributes speech and text databases, lexicons, and other resources for research and development purposes. The University of Pennsylvania is the LDC's host institution. The LDC was founded in 1992 with a grant from the Advanced Research Projects Agency (ARPA), and is partly supported by grant IRI-9528587 from the Information and Intelligent Systems division of the National Science Foundation.

Please visit the Linguistic Data Consortium at the University of Pennsylvania for more information and a catalog of corpora.

The Linguistics Department at Northwestern University has online access to many LDC corpora and maintains a collection of physical copies. Any graduate student, faculty member or researcher at Northwestern University may request access. If you are interested in obtaining a corpus, please first check whether we have a physical copy via the LDC Inventory. Then send an email to the Linguistics Department containing the following information:

  • Your name
  • Email address
  • Northwestern departmental affiliation

If the corpus is held by Northwestern, include:

  • Corpus name
  • Desired dates of borrowing

If the corpus is not held by Northwestern, include:

  • Corpus name
  • LDC catalog number
  • Link to LDC catalog entry