A glossary of corpus types

There are many types of corpus, distinguished by how they are used. Below is a list of some of the main types.

diachronic – a corpus which looks at changes across a timeframe.

learner – a corpus of L2 learner writing or speech.

monitor – a type of diachronic corpus which may continue to grow with new texts added over time.

monolingual – includes only one language.

multilingual – a corpus with two or more languages.

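parallel – a corpus containing the same texts in two or more languages, typically a first language (L1) and a target language (L2), aligned so that the versions can be compared.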

reference – a corpus against which other corpora are compared, usually through statistical analysis.

synchronic – a corpus that has been constructed at a certain time (like a snapshot) to represent a language.

raw – a corpus with no annotation.

tagged – a corpus with annotation (for example, Parts-Of-Speech tags).

target – a corpus that is compared to a reference corpus.
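
The comparison between a target corpus and a reference corpus can be sketched in a few lines of Python. The glossary does not say which statistic to use, so this sketch assumes the log-likelihood (G2) keyness score, which is one common choice. The function names and the toy token lists are hypothetical and only for illustration.

```python
import math
from collections import Counter


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood (G2) score for one word, given its frequency in a
    target corpus and a reference corpus and the size of each corpus."""
    total = size_target + size_ref
    combined = freq_target + freq_ref
    # Frequencies we would expect if the word were equally common in both.
    expected_target = size_target * combined / total
    expected_ref = size_ref * combined / total
    g2 = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def keyness(target_tokens, reference_tokens):
    """Rank the words of the target corpus by their G2 score."""
    target_counts = Counter(target_tokens)
    ref_counts = Counter(reference_tokens)
    n_target, n_ref = len(target_tokens), len(reference_tokens)
    scores = {
        word: log_likelihood(count, n_target, ref_counts[word], n_ref)
        for word, count in target_counts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (assumption): real corpora would be far larger.
target = "corpus corpus corpus the the".split()
reference = "the the the the cat".split()
ranked = keyness(target, reference)
print(ranked[0][0])
```

A high score only says that a word's frequency differs between the two corpora. It does not say in which direction, so the relative frequencies still need to be checked to see whether the word is overused or underused in the target corpus.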

4 responses to “A glossary of corpus types”

  1. Thank you! I will add this to the glossary.


  2. May I suggest another corpus type? Developmental: a corpus of texts produced by speakers or writers who are in the process of acquiring or developing their first language, like Lucy, Solar or Doeste (https://doeste.ufersa.edu.br/).

    I congratulate you for spreading linguistics!

  3. […] This is a great little explainer from Warren M. Tang. It covers the basics of the basics, and provides ready-to-use definitions and descriptions. He has also got a straightforward glossary of corpus types. […]

