Tag: corpus glossary
-
word list
A list of all the types in a corpus, usually arranged by frequency with the highest frequency at the top. A word list drawn from a reference corpus can tell you which words are the most common in a language. Placed against another corpus from a different period (or one that is marked with usage information)…
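
As a rough illustration of a frequency-ordered word list, here is a minimal Python sketch. The function name `word_list`, the sample sentence, and the tokenisation rule (lowercased runs of letters and apostrophes) are assumptions made for this example; real corpus tools tokenise in more careful ways.

```python
from collections import Counter
import re

def word_list(text):
    """Return (type, frequency) pairs, most frequent first."""
    # Simplifying assumption: a token is a lowercased run of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Sort by descending frequency, then alphabetically so ties are stable.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Invented sample text, for illustration only.
sample = "the cat sat on the mat and the dog sat too"
for word, freq in word_list(sample)[:3]:
    print(word, freq)
```

The alphabetical tie-break is only a convenience so that the output order is reproducible.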
-
case
Case (lowercase and uppercase) aids reading, and therefore meaning, in written texts. It does not substantially change the pronunciation of a word, so it is a wholly written feature of language that is not apparent in spoken form. Concordancing software often allows you to choose whether a search is case-sensitive. At times,…
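
The effect of the case-sensitivity setting can be sketched in a few lines of Python. This is not how any particular concordancer works; `count_matches` and the sample sentence are hypothetical, chosen only to show that the same search can return different counts depending on the setting.

```python
import re

def count_matches(text, term, case_sensitive=False):
    """Count whole-word occurrences of term in text."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # \b restricts matches to whole words; re.escape treats term literally.
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, flags))

# Invented sample text, for illustration only.
sample = "Polish workers polish the Polish silver."
print(count_matches(sample, "Polish", case_sensitive=True))
print(count_matches(sample, "Polish", case_sensitive=False))
```

With case sensitivity on, only the capitalised form is counted; with it off, the verb "polish" is counted as well.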
-
type-token ratio
The type-token ratio (or TTR) is used to compare two corpora in terms of lexical complexity. The formula is the number of types divided by the number of tokens. The closer the ratio is to 1, the greater the complexity; the closer to 0, the greater the repetition of words. There is not a specific ratio which can…
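
The formula (types divided by tokens) can be sketched in Python as follows. The function name `type_token_ratio`, the sample strings, and the simple tokenisation rule are assumptions for illustration, not a standard implementation.

```python
import re

def type_token_ratio(text):
    """Number of distinct forms (types) divided by number of tokens."""
    # Simplifying assumption: a token is a lowercased run of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no tokens")
    return len(set(tokens)) / len(tokens)

# Invented sample texts, for illustration only.
varied = "the cat sat on the mat"   # 5 types, 6 tokens
repetitive = "the the the the"      # 1 type, 4 tokens
print(round(type_token_ratio(varied), 2))
print(round(type_token_ratio(repetitive), 2))
```

The first text repeats only one word, so its ratio is close to 1; the second repeats a single word throughout, so its ratio falls towards 0.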
-
collocation
How words (collocates) relate to a particular word (the keyword or node). In corpus linguistics, this usually means co-occurrence within a certain distance of the node, for example ±5 words to either side; the collocates found in that span are then collated and counted for quick comprehension. Words often come together with greater-than-chance regularity. This can either be within the…
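
A minimal Python sketch of collecting collocates within a span of the node is given below. It only produces raw co-occurrence counts; it does not test whether a pairing is more frequent than chance. The function name `collocates`, the `span` parameter, and the sample text are assumptions for this example.

```python
from collections import Counter
import re

def collocates(text, node, span=5):
    """Count the words found within `span` tokens either side of each node occurrence."""
    # Simplifying assumption: a token is a lowercased run of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    node = node.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - span):i]      # up to `span` tokens before the node
        right = tokens[i + 1:i + 1 + span]     # up to `span` tokens after the node
        counts.update(left + right)
    return counts

# Invented sample text, for illustration only.
sample = "strong tea and strong coffee but weak tea"
print(collocates(sample, "tea", span=2).most_common())
```

In this sketch a second occurrence of the node that falls inside the span is itself counted as a collocate; whether that is desirable depends on the analysis.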
-
KWIC
Short for Key Word In Context. It is a way of viewing a search term (type) in a concordance program with the keyword centred, so that the patterns created by the surrounding words (its context) can be seen. Below is an example of a concordance search of the term ‘violence’ in a corpus. The words…
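
The centred layout can be approximated with a short Python sketch. The function name `kwic`, the `width` parameter (characters of context on each side), and the sample sentences are assumptions for illustration; the sample is invented and is not the concordance referred to above.

```python
import re

def kwic(text, keyword, width=30):
    """Return one line per match, with the keyword aligned in the same column."""
    lines = []
    pattern = r"\b" + re.escape(keyword) + r"\b"
    for m in re.finditer(pattern, text, re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context and left-align the right context
        # so that every keyword starts at column width + 1.
        lines.append(f"{left:>{width}} {m.group()} {right:<{width}}")
    return lines

# Invented sample text, for illustration only.
sample = "There was violence in the street. The violence ended."
for line in kwic(sample, "violence", width=10):
    print(line)
```

Context here is measured in characters rather than words, which keeps the alignment simple; it assumes the text contains no line breaks.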
-
type
The unique form of the tokens (words) in a corpus. Often accompanied by frequency data. Meaning is treated as secondary. Corpus linguistic analysis does not directly reveal the various meanings of a word. This must be inferred from its usage. In corpus linguistics this is usually done through concordances, collocations, clusters, etc.
-
token
The individual forms (words) of a corpus. The sum of the tokens is the size of the corpus. The term contrasts with type in order to distinguish how we are observing the form, whether as one instance in the corpus (token), or as combined instances relating to its frequency within a corpus (type).