Slide 2: Overview
The focus of this talk: the corpus linguistics perspective on collocations and keywords
• Some examples
• Multi-methods
Slide 6: How large should the span be?
Typically set at +/- 5 words. This seems to be the most useful span for collocates.
Similarly, many people set a minimum frequency threshold for words to count as collocates. I usually use a minimum frequency of 10.
Option to stop at sentence boundaries
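The settings on this slide (span, minimum frequency, sentence boundaries) can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular corpus tool works: the function name `collocates`, its parameters, and the input format (a list of already tokenised sentences) are assumptions made for the example.

```python
from collections import Counter

def collocates(sentences, node, span=5, min_freq=10, stop_at_sentence=True):
    """Count words within +/- `span` tokens of `node` (hypothetical helper)."""
    if stop_at_sentence:
        # Each sentence is searched separately, so the window never crosses a boundary.
        units = sentences
    else:
        # Treat the whole text as one token stream; windows may cross boundaries.
        units = [[tok for sent in sentences for tok in sent]]
    counts = Counter()
    for tokens in units:
        for i, tok in enumerate(tokens):
            if tok != node:
                continue
            counts.update(tokens[max(0, i - span):i])      # left context
            counts.update(tokens[i + 1:i + 1 + span])      # right context
    # Keep only words that reach the minimum co-occurrence frequency.
    return {word: c for word, c in counts.items() if c >= min_freq}

sentences = [["a", "glass", "of", "wine"], ["beer", "is", "cold"]]
print(collocates(sentences, "wine", span=2, min_freq=1))
print(collocates(sentences, "wine", span=2, min_freq=1, stop_at_sentence=False))
```

With sentence boundaries respected, "beer" and "is" are not counted as collocates of "wine"; with the option switched off, they are.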
Slide 9: Mutual information
amount of mutual information
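The slide names the measure without giving a formula. As a rough sketch, one common formulation compares the observed co-occurrence frequency with the frequency expected by chance and takes the base-2 logarithm. The function and parameter names below are assumptions for illustration, and this simple version applies no correction for span size, which some tools do.

```python
import math

def mutual_information(observed, freq_node, freq_collocate, corpus_size):
    """MI = log2(observed / expected), with expected = f(node) * f(collocate) / N."""
    expected = freq_node * freq_collocate / corpus_size
    return math.log2(observed / expected)

# Toy figures: node occurs 16 times, collocate 32 times, together 8 times,
# in a corpus of 1,024 tokens.
print(mutual_information(8, 16, 32, 1024))  # 4.0
```

MI tends to reward rare, exclusive pairings, which is one reason a minimum frequency threshold is usually applied alongside it.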

Slide 10: Dice coefficient
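The slide gives only the name of the measure. A minimal sketch of the standard Dice coefficient for a word pair follows; the function and parameter names are assumptions for illustration.

```python
def dice(observed, freq_node, freq_collocate):
    """Dice = 2 * co-occurrences / (f(node) + f(collocate)); ranges from 0 to 1."""
    return 2 * observed / (freq_node + freq_collocate)

# Toy figures: the pair co-occurs 10 times; the words occur 20 and 30 times overall.
print(dice(10, 20, 30))  # 0.4
```

Unlike the simple MI formulation, Dice does not depend on corpus size, so scores are easier to compare across corpora.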

Slide 11: A look at language: collocations and keywords

Slide 12: Colligation
A word collocates with a particular grammatical class.
E.g. ‘he’ colligates with verbs
‘Mrs’ colligates with proper nouns
determiners colligate with nouns
Slide 13: Semantic preference
Similar to Bill Louw’s concept of semantic prosody.
‘the relation, not between individual words, but between a lemma or word-form and a set of semantically related words’ (Stubbs 2001: 65)
Slide 14: Semantic preference – glass of
wine, sherry, champagne, beer, poured, water, juice, brandy, milk, whisky, orange, lemonade, rum, iced, sipped, gin, vodka, small, port, cider, lager
Slide 17: (Corpus) Keywords
A keyword list is calculated by comparing 2 frequency lists, usually that of a smaller specialised corpus against that of a much larger reference corpus (but sometimes 2 equal-sized corpora).
• Chi-square or log-likelihood tests identify the words that are statistically much more frequent in one list than in the other.
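To make the comparison concrete, here is a sketch of a common two-corpus log-likelihood calculation for a single word. The function and parameter names are assumptions for illustration, and a real keyword tool would repeat this for every word in the two frequency lists.

```python
import math

def log_likelihood(freq_a, freq_b, size_a, size_b):
    """Log-likelihood for one word: frequency in corpus A vs. corpus B."""
    total_freq = freq_a + freq_b
    # Expected frequencies if the word were spread evenly across both corpora.
    expected_a = size_a * total_freq / (size_a + size_b)
    expected_b = size_b * total_freq / (size_a + size_b)
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero frequency contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Toy figures: a word occurs 10 times in corpus A and never in corpus B,
# and both corpora contain 100 tokens.
print(round(log_likelihood(10, 0, 100, 100), 2))  # 13.86
```

The score says how surprising the difference is, not its direction, so the relative frequencies still need to be checked to see which corpus the word is key in.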
Slide 18: http://ucrel.lancs.ac.uk/llwizard.html

Slide 19: When is a word a keyword?
The analyst needs to apply cut-off points for statistical significance.
• Some analysts only look at the top 10 or 50 or 100 keywords instead.
• Additionally, sometimes a minimum frequency is applied (e.g. a word must occur 20 times before it’s a keyword)
• Also, we may specify a keyword has to be reasonably well distributed (occurring in at least 20 texts)
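The cut-offs listed above can be combined into a simple filter, sketched below. The record fields (`ll`, `freq`, `texts`), the function name, and the toy data are all hypothetical; the default threshold of 6.63 is the chi-square critical value for p < 0.01 with 1 degree of freedom (3.84 corresponds to p < 0.05 and 10.83 to p < 0.001).

```python
def select_keywords(candidates, min_ll=6.63, min_freq=20, min_texts=20, top_n=None):
    """Filter candidate keywords by significance, frequency and distribution."""
    kept = [
        c for c in candidates
        if c["ll"] >= min_ll          # statistical significance cut-off
        and c["freq"] >= min_freq     # minimum frequency in the specialised corpus
        and c["texts"] >= min_texts   # minimum number of texts the word occurs in
    ]
    kept.sort(key=lambda c: c["ll"], reverse=True)
    return kept[:top_n]  # top_n=None keeps every word that passed the filters

# Invented toy data, for illustration only.
candidates = [
    {"word": "politics", "ll": 120.5, "freq": 85, "texts": 40},
    {"word": "betwixt", "ll": 45.2, "freq": 25, "texts": 3},
    {"word": "recipe", "ll": 4.1, "freq": 30, "texts": 25},
]
print([c["word"] for c in select_keywords(candidates)])  # ['politics']
```

Here "betwixt" is dropped because it appears in too few texts, and "recipe" because its score falls below the significance cut-off.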
Slide 20: Common types of keywords
1. Proper nouns (Clegg, Ghana etc.)
2. Markers of style (often grammatical words like must, betwixt)
3. Spelling idiosyncrasies (color/colour)
4. “Aboutness” words (politics, recipe etc.)
Slide 22: Example – Change over time
(Baker 2011)

Slide 25: Words that are declining the most

Slide 30: Multi Methods
Corpora can answer some questions very well, others not at all.

Corpora can integrate with other methods gainfully
Corpora can help mesh quantitative and qualitative analyses
Corpora are a tool – and like any tool they are good for some jobs and not others. They should also be part of a tool set.
Slide 31: Summing up
Collocates and keywords are important techniques in corpus linguistics – you will come across the terms many times on this course
They can tell us ‘about’ texts
They can tell us about change over time
They can help us decode argumentation strategies
And more besides!
Slide 33: GraphColl: Collocations in #LancsBox
Collocation is systematic co-occurrence of words in text and discourse that we identify statistically