Slide 2: Overview
The focus of this talk: collocations and keywords from the corpus linguistics perspective
• Some examples
• Multi-methods
Slide 6: How large should the span be?
The span is typically set at +/- 5 words, which seems to be the most useful span for collocates.
Similarly, many people set a minimum frequency threshold for words to count as collocates. I usually use a minimum frequency of 10.
There is also an option to stop the span at sentence boundaries.
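The three settings above (span size, minimum frequency, sentence boundaries) can be illustrated with a small collocate counter. This is a minimal sketch, assuming the corpus is already tokenised into a list of sentences (each a list of word strings); `collocates` is a hypothetical helper, not the API of #LancsBox or any other tool, and it only counts raw co-occurrence without applying an association measure.

```python
from collections import Counter


def collocates(sentences, node, span=5, min_freq=10, stop_at_sentence=True):
    """Count words occurring within +/- `span` tokens of `node`.

    Hypothetical helper for illustration only. Returns only the words
    whose co-occurrence count reaches `min_freq`.
    """
    counts = Counter()
    # With sentence boundaries respected, each sentence is its own window
    # source; otherwise all sentences are flattened into one token stream.
    if stop_at_sentence:
        streams = sentences
    else:
        streams = [[tok for sent in sentences for tok in sent]]
    for tokens in streams:
        for i, tok in enumerate(tokens):
            if tok != node:
                continue
            lo = max(0, i - span)
            hi = min(len(tokens), i + span + 1)
            for j in range(lo, hi):
                if j != i:  # skip the node occurrence itself
                    counts[tokens[j]] += 1
    return {w: c for w, c in counts.items() if c >= min_freq}


sents = [["a", "glass", "of", "wine"], ["wine", "is", "red"]]
print(collocates(sents, "glass", span=3, min_freq=1))
print(collocates(sents, "glass", span=3, min_freq=1, stop_at_sentence=False))
```

With sentence boundaries respected, the second "wine" falls outside the window; when the sentences are flattened, it is counted too.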
Slide 9: Mutual information
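As a rough illustration, one common formulation of mutual information compares the observed co-occurrence frequency of node and collocate with the frequency expected if the two words were independent: MI = log2(O / E), where E = (node frequency × collocate frequency) / corpus size. This is a sketch under that assumption. Some tools also scale E by the window size, so the numbers here may not match a particular package.

```python
import math


def mutual_information(o11, node_freq, coll_freq, corpus_size):
    """MI = log2(observed / expected) for a node-collocate pair.

    o11: how often node and collocate co-occur within the span
    node_freq, coll_freq: overall frequencies of the two words
    corpus_size: number of tokens in the corpus
    """
    expected = node_freq * coll_freq / corpus_size
    return math.log2(o11 / expected)


# 8 observed co-occurrences where only 1 is expected: 2**3 = 8, so MI = 3
print(mutual_information(8, 100, 100, 10_000))
```

Because MI looks only at the ratio O/E, it can assign high scores to very rare pairs. That is one reason a minimum frequency threshold is often applied alongside it.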
Slide 10: Dice coefficient
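The Dice coefficient is usually given as 2 × O / (node frequency + collocate frequency). It ranges from 0 (never together) to 1 (the two words only ever occur together). The sketch below assumes that textbook form. The function name and arguments are illustrative.

```python
def dice(o11, node_freq, coll_freq):
    """Dice coefficient: 2 * co-occurrences / (node freq + collocate freq)."""
    return 2 * o11 / (node_freq + coll_freq)


print(dice(10, 20, 30))  # 20 / 50
print(dice(20, 20, 20))  # every occurrence of each word is shared: maximum score
```

Unlike MI, Dice does not use corpus size, so it measures how exclusively two words attract each other rather than how surprising their co-occurrence is.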
Slide 11: A look at language: collocations and keywords
Slide 12: Colligation
A word collocates with a particular grammatical class, e.g.:
‘he’ colligates with verbs
‘Mrs’ colligates with proper nouns
determiners colligate with nouns
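Colligation can be counted in much the same way as collocation, except that we count the part-of-speech tags of neighbouring words rather than the words themselves. The sketch below assumes a corpus already POS-tagged as (word, tag) pairs. The tag labels (VERB, PROPN, etc.) are illustrative, and `colligates` is a hypothetical helper that only looks at one fixed position relative to the node.

```python
from collections import Counter


def colligates(tagged, node, offset=1):
    """Count POS tags found `offset` positions away from `node`.

    tagged: list of (word, tag) pairs; node should be lowercase.
    """
    counts = Counter()
    for i, (word, _tag) in enumerate(tagged):
        j = i + offset
        if word.lower() == node and 0 <= j < len(tagged):
            counts[tagged[j][1]] += 1
    return counts


tagged = [("Mrs", "PROPN"), ("Smith", "PROPN"), ("said", "VERB"),
          ("he", "PRON"), ("left", "VERB"), (".", "PUNCT"),
          ("He", "PRON"), ("sighed", "VERB")]
print(colligates(tagged, "he"))   # tags of the word right after 'he'
print(colligates(tagged, "mrs"))  # tags of the word right after 'Mrs'
```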
Slide 13: Semantic preference
Similar to Bill Louw’s concept of semantic prosody.
‘the relation, not between individual words, but between a lemma or word-form and a set of semantically related words’ Stubbs (2001: 65)
Slide 14: Semantic preference: collocates of ‘glass of’
wine, sherry, champagne, beer, poured, water, juice, brandy, milk, whisky, orange, lemonade, rum, iced, sipped, gin, vodka, small, port, cider, lager
Slide 17: (Corpus) Keywords
A keyword list is calculated by comparing two frequency lists: usually that of a smaller specialised corpus against that of a much larger reference corpus (but sometimes two corpora of equal size).
• A chi-square or log-likelihood test identifies the words that are significantly more frequent in one list than in the other.
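The comparison of two frequency lists can be sketched as follows. This assumes a common two-corpus formulation of log-likelihood: with a and b the word's frequencies in the two corpora and c and d the corpus sizes, E1 = c(a+b)/(c+d), E2 = d(a+b)/(c+d), and LL = 2(a·ln(a/E1) + b·ln(b/E2)), where a zero frequency contributes nothing. Only log-likelihood is shown here, not chi-square. `log_likelihood` and `keywords` are illustrative names, not a specific tool's API.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """LL for a word with frequency a in corpus 1 (size c) and b in corpus 2 (size d)."""
    e1 = c * (a + b) / (c + d)  # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in corpus 2
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study, reference):
    """Rank every word in either Counter by LL, highest first.

    Each row is (word, freq_study, freq_ref, LL, overused), where overused
    is True when the word is relatively more frequent in the study corpus.
    """
    c, d = sum(study.values()), sum(reference.values())
    rows = []
    for word in set(study) | set(reference):
        a, b = study[word], reference[word]  # Counter returns 0 if absent
        overused = a / c > b / d
        rows.append((word, a, b, log_likelihood(a, b, c, d), overused))
    return sorted(rows, key=lambda row: row[3], reverse=True)


study = Counter({"wine": 20, "the": 80})
reference = Counter({"the": 100})
for row in keywords(study, reference):
    print(row)
```

Here "wine" (20 vs 0) ranks well above "the" (80 vs 100), which differs only slightly in relative frequency.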
Slide 18: http://ucrel.lancs.ac.uk/llwizard.html
Slide 19: When is a word a keyword?
The analyst needs to apply cut-off points for statistical significance.
• Some analysts only look at the top 10, 50 or 100 keywords instead.
• Additionally, a minimum frequency is sometimes applied (e.g. a word must occur at least 20 times to count as a keyword).
• We may also specify that a keyword must be reasonably well distributed (e.g. occurring in at least 20 texts).
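These cut-offs can be combined into a single filter. The sketch assumes each candidate already has a log-likelihood score, a total frequency and a count of texts it occurs in, stored under the hypothetical keys `ll`, `freq` and `texts`. The significance thresholds are the standard critical values of chi-square with one degree of freedom, which are also used for log-likelihood. The default p-value is an illustrative choice. The values 20 and 20 follow the examples on the slide.

```python
# Critical values of chi-square with 1 degree of freedom
CRITICAL = {0.05: 3.84, 0.01: 6.63, 0.001: 10.83, 0.0001: 15.13}


def filter_keywords(candidates, p=0.0001, min_freq=20, min_texts=20, top_n=None):
    """Keep candidates that pass significance, frequency and dispersion cut-offs.

    candidates: list of dicts with hypothetical keys 'word', 'll', 'freq', 'texts'.
    top_n: optionally keep only the N highest-scoring survivors.
    """
    cutoff = CRITICAL[p]
    kept = [c for c in candidates
            if c["ll"] >= cutoff
            and c["freq"] >= min_freq
            and c["texts"] >= min_texts]
    kept.sort(key=lambda c: c["ll"], reverse=True)
    return kept[:top_n]  # slicing with None keeps everything


cands = [
    {"word": "wine", "ll": 27.7, "freq": 25, "texts": 22},
    {"word": "the", "ll": 2.2, "freq": 900, "texts": 100},
    {"word": "sherry", "ll": 40.0, "freq": 12, "texts": 5},
]
print([c["word"] for c in filter_keywords(cands)])
print([c["word"] for c in filter_keywords(cands, p=0.05, min_freq=10, min_texts=5)])
```

Relaxing the frequency and dispersion cut-offs lets "sherry" through. This shows how much the final keyword list depends on the analyst's choices.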
Slide 20: Common types of keywords
1. Proper nouns (Clegg, Ghana etc.)
2. Markers of style (often grammatical words like must, betwixt)
3. Spelling idiosyncrasies (color/colour)
4. “Aboutness” words (politics, recipe etc.)
Slide 22: Example: change over time (Baker 2011)
Slide 25: Words that are declining the most
Slide 30: Multi-methods
Corpora can answer some questions very well, and others not at all.
Corpora can be usefully integrated with other methods.
Corpora can help combine quantitative and qualitative analyses.
Corpora are a tool, and like any tool they are good for some jobs and not others. They should also be one part of a larger toolset.
Slide 31: Summing up
Collocates and keywords are important techniques in corpus linguistics; you will come across these terms many times on this course.
They can tell us ‘about’ texts
They can tell us about change over time
They can help us decode argumentation strategies
And more besides!
Slide 33: GraphColl: Collocations in #LancsBox
Collocation is the systematic co-occurrence of words in text and discourse, which we identify statistically.