A look at language collocations and keywords. Lecture 3

Contents

Slide 2

Overview

• The focus of this talk – the corpus linguistics perspective on collocations and keywords
• Some examples
• Multi-methods

Slide 3

Frequency lists

Slide 5

Collocates of diamond

Slide 6

How large should the span be?

Typically set at +/- 5 words. This seems to be the most useful span for collocates.
Similarly, many people set a minimum frequency threshold for words to count as collocates. I usually use a minimum frequency of 10.
Option to stop at sentence boundaries.
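
The three settings on this slide (span, minimum frequency, sentence boundaries) can be sketched in a few lines of Python. This is only an illustration, not the method of any particular corpus tool: the function name `collocates`, its parameters, the crude regular-expression tokeniser and the three sample sentences are all assumptions made for the example.

```python
import re
from collections import Counter

def collocates(sentences, node, span=5, min_freq=10, stop_at_sentence=True):
    """Count the words found within +/- `span` tokens of each hit of `node`."""
    # Crude tokeniser (an assumption): lower-case runs of letters and apostrophes.
    tokenised = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    if not stop_at_sentence:
        # Join everything into one stream so windows may cross sentence breaks.
        tokenised = [[tok for sent in tokenised for tok in sent]]
    counts = Counter()
    for tokens in tokenised:
        for i, tok in enumerate(tokens):
            if tok != node:
                continue
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            counts.update(left + right)
    # Keep only words that reach the minimum frequency, most frequent first.
    return [(w, f) for w, f in counts.most_common() if f >= min_freq]

# Invented sample text; a real corpus would justify the default threshold of 10.
sample = [
    "She wore a diamond ring.",
    "The ring was a gift.",
    "A diamond necklace and a diamond ring were stolen.",
]
print(collocates(sample, "diamond", span=5, min_freq=2))
```

Note that in this sketch a second occurrence of the node word inside the window is counted like any other collocate, and the output is a plain ranking by frequency.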

Slide 7

Collocates of company

Slide 8

Rank by frequency

Slide 9

Mutual information
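
The formula shown on this slide did not survive in the text, so the following is a sketch of one common formulation rather than the slide's own: the MI score as the base-2 logarithm of observed over expected co-occurrence frequency. Tools differ in how they compute the expected value; the optional `window` correction and all names below are assumptions.

```python
import math

def mutual_information(observed, f_node, f_collocate, corpus_size, window=1):
    """MI score: log2 of observed over expected co-occurrence frequency.

    `window` scales the expected frequency by the number of token positions
    searched around the node (e.g. 10 for a +/- 5 span); some tools omit it.
    """
    if observed <= 0:
        raise ValueError("MI is undefined when the words never co-occur")
    expected = f_node * f_collocate * window / corpus_size
    return math.log2(observed / expected)

# Invented counts: the pair co-occurs 8 times in a one-million-word corpus.
score = mutual_information(observed=8, f_node=100, f_collocate=50, corpus_size=1_000_000)
print(round(score, 2))
```

A score of 0 means the pair co-occurs exactly as often as chance would predict; positive scores mean it co-occurs more often than that.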

Slide 10

Dice coefficient
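
As the slide's own formula is missing from the text, here is a minimal sketch of the standard Dice coefficient applied to collocation: twice the co-occurrence frequency divided by the sum of the two words' individual frequencies. The function and argument names are assumptions for illustration.

```python
def dice(observed, f_node, f_collocate):
    """Dice coefficient: 2 * co-occurrences / (f(node) + f(collocate))."""
    return 2 * observed / (f_node + f_collocate)

# Invented counts: two words that each occur 20 times and co-occur 10 times.
print(dice(observed=10, f_node=20, f_collocate=20))
```

The score runs from 0 (the words never co-occur) to 1 (they only ever occur together).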

Slide 11

A look at language collocations and keywords

Slide 12

Colligation

A word collocates with a particular grammatical class. E.g.:
• ‘he’ colligates with verbs
• ‘Mrs’ colligates with proper nouns
• determiners colligate with nouns

Slide 13

Semantic preference

Similar to Bill Louw’s concept of semantic prosody.
‘the relation, not between individual words, but between a lemma or word-form and a set of semantically related words’ (Stubbs 2001: 65)

Slide 14

Semantic preference – glass of

wine, sherry, champagne, beer, poured, water, juice, brandy, milk, whisky, orange, lemonade, rum, iced, sipped, gin, vodka, small, port, cider, lager

Slide 15

Discourse prosody

Slide 16

Discourse prosody

Slide 17

(Corpus) Keywords

• A keyword list is calculated by comparing two frequency lists, usually that of a much larger reference corpus against that of a smaller specialised corpus (but sometimes two equal-sized corpora).
• A chi-square or log-likelihood test identifies the words that are statistically much more frequent in one list than in the other.
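
As a rough illustration of the log-likelihood comparison, the sketch below scores a single word from its frequency in each corpus and the two corpus sizes, using the usual two-corpus formulation in which expected frequencies are proportional to corpus size. The function name, argument names and counts are assumptions for the example.

```python
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood (G2) for one word's frequency in two corpora."""
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies if the word were spread evenly across both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Invented counts: 50 hits in a 10,000-word specialised corpus
# against 100 hits in a 1,000,000-word reference corpus.
print(round(log_likelihood(50, 10_000, 100, 1_000_000), 1))
```

The score itself does not say which corpus overuses the word; comparing the two relative frequencies (hits divided by corpus size) gives the direction.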

Slide 18

http://ucrel.lancs.ac.uk/llwizard.html

Slide 19

When is a word a keyword?

• The analyst needs to apply cut-off points for statistical significance.
• Some analysts only look at the top 10 or 50 or 100 keywords instead.
• Additionally, sometimes a minimum frequency is applied (e.g. a word must occur 20 times before it’s a keyword).
• Also, we may specify that a keyword has to be reasonably well distributed (e.g. occurring in at least 20 texts).
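
The cut-offs listed above can be combined into a single filter, sketched here. The record layout (`word`, `ll`, `freq`, `texts`), the function name and the candidate data are all hypothetical; the default of 6.63 is the chi-square critical value for p < 0.01 with one degree of freedom, chosen purely as an example of a significance cut-off.

```python
def select_keywords(candidates, min_ll=6.63, min_freq=20, min_texts=20, top_n=None):
    """Filter candidate keywords by significance, frequency and distribution."""
    kept = [
        c for c in candidates
        if c["ll"] >= min_ll and c["freq"] >= min_freq and c["texts"] >= min_texts
    ]
    # Strongest keywords first; optionally keep only the top N.
    kept.sort(key=lambda c: c["ll"], reverse=True)
    return kept[:top_n]

# Invented candidate list for illustration only.
candidates = [
    {"word": "recipe", "ll": 310.2, "freq": 140, "texts": 35},
    {"word": "Ghana", "ll": 95.0, "freq": 60, "texts": 3},     # poorly distributed
    {"word": "betwixt", "ll": 12.4, "freq": 9, "texts": 9},    # too infrequent
    {"word": "politics", "ll": 48.7, "freq": 75, "texts": 28},
    {"word": "table", "ll": 2.1, "freq": 200, "texts": 90},    # not significant
]
print([c["word"] for c in select_keywords(candidates)])
```

Passing `top_n=1` mimics the approach of looking only at the strongest keywords instead of everything above the threshold.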

Slide 20

Common types of keywords

1. Proper nouns (Clegg, Ghana etc.)
2. Markers of style (often grammatical words like must, betwixt)
3. Spelling idiosyncrasies (color/colour)
4. “Aboutness” words (politics, recipe etc.)

Slide 21

What’s the point of it?

Slide 22

Example – Change over time (Baker 2011)

Slide 23

Identifying key terms

Slide 25

Words that are declining the most

Slide 30

Multi Methods

Corpora can answer some questions very well, others not at all.
Corpora can integrate gainfully with other methods.
Corpora can help mesh quantitative and qualitative analyses.
Corpora are a tool, and like any tool they are good for some jobs and not others. They should also be part of a tool set.

Slide 31

Summing up

Collocates and keywords are important techniques in corpus linguistics – you will come across the terms many times on this course.
They can tell us ‘about’ texts.
They can tell us about change over time.
They can help us decode argumentation strategies.
And more besides!

Slide 33

GraphColl: Collocations in #LancsBox

Collocation is the systematic co-occurrence of words in text and discourse, which we identify statistically.