Linguistically-Informed Self-Attention for Semantic Role Labeling

Contents

Slide 2

Want fast, accurate, robust NLP

Slide 3

SRL: Who did what to whom?

[Figure: the example sentence "Committee awards Nobel to Strickland, who advanced optics", with its dependency parse (root, nsubj, dobj, prep, pobj, rcmod) drawn above the words and semantic roles (agent, predicate, theme, beneficiary) below.]

Slide 4

SRL: Who did what to whom?

[Figure: another view of the same sentence with its dependency parse and the roles agent, predicate, theme.]

Slide 5

SRL: Who did what to whom?

[Figure: the example sentence with its labeled dependency parse (root, nsubj, dobj, prep, pobj, rcmod).]

Slide 6

PropBank SRL: Who did what to whom?

[Figure: the example sentence annotated with PropBank roles (V, ARG0, ARG1, ARG2, R-ARG0) for the predicates "awards" and "advanced", alongside the thematic labels agent, predicate, theme, beneficiary and the dependency parse.]

Slide 7

10 years of PropBank SRL

[Figure: in-domain F1 over time (Year vs. F1) for PropBank SRL systems: syntax-based models (Punyakanok, Roth, Yih; Toutanova, Haghighi, Manning; Täckström, Ganchev, Das; FitzGerald, Täckström, Ganchev, Das) followed by end-to-end deep NNs (Zhou & Xu; He, Lee, Lewis, Zettlemoyer; He, Lee, Levy, Zettlemoyer; Tan, Wang, Xie, Chen, Shi), with a highlighted 23% error reduction.]

Slide 8

10 years of PropBank SRL

[Figure: the same F1-over-time chart with LISA added, annotated with in-domain and out-of-domain error reductions of 8%, 18%, 23%, and 29%.]

Slide 9

Linguistically-Informed Self-Attention

Multi-task learning, single-pass inference:
Part-of-speech tagging
Labeled dependency parsing
Predicate detection
Semantic role spans & labeling

Syntactically-informed self-attention:
Multi-head self-attention supervised by syntax
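
A minimal sketch of what "multi-task learning, single-pass inference" can mean at training time: one shared encoding of the sentence feeds separate output heads, and their cross-entropy losses are summed into a single objective. The task names, weights, and shapes below are illustrative assumptions, not the authors' released code.

```python
# Illustrative sketch (not the authors' code): one shared encoder, one joint loss
# over LISA's tasks (POS tags, dependency heads/labels, predicates, SRL tags).
import numpy as np

def cross_entropy(logits, gold):
    """logits: [n_items, n_classes]; gold: [n_items] integer labels."""
    logits = logits - logits.max(-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
    return -log_probs[np.arange(len(gold)), gold].mean()

def multitask_loss(task_logits, task_gold, weights=None):
    """task_logits / task_gold: dicts keyed by task name, e.g.
    'pos', 'parse_head', 'parse_label', 'predicate', 'srl'."""
    weights = weights or {task: 1.0 for task in task_logits}
    return sum(weights[task] * cross_entropy(task_logits[task], task_gold[task])
               for task in task_logits)
```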

Slide 10

[Figure: Vaswani et al. 2017]

Slide 11

Self-attention [Vaswani et al. 2017]

[Figure: at Layer p, each token of "committee awards Strickland advanced optics who Nobel" is projected to query (Q), key (K), and value (V) vectors.]

Slide 12

Self-attention [Vaswani et al. 2017]

[Figure: animation step of the same Q/K/V diagram.]

Slide 13

Self-attention [Vaswani et al. 2017]

[Figure: animation step: one token's query is compared against the keys of all tokens in the sentence.]

Slide 14

Self-attention [Vaswani et al. 2017]

[Figure: the query/key comparisons are normalized into attention weights A over all tokens.]

Slide 15

Self-attention [Vaswani et al. 2017]

[Figure: animation step of the attention-weight (A) diagram.]

Slide 16

Self-attention [Vaswani et al. 2017]

[Figure: animation step of the attention-weight (A) diagram.]

Slide 17

Self-attention [Vaswani et al. 2017]

[Figure: the output M is the attention-weighted sum of the value vectors.]

Slide 18

Self-attention [Vaswani et al. 2017]

[Figure: animation step of the full self-attention computation (Q, K, V, A, M).]
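
As a reference for the diagram built up over the last few slides, here is a minimal sketch of single-head scaled dot-product self-attention in the style of Vaswani et al. 2017; the toy 7-token example and all names and sizes are illustrative assumptions.

```python
# Minimal single-head scaled dot-product self-attention (Vaswani et al. 2017 style).
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """X: [seq_len, d_model]; W_q, W_k, W_v: [d_model, d_head]."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v               # per-token queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # compare each query to every key
    A = np.exp(scores - scores.max(-1, keepdims=True))
    A /= A.sum(-1, keepdims=True)                     # softmax rows: attention matrix A
    return A @ V                                      # M: attention-weighted sum of values

# Toy example: 7 tokens ("committee awards Strickland advanced optics who Nobel").
rng = np.random.default_rng(0)
X = rng.normal(size=(7, 16))
W_q, W_k, W_v = (rng.normal(size=(16, 8)) for _ in range(3))
M = self_attention(X, W_q, W_k, W_v)
print(M.shape)  # (7, 8)
```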

Slide 19

Multi-head self-attention [Vaswani et al. 2017]

[Figure: at Layer p, H attention heads run in parallel, producing outputs M1 through MH.]

Slide 20

Multi-head self-attention [Vaswani et al. 2017]

[Figure: animation step of the multi-head diagram (heads M1 through MH).]

Slide 21

Multi-head self-attention [Vaswani et al. 2017]

[Figure: the head outputs are combined to form the token representations of Layer p+1.]

Slide 22

Multi-head self-attention [Vaswani et al. 2017]

[Figure: the resulting Layer p+1 representations, one per token.]

Slide 23

Multi-head self-attention + feed forward [Vaswani et al. 2017]

[Figure: the encoder stacks layers of multi-head self-attention + feed forward, from Layer 1 through Layer p up to Layer J, over the input tokens.]

Slide 24

[Figure: Vaswani et al. 2017]
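
To make the stacked computation concrete, here is a small sketch of one "multi-head self-attention + feed forward" encoder layer; head count, dimensions, and parameter names are illustrative assumptions rather than the authors' implementation.

```python
# Sketch of one "multi-head self-attention + feed forward" encoder layer.
import numpy as np

def attend(X, W_q, W_k, W_v):
    """Single attention head: X [seq, d_model] -> [seq, d_head]."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    A = np.exp(scores - scores.max(-1, keepdims=True))
    A /= A.sum(-1, keepdims=True)
    return A @ V

def encoder_layer(X, heads, W_o, W_ff1, W_ff2):
    """heads: list of (W_q, W_k, W_v) triples; output has the same shape as X."""
    M = np.concatenate([attend(X, *h) for h in heads], axis=-1)  # M1 .. MH
    H = M @ W_o + X                                 # combine heads, residual
    return np.maximum(H @ W_ff1, 0.0) @ W_ff2 + H   # feed forward, residual

rng = np.random.default_rng(1)
d_model, d_head, n_heads = 16, 8, 2
X = rng.normal(size=(7, d_model))                   # 7 tokens
heads = [tuple(rng.normal(size=(d_model, d_head)) for _ in range(3))
         for _ in range(n_heads)]
Y = encoder_layer(X, heads,
                  rng.normal(size=(n_heads * d_head, d_model)),
                  rng.normal(size=(d_model, 64)),
                  rng.normal(size=(64, d_model)))
print(Y.shape)  # (7, 16)
```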

Slide 25

How to incorporate syntax?

Multi-task learning [Caruana 1993; Collobert et al. 2011]:
Overfits to the training domain like a single-task end-to-end NN.
Must re-train the SRL model to leverage new (improved) syntax.

Dependency path embeddings [Roth & Lapata 2016]; graph CNN over the parse [Marcheggiani & Titov 2017]:
Restricted context: path to the predicate or a fixed-width window.

Syntactically-informed self-attention (sketched below):
In one head, each token attends to its likely syntactic parent(s).
Global context: in the next layer, tokens observe all other parents.
At test time: can use its own predicted parse, or supply external syntax to improve the SRL model without re-training.
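
To make the last group of bullets concrete (one head doubling as a parser that can be supervised during training and overridden at test time), here is a hypothetical sketch; the function names and shapes are illustrative, not taken from the paper's implementation.

```python
# Hypothetical sketch of supervising one attention head with syntax and of
# injecting an external parse at test time; names and shapes are illustrative.
import numpy as np

def parent_attention_loss(attn, gold_heads):
    """attn: [seq, seq], row i = token i's attention over candidate parents;
    gold_heads: [seq] index of each token's syntactic head."""
    # Negative log-likelihood of the gold parent under the attention distribution.
    return -np.mean(np.log(attn[np.arange(len(gold_heads)), gold_heads] + 1e-9))

def inject_parse(heads, seq_len):
    """Replace the syntax head's attention with a one-hot (gold or external) parse."""
    A = np.zeros((seq_len, seq_len))
    A[np.arange(seq_len), heads] = 1.0
    return A
```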

Slide 26

Syntactically-informed self-attention

[Figure: at Layer p, one attention head's weights A are produced by a biaffine parser [Dozat and Manning 2017] that scores each token against its candidate syntactic heads.]
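
The attention weights for that head can be produced by biaffine scoring in the spirit of Dozat and Manning 2017; the projection names and dimensions in this sketch are assumptions for illustration.

```python
# Sketch of biaffine head scoring (in the spirit of Dozat and Manning 2017).
import numpy as np

def biaffine_head_attention(X, W_dep, W_head, U, b):
    """X: [seq, d]; W_dep, W_head: [d, k]; U: [k, k]; b: [k].
    Returns [seq, seq]: row i is token i's distribution over candidate heads."""
    D = X @ W_dep                        # dependent-role representations
    H = X @ W_head                       # head-role representations
    scores = D @ U @ H.T + H @ b         # bilinear term plus head-only bias
    e = np.exp(scores - scores.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)  # softmax over candidate heads = attention A
```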

Slide 27

Syntactically-informed self-attention

[Figure: animation step of the syntactically-informed attention diagram over the example sentence.]

Slide 28

Syntactically-informed self-attention

[Figure: the encoder stack from Layer 1 to Layer J, with syntactically-informed self-attention at Layer p and standard multi-head self-attention + feed forward layers elsewhere.]

Slide 30

LISA: Linguistically-Informed Self-Attention

[Figure: at Layer r, the model predicts part-of-speech tags (NNP, NN, VBZ, WP, VBN) for every token and marks "awards" and "advanced" as predicates (VBZ/PRED, VBN/PRED).]
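
The part-of-speech and predicate predictions at Layer r can be pictured as simple per-token softmax classifiers over the encoder states; the following is only an illustrative sketch with assumed names and shapes.

```python
# Illustrative per-token softmax classifiers applied to layer-r encoder states.
import numpy as np

def token_classifier(H_r, W, b):
    """H_r: [seq, d] layer-r token states; W: [d, n_classes]; b: [n_classes]."""
    logits = H_r @ W + b
    e = np.exp(logits - logits.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)               # [seq, n_classes] probabilities

# pos_probs  = token_classifier(H_r, W_pos, b_pos)    # e.g. NNP, VBZ, WP, ...
# pred_probs = token_classifier(H_r, W_pred, b_pred)  # predicate vs. not a predicate
```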

Slide 31

LISA: Linguistically-Informed Self-Attention

[Figure: the LISA encoder: syntactically-informed self-attention at Layer p, part-of-speech and predicate predictions at Layer r.]

Slide 32

LISA: Linguistically-Informed Self-Attention

[Figure: animation step of the LISA diagram over the example sentence.]

Slide 33

LISA: Linguistically-Informed Self-Attention

[Figure: the final-layer token states are projected to argument and predicate representations, and a bilinear operation scores every predicate against every token to assign semantic roles; a sketch of this step follows.]
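
One way to read the bilinear box in the diagram, as a sketch: each detected predicate's representation is scored against every token's argument representation through a role-specific bilinear form, giving a score for every (predicate, token, role tag) triple. Names, sizes, and the tag inventory here are illustrative assumptions, not the released implementation.

```python
# Illustrative sketch of bilinear predicate-argument scoring for SRL.
import numpy as np

def srl_scores(H, pred_idx, W_pred, W_arg, U):
    """H: [seq, d] final-layer token states; pred_idx: indices of detected predicates;
    W_pred, W_arg: [d, k]; U: [k, n_tags, k].
    Returns [n_preds, seq, n_tags] scores over role tags (e.g. B-ARG0, I-ARG0, O)."""
    preds = H[pred_idx] @ W_pred                      # [n_preds, k]
    args = H @ W_arg                                  # [seq, k]
    # score[p, t, tag] = preds[p] . U[:, tag, :] . args[t]
    return np.einsum('pk,kjl,tl->ptj', preds, U, args)

rng = np.random.default_rng(2)
H = rng.normal(size=(7, 16))                          # 7 tokens
scores = srl_scores(H, [1, 3], rng.normal(size=(16, 8)),
                    rng.normal(size=(16, 8)), rng.normal(size=(8, 5, 8)))
print(scores.shape)  # (2, 7, 5): per predicate, per token, per role tag
```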

Slides 34–40

LISA: Linguistically-Informed Self-Attention

[Figures: animation steps of the bilinear predicate-argument scoring, producing role tags such as B-ARG0.]

Slide 41

LISA: Linguistically-Informed Self-Attention

[Figure: the complete model: a single encoder pass produces part-of-speech tags, predicates, a labeled dependency parse, and SRL output via the bilinear predicate-argument scorer.]

Slides 42–45

LISA: Linguistically-Informed Self-Attention

[Figures: further animation steps of the bilinear SRL scoring diagram.]

Slide 47

Experimental results

Slide 48

Experimental results: CoNLL-2005

[Figure: SRL F1 gains (+2.49, +3.86, +0.9, +2.15 F1) and the parse accuracies of the syntax involved (94.9, 96.3, 90.3, 93.4 UAS).]

Slide 49

Experimental results: CoNLL-2005

[Figure: additional SRL gains (+3.23, +2.46 F1) with a higher-quality parse (96.5 UAS).]

Slide 50

Experimental results: Validation

Slide 51

Experimental results: Analysis

Slide 52

Experimental results: Analysis

[Figure: analysis of argument boundary mistakes.]

Slide 53

Summary

LISA: multi-task learning + multi-head self-attention trained to attend to syntactic parents
Achieves state-of-the-art F1 on PropBank SRL
Linguistic structure improves generalization
Fast: encodes the sequence only once to predict predicates, parts of speech, the labeled dependency parse, and SRL
Everyone wants to run NLP on the entire web: accuracy, robustness, computational efficiency.

Thank you!

Models & Code: https://github.com/strubell/LISA

I am on the academic job market this spring!

Slide 54

Experimental results: CoNLL-2005

Gold predicates; GloVe embeddings

WSJ Test (in-domain):

Brown Test (out-of-domain):

Slide 55

Experimental results: CoNLL-2012

Predicted predicates

Slide 56

Experimental results: Analysis

Slide 57

Experimental results: Analysis