W5.1 Evaluation

Slide 2

Evaluation

Why?
What?
How?
Measures
Training and test data
Significance

Slide 3

Confusion matrix

Slide 4

Two classes

Two classes: T/F, Positive/Negative
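
As a minimal sketch of how the four cells of a two-class confusion matrix can be counted, the following compares actual and predicted labels. The function name `confusion_counts` and the label lists are made up for illustration; they do not come from the slides.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive=True):
    """Count TP, FP, TN and FN for a two-class problem."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct (TP) or a false alarm (FP).
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: a miss (FN) or correct (TN).
            counts["FN" if a == positive else "TN"] += 1
    return {cell: counts[cell] for cell in ("TP", "FP", "TN", "FN")}

# Made-up labels, for illustration only.
actual    = [True, True, True, False, False, False, False, True]
predicted = [True, False, True, False, True, False, False, True]
counts = confusion_counts(actual, predicted)
```

The first letter (T/F) says whether the prediction was right, the second word (Positive/Negative) says what was predicted.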

Slide 5

Two classes

Two classes: T/F, Positive/Negative

Slide 6

Two class measures

True positive / false positive / true negative / false negative
Accuracy: (TP + TN) / (P + N)
Error rate: (FP + FN) / (P + N)
Sensitivity: TP / P
Specificity: TN / N
Precision: TP / (TP + FP)
Recall: TP / P
F-score: (2 * precision * recall) / (precision + recall)
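
The formulas above can be written out as a short function. This is a sketch under the assumption that P = TP + FN (all actual positives) and N = TN + FP (all actual negatives); the function name and the example counts are invented for illustration.

```python
def two_class_measures(tp, fp, tn, fn):
    """Compute the two-class measures from the four confusion-matrix counts.

    No guard against empty classes: a zero denominator raises ZeroDivisionError.
    """
    p = tp + fn  # all actual positives (assumed definition of P)
    n = tn + fp  # all actual negatives (assumed definition of N)
    precision = tp / (tp + fp)
    recall = tp / p
    return {
        "accuracy": (tp + tn) / (p + n),
        "error_rate": (fp + fn) / (p + n),
        "sensitivity": tp / p,
        "specificity": tn / n,
        "precision": precision,
        "recall": recall,
        "f_score": (2 * precision * recall) / (precision + recall),
    }

# Made-up counts, for illustration only.
m = two_class_measures(tp=40, fp=10, tn=45, fn=5)
```

Note that recall and sensitivity are the same quantity (TP / P), and that accuracy and error rate always sum to 1.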

Slide 7

Multi-class measures?

True positive / false positive / true negative / false negative
Accuracy: (TP + TN) / (P + N)
Error rate: (FP + FN) / (P + N)
Sensitivity: TP / P
Specificity: TN / N
Precision: TP / (TP + FP)
Recall: TP / P
F-score: (2 * precision * recall) / (precision + recall)
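
One common way to carry these measures over to more than two classes is to treat each class in turn as "positive" and all other classes as "negative" (one-vs-rest), then average the per-class scores. The slides do not say which approach is intended, so the sketch below is an assumption; the function name and labels are made up.

```python
def per_class_measures(actual, predicted):
    """Return {class: (precision, recall)} using a one-vs-rest view per class."""
    pairs = list(zip(actual, predicted))
    result = {}
    for c in sorted(set(actual) | set(predicted)):
        tp = sum(a == c and p == c for a, p in pairs)
        fp = sum(a != c and p == c for a, p in pairs)
        fn = sum(a == c and p != c for a, p in pairs)
        # Convention chosen here: a measure with an empty denominator is 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        result[c] = (precision, recall)
    return result

# Made-up labels, for illustration only.
actual    = ["a", "a", "b", "b", "c", "c"]
predicted = ["a", "b", "b", "b", "c", "a"]

scores = per_class_measures(actual, predicted)
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
# Macro-average: every class counts equally, regardless of its size.
macro_precision = sum(p for p, _ in scores.values()) / len(scores)
macro_recall = sum(r for _, r in scores.values()) / len(scores)
```

Accuracy needs no per-class view: it is simply the fraction of examples on the diagonal of the confusion matrix.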

Slide 8

Evaluation

Why?
What?
How?
Measures
Training and test data
Significance

Slide 9

Training and test data 1: same data for training and testing

Bad idea => why?
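
One possible way to see the problem is a toy classifier that only memorizes its training examples. The `Memorizer` class and the random data below are hypothetical, chosen so that there is nothing to learn: the labels are pure noise.

```python
import random

class Memorizer:
    """Hypothetical toy classifier: stores every training pair in a table."""

    def fit(self, xs, ys):
        self.table = dict(zip(xs, ys))
        return self

    def predict(self, x):
        # Unseen inputs get a fixed default answer.
        return self.table.get(x, False)

def accuracy(model, xs, ys):
    return sum(model.predict(x) == y for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)
train_x = list(range(100))
train_y = [rng.random() < 0.5 for _ in train_x]  # labels are pure noise
test_x = list(range(100, 200))
test_y = [rng.random() < 0.5 for _ in test_x]

model = Memorizer().fit(train_x, train_y)
train_acc = accuracy(model, train_x, train_y)  # perfect: every answer was memorized
test_acc = accuracy(model, test_x, test_y)     # no better than guessing "False"
```

Evaluated on its own training data the model looks perfect, although it has learned nothing that carries over to new examples.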

Slide 10

Training and test data 2: holdout / percentage split

Complete data set
Randomly select x% as test data

Risk?
Atypical test set
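
A holdout split can be sketched in a few lines. The function name, the `test_fraction` parameter, and the seed are illustrative assumptions, not part of the slides.

```python
import random

def holdout_split(data, test_fraction=0.3, seed=0):
    """Shuffle a copy of the data and set aside a fraction of it as test data."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)

data = list(range(20))
train, test = holdout_split(data, test_fraction=0.25, seed=42)
```

Because the test set is a single random draw, a different seed can give a noticeably different result, which is the "atypical test set" risk named above.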

Slide 11

Training and test data 3: k-fold cross-validation

Complete data set

Fold 1:
Fold 2:
Fold 3:
Fold 4:
Fold 5:

Average results over folds
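
The procedure can be sketched as follows: split the data into k folds, use each fold once as test data while training on the rest, and average the scores. The names `k_fold_indices`, `cross_validate`, and the `evaluate` callback are assumptions made for this sketch; `dummy_evaluate` is a placeholder where a real train-and-test step would go.

```python
import random

def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal parts
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def cross_validate(n_items, evaluate, k=5, seed=0):
    """Average the score returned by evaluate(train, test) over all folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_items, k, seed)]
    return sum(scores) / len(scores), scores

def dummy_evaluate(train, test):
    # Placeholder: a real version would train on `train` and score on `test`.
    return len(test) / (len(train) + len(test))

mean_score, fold_scores = cross_validate(10, dummy_evaluate, k=5)
```

Every example is used for testing exactly once, so no single unlucky split dominates the result.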

Slide 12

More cross-validation

Leave-one-out
Stratified cross-validation
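
Both variants can be sketched briefly. Leave-one-out is k-fold cross-validation with k equal to the number of examples; stratified cross-validation builds folds that keep roughly the same class proportions as the complete data set. The function names are made up, and the sketch skips shuffling to stay short.

```python
from collections import defaultdict

def leave_one_out(n_items):
    """Yield (train_indices, test_indices) with exactly one test example each."""
    for i in range(n_items):
        yield [j for j in range(n_items) if j != i], [i]

def stratified_folds(labels, k):
    """Deal the examples of each class round-robin over k folds.

    Simplification: no shuffling, so examples keep their original order.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

# Made-up labels: one third positive, two thirds negative.
labels = ["pos"] * 4 + ["neg"] * 8
folds = stratified_folds(labels, k=4)
loo_splits = list(leave_one_out(5))
```

Here every fold ends up with one positive and two negative examples, the same 1:2 ratio as the full data set.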

Slide 13

Evaluation

Why?
What?
How?
Measures
Training and test data
Significance

Slide 14

Method M1 significantly better than M2?

10-fold cross-validation => n=10
Paired t-test
H0: performance M1 same as M2
H1: performance M1 differs from M2
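
As a sketch of the test, the t statistic is the mean of the per-fold differences divided by its standard error, with n - 1 = 9 degrees of freedom. The per-fold accuracies below are made up for illustration, and the 5% significance level is an assumption; 2.262 is the standard two-sided critical value of the t distribution for 9 degrees of freedom at that level.

```python
import math
import statistics

def paired_t_statistic(scores_m1, scores_m2):
    """Return (t, degrees_of_freedom) for paired per-fold scores."""
    diffs = [a - b for a, b in zip(scores_m1, scores_m2)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1

# Made-up accuracies of M1 and M2 on the same 10 folds.
m1 = [0.82, 0.85, 0.80, 0.88, 0.84, 0.81, 0.86, 0.83, 0.87, 0.85]
m2 = [0.79, 0.83, 0.80, 0.84, 0.82, 0.80, 0.83, 0.81, 0.84, 0.82]

t, df = paired_t_statistic(m1, m2)
T_CRIT = 2.262  # two-sided, alpha = 0.05 (assumed), df = 9
reject_h0 = abs(t) > T_CRIT
```

The test is paired because both methods are scored on the same folds, so each fold contributes one difference rather than two independent measurements.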

Slide 16

Other aspects of performance
Efficiency
Scalability
Robustness
Interpretability

File name: W5.1-Evaluation.pptx