W5.1 Evaluation

March 3, 2021

Contents

2. Evaluation
3. Confusion matrix
4. Two classes
5. Two classes
6. Two-class measures
7. Multi-class measures?
8. Evaluation
9. Training and test data 1: same data for training and testing
10. Training and test data 2: holdout / percentage split
11. Training and test data 3: k-fold cross-validation
12. More cross-validation
13. Evaluation
14. Method M1 significantly better than M2?
16. Other aspects of performance

Slide 2

Evaluation
Why?
What?
How?
Measures
Training and test data
Significance

Slide 3

Confusion matrix
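
The slide gives only the title, so the following is a minimal sketch of what a confusion matrix holds: a count for every (actual class, predicted class) pair. The function name `confusion_matrix` and the label lists are hypothetical and invented for illustration; they do not come from the lecture.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


# Hypothetical labels, invented for illustration only.
actual = ["pos", "pos", "neg", "neg", "pos", "neg"]
predicted = ["pos", "neg", "neg", "pos", "pos", "neg"]

cm = confusion_matrix(actual, predicted)
labels = sorted(set(actual) | set(predicted))

# One row per actual class, one column per predicted class.
for a in labels:
    print(a, [cm[(a, p)] for p in labels])
```

Cells on the diagonal are correct predictions; every off-diagonal cell is one kind of confusion between two classes.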

Slide 4

Two classes
Two classes: True/False, Positive/Negative

Slide 5

Two classes
Two classes: True/False, Positive/Negative

Slide 6

Two-class measures
True positive / false positive / true negative / false negative
Accuracy: (TP+TN) / (P+N)
Error rate: (FP+FN) / (P+N)
Sensitivity: TP / P
Specificity: TN / N
Precision: TP / (TP + FP)
Recall: TP / P
F-score: (2 * precision * recall) / (precision + recall)
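
The measures on this slide can be sketched in a few lines of Python. This assumes P and N denote the numbers of actual positives and actual negatives, so P = TP + FN and N = TN + FP. The function name `two_class_measures` and the four counts are hypothetical.

```python
def two_class_measures(tp, fp, tn, fn):
    """Compute the slide's measures from the four confusion-matrix counts.

    Assumes every denominator is non-zero (at least one actual positive,
    one actual negative, and one positive prediction).
    """
    p = tp + fn  # actual positives
    n = tn + fp  # actual negatives
    precision = tp / (tp + fp)
    recall = tp / p
    return {
        "accuracy": (tp + tn) / (p + n),
        "error_rate": (fp + fn) / (p + n),
        "sensitivity": tp / p,
        "specificity": tn / n,
        "precision": precision,
        "recall": recall,
        "f_score": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts, invented for illustration only.
m = two_class_measures(tp=40, fp=10, tn=45, fn=5)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Note that sensitivity and recall are the same quantity under two names, and that accuracy and error rate always sum to 1.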

Slide 7

Multi-class measures?
True positive / false positive / true negative / false negative
Accuracy: (TP+TN) / (P+N)
Error rate: (FP+FN) / (P+N)
Sensitivity: TP / P
Specificity: TN / N
Precision: TP / (TP + FP)
Recall: TP / P
F-score: (2 * precision * recall) / (precision + recall)
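
The slide leaves the question open. One common way to carry these measures over to more than two classes, offered here as an assumption rather than as the lecture's answer, is to treat each class in turn as "positive" and all other classes as "negative" (one-vs-rest). The sketch below does that for precision and recall; the function name and labels are hypothetical.

```python
def per_class_measures(actual, predicted):
    """One-vs-rest precision and recall for every class label."""
    pairs = list(zip(actual, predicted))
    results = {}
    for c in sorted(set(actual) | set(predicted)):
        tp = sum(a == c and p == c for a, p in pairs)
        fp = sum(a != c and p == c for a, p in pairs)
        fn = sum(a == c and p != c for a, p in pairs)
        # Report 0.0 when a class is never predicted or never occurs.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        results[c] = (precision, recall)
    return results


# Hypothetical three-class labels, invented for illustration only.
actual = ["a", "a", "b", "b", "c", "c"]
predicted = ["a", "b", "b", "b", "c", "a"]

per_class = per_class_measures(actual, predicted)
# Overall accuracy needs no positive/negative split: fraction correct.
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
print(per_class, accuracy)
```

Accuracy and error rate generalize directly, while the per-class values still have to be combined (for example by averaging) if a single number is wanted.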

Slide 8

Evaluation
Why?
What?
How?
Measures
Training and test data
Significance

Slide 9

Training and test data 1: same data for training and testing
Bad idea => why?
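
One standard answer, given here as an illustration rather than as the lecture's own wording, is that a model can score perfectly on data it has already seen without having learned anything that generalizes. The toy `Memorizer` class below is hypothetical: it stores every training example and is evaluated on labels that are pure noise.

```python
import random


class Memorizer:
    """Toy classifier that remembers every training example verbatim."""

    def fit(self, xs, ys):
        self.table = dict(zip(xs, ys))
        return self

    def predict(self, x):
        # Unseen inputs get a fixed guess, because nothing was learned.
        return self.table.get(x, 0)


def accuracy(model, xs, ys):
    return sum(model.predict(x) == y for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)
xs = list(range(200))                 # each input is a unique id
ys = [rng.randint(0, 1) for _ in xs]  # labels are pure noise

model = Memorizer().fit(xs[:100], ys[:100])
train_acc = accuracy(model, xs[:100], ys[:100])    # same data as training
unseen_acc = accuracy(model, xs[100:], ys[100:])   # data never seen before
print(train_acc, unseen_acc)
```

The score on the training data is perfect by construction, so it says nothing about how the model behaves on new data.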

Slide 10

Training and test data 2: holdout / percentage split
Complete data set
Randomly select x% as test data

Risk?
Atypical test set
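
A percentage split can be sketched as follows. The function name `holdout_split`, its parameters, and the 30% fraction are assumptions chosen for illustration; the slide does not prescribe a value for x.

```python
import random


def holdout_split(data, test_fraction, seed=None):
    """Randomly hold out a fraction of the data as the test set."""
    rng = random.Random(seed)
    shuffled = list(data)  # copy so the caller's data is left untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


# Hypothetical data set of 20 items, invented for illustration only.
data = list(range(20))
train, test = holdout_split(data, test_fraction=0.3, seed=42)
print(len(train), len(test))
```

Because the test set is a single random draw, a different seed can give a noticeably different score, which is the "atypical test set" risk named on the slide.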

Slide 11

Training and test data 3: k-fold cross-validation
Complete data set
Fold 1:
Fold 2:
Fold 3:
Fold 4:
Fold 5:
Average results over folds
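
The procedure on this slide can be sketched as below: split the data into k folds, use each fold once as test data while training on the rest, and average the scores. The function names, the `evaluate` callback, and the majority-class baseline are all hypothetical stand-ins for a real learner.

```python
import random


def k_fold_indices(n, k, seed=None):
    """Yield (train, test) index lists; each index is tested exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def cross_validate(evaluate, n, k=5, seed=None):
    """Average evaluate(train, test) over the k folds."""
    scores = [evaluate(tr, te) for tr, te in k_fold_indices(n, k, seed)]
    return sum(scores) / len(scores), scores


# Hypothetical labels and a trivial majority-class "model" for illustration.
labels = [0] * 12 + [1] * 8


def majority_baseline(train, test):
    train_labels = [labels[i] for i in train]
    guess = max(set(train_labels), key=train_labels.count)
    return sum(labels[i] == guess for i in test) / len(test)


mean_score, fold_scores = cross_validate(majority_baseline, len(labels), k=5, seed=0)
print(mean_score, fold_scores)
```

Every item is used for testing exactly once, which reduces the dependence on one lucky or unlucky split.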

Slide 12

More cross-validation
Leave-one-out
Stratified cross-validation
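
Both variants can be sketched briefly. Leave-one-out is k-fold cross-validation with k equal to the number of items; stratified cross-validation keeps the class proportions roughly equal in every fold. The function names and labels are hypothetical, and the stratified version is simplified: it deals items out in order instead of shuffling within each class first.

```python
from collections import defaultdict


def leave_one_out(n):
    """Yield (train, test) index lists with a single test item each."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


def stratified_folds(labels, k):
    """Deal the items of each class round-robin over k folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):  # assumes labels can be sorted
        for idx in by_class[label]:
            folds[position % k].append(idx)
            position += 1
    return folds


# Hypothetical imbalanced labels, invented for illustration only.
labels = ["pos"] * 10 + ["neg"] * 5
folds = stratified_folds(labels, k=5)
for fold in folds:
    print([labels[i] for i in fold])
```

With these labels every fold keeps the 2:1 class ratio of the full data set, which a purely random split does not guarantee.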

Slide 13

Evaluation
Why?
What?
How?
Measures
Training and test data
Significance

Slide 14

Method M1 significantly better than M2?
10-fold cross-validation => n=10
Paired t-test
H0: performance M1 same as M2
H1: performance M1 differs from M2
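
The test on this slide can be sketched with the standard library alone. The per-fold accuracies are hypothetical, and the function name is an assumption. The critical value 2.262 is the two-sided 5% threshold of the t-distribution with n - 1 = 9 degrees of freedom; the 5% level itself is an assumption, as the slide does not name one.

```python
import math
import statistics


def paired_t_statistic(scores_m1, scores_m2):
    """Return (t, degrees of freedom) for paired per-fold scores.

    Assumes the per-fold differences are not all identical, otherwise
    the standard deviation is zero and t is undefined.
    """
    diffs = [a - b for a, b in zip(scores_m1, scores_m2)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Hypothetical accuracies of M1 and M2 on the same 10 folds.
m1 = [0.81, 0.79, 0.84, 0.80, 0.83, 0.78, 0.82, 0.85, 0.80, 0.81]
m2 = [0.78, 0.77, 0.80, 0.79, 0.80, 0.77, 0.79, 0.81, 0.78, 0.80]

t, df = paired_t_statistic(m1, m2)
T_CRIT = 2.262  # two-sided, alpha = 0.05, df = 9
reject_h0 = abs(t) > T_CRIT
print(f"t = {t:.3f}, df = {df}, reject H0: {reject_h0}")
```

The test is paired because both methods are scored on the same folds, so the comparison uses the per-fold differences and not the two sets of scores separately.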

Slide 15

Slide 16

Other aspects of performance
Efficiency
Scalability
Robustness
Interpretability