The E-Discovery Games

Contents

Slide 2

Dave Lewis, Ph.D.

President, David D. Lewis Consulting
Co-founder TREC Legal Track
Testifying expert in Kleen Products, LLC, et al. v. Packaging Corp. of America, et al.
Fellow of the American Association for the Advancement of Science
75+ publications; 8 patents in:
e-discovery
information retrieval
machine learning
natural language processing
applied statistics
Past research positions: University of Chicago, Bell Labs, AT&T Labs
http://www.DavidDLewis.com


Slide 3

Kara M. Kirkeby, Esq.

Manager of Document Review Services for Kroll Ontrack
Previously managed document reviews on complex matters for a large law firm
Member: Minnesota State Bar Association (Civil Litigation Section), the Hennepin County Bar Association, the American Bar Association, Minnesota Women Lawyers (Communications Committee)
Served as a judicial law clerk for Hon. Karen Klein, Magistrate Judge of the U.S. District Court of North Dakota
J.D., magna cum laude, Hamline University School of Law
E-mail: [email protected]


Slide 4

Discussion Overview

What is Technology Assisted Review (TAR)?
Document Evaluation
Putting TAR into Practice
Conclusion



Slide 5

What is Technology Assisted Review?



Slide 6

Why Discuss Alternative Document Review Solutions?

Document review is routinely the most expensive part of the discovery process. Saving time and reducing costs will result in satisfied clients.

Traditional/Linear Paper-Based Document Review

Online Review

Technology Assisted Review

Slide 7

Why Discuss Alternative Document Review Solutions?

Conducting a traditional linear document review is not particularly efficient anymore
Focus instead on a relevance driven review process involving lawyers and technology working together

Slide 8

What Is Technology Assisted Review (TAR)?

Three major technologies:
Supervised learning from manual coding
Sampling and statistical quality control
Workflow to route documents, capture manual decisions, and tie it all together in a unified process

recall: 85% +/- 4%

precision: 75% +/- 3%

Presented by Dave Lewis

Slide 9

Supervised Learning: The Backbone of TAR

By iterating supervised learning, you target documents most likely to be relevant or on topic, creating a virtuous cycle:

Presented by Dave Lewis

Slide 10

Software learns to imitate human actions
For e-discovery, this means learning classifiers by imitating human coding of documents (see the sketch below)
Any content-based sorting into classes can be imitated
Responsive vs. Non-responsive
Privileged vs. Non-privileged
Topic A vs. Topic B vs. Topic C
Widely used outside e-discovery:
Spam filtering
Computational advertising
Data mining


Supervised Learning: The Backbone of TAR

Presented by Dave Lewis
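
To make the idea concrete, here is a minimal sketch (not from the deck, and not any particular review platform): a classifier is trained on documents a reviewer has already coded, then applied to uncoded documents and used to rank them. The tiny document set, the labels, and the use of scikit-learn are illustrative assumptions.

# Minimal illustrative sketch: learning to imitate reviewers' responsive /
# non-responsive coding. Documents and labels below are made-up placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Documents already coded by human reviewers (the training set).
coded_docs = [
    "Q3 pricing agreement with the competitor, terms attached.",
    "Lunch menu for the office holiday party.",
    "Forwarding the draft supply contract for your comments.",
    "Fantasy football league standings, week 6.",
]
labels = ["responsive", "non-responsive", "responsive", "non-responsive"]

# TF-IDF features plus logistic regression: one common supervised-learning setup.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(coded_docs, labels)

# Apply the learned classifier to uncoded documents and rank them by the
# predicted probability of being responsive (prioritization).
uncoded = ["Revised pricing schedule attached.", "Parking garage closed Friday."]
resp_col = list(model.classes_).index("responsive")
for doc, p in sorted(zip(uncoded, model.predict_proba(uncoded)[:, resp_col]),
                     key=lambda pair: -pair[1]):
    print(f"{p:.2f}  {doc}")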

Slide 11

Text REtrieval Conference (“TREC”), hosted by National Institute of Standards and Technology (“NIST”) since 1992
Evaluations open to academics and industry
TREC Legal Track (since 2006) provides simulated review for responsiveness task
Focus is on comparing technology assisted approaches
Not a human vs. machine bakeoff
Not a product benchmark
However, results suggest advantages to technology assisted review


Research & Development: TREC Legal Track

Presented by Dave Lewis

Slide 12

High effectiveness of TAR runs
Best technology-assisted runs in TREC 2009 examined 0.5% to 4.1% of the collection while finding an estimated 76.7% of responsive documents with 84.7% precision
Low effectiveness of manual review
Substantial effort was needed by TREC organizers to clean up the manual review to the point it could be used as a gold standard
An argument can be made (Grossman & Cormack, 2011) that 2009 data shows TAR results better than pre-cleanup manual review


Research & Development: TREC Legal Track

Presented by Dave Lewis

Slide 13

What is Technology Assisted Review?

[Workflow diagram, an iterative Train / Analyze / Evaluate cycle:
START: Select document set
Identify training set
Knowledgeable human reviewers train system by categorizing training set
System learns from training; prioritizes documents and suggests categories
Human reviewers: evaluate machine suggestions; quality control production set
END: Produce documents]

Presented by Dave Lewis

Slide 14

SELECT

Manually review documents for training
Key docs from your side or opponent
Docs found by searches on key terms
Docs prioritized for review
Random (non-QC) docs
Docs difficult for previous iteration's classifier (active learning)
Effectiveness increases as training set grows

[Workflow diagram: various docs for training, random docs for QC, and priority docs for review → manual review → train classifiers → auto-code documents → compare coding with elite coding on random sample → estimate effectiveness for entire set → good enough to produce? (YES: review for privilege → PRODUCTION; NO: another round)]

(A rough sketch of one iteration of this training loop appears below.)


Learning and Classification

Presented by Dave Lewis
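
Below is a rough sketch (not from the deck) of one form this iteration can take: after each training round, the classifier picks the unreviewed document it is least certain about for manual coding (uncertainty sampling, a simple active-learning strategy). The collection, the human_code() stand-in, and the number of rounds are all hypothetical placeholders.

# Illustrative sketch of an iterative training loop with uncertainty sampling.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical document collection.
collection = [
    "Draft supply contract attached for review.",
    "Office picnic scheduled for Friday.",
    "Pricing terms in the amended contract, section 4.",
    "Weekly cafeteria menu.",
    "Signed contract amendment from the vendor.",
    "IT notice: password reset required.",
]

def human_code(doc):
    # Stand-in for a reviewer's decision: 1 = responsive, 0 = non-responsive.
    return int("contract" in doc.lower())

vectorizer = TfidfVectorizer().fit(collection)
X = vectorizer.transform(collection)

# Seed training set: a couple of documents coded up front.
labeled = {0: human_code(collection[0]), 1: human_code(collection[1])}

for round_number in range(1, 3):
    idx = sorted(labeled)
    clf = LogisticRegression().fit(X[idx], [labeled[i] for i in idx])
    unreviewed = [i for i in range(len(collection)) if i not in labeled]
    if not unreviewed:
        break
    probs = clf.predict_proba(X[unreviewed])[:, 1]
    # Route the document the classifier is least sure about to manual review.
    pick = unreviewed[int(np.argmin(np.abs(probs - 0.5)))]
    labeled[pick] = human_code(collection[pick])
    print(f"Round {round_number}: reviewed doc {pick}; "
          f"training set now has {len(labeled)} documents")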

Slide 15

Manually review prioritized documents
Needs of case
Classifier predictions
If classifier is accurate enough, trust its call on responsiveness?
Privilege is more sensitive
Manually select some subsets for 100% privilege review
Employ sampling for other subsets
Classifiers can also help identify likely privileged docs


Production

Presented by Dave Lewis

Slide 16

Any binary classification can be summarized in a 2x2 table
Linear review, automated classifier, machine-assisted...
Responsive v. non-responsive, privileged v. non-privileged...
Test on sample of n documents for which we know answer
TP + FP + FN + TN = n


Classification Effectiveness

Presented by Dave Lewis

Slide 17

[Diagram: All Documents, split by whether the Classifier Says "Yes" and whether "Yes" is Correct, giving True Positives, False Positives, False Negatives, and True Negatives]

Classification Effectiveness

Presented by Dave Lewis

Slide 18

Recall = TP / (TP+FN)
Proportion of interesting stuff that the classifier actually found
High recall of interest to both producing and receiving party


Classification Effectiveness

Slide 19

Precision = TP / (TP+FP)
Proportion of stuff found that was actually interesting
High precision of particular interest to producing party: cost reduction! (both metrics are illustrated in the sketch below)


Classification Effectiveness
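
A small sketch (not from the deck) of these two formulas applied to a hypothetical 2x2 table; the counts are made up solely to show the arithmetic.

def recall(tp, fn):
    # Proportion of the interesting (truly responsive) documents that were found.
    return tp / (tp + fn)

def precision(tp, fp):
    # Proportion of the documents called responsive that really are responsive.
    return tp / (tp + fp)

# Hypothetical 2x2 table for a checked sample of n = 1000 documents.
TP, FP, FN, TN = 150, 50, 30, 770   # TP + FP + FN + TN = 1000
print(f"recall    = {recall(TP, FN):.2f}")     # 150 / 180 ≈ 0.83
print(f"precision = {precision(TP, FP):.2f}")  # 150 / 200 = 0.75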

Slide 20

Seminal 1985 study by Blair & Maron
Review for documents relevant to 51 requests related to BART crash
Boolean queries used to select documents for review
Process iterated until reviewers were satisfied that 75% of responsive documents had been found
Sampling showed recall of less than 20%
B&M has been used to argue for everything from exhaustive manual review to strong AI
Real lesson is about need for sampling!


Research & Development: Blair & Maron

Presented by Dave Lewis

Slide 21

Want to know effectiveness without manually reviewing everything. So:
Randomly sample the documents
Manually classify the sample
Estimate effectiveness on full set based on sample
Types of estimates (illustrated in the sketch below):
Point estimate, e.g. F1 is 0.74
Interval estimate, e.g. F1 in [0.67,0.83] with 95% confidence
Sampling is well-understood
Common in expert testimony in range of disciplines


Sampling and Quality Control

Presented by Dave Lewis
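
As a sketch (not from the deck), an interval estimate for a single proportion such as recall can be computed from the sample with the usual normal approximation; the sample numbers below are hypothetical.

import math

def interval_estimate(successes, n, z=1.96):
    # Point estimate and approximate 95% confidence interval for a proportion.
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - margin), min(1.0, p + margin)

# Hypothetical QC sample: 340 of 400 sampled responsive documents were found.
point, low, high = interval_estimate(340, 400)
print(f"recall ≈ {point:.2f}, 95% CI [{low:.2f}, {high:.2f}]")  # about 0.85 ± 0.035

Exact binomial or stratified methods can be used when the normal approximation is too crude; the point is simply that the estimate comes with a quantified margin of error.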

Slide 22

[Workflow diagram: SELECT various docs for training, random docs for QC, and priority docs for review → manual review → train classifiers → auto-code documents → compare coding with elite coding on random sample → estimate effectiveness for entire set → good enough to produce? (YES: review for privilege → PRODUCTION; NO: another round)]

Manually review random sample for QC
Use best reviewers here
Estimate recall, precision, etc.
Of auto-coding, manual review, or both combined (see the sketch below)
Estimates used in:
Deciding when finished
Tuning classifiers (and managing reviewers)
Defensibility
Auto-coding can also be used to find likely mistakes (not shown)



Sampling and Quality Control

Presented by Dave Lewis
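
A minimal sketch (not from the deck) of the QC comparison step: auto-coding is compared with a trusted ("elite") reviewer's coding on a random sample, and recall and precision of the auto-coding are estimated from the matches and mismatches. The paired codings below are fabricated placeholders.

import random

random.seed(0)

# Hypothetical collection of (auto_code, elite_code) decision pairs.
# In practice only the sampled documents would receive the elite review.
population = ([("responsive", "responsive")] * 1500
              + [("responsive", "non-responsive")] * 500
              + [("non-responsive", "responsive")] * 300
              + [("non-responsive", "non-responsive")] * 7700)

sample = random.sample(population, 400)  # random QC sample, reviewed by elite coders

tp = sum(a == "responsive" and e == "responsive" for a, e in sample)
fp = sum(a == "responsive" and e == "non-responsive" for a, e in sample)
fn = sum(a == "non-responsive" and e == "responsive" for a, e in sample)

print(f"estimated recall    = {tp / (tp + fn):.2f}")
print(f"estimated precision = {tp / (tp + fp):.2f}")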

Slide 23


Putting TAR into Practice


Slide 24

Barriers to Widespread Adoption

Industry-wide concern: Is it defensible?
Concern arises from misconceptions about how the technology works in practice
Belief that technology is devoid of any human interaction or oversight
Confusing “smart” technologies with older technologies such as concept clustering or topic grouping
Limited understanding of underlying “black box” technology
Largest barrier: Uncertainty over judicial acceptance of this approach
Limited commentary from the bench in the form of a court opinion
Fear of being the judiciary’s “guinea pig”

Slide 25

Developing TAR Case Law

Da Silva Moore v. Publicis Groupe
Class-action suit: parties agreed on a protocol signed by the court
Peck ordered more seeding reviews between the parties
“Counsel no longer have to worry about being the first ‘guinea pig’ for judicial acceptance of computer-assisted review … [TAR] can now be considered judicially approved for use in appropriate cases.”
Approximately 2 weeks after Peck’s Da Silva Moore opinion, District Court Judge Andrew L. Carter granted the plaintiff an opportunity to submit supplemental objections
Plaintiff later sought to recuse Judge Peck from the case
Stay tuned for more….

Slide 26

Developing TAR Case Law

Kleen Products v. Packaging Corporation of America
Defendants had completed 99% of their review; Plaintiffs argued that Defendants should use Predictive Coding and start document review over
Not clear whether Defendants did more than keyword search
Other notable points from Kleen Products
Defendants assert they were testing their keyword search queries, not just guessing
Argue they did not use Predictive Coding because it did not exist yet
Stay tuned for more….

Slide 27

Technology Assisted Review: What It Will Not Do

Will not replace or mimic the nuanced expert judgment of experienced attorneys with advanced knowledge of the case
Will not eliminate the need to perform validation and QC steps to ensure accuracy
Will not provide a magic button that will totally automate document review as we know it today

Slide 28

Technology Assisted Review: What It Can Do

Reduce:
Time required for document review and administration
Number of documents to review, if you choose an automated categorization or prioritization function
Reliance on contract reviewers or less experienced attorneys
Leverage expertise of experienced attorneys
Increase accuracy and consistency of category decisions (vs. unaided human review)
Identify the most important documents more quickly

Slide 29

TAR Accuracy

TAR must be as accurate as a traditional review
Studies show that computer-aided review is as effective as a manual review (if not more so)
Remember: Court standard is reasonableness, not perfection:
“[T]he idea is not to make it perfect, it’s not going to be perfect. The idea is to make it significantly better than the alternative without as much cost.”

-U.S. Magistrate Judge Andrew Peck in Da Silva Moore

Slide 30


What is Intelligent Review Technology (IRT) by Kroll Ontrack?

Intelligent Prioritization

Intelligent Categorization

Automated Workflow

Reviewing Efficiently, Defensibly & Accurately

Augments the human-intensive document review process to conduct faster and cheaper discovery

Slide 31

Cut off review after prioritization showed only a marginal return of responsive documents for a specific number of days
Cut off review of a custodian when prioritization statistics showed only non-responsive documents remained
Used suggested categorizations to validate human categorizations
Used suggested categorizations to segregate documents as non-responsive at a >75% confidence level. After sampling that set, the customer found less than 0.5% were actually responsive (and only marginally so). Review was cut off for that set of documents (this cutoff-and-sample check is sketched below)
Used suggested categorizations to segregate documents suggested as privileged and responsive at >80% confidence, then sampled and mass categorized them
Used suggested categorizations to mass categorize documents and move them to the QC stage, bypassing first-level review
Used suggested categorizations to find documents on a new issue category when review was nearing completion


Successes in the Field: Kroll Ontrack’s IRT
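
To illustrate the cutoff-and-sample pattern above (a generic sketch, not Kroll Ontrack's actual IRT): documents the classifier marks non-responsive with high confidence are set aside, and a random sample of that set is reviewed to confirm that very little responsive material remains. The threshold, sample size, and simulated data are hypothetical.

import random

random.seed(1)

# Hypothetical documents: (confidence_non_responsive, truly_responsive).
# Responsiveness is simulated as rarer when the classifier is more confident.
docs = []
for _ in range(10_000):
    conf = random.random()
    docs.append((conf, random.random() < 0.05 * (1 - conf)))

set_aside = [d for d in docs if d[0] > 0.75]   # segregate at >75% confidence
check = random.sample(set_aside, 400)          # validation sample for manual review
rate = sum(truly for _, truly in check) / len(check)

print(f"set aside {len(set_aside)} documents; "
      f"{rate:.1%} of the sample was actually responsive")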

Slide 32

Successes in the Field: Kroll Ontrack’s IRT



Slide 33


Conclusion


Slide 34


Parting Thoughts

Automated review technology helps lawyers focus on resolution – not discovery – through available metrics
Complements human review, but will not replace the need for skillful human analysis and advocacy
We are on the cusp of full-bore judicial discussion of Automated Review Technologies
Closely monitor judicial opinions for breakthroughs
Follow existing best practices for reasonableness and defensibility
Not all Technology Assisted Review solutions are created equal
Thoroughly vet the technology before adopting