Outside Publication

Using Lean Six Sigma and Predictive Coding to Confront Volume Problem, Legal Intelligencer

October 25, 2011

Traditional document review in the age of e-discovery is reaching the point of infeasibility. Setting hordes of attorneys in front of computer screens to review and code millions (sometimes billions) of records is not only prohibitively expensive but also often produces errors and inconsistent quality. At Morgan Lewis, the eData team is combining predictive coding with Lean Six Sigma techniques to offer clients higher-quality, lower-cost document review and thus a promising solution to the volume problem.

Predictive Coding

To reduce discovery costs, we focus on defensible ways to reduce the volume of documents that require human review while maintaining (or even improving) the accuracy of those reviews. In most litigation, a large volume of electronically stored information (ESI) is collected and pushed through the e-discovery pipeline until it lands at the most expensive part of the process: attorney review. Predictive coding addresses that stage directly: it uses technology to shrink the volume of data that attorneys must review while improving the speed and quality of the review itself.

The current industry standard is to use key words, deduplication and similar objective culling criteria to reduce the volume of data and then to perform a linear human review of any records that remain. Predictive coding can eliminate, escalate, categorize and prioritize records for review, thus decreasing data volumes and enhancing human review.
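The conventional culling step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not any vendor's implementation: the function name `cull`, exact-duplicate detection by SHA-256 hash, and a plain case-insensitive substring test for key words are all choices made for the example.

```python
import hashlib


def cull(documents, keywords):
    """Drop exact duplicates, then keep only documents that hit a key word.

    Hypothetical helper: real culling tools use richer criteria
    (date ranges, custodians, near-duplicate detection, and so on).
    """
    seen_hashes = set()
    kept = []
    lowered_keywords = [k.lower() for k in keywords]
    for text in documents:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate of a document already processed
        seen_hashes.add(digest)
        if any(k in text.lower() for k in lowered_keywords):
            kept.append(text)  # survives culling and goes to linear review
    return kept
```

Everything the function returns would still go to attorneys for linear review, which is the cost that predictive coding aims to reduce.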

Key features of predictive coding are seed sets and iterations. First, the legal team creates a seed set composed of a rich, precise set of responsive records. Learning from the seed set, our analytics tool analyzes the rest of the collection to identify similar records and assigns each newly identified record a score reflecting its likely similarity to the seed set. Attorneys approve or reject the newly identified documents, and the tool uses that feedback to repeat the training process. This validation and retraining cycle repeats until no computer-suggested records remain in the collection that meet the standards of the seed set. After the retraining is complete, attorneys review and code the smaller universe of potentially responsive documents. As a result, predictive coding pushes "like" documents up the review queue to ensure a more productive review.

A statistical sample of the documents the tool left as nonresponsive is then reviewed to validate that those records are in fact nonresponsive. If the sample passes the statistical test, meaning that it turns up few enough responsive documents, the review can be considered complete, saving the cost of reviewing the nonresponsive documents.

Predictive coding can also be deployed for quality control. For example, at the end of a document review, the tool can use a seed set of privileged documents to test against the entire document population. The documents the tool suggests will be those that resemble the seed set. The review team can check those computer-identified documents to ensure that all privileged documents were in fact marked privileged.
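The seed-set cycle and the closing statistical check can be sketched as follows. This is a toy model under loud assumptions, not the analytics tool used at Morgan Lewis: similarity is plain Jaccard word overlap, "retraining" simply means adding approved documents to the example set, and the validation uses a 95% Wilson upper bound. All names (`similarity`, `iterative_review`, `validate_remainder`), the 0.3 threshold and the pass criterion are hypothetical.

```python
import math
import random


def tokens(text):
    """Lowercased word set; a stand-in for a real tool's document features."""
    return set(text.lower().split())


def similarity(doc, examples):
    """Highest Jaccard word overlap between doc and any example document."""
    doc_tokens = tokens(doc)
    best = 0.0
    for example in examples:
        example_tokens = tokens(example)
        union = doc_tokens | example_tokens
        if union:
            best = max(best, len(doc_tokens & example_tokens) / len(union))
    return best


def iterative_review(collection, seed_set, attorney_approves, threshold=0.3):
    """Suggest, collect attorney feedback, retrain; stop when nothing is suggested."""
    responsive = list(seed_set)  # grows as attorneys approve suggestions
    rejected = []                # suggested by the tool, rejected by attorneys
    unreviewed = [d for d in collection if d not in seed_set]
    while True:
        suggested = [d for d in unreviewed
                     if similarity(d, responsive) >= threshold]
        if not suggested:
            break  # no remaining record meets the seed-set standard
        for doc in suggested:
            (responsive if attorney_approves(doc) else rejected).append(doc)
        unreviewed = [d for d in unreviewed if d not in suggested]
    return responsive, rejected, unreviewed


def validate_remainder(unreviewed, is_responsive, sample_size, max_rate, seed=0):
    """Sample the unreviewed pile; pass if the 95% Wilson upper bound on the
    share of responsive documents is at or below max_rate (assumed criterion)."""
    n = min(sample_size, len(unreviewed))
    if n == 0:
        return 0.0, True
    sample = random.Random(seed).sample(unreviewed, n)
    p = sum(1 for d in sample if is_responsive(d)) / n
    z = 1.96  # two-sided 95% normal quantile
    upper = (p + z * z / (2 * n)
             + z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))) / (1 + z * z / n)
    return upper, upper <= max_rate
```

In this sketch a document that is too dissimilar from the original seed can still surface in a later round once an approved intermediate document joins the example set, which is the point of iterating. A production tool would also learn from rejections, which this simplified loop merely sets aside.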

Although predictive coding and the sophisticated algorithms behind it are not new technologies, the process of using this technology for discovery is novel. Furthermore, since the current standard of using search terms and linear human review has proven to be unreliable, it is only a matter of time before a compelling defensibility argument is made for the superiority of predictive coding in a review workflow. Indeed, that is why Morgan Lewis is using Lean Six Sigma techniques in combination with predictive coding to develop metrics and standards to demonstrate the superiority and thus the defensibility of this approach.

Lean Six Sigma

Lean Six Sigma is a popular process-improvement methodology that combines two established business strategies: Six Sigma and lean manufacturing. The purpose of Six Sigma is to identify and remove defects in a production process, while lean manufacturing focuses on increasing value with less work. The combination, Lean Six Sigma, is a methodology that uses metrics and methods to improve a process by identifying and removing the causes of errors and minimizing variability in the process. Morgan Lewis is applying these principles to the traditional document-review process to develop metrics and standards that document the process improvements of using predictive coding instead of linear attorney review. Our ultimate goal is to decrease the data volume requiring human review while increasing the quality of the review.

At the heart of the Lean Six Sigma methodology is the development of metrics and standards that reflect the extent to which the process eliminates waste, increases efficiency, and improves productivity. The metrics must empirically show the extent of the actual improvement by measuring the process before and after the improvement occurs. The eData team at Morgan Lewis has collected and analyzed a large amount of data from past human document reviews to establish baseline metrics against which to measure the gains in efficiency and accuracy of document reviews that use predictive coding.

We use a monitoring tool that measures the current process: linear human review of the documents that remain after culling by key words, deduplication and similar objective criteria. Based on our past and current projects, we have created a comprehensive baseline that measures manual and automated data points such as review rate, accuracy rate and total cost savings. These baseline metrics for a linear review can be compared to the metrics associated with a predictive coding-assisted human review. The data points can be exported into reports to reflect the accuracy rates associated with the reviews. As an example, one data point measures the accuracy rate of relevancy reviews. This measurement allows us to compare the accuracy rate of a linear human document review to the accuracy rate of the iterations using our predictive coding tool, in particular the proportion of relevant to nonrelevant documents among those identified by the tool. Our metrics reflect an improvement in accuracy rates when predictive coding is used to enhance a linear document review, since the tool pushes "like" documents into the review queue, allowing reviewers to code similar documents for relevancy at the same time.
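A before-and-after comparison of this kind can be sketched as below. The class `ReviewMetrics`, its fields, and the definition of accuracy (the share of coding decisions not overturned on a quality-control check) are assumptions made for illustration, not the firm's monitoring tool, and any figures passed in would be hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReviewMetrics:
    """Data points for one review; field definitions are assumed for the sketch."""
    documents_reviewed: int
    reviewer_hours: float
    coding_errors: int  # decisions overturned on quality-control check

    @property
    def review_rate(self):
        """Documents coded per reviewer hour."""
        return self.documents_reviewed / self.reviewer_hours

    @property
    def accuracy_rate(self):
        """Share of coding decisions that survived quality control."""
        return 1 - self.coding_errors / self.documents_reviewed


def improvement(baseline, assisted):
    """Change in each metric from the linear baseline to the assisted review."""
    return {
        "review_rate": assisted.review_rate - baseline.review_rate,
        "accuracy_rate": assisted.accuracy_rate - baseline.accuracy_rate,
    }
```

For instance, `improvement(ReviewMetrics(10000, 200.0, 500), ReviewMetrics(6000, 100.0, 150))` compares two invented reviews; the inputs are made up purely to show the calculation and do not represent measured results.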

As more projects incorporate predictive coding into their workflow, the baseline metrics will become even more comprehensive and reliable as an indication of the benefits and improvements that the technology offers to the process. The team will be able to show from its own measurements that predictive coding can enhance traditional document reviews by improving the accuracy rate of human review. In addition to improving quality, predictive coding confronts the volume problem by reducing the number of documents requiring human review, thus reducing the cost of discovery. This is vital in the face of spiraling e-discovery costs, which make an automated process within a document review even more important. Lean Six Sigma measurements provide transparent and comprehensive metrics to show the improvements made by using predictive coding in a document review, and should also provide a compelling defensibility argument for using predictive coding.

Six Sigma is a registered service mark and trademark of Motorola, Inc., which originally developed it in 1986. In recent years, Six Sigma ideas were combined with lean manufacturing concepts to create Lean Six Sigma.