Discovery Collection Strategy: A Hybrid Approach, The Legal Intelligencer

January 30, 2013

Reprinted with permission from the January 30, 2013 edition of The Legal Intelligencer. © 2013 ALM Media Properties, LLC. All rights reserved. Further duplication without permission is prohibited. For information, contact 877-257-3382 or reprints@alm.com or visit www.almreprints.com.

As in-house counsel at a large company, you have just received discovery requests from opposing counsel in one of your active matters. You speak with outside counsel and decide it is time to search for and collect email and documents from 20 employees and executives who are likely to have relevant information. After talking with your CIO, you have a basic understanding of the company's IT landscape and know that each of the 20 "custodians" is likely to have data on his or her hard drive and in his or her mailbox, and possibly on network share drives and in applications. You consider your collection options: Send an email to all 20 custodians asking them to search for and email responsive documents to you? Hire a discovery vendor to come on-site and make sweeping copies of all those potential data sources? Or is there a better way?

Does this scene sound familiar? Collections can seem daunting, especially given the explosion of data sources and the ever-accelerating proliferation of information. The questions can be equally daunting. What data should be collected? You know that the volume of potential data, and the corresponding costs of processing, hosting, reviewing and producing that data, could be staggering. Can you devise a collection strategy that limits the volume of data collected without running afoul of your discovery obligations? In other words, can you conduct collections defensibly without breaking the bank?

The answer is yes.

Collection is a crucial step in the discovery process because its type and scope dictate data processing, hosting, attorney review and production costs. To reduce overall discovery costs, it is important to collect, to the extent possible, only potentially responsive documents. The collection stage is also crucial because it determines the defensibility of your discovery protocol, and not all collections are equally defensible.

Traditional Collection Method

Attorneys often prefer, perhaps out of habit, to make a broad collection of the data being preserved in a given matter, load it into a processing and review platform and have a team of attorneys review the documents to identify the responsive subset for production to the opposing party. Ten years ago, this approach was common, and even acceptable from a cost perspective, because the average hard drive held 8 GB. Today, a typical hard drive may hold 500 GB to 1 TB or more, and custodians may also have iPads, external flash drives and personal network drives containing responsive data. A broad collection can therefore sweep in a large volume of nonresponsive data that must be filtered out to reach the subset of responsive data that needs to be produced to opposing counsel. The costs associated with processing, hosting, reviewing and producing that data can be staggering, even as the per-unit price of these services has declined.

Broad collections do, however, have benefits. You will need to disrupt the work schedules of the 20 custodians only once, when the broad collections and hard-drive images are taken, and the full data sets will then be available if the scope of the original discovery request changes. Furthermore, the risk of spoliation is reduced because preservation of the collected data is no longer in the custodians' hands. In some matters, full images are necessary: for example, where custodian involvement is not practical (such as a high-ranking employee with limited availability), where the employee is suspected of wrongdoing, or where the data belonged to a separated employee for whom only an image or a full hard drive is available. Most matters, however, do not require such sweeping collections for every custodian; broad collections can often be confined to a few key custodians.

Targeted Collections

Many attorneys steer clear of targeted collections, despite their advantages in volume and cost reduction, out of concern that relevant documents may be missed. That concern is generally outweighed by the potential cost savings of targeted collections, especially given courts' increasing focus on cooperation and proportionality.

Targeted collections can reduce the overall costs of processing, hosting, reviewing and producing data. They can also make document review (historically the most expensive component of discovery) more efficient by capturing a set of documents with a higher proportion of relevant material than a broad collection yields. Targeted collections also shorten the cycle time from collection to production by reducing processing time and the time spent reviewing nonrelevant material.

The core requirements for a defensible targeted collection are the guidance of an experienced attorney and documentation of both the plan and its execution. A targeted collection generally involves an interview in which the custodian describes his or her document-filing practices and identifies potentially responsive documents for collection. Only the identified files are collected, whether they reside on the custodian's hard drive, an external drive or a network drive. The documents, folders and locations are memorialized along with the interview notes, which together document the collection.

There is, however, a downside to targeted collections: If the scope of the original request changes (e.g., the search terms are revised), or if newly discovered facts or other events change the scope of discovery, follow-up collections will need to be conducted from all 20 custodians.

Some attorneys prefer that custodians self-select the potentially responsive documents for collection. Typically, you provide the custodians with a short description of the matter and the scope of discovery and ask them to select any potentially responsive files that reside on their drives or in their mailboxes. As with broad collections, this option causes minimal disruption to the custodians' workdays, because they can select the files at their convenience. There is, however, almost always a subset of custodians who drag their feet on collecting their documents and delay the discovery schedule. And while self-selection is often the favored collection method because it is inexpensive, follow-up collections will be needed if metadata is not preserved during the self-collections or if the custodians are not thorough in selecting documents. Given the lack of attorney oversight, self-collections often yield incomplete data sets and inconsistent results across custodians. Depending on the nuances of a given matter, self-selection can be a hard sell to opposing counsel or courts as a defensible method.

Hybrid Approach

In an ideal world, every potentially responsive document for your matter would be preserved. You would develop a defensible, repeatable and narrowly targeted collection strategy that reduced costs by collecting only the most responsive data sets from the 20 custodians. Opposing counsel would agree to your strategy, and productions would roll out the door without issue.

Discovery is rarely that easy. More likely, you will need a hybrid of collection types: full images for high-ranking employees who are traveling overseas or have busy schedules, and attorney-guided targeted collections for employees who are available to sit with an experienced attorney and identify potentially responsive materials. It is vital to work closely with attorneys who are knowledgeable about collection and review strategies; they can help you survey your company's data landscape and design a sound collection strategy. In the past, broad collections caused review costs to skyrocket because attorneys had to sift through large volumes of data, but most discovery providers now offer a range of analytics tools that reduce the amount of manual review required. These techniques include concept grouping, near-duplicate identification, email threading and suppression, and technology-assisted review (predictive coding). Such tools can quickly identify relevant data and eliminate irrelevant data, reducing review costs when full images are required in your matter.

Once you develop a collection strategy that reaches the relevant data and avoids over-collection, share the protocol with opposing counsel. Given the skyrocketing costs of discovery, courts are emphasizing and fostering cooperation and proportionality. To fill the gaps between the Federal Rules of Civil Procedure and actual practice, courts have begun taking matters into their own hands, developing their own discovery best practices and memorializing them in model orders and guidelines. These orders and guidelines should give attorneys confidence that a well-designed, proportional collection strategy shared with the opposing party is likely to be approved by the court.