

January 24, 2019

Technology: Utilizing Data Analytics to Improve Document Review

It’s becoming harder to recall the time before computers, email, the Internet, smartphones and social media. The images are sepia-toned: rotary phones, handwritten letters, encyclopedias, “friending” someone by meeting them in person. Whether we long for those days or never experienced them, one thing is sure: they’re not likely to return. However, technology isn’t inherently good or bad; it’s a tool. Used properly, it can help us be more effective and efficient. That’s particularly true when it comes to litigation, especially discovery.

In the Information Age we live in, data is created every day in mind-blowing volume: 2.5 quintillion bytes per day, according to one estimate (a quintillion is the number one followed by 18 zeros). To appreciate the immensity of this volume, consider that if it took one second to create one byte of data, it would take almost 80 billion years (nearly six times the age of the universe) to create 2.5 quintillion bytes. This is happening every day, and the rate will only increase over time as more and more devices create data, from Fitbits to refrigerators.
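
The arithmetic behind that comparison can be checked in a few lines. This is a rough sketch that assumes a 365.25-day year and an age of the universe of about 13.8 billion years; the second figure is a commonly cited approximation and is not part of the estimate quoted above.

```python
BYTES_PER_DAY = 2.5e18                      # 2.5 quintillion bytes, per the estimate above
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60    # assumption: 365.25-day year
AGE_OF_UNIVERSE_YEARS = 13.8e9              # assumption: commonly cited approximate age

# One byte per second means the number of seconds equals the number of bytes.
years_needed = BYTES_PER_DAY / SECONDS_PER_YEAR
multiples_of_universe_age = years_needed / AGE_OF_UNIVERSE_YEARS

print(f"{years_needed / 1e9:.1f} billion years")
print(f"{multiples_of_universe_age:.1f} times the age of the universe")
```

Under those assumptions the result lands just under 80 billion years, a little short of six times the age of the universe.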

Whether a Fortune 500 enterprise or a local “mom and pop,” businesses create and store massive volumes of data. When litigation arises, all of this data becomes potentially discoverable. Sifting through it manually just isn’t possible.

Special software tools and techniques have been developed to collect, cull, identify and produce the relevant information. Here are just a few that we routinely use to significantly reduce the time and expense of litigation discovery:

DE-DUPLICATING – In most organizations, more than one copy of an electronic record is stored in the system. A good example is email. An email sent company-wide can end up being stored in each email user’s account. Not all copies of that email need to be reviewed and produced. We use eDiscovery software that identifies multiple copies of the same email and keeps track of who received and kept it, but includes only one copy in the review process.
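
As a minimal sketch of the idea, and not a description of how any particular eDiscovery product works, de-duplication can be modeled by hashing the fields that make two copies "the same" email and keeping one copy per hash. The field names (`sender`, `sent`, `subject`, `body`, `custodian`) and the sample data are hypothetical; real tools typically hash more metadata, such as recipients and attachments.

```python
import hashlib
from collections import defaultdict


def fingerprint(email):
    """Hash the fields that, for this sketch, define 'the same email'."""
    parts = [email["sender"].lower(), email["sent"], email["subject"], email["body"]]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


def deduplicate(emails):
    """Keep one copy per fingerprint, but remember every custodian who held it."""
    review_set = {}                  # fingerprint -> first copy seen
    custodians = defaultdict(list)   # fingerprint -> everyone holding a copy
    for email in emails:
        fp = fingerprint(email)
        review_set.setdefault(fp, email)
        custodians[fp].append(email["custodian"])
    return list(review_set.values()), dict(custodians)


# Hypothetical collection: one company-wide email held by two people, plus one unique email.
announcement = {
    "sender": "ceo@example.com",
    "sent": "2019-01-02T09:00",
    "subject": "Holiday schedule",
    "body": "Offices close Friday.",
}
collection = [
    {**announcement, "custodian": "alice"},
    {**announcement, "custodian": "bob"},
    {
        "sender": "alice@example.com",
        "sent": "2019-01-03T14:30",
        "subject": "Contract draft",
        "body": "See attached.",
        "custodian": "bob",
    },
]

to_review, holders = deduplicate(collection)
print(len(collection), "collected;", len(to_review), "to review")
for email in to_review:
    print(email["subject"], "held by", holders[fingerprint(email)])
```

The custodian is deliberately left out of the fingerprint, so copies held by different people collapse into one review item while the list of holders is preserved.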

DOMAIN PARSING – Like it or not, we all receive lots of junk email, even at work, and some of it isn’t junk at all. It’s shopping- or hobby-related email or newsletters that come from using our work email to conduct personal business. It is highly unlikely that emails from Amazon or ESPN or the local running club newsletter will contain relevant information. We use software that can identify every email domain, the portion of an email address after the “@” sign, in a data collection. We then review those domain names and exclude all records from those that are unlikely to contain relevant information.
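
A simple way to picture this step is to tally sender domains and then filter out the ones a reviewer has marked as irrelevant. The sketch below assumes each record has a hypothetical `sender` field; the addresses and domains are made up for illustration.

```python
from collections import Counter


def sender_domain(address):
    """Return the part of an email address after the last '@', lowercased."""
    return address.rsplit("@", 1)[-1].strip().lower()


def domain_counts(emails):
    """Count how many emails came from each sender domain."""
    return Counter(sender_domain(e["sender"]) for e in emails)


def exclude_domains(emails, excluded):
    """Drop every email whose sender domain is on the exclusion list."""
    return [e for e in emails if sender_domain(e["sender"]) not in excluded]


# Hypothetical collection using reserved example domains.
emails = [
    {"sender": "deals@shop.example", "subject": "Weekend sale"},
    {"sender": "scores@sports.example", "subject": "Game recap"},
    {"sender": "cfo@client.example", "subject": "Revised pricing"},
    {"sender": "News@Shop.Example", "subject": "New arrivals"},
]

counts = domain_counts(emails)
for domain, n in counts.most_common():
    print(domain, n)

# A reviewer looks at the tally and decides which domains to exclude.
kept = exclude_domains(emails, {"shop.example", "sports.example"})
print([e["sender"] for e in kept])
```

The tally is what makes the human review practical: a reviewer scans a short list of domains instead of every individual message.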

EMAIL THREADING – An email thread starts with the original email sent by the author. After that email is sent, it can be replied to and/or forwarded numerous times. We use eDiscovery software that can track these threads and identify which threads contain unique information. Only those threads that contain unique information, called “inclusive threads,” will be included in the review process. As an example, assume an email has multiple replies, but all the replies fall within the same thread (each reply is added to the last reply in sequence). In this instance, only one email, the last reply, will be included in the review process because it contains every earlier iteration, making those earlier emails redundant. Now assume someone in the thread forwarded the email to another recipient and that recipient replied to the group. Two email threads would go into the review process: the email from the first example plus the forwarded email along with the reply.
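
The two scenarios above can be sketched with a toy model in which every email records the message it replied to or forwarded. The sketch assumes that each reply or forward carries the full, unaltered text of its parent, so an email is redundant as soon as some later email builds on it. Real threading tools compare actual message content and attachments; the `id` and `parent` fields here are hypothetical, and the forward is assumed to branch off an earlier reply.

```python
def inclusive_emails(emails):
    """Return ids of emails that no other email in the set replies to or forwards."""
    quoted = {e["parent"] for e in emails if e["parent"] is not None}
    return [e["id"] for e in emails if e["id"] not in quoted]


# First scenario: every reply is added to the last reply in sequence.
linear_thread = [
    {"id": "original", "parent": None},
    {"id": "reply-1", "parent": "original"},
    {"id": "reply-2", "parent": "reply-1"},
]

# Second scenario: someone forwards an earlier email, and the new recipient replies.
branched_thread = linear_thread + [
    {"id": "forward", "parent": "reply-1"},
    {"id": "reply-to-forward", "parent": "forward"},
]

print(inclusive_emails(linear_thread))
print(inclusive_emails(branched_thread))
```

In the linear case only the last reply survives; once the thread branches, the end of each branch is kept, which matches the two emails described in the example.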

KEYWORD SEARCH – Not all documents maintained by an organization or individual will be relevant to every dispute. Keyword searches may be used to cull the universe of potentially relevant documents. To determine what words or phrases would most likely appear in documents relevant to the subject matter of the dispute, we interview the people in the organization most knowledgeable about the dispute, the “subject matter experts.” We develop a list of “keyword searches” from these interviews, import them into our eDiscovery software and have it identify the records containing those keywords and phrases. We then check to make sure the searches returned the records we would expect and fine-tune the searches as needed. Once we are comfortable with the keyword searches, only those documents containing the keywords are included in the review process.
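
As a rough illustration of the search-and-check loop, the sketch below matches a list of terms as whole words or phrases, ignoring case, and reports which terms hit in each document so the results can be sanity-checked. The terms, document ids and text are hypothetical, and commercial review platforms offer much richer search syntax (proximity, stemming, wildcards) than this.

```python
import re


def build_pattern(terms):
    """Compile one case-insensitive pattern matching any term as a whole word or phrase."""
    # Longer terms first so a phrase wins over a shorter term that starts the same way.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(t) for t in ordered)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def keyword_hits(documents, terms):
    """Map each document id to the sorted list of terms found in it; skip documents with no hits."""
    pattern = build_pattern(terms)
    hits = {}
    for doc_id, text in documents.items():
        found = sorted({m.group(0).lower() for m in pattern.finditer(text)})
        if found:
            hits[doc_id] = found
    return hits


# Hypothetical terms gathered from subject matter expert interviews.
terms = ["supply agreement", "termination", "Acme"]

documents = {
    "DOC-1": "Acme sent notice of termination under the Supply Agreement.",
    "DOC-2": "Lunch on Friday?",
    "DOC-3": "Predetermination letter attached.",
}

hits = keyword_hits(documents, terms)
for doc_id, found in hits.items():
    print(doc_id, found)
```

The third document shows why the fine-tuning step matters: without the word-boundary check, “predetermination” would be a false hit on “termination.”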

DE-NISTING – NIST refers to the National Institute of Standards and Technology. NIST maintains a list of non-user-created files, such as executable files (that is, files used to execute computer programs), Windows system and help files, and font files. These files exist on every computer and in every network, but typically have no evidentiary value. We use eDiscovery software that compares data collected for review against this list and removes those files from the review process beforehand.
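
Conceptually, the comparison is a hash lookup: compute a digest of each collected file and drop any file whose digest appears in the known-file list. The sketch below assumes the list is available as a set of uppercase SHA-1 hex digests; that format, the file names and the file contents are stand-ins for illustration and are not real entries from the NIST list.

```python
import hashlib


def sha1_hex(content):
    """Uppercase SHA-1 hex digest of a file's bytes."""
    return hashlib.sha1(content).hexdigest().upper()


def de_nist(files, known_hashes):
    """Keep only files whose hash is NOT on the known system-file list."""
    return {
        name: data
        for name, data in files.items()
        if sha1_hex(data) not in known_hashes
    }


# Hypothetical collection: file name -> file bytes.
files = {
    "notepad.exe": b"stand-in for a system executable",
    "arial.ttf": b"stand-in for a font file",
    "Q3 forecast.xlsx": b"stand-in for a user-created spreadsheet",
}

# Hypothetical known-file list, built here from the stand-in system files.
known_hashes = {
    sha1_hex(b"stand-in for a system executable"),
    sha1_hex(b"stand-in for a font file"),
}

remaining = de_nist(files, known_hashes)
print(list(remaining))
```

Because the match is on content rather than file name, renaming a system file does not keep it in the review set, and a user document that happens to share a system file's name is not removed.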

TECHNOLOGY-ASSISTED REVIEW (TAR) – TAR goes by other names, like Computer-Assisted Review, Machine-Assisted Review and, most notably, predictive coding. TAR refers to a process where a computer algorithm “learns” from coding decisions made by human subject matter experts — in this instance, the attorneys most knowledgeable about the case. The algorithm then applies that learning to make coding decisions in the larger review population. In the world of eDiscovery, this technology is only now becoming more prevalent as it becomes more accepted by courts. We have this technology available to us and can use it in multiple ways to reduce review costs depending on a client’s needs. It may be used as the exclusive way to identify relevant documents. It may be used to prioritize the documents most likely to be relevant for human review. Or, it may be used as a check against the coding done during human review. The more heavily the algorithm is relied upon, the less expensive the review process becomes.
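
To make the “learning” step concrete, here is a toy sketch of the prioritization use: a small naive Bayes text classifier is trained on a handful of attorney-coded documents and then ranks unreviewed documents by how likely they are to be relevant. Naive Bayes is chosen only because it fits in a few lines; it is an assumption for illustration, not the algorithm any particular TAR product uses, and the documents and labels are invented. Real TAR workflows also involve sampling, validation and far larger training sets.

```python
import math
from collections import Counter


def tokens(text):
    return text.lower().split()


def train(coded_docs):
    """coded_docs: list of (text, is_relevant) pairs coded by human reviewers.

    Needs at least one relevant and one not-relevant example.
    """
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in coded_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokens(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def relevance_score(text, model):
    """Log-odds that a document is relevant; higher means more likely relevant."""
    word_counts, doc_counts, vocab = model
    totals = {label: sum(word_counts[label].values()) for label in (True, False)}
    score = math.log(doc_counts[True] / doc_counts[False])
    for word in tokens(text):
        if word not in vocab:
            continue  # words never seen in training carry no signal
        # Add-one smoothing so a word seen in only one class does not zero out the other.
        p_rel = (word_counts[True][word] + 1) / (totals[True] + len(vocab))
        p_not = (word_counts[False][word] + 1) / (totals[False] + len(vocab))
        score += math.log(p_rel / p_not)
    return score


# Hypothetical seed set coded by attorneys.
seed_set = [
    ("pricing terms in the supply contract", True),
    ("contract breach notice and damages", True),
    ("lunch menu for friday", False),
    ("fantasy football league standings", False),
]
model = train(seed_set)

# Hypothetical unreviewed documents, ranked so likely-relevant ones are reviewed first.
unreviewed = {
    "U-1": "revised supply contract pricing",
    "U-2": "friday football pool",
    "U-3": "notice of breach",
}
ranked = sorted(unreviewed, key=lambda d: relevance_score(unreviewed[d], model), reverse=True)
print(ranked)
```

The same scores could serve the other two uses described above: applying a cutoff to code documents without human review, or flagging human coding decisions that disagree strongly with the model.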

There is no “typical case.” The amount of data reduction, and ultimately cost savings, that can be expected from the use of these techniques depends on the type and volume of data collected. However, we can give an example to provide some idea of how powerful these techniques can be.

Let’s assume a fairly simple case where data is collected from one computer with a 100 gigabyte hard drive. The hard drive is full, and all the data on the hard drive is collected. So, we start with 100 gigabytes of potentially relevant data. Here’s what we might expect to see using the techniques we’ve discussed.

Rather than reviewing 100 gigabytes of data, the review population is down to 5 gigabytes — a 95% reduction. The cost savings speak for themselves.

If you would like to see any of these data analytics techniques “in action,” we would be happy to provide a demonstration. Just contact us to arrange a date, time and place convenient for you.

“As corporate datasets have grown larger and larger, technology has endeavored to keep evolving to meet increasing demand for creative and efficient solutions to handle the immense volumes. Support professionals and attorneys involved in all aspects of litigation discovery must be up to date on all of the latest innovations in order to manage this demand.” Ashley Tyler, Attorney, Project Manager

“The ‘needle in the haystack’ challenge in litigation is finding the most critical communications in the ever-increasing volume of business email. Data analytics, especially email threading, enables experienced review teams to efficiently find the documents litigators and clients need to evaluate litigation risk, respond to discovery requests, and prepare personnel to testify at depositions and trials.” Myra Willis, Attorney, Senior Project Manager

“Data analytics are a vital part of an efficient and defensible eDiscovery process. Coupled with a targeted collection performed at the direction of knowledgeable client staff and counsel, the end result is a lean, high-relevance dataset.” Adam Cefai, Attorney, Litigation Support Manager
Learn more by visiting our Data Analytics + eDiscovery Practice Group page.
