The Honest Answer First
TAR — technology-assisted review, also called predictive coding — saves significant money in large, complex matters. It often doesn't in small ones. The crossover point, roughly speaking, is around 75,000–100,000 documents. Below that threshold, the setup costs and validation requirements of TAR can exceed the savings. Above it, TAR's cost advantage becomes increasingly dramatic.
Here's what that looks like in practice:
| Matter Size | Keyword Review Cost | TAR Cost | Winner |
|---|---|---|---|
| 25,000 docs | $1,500–$3,000 | $2,500–$5,000 | Keywords |
| 75,000 docs | $4,500–$9,000 | $3,750–$7,500 | TAR (slight edge) |
| 200,000 docs | $12,000–$24,000 | $4,000–$8,000 | TAR (clear win) |
| 1,000,000 docs | $60,000–$120,000 | $8,000–$20,000 | TAR (decisive) |
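The shape of that table follows from keyword review scaling linearly with volume while TAR carries a fixed setup cost plus a much smaller per-document rate. A toy model makes the crossover visible; the rates and setup figure below are illustrative assumptions, not vendor quotes, chosen only to land the crossover near the 75,000-document range discussed above.

```python
# Toy cost model: keywords scale linearly; TAR = fixed setup + small per-doc rate.
# All three constants are assumptions for illustration, not real pricing.

KEYWORD_PER_DOC = 0.09   # assumed blended cost per document of linear review
TAR_SETUP = 5_000.00     # assumed fixed cost: protocol, training, validation
TAR_PER_DOC = 0.02       # assumed per-document cost once the model is running

def keyword_cost(n_docs: int) -> float:
    return KEYWORD_PER_DOC * n_docs

def tar_cost(n_docs: int) -> float:
    return TAR_SETUP + TAR_PER_DOC * n_docs

def crossover() -> float:
    # The curves meet where setup cost equals the accumulated rate difference
    return TAR_SETUP / (KEYWORD_PER_DOC - TAR_PER_DOC)

for n in (25_000, 75_000, 200_000, 1_000_000):
    print(f"{n:>9,} docs: keywords ${keyword_cost(n):>9,.0f}  TAR ${tar_cost(n):>9,.0f}")
print(f"crossover near {crossover():,.0f} documents")
```

Under these assumed rates the crossover lands around 71,000 documents; with different per-document rates or setup costs the break-even point shifts, which is why the table gives a range rather than a single number.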
How Each Method Works
Keyword Search Culling
- Legal team develops search terms with counsel
- Terms run against full dataset
- All "hits" enter linear review queue
- Contract attorneys review every document
- Responsive/non-responsive/privileged calls made per doc
- Risk of low recall (missed documents) if terms are under-inclusive
- High review cost if terms are over-inclusive
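The mechanics of the keyword workflow are simple enough to sketch: any document hitting any agreed term enters the linear review queue, which is exactly why over-inclusive terms are expensive. The terms and documents below are invented for illustration.

```python
# Minimal sketch of keyword culling: any document hitting any agreed search
# term enters the eyes-on review queue. Terms and documents are hypothetical.

SEARCH_TERMS = ["merger", "side letter", "earn-out"]

documents = {
    "DOC-001": "Draft merger agreement attached for review.",
    "DOC-002": "Lunch on Thursday?",
    "DOC-003": "The side letter changes the earn-out schedule.",
}

def hits(text: str, terms) -> bool:
    lowered = text.lower()
    return any(term in lowered for term in terms)

review_queue = [doc_id for doc_id, text in documents.items()
                if hits(text, SEARCH_TERMS)]
print(review_queue)  # ['DOC-001', 'DOC-003'] -- every hit gets human review
```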
TAR / Predictive Coding
- Small seed set of documents reviewed by senior attorney
- Machine learning model trained on seed set decisions
- Model scores entire corpus for relevance probability
- Review focused on high-probability documents
- Continuous active learning refines model iteratively
- Statistical validation confirms recall targets are met
- Documentation trail supports defensibility
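The loop described above can be sketched with the standard library alone. The "model" here is a deliberately naive token-weight tally standing in for a real classifier, and the corpus and relevance labels are invented; the point is only the workflow shape: score, review the top-ranked batch, update, repeat.

```python
# Toy active-learning review loop, stdlib only. The token-weight "model" is a
# stand-in for a real classifier; documents and ground truth are invented.

from collections import defaultdict

corpus = {
    1: "merger price negotiation", 2: "fantasy football picks",
    3: "merger closing conditions", 4: "team lunch menu",
    5: "price adjustment clause",   6: "merger integration plan",
}
truly_relevant = {1, 3, 5, 6}          # ground truth only the reviewer "sees"

weights = defaultdict(float)           # learned token weights
reviewed, found = set(), set()

def score(doc_id):
    return sum(weights[t] for t in corpus[doc_id].split())

for _ in range(2):                     # two review rounds, batch size 2
    batch = sorted((d for d in corpus if d not in reviewed),
                   key=score, reverse=True)[:2]
    for d in batch:
        reviewed.add(d)
        label = 1.0 if d in truly_relevant else -1.0   # reviewer's coding call
        if label > 0:
            found.add(d)
        for t in corpus[d].split():    # model updates as coding happens
            weights[t] += label

print(f"reviewed {len(reviewed)} docs, found {len(found)} "
      f"of {len(truly_relevant)} relevant")
# -> reviewed 4 docs, found 3 of 4 relevant
```

After one round of essentially random coding, the model starts surfacing relevant documents first, which is the mechanism behind TAR's cost curve: most relevant material is found early, and review can stop once recall targets are validated.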
TAR 1.0 vs. TAR 2.0: The Practical Difference
The eDiscovery industry distinguishes between two generations of TAR, and the cost difference between them is significant enough to matter in your vendor conversations.
TAR 1.0 (also called Simple Passive Learning or SPL) requires a large, carefully constructed seed set reviewed by senior attorneys before the model trains. It's effective but front-loaded — the upfront attorney time is substantial, which erodes cost savings on smaller matters.
TAR 2.0 (Continuous Active Learning, or CAL) trains the model continuously as reviewers code documents during the normal review workflow. There's no separate seed-set phase. The model improves in real time, and the review team doesn't need to do anything differently. This is the dominant modern approach and the one worth requesting from vendors.
When evaluating vendor platforms, ask specifically whether they support CAL-based TAR 2.0. Several major platforms still default to TAR 1.0 workflows unless you ask otherwise.
The Step Nobody Should Skip: Pre-Review Culling
Before either keyword review or TAR begins, there are culling steps that cost almost nothing but can reduce your reviewable population by 25–40%. These should be standard practice on every matter regardless of review method:
- Date range filtering — eliminate documents clearly outside the relevant period
- Custodian filtering — if only 8 of 40 custodians are relevant, process only their data
- Near-duplicate detection — collapses clusters of near-identical documents, reviewer sees one representative copy
- Email thread suppression — review only the most-inclusive email in a thread; earlier messages are already contained within it
- System file exclusion — remove operating system files, executables, and other known non-responsive file types by matching hashes against the NIST NSRL
Proper culling before review — regardless of method — consistently produces 25–40% volume reductions at minimal cost. On a 200,000-document corpus, that's 50,000–80,000 fewer documents entering the review queue before TAR or keywords even start.
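The culling steps above compose into a simple pipeline. The sketch below applies a date filter, a custodian filter, and hash-based exact-duplicate suppression to a toy collection; exact hashing stands in for true near-duplicate detection, which is more involved, and all fields and values are invented.

```python
# Sketch of a pre-review culling pipeline on a toy collection. Exact-hash
# dedupe stands in for near-duplicate detection; all records are invented.

import hashlib
from datetime import date

docs = [
    {"id": "A1", "custodian": "kim", "date": date(2021, 3, 1), "body": "q1 forecast"},
    {"id": "A2", "custodian": "kim", "date": date(2017, 5, 2), "body": "old newsletter"},
    {"id": "B1", "custodian": "lee", "date": date(2021, 6, 9), "body": "q1 forecast"},
    {"id": "C1", "custodian": "pat", "date": date(2021, 6, 9), "body": "board deck"},
]

RELEVANT_CUSTODIANS = {"kim", "lee"}               # 2 of 3 custodians in scope
START, END = date(2020, 1, 1), date(2022, 1, 1)    # relevant period

survivors, seen_hashes = [], set()
for d in docs:
    if not (START <= d["date"] <= END):            # date range filtering
        continue
    if d["custodian"] not in RELEVANT_CUSTODIANS:  # custodian filtering
        continue
    h = hashlib.sha256(d["body"].encode()).hexdigest()
    if h in seen_hashes:                           # duplicate suppression
        continue
    seen_hashes.add(h)
    survivors.append(d["id"])

print(survivors)  # ['A1'] -- unique, in-scope, in-period documents only
```

Each filter is cheap to run and fully defensible to document, which is why culling pays off regardless of the review method that follows.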
The Defensibility Question
Courts have broadly accepted TAR as a defensible review methodology, with significant case law establishing that TAR is at minimum as defensible as keyword search when properly validated. The key requirements are documentation: you need to be able to demonstrate what your recall targets were, how you validated against them, and that your process was reasonable.
The practical implication: TAR requires more upfront documentation than keyword review. That documentation is a per-matter cost, incurred once at the outset, and it's worth it, but factor it into your cost comparison. A TAR protocol that can't be explained to opposing counsel or a court isn't defensible regardless of how good the technology is.
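The validation piece of that documentation is usually statistical: draw a random sample, have a human label it, and estimate what fraction of truly relevant documents the review actually found. A minimal sketch, using a normal-approximation confidence interval and hypothetical counts; real protocols fix the sample size and recall target in advance.

```python
# Sketch of recall validation from a random labeled sample. Counts are
# hypothetical; real protocols fix sample size and targets before review.

import math

# From a hypothetical validation sample drawn at random from the corpus:
relevant_found = 171    # sampled docs that are relevant AND were produced
relevant_missed = 29    # sampled docs that are relevant but were missed

n_relevant = relevant_found + relevant_missed
recall = relevant_found / n_relevant                 # point estimate
se = math.sqrt(recall * (1 - recall) / n_relevant)   # standard error
lo, hi = recall - 1.96 * se, recall + 1.96 * se      # ~95% interval

print(f"recall = {recall:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

The output of a calculation like this, recorded alongside the sampling plan, is the core of the "demonstrate your recall targets were met" requirement.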
Decision Framework: When to Use Each Method
| Use Keywords When… | Use TAR When… |
|---|---|
| Matter has fewer than 75,000 documents | Matter has more than 75,000–100,000 documents |
| Issues are narrow and well-defined | Issues are broad or evolving during review |
| Timeline is extremely compressed | Budget is the primary constraint |
| Terms are highly specific (product codes, account numbers) | Relevant documents use varied, non-predictable language |
| Small team, limited TAR experience | Complex, multi-issue litigation |
The most sophisticated legal teams don't choose one method permanently. They evaluate each matter independently using a framework like the one above, and they build the capability to use both. Vendors who tell you TAR is always better — or always worse — are selling a product, not giving advice.
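For teams that want the framework above as a first-pass triage rather than a judgment call from scratch each time, it reduces to a rough heuristic. The threshold and factor names below are assumptions drawn from the table, and the middle ground deliberately returns no answer: that's where the softer factors in the table decide.

```python
# The decision table above as a rough heuristic, not a rule. Threshold and
# factor names are assumptions drawn from the table in this article.

def suggest_method(doc_count: int, issues_well_defined: bool,
                   terms_highly_specific: bool) -> str:
    if doc_count < 75_000 and issues_well_defined and terms_highly_specific:
        return "keywords"
    if doc_count > 100_000:
        return "TAR"
    return "either: weigh timeline, budget, and team experience"

print(suggest_method(25_000, True, True))      # keywords
print(suggest_method(1_000_000, False, False)) # TAR
```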