The Honest Answer First
TAR — technology-assisted review, also called predictive coding — saves significant money in large, complex matters. It often doesn't in small ones. The crossover point, roughly speaking, is around 75,000–100,000 documents. Below that threshold, the setup costs and validation requirements of TAR can exceed the savings. Above it, TAR's cost advantage becomes increasingly dramatic.
Here's what that looks like in practice:
| Matter Size | Keyword Review Cost | TAR Cost | Winner |
|---|---|---|---|
| 25,000 docs | $1,500–$3,000 | $2,500–$5,000 | Keywords |
| 75,000 docs | $4,500–$9,000 | $3,750–$7,500 | TAR (slight edge) |
| 200,000 docs | $12,000–$24,000 | $4,000–$8,000 | TAR (clear win) |
| 1,000,000 docs | $60,000–$120,000 | $8,000–$20,000 | TAR (decisive) |
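The shape of that table follows from keyword review scaling linearly with volume while TAR carries a fixed setup cost plus a much smaller per-document rate. A toy model makes the crossover visible; the rates and setup figure below are illustrative assumptions, not vendor quotes, chosen only to land the crossover near the 75,000-document range discussed above.

```python
# Toy cost model: keywords scale linearly; TAR = fixed setup + small per-doc rate.
# All three constants are assumptions for illustration, not real pricing.

KEYWORD_PER_DOC = 0.09   # assumed blended cost per document of linear review
TAR_SETUP = 5_000.00     # assumed fixed cost: protocol, training, validation
TAR_PER_DOC = 0.02       # assumed per-document cost once the model is running

def keyword_cost(n_docs: int) -> float:
    return KEYWORD_PER_DOC * n_docs

def tar_cost(n_docs: int) -> float:
    return TAR_SETUP + TAR_PER_DOC * n_docs

def crossover() -> float:
    # The curves meet where setup cost equals the accumulated rate difference
    return TAR_SETUP / (KEYWORD_PER_DOC - TAR_PER_DOC)

for n in (25_000, 75_000, 200_000, 1_000_000):
    print(f"{n:>9,} docs: keywords ${keyword_cost(n):>9,.0f}  TAR ${tar_cost(n):>9,.0f}")
print(f"crossover near {crossover():,.0f} documents")
```

Under these assumed rates the crossover lands around 71,000 documents; with different per-document rates or setup costs the break-even point shifts, which is why the table gives a range rather than a single number.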
How Each Method Works
Keyword Search Culling
- Legal team develops search terms with counsel
- Terms run against full dataset
- All "hits" enter linear review queue
- Contract attorneys review every document
- Responsive/non-responsive/privileged calls made per doc
- Risk of low recall (missed documents) if terms are under-inclusive
- High review cost if terms are over-inclusive
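The mechanics of the keyword workflow are simple enough to sketch: any document hitting any agreed term enters the linear review queue, which is exactly why over-inclusive terms are expensive. The terms and documents below are invented for illustration.

```python
# Minimal sketch of keyword culling: any document hitting any agreed search
# term enters the eyes-on review queue. Terms and documents are hypothetical.

SEARCH_TERMS = ["merger", "side letter", "earn-out"]

documents = {
    "DOC-001": "Draft merger agreement attached for review.",
    "DOC-002": "Lunch on Thursday?",
    "DOC-003": "The side letter changes the earn-out schedule.",
}

def hits(text: str, terms) -> bool:
    lowered = text.lower()
    return any(term in lowered for term in terms)

review_queue = [doc_id for doc_id, text in documents.items()
                if hits(text, SEARCH_TERMS)]
print(review_queue)  # ['DOC-001', 'DOC-003'] -- every hit gets human review
```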
TAR / Predictive Coding
- Small seed set of documents reviewed by senior attorney
- Machine learning model trained on seed set decisions
- Model scores entire corpus for relevance probability
- Review focused on high-probability documents
- Continuous active learning refines model iteratively
- Statistical validation confirms recall targets are met
- Documentation trail supports defensibility
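The loop described above can be sketched with the standard library alone. The "model" here is a deliberately naive token-weight tally standing in for a real classifier, and the corpus and relevance labels are invented; the point is only the workflow shape: score, review the top-ranked batch, update, repeat.

```python
# Toy active-learning review loop, stdlib only. The token-weight "model" is a
# stand-in for a real classifier; documents and ground truth are invented.

from collections import defaultdict

corpus = {
    1: "merger price negotiation", 2: "fantasy football picks",
    3: "merger closing conditions", 4: "team lunch menu",
    5: "price adjustment clause",   6: "merger integration plan",
}
truly_relevant = {1, 3, 5, 6}          # ground truth only the reviewer "sees"

weights = defaultdict(float)           # learned token weights
reviewed, found = set(), set()

def score(doc_id):
    return sum(weights[t] for t in corpus[doc_id].split())

for _ in range(2):                     # two review rounds, batch size 2
    batch = sorted((d for d in corpus if d not in reviewed),
                   key=score, reverse=True)[:2]
    for d in batch:
        reviewed.add(d)
        label = 1.0 if d in truly_relevant else -1.0   # reviewer's coding call
        if label > 0:
            found.add(d)
        for t in corpus[d].split():    # model updates as coding happens
            weights[t] += label

print(f"reviewed {len(reviewed)} docs, found {len(found)} "
      f"of {len(truly_relevant)} relevant")
# -> reviewed 4 docs, found 3 of 4 relevant
```

After one round of essentially random coding, the model starts surfacing relevant documents first, which is the mechanism behind TAR's cost curve: most relevant material is found early, and review can stop once recall targets are validated.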
TAR 1.0 vs. TAR 2.0: The Practical Difference
The eDiscovery industry distinguishes between two generations of TAR, and the cost difference between them is significant enough to matter in your vendor conversations.
TAR 1.0 (also called Simple Passive Learning or SPL) requires a large, carefully constructed seed set reviewed by senior attorneys before the model trains. It's effective but front-loaded — the upfront attorney time is substantial, which erodes cost savings on smaller matters.
TAR 2.0 (Continuous Active Learning, or CAL) trains the model continuously as reviewers code documents during the normal review workflow. There's no separate seed-set phase. The model improves in real time, and the review team doesn't need to do anything differently. This is the dominant modern approach and the one worth requesting from vendors.
When evaluating vendor platforms, ask specifically whether they support CAL-based TAR 2.0. Several major platforms still default to TAR 1.0 workflows unless you ask otherwise.
The Step Nobody Should Skip: Pre-Review Culling
Before either keyword review or TAR begins, there are culling steps that cost almost nothing but can reduce your reviewable population by 25–40%. These should be standard practice on every matter regardless of review method:
- Date range filtering — eliminate documents clearly outside the relevant period
- Custodian filtering — if only 8 of 40 custodians are relevant, process only their data
- Near-duplicate detection — collapses clusters of near-identical documents, reviewer sees one representative copy
- Email thread suppression — review only the most-inclusive email in a thread; earlier messages are already contained within it
- System file exclusion — remove operating system files, executables, and other known non-responsive file types by matching hashes against the NIST NSRL
Proper culling before review — regardless of method — consistently produces 25–40% volume reductions at minimal cost. On a 200,000-document corpus, that's 50,000–80,000 fewer documents entering the review queue before TAR or keywords even start.
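The culling steps above compose into a simple pipeline. The sketch below applies a date filter, a custodian filter, and hash-based exact-duplicate suppression to a toy collection; exact hashing stands in for true near-duplicate detection, which is more involved, and all fields and values are invented.

```python
# Sketch of a pre-review culling pipeline on a toy collection. Exact-hash
# dedupe stands in for near-duplicate detection; all records are invented.

import hashlib
from datetime import date

docs = [
    {"id": "A1", "custodian": "kim", "date": date(2021, 3, 1), "body": "q1 forecast"},
    {"id": "A2", "custodian": "kim", "date": date(2017, 5, 2), "body": "old newsletter"},
    {"id": "B1", "custodian": "lee", "date": date(2021, 6, 9), "body": "q1 forecast"},
    {"id": "C1", "custodian": "pat", "date": date(2021, 6, 9), "body": "board deck"},
]

RELEVANT_CUSTODIANS = {"kim", "lee"}               # 2 of 3 custodians in scope
START, END = date(2020, 1, 1), date(2022, 1, 1)    # relevant period

survivors, seen_hashes = [], set()
for d in docs:
    if not (START <= d["date"] <= END):            # date range filtering
        continue
    if d["custodian"] not in RELEVANT_CUSTODIANS:  # custodian filtering
        continue
    h = hashlib.sha256(d["body"].encode()).hexdigest()
    if h in seen_hashes:                           # duplicate suppression
        continue
    seen_hashes.add(h)
    survivors.append(d["id"])

print(survivors)  # ['A1'] -- unique, in-scope, in-period documents only
```

Each filter is cheap to run and fully defensible to document, which is why culling pays off regardless of the review method that follows.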
The Defensibility Question
Courts have broadly accepted TAR as a defensible review methodology, with significant case law establishing that TAR is at minimum as defensible as keyword search when properly validated. The key requirements are documentation: you need to be able to demonstrate what your recall targets were, how you validated against them, and that your process was reasonable.
The practical implication: TAR requires more upfront documentation than keyword review. That documentation is a per-matter cost, incurred once at the outset, and it's worth it, but factor it into your cost comparison. A TAR protocol that can't be explained to opposing counsel or a court isn't defensible regardless of how good the technology is.
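The validation piece of that documentation is usually statistical: draw a random sample, have a human label it, and estimate what fraction of truly relevant documents the review actually found. A minimal sketch, using a normal-approximation confidence interval and hypothetical counts; real protocols fix the sample size and recall target in advance.

```python
# Sketch of recall validation from a random labeled sample. Counts are
# hypothetical; real protocols fix sample size and targets before review.

import math

# From a hypothetical validation sample drawn at random from the corpus:
relevant_found = 171    # sampled docs that are relevant AND were produced
relevant_missed = 29    # sampled docs that are relevant but were missed

n_relevant = relevant_found + relevant_missed
recall = relevant_found / n_relevant                 # point estimate
se = math.sqrt(recall * (1 - recall) / n_relevant)   # standard error
lo, hi = recall - 1.96 * se, recall + 1.96 * se      # ~95% interval

print(f"recall = {recall:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

The output of a calculation like this, recorded alongside the sampling plan, is the core of the "demonstrate your recall targets were met" requirement.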
Decision Framework: When to Use Each Method
| Use Keywords When… | Use TAR When… |
|---|---|
| Matter has fewer than 75,000 documents | Matter has more than 75,000–100,000 documents |
| Issues are narrow and well-defined | Issues are broad or evolving during review |
| Timeline is extremely compressed | Budget is the primary constraint |
| Terms are highly specific (product codes, account numbers) | Relevant documents use varied, non-predictable language |
| Small team, limited TAR experience | Complex, multi-issue litigation |
The most sophisticated legal teams don't choose one method permanently. They evaluate each matter independently using a framework like the one above, and they build the capability to use both. Vendors who tell you TAR is always better — or always worse — are selling a product, not giving advice.
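For teams that want the framework above as a first-pass triage rather than a judgment call from scratch each time, it reduces to a rough heuristic. The threshold and factor names below are assumptions drawn from the table, and the middle ground deliberately returns no answer: that's where the softer factors in the table decide.

```python
# The decision table above as a rough heuristic, not a rule. Threshold and
# factor names are assumptions drawn from the table in this article.

def suggest_method(doc_count: int, issues_well_defined: bool,
                   terms_highly_specific: bool) -> str:
    if doc_count < 75_000 and issues_well_defined and terms_highly_specific:
        return "keywords"
    if doc_count > 100_000:
        return "TAR"
    return "either: weigh timeline, budget, and team experience"

print(suggest_method(25_000, True, True))      # keywords
print(suggest_method(1_000_000, False, False)) # TAR
```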