About

What this tool is, what data it covers, and how it works.

Project

In January 2026, the U.S. Department of Justice released over 3.5 million pages of documents related to the Jeffrey Epstein case. This tool makes that material searchable and connected: ask questions in natural language and get AI-generated answers grounded in the actual documents, with source citations.

Released: January 2026

Total corpus: 3.5M+ pages across 12 datasets

Case: United States v. Jeffrey Epstein (SDNY)

Source: justice.gov/epstein

Why This Exists

This is an investigative research tool — not a document finder, not a search engine, not a viewer. It takes a massive unstructured corpus and makes it researchable.

Every feature serves one question: does this help someone investigate deeper? Search, discover connections, follow threads, build a case.

This is a public service tool. These are declassified DOJ records released under federal court order.

Search

Your query is converted into a vector embedding and matched against 40,000+ document pages using hybrid search — semantic similarity and full-text keyword matching combined.
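As a rough illustration of hybrid search, the sketch below ranks pages once by cosine similarity and once by keyword overlap, then merges the two rankings with reciprocal rank fusion. The fusion method is an assumption, since this page does not say how the two signals are combined. The function names, page ids, and three-dimensional toy vectors are all hypothetical stand-ins.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_score(query, text):
    """Crude full-text score: how often the query terms occur in the text."""
    counts = Counter(text.lower().split())
    return sum(counts[term] for term in query.lower().split())

def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion (assumed here): each list adds 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

def hybrid_search(query, query_vec, pages, top_k=3):
    """pages: dicts with hypothetical keys 'id', 'text', 'vector'."""
    semantic = sorted(pages, key=lambda p: cosine(query_vec, p["vector"]), reverse=True)
    keyword = sorted(pages, key=lambda p: keyword_score(query, p["text"]), reverse=True)
    fused = rrf_fuse([[p["id"] for p in semantic], [p["id"] for p in keyword]])
    return fused[:top_k]

# Toy data only: placeholder ids and text, 3-dim vectors instead of 3072.
PAGES = [
    {"id": "page-a", "text": "travel records and call logs", "vector": [0.9, 0.1, 0.0]},
    {"id": "page-b", "text": "email about a travel schedule", "vector": [0.2, 0.8, 0.1]},
    {"id": "page-c", "text": "photograph of a building", "vector": [0.0, 0.1, 0.9]},
]
results = hybrid_search("travel records", [1.0, 0.0, 0.0], PAGES, top_k=2)
print(results)
```

Rank fusion is used in this sketch because it needs only the order of each result list, so the semantic and keyword scores never have to share a scale.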

Chat

The AI investigator searches for the most relevant documents, feeds them as context to Grok 4.1 Fast, and returns a grounded answer with citations to specific documents and page numbers.
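The retrieve-then-answer flow can be sketched as a prompt-assembly step. The example below is a minimal illustration under stated assumptions, not the tool's actual prompt: the `Hit` record, the `build_prompt` helper, the character budget, and the instruction wording are all hypothetical, and the call to the chat model is left out.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    document: str  # hypothetical document identifier
    page: int      # page number within that document
    text: str      # retrieved excerpt

def build_prompt(question, hits, max_chars=2000):
    """Assemble a grounded prompt: numbered excerpts first, then the question."""
    blocks = []
    used = 0
    for n, hit in enumerate(hits, start=1):
        block = f"[{n}] {hit.document}, p. {hit.page}\n{hit.text}"
        if used + len(block) > max_chars:
            break  # stop once the (assumed) context budget is spent
        blocks.append(block)
        used += len(block)
    context = "\n\n".join(blocks)
    return (
        "Answer using only the excerpts below. Cite each claim as [n]. "
        "If the excerpts do not answer the question, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

# Placeholder hits; a real run would use the search results.
hits = [
    Hit("example-doc-1", 4, "Placeholder excerpt one."),
    Hit("example-doc-2", 17, "Placeholder excerpt two."),
]
prompt = build_prompt("What do the excerpts say?", hits)
print(prompt)
```

Numbering each excerpt with its document and page lets a citation such as [2] in the answer be traced back to a specific source page.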

Embeddings

Every page is embedded using Gemini (3072 dimensions) during ingestion. Both full-text chunks and structured metadata are indexed to improve recall.
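One way to picture indexing both text chunks and metadata is to emit one record per chunk, plus one record for the page's serialized metadata. The sketch below assumes that layout. The chunk size, overlap, record fields, and the `fake_embed` placeholder (a hash-based stand-in for the real embedding call) are hypothetical. Only the 3072 dimension comes from this page.

```python
import hashlib

EMBED_DIM = 3072  # dimension stated on this page

def fake_embed(text, dim=EMBED_DIM):
    """Placeholder for the real embedding call: a deterministic pseudo-vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i % len(digest)] / 255.0 for i in range(dim)]

def chunk_text(text, size=200, overlap=50):
    """Split text into overlapping character windows (assumed sizes)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def build_records(dataset, page, text, metadata):
    """One record per text chunk, plus one record for the serialized metadata."""
    records = []
    for n, chunk in enumerate(chunk_text(text)):
        records.append({"dataset": dataset, "page": page, "kind": "text",
                        "chunk": n, "content": chunk, "vector": fake_embed(chunk)})
    meta_text = "; ".join(f"{k}: {v}" for k, v in sorted(metadata.items()))
    records.append({"dataset": dataset, "page": page, "kind": "metadata",
                    "chunk": 0, "content": meta_text, "vector": fake_embed(meta_text)})
    return records

# "DS0" and the metadata fields are made-up sample values.
records = build_records("DS0", 1, "x" * 450, {"type": "memo", "year": 2020})
print(len(records), [r["kind"] for r in records])
```

Indexing the metadata as its own record means a query that matches only a field such as document type can still find the page.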

Data Coverage

DS1: Crime scene photography, 9 East 71st Street (3,156 pages)

DS4: Call logs, fax records, CBP travel records, FBI reports (2,704 pages)

DS5: FBI lab photos of seized electronics and evidence bags (120 pages)

DS6: Grand jury transcripts, sealed indictments (487 pages)

DS7: Sentencing negotiations, immunity discussions, testimony (660 pages)

DS8: Emails, government docs, CBP records, 2019–2022 (29,343 pages)

DS12: DOJ emails, prosecution memos, handwritten notes (1,525 pages)

The seven datasets above are indexed. DS2, DS3, and DS9–DS11 are not yet processed (~3.4M pages remaining).
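The coverage figures can be checked with a little arithmetic. The sketch below sums the page counts listed above and compares the total with the 3.5 million page corpus. The variable names are illustrative, and because the corpus size is given as "3.5M+", the computed share is an upper estimate.

```python
# Page counts as listed under Data Coverage.
PAGES_LOADED = {
    "DS1": 3156, "DS4": 2704, "DS5": 120, "DS6": 487,
    "DS7": 660, "DS8": 29343, "DS12": 1525,
}
TOTAL_CORPUS = 3_500_000  # stated as "3.5M+", so treat this as a lower bound

loaded = sum(PAGES_LOADED.values())
remaining = TOTAL_CORPUS - loaded
share = loaded / TOTAL_CORPUS

print(f"loaded: {loaded:,} pages")
print(f"remaining: {remaining:,} pages")
print(f"share: {share:.2%}")
```

This gives 37,995 loaded pages, or about 1.09% of the corpus.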

Technical

Version: 0.1.0 (MVP)
Search: LanceDB (hybrid)
Chat model: Grok 4.1 Fast
Embeddings: Gemini (3072-dim)
Frontend: React + Tailwind
Shell: Next.js

Only 7 of 12 DOJ datasets are currently loaded, roughly 1% of the 3.5-million-page corpus. AI-generated analysis can contain errors. Every claim cites its source, so always verify it against the cited documents. This tool does not establish guilt or innocence.