EDIS v3 (Epstein Document Intelligence System) is a cutting-edge multi-modal document analysis platform powered by Google Vertex AI and PostgreSQL. It automatically processes, analyzes, and catalogs sensitive documents at scale.
The system processes 94505 documents across 0 datasets, extracting 0 named entities from sophisticated AI analysis.
Key Analysis Capabilities:
- Entity Extraction — People, organizations, locations, and financial entities
- Relationship Mapping — Connections between entities with typed relationships
- Illegal Activity Detection — Severity assessment with supporting evidence
- Blackmail Indicators — Coercion likelihood scoring
- Financial Forensics — Transaction and asset tracking
- Image Analysis — Scene understanding, sensitivity assessment
- Full-Text Search — PostgreSQL with weighted ranking across documents
The analysis pipeline consists of five distinct phases:
PDFs are converted to text and images, organized by dataset, with metadata extraction.
System determines if each file is text or image, calculates content hash for deduplication, extracts preliminary metadata.
Files uploaded to Google Cloud Storage (GCS) in parallel, creating gs:// URIs for batch processing.
Vertex AI Batch API processes documents using Gemini Vision models, returning analysis in JSONL format.
Results imported into PostgreSQL with full-text search indexes, materialized views, and entity network graphs.
Text Documents: Advanced NLP analysis extracts entities, relationships, and indicators using GPT-4 Vision and Gemini models with specialized prompts for legal, financial, and criminal activity detection.
Image Documents: Computer vision analysis identifies objects, people, scenes, and sensitivity levels. Wealth indicators detected through luxury item recognition.
Severity Levels:
Direct, documented evidence of illegal activity
Strong indicators requiring investigation
Warrants further investigation
No illegal activity detected
- Backend
- Flask + Python on Gunicorn
- Database
- PostgreSQL with JSONB, tsvector, and GIN indexes
- Frontend
- Bootstrap 5.3 + Chart.js 4.4 + Font Awesome
- AI Models
- Google Gemini Vision (Vertex AI Batch)
- Storage
- Google Cloud Storage (gs://)
- Search
- PostgreSQL full-text search with ts_rank_cd
- Deployment
- Railway (or self-hosted)
For comprehensive technical documentation, refer to:
- METHODOLOGY.md — Complete 5-phase pipeline explanation with code examples
- DASHBOARD_V3_GUIDE.md — User guide and feature documentation
- README.md — Project overview and quick start
These files are included in the project repository.
- Total Documents
- 94,505
- Text Documents
- 78,388
- Image Documents
- 16,117
- Named Entities
- 0
- Datasets
- 0
- High Priority
- 685
- From Epstein
- 2162
- Search Query
- <100ms
- Entity Graph
- <500ms
- Page Load
- <800ms
- Export (10K)
- <5s