Methodology & Analysis

System Overview

EDIS v3 (Epstein Document Intelligence System) is a multi-modal document analysis platform built on Google Vertex AI and PostgreSQL. It automatically processes, analyzes, and catalogs sensitive documents at scale.

The system currently holds 94,505 documents across 0 registered datasets, with 0 named entities extracted by the AI analysis.

Key Analysis Capabilities:
  • Entity Extraction — People, organizations, locations, and financial entities
  • Relationship Mapping — Connections between entities with typed relationships
  • Illegal Activity Detection — Severity assessment with supporting evidence
  • Blackmail Indicators — Coercion likelihood scoring
  • Financial Forensics — Transaction and asset tracking
  • Image Analysis — Scene understanding, sensitivity assessment
  • Full-Text Search — PostgreSQL with weighted ranking across documents

Processing Pipeline

The analysis pipeline consists of five distinct phases:

Phase 1: Document Preparation
PDFs are converted to text and images and organized by dataset, and their metadata is extracted.
Phase 2: Content Detection
The system determines whether each file is text or image, calculates a content hash for deduplication, and extracts preliminary metadata.
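The detection and deduplication step can be sketched as follows. This is a minimal illustration, not the project's actual code: the hash algorithm (SHA-256), the extension-based text/image check, and the names `content_hash`, `classify`, and `deduplicate` are all assumptions, since the page does not specify them.

```python
import hashlib
from pathlib import Path

# Assumed set of image extensions; the real system may inspect file contents instead.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def content_hash(data: bytes) -> str:
    """Return a hex digest of the file bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def classify(filename: str) -> str:
    """Label a file as 'image' or 'text' based on its extension."""
    return "image" if Path(filename).suffix.lower() in IMAGE_EXTENSIONS else "text"


def deduplicate(files: dict[str, bytes]) -> dict[str, dict]:
    """Keep the first file seen for each distinct content hash."""
    unique: dict[str, dict] = {}
    for name, data in files.items():
        digest = content_hash(data)
        if digest not in unique:
            unique[digest] = {
                "name": name,
                "kind": classify(name),
                "size": len(data),  # preliminary metadata
            }
    return unique
```

Hashing the bytes rather than the filename means that two copies of the same file stored under different names collapse into one entry.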
Phase 3: Cloud Upload
Files are uploaded to Google Cloud Storage (GCS) in parallel, creating gs:// URIs for batch processing.
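A parallel upload that yields gs:// URIs might look roughly like the sketch below. The bucket name, the `dataset/filename` object layout, and the functions `gcs_uri` and `upload_all` are hypothetical. The actual transfer is passed in as a callable so the sketch runs without cloud credentials; in practice that callable would wrap a storage client.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def gcs_uri(bucket: str, dataset: str, filename: str) -> str:
    """Build a gs:// URI; the dataset/filename layout is an assumption."""
    return f"gs://{bucket}/{dataset}/{filename}"


def upload_all(
    paths: Iterable[str],
    bucket: str,
    dataset: str,
    upload_one: Callable[[str, str], None],
    max_workers: int = 8,
) -> list[str]:
    """Upload files concurrently and return their URIs in input order."""

    def task(path: str) -> str:
        filename = path.rsplit("/", 1)[-1]
        uri = gcs_uri(bucket, dataset, filename)
        upload_one(path, uri)  # hypothetical uploader supplied by the caller
        return uri

    # Threads suit this workload because uploads are I/O-bound.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, paths))
```

Because `pool.map` returns results in input order, the list of URIs lines up with the list of local paths even though the uploads finish in arbitrary order.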
Phase 4: AI Analysis
Vertex AI Batch API processes documents using Gemini Vision models, returning analysis in JSONL format.
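Reading the JSONL results back could be sketched as below. The function name `parse_jsonl` and the record fields used in the usage note are hypothetical; the page does not describe the actual output schema. The only thing the sketch relies on is that JSONL carries one JSON object per line.

```python
import json


def parse_jsonl(text: str) -> tuple[list[dict], list[tuple[int, str]]]:
    """Parse JSONL text into records, collecting malformed lines separately."""
    records: list[dict] = []
    errors: list[tuple[int, str]] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Keep the line number so a failed document can be retried later.
            errors.append((line_no, str(exc)))
    return records, errors
```

For example, calling `parse_jsonl` on a file containing two valid lines and one truncated line returns two records and one `(line_number, message)` error entry, so a single bad response does not abort the whole import.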
Phase 5: Storage & Indexing
Results are imported into PostgreSQL with full-text search indexes, materialized views, and entity network graphs.
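To make the entity network graph concrete, here is a small in-memory sketch. It assumes relationships arrive as `(source, relationship_type, target)` triples; that shape, the function names, and the placeholder entity names are illustrative, and the real system stores this data in PostgreSQL rather than in Python dictionaries.

```python
from collections import defaultdict


def build_entity_graph(relationships):
    """Build an undirected adjacency map from (source, type, target) triples."""
    graph = defaultdict(set)
    for source, rel_type, target in relationships:
        graph[source].add((rel_type, target))
        graph[target].add((rel_type, source))
    return graph


def neighbors(graph, entity):
    """Return the sorted names of entities directly connected to `entity`."""
    return sorted({other for _, other in graph.get(entity, set())})


# Placeholder data for illustration only.
example = build_entity_graph([
    ("Person A", "employed_by", "Company B"),
    ("Person A", "associate_of", "Person C"),
    ("Person C", "director_of", "Company B"),
])
```

With the placeholder data above, `neighbors(example, "Person A")` lists the two entities that share a typed relationship with Person A.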

AI Analysis Methodology

Text Documents: NLP analysis extracts entities, relationships, and indicators using Gemini models with specialized prompts for legal, financial, and criminal activity detection.

Image Documents: Computer vision analysis identifies objects, people, scenes, and sensitivity levels. Wealth indicators are detected through luxury item recognition.

Severity Levels:
  • clear_evidence: Direct, documented evidence of illegal activity
  • concerning: Strong indicators requiring investigation
  • suspicious: Warrants further investigation
  • none: No illegal activity detected
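The four severity labels above form a natural ordering, which can be modeled as an ordered enumeration. This is a sketch under assumptions: the numeric ranks, the `Severity` class, and the `needs_review` threshold are illustrative and are not taken from the system itself.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity labels ranked from least to most severe (ranks are assumed)."""
    NONE = 0
    SUSPICIOUS = 1
    CONCERNING = 2
    CLEAR_EVIDENCE = 3


def parse_severity(label: str) -> Severity:
    """Map a stored label such as 'clear_evidence' to its enum member."""
    return Severity[label.strip().upper()]


def needs_review(label: str, threshold: Severity = Severity.CONCERNING) -> bool:
    """Return True when a label is at or above the review threshold."""
    return parse_severity(label) >= threshold
```

Using an `IntEnum` lets labels be compared and sorted directly, for example to list the most severe documents first.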

Technology Stack

  • Backend: Flask + Python on Gunicorn
  • Database: PostgreSQL with JSONB, tsvector, and GIN indexes
  • Frontend: Bootstrap 5.3 + Chart.js 4.4 + Font Awesome
  • AI Models: Google Gemini Vision (Vertex AI Batch)
  • Storage: Google Cloud Storage (gs://)
  • Search: PostgreSQL full-text search with ts_rank_cd
  • Deployment: Railway (or self-hosted)
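A ranked full-text query using `ts_rank_cd` could be assembled as in the sketch below. The table name `documents`, the `search_vector` tsvector column, the result limit, and the function `build_search` are hypothetical. The placeholders use the named `%(name)s` style accepted by psycopg-like drivers, which is also an assumption about how the backend talks to PostgreSQL.

```python
# Hypothetical schema: documents(id, title, search_vector tsvector).
SEARCH_SQL = """
SELECT id, title,
       ts_rank_cd(search_vector, query) AS rank
FROM documents, plainto_tsquery('english', %(q)s) AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT %(limit)s
"""


def build_search(q: str, limit: int = 20) -> tuple[str, dict]:
    """Return the SQL text and bound parameters for a ranked search."""
    q = q.strip()
    if not q:
        raise ValueError("search query must not be empty")
    # Clamp the limit so a single request cannot ask for an unbounded result set.
    return SEARCH_SQL, {"q": q, "limit": min(max(limit, 1), 100)}
```

Passing the user's text as a bound parameter instead of formatting it into the SQL string avoids injection, and `plainto_tsquery` tolerates free-form input that `to_tsquery` would reject.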

Full Documentation

For comprehensive technical documentation, refer to:

  • METHODOLOGY.md — Complete 5-phase pipeline explanation with code examples
  • DASHBOARD_V3_GUIDE.md — User guide and feature documentation
  • README.md — Project overview and quick start

These files are included in the project repository.

Database Stats

  • Total Documents: 94,505
  • Text Documents: 78,388
  • Image Documents: 16,117
  • Named Entities: 0
  • Datasets: 0
  • High Priority: 685
  • From Epstein: 2,162

Illegal Activity

  • Clear Evidence: 55
  • Concerning: 930
  • Suspicious: 19,381
  • None: 74,139

Blackmail Indicators

  • Definite: 2
  • Likely: 58
  • Possible: 8,355
  • None: 86,090

Top Themes

  • Financial...: 48,499
  • Business dealings: 31,335
  • Communications/co...: 31,075
  • Legal matters/litigation: 28,804
  • Travel/logistics: 6,495
  • Allegations/complaints: 4,068
  • Employment/staffing: 3,782
  • Personal relationships: 3,643
  • Real estate/properties: 3,048
  • Illegal activities: 1,877

Performance

  • Search Query: < 100 ms
  • Entity Graph: < 500 ms
  • Page Load: < 800 ms
  • Export (10K): < 5 s
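
As a consistency check on the reported figures, the category counts can be compared against the document total and converted to percentage shares. The counts below are copied from the statistics on this page; the helper `shares` and the variable names are illustrative only.

```python
TOTAL_DOCUMENTS = 94_505
TEXT_DOCUMENTS = 78_388
IMAGE_DOCUMENTS = 16_117

illegal_activity = {
    "clear_evidence": 55,
    "concerning": 930,
    "suspicious": 19_381,
    "none": 74_139,
}
blackmail = {"definite": 2, "likely": 58, "possible": 8_355, "none": 86_090}


def shares(counts: dict[str, int], total: int) -> dict[str, float]:
    """Return each category's share of the total in percent, rounded to 2 places."""
    if sum(counts.values()) != total:
        raise ValueError("category counts do not add up to the total")
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


illegal_shares = shares(illegal_activity, TOTAL_DOCUMENTS)
blackmail_shares = shares(blackmail, TOTAL_DOCUMENTS)
```

Both breakdowns sum to the 94,505 total, as do the text and image counts, so every document carries exactly one label in each classification.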