📖 Analysis Methodology

How we analyzed 2,800 documents from the US House Oversight Committee's Epstein investigation

🎯 Project Overview

This dashboard presents an automated analysis of documents released by the U.S. House Committee on Oversight and Accountability as part of its public investigation into Jeffrey Epstein's activities.

2,800 Documents

AI-Powered Analysis

Open Source

⚠️ Important Limitations:

A small number of documents may not have been processed due to technical constraints
The "suspicious activity" detection is experimental and may produce both false positives and false negatives
All analysis is automated and should be independently verified for serious research
This is a transparency tool, not a legal investigation

🤖 AI Model Used

We used DeepSeek v3.1 via Replicate API for document analysis:

Model: deepseek-ai/deepseek-v3
Context Window: 64K tokens (can process very long documents)
Cost: $0.14 per 1M input tokens, $0.28 per 1M output tokens
Why DeepSeek? Excellent at structured data extraction, cost-effective for large-scale analysis
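
As a rough illustration of what these rates imply, the sketch below estimates a total cost from token counts. The per-document token averages are hypothetical placeholders chosen for the example, not figures measured in this project.

```python
# Rates quoted above, in USD per 1M tokens.
INPUT_RATE_PER_M = 0.14
OUTPUT_RATE_PER_M = 0.28


def estimate_cost(num_docs: int, avg_input_tokens: int, avg_output_tokens: int) -> float:
    """Return the estimated total cost in USD for analyzing num_docs documents."""
    input_cost = num_docs * avg_input_tokens / 1_000_000 * INPUT_RATE_PER_M
    output_cost = num_docs * avg_output_tokens / 1_000_000 * OUTPUT_RATE_PER_M
    return input_cost + output_cost


# Hypothetical averages: 4,000 input and 1,000 output tokens per document.
total = estimate_cost(2_800, avg_input_tokens=4_000, avg_output_tokens=1_000)
print(f"Estimated cost: ${total:.2f}")
```

Under those assumed averages the whole corpus would cost a few dollars to process; the real figure depends on actual document lengths and response sizes.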

📝 Analysis Prompt

Each document was analyzed using a comprehensive prompt that extracted 14 categories of information:

🎯 Key Innovation: The prompt distinguishes first-hand illegal activity (the sender directly engaging in illegal behavior) from shared content (forwarded news articles or court documents about others' activities). This helps keep forwarded news articles from being flagged as evidence against the sender.

Categories Extracted:

Document Metadata (type, date, sender, recipients)
Parties & Entities (people, organizations, locations)
Key Themes & Topics
Relationships & Connections
Financial Information
Content Analysis (tone, purpose)
Legal & Compliance Issues
Notable Quotes
Red Flags & Concerns
Blackmail/Coercion Indicators
Illegal Activity Evidence
Media/Journalist Interactions
Public Knowledge Assessment
Executive Summary

📊 Sample Output

Here's an example of the JSON structure returned by the AI for each document:

{
  "metadata": {
    "document_type": "email",
    "date": "2005-03-15",
    "subject": "Travel arrangements",
    "document_id": "HOUSE_OVERSIGHT_012345",
    "sender": "example@email.com",
    "recipients": ["recipient@email.com"]
  },
  "entities": {
    "people": ["Jeffrey Epstein", "Jane Doe", "John Smith"],
    "organizations": ["ABC Corporation", "XYZ Foundation"],
    "locations": ["New York", "Palm Beach", "Little St. James"],
    "financial_entities": ["Bank of America", "Deutsche Bank"]
  },
  "themes": [
    "Travel/logistics",
    "Financial transactions/money flow",
    "Personal relationships"
  ],
  "relationships": [
    {
      "entity_1": "Jeffrey Epstein",
      "entity_2": "ABC Corporation",
      "relationship_type": "business",
      "description": "Financial consulting arrangement"
    }
  ],
  "financial_info": {
    "amounts_mentioned": ["$50,000", "$1,000,000"],
    "transactions": ["Wire transfer to account ending in 1234"],
    "assets": ["Private jet", "Residential property"]
  },
  "analysis": {
    "tone": "professional",
    "emotional_indicators": ["urgency", "concern"],
    "purpose": "Coordinate travel logistics and financial arrangements",
    "significance": "Documents financial relationship between parties"
  },
  "legal_compliance": {
    "concerns": []
  },
  "notable_quotes": [
    "Please ensure complete discretion",
    "This arrangement must remain confidential"
  ],
  "red_flags": [
    "Unusual emphasis on secrecy",
    "Large cash transaction without clear business purpose"
  ],
  "blackmail_indicators": {
    "likelihood": "possible",
    "evidence": ["References to 'insurance' and 'protection'"],
    "description": "Vague references to leverage that could indicate coercion"
  },
  "illegal_activity": {
    "severity": "suspicious",
    "categories": ["Financial irregularities"],
    "evidence": ["Structured payments to avoid reporting requirements"],
    "is_from_jeffrey_epstein": true,
    "is_shared_content": false,
    "content_type": "first_hand",
    "description": "Email from Epstein discussing financial arrangements that may violate banking laws"
  },
  "media_journalist_refs": {
    "mentioned": false,
    "details": []
  },
  "public_knowledge": {
    "likely_public": false,
    "media_worthy": true,
    "context": "Previously unreported financial arrangement"
  },
  "summary": "Email from Jeffrey Epstein coordinating travel and financial arrangements with emphasis on confidentiality. Contains potential evidence of financial irregularities and possible coercion tactics."
}
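
A record shaped like this can be checked for completeness before it is used. The following is a minimal sketch, assuming the 14 top-level keys in the sample are all required; the function name `missing_keys` is hypothetical and not taken from the project's code.

```python
import json

# Top-level keys copied from the sample output; treating all of them as
# required is an assumption made for this sketch.
EXPECTED_KEYS = {
    "metadata", "entities", "themes", "relationships", "financial_info",
    "analysis", "legal_compliance", "notable_quotes", "red_flags",
    "blackmail_indicators", "illegal_activity", "media_journalist_refs",
    "public_knowledge", "summary",
}


def missing_keys(raw_json: str) -> set[str]:
    """Parse one analysis record and return the expected top-level keys it lacks."""
    record = json.loads(raw_json)
    if not isinstance(record, dict):
        raise ValueError("analysis record must be a JSON object")
    return EXPECTED_KEYS - set(record)


partial = '{"metadata": {"document_type": "email"}, "summary": "Example."}'
print(sorted(missing_keys(partial)))  # lists the 12 keys absent from this record
```

Returning the set of missing keys, instead of a plain pass/fail flag, makes it easier to see which part of a response was truncated or omitted.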

🔍 Classification Logic

Illegal Activity Severity Levels:

None: No evidence of illegal activity
Suspicious: Patterns that raise questions but aren't conclusive
Concerning: Strong indicators of potentially illegal behavior
Clear Evidence: Explicit discussion or evidence of illegal activities
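
Because these four levels are ordered, they can be compared numerically when filtering results. The sketch below is one possible encoding, not the project's implementation. The sample output shows the lowercase label "suspicious"; how multi-word labels such as "Clear Evidence" are spelled in the JSON is an assumption here, so the parser accepts both spaces and underscores.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels listed above, ordered from least to most serious."""
    NONE = 0
    SUSPICIOUS = 1
    CONCERNING = 2
    CLEAR_EVIDENCE = 3


def parse_severity(label: str) -> Severity:
    """Map a label such as "Clear Evidence" or "suspicious" to a Severity member."""
    key = label.strip().upper().replace(" ", "_")
    return Severity[key]  # raises KeyError for unknown labels


def at_least(records: list[dict], threshold: Severity) -> list[dict]:
    """Keep records whose illegal_activity.severity meets the threshold."""
    return [
        r for r in records
        if parse_severity(r["illegal_activity"]["severity"]) >= threshold
    ]


records = [
    {"illegal_activity": {"severity": "none"}},
    {"illegal_activity": {"severity": "suspicious"}},
    {"illegal_activity": {"severity": "concerning"}},
]
print(len(at_least(records, Severity.SUSPICIOUS)))
```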

First-Hand vs. Shared Content:

First-Hand (content_type: "first_hand"):

The sender is directly engaging in or planning illegal activity
Email contains evidence of the sender's own criminal behavior
Direct solicitation, arrangements, or coordination of illegal acts

Shared Content (is_shared_content: true):

Forwarding news articles about someone else's illegal activities
Discussing court cases or legal proceedings
Sharing media coverage or third-party reports
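
The two fields shown in the sample output, `content_type` and `is_shared_content`, can be combined into a single filter. This is a sketch under one assumption: when the fields disagree, the record is conservatively treated as not first-hand. The source does not say how the project handles that case, and `is_first_hand` is a hypothetical helper name.

```python
def is_first_hand(record: dict) -> bool:
    """Return True only when both fields mark the record as first-hand."""
    activity = record.get("illegal_activity", {})
    return (
        activity.get("content_type") == "first_hand"
        and not activity.get("is_shared_content", False)
    )


first_hand = {"illegal_activity": {"content_type": "first_hand", "is_shared_content": False}}
forwarded = {"illegal_activity": {"content_type": "first_hand", "is_shared_content": True}}

print(is_first_hand(first_hand))
print(is_first_hand(forwarded))
```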

Blackmail Likelihood Levels:

None: No indicators of coercion
Possible: Some language that could suggest leverage
Likely: Multiple indicators of coercive behavior
Definite: Explicit threats or coercion tactics

⚡ Processing Pipeline

Document Collection: 2,800 text files from the US House Oversight Committee
Preprocessing: Files cleaned and formatted for LLM processing
AI Analysis: Each document sent to DeepSeek v3.1 with a structured prompt
JSON Extraction: Structured data extracted from LLM responses
Validation: Automated checks for data completeness and format
Indexing: Documents indexed by entities, themes, dates for fast search
Dashboard: Flask web app provides interactive access to analyzed data
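
The JSON Extraction and Indexing steps can be sketched with the standard library alone. The code below is illustrative, not the project's code: it assumes the model may wrap its JSON in extra text, and that an index maps each person to the document IDs mentioning them. Both helper names are hypothetical.

```python
import json
from collections import defaultdict


def extract_json(response_text: str) -> dict:
    """Parse the outermost JSON object found in an LLM response."""
    start = response_text.find("{")
    end = response_text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in response")
    return json.loads(response_text[start:end + 1])


def build_people_index(records: list[dict]) -> dict[str, list[str]]:
    """Map each person name to the document IDs that mention them."""
    index: dict[str, list[str]] = defaultdict(list)
    for record in records:
        doc_id = record["metadata"]["document_id"]
        for person in record.get("entities", {}).get("people", []):
            index[person].append(doc_id)
    return dict(index)


# A made-up response with text around the JSON payload.
response = (
    "Here is the analysis:\n"
    '{"metadata": {"document_id": "HOUSE_OVERSIGHT_012345"},'
    ' "entities": {"people": ["Jane Doe", "John Smith"]}}\n'
    "End of analysis."
)
record = extract_json(response)
print(build_people_index([record]))
```

Slicing from the first `{` to the last `}` is a simple heuristic; it fails if the surrounding text itself contains braces, in which case `json.loads` raises an error that the Validation step would need to catch.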

🔧 Technical Stack

Backend: Python 3.12 + Flask
AI Model: DeepSeek v3.1 via Replicate API
Frontend: Bootstrap 5 + Chart.js + D3.js
Deployment: Railway.app
Data Storage: JSON files + Git submodule
Open Source: GitHub Repository

📚 Data Source

All documents analyzed in this dashboard were officially released by the U.S. House Committee on Oversight and Accountability as part of its public investigation.

Source: US House Oversight Committee Public Records
Release Date: 2025 (ongoing releases)
Document Types: Emails, memos, depositions, correspondence
Data Repository: GitHub

🔬 For Researchers & Journalists:

While this tool provides a useful starting point for exploring these documents, we strongly recommend:

Verifying findings by reviewing original documents
Cross-referencing with other sources
Using this as a discovery tool, not definitive evidence
Understanding the limitations of automated analysis
