Analysis Methodology
How we analyzed 2,800 documents from the US House Oversight Committee's Epstein investigation
Project Overview
This dashboard presents an automated analysis of documents released by the U.S. House Committee on Oversight and Accountability as part of its public investigation into Jeffrey Epstein's activities.
2,800 Documents
AI-Powered Analysis
Open Source
Important Limitations:
- A small number of documents may not have been processed due to technical constraints
- The "suspicious activity" detection is experimental and may produce both false positives and false negatives
- All analysis is automated and should be independently verified for serious research
- This is a transparency tool, not a legal investigation
AI Model Used
We used DeepSeek v3.1 via Replicate API for document analysis:
- Model: deepseek-ai/deepseek-v3
- Context Window: 64K tokens (can process very long documents)
- Cost: $0.14 per 1M input tokens, $0.28 per 1M output tokens
- Why DeepSeek? Excellent at structured data extraction, cost-effective for large-scale analysis
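As a rough illustration of the pricing above, the sketch below estimates the cost of a batch run. Only the two per-million-token rates come from the figures listed; the per-document token averages are placeholders chosen for illustration, not measurements from this project.

```python
# Rates quoted above, in USD per 1M tokens.
INPUT_RATE_PER_M = 0.14
OUTPUT_RATE_PER_M = 0.28


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for the given token counts."""
    return (
        input_tokens / 1_000_000 * INPUT_RATE_PER_M
        + output_tokens / 1_000_000 * OUTPUT_RATE_PER_M
    )


# Hypothetical averages: these two numbers are assumptions for illustration.
AVG_INPUT_TOKENS_PER_DOC = 4_000
AVG_OUTPUT_TOKENS_PER_DOC = 1_500
NUM_DOCS = 2_800

total = estimate_cost(
    NUM_DOCS * AVG_INPUT_TOKENS_PER_DOC,
    NUM_DOCS * AVG_OUTPUT_TOKENS_PER_DOC,
)
print(f"Estimated batch cost: ${total:.2f}")
```

Swapping in measured token counts would give a more realistic figure.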
Analysis Prompt
Each document was analyzed using a comprehensive prompt that asks the model to extract 14 categories of information:
Key Innovation: The prompt distinguishes first-hand illegal activity (the sender directly engaging in illegal behavior) from shared content (forwarded news articles or court documents about others' activities). This is meant to keep forwarded news articles from being flagged as evidence against the sender.
Categories Extracted:
- Document Metadata (type, date, sender, recipients)
- Parties & Entities (people, organizations, locations)
- Key Themes & Topics
- Relationships & Connections
- Financial Information
- Content Analysis (tone, purpose)
- Legal & Compliance Issues
- Notable Quotes
- Red Flags & Concerns
- Blackmail/Coercion Indicators
- Illegal Activity Evidence
- Media/Journalist Interactions
- Public Knowledge Assessment
- Executive Summary
Sample Output
Here's an example of the JSON structure returned by the AI for each document:
{
  "metadata": {
    "document_type": "email",
    "date": "2005-03-15",
    "subject": "Travel arrangements",
    "document_id": "HOUSE_OVERSIGHT_012345",
    "sender": "example@email.com",
    "recipients": ["recipient@email.com"]
  },
  "entities": {
    "people": ["Jeffrey Epstein", "Jane Doe", "John Smith"],
    "organizations": ["ABC Corporation", "XYZ Foundation"],
    "locations": ["New York", "Palm Beach", "Little St. James"],
    "financial_entities": ["Bank of America", "Deutsche Bank"]
  },
  "themes": [
    "Travel/logistics",
    "Financial transactions/money flow",
    "Personal relationships"
  ],
  "relationships": [
    {
      "entity_1": "Jeffrey Epstein",
      "entity_2": "ABC Corporation",
      "relationship_type": "business",
      "description": "Financial consulting arrangement"
    }
  ],
  "financial_info": {
    "amounts_mentioned": ["$50,000", "$1,000,000"],
    "transactions": ["Wire transfer to account ending in 1234"],
    "assets": ["Private jet", "Residential property"]
  },
  "analysis": {
    "tone": "professional",
    "emotional_indicators": ["urgency", "concern"],
    "purpose": "Coordinate travel logistics and financial arrangements",
    "significance": "Documents financial relationship between parties"
  },
  "legal_compliance": {
    "concerns": []
  },
  "notable_quotes": [
    "Please ensure complete discretion",
    "This arrangement must remain confidential"
  ],
  "red_flags": [
    "Unusual emphasis on secrecy",
    "Large cash transaction without clear business purpose"
  ],
  "blackmail_indicators": {
    "likelihood": "possible",
    "evidence": ["References to 'insurance' and 'protection'"],
    "description": "Vague references to leverage that could indicate coercion"
  },
  "illegal_activity": {
    "severity": "suspicious",
    "categories": ["Financial irregularities"],
    "evidence": ["Structured payments to avoid reporting requirements"],
    "is_from_jeffrey_epstein": true,
    "is_shared_content": false,
    "content_type": "first_hand",
    "description": "Email from Epstein discussing financial arrangements that may violate banking laws"
  },
  "media_journalist_refs": {
    "mentioned": false,
    "details": []
  },
  "public_knowledge": {
    "likely_public": false,
    "media_worthy": true,
    "context": "Previously unreported financial arrangement"
  },
  "summary": "Email from Jeffrey Epstein coordinating travel and financial arrangements with emphasis on confidentiality. Contains potential evidence of financial irregularities and possible coercion tactics."
}
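One way to check that a response has the shape shown above is to compare its top-level keys against the 14 expected ones. This is a minimal sketch: the key names are taken from the sample output, while the function name `missing_keys` is hypothetical and not taken from the project's code.

```python
import json

# Top-level keys taken from the sample output above.
EXPECTED_KEYS = {
    "metadata", "entities", "themes", "relationships", "financial_info",
    "analysis", "legal_compliance", "notable_quotes", "red_flags",
    "blackmail_indicators", "illegal_activity", "media_journalist_refs",
    "public_knowledge", "summary",
}


def missing_keys(record: dict) -> set[str]:
    """Return the expected top-level keys that are absent from a record."""
    return EXPECTED_KEYS - set(record)


# Usage with a deliberately incomplete record.
record = json.loads('{"metadata": {"document_type": "email"}, "summary": "..."}')
print(sorted(missing_keys(record)))
```

A record with an empty result from `missing_keys` has every top-level category present, though its nested fields would still need their own checks.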
Classification Logic
Illegal Activity Severity Levels:
- None: No evidence of illegal activity
- Suspicious: Patterns that raise questions but aren't conclusive
- Concerning: Strong indicators of potentially illegal behavior
- Clear Evidence: Explicit discussion or evidence of illegal activities
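Because the four levels are ordered, they can be mapped to integers for sorting or thresholding. The sketch below assumes the labels appear in the JSON as lowercase strings, like the `"suspicious"` value in the sample output; the spellings `"concerning"` and `"clear_evidence"` are assumptions, as is the helper name `severity_rank`.

```python
# Ordinal ranks for the severity labels. Only "suspicious" is confirmed by the
# sample output; the other spellings are assumed for illustration.
SEVERITY_RANK = {
    "none": 0,
    "suspicious": 1,
    "concerning": 2,
    "clear_evidence": 3,
}


def severity_rank(record: dict) -> int:
    """Return the ordinal severity of a record, treating unknown labels as 0."""
    label = record.get("illegal_activity", {}).get("severity", "none")
    return SEVERITY_RANK.get(str(label).lower(), 0)


docs = [
    {"illegal_activity": {"severity": "suspicious"}},
    {"illegal_activity": {"severity": "none"}},
    {"illegal_activity": {"severity": "concerning"}},
]

# Most severe first.
ranked = sorted(docs, key=severity_rank, reverse=True)
print([d["illegal_activity"]["severity"] for d in ranked])
```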
First-Hand vs. Shared Content:
First-Hand (content_type: "first_hand"):
- The sender is directly engaging in or planning illegal activity
- Email contains evidence of the sender's own criminal behavior
- Direct solicitation, arrangements, or coordination of illegal acts
Shared Content (is_shared_content: true):
- Forwarding news articles about someone else's illegal activities
- Discussing court cases or legal proceedings
- Sharing media coverage or third-party reports
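A minimal sketch of how the two flags might be combined when filtering results, assuming the field names from the sample output. The helper name `is_first_hand` is hypothetical, and the `"shared"` value used for `content_type` in the second record is an assumed spelling.

```python
def is_first_hand(record: dict) -> bool:
    """True if a record is marked first-hand and not marked as shared content."""
    activity = record.get("illegal_activity", {})
    return (
        activity.get("content_type") == "first_hand"
        and not activity.get("is_shared_content", False)
    )


docs = [
    {"illegal_activity": {"content_type": "first_hand", "is_shared_content": False}},
    # A forwarded news article; the "shared" label here is an assumed spelling.
    {"illegal_activity": {"content_type": "shared", "is_shared_content": True}},
]

first_hand_docs = [d for d in docs if is_first_hand(d)]
print(len(first_hand_docs))
```

Requiring both conditions is a conservative choice: a record whose two flags disagree is not counted as first-hand.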
Blackmail Likelihood Levels:
- None: No indicators of coercion
- Possible: Some language that could suggest leverage
- Likely: Multiple indicators of coercive behavior
- Definite: Explicit threats or coercion tactics
Processing Pipeline
- Document Collection: 2,800 text files from US House Oversight Committee
- Preprocessing: Files cleaned and formatted for LLM processing
- AI Analysis: Each document sent to DeepSeek v3.1 with structured prompt
- JSON Extraction: Structured data extracted from LLM responses
- Validation: Automated checks for data completeness and format
- Indexing: Documents indexed by entities, themes, dates for fast search
- Dashboard: Flask web app provides interactive access to analyzed data
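The JSON extraction and indexing steps could look roughly like the following. This is a sketch under stated assumptions, not the project's code: it assumes the model may surround its JSON with extra text, and the helper names `extract_json` and `build_people_index` are hypothetical.

```python
import json
from collections import defaultdict


def extract_json(response_text: str) -> dict:
    """Parse the outermost {...} span found in an LLM response.

    Raises ValueError if no braces are found or the span is not valid JSON.
    """
    start = response_text.find("{")
    end = response_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")
    return json.loads(response_text[start : end + 1])


def build_people_index(records: list[dict]) -> dict[str, list[str]]:
    """Map each person name to the IDs of documents that mention them."""
    index: dict[str, list[str]] = defaultdict(list)
    for record in records:
        doc_id = record.get("metadata", {}).get("document_id", "")
        for person in record.get("entities", {}).get("people", []):
            index[person].append(doc_id)
    return dict(index)


# A made-up response with chatter around the JSON payload.
raw = (
    'Here is the analysis: {"metadata": {"document_id": "HOUSE_OVERSIGHT_012345"}, '
    '"entities": {"people": ["Jane Doe"]}} End of analysis.'
)
record = extract_json(raw)
people_index = build_people_index([record])
print(people_index)
```

The same inverted-index pattern would apply to themes or dates by changing which field is iterated.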
Technical Stack
- Backend: Python 3.12 + Flask
- AI Model: DeepSeek v3.1 via Replicate API
- Frontend: Bootstrap 5 + Chart.js + D3.js
- Deployment: Railway.app
- Data Storage: JSON files + Git submodule
- Open Source: GitHub Repository
Data Source
All documents analyzed in this dashboard were officially released by the U.S. House Committee on Oversight and Accountability as part of its public investigation.
- Source: US House Oversight Committee Public Records
- Release Date: 2025 (ongoing releases)
- Document Types: Emails, memos, depositions, correspondence
- Data Repository: GitHub
For Researchers & Journalists:
While this tool provides a useful starting point for exploring these documents, we strongly recommend:
- Verifying findings by reviewing original documents
- Cross-referencing with other sources
- Using this as a discovery tool, not definitive evidence
- Understanding the limitations of automated analysis