Detecting & Classifying Fraudulent Ethereum Accounts

Developed a machine-learning framework that combines supervised and unsupervised methods to detect fraudulent Ethereum accounts with over 85% accuracy and a false-positive rate below 5%, deployed as an interactive Streamlit app.

Python
Scikit-learn
TensorFlow
XGBoost
Web3.py
Streamlit
Pandas
NumPy
Matplotlib
NetworkX

Project Overview

This project develops a unified machine-learning framework for identifying fraudulent activity on the Ethereum blockchain. By analyzing on-chain transaction patterns and network relationships, it combines unsupervised anomaly detection with supervised classification to flag suspicious accounts. The end-to-end solution is exposed via an interactive Streamlit application, allowing users to explore anomalies, model predictions, and network graphs in real time.

System Architecture

Data Extraction Layer

  • Web3.py: Connects to Ethereum nodes (e.g., via Infura or Alchemy) to stream transaction data.
  • Etherscan API: Supplements on-chain data with metadata (e.g., internal transactions, contract events).
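
As a rough illustration of the extraction step, the sketch below flattens one transaction into a tabular record. The field names (`hash`, `from`, `to`, `value`, `gas`, `gasPrice`, `blockNumber`) follow the Ethereum JSON-RPC transaction object that Web3.py returns; the helper `tx_to_record` and the sample values are hypothetical, and no node connection is made here.

```python
from decimal import Decimal

WEI_PER_ETHER = Decimal(10) ** 18  # 1 ether = 10^18 wei
WEI_PER_GWEI = Decimal(10) ** 9    # 1 gwei = 10^9 wei


def tx_to_record(tx: dict) -> dict:
    """Flatten a transaction mapping into one flat record (one DataFrame row).

    Hypothetical helper: `tx` is assumed to be dict-like and to use the
    field names of the Ethereum JSON-RPC transaction object.
    """
    return {
        "hash": tx["hash"],
        "from": tx["from"],
        "to": tx.get("to"),  # None for contract-creation transactions
        "value_eth": float(Decimal(tx["value"]) / WEI_PER_ETHER),
        "gas": tx["gas"],
        "gas_price_gwei": float(Decimal(tx["gasPrice"]) / WEI_PER_GWEI),
        "block_number": tx["blockNumber"],
    }


# Made-up sample values, shaped like a node response
sample_tx = {
    "hash": "0xabc",
    "from": "0xsender",
    "to": "0xreceiver",
    "value": 1_500_000_000_000_000_000,  # 1.5 ether expressed in wei
    "gas": 21000,
    "gasPrice": 30_000_000_000,  # 30 gwei expressed in wei
    "blockNumber": 17_000_000,
}

record = tx_to_record(sample_tx)
print(record)
```

Converting wei to ether up front keeps the `value` feature on a human-readable scale; whether the project does this conversion at extraction time is not stated in the write-up.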

Feature Engineering Module

  • Pandas & NumPy: Cleans and aggregates transaction histories.
  • NetworkX: Computes graph-based metrics (centrality, clustering) to capture network effects.
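
The feature step might look roughly like the following sketch, which aggregates per-sender statistics with Pandas and derives degree centrality and clustering with NetworkX. The column names and the toy transactions are assumptions made for illustration, and the graph is treated as undirected for simplicity.

```python
import networkx as nx
import pandas as pd

# Made-up transactions; the column names are assumptions for this sketch
txs = pd.DataFrame(
    {
        "from": ["A", "A", "A", "B"],
        "to": ["B", "C", "D", "C"],
        "value": [1.0, 2.0, 3.0, 0.5],
    }
)

# Per-sender aggregates: number of transactions sent and total value sent
sent = txs.groupby("from")["value"].agg(tx_count="count", total_sent="sum")

# Undirected address graph: one edge per pair of addresses that transacted
graph = nx.from_pandas_edgelist(txs, source="from", target="to")

# Degree centrality is degree / (n - 1); clustering is the local coefficient
graph_feats = pd.DataFrame(
    {
        "degree_centrality": nx.degree_centrality(graph),
        "clustering": nx.clustering(graph),
    }
)

# One row per address; addresses that never sent anything get zero aggregates
features = graph_feats.join(sent).fillna({"tx_count": 0, "total_sent": 0.0})
print(features)
```

Here address `A` touches every other address, so its degree centrality is 1.0, while `D` has a single counterparty and scores 1/3.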

Modeling Pipeline

  • Isolation Forest & Autoencoder: Unsupervised models to detect anomalous transaction patterns.
  • Random Forest & XGBoost: Supervised classifiers trained on labeled fraud samples to assign fraud scores.
  • Ensemble Framework: Merges unsupervised anomaly scores with classifier outputs for robust predictions.
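
The write-up does not say how the ensemble merges its inputs. One simple possibility, shown here purely as an assumption, is a weighted average of a rescaled anomaly score and the classifier's fraud probability. The function name `ensemble_score` and the default `weight` are hypothetical.

```python
import numpy as np


def ensemble_score(decision_scores, fraud_probs, weight=0.5):
    """Blend an anomaly score with a classifier probability into [0, 1].

    decision_scores: IsolationForest-style scores, where LOWER means more
        anomalous (as returned by scikit-learn's decision_function).
    fraud_probs: classifier probabilities of the fraud class.
    weight: share given to the anomaly component (an assumed default).
    """
    decision_scores = np.asarray(decision_scores, dtype=float)
    fraud_probs = np.asarray(fraud_probs, dtype=float)

    # Flip the sign so higher means more anomalous, then min-max scale to [0, 1]
    flipped = -decision_scores
    span = flipped.max() - flipped.min()
    if span > 0:
        anomaly = (flipped - flipped.min()) / span
    else:
        anomaly = np.zeros_like(flipped)  # all scores identical: no signal

    return weight * anomaly + (1.0 - weight) * fraud_probs


scores = ensemble_score(
    decision_scores=[0.10, 0.05, -0.20],
    fraud_probs=[0.05, 0.40, 0.90],
)
print(scores)
```

The sign flip matters: without it, the most anomalous accounts would pull the combined score down instead of up.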

Deployment Interface

  • Streamlit: Hosts the interactive dashboard, enabling dynamic filtering, threshold adjustments, and network visualizations.
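
The threshold adjustment behind the dashboard can be reduced to a small filtering function, sketched below with Pandas only. The helper `flag_accounts`, the column names, and the sample rows are hypothetical; in a Streamlit app the threshold would typically come from a widget such as `st.slider` and the result would be passed to `st.dataframe`.

```python
import pandas as pd


def flag_accounts(scored: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Return rows whose fraud probability meets the threshold, highest first."""
    flagged = scored[scored["fraud_prob"] >= threshold]
    return flagged.sort_values("fraud_prob", ascending=False).reset_index(drop=True)


# Made-up scored addresses
scored = pd.DataFrame(
    {
        "address": ["0xaaa", "0xbbb", "0xccc"],
        "fraud_prob": [0.92, 0.15, 0.61],
    }
)

flagged = flag_accounts(scored, threshold=0.5)
print(flagged)
```

Keeping the filter separate from the UI code makes it easy to check the logic without launching the app.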

Key Challenges Solved

  1. Pseudonymous Data
    Extracted rich behavioral features from address-level activity despite the lack of identity labels.
  2. Scalability
    Processed over half a million transactions through vectorized Pandas pipelines and batch inference.
  3. False-Positive Control
    Tuned ensemble thresholds to keep false alarms below 5% while preserving recall.
  4. Model Integration
    Seamlessly combined unsupervised and supervised approaches into a single evaluation pipeline.
  5. Interactive Reporting
    Delivered real-time insights via a user-friendly web app, accelerating investigation workflows.
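
To make the false-positive control concrete, here is a minimal sketch of one way to pick a decision threshold on a labeled validation set: take the lowest threshold whose false-positive rate stays within the cap, which in turn gives the highest recall available under that cap. The function `pick_threshold` and the sample scores are hypothetical, and the project's actual tuning procedure is not described in the source.

```python
import numpy as np


def pick_threshold(y_true, scores, max_fpr=0.05):
    """Return (threshold, fpr, recall) for the lowest threshold with FPR <= max_fpr.

    An account is flagged when its score is >= threshold. Both classes are
    assumed to be present in y_true.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    legit, fraud = scores[~y_true], scores[y_true]

    # Candidates in ascending order: FPR can only fall as the threshold rises,
    # so the first candidate that meets the cap also has the highest recall
    for threshold in np.unique(scores):
        fpr = float(np.mean(legit >= threshold))
        if fpr <= max_fpr:
            recall = float(np.mean(fraud >= threshold))
            return float(threshold), fpr, recall

    # No observed score satisfies the cap: flag nothing
    return float("inf"), 0.0, 0.0


# Made-up validation scores: 10 legitimate accounts, 4 fraudulent ones
legit_scores = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.25, 0.30, 0.50]
fraud_scores = [0.45, 0.60, 0.75, 0.90]
y_true = [0] * len(legit_scores) + [1] * len(fraud_scores)

threshold, fpr, recall = pick_threshold(y_true, legit_scores + fraud_scores)
print(threshold, fpr, recall)
```

In this toy data one legitimate account scores 0.50, so the threshold has to rise to 0.60 to satisfy the cap, at the cost of missing the fraudulent account scored 0.45.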

Implementation Details

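from web3 import Web3
import pandas as pd
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
import streamlit as st

# Connect to an Ethereum node (replace <KEY> with your own provider key)
w3 = Web3(Web3.HTTPProvider("https://mainnet.infura.io/v3/<KEY>"))
tx = w3.eth.get_transaction("0x...")  # fetch a single transaction by hash

# Feature engineering example
df = pd.DataFrame([...])  # transaction records
df['hour'] = pd.to_datetime(df['timestamp'], unit='s').dt.hour  # hour of day (UTC)
# 'degree_centrality' is assumed to be precomputed per address with NetworkX
X = df[['value', 'hour', 'gas', 'degree_centrality']]

# Unsupervised detection
iso = IsolationForest(contamination=0.02, random_state=42).fit(X)
df['anomaly_score'] = iso.decision_function(X)  # lower = more anomalous

# Supervised classification
# Assumes df has a binary label column, hypothetically named 'is_fraud'
y = df['is_fraud']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
rf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
# Scores cover all rows, including training rows; evaluate on X_test only
df['fraud_prob'] = rf.predict_proba(X)[:, 1]  # probability of the fraud class
# XGBClassifier can be fit on the same split in place of the random forest

# Streamlit app
st.title("Ethereum Fraud Detection Dashboard")
st.dataframe(df[['from', 'to', 'value', 'fraud_prob']])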