Published on November 29, 2023
Updated on November 5, 2025

Detecting AiTM Phishing Sites with Fuzzy Hashing

OBSIDIAN THREAT RESEARCH TEAM

Background

In this blog post, we cover how Obsidian detects phishing kits and Phishing-as-a-Service (PhaaS) websites for our customers by analyzing fuzzy hashes of the content of visited websites.

This concept draws on prior industry art: IOCs (e.g., SHA-1/SHA-256) and fuzzy hashes (e.g., SSDEEP, TLSH) have been used for hunting and detection on endpoints for some time. If you are unfamiliar with it, fuzzy hashing produces a hash value that can be compared with another to estimate how similar two inputs are at the binary level.
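To illustrate why exact-match IOCs are not enough here, the following minimal sketch hashes two nearly identical page snippets with SHA-256. The snippets are made up for illustration, and `difflib` is used only as a stand-in for a similarity measure; it is not a fuzzy hash.

```python
import hashlib
from difflib import SequenceMatcher

# Two hypothetical page snippets that differ by a single character.
page_a = "<html><body><h1>Sign in</h1><form action='/login'></form></body></html>"
page_b = "<html><body><h1>Sign in</h1><form action='/logon'></form></body></html>"

# Cryptographic hashes: any change produces an unrelated digest.
sha_a = hashlib.sha256(page_a.encode("utf-8")).hexdigest()
sha_b = hashlib.sha256(page_b.encode("utf-8")).hexdigest()
matching_digits = sum(x == y for x, y in zip(sha_a, sha_b))

# A similarity measure (stand-in for a fuzzy hash comparison) still
# reports that the two inputs are almost the same.
content_similarity = SequenceMatcher(None, page_a, page_b).ratio()

print("SHA-256 digests equal:", sha_a == sha_b)
print("hex digits that happen to match:", matching_digits, "of", len(sha_a))
print("content similarity ratio:", round(content_similarity, 3))
```

The digests do not match even though the content is almost unchanged, which is the gap a fuzzy hash such as TLSH is meant to fill.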

The examples covered will include EvilProxy/Tycoon and a sophisticated APT group.

EvilProxy/Tycoon Phishing Kit

Menlo Security [1], Proofpoint [2], Microsoft [3], Trend Micro [4], and Sekoia.io [5] have blogged about EvilProxy/Tycoon, an Adversary-in-the-Middle (AitM) phishing kit that steals credentials and session cookies in real time.

Recent campaigns can be observed on any.run: https://app.any.run/submissions/#tag:tycoon

An example: https://dzse[.]izmqf[.]ru/nY8gx7

Most of these websites are protected with Cloudflare’s bot/scraping protection, which hinders attempts at automated scraping and analysis by many security products. Cloudflare’s protection looks for things such as mouse movements, clicks, and key presses while also using other techniques such as canvas fingerprinting.

Once the Cloudflare check is passed, the user is presented with a page impersonating the Microsoft login page.

When we view the HTML content, it’s a single external script resource:

<script language="Javascript" src="https://dzse[.]izmqf[.]ru/myscr602166.js"></script>

The JavaScript itself is heavily obfuscated:

var erp = new Array;
erp[0] = 218774561;
erp[1] = 1146045268;
erp[2] = 1498432800;
erp[3] = 1752460652;
erp[4] = 1041041980;
erp[5] = 1752460652;
erp[6] = 543973742;
erp[7] = 1732059749;
erp[8] = 1847737869;
……
erp[1191] = 1041041933;
erp[1192] = 10;
var em = '';
for(i=0;i<erp.length;i++){
tmp = erp[i];
if(Math.floor((tmp/Math.pow(256,3)))>0){
em += String.fromCharCode(Math.floor((tmp/Math.pow(256,3))));
};
tmp = tmp - (Math.floor((tmp/Math.pow(256,3))) * Math.pow(256,3));
……
};
document.write(em);
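The visible part of the loop divides each number by 256^3 to extract its most significant byte and turn it into a character. The following Python sketch assumes the elided part of the loop does the same for the three remaining bytes, so that each integer packs up to four characters. `decode_packed` is a hypothetical helper name, not part of the kit.

```python
def decode_packed(values):
    """Unpack each integer into up to four characters, most significant byte first."""
    chars = []
    for value in values:
        for shift in (24, 16, 8, 0):
            byte = (value >> shift) & 0xFF
            # Zero bytes are treated as padding and skipped, so a short
            # final value such as 10 decodes to a single newline.
            if byte > 0:
                chars.append(chr(byte))
    return "".join(chars)


# The first nine array entries shown in the obfuscated script.
erp_prefix = [
    218774561, 1146045268, 1498432800, 1752460652, 1041041980,
    1752460652, 543973742, 1732059749, 1847737869,
]

decoded = decode_packed(erp_prefix)
print(repr(decoded))
```

Under that assumption, the nine values decode to the start of an ordinary HTML document: `<!DOCTYPE html>` followed by `<html lang="en">`, which is consistent with the script writing a full page via `document.write`.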

However, once the JavaScript runs, the Document Object Model (DOM) reveals what is actually displayed to the user: the fake Microsoft login page.

Computing a fuzzy hash of the raw HTML would be fruitless: it is short and not unique (a single external script resource), and the script URL changes frequently.

Computing a fuzzy hash of the DOM, however, is useful, because the DOM reflects the page after the JavaScript obfuscation has been unwound.

With some minification of the DOM, the computed TLSH hash we get for this website is: T1140351705096AE3B8193C1E1AA751B4E33A1CA0DCFE306564AFEC3AECBC7D89CE45551
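The post does not say which minification steps are applied, so the following is only a sketch of one plausible approach: strip comments and collapse whitespace so that formatting noise does not influence the hash. `minify_dom` and `dom_fuzzy_hash` are hypothetical helper names, and the DOM string is assumed to have been captured already (for example from `document.documentElement.outerHTML` in an instrumented browser).

```python
import re


def minify_dom(dom_html: str) -> str:
    """Normalize a serialized DOM so cosmetic differences do not affect the hash."""
    # Drop HTML comments.
    no_comments = re.sub(r"<!--.*?-->", "", dom_html, flags=re.DOTALL)
    # Remove whitespace that sits between adjacent tags.
    tight = re.sub(r">\s+<", "><", no_comments)
    # Collapse any remaining whitespace runs into single spaces.
    return re.sub(r"\s+", " ", tight).strip()


def dom_fuzzy_hash(dom_html: str) -> str:
    """Compute a TLSH digest of the minified DOM (requires py-tlsh)."""
    import tlsh  # imported lazily so minify_dom works without py-tlsh installed

    return tlsh.hash(minify_dom(dom_html).encode("utf-8"))


sample = "<html>\n  <body>\n    <!-- tracking -->\n    <p>Sign   in</p>\n  </body>\n</html>\n"
print(minify_dom(sample))
```

Note that TLSH needs a minimum amount of input to produce a meaningful digest, so very small DOMs like the sample above are only suitable for demonstrating the minification step.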

If we repeat this process for another EvilProxy/Tycoon website, such as https://295g[.]kirklimo[.]com/h040n, we have the following DOM TLSH hash: T19D0351705096AE378193C1E1A9B51B0E33A1CA0ECFE306564AFE83AECBC7D85CF45551

If we compare these two fuzzy hashes, they are very similar:

$ pip install py-tlsh

>>> import tlsh
>>> tlsh.diff('T1140351705096AE3B8193C1E1AA751B4E33A1CA0DCFE306564AFEC3AECBC7D89CE45551', 'T19D0351705096AE378193C1E1A9B51B0E33A1CA0ECFE306564AFE83AECBC7D85CF45551')
9

A score of 9 has a false positive rate of roughly 0.001%, per Trend Micro’s paper.
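To make the similarity visible without any library, the sketch below compares the two digests position by position. It assumes the commonly described TLSH layout, in which the hex string after the `T1` prefix consists of a three-byte header (checksum, length, and quartile ratios) followed by a 32-byte body. This is a plain character comparison for illustration only; it is not how `tlsh.diff` computes its score.

```python
def split_tlsh(digest):
    """Return (header, body) hex strings for a 'T1'-prefixed TLSH digest."""
    hex_part = digest[2:] if digest.startswith("T1") else digest
    return hex_part[:6], hex_part[6:]


def count_differences(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


site_1 = "T1140351705096AE3B8193C1E1AA751B4E33A1CA0DCFE306564AFEC3AECBC7D89CE45551"
site_2 = "T19D0351705096AE378193C1E1A9B51B0E33A1CA0ECFE306564AFE83AECBC7D85CF45551"

header_1, body_1 = split_tlsh(site_1)
header_2, body_2 = split_tlsh(site_2)
header_diffs = count_differences(header_1, header_2)
body_diffs = count_differences(body_1, body_2)

print(f"header: {header_diffs} of {len(header_1)} hex digits differ")
print(f"body:   {body_diffs} of {len(body_1)} hex digits differ")
```

For these two sites, 2 of the 6 header digits and 8 of the 64 body digits differ, so the bulk of each digest is identical.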

APT Phishing Kit

The same technique can be used to catch users visiting phishing websites created by an APT group that is currently active.

The websites share the same layout, with only the logo switched out in each case.

Comparing two different campaigns, one targeting a telecom company and another an insurance company, we find the hashes are very similar for both the HTML and the DOM.

>>> import tlsh
>>> tlsh.diff('T11B7173044CFFCC1290034895E9B2F8582E9DE8679308DC8975DC95569F52FC74A53BAD', 'T1747171049CFFCC1290034896E9B2F85C1EADE4A79208DC8975DC96665F92FC74A53AAC')
28
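Both examples follow the same pattern: compare the hash of a visited page against hashes of known phishing kits and flag anything within some distance. The sketch below shows one way that lookup could be written. The function names, the labels, and the threshold are all assumptions for illustration; the post does not state which threshold is used in practice. The distance function is injectable so the matching logic can be exercised without py-tlsh installed.

```python
from typing import Callable, Dict, Optional, Tuple


def tlsh_distance(a: str, b: str) -> int:
    """Default distance: TLSH diff score (requires py-tlsh)."""
    import tlsh  # imported lazily

    return tlsh.diff(a, b)


def closest_known_kit(
    candidate: str,
    known_hashes: Dict[str, str],
    threshold: int,
    distance: Callable[[str, str], int] = tlsh_distance,
) -> Optional[Tuple[str, int]]:
    """Return (label, score) of the closest known hash within threshold, else None."""
    best = None
    for label, known in known_hashes.items():
        score = distance(candidate, known)
        if score <= threshold and (best is None or score < best[1]):
            best = (label, score)
    return best


# Demonstration with a stub distance (count of differing characters),
# so the example runs without py-tlsh. Labels and values are made up.
def stub_distance(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


known = {"kit-a": "AAAA", "kit-b": "AABB"}
print(closest_known_kit("AABB", known, threshold=1, distance=stub_distance))
print(closest_known_kit("ZZZZ", known, threshold=1, distance=stub_distance))
```

With real TLSH digests, the default `tlsh_distance` would be used, and the threshold would need tuning against the false positive rate that is acceptable for a given deployment.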

Conclusions

While there are many ways to catch, detect, or block a phishing website, companies continue to be compromised by targeted spearphishing attacks and by more sophisticated red teamers. A fuzzy hashing approach gives defenders another way of catching both commoditized and targeted phishing attacks. We hope other companies start to incorporate this capability into their products.

Interested in learning more about this capability in our product? Get in touch with our team.

Frequently Asked Questions (FAQs)

What is fuzzy hashing and how is it used to detect phishing websites?

Fuzzy hashing is a technique that generates a hash value representing the similarity between files or content, rather than exact matches. In the context of phishing detection, fuzzy hashes (such as SSDEEP or TLSH) are calculated for the Document Object Model (DOM) of a webpage after scripts are executed. This allows security platforms like Obsidian to identify visually or structurally similar phishing sites, even when URLs or superficial HTML have been changed.

How does Obsidian Security detect Adversary-in-the-Middle (AitM) phishing kits like EvilProxy/Tycoon?

Obsidian Security analyzes the fully rendered content (the DOM) of suspicious websites and computes a fuzzy hash to capture the essence of the page presented to users. By comparing this hash against known hashes from phishing kits such as EvilProxy/Tycoon, Obsidian can quickly identify sites using similar tactics, even if they attempt to evade detection through obfuscation or dynamic URLs.

Why is hashing the DOM more effective than hashing raw HTML for phishing detection?

Hashing the DOM, which includes the effects of executed JavaScript and reveals the actual user-facing content, provides a more accurate representation of the phishing page. Since many phishing kits use obfuscated scripts or minimal HTML, raw HTML hashes can be too generic, while DOM hashes capture the full outcome after all scripts are loaded, making detection of similar attacks far more reliable.

How do security solutions overcome bot protection measures like those used by Cloudflare when analyzing phishing sites?

Advanced security solutions may use automated browsing environments that can mimic human interaction, such as generating mouse movements and clicks, to bypass protections like those offered by Cloudflare. After passing these checks, solutions can then analyze the fully loaded page and compute fuzzy hashes for effective phishing detection.
