Detecting impersonation through language.
OpenEFA® Signals Series | March 29, 2026
Every person who writes email has a linguistic fingerprint. The words they choose, the way they structure sentences, how they greet people, how they sign off — these patterns are remarkably consistent and remarkably personal.
When an attacker takes over an account or impersonates a sender, they bring their own linguistic habits with them. They may have the right email address, the right display name, and even access to the real mailbox. But they don't have the original sender's voice.
That gap is detectable. And it's one of the most powerful signals in modern email security.
Traditional email security focuses on technical indicators: authentication headers, IP reputation, known malicious URLs, attachment signatures. These are necessary, but they are increasingly insufficient.
Modern attacks — particularly business email compromise (BEC) and account takeover — are designed to pass every technical check. The email comes from the real account. Authentication succeeds. There are no links or attachments. The message is just text, asking for something that sounds reasonable.
In these cases, the only thing that distinguishes the attacker's message from the real sender's message is how it's written.
This is not a subtle difference. Research in computational linguistics has demonstrated for decades that individuals maintain consistent patterns across several measurable dimensions: vocabulary and word choice, sentence structure, greetings and sign-offs, formality, and message length and cadence.
These patterns are deeply ingrained. People don't consciously choose them, which makes them very difficult to fake — and very reliable to detect when they change.
OpenEFA constructs a behavioral writing profile for each sender based on their historical communication with the organization. This profile is not a single snapshot — it's a living model that evolves as the sender's patterns are observed over time.
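One way to picture such a living model is a per-feature exponential moving average that absorbs each new observed message. This is a minimal sketch under that assumption; the feature names, decay factor, and class shape are illustrative, not OpenEFA's internal representation:

```python
class WritingProfile:
    """Toy evolving baseline: an exponential moving average per feature.

    alpha controls how quickly the profile adapts to recent messages.
    """

    def __init__(self, alpha: float = 0.1):
        self.alpha = alpha
        self.mean = {}

    def update(self, features: dict) -> None:
        """Fold one observed message's features into the baseline."""
        for name, value in features.items():
            prev = self.mean.get(name, value)
            self.mean[name] = (1 - self.alpha) * prev + self.alpha * value

    def deviation(self, features: dict) -> dict:
        """Absolute gap between a new message and the current baseline."""
        return {n: abs(v - self.mean.get(n, v)) for n, v in features.items()}

profile = WritingProfile(alpha=0.5)
profile.update({"contraction_rate": 0.2})
profile.update({"contraction_rate": 0.2})
gap = profile.deviation({"contraction_rate": 0.0})
```

Unseen features default to the incoming value, so a brand-new dimension never produces a spurious deviation on first sight.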
The profile captures multiple dimensions:
Vocabulary and word choice. Which words does this sender use frequently? Do they prefer "please" or "kindly"? Do they write "let me know" or "advise at your earliest convenience"? Do they use contractions ("don't", "won't") or write formally ("do not", "will not")? These preferences are remarkably stable across messages and form a strong baseline.
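To make this concrete, here is a hedged sketch of how such vocabulary preferences might be measured. The marker word lists are a tiny illustrative subset, not OpenEFA's actual feature set:

```python
import re

def vocabulary_features(text: str) -> dict:
    """Extract simple vocabulary-preference rates from one message."""
    words = re.findall(r"[a-z']+", text.lower())
    total = max(len(words), 1)
    contractions = sum(1 for w in words if "'" in w)  # don't, won't, I'll, ...
    formal = sum(words.count(w) for w in ("kindly", "advise"))
    casual = sum(words.count(w) for w in ("please", "thanks"))
    return {
        "contraction_rate": contractions / total,
        "formal_marker_rate": formal / total,
        "casual_marker_rate": casual / total,
    }

baseline = vocabulary_features("Don't worry, I'll send it over. Thanks, let me know!")
suspect = vocabulary_features("Kindly advise at your earliest convenience. Do not delay.")
```

Averaged over hundreds of messages, even crude rates like these become a stable signature for a sender.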
Sentence structure. How does the sender construct sentences? Some people write in short, direct statements. Others favor compound sentences with multiple clauses. Some consistently use passive voice; others are almost exclusively active. These structural patterns persist even when the topic of the email changes entirely.
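A rough illustration of structural features, using cheap heuristics (splitting on terminal punctuation, counting commas as a clause proxy) rather than the real parser a production system would use:

```python
import re
import statistics

def sentence_structure(text: str) -> dict:
    """Heuristic structural features: sentence length and clause density."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    clauses = [1 + s.count(",") for s in sentences]  # commas as a clause proxy
    return {
        "mean_sentence_len": statistics.mean(lengths),
        "mean_clauses": statistics.mean(clauses),
    }

terse = sentence_structure("I agree. Send it over. Thanks.")
compound = sentence_structure(
    "While I agree in principle, and although the timing is tight, "
    "we should proceed with the transfer, carefully and deliberately."
)
```

A sender who lives at one end of this spectrum rarely drifts to the other without a reason.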
Greetings and sign-offs. Opening and closing patterns are among the most consistent behavioral markers in email. A sender who always writes "Hi [Name]," and signs off with "Best," will do so hundreds of times. When a message from the same account suddenly opens with "Dear Sir/Madam" or closes with "Warm regards," something has changed.
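One plausible way to pull out these markers, assuming a small illustrative set of greeting and sign-off patterns (a real system would learn the patterns per sender):

```python
import re

GREETING = re.compile(r"^(hi|hello|hey|dear)\b", re.IGNORECASE)
SIGNOFF = re.compile(r"^(best|regards|warm regards|thanks|cheers)\b[,!]?", re.IGNORECASE)

def greeting_and_signoff(body: str) -> tuple:
    """Return the opening greeting line and closing sign-off line, if present."""
    lines = [l.strip() for l in body.splitlines() if l.strip()]
    greeting = lines[0] if lines and GREETING.match(lines[0]) else None
    signoff = next((l for l in reversed(lines) if SIGNOFF.match(l)), None)
    return greeting, signoff

g, s = greeting_and_signoff("Hi Alex,\nQuick question about the invoice.\nBest,\nSam")
```

Because these strings repeat almost verbatim across a sender's history, even exact-match comparison against the baseline is informative.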
Formality gradient. Every sender operates within a formality range. Some people write casually to everyone. Others maintain strict professionalism. Most fall somewhere in between, with a predictable spectrum that varies by recipient. OpenEFA maps this gradient: when a sender who is consistently informal with a particular colleague suddenly sends a stiff, formal message, the shift is flagged.
Message length and cadence. How long are this sender's typical messages? Do they write three-sentence replies or multi-paragraph emails? Do they respond quickly or after delays? While content and context affect these dimensions, the overall patterns are consistent enough to form part of the baseline.
When a new message arrives from a profiled sender, OpenEFA compares it against the established baseline across all tracked dimensions. A deviation in one dimension might mean nothing — people adapt their writing to context. But deviations across multiple dimensions simultaneously indicate that the message may not have been written by the person it claims to be from.
OpenEFA evaluates deviations on a spectrum:
Low deviation. The message falls within the expected range across most dimensions. Minor variations are normal — people write differently when they're in a hurry, when they're on a mobile device, or when they're addressing a new topic. Low deviation messages receive no additional scrutiny from this signal.
Moderate deviation. Several dimensions show unexpected patterns simultaneously. The greeting has changed. The sentence structure is different. The vocabulary has shifted. Individually, each change could be explained. Together, they suggest a meaningful departure from the sender's baseline. Moderate deviations contribute to the message's overall risk score and may amplify other signals.
High deviation. The message is fundamentally different from the sender's established pattern across most or all dimensions. The writing style, formality, structure, and vocabulary are inconsistent with everything OpenEFA has observed from this sender. This level of deviation is rare in legitimate communication and is a strong indicator of impersonation or account compromise.
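The spectrum above can be sketched as a simple rule over per-dimension deviation scores. The thresholds and the requirement of at least two flagged dimensions are illustrative assumptions, but they encode the stated principle that a shift in a single dimension alone is treated as normal:

```python
def deviation_level(zscores: dict, moderate: float = 2.0,
                    min_dims: int = 2, high_fraction: float = 0.75) -> str:
    """Collapse per-dimension |z|-scores against a sender's baseline
    into a coarse low / moderate / high deviation level."""
    dims = list(zscores.values())
    flagged = sum(1 for z in dims if z >= moderate)
    if flagged < min_dims:          # one odd dimension: people adapt to context
        return "low"
    if flagged / len(dims) >= high_fraction:  # nearly everything is off
        return "high"
    return "moderate"

single = deviation_level({"vocab": 3.1, "syntax": 0.2, "greeting": 0.1, "formality": 0.3})
multi = deviation_level({"vocab": 2.4, "syntax": 2.8, "greeting": 0.4, "formality": 0.5})
broad = deviation_level({"vocab": 3.0, "syntax": 2.5, "greeting": 2.2, "formality": 2.9})
```

The key design choice is that the level depends on the breadth of deviation, not the magnitude in any one dimension.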
Business email compromise is the most financially damaging form of cybercrime, accounting for billions of dollars in losses annually. The reason is simple: BEC attacks don't rely on malware, exploits, or technical vulnerabilities. They rely on trust and deception.
In a typical BEC scenario, an attacker gains access to a legitimate account, studies its ongoing conversations, and then sends a plausible-sounding request, such as a wire transfer or payment change, from inside that account.
Every traditional security check passes. The sender is real. The authentication is valid. There are no malicious indicators.
But the attacker cannot perfectly replicate the compromised user's writing style. They may be more formal, or less. They may use different greetings. Their sentence structure may be shorter or longer. They may use vocabulary the real user never employs.
These differences are invisible to the human eye in isolation. They are not invisible to a system that has been tracking the sender's linguistic patterns across hundreds of messages.
A growing concern in email security is the use of generative AI to craft impersonation messages. If an attacker can use AI to mimic a sender's writing style, does linguistic analysis still work?
The answer is yes — but the reasons are nuanced.
AI-generated text has its own detectable characteristics: more uniform sentence length, higher lexical diversity, fewer grammatical errors, more balanced paragraph structure, and a tendency toward neutral formality. Even when prompted to mimic a specific style, AI text tends to be more consistent than real human writing — which is itself an anomaly. Real people are imperfect and variable in predictable ways. AI-generated text is often too smooth.
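The "too smooth" observation can be quantified: human writing is bursty, with sentence lengths that swing widely, so an unusually low coefficient of variation is itself a signal. A minimal sketch of that single measurement (real detectors combine many such statistics):

```python
import re
import statistics

def sentence_length_cv(text: str) -> float:
    """Coefficient of variation (stdev / mean) of sentence lengths in words.

    Low values indicate unusually uniform sentence lengths, one of the
    characteristics associated with machine-generated text.
    """
    lengths = [len(s.split()) for s in re.split(r"[.!?]+", text) if s.strip()]
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths) / statistics.mean(lengths)

human = sentence_length_cv(
    "Ok. Let's talk tomorrow, after I review the whole contract and the annexes in detail. Fine?"
)
uniform = sentence_length_cv(
    "This is a clear message. This is a short message. This is a calm message."
)
```

Note the asymmetry with the other signals: here the anomaly is too little variation rather than too much.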
An attacker using AI to generate a message may capture surface-level style, but they typically lack the deep contextual knowledge that shapes real communication. The real sender knows the recipient's preferences, shared history, inside references, and conversational shorthand. AI-generated impersonation messages tend to be generically appropriate rather than specifically personal — and that generic quality is measurable.
Individual writing contains micro-patterns that are almost never explicitly documented or observed: the consistent use of a particular conjunction, a habit of starting responses with "So," a tendency to place qualifiers before or after the main clause, a preference for specific transitional phrases. These patterns are not captured in the kind of sample text an attacker feeds to an AI model. They emerge only from extensive observation — which is exactly what OpenEFA performs.
The critical asymmetry is this: OpenEFA has access to the sender's entire communication history with the organization. The attacker, even with AI, typically has access to a limited sample at best. The deeper the baseline, the more precisely deviations can be detected. An attacker would need not just the sender's writing style, but their writing style with this specific recipient, on this specific type of topic, at this time of day. That level of fidelity is extraordinarily difficult to achieve.
Consider this situation:
Sender: CFO's email account (authenticated, legitimate mailbox)
Recipient: Controller
Subject: "Quick request"
"Dear Controller, I need you to process a wire transfer to the account detailed below at your earliest convenience. This is related to a confidential acquisition that has not yet been announced. Please do not discuss this with other team members. I am in meetings all day and cannot take calls. Please confirm once completed. Regards, [CFO Name]"
The message comes from the CFO's real account. Authentication passes. There are no links, no attachments, no malicious indicators.
But OpenEFA's writing style analysis detects multiple simultaneous deviations: a greeting ("Dear Controller") the CFO has never used with this recipient, stiff phrasing ("at your earliest convenience") absent from their baseline, a complete lack of the contractions the CFO habitually uses ("cannot", "do not" instead of "can't", "don't"), and an overall formality level far outside their established range with this colleague.
Combined with urgency signals and secrecy requests (both additional OpenEFA signals), the writing style deviation forms a clear picture: this message was not written by the CFO. The account has been compromised.
Writing Style Deviation is part of the OpenEFA Signals framework — a set of behavioral and contextual patterns that reveal risk before it becomes an incident.
The core principle: identity is not just an email address. It is a pattern of behavior that includes how a person communicates. When the communication changes but the address stays the same, something is wrong — and that inconsistency is one of the most reliable indicators of compromise.
Attackers can steal credentials. They can compromise accounts. They can pass every authentication check. But they cannot perfectly replicate another person's voice.
OpenEFA listens for that voice — and notices when it changes.