Detecting impersonation through language.
OpenEFA® Signals Series | March 29, 2026
Every person who writes email has a linguistic fingerprint. The words they choose, the way they structure sentences, how they greet people, how they sign off — these patterns are remarkably consistent and remarkably personal.
When an attacker takes over an account or impersonates a sender, they bring their own linguistic habits with them. They may have the right email address, the right display name, and even access to the real mailbox. But they don't have the original sender's voice.
That gap is detectable. And it's one of the most powerful signals in modern email security.
Traditional email security focuses on technical indicators: authentication headers, IP reputation, known malicious URLs, attachment signatures. These are necessary, but they are increasingly insufficient.
Modern attacks — particularly business email compromise (BEC) and account takeover — are designed to pass every technical check. The email comes from the real account. Authentication succeeds. There are no links or attachments. The message is just text, asking for something that sounds reasonable.
In these cases, the only thing that distinguishes the attacker's message from the real sender's message is how it's written.
This is not a subtle difference. Research in computational linguistics has demonstrated for decades that individuals maintain consistent patterns across several measurable dimensions: vocabulary and word choice, sentence structure, greetings and sign-offs, formality, and message length and cadence.
These patterns are deeply ingrained. People don't consciously choose them, which makes them very difficult to fake — and very reliable to detect when they change.
OpenEFA constructs a behavioral writing profile for each sender based on their historical communication with the organization. This profile is not a single snapshot — it's a living model that evolves as the sender's patterns are observed over time.
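One way to picture such a living model is a per-feature exponential moving average that absorbs each new observed message. This is a minimal sketch under that assumption; the feature names, decay factor, and class shape are illustrative, not OpenEFA's internal representation:

```python
class WritingProfile:
    """Toy evolving baseline: an exponential moving average per feature.

    alpha controls how quickly the profile adapts to recent messages.
    """

    def __init__(self, alpha: float = 0.1):
        self.alpha = alpha
        self.mean = {}

    def update(self, features: dict) -> None:
        """Fold one observed message's features into the baseline."""
        for name, value in features.items():
            prev = self.mean.get(name, value)
            self.mean[name] = (1 - self.alpha) * prev + self.alpha * value

    def deviation(self, features: dict) -> dict:
        """Absolute gap between a new message and the current baseline."""
        return {n: abs(v - self.mean.get(n, v)) for n, v in features.items()}

profile = WritingProfile(alpha=0.5)
profile.update({"contraction_rate": 0.2})
profile.update({"contraction_rate": 0.2})
gap = profile.deviation({"contraction_rate": 0.0})
```

Unseen features default to the incoming value, so a brand-new dimension never produces a spurious deviation on first sight.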
The profile captures multiple dimensions:
Vocabulary and word choice. Which words does this sender use frequently? Do they prefer "please" or "kindly"? Do they write "let me know" or "advise at your earliest convenience"? Do they use contractions ("don't", "won't") or write formally ("do not", "will not")? These preferences are remarkably stable across messages and form a strong baseline.
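To make this concrete, here is a hedged sketch of how such vocabulary preferences might be measured. The marker word lists are a tiny illustrative subset, not OpenEFA's actual feature set:

```python
import re

def vocabulary_features(text: str) -> dict:
    """Extract simple vocabulary-preference rates from one message."""
    words = re.findall(r"[a-z']+", text.lower())
    total = max(len(words), 1)
    contractions = sum(1 for w in words if "'" in w)  # don't, won't, I'll, ...
    formal = sum(words.count(w) for w in ("kindly", "advise"))
    casual = sum(words.count(w) for w in ("please", "thanks"))
    return {
        "contraction_rate": contractions / total,
        "formal_marker_rate": formal / total,
        "casual_marker_rate": casual / total,
    }

baseline = vocabulary_features("Don't worry, I'll send it over. Thanks, let me know!")
suspect = vocabulary_features("Kindly advise at your earliest convenience. Do not delay.")
```

Averaged over hundreds of messages, even crude rates like these become a stable signature for a sender.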
Sentence structure. How does the sender construct sentences? Some people write in short, direct statements. Others favor compound sentences with multiple clauses. Some consistently use passive voice; others are almost exclusively active. These structural patterns persist even when the topic of the email changes entirely.
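A rough illustration of structural features, using cheap heuristics (splitting on terminal punctuation, counting commas as a clause proxy) rather than the real parser a production system would use:

```python
import re
import statistics

def sentence_structure(text: str) -> dict:
    """Heuristic structural features: sentence length and clause density."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    clauses = [1 + s.count(",") for s in sentences]  # commas as a clause proxy
    return {
        "mean_sentence_len": statistics.mean(lengths),
        "mean_clauses": statistics.mean(clauses),
    }

terse = sentence_structure("I agree. Send it over. Thanks.")
compound = sentence_structure(
    "While I agree in principle, and although the timing is tight, "
    "we should proceed with the transfer, carefully and deliberately."
)
```

A sender who lives at one end of this spectrum rarely drifts to the other without a reason.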
Greetings and sign-offs. Opening and closing patterns are among the most consistent behavioral markers in email. A sender who always writes "Hi [Name]," and signs off with "Best," will do so hundreds of times. When a message from the same account suddenly opens with "Dear Sir/Madam" or closes with "Warm regards," something has changed.
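One plausible way to pull out these markers, assuming a small illustrative set of greeting and sign-off patterns (a real system would learn the patterns per sender):

```python
import re

GREETING = re.compile(r"^(hi|hello|hey|dear)\b", re.IGNORECASE)
SIGNOFF = re.compile(r"^(best|regards|warm regards|thanks|cheers)\b[,!]?", re.IGNORECASE)

def greeting_and_signoff(body: str) -> tuple:
    """Return the opening greeting line and closing sign-off line, if present."""
    lines = [l.strip() for l in body.splitlines() if l.strip()]
    greeting = lines[0] if lines and GREETING.match(lines[0]) else None
    signoff = next((l for l in reversed(lines) if SIGNOFF.match(l)), None)
    return greeting, signoff

g, s = greeting_and_signoff("Hi Alex,\nQuick question about the invoice.\nBest,\nSam")
```

Because these strings repeat almost verbatim across a sender's history, even exact-match comparison against the baseline is informative.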
Formality gradient. Every sender operates within a formality range. Some people write casually to everyone. Others maintain strict professionalism. Most fall somewhere in between, with a predictable spectrum that varies by recipient. OpenEFA maps this gradient: when a sender who is consistently informal with a particular colleague suddenly sends a stiff, formal message, the shift is flagged.
Message length and cadence. How long are this sender's typical messages? Do they write three-sentence replies or multi-paragraph emails? Do they respond quickly or after delays? While content and context affect these dimensions, the overall patterns are consistent enough to form part of the baseline.
When a new message arrives from a profiled sender, OpenEFA compares it against the established baseline across all tracked dimensions. A deviation in one dimension might mean nothing — people adapt their writing to context. But deviations across multiple dimensions simultaneously indicate that the message may not have been written by the person it claims to be from.
OpenEFA evaluates deviations on a spectrum:
Low deviation. The message falls within the expected range across most dimensions. Minor variations are normal — people write differently when they're in a hurry, when they're on a mobile device, or when they're addressing a new topic. Low deviation messages receive no additional scrutiny from this signal.
Moderate deviation. Several dimensions show unexpected patterns simultaneously. The greeting has changed. The sentence structure is different. The vocabulary has shifted. Individually, each change could be explained. Together, they suggest a meaningful departure from the sender's baseline. Moderate deviations contribute to the message's overall risk score and may amplify other signals.
High deviation. The message is fundamentally different from the sender's established pattern across most or all dimensions. The writing style, formality, structure, and vocabulary are inconsistent with everything OpenEFA has observed from this sender. This level of deviation is rare in legitimate communication and is a strong indicator of impersonation or account compromise.
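The spectrum above can be sketched as a simple rule over per-dimension deviation scores. The thresholds and the requirement of at least two flagged dimensions are illustrative assumptions, but they encode the stated principle that a shift in a single dimension alone is treated as normal:

```python
def deviation_level(zscores: dict, moderate: float = 2.0,
                    min_dims: int = 2, high_fraction: float = 0.75) -> str:
    """Collapse per-dimension |z|-scores against a sender's baseline
    into a coarse low / moderate / high deviation level."""
    dims = list(zscores.values())
    flagged = sum(1 for z in dims if z >= moderate)
    if flagged < min_dims:          # one odd dimension: people adapt to context
        return "low"
    if flagged / len(dims) >= high_fraction:  # nearly everything is off
        return "high"
    return "moderate"

single = deviation_level({"vocab": 3.1, "syntax": 0.2, "greeting": 0.1, "formality": 0.3})
multi = deviation_level({"vocab": 2.4, "syntax": 2.8, "greeting": 0.4, "formality": 0.5})
broad = deviation_level({"vocab": 3.0, "syntax": 2.5, "greeting": 2.2, "formality": 2.9})
```

The key design choice is that the level depends on the breadth of deviation, not the magnitude in any one dimension.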
Business email compromise is the most financially damaging form of cybercrime, accounting for billions of dollars in losses annually. The reason is simple: BEC attacks don't rely on malware, exploits, or technical vulnerabilities. They rely on trust and deception.
In a typical BEC scenario, an attacker gains access to a legitimate account, studies its ongoing conversations, and then sends a plausible-sounding request, such as a wire transfer or payment change, from inside that account.
Every traditional security check passes. The sender is real. The authentication is valid. There are no malicious indicators.
But the attacker cannot perfectly replicate the compromised user's writing style. They may be more formal, or less. They may use different greetings. Their sentence structure may be shorter or longer. They may use vocabulary the real user never employs.
These differences are invisible to the human eye in isolation. They are not invisible to a system that has been tracking the sender's linguistic patterns across hundreds of messages.
A growing concern in email security is the use of generative AI to craft impersonation messages. If an attacker can use AI to mimic a sender's writing style, does linguistic analysis still work?
The answer is yes — but the reasons are nuanced.
AI-generated text has its own detectable characteristics: more uniform sentence length, higher lexical diversity, fewer grammatical errors, more balanced paragraph structure, and a tendency toward neutral formality. Even when prompted to mimic a specific style, AI text tends to be more consistent than real human writing — which is itself an anomaly. Real people are imperfect and variable in predictable ways. AI-generated text is often too smooth.
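The "too smooth" observation can be quantified: human writing is bursty, with sentence lengths that swing widely, so an unusually low coefficient of variation is itself a signal. A minimal sketch of that single measurement (real detectors combine many such statistics):

```python
import re
import statistics

def sentence_length_cv(text: str) -> float:
    """Coefficient of variation (stdev / mean) of sentence lengths in words.

    Low values indicate unusually uniform sentence lengths, one of the
    characteristics associated with machine-generated text.
    """
    lengths = [len(s.split()) for s in re.split(r"[.!?]+", text) if s.strip()]
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths) / statistics.mean(lengths)

human = sentence_length_cv(
    "Ok. Let's talk tomorrow, after I review the whole contract and the annexes in detail. Fine?"
)
uniform = sentence_length_cv(
    "This is a clear message. This is a short message. This is a calm message."
)
```

Note the asymmetry with the other signals: here the anomaly is too little variation rather than too much.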
An attacker using AI to generate a message may capture surface-level style, but they typically lack the deep contextual knowledge that shapes real communication. The real sender knows the recipient's preferences, shared history, inside references, and conversational shorthand. AI-generated impersonation messages tend to be generically appropriate rather than specifically personal — and that generic quality is measurable.
Individual writing contains micro-patterns that are almost never explicitly documented or observed: the consistent use of a particular conjunction, a habit of starting responses with "So," a tendency to place qualifiers before or after the main clause, a preference for specific transitional phrases. These patterns are not captured in the kind of sample text an attacker feeds to an AI model. They emerge only from extensive observation — which is exactly what OpenEFA performs.
The critical asymmetry is this: OpenEFA has access to the sender's entire communication history with the organization. The attacker, even with AI, typically has access to a limited sample at best. The deeper the baseline, the more precisely deviations can be detected. An attacker would need not just the sender's writing style, but their writing style with this specific recipient, on this specific type of topic, at this time of day. That level of fidelity is extraordinarily difficult to achieve.
Consider this situation:
Sender: CFO's email account (authenticated, legitimate mailbox)
Recipient: Controller
Subject: "Quick request"
"Dear Controller, I need you to process a wire transfer to the account detailed below at your earliest convenience. This is related to a confidential acquisition that has not yet been announced. Please do not discuss this with other team members. I am in meetings all day and cannot take calls. Please confirm once completed. Regards, [CFO Name]"
The message comes from the CFO's real account. Authentication passes. There are no links, no attachments, no malicious indicators.
But OpenEFA's writing style analysis detects multiple simultaneous deviations: a greeting ("Dear Controller") the CFO has never used with this recipient, stiff phrasing ("at your earliest convenience") absent from their baseline, a complete lack of the contractions the CFO habitually uses ("cannot", "do not" instead of "can't", "don't"), and an overall formality level far outside their established range with this colleague.
Combined with urgency signals and secrecy requests (both additional OpenEFA signals), the writing style deviation forms a clear picture: this message was not written by the CFO. The account has been compromised.
Writing Style Deviation is part of the OpenEFA Signals framework — a set of behavioral and contextual patterns that reveal risk before it becomes an incident.
The core principle: identity is not just an email address. It is a pattern of behavior that includes how a person communicates. When the communication changes but the address stays the same, something is wrong — and that inconsistency is one of the most reliable indicators of compromise.
Attackers can steal credentials. They can compromise accounts. They can pass every authentication check. But they cannot perfectly replicate another person's voice.
OpenEFA listens for that voice — and notices when it changes.