Email Spam Filter Performance: 98.27% F1 Score, 96.62% Precision

Executive Summary

OpenEFA is an AI-powered email security platform that uses multi-layered analysis to detect spam, phishing, and malicious emails. Our advanced scoring system combines traditional authentication (SPF, DKIM, DMARC) with AI-powered behavioral analysis, DNS validation, and machine learning to provide industry-leading protection.

Over the past 30 days, OpenEFA has analyzed 52,264 emails with a 98.27% F1 Score and 96.62% precision. The system safely delivered 71.5% to inboxes, quarantined 3.9% for review, and auto-deleted 22.6% as high-confidence spam—all with <2 second processing time. Deployed across 31 protected domains serving 89 recipients, OpenEFA proves that AI-powered email security can deliver enterprise-grade protection at a fraction of the cost.

Industry F1 score context: Basic and open-source spam filters typically publish F1 scores of 0.75–0.88; average commercial secure email gateways score 0.85–0.92; strong enterprise vendors fall in the 0.92–0.95 band; and top-tier research and lab models on curated datasets reach 0.97–0.99+. Commercial vendors generally do not publish F1 scores directly — the closest available independent benchmark is Virus Bulletin’s VBSpam test, summarized further down this page.

Key Metrics at a Glance

Metric	OpenEFA Value	Industry Standard	Status
F1 Score	98.27%	85-92%	Above Average
Spam Detection Rate	99.98%	90-95%	Above Average
False Positive Rate	1.33%	15-25%	94% Better
Precision	96.62%	88-93%	Above Average
Emails Processed (30 days)	52,264	N/A	Production Scale
Daily Volume	~1,686 emails/day	N/A	Peak: 2,106 emails/day

Understanding F1 Score: 98.27%

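The F1 Score is the harmonic mean of precision and recall, so a filter scores well only when it both catches spam and avoids flagging legitimate mail. That makes it a useful single-number summary of email security effectiveness.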

What This Means In Practice:

Out of 100 spam emails: OpenEFA catches ~100
Out of 100 emails flagged: 97 are actually spam
Balance: Strong precision with high detection rate

Industry Context

Email security vendors generally do not publish F1 scores directly. The closest public benchmark is Virus Bulletin's quarterly VBSpam test, which measures spam catch rate and false-positive rate under controlled conditions. Products certified in the Q2 2025 test (VBSpam and VBSpam+) ranged from 99.329 to 99.995 on the composite final score (Mimecast: 99.709; Bitdefender: 99.995). Barracuda and Proofpoint do not currently participate.

OpenEFA (May 2026): 98.27% F1 on production traffic — see the methodology notes below for the apples-to-apples caveat.

F1 Score Breakdown

Overall F1 Score	98.27%
Precision	96.62%
Recall	99.98%

Email Processing Breakdown (30 Days)

Disposition	Count	Percentage	Description
Delivered (Safe)	37,346	71.5%	Clean emails delivered safely to recipient inboxes
Quarantined (Review)	2,013	3.9%	Suspicious emails held for user review and release
Auto-Deleted (Spam)	11,792	22.6%	High-confidence spam automatically removed
Released	503	1.0%	User-released from quarantine
Total Analyzed	52,264	100%	All emails processed by OpenEFA

Protected Infrastructure

Protected Email Domains	31
Protected Recipients	89
Active Users	27
Blocking Rules	11,534
Unique Sender Domains Analyzed	6,475

Average Spam Scores by Disposition

Disposition	Average Score	Interpretation
Delivered Emails	0.43	Low risk
Quarantined Emails	59.80	High-risk spam
Auto-Deleted	70.80	Very high-risk spam
Released	-6.04	False positives (trusted)
Overall Average	19.35	System baseline

Key Insight: The 59.37-point gap between the average scores of delivered emails (0.43) and quarantined emails (59.80) shows clear separation between legitimate and malicious content.

Confusion Matrix (30-Day Period)

	Predicted Spam	Predicted Clean
Actual Spam	14,389 (True Positive)	3 (False Negative)
Actual Clean	503 (False Positive)	37,346 (True Negative)

What These Numbers Mean:

True Positives (14,389): Spam correctly identified and blocked
True Negatives (37,346): Clean emails correctly delivered
False Positives (503): Clean emails quarantined (recoverable)
False Negatives (3): Spam that slipped through

Derived Metrics:

Accuracy: (14,389 + 37,346) / 52,241 = 99.03%
Precision: 14,389 / (14,389 + 503) = 96.62%
Recall: 14,389 / (14,389 + 3) = 99.98%
Specificity: 37,346 / (37,346 + 503) = 98.67%
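
The derived metrics above follow directly from the four confusion-matrix counts. As a quick check, the short Python sketch below recomputes them, together with the F1 score and false-positive rate reported earlier. The function name derive_metrics is illustrative and is not part of OpenEFA.

```python
def derive_metrics(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Compute standard classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tn),
        # F1 is the harmonic mean of precision and recall.
        "f1": 2 * precision * recall / (precision + recall),
    }


# Counts taken from the 30-day confusion matrix above.
metrics = derive_metrics(tp=14_389, fn=3, fp=503, tn=37_346)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Because F1 is a harmonic mean, it sits closer to the lower of the two inputs (precision, here) than a simple average would.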

Spam Score Distribution (30 Days)

OpenEFA uses a graduated spam scoring system where each email receives a cumulative score based on multiple risk factors. Understanding score distribution helps evaluate system effectiveness and threshold tuning.

Score Range	Risk Level	Count	Percentage	Typical Action
0 - 5.9	Safe	36,083	69.0%	✅ Delivered
6.0 - 9.9	Suspicious	914	1.8%	⚠️ Quarantined
10.0 - 14.9	High Risk	758	1.5%	🛑 Quarantined
15.0+	Very High Risk	14,510	27.8%	❌ Auto-Deleted

Intelligent Thresholds

OpenEFA uses adaptive, multi-factor thresholds to determine email disposition. Emails are classified as delivered, quarantined, or auto-deleted based on cumulative scoring across all analysis modules.

Clean Email (Safe)	69.0%
Suspicious (Quarantine)	3.2%
High-Risk Spam (Deleted)	27.8%
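
To make the score bands concrete, here is a minimal Python sketch that maps a cumulative score to the typical action listed in the score distribution table. The default cut-offs (6.0 and 15.0) are read from that table. The function name disposition is hypothetical, and a system with adaptive, multi-factor thresholds would not reduce to a fixed lookup like this.

```python
def disposition(score: float,
                quarantine_at: float = 6.0,
                delete_at: float = 15.0) -> str:
    """Map a cumulative spam score to a typical action.

    Default cut-offs mirror the score distribution table; they are
    illustrative and not OpenEFA's actual configuration.
    """
    if score >= delete_at:
        return "auto-deleted"
    if score >= quarantine_at:
        return "quarantined"
    return "delivered"


for s in (0.43, 7.5, 12.0, 70.8):
    print(s, "->", disposition(s))
```

Keeping the cut-offs as parameters makes it easy to test how a different threshold would shift mail between quarantine and deletion.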

Top Blocked Threat Types

Threat Type	Count	Description
First-Contact Risk (New Sender)	14,389	Sender and/or domain never seen before in system history
BEC (Business Email Compromise)	13,992	Payment/wire fraud, executive impersonation — 1,023 CRITICAL, 915 HIGH, 2,458 MED, 6,683 LOW
Adversarial Patterns	9,788	Obfuscation, evasion tactics, and adversarial content signals
Phishing Attempts	8,111	Credential harvesting, fake login pages, impersonation
Brand & Display-Name Impersonation	7,653	Spoofed brand names, lookalike domains, executive display-name tricks
SPF / DKIM / DMARC Failures	6,812	Authentication failures across one or more protocols
Marketing / Cold Commercial Patterns	5,718	Unsolicited bulk/marketing content flagged by content classifier
Suspicious Payment Signals	2,926	Invoice, wire transfer, and payment-redirect fraud indicators
EFA Collective RBL Matches	1,742	Crowd-sourced blocklist hits from the OpenEFA Collective
Virus / Malware Detected	252	Known-bad attachments and embedded malware signatures

Machine Learning Performance

OpenEFA's ML ensemble model uses multiple classifiers trained on production email data to provide adaptive spam detection.

Ensemble Model Metrics

Training Samples	23,348
Training Balance	11,674 spam / 11,674 ham
ML Accuracy	89.2%
ML F1 Score	89.5%
ML ROC AUC	96.0%
Features	130
Last Retrain	May 8, 2026

Base Model Performance (ROC AUC)

LightGBM	96.0%
XGBoost	95.9%
CatBoost	95.5%
Random Forest	95.0%
Logistic Regression	93.0%

Ensemble Strategy: The base models are combined using stacking, which aims to match or exceed the best individual model. On ROC AUC, the ensemble (96.0%) is on par with LightGBM (96.0%), the strongest base model.
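
For context on the ROC AUC figures above: ROC AUC is the probability that a randomly chosen spam message receives a higher model score than a randomly chosen ham message. The sketch below computes it by direct pairwise comparison on a tiny made-up sample. The labels and scores are illustrative only and are not OpenEFA data.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC via pairwise comparison; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one spam and one ham example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = spam, 0 = ham.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
print(roc_auc(labels, scores))
```

This quadratic form is for illustration. Libraries typically compute the same quantity from sorted ranks, which scales better to large evaluation sets.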

System Performance

Avg Processing Time	<2s
System Uptime	99.9%
Memory Footprint	~2.5GB
Daily Capacity	5,000+

Volume Statistics (30 Days)

Daily Average: 1,686 emails/day
Peak Day: 2,106 emails
Minimum Day: 674 emails
Total Processed: 52,264 emails

How OpenEFA Spam Scoring Works

OpenEFA uses a multi-module scoring system where each analysis component contributes to the final spam score. This layered approach provides comprehensive threat detection while minimizing false positives.

1. Email Authentication Module

Validates sender authenticity using industry-standard protocols:

SPF: Verifies sending server is authorized
DKIM: Cryptographic signature validation
DMARC: Policy enforcement

Scoring:

✅ All pass: Score reduced (trusted)
⚠️ Partial: Neutral
❌ Failed: Score increased (high risk)
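
The three-way rule above can be sketched in a few lines of Python. This is only an illustration: the numeric bonus and penalty are placeholders, and reading "Failed" as "all three checks fail" is an assumption, since the exact rule is not stated here.

```python
def auth_adjustment(spf: bool, dkim: bool, dmarc: bool,
                    trusted_bonus: float = -2.0,
                    failure_penalty: float = 5.0) -> float:
    """Return a score adjustment from SPF/DKIM/DMARC pass/fail results.

    The bonus and penalty values are placeholders for illustration.
    """
    passed = sum((spf, dkim, dmarc))
    if passed == 3:
        return trusted_bonus      # all pass: score reduced
    if passed == 0:
        return failure_penalty    # all fail (assumed meaning of "Failed")
    return 0.0                    # partial: neutral


print(auth_adjustment(True, True, True))
print(auth_adjustment(True, False, True))
print(auth_adjustment(False, False, False))
```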

2. DNS Analysis Module

Advanced DNS validation and domain reputation:

RBL Checks: Multiple blocklist sources
Domain Spoofing: Multi-domain validation
PTR Records: Reverse DNS verification
Domain Age: New domain flagging

Scoring:

✅ Clean reputation: No impact
⚠️ Minor issues: Low increase
🛑 RBL listed: Moderate increase
❌ Spoofing detected: Significant increase
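
As background on the RBL checks listed above: DNS blocklists are conventionally queried by reversing the octets of the sending IPv4 address and appending the blocklist zone. A listed address resolves to an answer, and an unlisted one returns no record. The sketch below only builds the query name and performs no network lookup. The zone rbl.example.org is a placeholder, since the blocklists OpenEFA queries are not named here.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# 192.0.2.10 is a documentation-range address; the zone is a placeholder.
print(dnsbl_query_name("192.0.2.10", "rbl.example.org"))
```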

3. Phishing Detection Module

AI-powered analysis of phishing indicators:

Suspicious URL patterns (shortened, obfuscated)
Brand impersonation detection
Urgency language analysis
Credential harvesting indicators
Look-alike domain detection

Scoring:

✅ No indicators: No impact
⚠️ Low confidence: Low increase
🛑 Medium confidence: Moderate increase
❌ High confidence: Significant increase
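
One common way to approach look-alike domain detection is edit distance: a sender domain that is within a character or two of a protected domain, without being identical to it, is suspicious. The sketch below uses a plain Levenshtein distance. It is an assumed technique for illustration, not a description of OpenEFA's detector, and the protected domain list and the distance limit are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def is_lookalike(domain: str, protected: list[str],
                 max_distance: int = 2) -> bool:
    """Flag domains close to, but not the same as, a protected domain."""
    domain = domain.lower()
    return any(0 < edit_distance(domain, p.lower()) <= max_distance
               for p in protected)


protected = ["example.com"]  # hypothetical brand domain
for candidate in ("examp1e.com", "example.com", "unrelated.org"):
    print(candidate, is_lookalike(candidate, protected))
```

Edit distance alone misses homoglyph and subdomain tricks, so a real detector would likely combine it with other signals.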

4. Business Email Compromise (BEC)

Detects executive impersonation and wire fraud:

Display name spoofing detection
Payment request indicators
Urgency/secrecy language analysis
Executive title spoofing

Scoring:

✅ No BEC indicators: No impact
⚠️ Low confidence: Low increase
🛑 Medium confidence: Moderate increase
❌ High confidence: Significant increase
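
Display name spoofing is the easiest of these indicators to illustrate: the visible name matches a known executive, but the underlying address does not. The following sketch uses Python's standard email.utils.parseaddr. The executive directory and the function name are hypothetical, and a real detector would need fuzzier name matching than this exact comparison.

```python
from email.utils import parseaddr


def is_display_name_spoof(from_header: str,
                          executives: dict[str, str]) -> bool:
    """True if the display name matches a known executive but the address differs.

    `executives` maps a lower-cased display name to that person's real address.
    """
    name, address = parseaddr(from_header)
    expected = executives.get(name.strip().lower())
    return expected is not None and address.lower() != expected.lower()


executives = {"jane doe": "jane.doe@example.com"}  # hypothetical directory
print(is_display_name_spoof('"Jane Doe" <jane.doe@example.com>', executives))
print(is_display_name_spoof('"Jane Doe" <ceo.office@mail.example.net>', executives))
```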

5. Behavioral Analysis Module

Analyzes sender behavior patterns and anomalies:

First contact detection
Sender reputation analysis
Graph-based relationship analysis

Scoring:

✅ Normal behavior: No impact
⚠️ Minor anomalies: Low increase
🛑 Significant anomalies: Moderate increase
❌ Severe anomalies: High increase
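
First-contact detection amounts to remembering which senders and sender domains have been seen before. A minimal in-memory sketch follows. The class name is hypothetical, and a production system would presumably persist this history in a database rather than in Python sets.

```python
class ContactHistory:
    """Tracks which sender addresses and domains have been seen before."""

    def __init__(self) -> None:
        self._senders: set[str] = set()
        self._domains: set[str] = set()

    def check_and_record(self, sender: str) -> dict[str, bool]:
        """Report whether sender/domain are new, then remember them."""
        sender = sender.strip().lower()
        domain = sender.rpartition("@")[2]
        result = {
            "new_sender": sender not in self._senders,
            "new_domain": domain not in self._domains,
        }
        self._senders.add(sender)
        self._domains.add(domain)
        return result


history = ContactHistory()
print(history.check_and_record("alice@example.com"))
print(history.check_and_record("bob@example.com"))
```

Separating the two flags lets a new person at a known domain be treated as lower risk than a sender from a domain never seen before.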

6. ML Ensemble Module

Adaptive learning from user feedback:

Multi-model ensemble voting
Confidence-weighted adjustments
Learns from released emails (false positives)
Learns from deleted spam (true positives)

Scoring:

✅ Ham prediction: Score reduced
⚠️ Uncertain: No impact
❌ Spam prediction: Score increased
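
One way to read the scoring rules above is as soft voting: average each model's spam probability, treat a middle band as uncertain, and scale the adjustment by how far the average sits from 0.5. The sketch below assumes that reading. The band limits, the maximum adjustment, and the function name are placeholders rather than OpenEFA's actual values.

```python
def ml_adjustment(spam_probs: list[float],
                  uncertain_band: tuple[float, float] = (0.4, 0.6),
                  max_adjust: float = 10.0) -> float:
    """Turn per-model spam probabilities into a signed score adjustment."""
    if not spam_probs:
        raise ValueError("need at least one model prediction")
    avg = sum(spam_probs) / len(spam_probs)  # soft vote across models
    low, high = uncertain_band
    if low <= avg <= high:
        return 0.0                           # uncertain: no impact
    confidence = abs(avg - 0.5) * 2          # ranges from 0.0 to 1.0
    sign = 1.0 if avg > high else -1.0       # spam raises, ham lowers
    return sign * confidence * max_adjust


print(ml_adjustment([0.9, 0.95, 0.85]))   # confident spam: positive
print(ml_adjustment([0.5, 0.55, 0.45]))   # uncertain: zero
print(ml_adjustment([0.05, 0.1]))         # confident ham: negative
```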

Independent Email Security Testing: VBSpam Q2 2025

VBSpam is a quarterly independent benchmark run by Virus Bulletin. Products are tested against live spam feeds (Project Honey Pot, Abusix, MX Mail Data) and a curated newsletter ham corpus. Products that score 99.5%+ catch rate with zero false positives on the ham corpus earn the VBSpam+ certification.

Rank	Product	Final Score	FP Rate (Ham Corpus)	Certification
1	Bitdefender GravityZone Premium	99.995	0%	VBSpam+
2	SEPPmail.cloudfilter	99.989	0%	VBSpam+
3	Sophos Email	99.988	0%	VBSpam+
4	FortiMail	99.964	0%	VBSpam+
5	Net at Work NoSpamProxy	99.962	0%	VBSpam+
6	N-able Mail Assure	99.948	0%	VBSpam+
7	N-able SpamExperts	99.937	0%	VBSpam+
8	Mimecast	99.709	0%	VBSpam+
9	Zoho Mail	99.329	0%	VBSpam

Notable absences

Barracuda and Proofpoint do not participate in VBSpam testing, so their true catch rate and false-positive performance cannot be independently verified. Their public claims (e.g. "99.9% spam capture") are self-reported and do not disclose false-positive rates in comparable terms.

Where OpenEFA stands

OpenEFA does not currently submit to VBSpam testing. The figures above are reproduced from the published Q2 2025 VBSpam comparative review.

For reference, applying VBSpam's catch-rate methodology to our 30-day production data would yield a spam catch rate of ~99.98%. Our reported false-positive rate of 1.33% is measured against real customer traffic — which includes cold B2B sales, opt-in marketing, and mailing-list content — and is therefore stricter than VBSpam's controlled newsletter ham corpus, where most products score 0%. Direct numerical comparison is not apples-to-apples.

OpenEFA vs Commercial Email Security

Where detection metrics are not independently comparable across vendors, cost, deployment architecture, and data sovereignty are.

Attribute	OpenEFA	Barracuda	Mimecast	Proofpoint
Cost (50 users/year)	$199-799	~$3,000	~$4,800	~$7,200
Self-Hosted / Data Sovereignty	✅ Yes	❌ Cloud only	❌ Cloud only	❌ Cloud only
Source-Available	✅ Yes	❌ No	❌ No	❌ No
Submits to Independent Testing (VBSpam)	Not yet	❌ No	✅ Yes (Q2 2025: 99.709)	❌ No
Adaptive Learning from Customer Feedback	✅ Per-deployment	⚠️ Shared cloud model	⚠️ Shared cloud model	⚠️ Shared cloud model

Key Advantages

✅ Above-average accuracy (98.27% F1 Score)
✅ Strong precision (96.62%)
✅ Low false positive rate (1.33%)
✅ 60-80% cost savings vs. commercial
✅ Full transparency (detailed scoring)
✅ Data sovereignty (self-hosted)
✅ No vendor lock-in
✅ Continuous learning system

Data Quality & Methodology

Measurement Period

Start Date: April 13, 2026
End Date: May 13, 2026
Duration: 30 days
Total Emails: 52,264
Environment: Production deployment (31 domains, 89 recipients)

Classification Methodology

Spam Threshold: Score ≥ 18.0
Clean Threshold: Score < 6.0
Validation: User quarantine actions (releases)
Source: Production MySQL database

Why These Numbers Matter

This 30-day period reflects OpenEFA's production performance with all detection modules fully operational: multi-module spam scoring with 20+ detection components, AI-powered NLP analysis using spaCy en_core_web_lg, a machine learning ensemble with adaptive learning, and real-time DNS and authentication validation.

Note: These statistics represent real production data from OpenEFA deployments across multiple client domains. All metrics are verifiable and reproducible from the source database.
