Real-World Email Security Performance: 98.27% F1 Score

Production Statistics from 52,264 Emails Filtered Across 30 Days

Updated: May 13, 2026

30-Day Analysis: April 13 - May 13, 2026

AI-Powered Multi-Layer Email Security

Executive Summary

OpenEFA is an AI-powered email security platform that uses multi-layered analysis to detect spam, phishing, and malicious emails. Our advanced scoring system combines traditional authentication (SPF, DKIM, DMARC) with AI-powered behavioral analysis, DNS validation, and machine learning to provide industry-leading protection.

Over the past 30 days, OpenEFA analyzed 52,264 emails, achieving a 98.27% F1 Score with 96.62% precision and 99.98% recall. The system delivered 71.5% of messages to inboxes, quarantined 3.9% for review, and auto-deleted 22.6% as high-confidence spam, with an average processing time under 2 seconds. Deployed across 31 protected domains serving 89 recipients, OpenEFA shows that AI-powered email security can deliver enterprise-grade protection at a fraction of the cost.

Industry F1 score context: Basic and open-source spam filters typically publish F1 scores of 0.75–0.88; average commercial secure email gateways score 0.85–0.92; strong enterprise vendors fall in the 0.92–0.95 band; and top-tier research and lab models on curated datasets reach 0.97–0.99+. Commercial vendors generally do not publish F1 scores directly — the closest available independent benchmark is Virus Bulletin’s VBSpam test, summarized further down this page.

Key Metrics at a Glance

Metric OpenEFA Value Industry Standard Status
F1 Score 98.27% 85-92% Above Average
Spam Detection Rate 99.98% 90-95% Above Average
False Positive Rate 1.33% 15-25% 94% Better
Precision 96.62% 88-93% Above Average
Emails Processed (30 days) 52,264 N/A Production Scale
Daily Volume ~1,686 emails/day N/A Peak: 2,106 emails/day

Understanding F1 Score: 98.27%

The F1 Score combines precision and recall into one metric (their harmonic mean), so it is high only when a filter both catches spam and avoids flagging legitimate mail.

What This Means In Practice:
  • Out of 100 spam emails: OpenEFA catches ~100
  • Out of 100 emails flagged: 97 are actually spam
  • Balance: Strong precision with high detection rate
Industry Context

Email security vendors generally do not publish F1 scores directly. The closest public benchmark is Virus Bulletin's quarterly VBSpam test, which measures spam catch rate and false-positive rate under controlled conditions. In the Q2 2025 test, certified products posted composite final scores from 99.329 to 99.995 (Zoho Mail: 99.329, VBSpam; Mimecast: 99.709, VBSpam+; Bitdefender: 99.995, VBSpam+). Barracuda and Proofpoint do not currently participate.

OpenEFA (May 2026): 98.27% F1 on production traffic — see the methodology notes below for the apples-to-apples caveat.

F1 Score Breakdown

Overall F1 Score: 98.27%
Precision: 96.62%
Recall: 99.98%

Email Processing Breakdown (30 Days)

Disposition Count Percentage Description
Delivered (Safe) 37,346 71.5% Clean emails delivered safely to recipient inboxes
Quarantined (Review) 2,013 3.9% Suspicious emails held for user review and release
Auto-Deleted (Spam) 11,792 22.6% High-confidence spam automatically removed
Released 503 1.0% User-released from quarantine
Total Analyzed 52,264 100% All emails processed by OpenEFA
Protected Infrastructure
Protected Email Domains 31
Protected Recipients 89
Active Users 27
Blocking Rules 11,534
Unique Sender Domains Analyzed 6,475
Average Spam Scores by Disposition
Delivered Emails 0.43 Low risk
Quarantined Emails 59.80 High-risk spam
Auto-Deleted 70.80 Very high-risk spam
Released -6.04 False positives (trusted)
Overall Average 19.35 System baseline
Key Insight: The 59.37-point difference between delivered and quarantined emails demonstrates excellent separation between legitimate and malicious content.

Confusion Matrix (30-Day Period)

Predicted Spam Predicted Clean
Actual Spam 14,389 (True Positive) 3 (False Negative)
Actual Clean 503 (False Positive) 37,346 (True Negative)
What These Numbers Mean:
  • True Positives (14,389): Spam correctly identified and blocked
  • True Negatives (37,346): Clean emails correctly delivered
  • False Positives (503): Clean emails quarantined (recoverable)
  • False Negatives (3): Spam that slipped through
Derived Metrics:
  • Accuracy: (14,389 + 37,346) / 52,241 = 99.03%
  • Precision: 14,389 / (14,389 + 503) = 96.62%
  • Recall: 14,389 / (14,389 + 3) = 99.98%
  • Specificity: 37,346 / (37,346 + 503) = 98.67%
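
The derived metrics above can be reproduced directly from the four confusion-matrix cells. The following is a minimal sketch in Python; the variable names are illustrative and are not taken from the OpenEFA codebase. Note that the four cells sum to 52,241, the denominator used in the accuracy line.

```python
# Confusion-matrix cells as reported for the 30-day period.
TP, FN, FP, TN = 14_389, 3, 503, 37_346

total = TP + FN + FP + TN                  # 52,241 classified emails
accuracy = (TP + TN) / total
precision = TP / (TP + FP)                 # share of flagged mail that is spam
recall = TP / (TP + FN)                    # spam detection (catch) rate
specificity = TN / (TN + FP)
false_positive_rate = FP / (FP + TN)       # equals 1 - specificity
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

metrics = {
    "Accuracy": accuracy,
    "Precision": precision,
    "Recall": recall,
    "Specificity": specificity,
    "False positive rate": false_positive_rate,
    "F1 score": f1,
}
for name, value in metrics.items():
    print(f"{name:>20}: {value:.2%}")
```

Run as written, this yields the 98.27% F1 Score and the 1.33% false positive rate quoted elsewhere on this page.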

Spam Score Distribution (30 Days)

OpenEFA uses a graduated spam scoring system where each email receives a cumulative score based on multiple risk factors. Understanding score distribution helps evaluate system effectiveness and threshold tuning.

Score Range Risk Level Count Percentage Typical Action
0 - 5.9 Safe 36,083 69.0% ✅ Delivered
6.0 - 9.9 Suspicious 914 1.8% ⚠️ Quarantined
10.0 - 14.9 High Risk 758 1.5% 🛑 Quarantined
15.0+ Very High Risk 14,510 27.8% ❌ Auto-Deleted
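
As an illustration of how the score ranges in this table map to a risk level and typical action, here is a minimal Python sketch. The function name `risk_band` is hypothetical, and treating each range as a half-open interval (for example, 6.0 up to but not including 10.0) is an assumption; the table only lists scores to one decimal place.

```python
def risk_band(score: float) -> tuple[str, str]:
    """Map a cumulative spam score to (risk level, typical action).

    Band edges follow the score distribution table; half-open
    intervals are an assumption made for this sketch.
    """
    if score < 6.0:
        return ("Safe", "Delivered")
    if score < 10.0:
        return ("Suspicious", "Quarantined")
    if score < 15.0:
        return ("High Risk", "Quarantined")
    return ("Very High Risk", "Auto-Deleted")


for sample in (-6.04, 0.43, 7.5, 12.0, 70.80):
    level, action = risk_band(sample)
    print(f"{sample:>7.2f} -> {level} ({action})")
```

Negative scores, such as the -6.04 average reported for released emails, fall into the Safe band under this mapping.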
Intelligent Thresholds

OpenEFA uses adaptive, multi-factor thresholds to determine email disposition. Emails are classified as delivered, quarantined, or auto-deleted based on cumulative scoring across all analysis modules.

Clean Email (Safe): 69.0%
Suspicious (Quarantine): 3.2%
High-Risk Spam (Deleted): 27.8%

Top Blocked Threat Types

Threat Type Count Description
First-Contact Risk (New Sender) 14,389 Sender and/or domain never seen before in system history
BEC (Business Email Compromise) 13,992 Payment/wire fraud, executive impersonation — 1,023 CRITICAL, 915 HIGH, 2,458 MED, 6,683 LOW
Adversarial Patterns 9,788 Obfuscation, evasion tactics, and adversarial content signals
Phishing Attempts 8,111 Credential harvesting, fake login pages, impersonation
Brand & Display-Name Impersonation 7,653 Spoofed brand names, lookalike domains, executive display-name tricks
SPF / DKIM / DMARC Failures 6,812 Authentication failures across one or more protocols
Marketing / Cold Commercial Patterns 5,718 Unsolicited bulk/marketing content flagged by content classifier
Suspicious Payment Signals 2,926 Invoice, wire transfer, and payment-redirect fraud indicators
EFA Collective RBL Matches 1,742 Crowd-sourced blocklist hits from the OpenEFA Collective
Virus / Malware Detected 252 Known-bad attachments and embedded malware signatures

Machine Learning Performance

OpenEFA's ML ensemble model uses multiple classifiers trained on production email data to provide adaptive spam detection.

Ensemble Model Metrics
Training Samples 23,348
Training Balance 11,674 spam / 11,674 ham
ML Accuracy 89.2%
ML F1 Score 89.5%
ML ROC AUC 96.0%
Features 130
Last Retrain May 8, 2026
Base Model Performance (ROC AUC)
LightGBM 96.0%
XGBoost 95.9%
CatBoost 95.5%
Random Forest 95.0%
Logistic Regression 93.0%
Ensemble Strategy: Multiple models are combined using stacking, which aims to match or exceed the best individual model. On ROC AUC, the ensemble (96.0%) is level with the strongest base model, LightGBM (96.0%).

System Performance

Avg Processing Time: <2s
System Uptime: 99.9%
Memory Footprint: ~2.5GB
Daily Capacity: 5,000+

Volume Statistics (30 Days)
  • Daily Average: 1,686 emails/day
  • Peak Day: 2,106 emails
  • Minimum Day: 674 emails
  • Total Processed: 52,264 emails
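
The daily average depends on how the days in the window are counted. The short Python sketch below assumes the average is simply the total divided by a day count; under that assumption, the stated 1,686 emails/day corresponds to counting both April 13 and May 13 (31 calendar days).

```python
from datetime import date

start, end = date(2026, 4, 13), date(2026, 5, 13)
total_emails = 52_264

elapsed_days = (end - start).days      # 30: end date not counted
calendar_days = elapsed_days + 1       # 31: both endpoints counted

print(round(total_emails / elapsed_days))    # average over 30 days
print(round(total_emails / calendar_days))   # average over 31 days
```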

How OpenEFA Spam Scoring Works

OpenEFA uses a multi-module scoring system where each analysis component contributes to the final spam score. This layered approach provides comprehensive threat detection while minimizing false positives.

1. Email Authentication Module

Validates sender authenticity using industry-standard protocols:

  • SPF: Verifies sending server is authorized
  • DKIM: Cryptographic signature validation
  • DMARC: Policy enforcement
Scoring:
  • ✅ All pass: Score reduced (trusted)
  • ⚠️ Partial: Neutral
  • ❌ Failed: Score increased (high risk)
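
The three scoring outcomes above can be sketched in Python as follows. This is a simplified illustration, not OpenEFA's implementation: it reads the verdicts from an `Authentication-Results` header that the receiving mail server is assumed to have added, the point values are hypothetical (the real weights are not published here), and "partial" is interpreted as at least one but not all checks passing.

```python
import re

# Hypothetical point values; the actual weights are not published.
ALL_PASS_DELTA = -2.0
PARTIAL_DELTA = 0.0
FAILED_DELTA = 5.0


def auth_results(header: str) -> dict[str, str]:
    """Extract spf/dkim/dmarc verdicts from an Authentication-Results value."""
    found = re.findall(r"\b(spf|dkim|dmarc)=(\w+)", header, flags=re.IGNORECASE)
    return {method.lower(): verdict.lower() for method, verdict in found}


def auth_score_delta(header: str) -> float:
    """Return the score adjustment for the authentication module."""
    verdicts = auth_results(header)
    passed = [verdicts.get(m) == "pass" for m in ("spf", "dkim", "dmarc")]
    if all(passed):
        return ALL_PASS_DELTA      # all pass: score reduced (trusted)
    if any(passed):
        return PARTIAL_DELTA       # partial: neutral
    return FAILED_DELTA            # nothing passed: score increased


header = ("mx.example.org; spf=pass smtp.mailfrom=example.com; "
          "dkim=pass header.d=example.com; dmarc=fail header.from=example.com")
print(auth_results(header), auth_score_delta(header))
```

A missing result is treated the same as a failure in this sketch, which is a deliberately conservative choice.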
2. DNS Analysis Module

Advanced DNS validation and domain reputation:

  • RBL Checks: Multiple blocklist sources
  • Domain Spoofing: Multi-domain validation
  • PTR Records: Reverse DNS verification
  • Domain Age: New domain flagging
Scoring:
  • ✅ Clean reputation: No impact
  • ⚠️ Minor issues: Low increase
  • 🛑 RBL listed: Moderate increase
  • ❌ Spoofing detected: Significant increase
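
For readers unfamiliar with how RBL and PTR checks are expressed in DNS, the Python sketch below builds the names that would be queried for an IPv4 address. By DNSBL convention, the address octets are reversed and prepended to the blocklist zone; an answer (typically in 127.0.0.0/8) means "listed" and NXDOMAIN means "not listed". The zone `rbl.example.org` is a placeholder, not one of OpenEFA's sources, and no network lookup is performed.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name looked up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)   # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def ptr_query_name(ip: str) -> str:
    """Build the reverse-DNS (PTR) name for an IP address."""
    return ipaddress.ip_address(ip).reverse_pointer


print(dnsbl_query_name("192.0.2.1", "rbl.example.org"))
print(ptr_query_name("192.0.2.1"))
```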
3. Phishing Detection Module

AI-powered analysis of phishing indicators:

  • Suspicious URL patterns (shortened, obfuscated)
  • Brand impersonation detection
  • Urgency language analysis
  • Credential harvesting indicators
  • Look-alike domain detection
Scoring:
  • ✅ No indicators: No impact
  • ⚠️ Low confidence: Low increase
  • 🛑 Medium confidence: Moderate increase
  • ❌ High confidence: Significant increase
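
One common way to approach look-alike domain detection is edit distance against a list of protected domains. The Python sketch below uses Levenshtein distance as a stand-in technique; it is not a description of OpenEFA's detector, and the domain list and the `max_distance` threshold are hypothetical.

```python
from typing import Optional

# Hypothetical list of domains to protect against impersonation.
PROTECTED_DOMAINS = ["examplebank.com", "contoso.com"]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def lookalike_of(domain: str, max_distance: int = 2) -> Optional[str]:
    """Return the protected domain this one imitates, or None."""
    domain = domain.lower()
    for protected in PROTECTED_DOMAINS:
        distance = edit_distance(domain, protected)
        if 0 < distance <= max_distance:   # an exact match is the real domain
            return protected
    return None


print(lookalike_of("examp1ebank.com"))   # digit 1 substituted for letter l
print(lookalike_of("examplebank.com"))   # the genuine domain
```

Plain edit distance misses homoglyphs from other scripts, so a production detector would likely normalize such characters first.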
4. Business Email Compromise (BEC)

Detects executive impersonation and wire fraud:

  • Display name spoofing detection
  • Payment request indicators
  • Urgency/secrecy language analysis
  • Executive title spoofing
Scoring:
  • ✅ No BEC indicators: No impact
  • ⚠️ Low confidence: Low increase
  • 🛑 Medium confidence: Moderate increase
  • ❌ High confidence: Significant increase
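
Display name spoofing checks can be sketched as a comparison between the name shown in the From header and the domains that person legitimately sends from. The Python example below is a minimal illustration under assumed inputs: the `EXECUTIVES` directory and its contents are hypothetical, and a real deployment would also need fuzzy name matching.

```python
from email.utils import parseaddr

# Hypothetical directory: executive display name -> legitimate sender domains.
EXECUTIVES = {"jane doe": {"example.com"}}


def display_name_spoof(from_header: str) -> bool:
    """True when a known executive's name appears with an unexpected domain."""
    name, address = parseaddr(from_header)
    normalized = " ".join(name.lower().split())
    domain = address.rpartition("@")[2].lower()
    allowed = EXECUTIVES.get(normalized)
    return allowed is not None and domain not in allowed


print(display_name_spoof('"Jane Doe" <ceo@mail.example.net>'))
print(display_name_spoof("Jane Doe <jane.doe@example.com>"))
```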
5. Behavioral Analysis Module

Analyzes sender behavior patterns and anomalies:

  • First contact detection
  • Sender reputation analysis
  • Graph-based relationship analysis
Scoring:
  • ✅ Normal behavior: No impact
  • ⚠️ Minor anomalies: Low increase
  • 🛑 Significant anomalies: Moderate increase
  • ❌ Severe anomalies: High increase
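
First contact detection amounts to checking whether a sender address or its domain has been seen before. The Python sketch below keeps that history in memory purely for illustration; the class and method names are hypothetical, and a production system would persist this history in a database.

```python
class ContactHistory:
    """In-memory record of sender addresses and domains already seen."""

    def __init__(self) -> None:
        self._addresses = set()
        self._domains = set()

    def classify(self, sender: str) -> str:
        """Return 'known', 'new_sender', or 'new_domain'."""
        address = sender.strip().lower()
        domain = address.rpartition("@")[2]
        if address in self._addresses:
            return "known"
        if domain in self._domains:
            return "new_sender"    # domain seen before, address is new
        return "new_domain"        # neither address nor domain seen before

    def record(self, sender: str) -> None:
        """Remember a sender after its message has been processed."""
        address = sender.strip().lower()
        self._addresses.add(address)
        self._domains.add(address.rpartition("@")[2])


history = ContactHistory()
print(history.classify("alice@example.com"))
history.record("alice@example.com")
print(history.classify("bob@example.com"))
```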
6. ML Ensemble Module

Adaptive learning from user feedback:

  • Multi-model ensemble voting
  • Confidence-weighted adjustments
  • Learns from released emails (false positives)
  • Learns from deleted spam (true positives)
Scoring:
  • ✅ Ham prediction: Score reduced
  • ⚠️ Uncertain: No impact
  • ❌ Spam prediction: Score increased
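
Taken together, the six modules each contribute an adjustment to one cumulative score. The Python sketch below shows that accumulation with a per-module breakdown. All point values and reasons are invented for illustration, and simple summation is an assumption; the source describes the score only as cumulative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleResult:
    module: str
    delta: float   # positive raises the spam score, negative lowers it
    reason: str


def cumulative_score(results) -> float:
    """Sum the per-module adjustments into one spam score."""
    return sum(r.delta for r in results)


# Hypothetical results for a single message.
example = [
    ModuleResult("authentication", -2.0, "SPF, DKIM, and DMARC all pass"),
    ModuleResult("dns", 0.0, "clean reputation"),
    ModuleResult("phishing", 8.0, "look-alike domain in link"),
    ModuleResult("bec", 6.5, "payment request with urgency language"),
    ModuleResult("behavioral", 1.5, "first contact"),
    ModuleResult("ml_ensemble", 3.0, "spam prediction"),
]

score = cumulative_score(example)
for r in example:
    print(f"{r.module:<15}{r.delta:+6.1f}  {r.reason}")
print(f"{'total':<15}{score:+6.1f}")
```

Keeping the per-module reasons alongside the total is what makes a score explainable to the person reviewing a quarantined message.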

Independent Email Security Testing: VBSpam Q2 2025

VBSpam is a quarterly independent benchmark run by Virus Bulletin. Products are tested against live spam feeds (Project Honey Pot, Abusix, MX Mail Data) and a curated newsletter ham corpus. Products that score 99.5%+ catch rate with zero false positives on the ham corpus earn the VBSpam+ certification.

Rank Product Final Score FP Rate (Ham Corpus) Certification
1 Bitdefender GravityZone Premium 99.995 0% VBSpam+
2 SEPPmail.cloud filter 99.989 0% VBSpam+
3 Sophos Email 99.988 0% VBSpam+
4 FortiMail 99.964 0% VBSpam+
5 Net at Work NoSpamProxy 99.962 0% VBSpam+
6 N-able Mail Assure 99.948 0% VBSpam+
7 N-able SpamExperts 99.937 0% VBSpam+
8 Mimecast 99.709 0% VBSpam+
9 Zoho Mail 99.329 0% VBSpam
Notable absences

Barracuda and Proofpoint do not participate in VBSpam testing, so their true catch rate and false-positive performance cannot be independently verified. Their public claims (e.g. "99.9% spam capture") are self-reported and do not disclose false-positive rates in comparable terms.

Where OpenEFA stands

OpenEFA does not currently submit to VBSpam testing. The figures above are reproduced from the published Q2 2025 VBSpam comparative review.

For reference, applying VBSpam's catch-rate methodology to our 30-day production data would yield a spam catch rate of ~99.98%. Our reported false-positive rate of 1.33% is measured against real customer traffic — which includes cold B2B sales, opt-in marketing, and mailing-list content — and is therefore stricter than VBSpam's controlled newsletter ham corpus, where most products score 0%. Direct numerical comparison is not apples-to-apples.

OpenEFA vs Commercial Email Security

Where detection metrics are not independently comparable across vendors, cost, deployment architecture, and data sovereignty are.

Attribute OpenEFA Barracuda Mimecast Proofpoint
Cost (50 users/year) $199-799 ~$3,000 ~$4,800 ~$7,200
Self-Hosted / Data Sovereignty ✅ Yes ❌ Cloud only ❌ Cloud only ❌ Cloud only
Source-Available ✅ Yes ❌ No ❌ No ❌ No
Submits to Independent Testing (VBSpam) Not yet ❌ No ✅ Yes (Q2 2025: 99.709) ❌ No
Adaptive Learning from Customer Feedback ✅ Per-deployment ⚠️ Shared cloud model ⚠️ Shared cloud model ⚠️ Shared cloud model
Key Advantages
  • ✅ Above-average accuracy (98.27% F1 Score)
  • ✅ Strong precision (96.62%)
  • ✅ Low false positive rate (1.33%)
  • ✅ 60-80% cost savings vs. commercial
  • ✅ Full transparency (detailed scoring)
  • ✅ Data sovereignty (self-hosted)
  • ✅ No vendor lock-in
  • ✅ Continuous learning system

Data Quality & Methodology

Measurement Period
  • Start Date: April 13, 2026
  • End Date: May 13, 2026
  • Duration: 30 days
  • Total Emails: 52,264
  • Environment: Production deployment (31 domains, 89 recipients)
Classification Methodology
  • Spam Threshold: Score ≥ 18.0
  • Clean Threshold: Score < 6.0
  • Validation: User quarantine actions (releases)
  • Source: Production MySQL database
Why These Numbers Matter

This 30-day period represents OpenEFA's production performance with all detection modules fully operational:
  • Multi-module spam scoring with 20+ detection components
  • AI-powered NLP analysis using spaCy en_core_web_lg
  • Machine learning ensemble with adaptive learning
  • Real-time DNS and authentication validation

Note: These statistics represent real production data from OpenEFA deployments across multiple client domains. All metrics are verifiable and reproducible from the source database.
