Privacy-Preserving Adaptive AI for Intelligent Spam Detection
The system learns from low-spam-score emails (spam_score < 2.5) that are considered legitimate. It uses this knowledge to:
Example: If your clients often use "deductible", "premium", or "coverage", these are learned as legitimate terms.
Tracks sender domain ↔ recipient domain communication frequency
Stores:
Example: Frequent low-spam exchanges between insurance.example and your domain improve relationship confidence.
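The relationship tracking described above could be sketched as follows. This is a minimal in-memory illustration, not the OpenEFA implementation: the names `RelationshipStats` and `RelationshipTracker` are hypothetical, and so is the linear confidence formula, which rises until the number of exchanges reaches `relationship_confidence_threshold` (default 5).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RelationshipStats:
    """Aggregate metrics for one sender-domain/recipient-domain pair."""
    message_count: int = 0
    avg_spam_score: float = 0.0


class RelationshipTracker:
    """Hypothetical in-memory tracker; a real deployment would persist this."""

    def __init__(self, confidence_threshold: int = 5):
        self.confidence_threshold = confidence_threshold
        self._pairs = defaultdict(RelationshipStats)

    def record(self, sender_domain: str, recipient_domain: str, spam_score: float) -> None:
        """Count one exchange and fold its spam score into the running mean."""
        stats = self._pairs[(sender_domain.lower(), recipient_domain.lower())]
        stats.message_count += 1
        # Incremental mean: no individual scores need to be stored.
        stats.avg_spam_score += (spam_score - stats.avg_spam_score) / stats.message_count

    def confidence(self, sender_domain: str, recipient_domain: str) -> float:
        """Return a 0-1 confidence that grows with the number of exchanges."""
        stats = self._pairs.get((sender_domain.lower(), recipient_domain.lower()))
        if stats is None:
            return 0.0
        return min(1.0, stats.message_count / self.confidence_threshold)
```

Only domain pairs, counts, and averages are held here, which matches the aggregate-only storage described elsewhere in this document.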
Recognizes predefined business phrases such as:
Tracks frequency and average spam score per phrase.
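One way to picture the per-phrase bookkeeping is the sketch below. The function `update_phrase_stats` and the `PhraseStats` record are illustrative names, and the phrases in the usage example are placeholders, not the system's actual predefined list.

```python
from dataclasses import dataclass


@dataclass
class PhraseStats:
    count: int = 0
    avg_spam_score: float = 0.0


def update_phrase_stats(text, spam_score, phrases, stats):
    """Update per-phrase frequency and running average spam score.

    Matching is a case-insensitive substring test, and each phrase is
    counted at most once per message (an assumption of this sketch).
    """
    lowered = text.lower()
    for phrase in phrases:
        if phrase.lower() in lowered:
            entry = stats.setdefault(phrase, PhraseStats())
            entry.count += 1
            entry.avg_spam_score += (spam_score - entry.avg_spam_score) / entry.count
    return stats


# Placeholder phrases for illustration only.
phrases = ["please find attached", "per our discussion"]
stats = {}
update_phrase_stats("Per our discussion, please find attached the updated policy.", 0.5, phrases, stats)
update_phrase_stats("Please find attached the invoice.", 1.5, phrases, stats)
```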
Maintains per-domain communication metrics:
The system computes a legitimacy score (0–1) based on several weighted components:
These factors are combined to adjust the spam score. Weights and formulas are configurable and can be tuned for specific environments.
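As a rough illustration of how weighted components might be combined and turned into a bounded adjustment, consider the sketch below. The component names, the weights, and the mapping from legitimacy to adjustment are all assumptions made for this example; the actual formulas are configurable and may differ.

```python
from typing import Mapping


def legitimacy_score(components: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted average of component scores, each expected in [0, 1]."""
    total_weight = sum(weights.get(name, 0.0) for name in components)
    if total_weight <= 0:
        return 0.5  # no usable evidence: treat as neutral
    weighted = sum(score * weights.get(name, 0.0) for name, score in components.items())
    return max(0.0, min(1.0, weighted / total_weight))


def adjust_spam_score(spam_score: float, legitimacy: float, max_adjustment: float = 2.0) -> float:
    """Shift the spam score by at most +/- max_adjustment.

    Assumed mapping: legitimacy 0.5 is neutral, 1.0 lowers the score by
    max_adjustment, and 0.0 raises it by max_adjustment.
    """
    adjustment = (0.5 - legitimacy) * 2.0 * max_adjustment
    adjustment = max(-max_adjustment, min(max_adjustment, adjustment))
    return spam_score + adjustment


# Hypothetical component names and weights.
weights = {"vocabulary": 0.4, "relationship": 0.4, "phrases": 0.2}
components = {"vocabulary": 1.0, "relationship": 0.5, "phrases": 0.0}
score = legitimacy_score(components, weights)
adjusted = adjust_spam_score(3.0, score)
```

Clamping the adjustment keeps a single strong signal from overriding the base spam score entirely, which is the role `max_adjustment` plays in the configuration.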
Configuration settings are stored in a database table or configuration file. Example keys:
| Config Key | Default | Description |
|---|---|---|
| max_adjustment | 2.0 | Maximum spam score adjustment (positive or negative) |
| learning_enabled | true | Enables/disables the learning module |
| min_messages_for_learning | 10 | Minimum number of messages before applying adjustments |
| vocab_learning_threshold | 3 | Minimum word frequency before inclusion |
| relationship_confidence_threshold | 5 | Number of exchanges needed for high confidence |
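A minimal sketch of loading these settings, assuming the stored values arrive as strings (as they would from a database row or a configuration file). The helper names `load_config` and `_coerce` are hypothetical; only the keys and defaults come from the table above.

```python
DEFAULTS = {
    "max_adjustment": 2.0,
    "learning_enabled": True,
    "min_messages_for_learning": 10,
    "vocab_learning_threshold": 3,
    "relationship_confidence_threshold": 5,
}


def _coerce(raw: str, default):
    """Convert a stored string to the type of the documented default."""
    if isinstance(default, bool):  # check bool first: bool is a subclass of int
        return raw.strip().lower() in {"1", "true", "yes", "on"}
    return type(default)(raw)


def load_config(overrides: dict) -> dict:
    """Overlay string overrides on the defaults, ignoring unknown keys."""
    config = dict(DEFAULTS)
    for key, raw in overrides.items():
        if key in DEFAULTS:
            config[key] = _coerce(str(raw), DEFAULTS[key])
    return config


cfg = load_config({"max_adjustment": "1.5", "learning_enabled": "false", "unknown": "x"})
```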
The system follows strict privacy protection guidelines:
Each word is hashed using SHA-256 with an environment-defined private salt:

    hash = sha256(f"{env_salt}{word.lower()}".encode("utf-8")).hexdigest()[:16]

The actual words are never stored.
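To show how hashed vocabulary learning could fit together, here is a small self-contained sketch. The function names, the `\w+` tokenizer, and the in-memory `Counter` are assumptions for illustration; the threshold of 3 mirrors the documented `vocab_learning_threshold` default.

```python
import hashlib
import re
from collections import Counter


def hash_word(word: str, salt: str) -> str:
    """Return the first 16 hex characters of the salted SHA-256 digest."""
    data = f"{salt}{word.lower()}".encode("utf-8")  # hashlib requires bytes
    return hashlib.sha256(data).hexdigest()[:16]


def learn_vocabulary(text: str, salt: str, counts: Counter) -> None:
    """Add hashed word counts from one message; plain words are not kept."""
    for word in re.findall(r"\w+", text.lower()):
        counts[hash_word(word, salt)] += 1


def is_known(word: str, salt: str, counts: Counter, threshold: int = 3) -> bool:
    """A word counts as learned once its hash has been seen `threshold` times."""
    return counts[hash_word(word, salt)] >= threshold


counts = Counter()
for _ in range(3):
    learn_vocabulary("Your deductible and premium", "example-salt", counts)
```

Truncating the digest to 16 hex characters keeps 64 bits of the hash, so accidental collisions between different words are unlikely at vocabulary scale, though not impossible.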
Only domain names are stored, not full email addresses.
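Reducing an address to its domain before anything is stored could look like the sketch below, which uses the standard library's `email.utils.parseaddr`. The function name `extract_domain` is illustrative only.

```python
from email.utils import parseaddr
from typing import Optional


def extract_domain(address: str) -> Optional[str]:
    """Return the lowercased domain of an address, or None if there is none."""
    _, addr = parseaddr(address)
    if "@" not in addr:
        return None
    domain = addr.rsplit("@", 1)[1].strip().lower()
    return domain or None


sender_domain = extract_domain("Client <client@Insurance.Example>")
```

The local part (`client`) and the display name are dropped at this point, so they never reach the learning tables.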
Body text and subjects are discarded after analysis.
Only frequency counts, averages, and timestamps are stored.
A web interface at /learning provides insights into the system's progress:
The dashboard only displays data for domains the viewer is authorized to access.
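The per-domain restriction can be thought of as a filter applied before any statistics are rendered. The sketch below assumes statistics keyed by domain name and a set of domains the viewer may access; both structures and the function name `visible_stats` are hypothetical.

```python
def visible_stats(all_stats: dict, authorized_domains: set) -> dict:
    """Return only the entries whose domain the viewer is authorized to see."""
    allowed = {domain.lower() for domain in authorized_domains}
    return {
        domain: stats
        for domain, stats in all_stats.items()
        if domain.lower() in allowed
    }


all_stats = {
    "yourcompany.example": {"messages": 120},
    "other.example": {"messages": 45},
}
shown = visible_stats(all_stats, {"YourCompany.example"})
```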
The system automatically learns during normal operation. You can also manually provide legitimate emails to improve learning:

    python3 scripts/feed_good_emails_to_learning.py \
        --sender "client@example.com" \
        --recipients "team@example.com,support@example.com" \
        --subject "Follow-up on contract" \
        --body "Per our discussion, please find attached the updated policy." \
        --score 0.5

This helps bootstrap or correct model learning for new domains.
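The learning module can be integrated into spam filtering pipelines via a call such as:

    # Assumed import path, based on the module listed under the project files.
    from modules.conversation_learner_mysql import ConversationLearner

    def analyze_with_learning(msg, text_content, spam_score):
        """Score a message's legitimacy and learn from it if it looks legitimate."""
        learner = ConversationLearner()
        legitimacy = learner.calculate_legitimacy_score(msg, text_content)
        # Only low-spam-score messages are used as training data.
        if spam_score < 2.5:
            learner.learn_from_email(msg, text_content, spam_score)
        return legitimacy
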
The returned legitimacy score is then used to adjust the spam score before the final spam-handling decision.
Main logical components:
Database names and schema details can be found in the developer documentation.
Example: client@insurance.example frequently emails yourcompany.example.
All domains shown here are fictional and for demonstration only.
- [project_root]/modules/conversation_learner_mysql.py – MySQL-based learning engine
- [project_root]/scripts/feed_good_emails_to_learning.py – Manual training script
- [project_root]/scripts/get_learning_stats.py – Statistics collector
- [config_dir]/.env – Environment configuration

The OpenEFA Learning System is a self-adapting, privacy-conscious filter that:
The more high-quality emails it sees, the more accurate its spam and ham classification becomes, without compromising privacy.
Last Updated: October 27, 2025
System Version: OpenEFA Learning Engine v2.0
Author: OpenEFA Team