starred/spamscanner

mirror of https://github.com/spamscanner/spamscanner.git synced 2026-04-27 12:45:50 +03:00

[GH-ISSUE #4] How to train Naive Bayes Classifier? #5

Closed

opened 2026-03-04 00:58:17 +03:00 by kerem · 6 comments

kerem commented

2026-03-04 00:58:17 +03:00

Owner

Originally created by @JQuags on GitHub (Aug 22, 2020).
Original GitHub issue: https://github.com/spamscanner/spamscanner/issues/4

Is there more information on how to train the classifier?

I see in the source that classifier.json is currently private, which explains the broken links on the site.

The source indicates that removing classifier.json and setting SPAM_CATEGORY and SCAN_DIRECTOR should be all that is needed to train. Is that all, and then you feed it a directory of spam or ham messages in EML or ARF format?
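For readers unfamiliar with what "training" means here, the following is a minimal sketch of how a multinomial Naive Bayes spam classifier is trained from labeled messages and then used to classify new text. It is not spamscanner's API: the `train` and `classify` names, the `Model` shape, the simple tokenizer, and the assumption that message bodies have already been extracted from the EML files are all hypothetical.

```typescript
// Hypothetical sketch of multinomial Naive Bayes training; not spamscanner's API.
type Label = "spam" | "ham";

interface Model {
  docCounts: Record<Label, number>; // messages seen per label
  tokenCounts: Record<Label, Record<string, number>>; // token frequency per label
  tokenTotals: Record<Label, number>; // total tokens counted per label
  vocabulary: string[]; // distinct tokens across both labels
}

// Simple tokenizer (assumption): lowercase, split on non-alphanumeric characters.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Count documents and tokens per label from already-extracted message bodies.
function train(examples: Array<{ text: string; label: Label }>): Model {
  const model: Model = {
    docCounts: { spam: 0, ham: 0 },
    tokenCounts: { spam: {}, ham: {} },
    tokenTotals: { spam: 0, ham: 0 },
    vocabulary: [],
  };
  const vocab = new Set<string>();
  for (const { text, label } of examples) {
    model.docCounts[label] += 1;
    for (const token of tokenize(text)) {
      const counts = model.tokenCounts[label];
      counts[token] = (counts[token] ?? 0) + 1;
      model.tokenTotals[label] += 1;
      vocab.add(token);
    }
  }
  model.vocabulary = Array.from(vocab);
  return model;
}

// Pick the label with the higher log-posterior, using Laplace (add-one) smoothing.
function classify(model: Model, text: string): Label {
  const totalDocs = model.docCounts.spam + model.docCounts.ham;
  const v = model.vocabulary.length;
  let best: Label = "ham";
  let bestScore = -Infinity;
  for (const label of ["spam", "ham"] as Label[]) {
    let score = Math.log((model.docCounts[label] + 1) / (totalDocs + 2));
    for (const token of tokenize(text)) {
      const count = model.tokenCounts[label][token] ?? 0;
      score += Math.log((count + 1) / (model.tokenTotals[label] + v));
    }
    if (score > bestScore) {
      bestScore = score;
      best = label;
    }
  }
  return best;
}

const model = train([
  { text: "Win a free prize now", label: "spam" },
  { text: "Claim your free money", label: "spam" },
  { text: "Meeting notes for Monday", label: "ham" },
  { text: "Lunch on Monday?", label: "ham" },
]);
console.log(classify(model, "free prize money")); // spam
console.log(classify(model, "notes for the meeting")); // ham
```

Working in log space avoids floating-point underflow when a message contains many tokens; the add-one smoothing keeps a token that was never seen under one label from zeroing out that label's score.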

kerem closed this issue

2026-03-04 00:58:18 +03:00

kerem commented

2026-03-04 00:58:19 +03:00

Author

Owner

@wis commented on GitHub (Sep 13, 2020):

I thought you provided a well-trained classifier.json, but the link in the README 404s. Why was it removed? @niftylettuce

kerem commented

2026-03-04 00:58:19 +03:00

@JQuags commented on GitHub (Sep 14, 2020):

The comments in the source say "(spam dataset is private at the moment)".

I suspect it has never been provided, and there may be a privacy reason.

kerem commented

2026-03-04 00:58:19 +03:00

@niftylettuce commented on GitHub (Sep 14, 2020):

I should have this published in the near future; for now I've had to focus on something else. It is no longer a privacy concern, though, as I have SHA-256 hashed all the tokens.
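The token-hashing idea mentioned above can be sketched like this: each token is replaced by its SHA-256 hex digest before it is counted, so a published model stores digests rather than the words themselves. This is an illustrative sketch assuming Node's built-in `crypto` module; `hashToken`, `hashedTokenCounts`, and the whitespace tokenizer are hypothetical, and spamscanner's actual tokenization and hashing details may differ.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: map a token to its SHA-256 hex digest.
function hashToken(token: string): string {
  return createHash("sha256").update(token, "utf8").digest("hex");
}

// Count hashed tokens for one message (lowercase + whitespace split is an assumption).
function hashedTokenCounts(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const token of text.toLowerCase().split(/\s+/).filter(Boolean)) {
    const key = hashToken(token);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

const counts = hashedTokenCounts("free money free");
// At classification time the same hash is applied before lookup, so counts still match.
console.log(counts.get(hashToken("free"))); // 2
```

Because hashing is deterministic, the classifier never needs the plaintext vocabulary: it only has to hash incoming tokens the same way before looking them up.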

kerem commented

2026-03-04 00:58:19 +03:00

@wis commented on GitHub (Sep 16, 2020):

Good! Can we contribute to the training data by forwarding spam emails from our inbox to an email address you set up?

kerem commented

2026-03-04 00:58:19 +03:00

@niftylettuce commented on GitHub (Sep 16, 2020):

abuse@forwardemail.net works

kerem commented

2026-03-04 00:58:19 +03:00

@titanism commented on GitHub (Dec 22, 2025):

See https://github.com/spamscanner/spamscanner?tab=readme-ov-file#custom-classifier

Basically, you'd just write the JSON of the classifier you train to classifier.json and then load it.

You can also make it apply SHA-256 hashing to the tokens (customizable).
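A rough sketch of that save-then-load workflow follows, with the hashing choice stored alongside the counts so that lookups after loading hash tokens the same way they were stored. Everything here (the `SavedModel` shape, `buildModel`, `saveModel`, `loadModel`, `countFor`, and writing to a temporary directory) is a hypothetical illustration, not spamscanner's file format or API; for the real options, follow the README section linked above.

```typescript
import { createHash } from "node:crypto";
import { readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical serialized model shape; the real classifier.json format may differ.
interface SavedModel {
  hashTokens: boolean; // whether token keys are SHA-256 digests
  counts: Record<string, Record<string, number>>; // label -> token key -> count
}

// Pluggable key function: identity or SHA-256, chosen when the model is built.
function tokenKey(token: string, hashTokens: boolean): string {
  return hashTokens ? createHash("sha256").update(token).digest("hex") : token;
}

// Count tokens per label from [text, label] pairs.
function buildModel(examples: Array<[string, string]>, hashTokens: boolean): SavedModel {
  const model: SavedModel = { hashTokens, counts: {} };
  for (const [text, label] of examples) {
    if (!model.counts[label]) model.counts[label] = {};
    const perLabel = model.counts[label];
    for (const token of text.toLowerCase().split(/\s+/).filter(Boolean)) {
      const key = tokenKey(token, hashTokens);
      perLabel[key] = (perLabel[key] ?? 0) + 1;
    }
  }
  return model;
}

function saveModel(model: SavedModel, path: string): void {
  writeFileSync(path, JSON.stringify(model), "utf8");
}

function loadModel(path: string): SavedModel {
  return JSON.parse(readFileSync(path, "utf8")) as SavedModel;
}

// Look up a raw token in a loaded model, hashing it the same way it was stored.
function countFor(model: SavedModel, label: string, token: string): number {
  return model.counts[label]?.[tokenKey(token.toLowerCase(), model.hashTokens)] ?? 0;
}

const path = join(tmpdir(), "classifier.json");
saveModel(buildModel([["free money now", "spam"], ["see you monday", "ham"]], true), path);
const loaded = loadModel(path);
console.log(countFor(loaded, "spam", "free")); // 1
```

Recording `hashTokens` in the file itself means a loader cannot accidentally compare plaintext tokens against hashed keys.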

---

v6 is released. We will update classifier.json (there's one published now with SHA-256 hashing) after the @fwdemail integration (we're on the older v5). The current classifier.json is not that accurate, but we will improve it after the integration; since we process millions of emails daily, it'll be very accurate soon enough.

https://github.com/spamscanner/spamscanner

https://github.com/spamscanner/spamscanner/releases

X post/announcement: https://x.com/fwdemail/status/2002872581402063281

We also support TypeScript in the project now (thanks to AI; we despise TS internally, though).

kerem referenced this issue

2026-03-04 00:58:27 +03:00

[PR #5] [MERGED] chore(deps): bump node-fetch from 2.6.0 to 2.6.1 #17