[GH-ISSUE #10] Not detecting anything #7

Closed
opened 2026-03-04 00:58:19 +03:00 by kerem · 15 comments
Owner

Originally created by @davidpanic on GitHub (Dec 3, 2020).
Original GitHub issue: https://github.com/spamscanner/spamscanner/issues/10

I've fed thousands of emails to scanner.scan and not even one of them was detected as spam. I've looked at some of the eml files, and about 30% of them are definitely the type of spam any spam scanner should be able to detect, so what's the deal? Am I missing a step somewhere? Do I need to provide it with spam to learn from? If so, what function do I call?

kerem closed this issue 2026-03-04 00:58:20 +03:00
Author
Owner

@niftylettuce commented on GitHub (Feb 17, 2021):

I haven't published the `classifier.json` with the package yet... working on it. Will ping back as soon as it's ready. Everything except classification works.

@eldoy commented on GitHub (Jun 13, 2021):

@niftylettuce Any news on this? Or info on how to train my own?

Should I just use something like this?

classifier.learn('good email content', 'ham')
classifier.learn('bad email content', 'spam')

Would be nice if you could at least post an example of what the file is supposed to look like...
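For readers wondering what `classifier.learn(text, 'ham' | 'spam')` style training does under the hood, here is a toy multinomial naive Bayes sketch. It is hypothetical and self-contained: `ToyNaiveBayes`, its `categorize` and `toJson` methods, and the JSON shape it produces are invented for illustration. They are not spamscanner's API, and the output is not the layout of spamscanner's `classifier.json`.

```js
// Hypothetical toy model for illustration only: this is NOT spamscanner's
// API, and toJson() is NOT the format of spamscanner's classifier.json.
class ToyNaiveBayes {
  constructor() {
    this.docCounts = {};    // label -> number of training messages
    this.wordCounts = {};   // label -> { token -> occurrences }
    this.totalWords = {};   // label -> total tokens seen for that label
    this.vocab = new Set(); // every token seen under any label
  }

  static tokenize(text) {
    return text.toLowerCase().match(/[a-z0-9']+/g) || [];
  }

  // Record one labeled example, e.g. learn('...', 'ham') or learn('...', 'spam').
  learn(text, label) {
    this.docCounts[label] = (this.docCounts[label] || 0) + 1;
    this.wordCounts[label] = this.wordCounts[label] || {};
    this.totalWords[label] = this.totalWords[label] || 0;
    for (const token of ToyNaiveBayes.tokenize(text)) {
      this.wordCounts[label][token] = (this.wordCounts[label][token] || 0) + 1;
      this.totalWords[label] += 1;
      this.vocab.add(token);
    }
  }

  // Return the label with the highest log-posterior, or null when untrained.
  categorize(text) {
    const labels = Object.keys(this.docCounts);
    if (labels.length === 0) return null; // empty data set: nothing to compare
    const totalDocs = labels.reduce((sum, l) => sum + this.docCounts[l], 0);
    const tokens = ToyNaiveBayes.tokenize(text);
    let best = null;
    let bestScore = -Infinity;
    for (const label of labels) {
      // Log prior: fraction of training messages carrying this label.
      let score = Math.log(this.docCounts[label] / totalDocs);
      for (const token of tokens) {
        const count = this.wordCounts[label][token] || 0;
        // Laplace (add-one) smoothing so an unseen token never zeroes a label out.
        score += Math.log((count + 1) / (this.totalWords[label] + this.vocab.size));
      }
      if (score > bestScore) {
        bestScore = score;
        best = label;
      }
    }
    return best;
  }

  // Plain-JSON snapshot of the learned counts (illustrative shape only).
  toJson() {
    return JSON.stringify({
      docCounts: this.docCounts,
      wordCounts: this.wordCounts,
      totalWords: this.totalWords,
      vocab: [...this.vocab],
    });
  }
}

const clf = new ToyNaiveBayes();
clf.learn('meeting notes attached for tomorrow', 'ham');
clf.learn('lunch at noon with the team', 'ham');
clf.learn('win free money now click here', 'spam');
clf.learn('cheap pills free offer click now', 'spam');
console.log(clf.categorize('free money offer, click here')); // prints: spam
```

An untrained instance returns `null` from `categorize`, because there are no label counts to compare against. The add-one smoothing is a common default choice: without it, a single word never seen under a label would force that label's probability to zero.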

@niftylettuce commented on GitHub (Jun 13, 2021):

I haven't uploaded the `classifier.json` file yet, but should have it soon; then I'll release a new version with it and ping you back here. The reason it's not classifying ham/spam is that it's an empty data set.

@eldoy commented on GitHub (Jun 13, 2021):

@niftylettuce All right, thank you!

@JaTochNietDan commented on GitHub (Aug 27, 2021):

Any update on the pre-learned Bayesian dataset being published?

@niftylettuce commented on GitHub (Aug 27, 2021):

Yes, I should have it out within a month or so.

@seeden commented on GitHub (Oct 11, 2021):

Hi @niftylettuce. I have started to use your classifier as well, but I ran into the same problem with the missing JSON file. Can you publish your file? I am trying to use it just as a spam detector, without the additional features.

@0x11DFE commented on GitHub (Feb 2, 2022):

> I haven't uploaded the `classifier.json` file but should have it soon and then will release a new version bump to this and then ping you back here. The reason it's not classifying ham/spam is that it's an empty data set.

Any progress on your dataset? I've been working for a few days on an automated system for dealing with the spam in my ProtonMail inbox.
I haven't finished the part that decrypts the PGP messages yet, so for now I'm using a slower alternative: Puppeteer, downloading emails one by one. If you can't release your classifier, could you at least tell me how I could train my own?
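For training on an existing mailbox export, one option is to sort messages into one folder per label and feed each file to a classifier's `learn(text, label)` method. The sketch below assumes a hypothetical `<root>/ham/*.eml` and `<root>/spam/*.eml` layout and any learner object exposing `learn(text, label)`; `trainFromDirectories` and `bodyOf` are illustrative names, not spamscanner APIs. The header/body split is deliberately naive (it relies on the blank line that separates headers from the body) and does not decode MIME parts, base64, or quoted-printable content.

```js
const fs = require('node:fs');
const path = require('node:path');

// Naive split: everything after the first blank line is treated as the body.
// Does NOT decode multipart, base64, or quoted-printable content.
function bodyOf(raw) {
  const match = raw.match(/\r?\n\r?\n/);
  return match ? raw.slice(match.index + match[0].length) : raw;
}

// Hypothetical helper: walks <root>/<label>/*.eml and calls
// learner.learn(body, label) for each file. Returns per-label file counts.
function trainFromDirectories(root, learner, labels = ['ham', 'spam']) {
  const counts = {};
  for (const label of labels) {
    counts[label] = 0;
    const dir = path.join(root, label);
    if (!fs.existsSync(dir)) continue; // missing label folder: skip it
    for (const name of fs.readdirSync(dir)) {
      if (!name.endsWith('.eml')) continue; // ignore non-message files
      const raw = fs.readFileSync(path.join(dir, name), 'utf8');
      learner.learn(bodyOf(raw), label);
      counts[label] += 1;
    }
  }
  return counts;
}
```

A real pipeline would likely run each message through a proper MIME parser before training; otherwise encoded or multipart bodies feed raw encoded text into the model.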

@Pieter0313 commented on GitHub (Apr 17, 2023):

I would also like to get the classifier.json file, if possible. I get the feeling that you're not working on this project anymore, but could you at least tell me how to train it myself?

@suyash-awasthi commented on GitHub (Oct 22, 2024):

Is there any update on this? @niftylettuce

@titanism commented on GitHub (Oct 23, 2024):

We hope to release the data set, an API, and Spam Scanner v7 later this year or early next. We've been hyper-focused on some other pressing challenges at https://forwardemail.net right now.

@tmikaeld commented on GitHub (Jan 6, 2025):

We'd love to be able to use spamscanner as well, it's now 2025.

Is there an ETA when it can become usable?

@titanism commented on GitHub (Jan 6, 2025):

@tmikaeld We've been hard at work on @forwardemail, introducing critical features and infrastructure updates. We still plan to work on spamscanner this year, and a lot of effort has already gone into it (there are a ton of comments/notes/TODOs about spamscanner in our monorepo). There is no ETA at this time, but if you'd like to follow along, contribute, or join our community, we have a Matrix chat channel you can find on our website under the Community dropdown menu.

@tmikaeld commented on GitHub (Jan 7, 2025):

@titanism Thanks for the response! I'm impressed with what you're doing with both spamscanner and forwardemail; both are unique and really hard to pull off. I'll see if I can contribute and dig into spamscanner. What you're doing with it has been sorely needed for a very, very long time.

@titanism commented on GitHub (Dec 22, 2025):

v6 is released. We will update classifier.json after the @fwdemail integration (we're on the older v5); there's one published now, with a sha256 hash. The current classifier.json is not that accurate, but we will improve it after integration (since we process millions of emails daily, it'll be very accurate soon enough).
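Since a sha256 is published alongside the current `classifier.json`, a downloaded copy can be checked against it before use. Below is a minimal sketch using Node's built-in `crypto` module; `verifyClassifier` is a hypothetical helper (not part of spamscanner), and the file path and the place you read the expected hash from are assumptions.

```js
const crypto = require('node:crypto');
const fs = require('node:fs');

// Hex-encoded SHA-256 digest of a Buffer or string.
function sha256Hex(data) {
  return crypto.createHash('sha256').update(data).digest('hex');
}

// Hypothetical helper: compares a local file's SHA-256 with a published hex
// digest (case-insensitive). Path and expected-hash source are assumptions.
function verifyClassifier(filePath, expectedHex) {
  const actual = sha256Hex(fs.readFileSync(filePath));
  return actual === expectedHex.trim().toLowerCase();
}

// Usage (placeholder values):
// if (!verifyClassifier('./classifier.json', '<published sha256 hex>')) {
//   throw new Error('classifier.json does not match the published sha256');
// }
```

Hashing the raw bytes (rather than re-serialized JSON) matters here: any reformatting of the file, even whitespace, changes the digest.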

https://github.com/spamscanner/spamscanner

https://github.com/spamscanner/spamscanner/releases

X post/announcement @ https://x.com/fwdemail/status/2002872581402063281

we also support TypeScript now in the project (thx to AI, we despise TS internally tho)
