[GH-ISSUE #1] v2.0.0 #2

Open
opened 2026-03-04 00:58:09 +03:00 by kerem · 6 comments

Originally created by @niftylettuce on GitHub (Apr 24, 2020).
Original GitHub issue: https://github.com/spamscanner/spamscanner/issues/1

- [x] When parsing tokens, pre-process the HTML with `sanitize-html` before it reaches the `striptags` implementation, so that blocks like `<style>`, `<stylesheet>`, `<meta>`, `<head>`, etc. are removed.
- [x] Modify `scanner.getPhishingResults` to check against ~[OpenPhish][]~ and [PhishTank][] datasets.
- [x] Tokenize and stem other mail headers (e.g. to, from, cc, bcc, reply-to, in-reply-to).
- [x] Determine a solution to the performance issue with `classifier.train()` in `classifier.js` per [NaturalNode/natural#520](https://github.com/NaturalNode/natural/issues/520).
- [x] Headers should NOT get converted; they should be preserved as-is for URL/Received-By purposes. Only content should be converted.
- [x] Get inspiration from `ls /usr/share/spamassassin` if needed.

[openphish]: https://openphish.com/
[phishtank]: https://phishtank.com/
[nsfw]: https://github.com/infinitered/nsfwjs
[toxicity]: https://github.com/tensorflow/tfjs-models/tree/master/toxicity
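To make the first item concrete, here is a minimal, dependency-free sketch of the idea: drop whole non-content blocks first, then strip the remaining tags. It is not the project's implementation. The issue proposes `sanitize-html` plus `striptags`, whereas this sketch uses naive regular expressions, and the names `dropNonContentBlocks`, `stripTags`, and `htmlToText` are hypothetical. Treating `<script>` as a non-content block is also an assumption that the issue does not mention.

```javascript
// Elements whose entire contents should vanish before tokenizing.
// 'script' is an assumption of this sketch; the issue lists style/head/meta.
const NON_CONTENT_TAGS = ['style', 'head', 'script'];

// Remove each non-content element together with everything inside it.
function dropNonContentBlocks(html) {
  let out = html;
  for (const tag of NON_CONTENT_TAGS) {
    const block = new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?<\\/${tag}\\s*>`, 'gi');
    out = out.replace(block, ' ');
  }
  // <meta> is a void element, so there is no closing tag to match.
  return out.replace(/<meta\b[^>]*>/gi, ' ');
}

// Naive tag stripper standing in for a real striptags implementation.
function stripTags(html) {
  return html.replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ').trim();
}

// Pre-process first, then strip, so CSS rules never become tokens.
function htmlToText(html) {
  return stripTags(dropNonContentBlocks(html));
}
```

Without the pre-processing step, stripping only the tags would leave text such as `p{color:red}` in the token stream. Regular expressions do not cope with malformed or adversarial markup, which is presumably why the issue reaches for a real sanitizer.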

@niftylettuce commented on GitHub (May 4, 2020):

- [x] Add more tests

@niftylettuce commented on GitHub (May 8, 2020):

- [x] Phishing protection is too strict (e.g. SendGrid link trackers/click trackers won't work)

@niftylettuce commented on GitHub (May 29, 2020):

- [x] Add ~SpamAssassin and rspam~ ClamAV integration

@niftylettuce commented on GitHub (Jun 18, 2020):

- [ ] Test against ARF abuse.zip
- [ ] Track reputation of links (e.g. ham sent but with spammy links)

@niftylettuce commented on GitHub (Jun 18, 2020):

- [ ] When ARF parses a message, strip out the replacement tokens to get a pure, content-only tokens array
- [x] We may want to use the PG approach of looking at the (n) most interesting words
- [ ] Gibberish detection with Wikimedia and Google AI datasets
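For reference, a rough sketch of the "(n) most interesting words" idea, assuming "PG" refers to Paul Graham's essay "A Plan for Spam". There, the tokens whose spam probability lies farthest from 0.5 are selected (15 in the essay), unseen tokens are assigned 0.4, and the selected probabilities are combined into one score. The function names and the `Map` of per-token probabilities are hypothetical, and de-duplicating tokens is an assumption of this sketch; none of this is taken from the project's classifier.

```javascript
// Pick the n tokens whose spam probability deviates most from neutral (0.5).
// `spamProbability` is a hypothetical Map<string, number> built during training.
function mostInteresting(tokens, spamProbability, n = 15, unknown = 0.4) {
  const unique = [...new Set(tokens)]; // assumption: count each token once
  return unique
    .map((token) => {
      const p = spamProbability.has(token) ? spamProbability.get(token) : unknown;
      return { token, p, interest: Math.abs(p - 0.5) };
    })
    .sort((a, b) => b.interest - a.interest)
    .slice(0, n);
}

// Combine probabilities as in the essay:
// prod(p) / (prod(p) + prod(1 - p)).
function combine(probabilities) {
  let spam = 1;
  let ham = 1;
  for (const p of probabilities) {
    spam *= p;
    ham *= 1 - p;
  }
  return spam / (spam + ham);
}

// Score a message using only its n most interesting tokens.
function spamScore(tokens, spamProbability, n = 15) {
  const picked = mostInteresting(tokens, spamProbability, n);
  return combine(picked.map((entry) => entry.p));
}
```

Looking only at the most interesting tokens keeps long, mostly neutral messages from diluting a few strong signals. The sketch does not guard against probabilities of exactly 0 or 1, which would need clamping in practice.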

@niftylettuce commented on GitHub (Jun 18, 2020):

- [x] Methods need to be wrapped with universalify: https://github.com/spamscanner/spamscanner/blob/master/index.js#L352-L360
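As an illustration of what that wrapping buys, the sketch below approximates the behavior of universalify's `fromPromise`: the wrapped function returns a promise when called without a callback and invokes the callback when one is passed as the last argument. It is a dependency-free approximation, not the library's source, and `scan` is a hypothetical stand-in for a scanner method rather than the code at the linked lines.

```javascript
// Dependency-free approximation of a fromPromise-style wrapper.
function fromPromise(fn) {
  return function (...args) {
    const last = args[args.length - 1];
    // No trailing callback: behave like the original promise-returning function.
    if (typeof last !== 'function') return fn.apply(this, args);
    // Trailing callback: remove it from the arguments and use Node-style (err, result).
    const callback = args.pop();
    fn.apply(this, args).then((result) => callback(null, result), callback);
  };
}

// Hypothetical stand-in for a scanner method (not the real API).
const scan = fromPromise(async (source) => ({ isSpam: source.includes('viagra') }));

// Promise style.
scan('hello there').then((result) => console.log('promise:', result.isSpam));

// Callback style.
scan('buy viagra now', (err, result) => {
  if (err) throw err;
  console.log('callback:', result.isSpam);
});
```

With this shape, callers on either convention can use the same method without the library maintaining two code paths.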