Mirror of https://github.com/axllent/mailpit.git (synced 2026-04-26 08:45:54 +03:00)
[GH-ISSUE #30] Wrong html santitization for search column #25
Originally created by @kzaitsev on GitHub (Dec 30, 2022).
Original GitHub issue: https://github.com/axllent/mailpit/issues/30
Hello, it seems something is wrong with HTML sanitization when you build the search column. It looks like some tags are ignored and not unwrapped to text. As a result, when you try to find an email by a word in its body, you can't find it. To reproduce, I'll attach a zipped EML file. In this case, the text "massmailgoodhost" is dropped, and the "search" field does not contain it.
It seems like a bug in https://github.com/k3a/html2text, but instead of using it, why not use the Text field of the envelope structure returned by the EML parser (github.com/jhillyerd/enmime)?
Attachment: d80e6ca4-fb3c-4dcb-a6f8-030af2f8278f.eml.zip
@axllent commented on GitHub (Dec 30, 2022):
Thanks for the information @kzaitsev. We can't rely on the envelope Text value because so many HTML emails actually contain something like "You require an HTML-compatible email program to read this" rather than an actual text version of the HTML, or they contain a very dumbed-down or broken version of the HTML. From memory, the enmime Text value isn't a conversion of the HTML but rather the "Content-Type: text/plain;" part of an email (if set, else blank). The best solution is still to manually convert the HTML (if set) to text, but I'll need to dig much deeper into exactly why this is happening, and if it is an issue with html2text, then it will need to be reported there to be fixed. Unfortunately, I'm just heading off for a short holiday today, so it will probably be two weeks before I can look into this.
I also see I am stripping out ":" (when I "clean" the text), which results in "https //www.example.com..." (just noting it here so I don't forget to remove that).
@kzaitsev commented on GitHub (Dec 30, 2022):
@axllent thank you for your quick response, I understand.
I did some investigation, and it seems https://github.com/jhillyerd/enmime uses https://github.com/jaytaylor/html2text to convert HTML to text for HTML-only emails.
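The two approaches discussed in this thread differ in which part wins. Per the comment above, enmime converts HTML to text when an email is HTML-only. The maintainer prefers converting the HTML whenever it exists, since the text/plain part is often just a placeholder. A minimal stdlib-only sketch of the maintainer's rule follows. `searchBody` and `naiveHTMLToText` are hypothetical names, not Mailpit or enmime code, and the regexp tag stripper is only a stand-in for a real converter such as html2text.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

// tagRe matches a single HTML tag. This is a crude stand-in for a real
// HTML-to-text converter: it does not skip <script>/<style> contents or
// comments, and it discards attributes such as href.
var tagRe = regexp.MustCompile(`<[^>]*>`)

// naiveHTMLToText (hypothetical) replaces every tag with a space, which keeps
// the inner text of all elements, including <a> links, and then decodes
// entities such as &amp;.
func naiveHTMLToText(h string) string {
	return html.UnescapeString(tagRe.ReplaceAllString(h, " "))
}

// searchBody (hypothetical) picks the body text to index. The HTML part is
// converted whenever it is present; the text/plain part is used only when
// there is no HTML. Whitespace is collapsed in both cases.
func searchBody(htmlPart, textPart string) string {
	body := textPart
	if strings.TrimSpace(htmlPart) != "" {
		body = naiveHTMLToText(htmlPart)
	}
	return strings.Join(strings.Fields(body), " ")
}

func main() {
	htmlPart := `<p>Sent via <a href="https://example.com">massmailgoodhost</a></p>`
	textPart := "You require an HTML-compatible email program to read this"
	fmt.Println(searchBody(htmlPart, textPart)) // Sent via massmailgoodhost
	fmt.Println(searchBody("", textPart))       // falls back to the text part
}
```

With the placeholder text/plain part from the example, indexing the text part alone would never match "massmailgoodhost", which is why the HTML is preferred here.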
@axllent commented on GitHub (Jan 4, 2023):
Thanks again for reporting this. I found an option I could pass to html2text to include anchor content in the returned output, so this should now be solved in the latest v1.3.5 release. Please feel free to re-open this if it does not solve your issue!
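The fix described here makes the converter keep the content of anchors. That suggests the dropped word sat inside a link whose inner text was not in the output. The following is a rough stdlib-only illustration of the same idea. It also keeps ":" and "/" when cleaning, so URLs no longer become "https //..." (the other problem noted earlier in the thread). It is not html2text's implementation or Mailpit's code. `htmlToSearchText`, `cleanSearchText`, the set of kept punctuation, and rendering a link as "inner text, then URL" are all assumptions for illustration. Regexps are also no substitute for a real HTML parser.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
	"unicode"
)

// anchorRe captures the href value (group 1) and inner HTML (group 2) of an
// <a> element. (?is) makes it case-insensitive and lets '.' match newlines.
var anchorRe = regexp.MustCompile(`(?is)<a\s[^>]*?href\s*=\s*["']([^"']*)["'][^>]*>(.*?)</a>`)

// tagRe matches any remaining tag; each tag is replaced with a space.
var tagRe = regexp.MustCompile(`<[^>]*>`)

// cleanSearchText (hypothetical) lowercases letters, keeps digits and the
// punctuation that URLs and email addresses need (including ':' and '/'),
// turns every other rune into a space, and collapses whitespace.
func cleanSearchText(s string) string {
	const keep = ":/.-_@?=&#%+"
	mapped := strings.Map(func(r rune) rune {
		switch {
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			return unicode.ToLower(r)
		case strings.ContainsRune(keep, r):
			return r
		default:
			return ' '
		}
	}, s)
	return strings.Join(strings.Fields(mapped), " ")
}

// htmlToSearchText (hypothetical) rewrites each link as "inner text url" so
// the visible link text is indexed as well as the target, strips the
// remaining tags, decodes entities such as &amp;, and cleans the result.
func htmlToSearchText(h string) string {
	withLinks := anchorRe.ReplaceAllString(h, "$2 $1")
	text := tagRe.ReplaceAllString(withLinks, " ")
	return cleanSearchText(html.UnescapeString(text))
}

func main() {
	body := `<p>Mail from <a href="https://example.com/x"><b>MassMailGoodHost</b></a>.</p>`
	fmt.Println(htmlToSearchText(body)) // mail from massmailgoodhost https://example.com/x.
}
```

Indexing both the link text and its URL means a search can match either the words a reader sees or the address they point to.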