mirror of
https://github.com/axllent/mailpit.git
synced 2026-04-26 00:35:51 +03:00
[GH-ISSUE #448] Attachment extraction too slow #291
Originally created by @baiomys on GitHub (Feb 22, 2025).
Original GitHub issue: https://github.com/axllent/mailpit/issues/448
Hi.
Please consider a user-selectable option for attachment compression, and additional endpoint(s).
Thanks.
@axllent commented on GitHub (Feb 22, 2025):
Hi.
I don't understand the question(s). The message summary returns all Attachments & Inline filenames & sizes. This also includes the PartID, which is used as the index for attachment retrieval. How is this too slow?

@baiomys commented on GitHub (Feb 22, 2025):
A call to extract ANY message part via
/api/v1/message/{id}/part/{part}
takes over 7 seconds if a large attachment (15 megabytes) exists in the message.
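For reference, a call like this can be timed from the command line. This is only an illustrative sketch: the host/port assumes Mailpit's default HTTP listener on 8025, and <ID>/<PartID> are placeholders you would take from the message summary response.

```shell
# look up the message summary to find each attachment's PartID
curl -s "http://localhost:8025/api/v1/message/<ID>"

# then time the extraction of a single part
time curl -s -o /dev/null "http://localhost:8025/api/v1/message/<ID>/part/<PartID>"
```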
On Ryzen things are slightly better, but also not brilliant.
The message summary does return filenames and sizes, but it is not always convenient to call and parse this method from JS on the page.
Sorry I mixed an issue and feature request together, my fault.
@axllent commented on GitHub (Feb 22, 2025):
This is really a hardware limitation, as I can return a 27MB attachment in 0.64s. A dual-core G-T40N is an extremely weak processor from 2011, and you're never going to get decent performance from it these days.
In order for Mailpit to get any attachment, it has to extract the entire raw message from the database, decompress that message (ZSTD), parse the message, and finally return the attachment via HTTP. In the next release of Mailpit there is a change that uses a faster ZSTD compression setting (less compression), but unfortunately it's not going to make any noticeable difference on your laptop. A 15MB attachment means a message of about 25-30MB in size (due to email encoding), so it's a lot of work to process a message like that.
I really don't know what to say about the "not always convenient" comment either. Email parts are stored dynamically in messages, so there is no guaranteed format or order of message parts. The API conveniently extracts all the information you need in the summary, and the Mailpit web UI uses the exact same API calls to get all the message attachments, paths, sizes, names etc. If you want to know what's in the message then you have to use the summary - it tells you exactly what you need to know to get the parts.
@baiomys commented on GitHub (Feb 22, 2025):
Is it possible to add an option to COMPLETELY disable compression?
The RAW email message can easily be stored in the database without any extra CPU-consuming operations, so IMHO the user should be able to choose between space and speed.
BTW most modern Linux systems use ZFS and BTRFS, which support native compression, so compressing TWICE is massive overkill.
@axllent commented on GitHub (Feb 22, 2025):
Yes, I will consider this option, although I do not know how much difference it will make in your situation. No compression means more data in/out of the database (disk I/O), which is slower. The message still needs to be parsed and the attachment "extracted", so it may not make much difference in the end. Is your laptop using a 5400RPM spinning laptop hard drive, or an SSD?
I need to do a lot more testing and give this some more thought.
Also yes, some modern filesystems have optional compression, but I don't think they work well (or even at all) on files like SQLite databases which are continuously changing.
@baiomys commented on GitHub (Feb 22, 2025):
I ran a gzip compression/decompression test on the RAW email which was so slow to process.
A bash script did 20 (!) compression/decompression cycles of the entire message in 4 seconds on the same CPU.
So it looks like your concept of storing and retrieving attachments is somewhat suboptimal.
@axllent commented on GitHub (Feb 22, 2025):
I said zstd, not gzip - those are very different compression techniques, and zstd is much faster. I also don't understand the second part about the bash script with 20 cycles in 4 seconds - 20 cycles doing what?
I will say it again though, just to make it perfectly clear: even if I add an option to remove database compression altogether, you are not going to get great performance on your old hardware working with a 15MB attachment in an email. This is because of your hardware. Speaking of your hardware, you did not answer my question about what type of hard drive you are using.
@baiomys commented on GitHub (Feb 22, 2025):
Empty database on ramfs
20 cycles of compressing and decompressing RAW email message
GZIP is slower than ZSTD, but it took only 200 milliseconds to process the file in both directions.
So I am curious what took approximately 6800 ms (7000 - 200) to handle the API request on the same CPU.
=)
@axllent commented on GitHub (Feb 22, 2025):
Are you saying that it took 200ms to compress that raw email (with the 15MB attachment) on your machine using gzip? That is not possible unless your machine is caching the input and output, or your bash script is not working as you expected.
Ramfs will give you "disk" performance because it's all in RAM. How much RAM does that machine have, and how much usable RAM is actually available for Mailpit when you are running it?
@baiomys commented on GitHub (Feb 22, 2025):
Well, you were right, the script results for GZIP were wrong as I took a truncated file.
Sorry for confusion.
Now I downloaded ZSTD and took the right file; the results are:
1.5 seconds for level 1
3 seconds for level 3
raw file size 20.5M
So if we completely remove compression and all intermediate conversion procedures, POSSIBLY the results will be less weird.
It's a home OpenWrt router equipped with 4GB of RAM, and most of it is free.
@axllent commented on GitHub (Feb 22, 2025):
The current edge docker build has zstd compression set to 1 (it was 3 before). Could you try that and see if there is at least some improvement (compared to the 7 seconds you mentioned before)? You will need to store the message in the database again (so it is also compressed with level 1). This is just a test, as I'm curious. I think that most CPU is used for compression, not decompression, so I'm not expecting any miracles, but it will be interesting anyway.
The option I will look at is disabling the zstd compression altogether, but note that Mailpit will still need to process the email to get the attachment parts - there is no way around this. The only difference would be that it does not need to decompress it.
I will also look at a separate option to possibly disable HTTP compression, which may be another thing that is slowing you down. This is easy to test though using curl on the API attachment URL, which by default does not request HTTP compression (unlike your browser, which asks for it).

@baiomys commented on GitHub (Feb 22, 2025):
Nice, but the 7 seconds result was just for displaying a 200K image in the HTML part of an email containing a 15MB attachment.
So even without a direct download of the attachment the results are frustrating. And it is definitely caused by message preprocessing.
Will try it in the morning. Thanks for your time!
@axllent commented on GitHub (Feb 22, 2025):
To get the 200K image it still has to decompress and process the whole message to extract that attachment. The message processing requirement will not change - I'm just making this clear - so if it's that part which becomes the bottleneck on your end, then you're just going to have to either learn to live with it or use better hardware.
@baiomys commented on GitHub (Feb 23, 2025):
We gained a second or even less in the new version.
I have IO timeout set to 5 seconds on API operations and it still triggers.
Maybe you can do some profiling on the code performing the extraction of MIME parts and make it a bit more efficient,
i.e. unroll recursion, limit concurrent goroutines, etc.
Maybe this can help
https://github.com/inflex/ripMIME
@axllent commented on GitHub (Feb 23, 2025):
I have been doing a lot of profiling today in relation to #447 - but that is RAM-specific, not CPU. Disabling message compression may help your problem once it's complete & merged, or it may not. As I said before, I will also add an option to disable HTTP compression (if the browser requests it), but that's about it. I can spend days profiling, looking for little improvements, but the fact is you're using a shit processor and expecting too much :-) What if you now want to attach a 40MB file, or 100MB.... where does it end?
@baiomys commented on GitHub (Feb 23, 2025):
This limit is already set to 25MB for the entire message size on the edge SMTP gateway.
It's a challenge. Most of the time (over 20 years) I have developed commercial software using C and C++ for embedded systems, where resources are VERY limited. So it is painful to see a Go application spend 7 seconds to process a TINY plain text message.
=))))
I hope that removing compression will reduce processing time to reasonable value.
BTW you mentioned a 27MB attachment in 0.64s, which makes me think that speed depends on thread/core count.
How many cores does the CPU performing so fast have, and how many threads are active during message processing?
@axllent commented on GitHub (Feb 23, 2025):
They also landed on the moon with something like 256K of RAM ;-) Anyway, I can't change much about modern programming languages. I'm sure that if the entire application was written in something like C (from scratch) it would be better optimized, but that would take someone years to achieve a similar result, and in many ways it would also be inferior. This is an entire webserver, SMTP server, database engine, web UI, plus all the other features, packed into a single static binary that runs on multiple OSes and architectures. If you try any similar application (to Mailpit), even one written in NodeJS or even Rust, I'm sure you will experience the same or even worse performance issues, and the chances are it won't have many of the features. Mailpit is actually pretty well optimized, MUCH, M-U-C-H faster than the software it was built to replace (MailHog)....
I think that removing the compression on both the database level (zstd) decompression, and probably more the HTTP (gzip) compression will help on your end. It's actually the gzip HTTP compression I believe is making your end much slower - it's gzipping every HTTP response including attachments.
I think the machine I was testing on has 12 cores - the laptop I'm on now is older (Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz - 4 cores), and the same 27MB attachment takes 2.6 seconds on this machine. Just keep in mind that it also takes time to read that amount of data from any database, so it's not just parsing/extracting. This processor is much weaker than my desktop CPU though. I'm sure the thread count matters too, but the big thing is the CPU, I think.
Edit: sorry, missed the question about how many threads are active at the time - I'm not actually sure, to be honest. I'd think one for the email parsing etc., however there are multiple background services and cron jobs running within Mailpit too, including HTTPD and SMTPD.
@baiomys commented on GitHub (Feb 23, 2025):
Did you think about introducing a sort of "MIME offset markers" during the message receiving stage, which could later greatly increase MIME parsing speed? Since an email message is immutable, IMHO this can be done relatively easily.
Of course it depends on how the SMTP server is organized in your code.
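The "offset markers" idea could be sketched roughly like this: scan the multipart body once at receive time and record the byte range of each part, so a later part request is a plain slice of the raw message with no re-parsing. A toy illustration under many simplifying assumptions (naive boundary matching that a body containing the delimiter would break, CRLF line endings; indexParts and partRange are made-up names), not a proposal-quality implementation:

```go
package main

import (
	"bytes"
	"fmt"
)

// partRange records where one MIME part's body sits in the raw message.
type partRange struct{ start, end int }

// indexParts does a single pass over a raw multipart body, recording
// the byte range of each part's content. With such "offset markers"
// stored at receive time, serving a part later is just a byte slice of
// the raw message -- no MIME re-parsing needed.
func indexParts(raw []byte, boundary string) []partRange {
	delim := []byte("--" + boundary)
	var ranges []partRange
	pos := 0
	for {
		i := bytes.Index(raw[pos:], delim)
		if i < 0 {
			break
		}
		pos += i + len(delim)
		if bytes.HasPrefix(raw[pos:], []byte("--")) {
			break // closing delimiter, no more parts
		}
		// part headers end at the first blank line
		h := bytes.Index(raw[pos:], []byte("\r\n\r\n"))
		if h < 0 {
			break
		}
		start := pos + h + 4
		// body runs until the CRLF preceding the next delimiter
		j := bytes.Index(raw[start:], delim)
		if j < 0 {
			break
		}
		ranges = append(ranges, partRange{start, start + j - 2})
	}
	return ranges
}

func main() {
	raw := []byte("--B\r\nContent-Type: text/plain\r\n\r\nhello\r\n" +
		"--B\r\nContent-Disposition: attachment; filename=\"a.txt\"\r\n\r\ndata123\r\n" +
		"--B--\r\n")
	for _, r := range indexParts(raw, "B") {
		fmt.Printf("%q\n", raw[r.start:r.end]) // prints: "hello" then "data123"
	}
}
```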
@axllent commented on GitHub (Mar 1, 2025):
I have, but I can also say it will complicate things greatly - so I am not in favour of this option.
I have just finished adding two new unreleased features (currently in the edge docker build):
- --compression 0 (or MP_COMPRESSION=0) - this only works with new messages, not existing ones
- --disable-http-compression (or MP_DISABLE_HTTP_COMPRESSION=true)

With these two options set on axllent/mailpit:edge you should already be getting better performance - please confirm?

I recognise that half the issue you are having is the email parsing, specifically with large emails on a very weak processor. This is not an issue on any modern processor, and even my weakest ones (AMD FX(tm)-6300 Six-Core Processor & Intel(R) Celeron(R) CPU J1900 @ 1.99GHz), which are now about 12+ years old, can parse that 40MB file in 1.1s and 2.2s respectively. However, if your message is trying to load multiple attachments at the same time (eg: images) then your browser is sending multiple requests at the same time, and each request is processing the email... it adds up quickly. There isn't much I can do about that, as each time the message needs to be read from the database and processed. The only way to avoid that is to effectively cache a physical copy of every attachment, but that is very messy and I don't want to go down that route either.
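For clarity, the two unreleased options can be set either way; the flag and environment variable names are quoted from the comment above, and the bare mailpit command assumes a local binary:

```shell
# as command-line flags
mailpit --compression 0 --disable-http-compression

# or as environment variables (e.g. for the docker image)
MP_COMPRESSION=0 MP_DISABLE_HTTP_COMPRESSION=true mailpit
```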
I have however started looking at an alternate email parser which is currently about twice as fast as the one Mailpit currently uses, but it is complicated. Email parsing is not simple because so many email clients do not generate emails correctly. Every email parser has certain "tolerance levels" to handle bad encoding/structure, so it is going to take a while for me to test and decide whether it is or isn't worth switching parsers. Speed is not the only factor here; it is also (and more importantly) a certain level of compatibility ~ within reason, of course. If I were to just switch parsers it might cause far more issues than it solves.
@baiomys commented on GitHub (Mar 1, 2025):
Thanks. Most Mailpit instances (not only mine) are sooner or later gonna run on weak VPSes. Because they are CHEAP.
Speed matters. Always.
=)
Trying to test the new build, but unable to start with the option --disable-http-compression or --disable-http-compression=1.
And it responds with --verbose: command not found.
Maybe something is wrong with the command line parsing?
UPD: I managed to run the app with both options enabled (using exported env vars).
Great job! It seems that GZIP compression could be disabled by default, as it makes no sense (in terms of speed) at all and most exposed instances operate behind reverse proxies. For example, offer a --force-http-compression option instead.
@axllent commented on GitHub (Mar 1, 2025):
How are you running the mailpit command, inside of Docker or as a binary? It looks to me like you are running the binary, but I don't know where you got the binary from, because it's only found in the latest docker axllent/mailpit:edge build (or if you compiled from source). The command line parsing works perfectly, which is why I'm asking.

@baiomys commented on GitHub (Mar 1, 2025):
The router is too weak to handle image building, so I create a dev docker image in my home cluster and then export the binary via rsync to the router.
It's faster compared to pushing the dev image to Docker Hub and pulling it later.
@axllent commented on GitHub (Mar 1, 2025):
Then I think you probably forgot to pull the latest edge build (docker pull axllent/mailpit:edge) before you copied the binary. The flags --compression=0 --disable-http-compression should definitely work without an issue. I also note you have both the -q (quiet) and --verbose flags too - you can't have it both quiet and verbose ;-) If you remove the -q it should tell you something like:

@baiomys commented on GitHub (Mar 1, 2025):
It does.
Also, when I set the option --disable-http-compression in the last place, everything works.
Verbose and quiet work together perfectly, as I uncomment verbose only during debug runs.
And I'm not pulling your images, I build Docker images from source after a git clone.
It really does not matter, as MOST of the time the router runs mailpit from your release docker image, which in turn is configured using MP_.... variables, which seem to work.
=)
@axllent commented on GitHub (Mar 1, 2025):
Great, that looks like it is working then - I just needed to make sure because it should not matter what order the flags are in, it should always work (unless of course the wrong flags or flag variables are being used). If you find a flag combination that gives you errors then please tell me the exact (full) command you are using and I can investigate. Using environment variables can be very misleading because they are just ignored if Mailpit does not recognise them, but using flags it does not recognise will result in an error.
Anyway, the optimisations/changes I have made appear to have reduced the load time on your system, and to be perfectly honest that is probably the best it is going to get. Even the cheapest (?) VPS machines should easily out-perform your CPU, so if 5 seconds isn't enough while testing, then just extend the timeout in your app for development.
Of course it would be nice to get email parsing even faster, but the only way I can realistically do that is to change the email parser. That other parser I am testing (twice the speed) is still very new and under active development, so I will wait until it reaches a stable 1.0.0 release before I seriously consider it a viable option for Mailpit. I will track that project and will probably come back to it again in the future. I agree that speed always matters (it is the main reason I wrote Mailpit in the first place), but speed cannot take priority over functionality & compatibility.
I still have a few more little things to add to the upcoming release, but the changes so far will be included in that release. Are you happy to close this issue?
@baiomys commented on GitHub (Mar 1, 2025):
It seems that the performance of the current parser heavily depends on core count.
Most cheap VPSes have only ONE core, so this can become a problem.
Sure, thanks for the improvements - there are plenty more of them ahead, I guess.
=)