[GH-ISSUE #426] API concurrency? #276

Closed
opened 2026-03-15 13:36:01 +03:00 by kerem · 10 comments
Owner

Originally created by @baiomys on GitHub (Jan 21, 2025).
Original GitHub issue: https://github.com/axllent/mailpit/issues/426

Hi, I finally started some load testing, how many API operations can take place simultaneously?

It seems that I can't delete one message while downloading attachments from another.

Currently I am using FastAPI + Uvicorn + Cython.
Webhook processing from MP is blazing fast.
Tested at 30+ msgs per second, while Telegram allows sending no more than 20-60 per minute, depending on the target chat/group/channel.

Perfect!

But when it comes to API calls via aiohttp, everything starts to look a bit sluggish.
The MP database is nearly empty. All logic on my side is completely async.

kerem closed this issue 2026-03-15 13:36:06 +03:00

@axllent commented on GitHub (Jan 21, 2025):

Hi @baiomys.

> I can't delete one message while downloading attachments from another.

I think what you are describing is (to me) expected behaviour, depending on what you mean by "can't". If you mean they don't happen at the same time, then yes.

Message deletions include three separate delete operations (the message summary, raw data & message tags) which are wrapped in a single SQL transaction. SQL transactions are "blocking" queries due to their rollback nature - if the delete fails then the database is rolled back (the delete is "undone"). This prevents a situation where we could end up with unreferenced data in the database, and also prevents a simultaneous request from trying to access that message's data (eg: an attachment) while it is being deleted.

Message deletion should only lock the database for a matter of milliseconds though, so it would be great if you could provide some benchmarking to define "sluggish". Yes, I would definitely expect a slight slowdown of other operations during the delete, but not one that should greatly impact an application.
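The transactional delete described above can be sketched in Python with sqlite3 (a minimal illustration; the table names here are hypothetical, not Mailpit's actual schema):

```python
import sqlite3

def delete_message(conn: sqlite3.Connection, msg_id: str) -> None:
    """Delete a message's summary, raw data and tags in one transaction.

    The `with conn:` block commits only if all three DELETEs succeed;
    any failure rolls everything back, so the tables stay in sync.
    Table names are illustrative, not Mailpit's real schema.
    """
    with conn:
        conn.execute("DELETE FROM mailbox WHERE id = ?", (msg_id,))
        conn.execute("DELETE FROM mailbox_data WHERE id = ?", (msg_id,))
        conn.execute("DELETE FROM message_tags WHERE id = ?", (msg_id,))
```

If any of the three DELETEs raises, the context manager rolls the whole transaction back, so no table ends up referencing a half-deleted message.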


@baiomys commented on GitHub (Jan 21, 2025):

Thanks for the detailed answer.

It seems that you are using WAL mode in the SQLite database, so concurrency should not be a problem even with delete operations. On the other hand, WAL only works for local databases, and your application is able to use a remote database.
So do you use WAL, and if so, why are delete operations blocking?

Do you plan to implement some kind of "soft delete", using for example a hidden tag?
And a vacuum operation for major cleanup?

P.S. Just tested MP on a Ryzen 3900 + NVMe and it is way more responsive, even on a single core. Looks like MP is a disk-intensive application. Gonna test it on a RAMDISK.
=)


@axllent commented on GitHub (Jan 21, 2025):

There is already automated vacuuming implemented which runs after 5 minutes of database inactivity, and only when there are enough messages which have been deleted (ie: "saving" >= 1% of the total database size). The reason for this is a VACUUM operation is 100% blocking, and can take minutes to complete when a database is several GB in size (time taken is directly linked to the database size and of course CPU & disk speed).
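The auto-vacuum policy described above could be written as a simple predicate. This is a sketch of the stated rules (5 minutes of inactivity, reclaimable space of at least 1% of the database size), not Mailpit's actual code:

```python
def should_vacuum(db_size_bytes: int, reclaimable_bytes: int, idle_seconds: float) -> bool:
    """Run VACUUM only after 5 minutes of database inactivity, and only
    when vacuuming would reclaim at least 1% of the total database size.

    Illustrative only: this mirrors the policy described in the comment
    above, not Mailpit's real implementation.
    """
    return idle_seconds >= 5 * 60 and reclaimable_bytes >= db_size_bytes * 0.01
```

Both conditions matter: the idle check keeps the fully blocking VACUUM away from active traffic, and the 1% threshold avoids paying the (potentially minutes-long) cost for negligible savings.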

Soft deletion (marking a message as deleted rather than physically deleting it) would likely solve some of the "problem" you describe, however it would require fairly significant changes throughout Mailpit (every SQL query) to accommodate that approach, and it would introduce some additional overhead too. My question still stands though:

> it would be great if you could provide some benchmarking to define "sluggish"

There is very little point changing how Mailpit works without first understanding the problem it is supposed to solve, and whether it is actually a problem at all. You have already indicated that the Mailpit API currently handles 60x more requests than you can process via Telegram anyway (if I understand you correctly), so I'm very curious as to how much slowdown you get when doing simultaneous deletes, and whether this would make any meaningful difference to your integration.


@baiomys commented on GitHub (Jan 21, 2025):

My API calls delete on every "wrong" undeliverable message, without sending a Telegram notification.
So if we have 1000 spam messages, none of them will be sent to Telegram, BUT delete will be called 1000 times.
And if it is blocking, it is a serious problem.

I think tagging can help me to mark and later remove such messages.

Your app seems to be "sluggish" on slow drives. Some investigation is needed.


@axllent commented on GitHub (Jan 21, 2025):

Each delete API call is blocking, but that does not mean the entire application "stops" until all 1000 API requests are finished; subsequent read/write requests are queued. This means that if you called a delete and, immediately after, another API call to fetch data, that fetch would happen immediately after the delete completes, so milliseconds later.

There are two ways of improving your performance:

  1. You could add a tag as you've suggested, or
  2. Buffer your deleted IDs and [delete multiple](https://mailpit.axllent.org/docs/api-v1/view.html#delete-/api/v1/messages) with a single request

Grouping IDs for deletion (option 2) is definitely the better option.
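Option 2 might look something like this with aiohttp (a sketch: it assumes the bulk-delete endpoint `DELETE /api/v1/messages` accepts a JSON body of message IDs; check the documented schema before relying on the payload shape):

```python
import asyncio

import aiohttp


def chunked(ids: list[str], size: int) -> list[list[str]]:
    """Split a list of message IDs into batches of at most `size`."""
    return [ids[i : i + size] for i in range(0, len(ids), size)]


async def delete_messages(base_url: str, ids: list[str], batch_size: int = 200) -> None:
    """Delete messages via Mailpit's bulk endpoint: one HTTP request
    (and one SQL transaction) per batch, instead of one per message.

    Sketch only - assumes a `{"IDs": [...]}` body per the API docs.
    """
    async with aiohttp.ClientSession() as session:
        for batch in chunked(ids, batch_size):
            async with session.delete(
                f"{base_url}/api/v1/messages", json={"IDs": batch}
            ) as resp:
                resp.raise_for_status()

# Example (hypothetical host/port):
# asyncio.run(delete_messages("http://localhost:8025", spam_ids))
```

Batching, say, 200 IDs per request turns 1000 single deletes into 5 transactions, which sidesteps the per-delete blocking almost entirely.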

> Your app seems to be "sluggish" on slow drives. Some investigation is needed.

I don't understand how you can accuse any application (which relies heavily on disk I/O for database read/writes) of being sluggish when you are using slow drives. That seems obvious to me.


@baiomys commented on GitHub (Jan 21, 2025):

> I don't understand how you can accuse any application (which relies heavily on disk I/O for database read/writes) of being sluggish when you are using slow drives. That seems obvious to me.

Well, the drive is not really that "slow": it's SATA at 500 MB/sec.
But it is definitely not enough to be happy.

Don't take it personally.
Your app is great!


@axllent commented on GitHub (Jan 21, 2025):

> So do you use WAL, and if so, why are delete operations blocking?

Yes, Mailpit uses WAL to allow async read/write access. As I explained earlier, deletions involve separate delete queries to multiple tables, and these tables must remain "in sync" to ensure message data consistency. If any of the deletions fail, then the data is rolled back - and so this happens in an "SQL transaction" (which is blocking to ensure consistency). It's not just deletions either, it also includes new messages being written to the database (for the exact same reason). Despite this, I get 100-200 SMTP transactions (new messages) per second, while also being able to use the UI - so I do not think this is an issue.
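The WAL behaviour is easy to demonstrate with a small SQLite script (an illustration, not Mailpit's code): a reader connection is not blocked by an open, uncommitted write transaction; it simply sees the last committed snapshot.

```python
import os
import sqlite3
import tempfile

# Demo only (not Mailpit's code): in WAL mode, readers are not blocked
# by an open write transaction - they see the last committed snapshot.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(path)
mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]

writer.execute("CREATE TABLE messages (id TEXT)")
writer.execute("INSERT INTO messages VALUES ('a')")
writer.commit()

# Start a write transaction and leave it uncommitted...
writer.execute("BEGIN IMMEDIATE")
writer.execute("DELETE FROM messages WHERE id = 'a'")

# ...a second connection can still read the last committed state.
reader = sqlite3.connect(path)
count = reader.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
print(count)  # 1: the uncommitted delete is invisible to the reader

writer.rollback()
```

So WAL keeps reads flowing during a delete transaction; what the transaction serialises is concurrent writes, which is exactly the consistency guarantee described above.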

> drive is not really that "slow": it's SATA at 500 MB/sec

I believe that's the maximum data transfer rate (eg: a large file being streamed). A SATA disk is still a spinning disk with a needle, which is far slower than an SSD, especially when it comes to seek actions (finding the data position on the drive to read from or write to). I won't pretend to be a hardware expert (I'm not), but for any production-ready application these days that relies on disk I/O, you should be using (at the minimum) an SSD of sorts if you need performance.

> Don't take it personally.

It's not so much taking it personally as getting frustrated when someone says something like that. It's like saying "the download speed of your server is terrible when I'm on dialup internet" ;-)


@baiomys commented on GitHub (Jan 21, 2025):

> A SATA disk is still a spinning disk with a needle, which is far slower than an SSD

SATA is not a spinning disk, it is just an interface, like SCSI or SAS.
My SATA disk is an SSD.
=)


@axllent commented on GitHub (Jan 21, 2025):

Sorry, then I misinterpreted what you meant by SATA and "slow drives". In that case it should not matter too much. NVMe will always win against an SSD over SATA, though it shouldn't be necessary. I use a SATA SSD myself and it performs pretty well with my 3.9GB database; the bottleneck I have is the CPU. It doesn't matter though, as this is a personal Mailpit instance, not a high-usage production instance.


@baiomys commented on GitHub (Jan 22, 2025):

Thanks for the nice conversation, I appreciate your support.
