mirror of
https://github.com/axllent/mailpit.git
synced 2026-04-26 00:35:51 +03:00
[GH-ISSUE #254] Remote storage #167
Originally created by @ulexxander on GitHub (Feb 28, 2024).
Original GitHub issue: https://github.com/axllent/mailpit/issues/254
Hello, we really like this project and use it every day, deploying it to dynamic staging environments on Kubernetes for each GitLab Merge (Pull) Request.
We have a lot of them.
While our application is stateless and uses a database for storage, Mailpit is not: it uses a local SQLite database,
which requires us to create a PersistentVolume for each Merge Request to persist Mailpit data in that environment.
We would like to avoid creating a disk for each Merge Request; it would simplify our infrastructure and improve performance.
Do you have any plans to implement remote storage that is not bound to the local filesystem?
What do you think about https://rqlite.io ?
I could try to contribute and implement it.
It seems compatible, and implementing it as an alternative storage backend should be easy.
@axllent commented on GitHub (Feb 28, 2024):
Hi @ulexxander. I'm wondering if there might be an easier way for you to solve this issue. Could you not just run a dedicated Mailpit server and send mail via SMTP directly to that instead? It would mean all mail from each client ends up in the same mailbox, however that can potentially be addressed by using tags or similar.
Implementing a secondary/alternative storage backend will (in my opinion) not be easy, as Mailpit was never designed to cater for that, so I fear a huge rewrite would be necessary - I don't think it will be quick. That being said, if you are willing to do the work and are somehow able to integrate rqlite as an optional alternative storage, then I have no objection.
I'm keen to discuss this further before you get started though, as I suspect there may already be easy solutions possible without resorting to any Mailpit rewrites.
@ulexxander commented on GitHub (Mar 3, 2024):
@axllent Thanks for the explanation!
Actually, a proof of concept persisting data in rqlite was quite simple,
and I was able to persist and display the mailbox from rqlite 😄
But probably because the `rqlite` database driver is built on a JSON API, it behaves a little differently than the SQLite one, so while testing I needed to make a few more changes:
- Binary BLOB data had to be handled differently, because marshalling `[]byte` to JSON breaks the compressed BLOB.
- `Scan()` could not read an SQL INTEGER into a Go int64, probably because it is a number in the JSON response and the driver decodes it into `interface{}` or similar, so the `json` package unmarshals it into a float64.
- The `INSERT RETURNING` clause also didn't work, because rqlite disallows modifying data on the /query endpoint and the `Query()` method (rather than `Exec()`) is used in the code to fetch the result. I didn't resolve that one.
@ulexxander commented on GitHub (Mar 3, 2024):
But there are probably a lot more changes required for all of Mailpit's logic to work normally with rqlite as a storage backend.
So I got another idea for how to satisfy my use case without needing to modify Mailpit:
I will get rid of that PersistentVolume for each Mailpit.
Instead, I will add a sidecar container to each Mailpit that will check the SQLite database for changes and back it up to some shared storage.
Then on startup (or restart) there will be an `initContainer` that will download the backup and restore it!
I will share the solution if I succeed.
@axllent commented on GitHub (Mar 3, 2024):
Thanks for the info so far @ulexxander, the implementation you demonstrated is already significantly easier than I had imagined. The need to base64-encode the binary e-mail data however is a bummer, as that would greatly increase the storage size of the database (the whole point of zstd compression is to reduce the db storage size).
Anyway, please keep me in the loop with your progress/solution. Not sure if NFS or something is a feasible alternative (or even possible in Kubernetes), although I think the options I use with sqlite aren't friendly with remote filesystems (that could be conditionally toggled with a startup flag though).
@github-actions[bot] commented on GitHub (Mar 18, 2024):
This issue is stale because it has been open for 14 days with no activity.
@axllent commented on GitHub (Mar 18, 2024):
@ulexxander Just to keep this issue "alive" (so it's not marked as stale) - any progress on your end? I did see your activity in rqlite so it looks like you are still active 👍
@ulexxander commented on GitHub (Mar 18, 2024):
Hello @axllent, I haven't yet had the opportunity to implement it at work; I've had too many other tasks.
But I plan to do it very soon, and it should be quick anyway.
So I'll report how it goes 😀
Interesting that there was related activity on rqlite; I don't remember mentioning this issue there, but the folks there somehow found it and resolved one of the problems. Nice 😀
@axllent commented on GitHub (Mar 18, 2024):
You're totally right, it wasn't you (I incorrectly assumed it was). I guess the author of rqlite may have seen this discussion and thought it was a great idea to promote that implementation. For what it's worth, I believe rqlite does support BLOBs too (coming back to the earlier base64 comment). I'm actually pretty excited to see the performance of it (I know it won't be as fast as native/local storage, but I'm really curious how it performs over HTTP), and whether it can (almost) work as a drop-in replacement!
I am also very busy at the moment with other priorities including work, so I haven't had time to look into rqlite other than a bit of light reading, and besides... you're doing this piece of work, which I really appreciate :-) I've already put in hundreds of hours of work, so it's really great to have some help. I have been giving it a lot of thought as to how one could potentially "integrate" it from an end-user perspective without over-complicating the user options. Assuming rqlite can be integrated as simply a DB driver supporting the same SQL commands, the existing `--db-file` could be dual-purpose: if it's a file path then use SQLite, else if it's a URL then use rqlite.
Anyway, I'll leave this with you. Nice to hear you're still working on it.
@ulexxander commented on GitHub (Mar 25, 2024):
Hello @axllent, I implemented my idea: backing up Mailpit's SQLite database to S3 frequently instead of mounting persistent storage into the container.
I am satisfied with it for now; it works great.
Sharing it here:
- Required IAM permissions: `s3:GetObject`, `s3:ListBucket`, `s3:PutObject`.
- An initContainer `db-restore` that restores the last backup, and a sidecar container `db-backup` that checks the database checksum and backs it up to AWS S3 if it has changed.
Adjust the `sleep 10` interval for yourself. If you have a small Mailpit database like I do, it should be very lightweight. I benchmarked it by doing checksums 10 times per second with a 400 KB database; it only consumed 0.1 CPU time on a `t3.large` instance.
@ulexxander commented on GitHub (Mar 25, 2024):
Alternative solutions could use NFS (which I personally avoid) or FUSE userspace filesystems like sshfs and s3fs, but those would require a privileged container.
@axllent commented on GitHub (Mar 25, 2024):
Thanks for sharing that. Did the idea of using rqlite not work?
@axllent commented on GitHub (Mar 31, 2024):
@ulexxander I realise you have already found an alternate approach, but I have been playing with rqlite and I now have a local prototype that is almost fully working. It required a fair number of minor internal changes, e.g. int64 -> float64 as that appears to be how rqlite interprets values in many cases via JSON, and an alternate approach for some methods, but it's working as expected.
I hit the same issue as you with the BLOB storage of the compressed raw emails. After quite a bit of hunting and searching I believe I found the "correct" (alternative) way to insert binary data via the sql driver (I'm waiting on confirmation from the rqlite author) which can be used for both the SQLite and rqlite databases, the only difference being that rqlite returns binary BLOB data as a base64-encoded string (which I can easily work around).
DB writes (via SMTP) are significantly slower than with the default driver (about 50-75% slower), however even that is still very reasonable (depending on platform/hardware, about 50 messages per second). I don't think that's an issue at all for anyone (unless you're testing mass mail bombs).
I'm not sure if you're still interested (?), but it is looking very promising so far so I thought I'd update you.
@ulexxander commented on GitHub (Mar 31, 2024):
Hello @axllent. The alternative solution using backups works quite well and there have been no issues with it. I chose it because it was quick to implement.
But a solution with proper remote storage would definitely be more elegant and far better.
I think for most people, including me, worse performance compared to the default driver won't be an issue at all.
Big thanks for doing the work on that. I would definitely switch to it once it is implemented and stable.
@axllent commented on GitHub (Apr 5, 2024):
@ulexxander I have just merged this new functionality in, but would love it if you could please do some testing before I release anything (if possible)? Currently only the `axllent/mailpit:edge` Docker image contains this feature, and I think it should be fairly easy to integrate: all you have to do is set `MP_DATA_FILE=http://<rqlite-server>:<port>` (or `--db-file http://<rqlite-server>:<port>`).
Obviously this solution is highly dependent on the rqlite server running beforehand, and if the connection is broken then bad things can happen (I'm not too sure how the internals of reconnection work, but I assume that if it is a cluster then it potentially moves to the next node?).
Anyway, I would love (and really appreciate) your feedback before I release anything!
@ulexxander commented on GitHub (Apr 5, 2024):
I tested Mailpit with rqlite today on our stage environment and it works fine! I didn't have any issues and didn't notice any difference in performance.
I ran a bunch of automated end-to-end tests that use it and they worked just fine 👍
But unfortunately I hit one limitation of rqlite that I had not noticed before: you cannot have multiple databases (tenants, isolated schemas) on a single instance, like in PostgreSQL / MySQL / MongoDB. My bad...
https://github.com/rqlite/rqlite/issues/927
This is a deal breaker for me, as I would still have to run a stateful application with storage for each stage environment deployed on demand...
Unless I implement some wrapper software that will manage/deploy multiple on-demand `rqlite` instances inside the same container, attached to the same storage (separate directories for each instance).
@otoolep commented on GitHub (Apr 5, 2024):
See https://github.com/rqlite/rqlite/issues/927#issuecomment-2039733913
@otoolep commented on GitHub (Apr 5, 2024):
That is not your only option, as explained in the comment above. There are other ways. Happy to explain more if needed.
@ulexxander commented on GitHub (Apr 5, 2024):
Thanks @otoolep, this is a great suggestion. This level of isolation is totally OK for our use case.
@axllent what do you think about this? Introducing a tenant name and injecting it into all schema migrations and queries should probably be a relatively simple change? It could also be a feature that regular SQLite3 users benefit from.
@otoolep commented on GitHub (Apr 5, 2024):
The main thing is to provide tenancy information to the right layer in your application, and then let that layer decide how it implements multi-tenancy. Different database technology will offer different ways to then implement the isolation (containers, distinct databases, tables per tenant, etc).
The approach I suggest is a common, though older, pattern. I first used it more than 10 years ago to build multi-tenant SaaS systems. It's still used to this day in some systems, though with the advent of containerization, it's becoming more common to deploy dedicated instances of a database per tenant (though the spin-up time for a new tenant can be high). That way usually maximizes isolation, and helps manage resources (CPU, disk, RAM) on a per-tenant basis.
But using the table-prefix approach, or the extra tenant-ID column approach, a single rqlite instance can offer reasonable isolation per tenant. And for multi-tenant systems with large or unpredictable numbers of tenants, the approach I suggest is fast with little overhead.
@axllent commented on GitHub (Apr 5, 2024):
Thank you both for your feedback and discussion, @ulexxander & @otoolep. This is a major oversight (@ulexxander), and unfortunately not one that is easily overcome without some significant internal changes and potential performance drawbacks. You caught me right at the start of a busy weekend (other personal commitments) so I'll just share my initial thoughts now.
Adding an extra column to all tables isn't a good option, as Mailpit requires a specific database migration state (per "client") when it connects. When it detects that the current state is behind, it runs the necessary migration tasks, updating the table structure and running any necessary migrations. The moment an older Mailpit client connects (when there have been database changes by a newer client), it would completely shit the bed because the database would be in a future state it doesn't understand. Assuming this could be worked around (I don't think so though), there are also other performance-related concerns I have, but I won't get into those now as I would need more time to consider what these might be.
Adding a table name prefix is the other option (as I gather). For starters, this would require a custom port of the migration module Mailpit uses, as there is currently no flexibility to handle this (the table name that maintains the state of the database is hardcoded). Every instance (specifying a "prefix") would have a separate set of tables. This approach should work (I think), but again requires some core changes within Mailpit and the migration code.
I'll keep giving this some thought, and in the meantime if either of you have other ideas and/or suggestions I'm all ears (just a little delayed). Thanks!
@axllent commented on GitHub (Apr 7, 2024):
I've started work on an optional `--db-tenant-id` feature which will prefix all tables with a SQL-safe prefix of whatever is set. It's early stages but it seems to be working as expected. Part of this included replacing the original database migration module, which was fortunately easier than I expected, at least for what I need. I still need to do a lot of testing in the coming days, but will let you know @ulexxander when I have something ready for testing.
Still interested in any feedback/thoughts (if either of you have any), but this solution should mean that rqlite can be used from multiple hosts at the same time provided they each specify a unique tenant ID. Technically local storage can also use this, however I can't really see a use case for that, and I'm also fairly sure there would be locking conflicts if two local Mailpit servers were running and both accessing the same DB file (SQLite was never designed for that).
@axllent commented on GitHub (Apr 9, 2024):
@ulexxander I have just pushed a major internal change that adds optional support for a tenant ID. In essence, this prefixes all Mailpit tables with a tenant ID (name) if set. As before, this feature is currently only in the edge Docker build, and I'd really appreciate your feedback & testing, and I'm sure @otoolep would also be keen to hear how it performs with rqlite.
At the moment there is no online documentation, as I don't like to add references to features that don't exist (yet), so... you can set this via `--db-tenant-id <name>` (or env `MP_DB_TENANT_ID="<name>"`). This should allow seamless integration with rqlite provided each Mailpit client specifies its own tenant ID.
@ulexxander commented on GitHub (Apr 11, 2024):
Hello @axllent! Thanks for implementing that. I tested it today and it works great! But I had to set `MP_DB_TENANT`; `MP_DB_TENANT_ID` didn't work. I saw in the code that you are using the first one.
@axllent commented on GitHub (Apr 11, 2024):
Oh shit, that's my bad. I couldn't make up my mind whether to refer to it as a "tenant" or a "tenant id" (any thoughts?). Glad to hear it's working for you though!
I'm actually strongly considering changing `MP_DATA_FILE` to `MP_DATABASE` (keeping the old one for compatibility's sake, of course) as that's more reflective, so the tenant would become either just `MP_TENANT` or `MP_TENANT_ID`.
@axllent commented on GitHub (Apr 12, 2024):
I am pleased to announce v1.16.0, which contains optional integration with rqlite. Thanks to you both: @ulexxander for the idea, testing, and help, and of course @otoolep for your database & advice!
This release includes one new flag/env variable, and one changed one (as I mentioned before):
- `--tenant-id` / `MP_TENANT_ID` to set the optional tenant ID.
- `--database` / `MP_DATABASE` to set the database. This replaces the legacy `--db-file` / `MP_DATA_FILE` options, although it's worth noting that the old flag/env will remain active (but hidden) for a long time so as not to break any existing integrations.
I am still very keen to hear about performance and how this handles in the field, as I suspect this feature may start gaining a fair bit of traction in time (it's always hard to tell exactly how users are using Mailpit). Recently the Mailpit Docker downloads jumped to a constant ±1.8 million downloads per day (yes, you read that right), which is slightly higher than the official MySQL Docker images! 🤯
@otoolep commented on GitHub (Apr 12, 2024):
Hey @axllent -- great to see. Thanks to you both for integrating your program with rqlite.
BTW, can you give me some pointers on how you generate multi-arch Docker images? I'm getting a lot of requests for the same for rqlite, but the best practices for doing so are not clear to me. What do you do?
@axllent commented on GitHub (Apr 12, 2024):
Yes of course @otoolep - it depends on where you want to build them, i.e. yourself or via CI. I have found the easiest way is to build them directly via GitHub Actions (see this), which also pushes them through to Docker Hub. If you want to discuss it more, please email me (axllent [at] gmail) - I'm just heading off to bed now but will get back to you ASAP.
Edit: It's worth noting that it is not difficult to build them locally either, but you need to have `docker-buildx` installed (if you're on Linux; I have no idea about other platforms). Basically it emulates the "other" platforms (differing from yours), but note that this process is very slow, especially if you are compiling things. All architectures are then combined into one manifest and pushed as one image to wherever you are pushing them.
@otoolep commented on GitHub (Apr 12, 2024):
OK, GitHub actions -- good enough. That is one path I'm investigating, so good to know it's what you use. Seems like it's the path I should pursue too.
@otoolep commented on GitHub (Jun 1, 2024):
Thanks again for your help @axllent -- I got it working today, and now rqlite has multi-platform images.
https://www.philipotoole.com/rqlite-8-24-9-now-with-multi-platform-docker-images/
@axllent commented on GitHub (Jun 1, 2024):
I saw that @otoolep, nice! You're very welcome.