[GH-ISSUE #456] Add option to include full Message in /api/v1/search #294

Closed
opened 2026-03-15 13:45:03 +03:00 by kerem · 5 comments

Originally created by @buschtoens on GitHub (Mar 5, 2025).
Original GitHub issue: https://github.com/axllent/mailpit/issues/456

The /api/v1/search endpoint only returns a MessageSummary as opposed to the /api/v1/message/{ID} endpoint, which returns the full Message.

I think this is a very sensible default; however, it makes the API annoying to consume for the following use case: you're searching for a single specific message and need its full Message.Text instead of just the MessageSummary.Snippet.

Currently you have to perform two API requests to do this:

  1. Use /api/v1/search to find the mail and get its ID.
  2. Use /api/v1/message/{ID} to retrieve the Message.Text.
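
The two-request workflow above can be sketched roughly as follows. This is a minimal Python sketch, not part of Mailpit. Only the two endpoints and the ID, Snippet and Text fields come from the discussion; the base URL (port 8025), the `limit` parameter, the `messages` key in the search response, and the helper names are assumptions.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8025"  # assumption: Mailpit's API is reachable here


def get_json(url):
    """Fetch a URL and decode its JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def search_then_fetch_text(query, fetch=get_json, base_url=BASE_URL):
    """Return Message.Text of the first search hit, or None if nothing matches."""
    # Request 1: /api/v1/search returns MessageSummary entries (ID, Snippet, ...).
    # Assumption: the summaries are listed under a "messages" key.
    params = urllib.parse.urlencode({"query": query, "limit": 1})
    summaries = fetch(f"{base_url}/api/v1/search?{params}").get("messages", [])
    if not summaries:
        return None
    # Request 2: /api/v1/message/{ID} returns the full Message, including Text.
    message = fetch(f"{base_url}/api/v1/message/{summaries[0]['ID']}")
    return message["Text"]
```

The `fetch` argument is injectable so the helper can be exercised in tests without a running Mailpit instance.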

I would love a summary=false (default true) or alternatively full=true (default false) query parameter on the /api/v1/search endpoint that toggles returning Message structs instead of MessageSummary.

Do you think this would be a good addition? I'm happy to try my hand at a PR, but wanted to get your perspective first.

Thanks for the great project!

kerem closed this issue 2026-03-15 13:45:08 +03:00

@axllent commented on GitHub (Mar 6, 2025):

Thank you for the feedback @buschtoens. I do understand what you are asking; however, I don't think it is that easy (or a good idea). I'll explain:

  1. The message summary is generated and saved to a specific database table when the message is received. This includes all the data (except for tags) which is currently in the MessageSummary struct. The database also includes a "fulltext search" field which is used only for the search itself. Returning results for this data is fast because Mailpit does not have to process anything, it just reads the results into the struct and returns the struct as JSON.
  2. The message itself (i.e. /api/v1/message/{ID}), which includes text & HTML, is stored compressed in a separate table. The raw message is what gets parsed every time to populate the Message struct when an individual message is read. This includes reading the compressed data from the database, extracting it and, of course, parsing it. This process is by far the most CPU & RAM intensive action in Mailpit.

What you are suggesting would effectively require:

  • An alternative API response (based on a full=true query parameter), which would require a new struct type and complicate the REST API response (i.e. it would need to account for two separate response types via the same API route).
  • A conditional joined database query for searching. Not only would this greatly complicate the controller logic, it would also mean that every matching message (with full=true) would need to be parsed in order to complete the response. This may be OK if you're only dealing with a few small messages, but it becomes extremely heavy when dealing with multiple results, not to mention if the messages are large. If you have a mailbox filled with hundreds of 25MB messages, and Mailpit takes 0.5 seconds to parse just one of those messages... then Mailpit becomes an extremely heavy, slow application with huge CPU & RAM overheads, all the things Mailpit was designed to avoid. I think you see where this is going...
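
To put rough numbers on the scenario above, here is a back-of-the-envelope sketch. The 0.5 seconds per parse and 25MB per message are the figures quoted in the comment; the count of 200 matches is only an assumed stand-in for "hundreds", and the estimate assumes messages are parsed one after another.

```python
def full_search_cost(matches, parse_seconds=0.5, message_mb=25):
    """Estimate sequential parse time and raw message volume for one full=true search."""
    return matches * parse_seconds, matches * message_mb


# 200 is an assumed stand-in for "hundreds" of matching messages.
seconds, megabytes = full_search_cost(200)
print(f"{seconds:.0f} s of parsing, {megabytes} MB of raw messages")
# prints "100 s of parsing, 5000 MB of raw messages"
```

Under these assumptions a single search request would spend well over a minute parsing, compared with reading pre-built summaries straight from the database.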

The only way this could work efficiently is if Mailpit stored a complete separate copy of the HTML and text of every message, which would mean the database size would increase. Then the next user wants a list of the attachments in that full=true response, the next wants a full set of headers....

So do I think this is a good addition? Unfortunately, no. I believe it would heavily complicate how Mailpit works internally, increase the CPU & RAM usage, and complicate the simple REST API (potentially confusing users in the process). In order to try to reduce the CPU & RAM overheads, I would need to drastically change how messages are stored, complicating the controller functions and increasing the database size (not to mention handling all existing data). All of this just to save an additional API request.

Sorry, I really don't like turning down requests (especially when I can understand the user's perspective), however I think the downsides of implementing this far outweigh the benefits.

Can you please explain to me why an extra API request is annoying? Surely whatever you are using the API for is automated?


@buschtoens commented on GitHub (Mar 6, 2025):

Thank you so much for this very thoughtful and thorough response! I really appreciate it.

I can clearly see now why this wouldn't be trivially possible, or even a good idea! 😄

> Can you please explain to me why an extra API request is annoying? Surely whatever you are using the API for is automated?

Yes, you had the right hunch.

We are in the process of migrating over from fake-smtp-server. For everything it lacked in features and performance (ding!), it had one nice thing going for it:

The API allowed searching and always returned the full message. Now this likely contributed to the general slowness we experienced, but it made it super convenient for direct consumption in automated tests across different languages & testing frameworks.

In our use-case we only ever needed the most recent message that matches anyway, akin to a "find" operation.

Do you think adding something like /api/v1/find would be a sensible alternative?

It would accept a query & tz and search for the most recent message that matches, then return it as a single full Message.

If none matches, 404 is returned.

This could optionally accept a sort parameter to cover cases where you wouldn't necessarily want to pick the most recent message.


@axllent commented on GitHub (Mar 6, 2025):

> In our use-case we only ever needed the most recent message that matches anyway, akin to a "find" operation.

Is that the latest message, or the latest message from a specific search? If it's the latest message (i.e. the last message Mailpit received), then you can just call /api/v1/message/latest.

One challenge I have is not adding features for the sake of adding them, because once they are in the codebase it is effectively impossible to remove them. I'm still confused as to why a second API call is so inconvenient though? The search, which is already very flexible, returns the results (overview), and to get the information you require just requires another call.


@buschtoens commented on GitHub (Mar 6, 2025):

> Is that the latest message, or the latest message from a specific search?

The latest message matching the search query.

/api/v1/find?query=subject:"Your+OTP"+to:foo@example.org
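
For comparison, the same query can be percent-encoded and sent to the existing /api/v1/search endpoint (the proposed /api/v1/find does not exist). This is a small sketch using Python's standard library; the base URL and the `search_url` helper name are assumptions for illustration.

```python
from urllib.parse import urlencode


def search_url(query, base_url="http://localhost:8025"):
    """Build a /api/v1/search URL with the query string safely encoded."""
    # urlencode escapes quotes, colons and "@", and turns spaces into "+".
    params = urlencode({"query": query})
    return f"{base_url}/api/v1/search?{params}"


print(search_url('subject:"Your OTP" to:foo@example.org'))
# prints "http://localhost:8025/api/v1/search?query=subject%3A%22Your+OTP%22+to%3Afoo%40example.org"
```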

> I'm still confused as to why a second API call is so inconvenient though?

I agree that it’s not a necessary feature. Everything this feature offers functionally can already be achieved with today’s API.

Maybe my perception is skewed, because I’m approaching this from an automated testing angle, but in my previous experience the interaction between tests and Mailpit / fake-smtp-server is reduced to just a single task: finding the most recent matching mail to then extract information out of the body.

For this particular scenario (which I imagine is common), the proposed endpoint would be convenient and save a round-trip, marginally speeding up the test.

If you think the benefit / usefulness in the wild will not be great enough to outweigh the burden of maintaining this endpoint, I can accept that and continue working with the two-request solution. :)


@axllent commented on GitHub (Mar 6, 2025):

While I would love to incorporate every requested feature, I genuinely believe that the benefits do not justify the overhead of implementation and maintenance. I try to avoid saying "never," as I've been proven wrong in the past when I initially declined a request only to add it much later, but it was usually driven by demand and often as part of a larger related feature. However, this particular feature seems to only serve the purpose of saving an additional HTTP request and a JSON parse, and doesn't add any actual functionality that isn't already there, albeit in a two-step process instead of one.

Given the daily Docker pull counts, it seems reasonable to conclude that Mailpit is very widely utilized in automated testing. Although I can't support this assumption with specific statistics (as Mailpit does not track any usage), I believe it likely involves a combination of direct latest pulls along with search and regular API calls.

I do however appreciate the fact you're using Mailpit!
