[GH-ISSUE #233] IMAP Consumption - Mails imported multiple times and not marked as read or deleted #185

Closed
opened 2026-02-25 21:31:23 +03:00 by kerem · 12 comments

Originally created by @l4rm4nd on GitHub (Nov 28, 2020).
Original GitHub issue: https://github.com/ciur/papermerge/issues/233

Originally assigned to: @ciur on GitHub.

Info

I am currently using the docker image provided by https://github.com/linuxserver/docker-papermerge. However, I guess it is a logical bug not based on docker itself. Papermerge is running on my Raspberry Pi 4.

Expected Behavior

Using the IMPORT_MAIL_DELETE option in papermerge.conf.py should delete processed emails from the IMAP account, and each email should be imported only once into the inbox of the superuser papermerge account. Further, the mail should be marked as read if it is not deleted. Otherwise, how do the papermerge workers differentiate new from old mails?

Current Behavior

The email is successfully processed by the worker and imported into the papermerge web application. However, emails are not marked as read or deleted. This leads to an issue, where the same email is being imported over and over again into the inbox folder of papermerge's web application.

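[Screenshot: https://user-images.githubusercontent.com/21357789/100526076-fabde780-31c5-11eb-99fc-07258bbae239.png]
[Screenshot: https://user-images.githubusercontent.com/21357789/100526011-753a3780-31c5-11eb-9abb-d0fdc0cd4755.png]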

Steps to Reproduce

  1. Configure papermerge.conf.py to process emails
  2. Send an email with a PDF attachment to the IMAP account watched by papermerge's worker process
  3. Log into the papermerge web application as superuser
  4. Observe that the same email is processed multiple times every 30s and imported over and over again. Further, the email won't be deleted or marked as read from the mailserver. Therefore, the inbox gets overwhelmed by the same email over time.

Environment

Raspberry Pi 4 - ARM
Linux omv 4.19.75-v7l+ #1271 armv7l GNU/Linux
Docker version 19.03.13, build 4484c46
linuxserver/papermerge:latest

kerem closed this issue and added the bug label on 2026-02-25 21:31:23 +03:00.

@ciur commented on GitHub (Feb 22, 2021):

@l4rm4nd, I refactored the code and added more detailed documentation (https://papermerge.com/docs/User%27s%20Manual/consumption.html#imap-email) for document consumption via IMAP.

I have just tested this feature and it works (i.e. with IMPORT_MAIL_DELETE=True the email is deleted and with IMPORT_MAIL_DELETE=False it is not) in the latest development version (soon to be released as version 2.0).


@l4rm4nd commented on GitHub (Feb 25, 2021):

Hey @ciur, I've upgraded to version 2.0. Looks amazing, nicely done!

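However, the IMAP consumption still doesn't work for me. The mails are properly processed and the attached documents are imported into Papermerge. So far, so good. I use the following config:

[Screenshot: https://user-images.githubusercontent.com/21357789/109222778-18492000-77ba-11eb-8183-684268a89374.png]

But the processed mail is neither marked as read nor deleted, and it is imported over and over again into Papermerge's inbox.

[Screenshot: https://user-images.githubusercontent.com/21357789/109223301-b1783680-77ba-11eb-9a9f-6dbc7700dfff.png]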


@l4rm4nd commented on GitHub (Mar 8, 2021):

I've also tested around and noticed that the following line of code can successfully mark my mails as read:

server.add_flags(uid, br'\Seen')

The following flag marks an email as deleted. However, for my mail provider Strato this neither marks the mail as seen nor moves it from INBOX to TRASH:

server.add_flags(uid, br'\Deleted')

Tested these things against https://github.com/ciur/papermerge/blob/stable/1.5.x/papermerge/core/importers/imap.py (line 136).

Marking an imported e-mail as seen with the code above would fix the multiple mail imports.
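
To illustrate why the \Seen flag matters, here is a minimal, self-contained simulation. It assumes the importer polls the mailbox with an UNSEEN search, as the test script in this thread does. `FakeMailbox`, `poll_once` and `run` are hypothetical stand-ins written for this sketch; they are not Papermerge or IMAPClient code.

```python
class FakeMailbox:
    """Hypothetical in-memory stand-in for an IMAP folder (not IMAPClient)."""

    def __init__(self, uids):
        # Map each message UID to its set of IMAP flags.
        self.flags = {uid: set() for uid in uids}

    def search_unseen(self):
        # Mimics an IMAP "UNSEEN" search: messages without the \Seen flag.
        return [uid for uid, f in sorted(self.flags.items()) if br"\Seen" not in f]

    def add_flags(self, uid, flag):
        self.flags[uid].add(flag)


def poll_once(mailbox, imported, mark_seen):
    """One polling cycle: import every unseen message, optionally flag it."""
    for uid in mailbox.search_unseen():
        imported.append(uid)
        if mark_seen:
            mailbox.add_flags(uid, br"\Seen")


def run(cycles, mark_seen):
    """Poll a mailbox holding one message and return the imported UIDs."""
    mailbox, imported = FakeMailbox([101]), []
    for _ in range(cycles):
        poll_once(mailbox, imported, mark_seen)
    return imported


print(run(3, mark_seen=False))  # the same UID is imported on every cycle
print(run(3, mark_seen=True))   # the UID is imported only once
```

Without the flag, every polling cycle finds the same message again, which matches the repeated imports reported in this issue.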


@l4rm4nd commented on GitHub (Mar 8, 2021):

Here is the basic code for my tests:

import ssl
import email
import tempfile
import logging
from imapclient import IMAPClient
from imapclient.exceptions import LoginError

def login(imap_server, username, password):
    ssl_context = ssl.create_default_context()
    ssl_context.check_hostname = False
    ssl_context.verify_mode = ssl.CERT_NONE

    server = IMAPClient(
        imap_server,
        ssl_context=ssl_context
    )

    try:
        server.login(username, password)
    except LoginError:
        print("IMAP Import: ERROR. Login failed.")
        return None

    return server

imap_server = "imap.strato.de"
username = "XXXX"
password = "XXXX"

server = login(
    imap_server=imap_server,
    username=username,
    password=password
)

if server:
    try:
        # select_folder() opens the folder read-write by default,
        # which is required for flag changes to be stored.
        server.select_folder("INBOX")
    except Exception as exc:
        print("IMAP import: Failed to select folder with read-write permissions: %s" % exc)
    else:
        messages = server.search(['UNSEEN'])

        print("IMAP Import: UNSEEN messages %s" % (messages))

        for uid, message_data in server.fetch(
            messages, 'RFC822'
        ).items():
            email_message = email.message_from_bytes(message_data[b"RFC822"])
            print(uid, email_message.get("From"), email_message.get("Subject"))
            # mark the mail as read so the next UNSEEN search skips it
            server.add_flags(uid, br'\Seen')
            # check whether the mail is marked as read
            print(server.get_flags(uid))
else:
    print("IMAP import: Failed to login to imap server")


@ciur commented on GitHub (Mar 9, 2021):

@l4rm4nd, oh, man, I am impressed 👍. I didn't know about server.add_flags(uid, br'\Seen')! Thank you for the tip! I will review and test the issue in a couple of days. Right now, I am recording screencasts and writing blog posts about Papermerge features. If you want (maybe you need the fix urgently, like today :) ), you may create a PR with the changes.

In 2.0 the whole IMAP code was refactored to take the import by user (https://www.papermerge.com/docs/User%27s%20Manual/consumption.html#matching-by-user) and import by secret (https://www.papermerge.com/docs/User%27s%20Manual/consumption.html#one-imap-account-for-many-papermerge-users) features into account; imho the code in the papermerge.core.importers.imap module is now cleaner and easier to read. The entry point for importing document attachments is here: https://github.com/papermerge/papermerge-core/blob/master/papermerge/core/importers/imap.py#L309, which as you can see is part of a different repository (papermerge-core is a reusable Django app).

I like your security acumen 🥇, maybe you could wear the developer hat 🎩 for a day or two as well :)

Also, you might be interested in a new management command called imap (https://github.com/papermerge/papermerge-core/blob/master/papermerge/core/management/commands/imap.py), which is again new in 2.0 and thus lives in the papermerge-core repository. I wrote it to quickly troubleshoot IMAP related issues. You use it like any standard Django command:

$ ./manage.py  imap --connect # checks if can connect to IMAP account
$ ./manage.py  imap --import    # imports the documents from newly (unread) emails

@l4rm4nd commented on GitHub (Mar 9, 2021):

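Hah, funny. I've looked at your code (https://github.com/papermerge/papermerge-core/blob/master/papermerge/core/importers/imap.py#L309) to understand why mails are not deleted for my Strato IMAP account.

[Screenshot: https://user-images.githubusercontent.com/21357789/110463868-b9957780-80d2-11eb-9020-336d635d3ac4.png]

According to the IMAPClient documentation (https://imapclient.readthedocs.io/en/2.1.0/_modules/imapclient/imapclient.html#IMAPClient.delete_messages), the function delete_messages() only sets the DELETED flag for an email message via add_flags().

[Screenshot: https://user-images.githubusercontent.com/21357789/110463607-6de2ce00-80d2-11eb-8a54-044a458c24c7.png]

As I tested above, flagging an email as deleted has no visible effect for Strato accounts. [Struck through by the author: So it seems that this is an individual bug for the provider Strato. I'll research more, maybe I can find a fix.] It turns out that some mail providers are RFC compliant and only delete flagged mails after expunge() is called. See the IMAPv4 RFC (https://tools.ietf.org/html/rfc3501.html#section-6.4.3) and the IMAPClient documentation (https://imapclient.readthedocs.io/en/2.1.0/api.html).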
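
The two-step deletion described above can be sketched as follows. `delete_messages()` and `expunge()` are existing IMAPClient methods; the helper name `delete_permanently` and the `FakeServer` stub (used only so the snippet runs without a mail account) are assumptions made for this illustration.

```python
def delete_permanently(server, uids):
    """Flag messages as deleted, then expunge them from the selected folder.

    `server` is expected to behave like a logged-in imapclient.IMAPClient
    with a folder selected read-write.
    """
    if not uids:
        return
    server.delete_messages(uids)  # step 1: only sets the \Deleted flag
    server.expunge()              # step 2: removes all \Deleted messages


class FakeServer:
    """Hypothetical stub modelling a server that does not auto-expunge."""

    def __init__(self, uids):
        # Map each message UID to its set of IMAP flags.
        self.flags = {uid: set() for uid in uids}

    def delete_messages(self, uids):
        for uid in uids:
            self.flags[uid].add(br"\Deleted")

    def expunge(self):
        # Drop every message that carries the \Deleted flag.
        self.flags = {u: f for u, f in self.flags.items() if br"\Deleted" not in f}


server = FakeServer([1, 2, 3])
server.delete_messages([2])
print(sorted(server.flags))  # the flagged message 2 is still in the folder
delete_permanently(server, [2])
print(sorted(server.flags))  # message 2 is gone after expunge
```

Note that a plain `expunge()` removes every message flagged \Deleted in the selected folder, including ones flagged by other clients.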


@l4rm4nd commented on GitHub (Mar 9, 2021):

Pushed a new pull request (https://github.com/papermerge/papermerge-core/pull/6) for papermerge-core.

This fixes the repeated import of mails by setting the \Seen flag via add_flags(uid, br'\Seen'). It also fixes the issue where mails are not deleted for some mail providers. The issue is that mail providers act differently on flagged mails, and not all of them remove a flagged mail right away. IMAPClient's delete_messages() function only marks mails with the \Deleted flag. To actually delete the flagged mails, one has to call expunge(). If both functions are used, we are RFC compliant and all processed mails should be deleted successfully across various mail providers.
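
A minimal sketch of the post-import handling described above, not the actual code from the pull request. `add_flags()`, `delete_messages()` and `expunge()` are existing IMAPClient methods; the helper `finish_import`, its `delete_after_import` parameter (standing in for IMPORT_MAIL_DELETE) and the `RecordingServer` stub are hypothetical names used only for illustration.

```python
def finish_import(server, uid, delete_after_import):
    """Post-process one imported message (hypothetical helper).

    `server` is assumed to be an IMAPClient-like object with a folder
    selected read-write; `delete_after_import` mirrors IMPORT_MAIL_DELETE.
    """
    # Always mark the message as read so an UNSEEN search skips it next time.
    server.add_flags(uid, [br"\Seen"])
    if delete_after_import:
        server.delete_messages(uid)  # sets the \Deleted flag only
        server.expunge()             # actually removes flagged messages


class RecordingServer:
    """Hypothetical stub that records which IMAP calls were made."""

    def __init__(self):
        self.calls = []

    def add_flags(self, uid, flags):
        self.calls.append("add_flags")

    def delete_messages(self, uid):
        self.calls.append("delete_messages")

    def expunge(self):
        self.calls.append("expunge")


keep, remove = RecordingServer(), RecordingServer()
finish_import(keep, 42, delete_after_import=False)
finish_import(remove, 42, delete_after_import=True)
print(keep.calls)    # only the \Seen flag is set
print(remove.calls)  # flag as seen, flag as deleted, then expunge
```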

[The following proposal was struck through by the author in the original comment.]

Further, I recommend a new approach for deleting mails. In the current release of papermerge-core, if a user sets the consumption option IMPORT_MAIL_DELETE=True to delete mails after processing, the mails are permanently deleted. You won't be able to find the mail again, since it is not moved to a folder like Trash. This can be surprising, and the deletion might not even work for some mail providers (e.g. STRATO).

New approach is:
- Mark all processed mails as seen/read to indicate successful import
- Move mails to a folder like Trash if a user enabled IMPORT_MAIL_DELETE as consumption option

Users might specify a custom folder for 'deleted' mails in future releases as a consumption option (e.g. IMPORT_MAIL_DELETE_FOLDER). This is not implemented yet: the variable trash_folder in email_iterator() is currently not overwritten by a user's variable and is set to the string Trash by default.


@l4rm4nd commented on GitHub (Mar 9, 2021):

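Another interesting resource (https://erlerobotics.gitbooks.io/erle-robotics-python-gitbook-free/content/internet_message_access_protocol_imap/flagging_and_deleting_messages.html) states that IMAPClient's delete_messages() has to be followed by expunge() to take effect.

[Screenshot: https://user-images.githubusercontent.com/21357789/110499130-3c7cf900-80f8-11eb-9b3a-f6ee5259a0c3.png]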


@ciur commented on GitHub (Mar 10, 2021):

The fix is now in Papermerge Core 2.0.0rc48 (https://github.com/papermerge/papermerge-core/releases/tag/v2.0.0rc48).

During the next couple of days I will:

  • switch this repository's CI to GitHub Actions (it currently uses Travis CI, which is obsolete)
  • automate building and publishing of the Docker image with GitHub Actions
  • tag the next release iteration (which will trigger building and publishing of the Docker image based on the release tag)

It will take some time, but it will finally be done right.
(Right now I build and publish the Docker image manually, which is a pain.)


@ciur commented on GitHub (Mar 12, 2021):

@l4rm4nd, release 2.0.0rc48 (https://github.com/ciur/papermerge/releases/tag/v2.0.0rc48) is out and it contains your IMAP related fixes! 🎉


@l4rm4nd commented on GitHub (Mar 12, 2021):

Hey @ciur , thanks for your new release!

Unfortunately, I am getting some internal server errors with the newest version.

 ModuleNotFoundError: No module named 'mglib'

See #345


@l4rm4nd commented on GitHub (Mar 15, 2021):

Fixed. See https://github.com/linuxserver/docker-papermerge/issues/32
