[GH-ISSUE #59] imap crawler: [error] error retrieving message b'20' failded to fetch #60

Closed
opened 2026-02-27 15:54:46 +03:00 by kerem · 2 comments

Originally created by @buster39 on GitHub (Aug 6, 2017).
Original GitHub issue: https://github.com/RD17/ambar/issues/59

Hello,

I tried several IMAP servers with the crawler, but only a few local installations worked as expected.

I still have problems with Gmail, and outlook.com gave me the same error:

2017-08-06 11:31:28.752: [info] filecrawler initialized
2017-08-06 11:31:30.049: [info] crawling xxx@gmail.com at imap.gmail.com
2017-08-06 11:31:30.349: [error] error retrieving message b'20' failded to fetch
2017-08-06 11:31:30.650: [info] done

My config:

{
  "id": "Gmail",
  "uid": "Gmail_d033e22ae348aeb5660fc2140aec35850c4da997",
  "description": "Test",
  "type": "imap",
  "locations": [
    {
      "host_name": "imap.gmail.com",
      "ip_address": "",
      "location": "xxx@gmail.com"
    }
  ],
  "file_regex": "(\\.doc[a-z]*$)|(\\.xls[a-z]*$)|(\\.txt$)|(\\.csv$)|(\\.htm[a-z]*$)|(\\.ppt[a-z]*$)|(\\.pdf$)|(\\.msg$)|(\\.zip$)|(\\.eml$)|(\\.rtf$)|(\\.md$)|(\\.png$)|(\\.bmp$)|(\\.tif[f]*$)|(\\.jp[e]*g$)|(\\.hwp$)",
  "credentials": {
    "auth_type": "basic",
    "login": "xxx@gmail.com",
    "password": "******",
    "token": ""
  },
  "schedule": {
    "is_active": false,
    "cron_schedule": "*/15 * * * *"
  },
  "max_file_size_bytes": 30000000,
  "verbose": true
}

Thank you!

kerem closed this issue and added the bug label on 2026-02-27 15:54:46 +03:00.

@akropp commented on GitHub (Aug 11, 2017):

I found that the following change to imapcrawler.py makes Gmail work.

From:
callResult, data = self.connection.fetch(messageId, '(RFC822)')

To:
callResult, data = self.connection.uid('fetch', messageId, '(BODY.PEEK[])')

I'm not sure why calling the fetch method directly, instead of going through the uid call, makes it choke on the message ids. Also, changing RFC822 to BODY.PEEK[] keeps your mail unread.
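
For readers comparing the two calls, here is a minimal sketch of the UID-based pattern using Python's standard `imaplib`. One plausible explanation for the failure, which this thread does not confirm, is that the message ids were obtained from a UID search: `fetch()` treats its argument as a positional sequence number, whereas `uid('fetch', ...)` sends `UID FETCH` and treats it as a UID. The function names below are hypothetical and are not taken from imapcrawler.py; `connection` is assumed to be a logged-in `imaplib.IMAP4_SSL` object with a mailbox already selected.

```python
def list_message_uids(connection):
    """Return the UIDs (as bytes) of all messages in the selected mailbox."""
    # "UID SEARCH ALL" returns UIDs rather than sequence numbers.
    status, data = connection.uid('search', None, 'ALL')
    if status != 'OK':
        raise RuntimeError('UID SEARCH failed: %r' % (data,))
    # imaplib returns a one-element list holding space-separated ids.
    return data[0].split() if data and data[0] else []


def fetch_message_by_uid(connection, message_uid):
    """Return the raw bytes of one message without marking it as read."""
    # BODY.PEEK[] retrieves the full message like RFC822 does,
    # but does not set the \Seen flag on the server.
    status, data = connection.uid('fetch', message_uid, '(BODY.PEEK[])')
    # A successful fetch yields [(response_header, raw_message), b')'];
    # an unknown UID yields [None].
    if status != 'OK' or not data or not isinstance(data[0], tuple):
        raise RuntimeError('failed to fetch message %r' % (message_uid,))
    return data[0][1]
```

Because both helpers go through `uid(...)`, the ids produced by `list_message_uids` can be passed straight to `fetch_message_by_uid`; mixing a UID search with a plain `fetch()` would not be safe, since the two numbering schemes generally differ.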


@isido993 commented on GitHub (Aug 22, 2017):

Implemented, see commit 2fc84df85cd06895e0ec1b282348c64672d035ab (https://github.com/RD17/ambar-crawler/commit/2fc84df85cd06895e0ec1b282348c64672d035ab).
Thanks for your input!