[GH-ISSUE #158] slow crawl #158

Closed
opened 2026-02-27 15:55:22 +03:00 by kerem · 2 comments
Owner

Originally created by @denis1482 on GitHub (May 14, 2018).
Original GitHub issue: https://github.com/RD17/ambar/issues/158

When crawling a large storage (100k+ docx/xlsx files) on a CIFS-mounted share, 2.0.0rc performs very poorly, adding fewer than 50 files per day. The host runs Ubuntu 18.04 64-bit with plenty of RAM and CPUs.


@sochix commented on GitHub (May 16, 2018):

Please update Ambar; we have improved performance. You should also tune docker-compose to give ES as much RAM as you can and add 2-3 pipelines.
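
For readers unsure what that tuning looks like, here is a rough sketch of the kind of change meant, not the project's official configuration. The service names (`es`, `pipeline0` to `pipeline2`), the image name, and the 8g heap value are assumptions for illustration; check them against the `docker-compose.yml` shipped with your Ambar version. `ES_JAVA_OPTS` is the standard environment variable the Elasticsearch Docker images read for JVM heap settings.

```yaml
# Hypothetical excerpt of docker-compose.yml. Service names, image name, and
# heap size are illustrative assumptions, not taken from the Ambar repository.
services:
  es:
    environment:
      # Elasticsearch JVM heap: set min and max to the same value.
      # General Elasticsearch guidance is to stay at or below about half
      # of the host's RAM.
      - ES_JAVA_OPTS=-Xms8g -Xmx8g

  # Several pipeline workers so documents are processed in parallel.
  # Each entry should otherwise mirror your existing pipeline service
  # (networks, environment, and any per-worker id your file defines).
  pipeline0:
    image: ambar/ambar-pipeline:latest
  pipeline1:
    image: ambar/ambar-pipeline:latest
  pipeline2:
    image: ambar/ambar-pipeline:latest
```

After editing the file, recreating the containers (for example with `docker-compose up -d`) should apply the new settings.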


@denis1482 commented on GitHub (May 16, 2018):

Fixed in 2.1.8
