[GH-ISSUE #213] Not crawling updated data source #208

Closed
opened 2026-02-27 15:55:38 +03:00 by kerem · 0 comments
Owner

Originally created by @s1rk1t on GitHub (Jan 15, 2019).
Original GitHub issue: https://github.com/RD17/ambar/issues/213

I uploaded a bunch of new data to be crawled, and when I changed the name of the data path in the compose file the crawler no longer sees the data. Is there something else I need to do to get the crawler pointed to the right spot?

Here's the code from my compose file:

```yaml
newcrawler:
  depends_on:
    serviceapi:
      condition: service_healthy
  image: ambar/ambar-local-crawler
  restart: always
  networks:
    - internal_network
  expose:
    - "8082"
  environment:
    - name=newcrawler
  volumes:
    - /home/ec2-user/ambar/AMBAR/newData:/usr/data
```

I'm running ambar on an ec2 instance, if that matters. I'm trying to crawl around 500 files, none of them very large.

Update: I got a fresh copy of AMBAR downloaded and it sees the data like it should, so I am retracing my steps to see where I might have messed it up somehow.

Any help would be greatly appreciated.

Update: got it working, but I had to download a fresh installation to do it. After downloading, I included my updated data source in the initial build, then moved the resulting db, es, and rabbit files, along with the new data, over to my older build, which was then able to integrate the new data into its search process. This seems far from optimal: I should be able to just update the data source in the .yml file and have it crawl the new data.
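For reference, a minimal sketch of how an updated volume path is normally applied with Docker Compose (assuming the `newcrawler` service name from the compose file above; whether Ambar's crawler then re-registers the source internally is a separate question). Bind mounts are fixed at container creation time, so after editing the path in the .yml the container must be recreated, not merely restarted:

```shell
# Recreate the service so the edited volume mapping takes effect.
# A plain "restart" reuses the old container and keeps the old mount.
docker-compose up -d --force-recreate newcrawler

# Verify the new host path is actually visible inside the container:
docker-compose exec newcrawler ls /usr/data
```

If `ls /usr/data` shows the new files but the crawler still ignores them, the problem is likely in Ambar's own source bookkeeping rather than the Docker mount.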

kerem closed this issue 2026-02-27 15:55:39 +03:00