mirror of
https://github.com/RD17/ambar.git
synced 2026-04-25 07:25:55 +03:00
[GH-ISSUE #273] Can't get rid of unwanted folders using ignoreFolders env var #259
Originally created by @fabibio on GitHub (Dec 9, 2019).
Original GitHub issue: https://github.com/RD17/ambar/issues/273
Hi,
I tried to exclude some folders from the crawler, but they still show up in search results.
I run the official docker-compose bundle. In order to ignore folders named "old" I wrote the docker-compose.yml crawler section this way:
    documenti:
      depends_on:
        serviceapi:
          condition: service_healthy
      image: repo.ambar.cloud:443/ambar-local-crawler:2.1
      restart: always
      networks:
        - internal_network
      expose:
        - "8082"
      environment:
        - name=documenti
        - ignoreFolders=**/old/**
      volumes:
        - /opt/ambar/mounts/Documenti:/usr/data

The "docker inspect" output looks consistent with the above config. Am I missing something?
Also, is there a way to ignore multiple folders? Are folder names passed via env vars case insensitive?
Thanks in advance.
Fabio
@sochix commented on GitHub (Dec 9, 2019):
Hello Fabio!
Did you add the ignoreFolders env var on an empty index? Or had you already crawled some docs and want to ignore them later on?
@fabibio commented on GitHub (Dec 10, 2019):
Sorry, forgot to properly quote the wildcards: the relevant env var is
ignoreFolders=**/old/**
@fabibio commented on GitHub (Dec 10, 2019):
At first I added the env var to an existing crawler with documents already... crawled, and it didn't work. Then I suddenly realized that the variable is in the crawler scope, and if a document has already been indexed there's no way for the API to know I don't want it :-) So I destroyed the containers, ran rm -fr on the es/mongo/rabbit data dirs (external mounts) and rebuilt the whole thing from scratch, but that didn't work either.
Is there a way to enable some sort of debug logging in crawler?
I attached the whole docker-compose.yml. I made a couple of fixes to the frontend, so we use a local image.
docker-compose.yml.txt
@fabibio commented on GitHub (Dec 13, 2019):
Hi guys,
I had a look at the code and it turned out that the issue is due to dots and/or dashes in the path (the minimatch globstar implementation is rather simplistic, isn't it?). I implemented a new ignoreFoldersRe criterion based on regexp.
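To illustrate the general idea of a regexp-based folder filter, here is a minimal sketch in Python. It is not the actual crawler code or the patch mentioned above: the names `should_ignore`, `IGNORE_OLD_RE` and `IGNORE_MULTI_RE`, and both patterns, are hypothetical and chosen only for this example. It assumes POSIX-style paths like the `/usr/data` mount in the compose file.

```python
import re
from pathlib import PurePosixPath

# Hypothetical pattern: match any directory component named exactly "old".
IGNORE_OLD_RE = re.compile(r"(^|/)old(/|$)")

# Hypothetical pattern: several folder names at once, case-insensitively.
IGNORE_MULTI_RE = re.compile(r"(^|/)(old|backup)(/|$)", re.IGNORECASE)


def should_ignore(path: str, pattern: re.Pattern = IGNORE_OLD_RE) -> bool:
    """Return True if the directory part of `path` matches `pattern`.

    Only the parent directory is tested, so a file that merely has
    "old" in its own name is not skipped.
    """
    folder = str(PurePosixPath(path).parent)
    return pattern.search(folder) is not None


if __name__ == "__main__":
    for p in (
        "/usr/data/old/report.pdf",
        "/usr/data/my-project.v2/old/notes.txt",
        "/usr/data/older/report.pdf",
    ):
        print(p, should_ignore(p))
```

Because the pattern is applied to the plain directory string, dots and dashes elsewhere in the path are just ordinary characters. In this sketch, alternation such as `(old|backup)` covers multiple folders, and `re.IGNORECASE` makes the match case insensitive.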
Cheers
@stale[bot] commented on GitHub (Dec 28, 2019):
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.