[GH-ISSUE #273] Can't get rid of unwanted folders using ignoreFolders env var #259

Closed
opened 2026-02-27 15:55:51 +03:00 by kerem · 5 comments

Originally created by @fabibio on GitHub (Dec 9, 2019).
Original GitHub issue: https://github.com/RD17/ambar/issues/273

Hi,

I tried to exclude some folders from the crawler, but they still show up in search results.

I run the official docker-compose bundle. In order to ignore folders named "old" I wrote the docker-compose.yml crawler section this way:

documenti:
    depends_on:
      serviceapi:
        condition: service_healthy
    image: repo.ambar.cloud:443/ambar-local-crawler:2.1
    restart: always
    networks:
      - internal_network
    expose:
      - "8082"
    environment:
      - name=documenti
      - ignoreFolders=**/old/**
    volumes:
      - /opt/ambar/mounts/Documenti:/usr/data

The "docker inspect" output looks consistent with the above config. Am I missing something?

Also, is there a way to ignore multiple folders? Are folder names passed via env vars case-insensitive?

Thanks in advance.

Fabio

kerem 2026-02-27 15:55:51 +03:00
  • closed this issue
  • added the wontfix label

@sochix commented on GitHub (Dec 9, 2019):

Hello Fabio!
Did you add the ignoreFolders env var on an empty index? Or had you already crawled some docs and now want to ignore them?


@fabibio commented on GitHub (Dec 10, 2019):

Sorry, forgot to properly quote the wildcards: the relevant env var is ignoreFolders=**/old/**


@fabibio commented on GitHub (Dec 10, 2019):

At first I added the env var to an existing crawler whose documents had already been crawled, and it didn't work. Then I realized that the variable is in the crawler's scope, so if a document has already been indexed there's no way for the API to know I don't want it :-) So I destroyed the containers, ran rm -fr on the es/mongo/rabbit data dirs (external mount) and rebuilt the whole thing from scratch, but that didn't work either.

Is there a way to enable some sort of debug logging in crawler?

I attached the whole docker-compose.yml. I made a couple of fixes to the frontend, so we use a local image.

docker-compose.yml.txt (https://github.com/RD17/ambar/files/3945394/docker-compose.yml.txt)

@fabibio commented on GitHub (Dec 13, 2019):

Hi guys,

I had a look at the code, and it turned out that the issue is due to dots and/or dashes in the path (the minimatch globstar implementation is rather simplistic, isn't it?). I implemented a new ignoreFoldersRe criterion based on regexps.

Cheers
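
For readers wondering what a regexp-based folder filter could look like, here is a minimal sketch in Python. It is not the ignoreFoldersRe patch mentioned above and not Ambar's crawler code (which is not shown in this thread); the function names build_ignore_re and is_ignored are hypothetical, and the behaviour (several folder names, optional case-insensitive matching, whole path segments only) is an assumption chosen to illustrate the questions raised in the original post.

```python
import re
from pathlib import PurePosixPath


def build_ignore_re(folder_names, ignore_case=True):
    """Compile one regex matching any path that contains one of the folders.

    Hypothetical helper: illustrates the idea only, not Ambar's implementation.
    """
    # re.escape keeps dots and dashes in folder names literal
    alternatives = "|".join(re.escape(name) for name in folder_names)
    flags = re.IGNORECASE if ignore_case else 0
    # (?:^|/) and (?:/|$) restrict the match to whole path segments
    return re.compile(rf"(?:^|/)(?:{alternatives})(?:/|$)", flags)


def is_ignored(path, ignore_re):
    """Return True if any directory component of `path` is an ignored folder."""
    # Test only the directory part, so a file merely named "old" is not skipped
    parent = PurePosixPath(path).parent.as_posix()
    return ignore_re.search(parent) is not None


if __name__ == "__main__":
    ignore_re = build_ignore_re(["old", "tmp"])
    for candidate in [
        "/usr/data/my-project.v2/old/report.pdf",
        "/usr/data/Old/a.txt",
        "/usr/data/older/b.txt",
    ]:
        print(candidate, is_ignored(candidate, ignore_re))
```

Because the pattern is matched against plain path strings, dots and dashes elsewhere in the path have no special meaning, which is the property the glob-based check apparently lacked in this case.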


@stale[bot] commented on GitHub (Dec 28, 2019):

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
