[GH-ISSUE #194] Indexing multiple languages in one dataset #192

Closed
opened 2026-02-27 15:55:33 +03:00 by kerem · 1 comment

Originally created by @Loo0D on GitHub (Oct 21, 2018).
Original GitHub issue: https://github.com/RD17/ambar/issues/194

Hello!

We are trialling Ambar as a lightweight e-discovery product.

The documentation states:

> Replace ${langAnalyzer} value with language analyzer you want Ambar apply while indexing your documents, supported analyzers: English ambar_en, Russian ambar_ru, German ambar_de, Italian ambar_it, Polish ambar_pl, Chinese ambar_cn, CJK ambar_cjk

Our sample dataset was configured with `ambar_en`. The dataset contains English and Greek documents, and the search finds both, which is great. However, it would be good to understand what exactly the `langAnalyzer` flag does: does it apply only to Tesseract/OCR?

In other words, what are we missing on the Greek side (in this case) by setting the analyser to English?

Thanks!

kerem closed this issue and added the wontfix label on 2026-02-27 15:55:33 +03:00.

@stale[bot] commented on GitHub (Nov 5, 2018):

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
