[GH-ISSUE #194] Indexing multiple languages in one dataset #192


Closed

opened 2026-02-27 15:55:33 +03:00 by kerem · 1 comment

kerem commented

2026-02-27 15:55:33 +03:00

Owner

Originally created by @Loo0D on GitHub (Oct 21, 2018).
Original GitHub issue: https://github.com/RD17/ambar/issues/194

Hello!

We are trialling Ambar as a lightweight e-discovery product.

The documentation states:

> Replace `${langAnalyzer}` value with language analyzer you want Ambar apply while indexing your documents, supported analyzers: English `ambar_en`, Russian `ambar_ru`, German `ambar_de`, Italian `ambar_it`, Polish `ambar_pl`, Chinese `ambar_cn`, CJK `ambar_cjk`

Our sample dataset was configured with `ambar_en`. The dataset contains English and Greek documents, and the search finds both, which is great. However, it would be good to understand what exactly the `langAnalyzer` flag does, i.e. whether it only applies to Tesseract/OCR.

In other words, what are we missing on the Greek side (in this case) by setting the analyser to English?

Thanks!


kerem

2026-02-27 15:55:33 +03:00

closed this issue
added the wontfix label

kerem commented

2026-02-27 15:55:34 +03:00

Author

Owner

@stale[bot] commented on GitHub (Nov 5, 2018):

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
