[GH-ISSUE #571] Additionally installed OCR language is rejected by web UI backend #443

Closed
opened 2026-02-25 21:31:56 +03:00 by kerem · 3 comments
Owner

Originally created by @lehnerpat on GitHub (Dec 31, 2023).
Original GitHub issue: https://github.com/ciur/papermerge/issues/571

Originally assigned to: @ciur on GitHub.

Description
After installing an additional OCR language (for example, Japanese) as described in the docs, the additional language can be used in OCR by setting it as the default, but it cannot be used from the web UI because the backend rejects it as an invalid value.

Expected
Additionally installed languages should be usable from web UI, just like the default languages.

Actual
The additional language shows up in the language selection dropdown for running OCR:
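[Screenshot: CleanShot 2023-12-31 at 17 12 25@2x, https://github.com/ciur/papermerge/assets/1099818/85fb3675-8313-4afc-a24a-62b5b52061d7]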

But when you click "Start", the backend responds with a 422 error saying the additional language is not an allowed value for the enum.

Additionally, the UI completely ignores this error and doesn't show any error message :(

Full error payload:

{
    "detail": [
        {
            "type": "enum",
            "loc": [
                "body",
                "lang"
            ],
            "msg": "Input should be 'deu','fra','eng','ita','spa','por' or 'ron'",
            "input": "jpn",
            "ctx": {
                "expected": "'deu','fra','eng','ita','spa','por' or 'ron'"
            }
        }
    ]
}
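
The payload above is the kind of response one would expect when the request schema restricts `lang` to a fixed enum. As a rough illustration only (stdlib Python, not the actual Papermerge code; `OCRLang` and `validate_lang` are hypothetical names), a hardcoded enum built from the seven codes listed in the error rejects `jpn` like this:

```python
from enum import Enum


class OCRLang(str, Enum):
    """Hypothetical stand-in for a hardcoded language enum (codes taken from the 422 payload)."""

    deu = "deu"
    fra = "fra"
    eng = "eng"
    ita = "ita"
    spa = "spa"
    por = "por"
    ron = "ron"


def validate_lang(value: str) -> OCRLang:
    """Return the matching enum member, or raise ValueError for an unknown code."""
    try:
        return OCRLang(value)
    except ValueError:
        allowed = ", ".join(repr(member.value) for member in OCRLang)
        raise ValueError(f"Input should be one of {allowed}; got {value!r}") from None


if __name__ == "__main__":
    print(validate_lang("eng").value)  # a code from the hardcoded set is accepted
    try:
        validate_lang("jpn")  # installed in tesseract, but missing from the enum
    except ValueError as exc:
        print(exc)
```

Whether tesseract has the language pack installed never enters this check, which would explain why setting `jpn` as the default works while the web UI request fails.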

Browser console screenshot:
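[Screenshot: CleanShot 2023-12-31 at 17 12 41@2x, https://github.com/ciur/papermerge/assets/1099818/aa846a18-aa31-4bc2-8449-f72a609b7c82]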

Info:

  • OS: macOS Sonoma 14.1.2 (23B92), Architecture: Intel (x86_64)
  • Browser: Safari 17.1.2 (19616.2.9.11.12)
  • Database: SQLite
  • Papermerge Version: 3.0

More info about setup:

  • Using custom docker image with Japanese language package for tesseract installed, following instructions: https://docs.papermerge.io/3.0/setup/add-ocr-langs/

    • Dockerfile:

      FROM papermerge/papermerge:3.0
      
      # add Japanese OCR language
      RUN apt install tesseract-ocr-jpn
      
    • Built with: docker build -t mypaper:3.0 -f Dockerfile .

  • Using Docker Compose, following instructions: https://docs.papermerge.io/3.0/setup/docker-compose/

    • Changed image to use my custom one (mypaper:3.0)
    • Changed username and password
    • Set additional env var PAPERMERGE__OCR__DEFAULT_LANGUAGE: jpn
kerem 2026-02-25 21:31:56 +03:00
Author
Owner

@ciur commented on GitHub (Dec 31, 2023):

Thank you for the well-structured bug report!

The issue happens because the language codes are currently hardcoded in three places:

  1. in the backend: https://github.com/papermerge/papermerge-core/blob/e1e8ea107430bf2a0b13359dd6f0bff818145936/papermerge/core/schemas/tasks.py#L8
  2. in the UI: https://github.com/papermerge/papermerge-core/blob/master/ui/src/types.ts#L272
  3. and here: https://github.com/papermerge/papermerge-core/blob/e1e8ea107430bf2a0b13359dd6f0bff818145936/ui/src/cconstants.ts#L10

The fix would be to, well, just extend the current set of hardcoded values with another batch of languages (incl. Japanese).
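
The fix described above, extending the hardcoded set, might look roughly like the following sketch. It uses only the Python standard library and is not the code from the actual repository; `BASE_LANGS`, `EXTRA_LANGS`, `OCRLang`, and `is_allowed` are hypothetical names, and `jpn` is the only extra code shown because it is the only one this issue names.

```python
from enum import Enum

# Codes listed in the 422 error payload from the report.
BASE_LANGS = ("deu", "fra", "eng", "ita", "spa", "por", "ron")

# Additional codes; only "jpn" is named in this issue.
EXTRA_LANGS = ("jpn",)

# Build the enum from a single tuple so that new codes are added in one place.
OCRLang = Enum(
    "OCRLang",
    {code: code for code in BASE_LANGS + EXTRA_LANGS},
    type=str,
)


def is_allowed(code: str) -> bool:
    """Return True if the code is a member of the (extended) hardcoded set."""
    try:
        OCRLang(code)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for candidate in ("eng", "jpn", "xyz"):
        print(candidate, is_allowed(candidate))
```

Since the UI keeps its own copies of the list (items 2 and 3 above), the same codes would presumably need to be added there as well, otherwise the dropdown and the backend could disagree again.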

@ciur commented on GitHub (Jan 12, 2024):

PR#300 (https://github.com/papermerge/papermerge-core/pull/300) adds extra language codes (incl. Japanese).

The pull request was merged and will be available as part of the Papermerge 3.0.1 release.

@ciur commented on GitHub (Jan 25, 2024):

Fixed in 3.0.2: https://github.com/ciur/papermerge/releases/tag/3.0.2