[GH-ISSUE #128] Problem with non ASCII characters in file names. #98

Closed
opened 2026-02-25 21:31:12 +03:00 by kerem · 13 comments
Owner

Originally created by @N3tFX on GitHub (Sep 18, 2020).
Original GitHub issue: https://github.com/ciur/papermerge/issues/128

If a file is uploaded with a file name that contains non-ASCII characters, Papermerge can't process it and shows a broken icon. If the same file is renamed before upload so that its file name contains only ASCII characters, everything works. I've tried it in the Docker version.

kerem 2026-02-25 21:31:12 +03:00
  • closed this issue
  • added the bug label

@ciur commented on GitHub (Sep 19, 2020):

Hi @N3tFX,

Can you please be more specific?
Which non-ASCII characters were used in the file name? Was it a PDF file, or maybe JPEG, TIFF or PNG? Was another OCR background job running (that is, processing another file in the meantime)?


@N3tFX commented on GitHub (Sep 19, 2020):

Basically, ASCII characters (https://en.wikipedia.org/wiki/ASCII) are the ones used in the English language, plus digits. By non-ASCII characters I mean characters that are not part of the English alphabet. I have tried it with pdf, jpg and png files; it was the file name that was the problem. When I renamed the files using only English characters from the ASCII table, everything worked. No OCR was running in the background, as these files were the first I tried to upload.


@ciur commented on GitHub (Sep 19, 2020):

@N3tFX,
I tested the application with German letters (e.g. Änderungen.pdf) and with Russian Cyrillic (e.g. файл.pdf), and it worked as expected.
Which characters did you use?


@N3tFX commented on GitHub (Sep 19, 2020):

I have tried it in many other languages, even Japanese. A file uploads correctly only if its filename is in English. Remember, I am using the Docker version (v1.4.2). Maybe this happens only in the Docker container. By the way, here is the error message I see in the logs when I upload a file with non-English characters in the file name:

`UnicodeEncodeError: 'ascii' codec can't encode characters in position 105-107: ordinal not in range(128)`.

The exact same files with English filenames upload without problems.
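For reference, the same class of error can be reproduced in plain Python by encoding a non-ASCII file name with the `ascii` codec. This is only an illustrative sketch, not Papermerge code: the helper `try_encode` is a hypothetical name, and the thread does not show which call inside the container actually raises the error.

```python
# Example name taken from earlier in this thread; any non-ASCII name behaves the same.
filename = "файл.pdf"


def try_encode(name, encoding):
    """Return (encoded_bytes, None) on success or (None, error_text) on failure."""
    try:
        return name.encode(encoding), None
    except UnicodeEncodeError as exc:
        return None, str(exc)


# Encoding with the 'ascii' codec fails for any character above code point 127.
ascii_bytes, ascii_error = try_encode(filename, "ascii")

# UTF-8 can represent every Unicode character, so this succeeds.
utf8_bytes, utf8_error = try_encode(filename, "utf-8")

print("ascii:", ascii_error)
print("utf-8:", utf8_bytes)
```

The reported positions differ from the log line above because there the failing string was presumably a longer path rather than a bare file name.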


@ciur commented on GitHub (Sep 22, 2020):

Which Docker image do you use? Is it this one (https://hub.docker.com/repository/docker/eugenci/papermerge)? Or the one provided by linuxserverio (https://fleet.linuxserver.io/image?name=linuxserver/papermerge)?


@N3tFX commented on GitHub (Sep 22, 2020):

I am using the linuxserver/papermerge Docker image. I have tried to run eugenci/papermerge, but I can't make it run. It complains about not finding tesseract and not finding the db. I have even tried installing it using the instructions in the Papermerge manual (https://papermerge.readthedocs.io/en/latest/setup/docker.html), but the installation fails with the error:

`ERROR: Cannot locate specified Dockerfile: app.dockerfile`

Another thing I like about the linuxserver/papermerge image is that it has environment options for bind volumes. This way, if something happens to the Papermerge container, at least the documents are safe outside the container on the host machine.


@ciur commented on GitHub (Sep 22, 2020):

Ah, and one last question. Which OS do you use? Windows/Linux/macOS?
I will create an "issue template" so that these questions automatically pop up when a new issue is opened.


@N3tFX commented on GitHub (Sep 22, 2020):

I am using Debian with the latest Docker version from the Docker repositories.


@ciur commented on GitHub (Sep 23, 2020):

@N3tFX, thank you! I will investigate the issue.


@stoykovstoyk commented on GitHub (Sep 26, 2020):

I had the exact same problem, and the problem is with the linuxserver.io Docker image. Don't use this Docker image. I spent a lot of time before I realized that the non-ASCII filename error is due to the linuxserver Docker image and not due to anything related to Papermerge.

Use the official documentation provided by Papermerge and install it yourself, and you will have it up and running. This is what I did and it fixed it.

This is not a Papermerge bug and @ciur should not waste time on it.
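For anyone comparing the two setups, a quick way to see which encodings a Python process picks up from its environment is sketched below. It assumes, without confirmation from this thread, that the difference between the images lies in locale or encoding settings; `encoding_report` and `can_encode` are hypothetical helper names, and the script would need to be run with the same interpreter and environment as the application.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the encodings Python derives from the current environment."""
    return {
        "filesystem_encoding": sys.getfilesystemencoding(),
        "preferred_encoding": locale.getpreferredencoding(False),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
    }


def can_encode(name, encoding):
    """Return True if the file name can be represented in the given encoding."""
    try:
        name.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


report = encoding_report()
for key, value in report.items():
    print(f"{key}: {value}")

# A name from this thread: it fits in UTF-8 but not in ASCII.
print(can_encode("Änderungen.pdf", "utf-8"))
print(can_encode("Änderungen.pdf", "ascii"))
```

If any of the reported encodings turns out to be ASCII inside one container but UTF-8 in a working installation, that would be a reasonable lead to include in a bug report for the image.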


@ciur commented on GitHub (Sep 26, 2020):

I agree with @stoykovstoyk.
Bugs related to external Docker images must be handled by the respective third parties.
@N3tFX, please open a bug at the linuxserver.io repo for the papermerge Docker image (https://github.com/linuxserver/docker-papermerge).


@SokolovskiR commented on GitHub (Oct 10, 2020):

Hello, @stoykovstoyk! Can you describe how you managed to set up the official Docker image? I could only install the linuxserver.io edition. When I try to install the original image via portainer.io on my Synology DS218+, I get an error message about an unsupported version. Is there another way to install it?

Image: https://user-images.githubusercontent.com/59457122/95657147-24865680-0b13-11eb-906b-0a8cd47ae022.png


@stoykovstoyk commented on GitHub (Oct 11, 2020):

I haven't used the Docker image. I installed it using the OS-specific instructions. You just have to follow the instructions and read them carefully. I have done it and everything works fine.
