[GH-ISSUE #30] Ambar is not extracting image text from pdf #28

Closed
opened 2026-02-27 15:54:32 +03:00 by kerem · 4 comments
Owner

Originally created by @N-CP on GitHub (May 4, 2017).
Original GitHub issue: https://github.com/RD17/ambar/issues/30

I uploaded a PDF to Ambar, but the text in its images is not extracted into the output .txt file.
[CLASS-1-6.pdf](https://github.com/RD17/ambar/files/976358/CLASS-1-6.pdf)

kerem closed this issue 2026-02-27 15:54:33 +03:00

@sochix commented on GitHub (May 4, 2017):

Hello! I looked at your sample file and found that it is a PDF of scanned images with more than a thousand pages. The exact time depends on your machine's configuration, but on average it takes more than 8 hours to OCR 1000 pages. I recommend waiting until Ambar has processed the file completely.
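
For a rough sense of scale, the figure quoted above (more than 8 hours per 1000 pages) can be turned into a per-page rate and applied to other page counts. This is only a back-of-the-envelope sketch: the function names are hypothetical, the 8-hour value is a lower bound taken from the comment, and real throughput depends on the machine.

```python
def seconds_per_page(total_hours: float, pages: int) -> float:
    """Average OCR time per page, given a total duration and a page count."""
    return total_hours * 3600 / pages


def estimate_hours(pages: int, sec_per_page: float) -> float:
    """Estimated total OCR time in hours for a document of `pages` pages."""
    return pages * sec_per_page / 3600


# Lower bound from the comment: at least 8 hours for 1000 pages.
rate = seconds_per_page(8, 1000)
print(f"at least {rate:.1f} s per page")

# Hypothetical example: a 1500-page scanned PDF at the same rate.
print(f"at least {estimate_hours(1500, rate):.1f} h for 1500 pages")
```

At that rate a scanned PDF of this size can easily take most of a day, so an empty .txt output shortly after upload does not by itself indicate a failure.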


@N-CP commented on GitHub (May 5, 2017):

OK, thanks.


@N-CP commented on GitHub (May 5, 2017):

I started Ambar, but it is not loading in the browser.
I get a 502 Bad Gateway error.


@sochix commented on GitHub (May 10, 2017):

Please create a new issue for this error.
