mirror of
https://github.com/ciur/papermerge.git
synced 2026-04-25 12:05:58 +03:00
[GH-ISSUE #99] OCR Working, But Automates Not; OCR not recognizing spaces in text; & Django Setting 'DEBUG = False' Breaks Login Page in Chrome v84 #78
Originally created by @dohlin on GitHub (Aug 24, 2020).
Original GitHub issue: https://github.com/ciur/papermerge/issues/99
I have the following Automate set up:
And I know OCR seems to be working as I can search the inbox for the term the automate is looking for and it pulls a result:
But the document stays in the inbox, and won't move to the Utilities folder (I've waited a good 15+ minutes). Am I doing something wrong?
OCR also appears not to recognize spaces in the text: if I open a document in the inbox, highlight a bunch of text, copy it with Ctrl+C, and paste it into a text editor (e.g. Word), the words are all there, but with no spaces between them. Is there a setting to fix this?
Also, setting 'DEBUG = False' in settings.py breaks the login page in Chrome 84: it gives errors about the HTML MIME type and refused CSS. Not sure if this is known or expected.
@ciur commented on GitHub (Aug 26, 2020):
Hi @dohlin, your Automates configuration looks correct. There is indeed an issue. At this point it is difficult for me to track down the problem. This is actually the real problem: there is currently no way to find out why automates missed certain documents. There should be some sort of activity log in the UI for the user. There is a duplicate issue, #88.
The copy/paste behavior is a separate problem. This is how it works at this point.
And lastly, DEBUG=False giving errors about the HTML MIME type: that is very strange. I will give it a try and come back with details.
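One possible explanation for the DEBUG=False symptom, offered as an assumption rather than a confirmed diagnosis for this report: Django's development server serves static files only while DEBUG is True. With DEBUG = False, a request for a stylesheet can come back as an HTML error page, which the browser then refuses to apply as CSS. A minimal sketch of a settings.py fragment for that case follows; the host name and directory are hypothetical placeholders, not values from Papermerge.

```python
# settings.py fragment (sketch only; the values below are assumptions)
DEBUG = False

# Django requires ALLOWED_HOSTS to be set when DEBUG is False.
ALLOWED_HOSTS = ["papermerge.example.com"]  # hypothetical host name

# URL prefix under which static files are requested by the browser.
STATIC_URL = "/static/"

# Directory that `python manage.py collectstatic` copies static files into;
# a web server (not Django itself) is expected to serve this directory.
STATIC_ROOT = "/var/www/papermerge/static/"  # hypothetical directory
```

With settings along these lines, running `python manage.py collectstatic` and pointing the web server at STATIC_ROOT would be the usual route. For a quick local check only, `python manage.py runserver --insecure` makes the development server serve static files even with DEBUG = False.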
@mikkelnl commented on GitHub (Aug 26, 2020):
I'm also testing Papermerge as a possible upgrade from Paperless ;-) and also found that Automates don't seem to work. I watched the console as I uploaded a new PDF, and found this error, which seems to point to automation?
@ciur commented on GitHub (Aug 29, 2020):
Hi guys, @mikkelnl, @dohlin I am investigating Automates related issues.
To make it easier to track automates, I am adding so-called "user logs", so that you will be able to follow the main events directly in the UI, such as an Automate run, an Automate match, or a mismatch.
I will come back next week with more details.
@mikkelnl commented on GitHub (Aug 29, 2020):
Great, if there's anything I can do to test etc, let me know.
@ciur commented on GitHub (Aug 31, 2020):
Ah, I found the issue! I made a stupid mistake! Automates were matched against the hOCR text, not the text itself, which results in a low matching rate. Anyway, I will fix this. As a bonus you will get UI logs where you will be able to follow the whole matching/mismatching process.
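To illustrate why matching against hOCR gives a low hit rate, here is a minimal sketch. It is not Papermerge's actual code: `hocr_to_text`, `automate_matches`, and the sample markup are hypothetical names and data invented for the example. hOCR wraps each recognized word in its own element, so a multi-word pattern never appears contiguously in the raw markup, while single words may still match.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an hOCR/HTML fragment, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Skip whitespace-only nodes that sit between word elements.
        if data.strip():
            self.chunks.append(data.strip())


def hocr_to_text(hocr: str) -> str:
    """Return the plain text of an hOCR fragment, words joined by spaces."""
    parser = _TextExtractor()
    parser.feed(hocr)
    parser.close()
    return " ".join(parser.chunks)


def automate_matches(pattern: str, text: str) -> bool:
    """Hypothetical matcher: case-insensitive substring search."""
    return pattern.lower() in text.lower()


# Hypothetical hOCR fragment: each word lives in its own span.
HOCR_SAMPLE = (
    "<span class='ocr_line' title='bbox 10 10 300 40'>"
    "<span class='ocrx_word' title='bbox 10 10 120 40'>Electric</span> "
    "<span class='ocrx_word' title='bbox 130 10 300 40'>Utilities</span>"
    "</span>"
)

# Matching the raw markup fails for a two-word pattern...
matched_raw = automate_matches("Electric Utilities", HOCR_SAMPLE)
# ...while matching the extracted plain text succeeds.
matched_text = automate_matches("Electric Utilities", hocr_to_text(HOCR_SAMPLE))
```

In this sketch `matched_raw` is False and `matched_text` is True. A single-word pattern such as "Electric" would match in both cases, which is consistent with a low matching rate rather than no matches at all.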
@ciur commented on GitHub (Sep 1, 2020):
I "fixed" automates issues.
There are some important changes:
In the meantime, I learned about another project, which inspired me to change the initial design of automate plugins.
Instead, the user will be able to upload a yml file, which will be a sort of template describing the data to extract. I will implement the invoice2data approach in a later version of Papermerge.
For now, I will leave automates as a simple "match and move to destination folder".
In the next version, 1.5, automates will enable the user to assign tags to the matched documents.
The invoice2data approach of automatically extracting data from documents will be introduced in Papermerge 1.6.
Automates are simpler now, but they work!