Mirror of https://github.com/RD17/ambar.git (synced 2026-04-25 07:25:55 +03:00)
[GH-ISSUE #175] Adding custom tagging rules #173
Originally created by @musical10441 on GitHub (Jul 28, 2018).
Original GitHub issue: https://github.com/RD17/ambar/issues/175
Good afternoon -
I wanted to add a custom rule to detect US Social Security Numbers and tag the documents as "SSN". Here are the steps I took:
as well as:
My next step was to remove the instance again and modify AutoTagging.py once more:
Notice that in this case all I did was rename the archive tag, to make sure I wasn't relying on any evaluation logic and was only trying to get the new tag name to take effect.
I then uploaded a zip file containing an image document. The tag "archive" (not "archive-test") was added.
I would expect that renaming an existing tag, without changing any logic, would work, but it didn't. Is there another place I should be looking?
Thanks for your help and what a great tool you've created!
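For reference, a rule like the SSN one described above might look roughly like the following. This is a minimal Python sketch under stated assumptions: the thread does not show Ambar's AutoTagging.py, so `contains_ssn`, `auto_tag` and the tag list it updates are hypothetical names, not Ambar's API. The pattern only accepts the dashed AAA-GG-SSSS form and skips numbers the SSA never issues (area 000, 666 or 900-999, group 00, serial 0000).

```python
import re

# Hypothetical SSN rule: dashed AAA-GG-SSSS only, bounded by word boundaries.
# Lookaheads reject area 000/666/9xx, group 00 and serial 0000.
SSN_PATTERN = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")


def contains_ssn(text):
    """Return True if text contains something shaped like a US SSN."""
    return SSN_PATTERN.search(text) is not None


def auto_tag(text, tags):
    """Hypothetical tagging hook: append 'SSN' to tags when the text matches."""
    if contains_ssn(text) and "SSN" not in tags:
        tags.append("SSN")
    return tags


if __name__ == "__main__":
    print(auto_tag("Employee SSN: 123-45-6789", []))
```

Requiring the dashes and word boundaries keeps false positives down; also accepting undashed nine-digit runs would catch more SSNs at the cost of matching many unrelated numbers.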
@sochix commented on GitHub (Jul 31, 2018):
Hi, did you create a new docker image with your changes? Or where did you edit autotagging.py?
@musical10441 commented on GitHub (Jul 31, 2018):
I made my changes in the ambar-master folder and then ran `docker-compose up -d` to create a new Docker application.
@sochix commented on GitHub (Jul 31, 2018):
@musical10441 you need to build a new pipeline image with your changes and then edit your docker-compose file to reference the new pipeline image.
@musical10441 commented on GitHub (Aug 1, 2018):
Thank you. That got me much further. I'm now able to see the changes I made to the names of the default ocr and archive tags.
Now that I know the changes are taking effect, I am using the following code in autotagger.py to call the regular expression.
My goal is to pass the contents of the document to the PIIParser and return true if there is a match in the document.
When I pass a hard-coded value with `if PIIParser.MatchCC('4485003891627515'):` it works as expected, but when I pass the document content with `if PIIParser.MatchCC(AmbarFile['content']['text']):` it does not return true. The document is a text file containing only the credit card number (it's fake, btw).
Am I correct in trying to pass (AmbarFile['content']['text']) or should I be passing something else?
Thanks in advance!
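A hard-coded string matching while extracted text does not can happen when the pattern is anchored (`^...$`, `re.match`, `re.fullmatch`) and the extracted text carries a trailing newline or surrounding words. The sketch below is a hypothetical stand-in for the `PIIParser.MatchCC` used in the thread, whose actual pattern isn't shown: it searches anywhere in the text with `re.finditer` and confirms each candidate with the Luhn checksum. The name `match_cc` and the 13-19 digit range are assumptions.

```python
import re

# Hypothetical candidate pattern: 13-19 digits, optionally separated by single
# spaces or dashes, bounded by word boundaries so it can sit inside other text.
CC_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_ok(digits):
    """Luhn checksum over a string of digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def match_cc(text):
    """True if text contains a card-like digit run that passes Luhn."""
    for m in CC_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_ok(digits):
            return True
    return False


# Anchored patterns are brittle against extracted text, e.g.
# re.fullmatch(r"\d{16}", "4485003891627515\n") returns None.
if __name__ == "__main__":
    print(match_cc("4485003891627515\n"))
```

Using `search`/`finditer` rather than anchoring lets the same rule work whether the number is alone in the file or embedded in a longer document.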
@sochix commented on GitHub (Aug 1, 2018):
Yes, that's correct; I don't see any error. Can you please log AmbarFile['content']['text'] and check what it contains?
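When logging the extracted text, printing its `repr` rather than the plain string makes hidden characters such as trailing newlines visible. A minimal sketch using the standard `logging` module; the logger name and the `describe_text` helper are hypothetical, and where it would be called from inside Ambar's pipeline is an assumption.

```python
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("autotagging")  # hypothetical logger name


def describe_text(text, limit=200):
    """Summarize extracted text so invisible characters show up in the log."""
    if text is None:
        return "text=None"
    return "type=%s len=%d repr=%r" % (type(text).__name__, len(text), text[:limit])


# In the tagging code this might be called as:
#   log.debug(describe_text(AmbarFile['content']['text']))
if __name__ == "__main__":
    log.debug(describe_text("4485003891627515\n"))
```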
@musical10441 commented on GitHub (Aug 1, 2018):
Logging the content text returns the content, so it is passing it properly. I will keep working on it.
By the way, building a new pipeline image gives errors in the log:
/envs/plarin-3.7.0a4/lib/python3.7/site-packages/pika/adapters/libev_connection.py", line 106
self.async = None
Changing requirements.txt to pin pika==0.12.0 resolves the issue (`async` is a reserved keyword as of Python 3.7, so `self.async = None` is a syntax error there).
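The traceback above comes from `async` becoming a reserved keyword in Python 3.7, so an attribute assignment like `self.async = None` no longer parses. The short check below demonstrates this with the standard `keyword` module and the built-in `compile()`; the `_async` spelling is only an illustrative alternative, not necessarily what pika itself uses.

```python
import keyword
import sys


def compiles(source):
    """Return True if source parses as Python in this interpreter."""
    try:
        compile(source, "<check>", "exec")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    print("Python %d.%d" % sys.version_info[:2])
    print("'async' reserved:", keyword.iskeyword("async"))
    print("self.async = None parses:", compiles("self.async = None"))
    print("self._async = None parses:", compiles("self._async = None"))
```

On Python 3.7 or later this reports `async` as reserved and the first assignment as unparseable, which is why an older pika fails to import in the pipeline image.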
@musical10441 commented on GitHub (Aug 1, 2018):
It seems to be working now. I made sure to change the requirements.txt per my prior post before building the new pipeline image. I removed the python3 image and the pipeline image and then ran docker-compose again.
Thanks for your help!