[GH-ISSUE #75] Feature Request:- Database Crawler #75

Closed
opened 2026-02-27 15:54:51 +03:00 by kerem · 4 comments

Originally created by @OutbackMatt on GitHub (Sep 13, 2017).
Original GitHub issue: https://github.com/RD17/ambar/issues/75

I'd love to see a database crawler for different database types. I deal with databases that store many tens of thousands of PDF, JPG, or TIFF files as blobs, and I'd love to have a full OCR crawl performed on these documents and images.

Predominantly the database used is MS SQL Server 2008R2 or newer.

I love the concept of what you are doing here.
Just adding this to the wish list... :)


@sochix commented on GitHub (Sep 29, 2017):

Hi @OutbackMatt! Thanks for the post, but I don't think we will create this type of crawler in the near future, as it requires a lot of work...


@OutbackMatt commented on GitHub (Sep 29, 2017):

OK, so are there plans for a command line interface (CLI) or another interface so that we can programmatically send documents to be processed ad hoc?

At the moment it looks like I could create a 'drop folder', and send documents to the drop folder, and call a cron to assess all documents in this 'drop folder', and then remove the documents. That's OK, but a CLI or API would also be useful.
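
A minimal sketch of the export half of that drop-folder workaround: read each blob from the database and write it into the watched folder. The `documents` table and its `id`, `filename`, and `content` columns are hypothetical, and `sqlite3` is used here only as a stand-in so the example runs without extra dependencies; for MS SQL Server the connection would come from a driver such as `pyodbc`, with the same query-and-write loop.

```python
import sqlite3
from pathlib import Path


def export_blobs(conn, drop_folder):
    """Write every stored blob into drop_folder and return the paths written.

    Assumes a hypothetical table: documents(id, filename, content BLOB).
    """
    drop = Path(drop_folder)
    drop.mkdir(parents=True, exist_ok=True)
    written = []
    cursor = conn.execute("SELECT id, filename, content FROM documents")
    for doc_id, filename, content in cursor:
        # Prefix with the row id so duplicate filenames do not overwrite
        # each other, and keep only the base name to stay inside drop_folder.
        target = drop / f"{doc_id}_{Path(filename).name}"
        target.write_bytes(content)
        written.append(target)
    return written
```

A scheduled job could call `export_blobs(conn, "/path/to/drop-folder")` before the crawl runs; tracking which rows have already been exported is left out of this sketch.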


@sochix commented on GitHub (Oct 1, 2017):

@OutbackMatt you can use our REST API to upload files directly to Ambar. Check the documentation here: https://github.com/RD17/ambar/blob/master/API_DOC.md#upload-file and the guide on our blog: https://blog.ambar.cloud/ambar-use-case-integrated-parse-and-search-solution/
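
As a rough illustration of what a programmatic upload could look like, the sketch below builds a multipart POST request with the Python standard library. It is not taken from the Ambar documentation: the upload URL is passed in by the caller, and the form field name `file`, the absence of authentication headers, and the helper name `build_upload_request` are all assumptions that should be checked against API_DOC.md.

```python
import urllib.request
import uuid


def build_upload_request(upload_url, filename, data):
    """Build (but do not send) a multipart/form-data POST carrying one file.

    upload_url is whatever endpoint the API documentation specifies;
    the "file" field name is an assumption, not confirmed by the docs.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        upload_url,
        data=head + data + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Sending it would be a matter of `urllib.request.urlopen(build_upload_request(url, name, blob))`, so blobs read from a database could be pushed one by one without a drop folder.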


@sochix commented on GitHub (Apr 19, 2018):

Check our support options: https://github.com/RD17/ambar#support