[GH-ISSUE #169] Refactor of document importing #132

Closed
opened 2026-02-25 21:31:16 +03:00 by kerem · 4 comments

Originally created by @francescocarzaniga on GitHub (Oct 14, 2020).
Original GitHub issue: https://github.com/ciur/papermerge/issues/169

I am opening a few issues at the same time; these are just the things I noticed over the last couple of days, and a reminder for myself in case I want to do some PRs.

Document importing is a cornerstone of Papermerge, but right now the code is scattered across a few places and does not support plugins at all. One solution would be to create an importer app and move the DocumentImporter class there. I then propose splitting the work under a new class, DocumentProcessor, with two subclasses: DocumentValidator would check MIME types (using python-magic rather than the file extension) and count pages, while DocumentImporter would do the actual importing. Alternatively, both could be merged into DocumentProcessor. Such a class could then be used both in the upload view and in other importers to do all the heavy lifting. This would make it possible to add custom pre- and post-processing steps for documents and would go a long way towards more pluggability.

This is just a proposal; the spirit is to make document importing more unified and modular, so a number of solutions are possible.
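
To make the proposed split a bit more concrete, here is a minimal sketch of how the three classes could fit together. Everything in it is an assumption for illustration, not Papermerge's actual code: the method names (`validate`, `import_document`, `process`), the hook lists, and the in-memory "storage" are all hypothetical. It also uses composition instead of subclassing, which is only one possible reading of the proposal, leaves out page counting for brevity, and swaps python-magic for a tiny signature check so that it runs without extra dependencies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# NOTE: all names below are hypothetical; this is not Papermerge's API.


def sniff_mime(data: bytes) -> str:
    """Guess the MIME type from the leading bytes, ignoring the file name.

    Stand-in for python-magic, where the equivalent call would be
    magic.from_buffer(data, mime=True).
    """
    signatures = {
        b"%PDF-": "application/pdf",
        b"\x89PNG\r\n\x1a\n": "image/png",
        b"\xff\xd8\xff": "image/jpeg",
    }
    for prefix, mime in signatures.items():
        if data.startswith(prefix):
            return mime
    return "application/octet-stream"


class DocumentValidator:
    """Rejects content whose detected type is not supported."""

    allowed = {"application/pdf", "image/png", "image/jpeg"}

    def validate(self, data: bytes) -> str:
        mime = sniff_mime(data)
        if mime not in self.allowed:
            raise ValueError(f"unsupported document type: {mime}")
        return mime


class DocumentImporter:
    """Does the actual import; here it only records metadata in memory."""

    def __init__(self) -> None:
        self.stored: List[Dict] = []

    def import_document(self, name: str, data: bytes, mime: str) -> Dict:
        doc = {"name": name, "mime": mime, "size": len(data)}
        self.stored.append(doc)
        return doc


@dataclass
class DocumentProcessor:
    """Single entry point for the upload view and any other importer."""

    validator: DocumentValidator = field(default_factory=DocumentValidator)
    importer: DocumentImporter = field(default_factory=DocumentImporter)
    # Plugin points: pre-hooks transform the raw bytes, post-hooks see the result.
    pre_hooks: List[Callable[[bytes], bytes]] = field(default_factory=list)
    post_hooks: List[Callable[[Dict], None]] = field(default_factory=list)

    def process(self, name: str, data: bytes) -> Dict:
        for hook in self.pre_hooks:
            data = hook(data)
        mime = self.validator.validate(data)
        doc = self.importer.import_document(name, data, mime)
        for hook in self.post_hooks:
            hook(doc)
        return doc
```

With something along these lines, the upload view and, say, an email or folder importer would both call `DocumentProcessor().process(name, data)`, and a plugin would only need to append a callable to `pre_hooks` or `post_hooks`.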

kerem closed this issue 2026-02-25 21:31:16 +03:00

@ciur commented on GitHub (Oct 15, 2020):

@francescocarzaniga, description is too generic. Maybe you can be more specific.


@francescocarzaniga commented on GitHub (Oct 17, 2020):

I guess this and #167 stem from the same problem: lack of plugin support. This issue is a suggestion to change the document importing system to be more modular and plugin-friendly.
While I think this may not be urgent, the sooner you standardise a plugin system and document it, the quicker people will get on board. If you truly want Papermerge to be plugin-centric, then more people working on plugins => more features => bigger userbase.


@ciur commented on GitHub (Oct 18, 2020):

@francescocarzaniga

> lack of plugin support.

I am working on it ... :). In the future, Papermerge will be app-oriented (app = plugin in Django parlance).


@ciur commented on GitHub (Feb 28, 2021):

@francescocarzaniga
[Document Pipelines](https://github.com/papermerge/papermerge-core/blob/master/papermerge/core/import_pipeline.py) are now part of Papermerge.