[PR #33] [MERGED] WIP: Implementing backup and restore feature #534

Closed
opened 2026-02-25 21:32:10 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ciur/papermerge/pull/33
Author: @frenos
Created: 7/11/2020
Status: Merged
Merged: 7/13/2020
Merged by: @ciur

Base: master ← Head: backup_restore


📝 Commits (7)

  • e246ccf start implementing backup and restore feature
  • 6f71198 implement backup management command
  • d2929c0 add unit test to backup command
  • b8c4d99 add more testing to backup
  • 44a117d add restore command
  • 5b3a218 also backup/restore lang, add unit test for restore
  • 33d3adc Merge branch 'master' into backup_restore

📊 Changes

6 files changed (+337 additions, -1 deletions)


papermerge/core/backup_restore.py (+132 -0)
papermerge/core/management/commands/backup.py (+23 -0)
papermerge/core/management/commands/restore.py (+38 -0)
papermerge/test/data/testdata.tar (+0 -0)
papermerge/test/test_backup_restore.py (+143 -0)
📝 run_tests.sh (+1 -1)

📄 Description

Proposed Feature:

The admin user should be able to easily export all documents together with their metadata, and later import them again. In short: a general backup and restore feature.

To introduce this feature, and because the UI will change in the near future, I propose to start with two custom manage.py commands. One backs up everything to an archive; the other restores everything from such an archive.

Planned Implementation Detail:

We need not only the original documents but also their metadata. For this I propose to create a backup.json that contains this information for all documents in the current backup.

Imagine this structure in papermerge:

#root
|---receipts
|  |----aldi.pdf
|  |----lidl.pdf
|---letters
   |----bank1.pdf
   |----bank2.pdf

The backup-archive will have the same relative structure and additionally contain a file backup.json in the root.
This file has a few meta-keys at the start and then contains a list of documents.
Example:

{
  "version": "1.3.0",
  "created": "11.07.20 06:55",

  "documents": [
    {
      "path": "receipts/aldi.pdf",
      "language": "deu",
      "notes": "This is a receipt"
    },
    {
      "path": "receipts/lidl.pdf",
      "language": "deu",
      "notes": "This is also a receipt"
    },
    ...
  ]
}

The top-level keys are required; for each document object only the path is required. Any other key that is not set takes its default value (e.g. an empty string). Keys that are set but not understood by the restore are ignored. This gives a degree of compatibility across versions: if, for example, papermerge supports tags in the future and you export them, you can still import that backup into the current version and at least get the currently supported features working.
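The rules in the paragraph above can be sketched in a few lines of Python. This is only an illustration, not the code from this PR: the function name `parse_backup`, the constant names, and the choice of an empty string as the default for both `language` and `notes` are assumptions made for the example.

```python
import json

# Hypothetical defaults for the optional per-document keys. The description
# only names "language" and "notes"; the empty-string default is assumed.
DOCUMENT_DEFAULTS = {"language": "", "notes": ""}
REQUIRED_TOP_LEVEL = ("version", "created", "documents")


def parse_backup(text):
    """Validate backup.json content and return normalized document dicts."""
    data = json.loads(text)

    # All top-level keys are required.
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in data]
    if missing:
        raise ValueError(f"backup.json is missing top-level keys: {missing}")

    documents = []
    for entry in data["documents"]:
        # Within a document object only "path" is required.
        if "path" not in entry:
            raise ValueError("every document entry needs a 'path'")
        doc = {"path": entry["path"]}
        # Copy known keys, falling back to defaults. Unknown keys
        # (for example a future "tags") are never copied, so they are ignored.
        for key, default in DOCUMENT_DEFAULTS.items():
            doc[key] = entry.get(key, default)
        documents.append(doc)
    return documents
```

Because unknown keys are never copied, an entry exported by a newer version with an extra `tags` key would still be accepted under this sketch, and its `path`, `language`, and `notes` would be restored as usual.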

Todo:

  • [x] Export documents to archive
  • [x] Create backup.json and add to archive
  • [x] Import documents from archive
  • [ ] Documentation
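As a rough illustration of the export and import items in the list above, the following sketch writes documents and a `backup.json` index into a tar archive using only the Python standard library, then reads them back. It is not the code in `papermerge/core/backup_restore.py`: the function names, the in-memory shape of `documents` (dicts with `path`, `content` bytes, and optional metadata), and the `%d.%m.%y %H:%M` timestamp format (inferred from the example value) are all assumptions.

```python
import io
import json
import tarfile
from datetime import datetime


def _add_bytes(archive, name, payload):
    """Add an in-memory byte string to the archive under the given name."""
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))


def write_backup(archive_path, documents, version):
    """Write each document plus a backup.json index into a tar archive."""
    index = {
        "version": version,
        # Assumed format, chosen to match the "11.07.20 06:55" example.
        "created": datetime.now().strftime("%d.%m.%y %H:%M"),
        # The index holds metadata only, never the file content itself.
        "documents": [
            {key: value for key, value in doc.items() if key != "content"}
            for doc in documents
        ],
    }
    with tarfile.open(archive_path, "w") as archive:
        for doc in documents:
            # Keep the same relative structure as in papermerge.
            _add_bytes(archive, doc["path"], doc["content"])
        # backup.json sits in the root of the archive.
        _add_bytes(archive, "backup.json", json.dumps(index, indent=2).encode("utf-8"))


def read_backup(archive_path):
    """Return (index, {path: bytes}) for the documents listed in backup.json."""
    with tarfile.open(archive_path, "r") as archive:
        index = json.load(archive.extractfile("backup.json"))
        contents = {
            doc["path"]: archive.extractfile(doc["path"]).read()
            for doc in index["documents"]
        }
    return index, contents
```

A real restore would create folders and documents in the database rather than return bytes; the sketch only shows the archive layout round-tripping.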

Future perspective:

I will try to make this first implementation as generic and pluggable as possible. In the future we could improve it by running it in the worker and adding it to the UI.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
