[GH-ISSUE #95] Wait and retry if VM is locked #88

Closed
opened 2026-02-26 17:44:21 +03:00 by kerem · 5 comments

Originally created by @dani on GitHub (Dec 9, 2024).
Original GitHub issue: https://github.com/Corsinvest/cv4pve-autosnap/issues/95

What happened?

I'm using autosnap on a 10-node cluster with ~100 VMs running. It mostly works great, but it sometimes fails when a VM that should be snapshotted is locked. In my case, the lock usually comes from one of these cases:

  • A backup is running
  • Another cv4pve-autosnap is running (eg daily snap vs hourly snap)

This cluster is running on Ceph, which might not be the fastest at creating snapshots (although it has more than decent performance overall: 40 NVMe OSDs with a dedicated 2x25Gbps network).

I tried to spread the various jobs across different times (e.g. hourly runs at 4 minutes past each hour, daily at 00:08, weekly at 00:12 on Sunday, etc., and backups only start at xx:20), but I still get errors from time to time.

Expected behavior

cv4pve-autosnap could wait and retry later if a VM is locked
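
Until something like this exists in the tool itself, the behavior can be approximated from the outside. The following is only a minimal sketch, not a cv4pve-autosnap feature: `retry` is a hypothetical helper, the attempt count and delay are arbitrary, and it assumes the command exits with a non-zero status when a snapshot fails.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of cv4pve-autosnap): runs COMMAND until it
# exits 0 or MAX_ATTEMPTS is reached, sleeping DELAY_SECONDS between tries.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1    # give up: every attempt failed
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# Example (assumed values): up to 5 attempts, 60 seconds apart.
# retry 5 60 cv4pve-autosnap @/etc/pve/priv/autosnap/common.conf snap --label hourly --keep 6 --only-running
```

One caveat of this approach: it retries the whole run rather than a single VM, so VMs that were snapshotted successfully on the first attempt would be snapshotted again on the retry.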

Relevant log output

No response

Proxmox VE Version

8.2.8

Version (bug)

1.1.11

Version (working)

No response

On what operating system are you experiencing the issue?

Linux

Pull Request

  • I would like to do a Pull Request

@franklupo commented on GitHub (Dec 9, 2024):

If the VM is busy with other operations, you can't take a snapshot, even from the web GUI. How long should it wait?
Why should multiple cv4pve-autosnap runs overlap?


@dani commented on GitHub (Dec 9, 2024):

I have overlaps because I run several cron jobs, one for each snapshot label, e.g.:

*/30 * * * * root systemd-cat /usr/local/bin/cv4pve-autosnap @/etc/pve/priv/autosnap/common.conf snap --label frequently --keep 6 --only-running || echo "autosnap failed"
4 * * * * root systemd-cat /usr/local/bin/cv4pve-autosnap @/etc/pve/priv/autosnap/common.conf snap --label hourly --keep 6 --only-running || echo "autosnap failed"
8 0 * * * root systemd-cat /usr/local/bin/cv4pve-autosnap @/etc/pve/priv/autosnap/common.conf snap --label daily --keep 4 --only-running || echo "autosnap failed"
12 0 * * 0 root systemd-cat /usr/local/bin/cv4pve-autosnap @/etc/pve/priv/autosnap/common.conf snap --label weekly --keep 2 --only-running || echo "autosnap failed"

On rare occasions, the first job (frequently) is still running when the second (hourly) starts. There are also cases where backups are running: say I start the backup at 20:20; when the frequently cv4pve-autosnap job is triggered at 20:30, the backups are not always done and still hold a lock on the VM.
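
The overlap between the cron jobs themselves (not the backup case) could be avoided by serializing the runs with `flock` from util-linux. This is a hedged sketch rather than anything cv4pve-autosnap provides: the function name `run_serialized`, the lock file path, the 600-second wait, and the `AUTOSNAP_LOCK_*` variables are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command while holding an exclusive lock, so a
# second autosnap job waits for the first one instead of starting alongside it.
# The lock file path and the wait time are assumed defaults.
run_serialized() {
    lock_file="${AUTOSNAP_LOCK_FILE:-/run/lock/cv4pve-autosnap.lock}"
    lock_wait="${AUTOSNAP_LOCK_WAIT:-600}"   # seconds to wait for the lock
    # flock exits non-zero if the lock is not obtained within lock_wait
    # seconds; otherwise it returns the exit status of the command.
    flock -w "$lock_wait" "$lock_file" "$@"
}

# When called as a script with arguments, run them under the lock.
if [ "$#" -gt 0 ]; then
    run_serialized "$@"
fi
```

Each cron entry would then call the wrapper with the existing `cv4pve-autosnap ... snap --label ...` command line as its arguments. The lock only exists on the host that runs the cron jobs, and it does nothing about locks held by backups.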


@franklupo commented on GitHub (Dec 9, 2024):

Yes, even backups fail if a snapshot is being created. You could use a script that does not take snapshots while backups are running, or take the snapshots at the end of the backup.

https://git.proxmox.com/?p=pve-manager.git;a=blob;f=vzdump-hook-script.pl;h=a93eeec80bd09128e70a4a9775438ab658da2191;hb=refs/heads/master
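
A rough sketch of the first idea, assuming a hook script that vzdump calls with the phase name as its first argument (`job-start`, `job-end`, `job-abort`, as in the example linked above). The flag file path, the `BACKUP_FLAG_FILE` variable, and the function names are hypothetical; this is an illustration, not a tested setup.

```shell
#!/bin/sh
# Hypothetical vzdump hook: keep a flag file in place while a backup job runs.
# The flag path is an assumed default and is local to the node it is written on.
flag_file() {
    printf '%s' "${BACKUP_FLAG_FILE:-/run/vzdump-backup-running}"
}

handle_phase() {
    case "$1" in
        job-start)
            touch "$(flag_file)" ;;          # backup job begins
        job-end|job-abort)
            rm -f "$(flag_file)" ;;          # backup job finished or aborted
    esac
}

# Used by the snapshot side: succeeds while the flag file exists.
backup_running() {
    [ -e "$(flag_file)" ]
}

# vzdump passes the phase as the first argument; other phases are ignored.
handle_phase "${1:-}"
```

The snapshot cron job could then test `backup_running` and skip or postpone its run while it succeeds. On a multi-node cluster, a node-local flag only reflects backups started on that node.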


@franklupo commented on GitHub (Jan 3, 2025):

news?


@dani commented on GitHub (Jan 3, 2025):

As a workaround, I now disable autosnap during the backup window
