[GH-ISSUE #201] nvme drives will no longer initialize #47

Open
opened 2026-03-07 19:19:52 +03:00 by kerem · 2 comments
Originally created by @jerrodlv on GitHub (Apr 24, 2025).
Original GitHub issue: https://github.com/007revad/Synology_M2_volume/issues/201

Model: DS1019+
DSM 7.2.2-72806 Update 3
2x Samsung 970 EVO 512 GB (might actually be 2x 980s)

I used the scripts and set up an NVMe storage pool; it seemed to go fine. I had been using the drives as read/write cache before that. After I set up the volume, I did some I/O testing with dd and saw quite low IOPS across various tests. Then I set it aside for a week or so while I was on vacation.

Today I set up the NICs for link aggregation, and after getting that working (most likely coincidentally), the NVMe storage pool showed as degraded; one of the drives was completely missing from DSM.
I chalked it up to a bad drive, which would also explain the poor IOPS results. But while troubleshooting, I powered down and re-seated both NVMe drives. After booting back into DSM, both drives are now missing.

I'm not exactly sure how to troubleshoot this; dmesg shows them failing to initialize:

```
dmesg | grep nvme
[    5.654591] nvme nvme0: pci function 0000:02:00.0
[    5.660408] nvme nvme1: pci function 0000:05:00.0
[   36.220492] nvme nvme0: Device not ready; aborting initialisation
[   36.220495] nvme nvme0: Removing after probe failure status: -19
[   36.229496] nvme nvme1: Device not ready; aborting initialisation
[   36.229499] nvme nvme1: Removing after probe failure status: -19
```
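A note on reading this log (a hedged sketch, not part of the original report): the kernel does see both controllers on the PCI bus, but neither ever reports ready, so the driver gives up after its timeout with status -19, which is -ENODEV. A small parser over the captured log lines makes the pattern explicit; the `probe_failures` helper is hypothetical, written just for this illustration.

```python
# Sketch: classify NVMe probe results from a dmesg capture.
# The log lines are copied from the issue above; status -19 is -ENODEV,
# i.e. the kernel removed the controller because it never became ready.
import re

DMESG = """\
[    5.654591] nvme nvme0: pci function 0000:02:00.0
[    5.660408] nvme nvme1: pci function 0000:05:00.0
[   36.220492] nvme nvme0: Device not ready; aborting initialisation
[   36.220495] nvme nvme0: Removing after probe failure status: -19
[   36.229496] nvme nvme1: Device not ready; aborting initialisation
[   36.229499] nvme nvme1: Removing after probe failure status: -19
"""

def probe_failures(log: str) -> dict[str, int]:
    """Map each nvme controller to its probe-failure status code, if any."""
    failures = {}
    for line in log.splitlines():
        m = re.search(r"nvme (nvme\d+): Removing after probe failure status: (-?\d+)", line)
        if m:
            failures[m.group(1)] = int(m.group(2))
    return failures

print(probe_failures(DMESG))  # → {'nvme0': -19, 'nvme1': -19}
```

Both controllers failing identically at the same timeout points at the drives themselves (or a shared power/firmware issue) rather than at one bad slot.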

I had also set up some iSCSI targets on these disks, as I was going to test using this as shared storage for a Proxmox cluster.

I saw another issue where the pool was degraded and it was advised to run the Syno HDD DB script. I tried that, and it reports that it found no NVMe drives. Has anyone else had similar issues?

@jerrodlv commented on GitHub (Apr 25, 2025):

Some research led me to learn that there is a firmware bug in Samsung 980s that leads to premature death. Since I had these running as a cache, with almost identical uptime, I wonder whether they both hit that bug and died together (however unlikely that seems).

I ordered a couple of new drives to try replacing them with; we'll see if that does the trick.

@007revad commented on GitHub (Apr 25, 2025):

That was what I was thinking. It's actually common for NVMe drives to die one after the other (especially drives used as a cache).
