mirror of
https://github.com/007revad/Synology_SMART_info.git
synced 2026-04-25 08:05:48 +03:00
[GH-ISSUE #33] Enable syno_smart_info to detect and report SMART value changes #68
Originally created by @framps on GitHub (Oct 1, 2025).
Original GitHub issue: https://github.com/007revad/Synology_SMART_info/issues/33
Originally assigned to: @007revad on GitHub.
I run your script on my system and get an eMail notification only if there are any SMART values reported. This works great.
I also installed your script on four DS1821+ and right now every time all SMART reports are sent via eMail. This will be changed to send a report only if any SMART values are out of bounds. Unfortunately on one system the UDMA counts are high because the disks were used in a DX which had power issues and therefore increased UDMA counters.
It would be great to (1) get any notification only if existing SMART values increase. This way the UDMA values of disks caused by power instabilities on a DX and now used in another DS pool wouldn't be reported any more.
As a mitigation (2) for me it would make sense to add an option which causes the script to report only the SMART values for a disk where some values exceed any limits and no longer report all SMART values of all disks. This indeed will need much more coding than (1).
Just my thoughts ... maybe there is a chance to get either (1) or (2) implemented?
@007revad commented on GitHub (Oct 1, 2025):
I think (2) would actually take a lot less coding than (1).
One of the Xpenology developers uses my syno_smart_info.sh script in their SynoSmartInfo package.
https://github.com/PeterSuh-Q3/SynoSmartInfo
A month ago they added check_udma_crc.sh which logs UDMA CRC errors for Seagate HDDs and sends a message to telegram when the UDMA CRC error count increases. I don't know why they only log Seagate drives, or if they intend adding logging for other important attributes as well
I could adapt their code to log all of the important SMART attributes and only exit with an error code (to send the email) if those attributes change.
@framps commented on GitHub (Oct 2, 2025):
Another option would be to use a config file which allows setting the limits for any SMART value. That way it's possible to define the actual UDMA value, and a warning is created if the value exceeds this defined limit. If a SMART value is not defined in the config file the defaults are used. This will need more effort than (2) but less than (1).
The config file may just contain lines with the SMART names and the threshold - no complicated JSON, YAML or other structured format.
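A minimal sketch of such a plain-text limits file, assuming one `NAME LIMIT` pair per line. The file layout and the function names `threshold_for` and `exceeds_limit` are hypothetical and only illustrate the idea; they are not part of syno_smart_info.sh.

```shell
#!/bin/sh
# Hypothetical limits file, one "ATTRIBUTE_NAME LIMIT" pair per line, e.g.
#   UDMA_CRC_Error_Count 42
#   Reallocated_Sector_Ct 0

# threshold_for FILE ATTRIBUTE DEFAULT
# Prints the limit configured for ATTRIBUTE, or DEFAULT when the file is
# missing or does not mention the attribute.
threshold_for() {
    if [ ! -r "$1" ]; then
        echo "$3"
        return 0
    fi
    awk -v name="$2" -v def="$3" '
        $1 == name { limit = $2; found = 1 }
        END { print (found ? limit : def) }
    ' "$1"
}

# exceeds_limit VALUE LIMIT
# Succeeds when VALUE is greater than LIMIT.
exceeds_limit() {
    [ "$1" -gt "$2" ]
}
```

With a limit of 42 recorded for UDMA_CRC_Error_Count, a drive that still reports 42 would stay quiet and only a further increase would trigger a warning.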
@007revad commented on GitHub (Oct 2, 2025):
I could use Synology's own tools that read and write ini files. So the attributes log file would contain something like:
And if any of the values for a drive's important attributes have increased from what's in the ini file I update the ini file and send the email for that drive listing the SMART attributes that have increased.
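The compare step could look roughly like the sketch below. It assumes an ini-style log with one `[serial]` section per drive and `attribute=value` lines, which is a guess at the layout. It also uses plain awk rather than Synology's ini tools so it can run anywhere. `ini_get` and `has_increased` are hypothetical helper names.

```shell
#!/bin/sh
# ini_get FILE SECTION KEY
# Prints the value stored for KEY inside [SECTION], or nothing if absent.
ini_get() {
    [ -r "$1" ] || return 0
    awk -v section="[$2]" -v key="$3" '
        /^\[/ { in_section = ($0 == section); next }
        in_section {
            pos = index($0, "=")
            if (pos > 0 && substr($0, 1, pos - 1) == key) {
                print substr($0, pos + 1)
                exit
            }
        }
    ' "$1"
}

# has_increased FILE SERIAL ATTRIBUTE CURRENT
# Succeeds when CURRENT is greater than the logged value.
# An attribute that is not in the log yet counts as 0.
has_increased() {
    previous=$(ini_get "$1" "$2" "$3")
    [ "$4" -gt "${previous:-0}" ]
}
```

Writing the new value back to the log is left out here; only the read and compare half is shown.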
@007revad commented on GitHub (Oct 3, 2025):
Actually, the script would have to send the email which makes it a lot more complicated. Currently the script finishes with an exit code greater than zero to make task scheduler send the email, which contains the whole output of the script.
The easiest solution is to make the script only finish with an exit code greater than zero if any drive's important smart attributes have increased.
At the end of script it could output which drive(s) attributes have increased, like:
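As a rough sketch of the exit-code approach described above (not the script's actual code or output format): the function reads hypothetical `DRIVE ATTRIBUTE OLD NEW` lines, prints one line per increase and returns a non-zero status only if something increased. The input format, the message wording and the name `report_increases` are all assumptions.

```shell
#!/bin/sh
# report_increases
# Reads lines of "DRIVE ATTRIBUTE OLD NEW" from stdin.
# Prints one line for every attribute whose NEW value is above OLD.
# Returns 1 if at least one attribute increased, otherwise 0.
report_increases() {
    status=0
    while read -r drive attribute old new; do
        if [ "$new" -gt "$old" ]; then
            echo "$drive $attribute increased from $old to $new"
            status=1
        fi
    done
    return "$status"
}
```

Ending a script with something like `report_increases < changes.txt || exit 1` would then make task scheduler treat the run as abnormal, and send its email, only when an attribute went up.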
@framps commented on GitHub (Oct 3, 2025):
Great idea. Less typing required and less chance to create a typo
Agree. Just pass the email burden to the task manager 😄
How do you want to control whether all disks are reported, as it's done right now, and enable the report for error disks only? Indirect if the threshold file exists or with a new option?
@007revad commented on GitHub (Oct 4, 2025):
I was going to report all disks as it's done now, but append a message at the end showing which drive's attributes increased and which attribute. This is definitely the easiest to do.
Another option I've been considering is:
This way I can control what gets emailed when the important attributes have increased.
Once I have it working I might change the temp files to arrays.
Or maybe I can use the 2nd temp file and if it exists clear the screen and cat the 2nd temp file. I don't know if that would work with storage manager. I might test that now.
@007revad commented on GitHub (Oct 4, 2025):
Task scheduler complains about the `clear` command:
TERM environment variable not set.
It looks like task scheduler has its own terminal.
@007revad commented on GitHub (Oct 4, 2025):
I tried adding `export TERM-xterm` at the top of the script and it still doesn't like the `clear` command:
So it looks like I'll have to redirect stdout.
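One way to sidestep the error, sketched here as an alternative to redirecting stdout and not as what the script does, is to call `clear` only when stdout is a terminal and TERM is set. `safe_clear` is a hypothetical helper name.

```shell
#!/bin/sh
# safe_clear
# Clears the screen only when that can work: stdout must be a terminal and
# TERM must be set. In any other case (for example a scheduled task) it
# does nothing and still succeeds.
safe_clear() {
    if [ -t 1 ] && [ -n "${TERM:-}" ]; then
        clear
    fi
    return 0
}
```

When the output is captured or piped, as it would be for an emailed report, the `-t 1` test fails and `clear` is never called.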
@007revad commented on GitHub (Oct 5, 2025):
I'm getting there. Synology's `set_section_key_value` command has a few bugs that took a while to work around.
The script enters each drive's serial number, drive number, model and device. Then if any important SMART attributes are greater than 0 that attribute name and value get logged.
I'm testing with 2 HDDs that I know have errors.
@framps commented on GitHub (Oct 5, 2025):
Looks pretty cool 👍
@007revad commented on GitHub (Oct 5, 2025):
I thought I might as well use the logged info for the regular output as well 😃
@framps commented on GitHub (Oct 5, 2025):
Great idea 👍
Maybe report the increase amount and the old amount like `Increased by 17348 from 175` or some similar format? I think the increase amount is much more important than the old value.
@007revad commented on GitHub (Oct 5, 2025):
You can see 17523 has increased from 175
I'm working on only showing the changed attributes. I've still got work to do but I'm close.
@framps commented on GitHub (Oct 5, 2025):
Yes. My point is, the increase amount is most probably more important than the old value. If an increase amount is high that's a serious issue whereas a small increase isn't that serious in most cases. That's why I suggest to report either the increase amount and the old value or the increase amount only. JM2C.
@007revad commented on GitHub (Oct 5, 2025):
Oh, so you want me to do some math 😄
So instead of "17523 Increased from 175" it would be "17523 Increased by 17348".
I've just made that change so it will now show "17523 Increased by 17348".
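The arithmetic behind that message is just the new value minus the logged one. A minimal sketch, with `change_message` as a hypothetical function name and only the "Increased by" wording taken from the comment above:

```shell
#!/bin/sh
# change_message OLD NEW
# Prints "NEW Increased by DELTA" when NEW is above OLD,
# otherwise prints just NEW.
change_message() {
    old=$1
    new=$2
    if [ "$new" -gt "$old" ]; then
        echo "$new Increased by $((new - old))"
    else
        echo "$new"
    fi
}

change_message 175 17523   # prints: 17523 Increased by 17348
```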
@framps commented on GitHub (Oct 5, 2025):
Much better than my proposed message format 😉
@007revad commented on GitHub (Oct 5, 2025):
Try v1.4.21-RC
To test it run the script once so it creates and populates the smart.log, which will be in the same folder as the script. Then you can reduce or delete the values in smart.log and run the scheduled task again.
@007revad commented on GitHub (Oct 6, 2025):
Try v1.4.22-RC instead. It also shows when important SMART attributes have decreased (like current pending sectors). I also improved the formatting so all the "Increased" or "Decreased" line up better.
To test it run the script once so it creates and populates the smart.log, which will be in the same folder as the script. Then you can reduce or delete the values in smart.log and run the scheduled task again.
@framps commented on GitHub (Oct 6, 2025):
Thank you very much for 1.4.22-RC.
That's what I did in order to get informed about an increased value:
Initially I got the following contents in smart.log when I used option -i.
For my drive 3 I get the following SMART values:
Then I added one line for Raw_Read_Error_Rate to simulate a smaller rate.
But I don't get any warning about an increased Raw_Read_Error_Rate 🤔
Looks like there is some misunderstanding on my side. What am I doing wrong?
PS: I noticed you now use some python code inside your script. 👍
@007revad commented on GitHub (Oct 6, 2025):
I see the problem. Lines 593, 607 and 636 were missing `zero` at the end.
Fixed in v1.4.23
The python code was added by PeterSuh-Q3 for his Synosmartinfo package. The package is really nice. Though it's still using syno_smart_info.sh v1.3.15.
I updated the installed package on my Synology to add the `-i` option and make sure none of my recent script changes had broken anything for the package.
@007revad commented on GitHub (Oct 7, 2025):
v1.4.23 had a bug where the first time an attribute with a value greater than zero was added to the log it was displayed as having increased.
Fixed in v1.4.24
@framps commented on GitHub (Oct 7, 2025):
1.4.24 works great now 👍
I did some additional testing with invalid data 🤡 (I think it doesn't make sense to write a parser but makes sense to check for and report common typos) and had the following findings:
`=` or `=` misses it's ignored. Maybe interpret an empty argument as zero? That way a user gets an indication about the typo.
`abc` it looks like the actual value is reported as the increased value. Same as in (2)?
So I'm fine with the current implementation. Again: Great work 👍
Thank you very much.
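The typo handling suggested in the findings above (treat an empty value as zero, flag anything that is not a number) could be sketched like this. `logged_value` is a hypothetical helper and not part of the script.

```shell
#!/bin/sh
# logged_value RAW
# Normalises a value read from the log:
#   empty string   -> prints 0
#   only digits    -> printed unchanged
#   anything else  -> message on stderr, returns 1
logged_value() {
    case "$1" in
        '')        echo 0 ;;
        *[!0-9]*)  echo "invalid value: $1" >&2; return 1 ;;
        *)         echo "$1" ;;
    esac
}
```

A caller can then decide whether an invalid entry should be reported to the user or silently treated as zero.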
Another idea popped up when I did some testing but question is whether there is actually need for this feature. It will definitely require much more coding 😟
The temperature and power on hours are not monitored. Not sure whether there will be need to check whether a value gets greater or lower than a defined threshold. For example it may be interesting for the temperature or power on hours. It also may be interesting for folks who don't care about an increase of a raw error read rate until it exceeds a defined threshold. Maybe just wait until somebody asks for this feature?
@framps commented on GitHub (Oct 7, 2025):
I support somebody who has four DS1821+ and an attached DX517. Unfortunately I don't have access rights right now to execute tests. But I'm going to update your script on the systems and will execute tests 😉
@PeterSuh-Q3 commented on GitHub (Oct 12, 2025):
You already analyzed my check_udma_crc.sh script last week.
It's not a script specifically for Seagate HDDs.
I only distinguished Seagate HDDs by the different syntax of the "Device Model"/"Serial Number" text. It works fine for other HDDs as well.
I think you can close my idea from yesterday's discussion.
It seems you've already covered what I suggested and are ahead of the curve. ^^
https://github.com/007revad/Synology_SMART_info/discussions/37
I also tested your final script to see how error values are recorded on disks experiencing errors.
As expected, SAS disks lack sufficient information, so I can't gain anything.
@007revad commented on GitHub (Oct 12, 2025):
I wasn't aware that the smartctl output for SAS HDDs was so different to SATA HDDs. Is the output for SAS SSDs the same as for SATA SSDs?
I can monitor "Elements in grown defect list" and possibly "Total uncorrected errors" from the Error counter log.
@007revad commented on GitHub (Oct 12, 2025):
What does the following output?
smartctl7 -H -d scsi -T permissive /dev/sata5
And this?
smartctl7 -A -d scsi /dev/sata5
And the smartctl v6 outputs:
smartctl -H -d scsi -T permissive /dev/sata5
smartctl -A -d scsi /dev/sata5
@PeterSuh-Q3 commented on GitHub (Oct 12, 2025):
@PeterSuh-Q3 commented on GitHub (Oct 12, 2025):
This is something you should consider after all your tests are complete.
If the -i option detects an error and sends an email notification, wouldn't that only happen when the user performs the test?
Shouldn't it detect errors periodically and send email notifications to the user?
My udma-crc-check uses a timer and service, as shown below, to check every hour and notify the user via Telegram when an increase in the ID:199 UDMA CRC count is detected.
https://github.com/PeterSuh-Q3/tcrp-addons/tree/main/udma-crc-check/auxfiles
https://github.com/PeterSuh-Q3/tcrp-addons/blob/main/udma-crc-check/src/install.sh
Shouldn't this kind of periodic detection be necessary?
@PeterSuh-Q3 commented on GitHub (Oct 12, 2025):
If you want to see it in more detail, you can see it like this.
@PeterSuh-Q3 commented on GitHub (Oct 12, 2025):
I tried using the third-party SMART utility provided by Seagate, but it didn't provide the information I wanted for SAS disks. I'll continue to investigate to see if there are any alternatives.
https://github.com/Seagate/ToolBin/blob/master/openSeaChest/bin-build/22.07.26/Lin64/openSeaChest_SMART
@PeterSuh-Q3 commented on GitHub (Oct 12, 2025):
I used this tool to find information matching ID:199, but I'm not sure how to use it. Apparently, these three items with numbers greater than 0 correspond to ID:199.
@007revad commented on GitHub (Oct 12, 2025):
The email gets sent by task scheduler, and only if the user has enabled "Send run details by email" and "Send run details only when the script terminates abnormally" for the scheduled task, and enabled email notifications in Control Panel.
Some people who downloaded the script schedule it to run every day. People who have installed your package could schedule
/var/packages/Synosmartinfo/target/bin/syno_smart_info.sh -i
@007revad commented on GitHub (Oct 12, 2025):
Maybe Seagate adds "Invalid Dword Count", "Running Disparity Error Count" and "Loss of Dword Synchronization Count" together for the UDMA CRC Error Count.
@007revad commented on GitHub (Oct 12, 2025):
@PeterSuh-Q3
Can you try this one: syno_smart_info.zip
Please run it as `syno_smart_info.sh` and `syno_smart_info.sh -i`
And show me what the output looks like for one of the SAS drives. And the contents of smart.log after running `syno_smart_info.sh -i`
@PeterSuh-Q3 commented on GitHub (Oct 12, 2025):
@007revad commented on GitHub (Oct 13, 2025):
@PeterSuh-Q3
Can you try this one: syno_smart_info2.zip
Again, please run it as `syno_smart_info.sh` and `syno_smart_info.sh -i`
And show me what the output looks like for one of the SAS drives. And the contents of smart.log after running `syno_smart_info.sh -i`
@PeterSuh-Q3 commented on GitHub (Oct 13, 2025):
Coincidentally, my SAS system was running the version 2 script test at the same time the reservation ended.
After this ended, my SAS system (NAS5) became unable to post.
I will diagnose or replace the board tomorrow and then retake the test.
@PeterSuh-Q3 commented on GitHub (Oct 14, 2025):
@007revad commented on GitHub (Oct 14, 2025):
It looks good apart from the many, many blank lines.
You can delete the `in=` lines from smart.log
@007revad commented on GitHub (Oct 14, 2025):
One more test please. Just run this one as `syno_smart_info.sh`. It should not show all those blank lines.
syno_smart_info3.zip
It should look like this:
@PeterSuh-Q3 commented on GitHub (Oct 14, 2025):
Isn't smart.log just a simple log? Does it even have a config function that references certain settings?
@007revad commented on GitHub (Oct 14, 2025):
smart.log is just a simple log. The `in=` lines are harmless, but they should never have been added to the log.
@PeterSuh-Q3 commented on GitHub (Oct 15, 2025):
@PeterSuh-Q3 commented on GitHub (Oct 15, 2025):
There's a typo
https://github.com/PeterSuh-Q3/Synology_SMART_info/blob/main/syno_smart_info.sh#L677
@PeterSuh-Q3 commented on GitHub (Oct 15, 2025):
Here are the results after removing the typos and testing again.
The results of smart.log are the same.
@PeterSuh-Q3 commented on GitHub (Oct 15, 2025):
./syno_smart_info.sh -a
@007revad commented on GitHub (Oct 15, 2025):
Somehow with syno_smart_info3.zip we've lost "Elements in grown defect list" from smart.log for the SAS drives.
Instead of
@framps commented on GitHub (Oct 16, 2025):
@007revad I just tested 1.4.24 and noticed, in contrast to the last time I ran the script, the initial run with no existing smart.log, doesn't detect any already existing errors any more. smart.log then has the detected error count recorded and any changes are successfully reported in future runs.
@007revad commented on GitHub (Oct 16, 2025):
@framps Can you provide screenshots of the shell output.
@framps commented on GitHub (Oct 16, 2025):
As you can see the ErrorRate was added in smart.log
When I now decrement the ErrorRate in smart.log by one I get
@007revad commented on GitHub (Oct 16, 2025):
How did it look before?
@framps commented on GitHub (Oct 16, 2025):
I don't have the old version any more to reproduce the message. But the first run without a smart.log reported as far as I remember
@007revad commented on GitHub (Oct 16, 2025):
You said 1.4.24 works great now 10 days ago
I don't understand what contrast you are referring to.
@007revad commented on GitHub (Oct 16, 2025):
And caused task scheduler to send an email saying Raw_Read_Error_Rate had Increased,
That was changed because those errors had not increased as the errors existed before the smart.log was created.
@framps commented on GitHub (Oct 16, 2025):
I'm sorry. Looks like I thought I tested this but actually didn't do 😔
It's just the first use of option `-i` which doesn't report there is already an issue. In a strict sense option `-i` reports changes only and if you start with an existing event it's not a change. All future changes then will be reported as changes.
But from an intuitive feeling even this initial event should be reported as a change. JM2C
For me the current implementation is fine. Maybe it's unexpected behavior for other users of the script. Maybe it's sufficient just to document this behavior 😉
@007revad commented on GitHub (Oct 16, 2025):
If a drive's serial number is not in smart.log and that drive has important attributes that should be zero but aren't I could show it without the "Increased by" text.
So if it's the first run, or a newly added drive, it would show:
Instead of:
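The decision described above can be sketched as follows. It assumes smart.log holds one `[serial]` section header per known drive, which is a guess at the layout. `drive_is_logged` and `attribute_line` are hypothetical names and the output wording is only illustrative.

```shell
#!/bin/sh
# drive_is_logged FILE SERIAL
# Succeeds when FILE contains a line that is exactly "[SERIAL]".
drive_is_logged() {
    [ -r "$1" ] && grep -qxF "[$2]" "$1"
}

# attribute_line FILE SERIAL NAME OLD NEW
# Known drive:   prints "NAME NEW Increased by DELTA" only if NEW > OLD.
# Unknown drive: prints "NAME NEW" (no "Increased by") if NEW > 0.
attribute_line() {
    if drive_is_logged "$1" "$2"; then
        if [ "$5" -gt "$4" ]; then
            echo "$3 $5 Increased by $(($5 - $4))"
        fi
    else
        if [ "$5" -gt 0 ]; then
            echo "$3 $5"
        fi
    fi
    return 0
}
```

On a first run, or for a newly added drive, existing non-zero values are still shown, but without claiming that they increased.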
@framps commented on GitHub (Oct 17, 2025):
👍 This will make sure already existing smart issues are not overlooked.
@007revad commented on GitHub (Oct 18, 2025):
This is what I've come up with:
First run with -i or --increased option when some drives have important attributes greater than zero.

Second run with -i or --increased option with the same drives.

I'll upload v1.4.25-RC after some more testing.
@007revad commented on GitHub (Oct 20, 2025):
v1.4.25 has been released.
`-i, --increased` option to only show drives with important attributes that have changed since last time the script was run.
`-i, --increased` option monitors "Elements in grown defect list".
`/dev/sata1` or `/dev/sda` etc).
`-i, --increased` option.
@007revad commented on GitHub (Oct 20, 2025):
@PeterSuh-Q3
Before your workflow downloads the latest syno_smart_info.sh you should edit `api.cgi` and `index.html` to add the `-i` option.
api.cgi
index.html
@PeterSuh-Q3 commented on GitHub (Oct 20, 2025):
Your request has been completed as follows.
Since the capture is in a VMware environment, very little information is visible.
https://github.com/PeterSuh-Q3/SynoSmartInfo/releases/tag/v1.2.6
@framps commented on GitHub (Oct 20, 2025):
v1.4.25 works perfectly for me
@007revad commented on GitHub (Oct 21, 2025):
@PeterSuh-Q3
I'm confused. Why is the -i option printing the SAS (scsi) SMART header?
Does this
return "SMART Health Status: OK" ?
@PeterSuh-Q3 commented on GitHub (Oct 22, 2025):
As mentioned above, the captured image simply checks whether the -i option works.
Did the virtual environment test confuse you?
Do you want the results from the bare metal (NAS5) you were testing on?
@007revad commented on GitHub (Oct 22, 2025):
Yes 😄 It makes sense now.
@framps commented on GitHub (Oct 24, 2025):
v1.4.30 fails on my system 🥲
@007revad commented on GitHub (Oct 24, 2025):
Oops. I forgot that only models that use device tree have syno_slot_mapping. Older models use `synodisk --get_location_form`.
@007revad commented on GitHub (Oct 25, 2025):
Can you try v1.4.31
This time I tested it on my DS1821+ and DS1812+ 😄
@007revad commented on GitHub (Oct 25, 2025):
@PeterSuh-Q3
Does this command provide any more SMART information for SAS drives?
@PeterSuh-Q3 commented on GitHub (Oct 25, 2025):
@007revad commented on GitHub (Oct 25, 2025):
@PeterSuh-Q3
Do you want me to add all those for SAS drives? In the same format as SATA drives.
@PeterSuh-Q3 commented on GitHub (Oct 25, 2025):
Why is this information used? Does it provide detailed information about the disk's health? If SATA and SAS provide the same information, it seems like they could be used interchangeably.
@007revad commented on GitHub (Oct 25, 2025):
You previously said "As expected, SAS disks lack sufficient information"
And the output you previously posted has very little useful SMART attribute info.
@framps commented on GitHub (Oct 25, 2025):
v1.4.31 still fails on my system ...
@PeterSuh-Q3 commented on GitHub (Oct 25, 2025):
Synodisk seems to provide as much information as smartctl.
For SAS, synodisk actually provides more detailed information.
While synodisk doesn't display information as clearly and visually as smartctl,
it seems perfect for use with Syno Smart Info.
Are you planning to exclude smartctl from your existing scripts and use only synodisk?
@007revad commented on GitHub (Oct 25, 2025):
Fixed in v1.4.32
I didn't catch that grep issue because after disconnecting my DX213 while the NAS was running `/tmp/eunitinfo_2` still existed.
@007revad commented on GitHub (Oct 25, 2025):
@PeterSuh-Q3
I will parse the information from synodisk and then format it the same as smartctl does for SATA drives.
@framps commented on GitHub (Oct 25, 2025):
v1.4.32 works now again on my system 👍
PS: I just need one minute to check if a new release works for me with my update script I provided in my PR 😉
@007revad commented on GitHub (Oct 25, 2025):
Many of my other scripts have an auto update option. I use 2 variations of the same code:
`--autoupdate` option was used and the new version is older than the specified age.
Like in syno_hdd_db
After it's updated itself it runs the new version in the same shell window.
When I originally wrote this script I never thought it would be more than a few hundred lines or get updated so often so I only included the "Just print that there's a new version available" version.
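Either variation needs to decide whether the published version is newer than the running one. A minimal sketch of that comparison for plain `vMAJOR.MINOR.PATCH` tags follows. `version_newer` is a hypothetical helper and is not taken from any of the scripts mentioned. Suffixes such as `-RC` are not handled.

```shell
#!/bin/sh
# version_newer CANDIDATE CURRENT
# Succeeds when CANDIDATE (e.g. v1.4.33) is a higher version than CURRENT.
# Fields are compared numerically from left to right; missing fields count
# as 0. Only purely numeric dotted versions are supported.
version_newer() {
    a=${1#v}
    b=${2#v}
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}
        y=${b%%.*}
        [ "${x:-0}" -gt "${y:-0}" ] && return 0
        [ "${x:-0}" -lt "${y:-0}" ] && return 1
        # Drop the field just compared; empty the string when no dot is left.
        case "$a" in *.*) a=${a#*.} ;; *) a= ;; esac
        case "$b" in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 1
}
```

A script could print its "new version available" notice when `version_newer "$latest" "$current"` succeeds, where both variables are placeholders for however the two version strings are obtained.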
@framps commented on GitHub (Oct 25, 2025):
I didn't know you have this update feature already in your other scripts available when I created the PR.
I frankly don't like an autoupdate. I want to know when anything is updated on my system. But I think it's worth adding another option to your script, `-u` for example, to start a manual update and close my PR.
@007revad commented on GitHub (Oct 25, 2025):
I'll add a `-u` option, and your updateLocalScript.sh
So if updateLocalScript.sh is in the same folder as syno_smart_info.sh users will see:
And because I assume Peter's Synosmartinfo package won't have updateLocalScript.sh in the same folder as syno_smart_info.sh users of Synosmartinfo will see:
@007revad commented on GitHub (Oct 26, 2025):
It does 😄

I've done my part.

Which only appears when not run from @appstore.
@framps commented on GitHub (Oct 26, 2025):
I just updated the topic of this thread to reflect the primary reason for this topic 😃 The old topic was meant to start a discussion and in the meantime the proposed update was implemented 👍
@007revad commented on GitHub (Oct 27, 2025):
v1.4.33
@007revad commented on GitHub (Oct 27, 2025):
I was going to use synodisk for SAS drives. But then I thought synodisk must get the smart information from smartctl. So I tried to find out what smartctl options synodisk uses but haven't found anything yet.
I can display synodisk's output in more readable format.
At first I liked that synodisk decides if the raw value is ok or not. But it shows the following as OK!
I'm going to remove the STATUS column and just keep:
@framps commented on GitHub (Oct 27, 2025):
v1.4.33 works perfectly.
I first thought there is an issue with the update (I used the updatescript to update my local version first) and then no update happened. But then I detected code which checks whether there is an updated version available. Nice way to execute the test 👍
@007revad commented on GitHub (Nov 23, 2025):
It actually works 😄
At midnight I got an email from my DS925+
@framps commented on GitHub (Dec 6, 2025):
I promised to test your script on a DS1817+ with a DX517. Today I installed v1.4.34 on the system and the script reported all drives including the DX disks 👍