Add to SONiC an ability to check storage health state. Basic functionality will be implemented as a CLI command. Optionally pmon daemon could be added for constant disk state monitoring.
show platform ssdhealth [verbose/vendor]
admin@sonic-switch: ~$ show platform ssdhealth
Device Model : InnoDisk Corp. - mSATA 3ME
Health: 72.9%
Temperature: N/A
admin@sonic-switch: ~$
admin@sonic-switch: ~$ show platform ssdhealth verbose
Device Model : InnoDisk Corp. - mSATA 3ME
FW Version : S140714
Serial Number : 20160429AA1134000035
Health : 72.9%
Capacity : 29.818199 GB
Temperature : N/A
Power On Hours : 1576 hours
Power Cycle count: 130
Something else???
admin@sonic-switch: ~$ show platform ssdhealth vendor
********************************************************************************************
* Innodisk iSMART V3.9.41 2018/05/25 *
********************************************************************************************
Model Name: InnoDisk Corp. - mSATA 3ME
FW Version: S140714
Serial Number: 20160429AA1134000035
Health: 72.900%
Capacity: 29.818199 GB
P/E Cycle: 3000
Lifespan : 1576 (Years : 4 Months : 3 Days : 26)
Write Protect: Disable
InnoRobust: Enable
--------------------------------------------------------------------------------------------
ID SMART Attributes Value Raw Value
--------------------------------------------------------------------------------------------
[09] Power On Hours [18304] [090200646480470000000000]
[0C] Power Cycle Count [ 130] [0C0200646482000000000000]
[AA] Total Bad Block Count [ 15] [AA0300646400000F00000000]
[AD] Erase Count Max. [ 883] [AD020064642D037303000000]
[AD] Erase Count Avg. [ 813] [AD020064642D037303000000]
[C2] Temperature [ 0] [000000000000000000000000]
[EB] Later Bad Block [ 0] [EB0200640000000000000000]
[EB] Read Block [ 0] [EB0200640000000000000000]
[EB] Write Block [ 0] [EB0200640000000000000000]
[EB] Erase Block [ 0] [EB0200640000000000000000]
[EC] Unstable Power Count [ 0] [EC0200646400000000000000]
admin@sonic-switch: ~$
New item under menu platform
in show/main.py
It will execute "ssdhealth -d /dev/sdX" [options]
New utility in sonic-utilities/scripts/
It will import device plugin ssdutil.py
and print the output returned by different API functions
Syntax:
root@mts-sonic-dut:/home/admin# ssdhealth -h
usage: ssdhealth -d DEVICE [-h] [-v] [-e]
Show disk device health status
optional arguments:
-h, --help show this help message and exit
-d, --device disk device to get information for
-v, --verbose show verbose output (more parameters)
-e, --vEndor show vendor specific disk information
Examples:
ssdhealth -d /dev/sda
ssdhealth -d /dev/sda -v
ssdhealth -d /dev/sda -e
Location: sonic-buildimage/src/sonic-platform-common/sonic_platform_base/sonic_ssd/ssd_base.py
Generic implementation of the API. Will use specific utilities for known disks or the systemctl
utility for others. Since not all disk models are in smartctl's database, some information can be unavailable or incomplete.
class SsdBase:
...
Inherited from SsdBase. Can be implemented by vendors to provide detailed info about the disk installed.
Location: sonic-buildimage/device/{{vendor}}/platform/plugins/ssdutil.py
class SsdUtil(SsdBase):
...
- get_disk_health(diskdev)
- Accepts:
- diskdev:string - disk device name (e.g. /dev/sda)
- Returns:
- res:float - Floating point in range 0-100 representing disk health in percentages. -1 if not available
- Accepts:
- get_temperature(diskdev)
- Accepts:
- diskdev:string - disk device name (e.g. /dev/sda)
- Returns:
- res:string - Integer (floating point?) disk temperature in centigrade. Zero if not available
- Accepts:
- get_model(diskdev)
- Accepts:
- diskdev:string - disk device name (e.g. /dev/sda)
- Returns:
- res:string - Human readable string holding disk model. Empty if not available
- Accepts:
- get_firmware(diskdev)
- Accepts:
- diskdev:string - disk device name (e.g. /dev/sda)
- Returns:
- res:string - Human readable string holding disk firmware version. Empty if not available
- Accepts:
- get_serial(diskdev)
- Accepts:
- diskdev:string - disk device name (e.g. /dev/sda)
- Returns:
- res:string - Human readable string holding disk serial number. Empty if not available
- Accepts:
- get_vendor_output(diskdev)
- Accepts:
- diskdev:string - disk device name (e.g. /dev/sda)
- Returns:
- res:string - Human readable string. Output of vendor application. Empty if not available
- Accepts:
Part of smartmontools package (1.9M)
PR: sonic-net/sonic-buildimage#2703
Utility for InnoDisk Corp. SSDs (<120K)
https://www.innodisk.com/en/iService/utility/iSMART
Need to be added as binary.
Utility for StorFly and Virtium (2.2M)
Daemon in Pmon (ssdmond) which will periodically query disk health (get_health()) and raise alarm when value decides to some critical value.
- Daemon and monitoring?
- SNMP needed?