Since the release of Power9, IBM Power servers ship with NVMe drives, and the internal NVMe drives can be used to install VIOS. This raises the question of what condition a drive is in: is it degrading, or perhaps about to fail? With the “nvmemgr” command you can check the state of a drive by reading its SMART data.
nvmemgr -M -l nvmeX
After executing the command for nvme0, we see:
nvmemgr -M -l nvme0
Critical Warning ........................................ 0x0
Composite Temperature (Kelvin) .......................... 320
Available Spare (%) ..................................... 93
Percentage of NVM subsystem life used ................... 0
Data Units Read (1000 units of 512 bytes) ............... 11361920
Data Units written (1000 units of 512 bytes) ............ 16994397
Host Read Commands ...................................... 104784386
Host Write Commands ..................................... 1281447312
Number of Power Cycles .................................. 18
Power On Hours .......................................... 44243
Unsafe Shutdowns ........................................ 3
Media and Data Integrity Errors ......................... 0
Number of Error Information Log Entries ................. 46
As you can see from the above example, the drive is operational: Critical Warning is 0x0 and no media or data integrity errors have been recorded.
Its life used is still 0% after more than 5 years of continuous operation (44243 power-on hours)!
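The raw numbers are easier to judge once converted to everyday units. Below is a minimal Python sketch that parses the report text and converts Kelvin to Celsius, power-on hours to years, and data units to terabytes. The function names (parse_smart, summarise) are my own, and the parser assumes the "name ..... value" layout shown above; it is an illustration, not an official tool.

```python
import re

# Sample copied from the report above. On a live system the same text would
# come from running: nvmemgr -M -l nvme0
SAMPLE = """\
Critical Warning ........................................ 0x0
Composite Temperature (Kelvin) .......................... 320
Available Spare (%) ..................................... 93
Percentage of NVM subsystem life used ................... 0
Data Units Read (1000 units of 512 bytes) ............... 11361920
Data Units written (1000 units of 512 bytes) ............ 16994397
Power On Hours .......................................... 44243
Media and Data Integrity Errors ......................... 0
"""

# Assumed layout: field name, a run of dots, then a single value.
LINE = re.compile(r"^(?P<name>.+?)\s*\.{2,}\s*(?P<value>\S+)\s*$")

UNIT_BYTES = 1000 * 512   # one "data unit" as stated in the report
HOURS_PER_YEAR = 24 * 365


def to_int(token):
    """Convert a decimal or 0x-prefixed hexadecimal token to an integer."""
    return int(token, 16) if token.lower().startswith("0x") else int(token)


def parse_smart(text):
    """Return {field name: integer value} for every line matching the layout."""
    fields = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            fields[match.group("name")] = to_int(match.group("value"))
    return fields


def summarise(fields):
    """Convert a few raw fields into human-friendly units."""
    def pick(prefix):
        # Field names carry unit hints in parentheses, so match by prefix.
        for name, value in fields.items():
            if name.lower().startswith(prefix.lower()):
                return value
        raise KeyError(prefix)

    return {
        "temperature_c": pick("Composite Temperature") - 273.15,
        "power_on_years": pick("Power On Hours") / HOURS_PER_YEAR,
        "read_tb": pick("Data Units Read") * UNIT_BYTES / 1e12,
        "written_tb": pick("Data Units written") * UNIT_BYTES / 1e12,
    }


if __name__ == "__main__":
    for key, value in summarise(parse_smart(SAMPLE)).items():
        print(f"{key}: {value:.2f}")
```

For the drive above this works out to about 47 °C, just over 5 years powered on, and roughly 5.8 TB read and 8.7 TB written (decimal terabytes).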
The worst example I could find was a drive with 1% life used. Almost all the NVMe drives I have encountered on Power9/10 show 0%, and most of them also have Available Spare at 100%. If the drives keep wearing at this rate, they will quietly outlast the entire life cycle of the server. I remember the days of Power7/8 servers with traditional disks, when replacing a disk was routine work. Since IBM switched to NVMe drives, I haven’t had the opportunity to replace a single one!
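If a drive ever does start to degrade, the Critical Warning value in the SMART report is the field to watch. It is a bit field, and the bit meanings in the sketch below are taken from the NVMe specification, not from nvmemgr itself; I am assuming nvmemgr prints the raw byte unchanged. The function name decode_critical_warning is hypothetical.

```python
# Bit meanings of the SMART "Critical Warning" byte per the NVMe specification.
CRITICAL_WARNING_BITS = {
    0: "available spare below threshold",
    1: "temperature outside threshold",
    2: "reliability degraded (media or internal errors)",
    3: "media in read-only mode",
    4: "volatile memory backup failed",
    5: "persistent memory region read-only or unreliable",
}


def decode_critical_warning(value):
    """Return a list of descriptions for every bit set in the warning byte.

    Accepts an integer or a string such as "0x0" or "4".
    """
    if isinstance(value, str):
        value = int(value, 16) if value.lower().startswith("0x") else int(value)
    flags = []
    for bit in range(8):
        if value & (1 << bit):
            flags.append(CRITICAL_WARNING_BITS.get(bit, f"reserved bit {bit}"))
    return flags


if __name__ == "__main__":
    # The healthy drive from the example reports 0x0, so nothing is flagged.
    print(decode_critical_warning("0x0"))
```

An empty list means no warning bits are set, which matches the 0x0 seen on the healthy drive in the example.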
The command itself does much more than read the SMART data of a drive.
For example, you can display the drive's read and write statistics:
nvmemgr -Q -l nvme0
Data Units Read   Data Units Written   Host Read Cmds   Host Write Cmds
(1 Unit = 1000 units of 512 bytes)
             76                    3              456               131
             77                    6              439               521
             78                    3              470               273
             70                    5              423               338
             74                    1              609               116
              0                   26                5               100
              0                   51                3               225
              0                   52                1               249
              0                   45                0               278
              0                   47                1               208
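To get a feel for these numbers, the rows can be summed and the data units converted to bytes. The Python sketch below does that for the sample above. It assumes each data row is exactly four whitespace-separated integers in the column order shown; the names parse_stats and totals are hypothetical, and what time span each row covers is not something I can confirm.

```python
# Sample copied from the statistics output above (nvmemgr -Q -l nvme0).
SAMPLE = """\
Data Units Read   Data Units Written   Host Read Cmds   Host Write Cmds
(1 Unit = 1000 units of 512 bytes)
76 3 456 131
77 6 439 521
78 3 470 273
70 5 423 338
74 1 609 116
0 26 5 100
0 51 3 225
0 52 1 249
0 45 0 278
0 47 1 208
"""

COLUMNS = (
    "data_units_read",
    "data_units_written",
    "host_read_cmds",
    "host_write_cmds",
)
UNIT_BYTES = 1000 * 512  # one data unit, as stated in the output header


def parse_stats(text):
    """Return one dict per data row; header and note lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # A data row is exactly four non-negative integers.
        if len(parts) == len(COLUMNS) and all(p.isdigit() for p in parts):
            rows.append(dict(zip(COLUMNS, map(int, parts))))
    return rows


def totals(rows):
    """Sum every column across all rows."""
    return {col: sum(row[col] for row in rows) for col in COLUMNS}


if __name__ == "__main__":
    summed = totals(parse_stats(SAMPLE))
    print(summed)
    print("bytes written:", summed["data_units_written"] * UNIT_BYTES)
```

Summing keeps the script independent of the sampling interval; per-row rates could be derived the same way once the interval is known.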
Unfortunately, the command is undocumented: I was unable to find any information about it on IBM’s website. The command’s built-in help, however, is very well written.