Storage Spaces Direct Issue
Hi everyone.
I have an issue with Storage Spaces Direct (3-node cluster). I noticed my virtual disk has an operational status of “Degraded, InService” with a health status of Warning. I also noticed that Get-StorageJob shows a repair job that has been suspended for a while.
PS C:\Windows\system32> Get-VirtualDisk
FriendlyName ResiliencySettingName OperationalStatus     HealthStatus IsManualAttach Size
------------ --------------------- -----------------     ------------ -------------- ----
CSV-01       Mirror                {Degraded, InService} Warning      True           14.55 TB
PS C:\Windows\system32> Get-StorageJob
Name   IsBackgroundTask ElapsedTime JobState  PercentComplete BytesProcessed BytesTotal
----   ---------------- ----------- --------  --------------- -------------- ----------
Repair False            27.20:25:44 Running   0
Repair True             00:00:00    Suspended 0               0              273804165120
Everything else looks fine. The Get-PhysicalDisk command returns a health status of Healthy for all my disks, and Failover Cluster Manager looks good. Yet, when looking at the event log, I found this.
Log Name: Microsoft-Windows-StorageSpaces-Driver/Operational
Source: Microsoft-Windows-StorageSpaces-Driver
Date: 9/6/2017 7:03:34 PM
Event ID: 203
Task Category: None
Level: Error
Keywords:
User: SYSTEM
Computer: HyperV01
Description:
Physical disk {b42ee394-d7d2-4f18-2730-e52dfdd9782b} failed an IO operation. Return Code: The device is unresponsive.. Additional related events may be found in the System event log for Disk 5022.
This disk may need to be replaced. To view its reliability counters, run this command in PowerShell:
Get-PhysicalDisk | ?{ $_.ObjectId -Match "{b42ee394-d7d2-4f18-2730-e52dfdd9782b}" } | Get-StorageReliabilityCounter
This disk may be located using the following information:
Drive Manufacturer: TOSHIBA
Drive Model Number: MG04SCA40EN
Drive Serial Number: 27B0A1EQFVNC
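As a variation on the command suggested in the event, here is a minimal sketch that looks the disk up by the serial number reported above instead of the ObjectId. It assumes the serial number is unique within the cluster; the selected columns are only the counters that seemed most relevant to an unresponsive disk.

```powershell
# Serial number copied from event 203 above.
$serial = '27B0A1EQFVNC'

# Find the disk by serial number and list its error, latency and wear-related counters.
Get-PhysicalDisk -SerialNumber $serial |
    Get-StorageReliabilityCounter |
    Select-Object DeviceId, ReadErrorsTotal, ReadErrorsUncorrected,
                  WriteErrorsTotal, WriteErrorsUncorrected,
                  ReadLatencyMax, WriteLatencyMax, Temperature, PowerOnHours
```

Comparing the same counters against one of the other TOSHIBA disks might show whether this disk stands out, even though its HealthStatus reports Healthy.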
Strangely, most commands show that the disks are OK, although the commands were slow to return results.
PS C:\Windows\system32> Get-PhysicalDisk
FriendlyName         SerialNumber       CanPool OperationalStatus HealthStatus Usage       Size
------------         ------------       ------- ----------------- ------------ -----       ----
ATA INTEL SSDSC2BX20 BTHC649408CG200TGN False   OK                Healthy      Auto-Select 186.31 GB
ATA INTEL SSDSC2BX40 BTHC7033012Q400VGN False   OK                Healthy      Journal     372.5 GB
ATA INTEL SSDSC2BX40 BTHC703308JT400VGN False   OK                Healthy      Journal     372.5 GB
TOSHIBA MG04SCA40EN  27C0A1JEFVNC       False   OK                Healthy      Auto-Select 3.64 TB
TOSHIBA MG04SCA40EN  27C0A1JZFVNC       False   OK                Healthy      Auto-Select 3.64 TB
TOSHIBA MG04SCA40EN  27C0A1JDFVNC       False   OK                Healthy      Auto-Select 3.64 TB
ATA INTEL SSDSC2BX20 BTHC649408L0200TGN False   OK                Healthy      Journal     186.25 GB
ATA INTEL SSDSC2BX40 BTHC7033012N400VGN False   OK                Healthy      Journal     372.5 GB
TOSHIBA MG04SCA40EN  27C0A1JWFVNC       False   OK                Healthy      Auto-Select 3.64 TB
ATA INTEL SSDSC2BX40 BTHC703301TM400VGN False   OK                Healthy      Journal     372.5 GB
ATA INTEL SSDSC2BX40 BTHC703308JV400VGN False   OK                Healthy      Journal     372.5 GB
TOSHIBA MG04SCA40EN  27C0A1JMFVNC       False   OK                Healthy      Auto-Select 3.64 TB
ATA INTEL SSDSC2BX40 BTHC7033011Z400VGN False   OK                Healthy      Journal     372.5 GB
TOSHIBA MG04SCA40EN  27B0A1EYFVNC       False   OK                Healthy      Auto-Select 3.64 TB
ATA INTEL SSDSC2BX40 BTHC703308K2400VGN False   OK                Healthy      Journal     372.5 GB
TOSHIBA MG04SCA40EN  27B0A1CBFVNC       False   OK                Healthy      Auto-Select 3.64 TB
TOSHIBA MG04SCA40EN  27C0A1JCFVNC       False   OK                Healthy      Auto-Select 3.64 TB
ATA INTEL SSDSC2BX40 BTHC7033012G400VGN False   OK                Healthy      Journal     372.5 GB
TOSHIBA MG04SCA40EN  27C0A1JJFVNC       False   OK                Healthy      Auto-Select 3.64 TB
TOSHIBA MG04SCA40EN  27B0A1EQFVNC       False   OK                Healthy      Retired     3.64 TB
TOSHIBA MG04SCA40EN  27C0A1JFFVNC       False   OK                Healthy      Auto-Select 3.64 TB
ATA INTEL SSDSC2BX40 BTHC703308FA400VGN False   OK                Healthy      Journal     372.5 GB
TOSHIBA MG04SCA40EN  27B0A1ETFVNC       False   OK                Healthy      Auto-Select 3.64 TB
ATA INTEL SSDSC2BX20 BTHC649408LE200TGN False   OK                Healthy      Journal     186.25 GB
ATA INTEL SSDSC2BX40 BTHC70330136400VGN False   OK                Healthy      Journal     372.5 GB
ATA INTEL SSDSC2BX40 BTHC7033012A400VGN False   OK                Healthy      Journal     372.5 GB
ATA INTEL SSDSC2BX20 BTHC649408WB200TGN False   OK                Healthy      Journal     186.25 GB
ATA INTEL SSDSC2BX40 BTHC7033012R400VGN False   OK                Healthy      Journal     372.5 GB
Then I tried to enable storage maintenance mode on the disk, but it failed.
PS C:\Windows\system32> Get-PhysicalDisk -SerialNumber 27B0A1EQFVNC | Enable-StorageMaintenanceMode
Invoke-CimMethod : Currently unsafe to perform the operation
Extended information:
One or more virtual disks are not healthy.
Virtual Disks:
4DABF2FD1F351643B3435CDA4962346C
Recommended Actions:
- Repair associated virtual disks that have lost their redundancy.
- To continue with the operation do not use the 'VirtualDisksHealthy' flag.
The associated virtual disks may be at a greater risk of becoming unavailable.
Activity ID: {c8a15747-46f3-4bfb-b97f-69d173bc1c2c}
At C:\Windows\system32\WindowsPowerShell\v1.0\Modules\Storage\StorageScripts.psm1:3904 char:17
+ ... Invoke-CimMethod -MethodName "Maintenance" -Arguments $ar ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (StorageWMI:) [Invoke-CimMethod], CimException
+ FullyQualifiedErrorId : StorageWMI 9,Microsoft.Management.Infrastructure.CimCmdlets.InvokeCimMethodCommand
I then tried to repair the virtual disk, but it failed with this error.
PS C:\Windows\system32> Get-VirtualDisk | Where-Object -FilterScript {$_.HealthStatus -Eq "Warning"} | Repair-VirtualDisk
Repair-VirtualDisk : Not enough available capacity
Please note that I have only one volume on this cluster, created with the command “New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName CSV-01 -FileSystem CSVFS_ReFS -UseMaximumSize”.
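Because the volume was created with -UseMaximumSize, the pool may have little or no unallocated space left for a repair to rebuild into, which would be consistent with the “Not enough available capacity” error. Here is a minimal sketch for checking this, assuming the pool name matches the "S2D*" wildcard used in the New-Volume command; the column names SizeTB, AllocatedTB and FreeTB are just labels chosen for this example.

```powershell
# Show total, allocated and unallocated capacity of the S2D pool in TB (binary units).
Get-StoragePool -FriendlyName 'S2D*' |
    Select-Object FriendlyName,
        @{ Name = 'SizeTB';      Expression = { [math]::Round($_.Size / 1TB, 2) } },
        @{ Name = 'AllocatedTB'; Expression = { [math]::Round($_.AllocatedSize / 1TB, 2) } },
        @{ Name = 'FreeTB';      Expression = { [math]::Round(($_.Size - $_.AllocatedSize) / 1TB, 2) } }
```

If FreeTB comes back close to zero, that would point at missing reserve capacity rather than at the Repair-VirtualDisk command itself.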
Then I decided to retire the disk with these commands.
$Data = Get-PhysicalDisk -SerialNumber 27B0A1EQFVNC
Set-PhysicalDisk -InputObject $Data -Usage Retired
Once the disk was retired, I stopped receiving the error in the Microsoft-Windows-StorageSpaces-Driver event log. Then I tried resetting the disk with this command.
Reset-PhysicalDisk -InputObject $Data
It seems the disk was added back to the pool, and I don’t have any new errors in the Microsoft-Windows-StorageSpaces-Driver event log. But I still have a suspended Repair storage job, and my virtual disk is still Degraded.
PS C:\Windows\system32> Get-StorageJob
Name   IsBackgroundTask ElapsedTime JobState  PercentComplete BytesProcessed BytesTotal
----   ---------------- ----------- --------  --------------- -------------- ----------
Repair True             01:54:22    Suspended 0               0              3998614552576
PS C:\Windows\system32> Get-VirtualDisk
FriendlyName ResiliencySettingName OperationalStatus HealthStatus IsManualAttach Size
------------ --------------------- ----------------- ------------ -------------- ----
CSV-01       Mirror                Degraded          Warning      True           14.55 TB
Strangely, at first it was “Degraded, InService” and now it only shows “Degraded”. The “InService” portion is no longer shown, but the VMs running on top of the virtual disk are still running fine.
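A quick sanity check on the BytesTotal values from the two Get-StorageJob outputs above, using PowerShell's built-in binary size suffixes. This is only arithmetic on numbers already shown, and the variable names are chosen for illustration. The second job's total comes out to about the size of one of the 3.64 TB disks, which suggests (but does not prove) that the pending repair needs roughly one whole disk's worth of free space.

```powershell
# BytesTotal of the suspended Repair job before the disk was retired and reset.
$firstJobBytes  = 273804165120
# BytesTotal of the suspended Repair job after the disk was retired and reset.
$secondJobBytes = 3998614552576

# 1GB and 1TB are PowerShell's binary multipliers (2^30 and 2^40 bytes).
$firstJobGB  = $firstJobBytes / 1GB
$secondJobTB = [math]::Round($secondJobBytes / 1TB, 2)

"First repair job:  $firstJobGB GB"    # 255 GB
"Second repair job: $secondJobTB TB"   # 3.64 TB
```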
P.S. The virtual disk is owned by Node 1, but the problematic disk is on Node 2. Node 3 is healthy.
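For reference, a minimal sketch of how the disk-to-node mapping can be listed, assuming Get-StorageNode returns the three cluster nodes (it may list a node more than once, depending on where it is run). The Node column is a label added for this example.

```powershell
# For each storage node, list the disks physically attached to it.
foreach ($node in Get-StorageNode) {
    Get-PhysicalDisk -StorageNode $node -PhysicallyConnected |
        Select-Object @{ Name = 'Node'; Expression = { $node.Name } },
                      FriendlyName, SerialNumber, Usage, OperationalStatus, HealthStatus
}
```

Filtering the result on SerialNumber 27B0A1EQFVNC should show which node hosts the disk named in the event log.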
I’m very confused by all this. My virtual disk is using only 13.3 TB of its 14.55 TB. I know that in the past I had an issue after patching the servers where the disks wouldn’t sync after a reboot. If I remember correctly, I had to repair the virtual disk with the command “Get-VirtualDisk | Where-Object -FilterScript {$_.HealthStatus -Eq "Warning"} | Repair-VirtualDisk”, but this time it doesn’t work. I also tried “Get-StoragePool <storage pool friendly name> | Optimize-StoragePool” and it didn’t help either.
I could have a bad disk, but I find it strange that only the event log would show that and not PowerShell.
Does anyone have any suggestions for me? At one point I wanted to reboot Node 2, but I couldn’t gracefully put it in maintenance mode because my virtual disk was not healthy. These are the three ideas I have at the moment:
- Set the disk to Retired and leave it retired for a while.
- Forcibly reboot Node 2.
- Replace the disk.
Does anyone have more ideas?
P.S. Sorry for any grammatical errors. It’s late for me now.