One of our Storage Spaces disks on Windows Server 2012 R2, which stores users' home folders, is misbehaving. Several times an hour during the working day, the disk freezes and stops responding to requests. The freezes vary randomly in length and have been observed to last up to 70 seconds. They appear to happen only while users are at work accessing their home folders, not at weekends.
During a freeze, Resource Monitor shows zero activity on the disk, and nothing relevant appears in the event logs. A PowerShell script running on the server logs every file access that takes longer than one second. The log below covers the morning of 25 October (08:11 to 11:29).
Timestamp (dd/mm-hh:mm:ss);delay in milliseconds
25/10-08:11:54;28292
25/10-08:12:31;12353
25/10-08:12:44;3859
25/10-08:15:37;28980
25/10-08:16:17;5576
25/10-08:31:27;1173
25/10-08:46:30;2931
25/10-08:49:52;47122
25/10-08:50:49;17503
25/10-08:51:07;1003
25/10-08:55:13;1918
25/10-08:55:22;2396
25/10-09:12:20;1829
25/10-09:33:42;1971
25/10-09:39:30;1213
25/10-09:52:46;34630
25/10-09:54:50;12199
25/10-09:55:05;1017
25/10-10:01:50;27485
25/10-10:02:25;2390
25/10-10:11:47;25251
25/10-10:12:18;2559
25/10-10:19:57;27027
25/10-10:20:29;2772
25/10-10:20:33;5138
25/10-10:29:09;26653
25/10-10:29:43;2681
25/10-10:32:13;6486
25/10-10:46:03;2164
25/10-10:46:31;20590
25/10-10:46:56;7333
25/10-10:47:04;1049
25/10-10:51:26;2194
25/10-10:51:34;2638
25/10-10:57:27;1758
25/10-11:00:38;2967
25/10-11:18:28;23101
25/10-11:18:58;2613
25/10-11:29:28;16156
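To get a quick overview of how often and how long the disk stalls, a log in this format can be summarized with a few lines of code. The following is a minimal Python sketch, assuming the log is plain text in the `dd/mm-hh:mm:ss;milliseconds` layout shown above. The names `Stall`, `parse_line` and `summarize` and the 10-second threshold are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import time
from typing import Iterable


@dataclass(frozen=True)
class Stall:
    day: str       # "dd/mm" exactly as logged (the log carries no year)
    at: time       # wall-clock time of the slow access
    delay_ms: int  # how long the file access took, in milliseconds


def parse_line(line: str) -> Stall:
    """Parse one log entry such as '25/10-08:11:54;28292'."""
    stamp, delay = line.strip().split(";")
    day, clock = stamp.split("-")
    return Stall(day, time.fromisoformat(clock), int(delay))


def summarize(lines: Iterable[str], threshold_ms: int = 10_000) -> dict:
    """Count stalls, total stalled time, the longest stall and how many
    stalls reached threshold_ms. Lines without ';' (blank lines, the
    header line) are skipped."""
    stalls = [parse_line(line) for line in lines if ";" in line]
    if not stalls:
        return {"count": 0, "total_s": 0.0, "longest": None, "over_threshold": 0}
    return {
        "count": len(stalls),
        "total_s": sum(s.delay_ms for s in stalls) / 1000,
        "longest": max(stalls, key=lambda s: s.delay_ms),
        "over_threshold": sum(1 for s in stalls if s.delay_ms >= threshold_ms),
    }
```

Feeding the whole log (for example, the lines of a saved text file) to `summarize` returns the totals. Sorting the parsed entries by time could also show whether a long stall tends to be followed by shorter ones shortly afterwards.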
The server occasionally shows network traffic dropping to almost zero while disk activity stops entirely, for 20 to 80 seconds at a time. See the image below for details. Naturally, since users cannot reach their data during these periods, activity drops for a while as a result.
We do not yet know why this is happening. We receive many complaints from users working in applications that keep files in their home directories. When the lock occurs, the whole application freezes, which causes a lot of frustration.
As a first step, we have rescheduled the backup to run at 7 pm and turned off disk defragmentation.
We have also run the attached script from several admin desktops against the file server. The script accesses a file on the file server once per second and logs any delay.
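The probing approach can be sketched as follows. This is a Python stand-in, not the attached PowerShell script, which is not reproduced here. It assumes that opening and reading a small file on the share once per second is representative of user access. The names `probe_once` and `watch` are hypothetical. Slow accesses are written in the same `dd/mm-hh:mm:ss;milliseconds` layout as the log above, using the time each probe started, with a default threshold of one second.

```python
import time
from datetime import datetime
from typing import Callable, Optional


def probe_once(path: str) -> float:
    """Open and read the probe file; return the elapsed time in milliseconds."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        f.read()
    return (time.perf_counter() - start) * 1000


def watch(path: str,
          threshold_ms: float = 1000.0,
          interval_s: float = 1.0,
          iterations: Optional[int] = None,
          log: Callable[[str], None] = print) -> int:
    """Probe `path` every `interval_s` seconds and log slow or failed accesses.

    Runs until interrupted when `iterations` is None. Returns the number
    of slow or failed accesses seen.
    """
    slow = 0
    n = 0
    while iterations is None or n < iterations:
        n += 1
        stamp = datetime.now()  # time at which this probe started
        try:
            elapsed_ms = probe_once(path)
        except OSError as exc:  # share unreachable, file missing, access denied...
            log(f"{stamp:%d/%m-%H:%M:%S};ERROR {exc}")
            slow += 1
        else:
            if elapsed_ms > threshold_ms:
                log(f"{stamp:%d/%m-%H:%M:%S};{elapsed_ms:.0f}")
                slow += 1
        time.sleep(interval_s)
    return slow
```

For example, `watch(r"\\fileserver\home\probe.txt")` (a hypothetical UNC path) would print one line per slow access until interrupted. Comparing logs collected on several clients over the same period can help show whether a stall hits all clients at once, which points at the server, or only one client.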
Does anyone have an idea of where we should start troubleshooting this?
Freddy