Channel: File Services and Storage forum

S2D automatic repair/rebalance/regeneration kick-off criteria


Hi,

Earlier this week we had some issues with one of our S2D Hyper-V clusters.

One of the nodes was put into maintenance mode, so the cluster roles were drained. As a test, I also put that node's physical disks into maintenance mode, so they were no longer active in the S2D pool. The goal was to avoid another issue, where the cluster loses one of the Cluster Virtual Disks when the node is rebooted (see: https://jtpedersen.com/2018/08/storage-issues-when-rebooting-a-s2d-node-after-may-patches/).

But that issue is beside the point. Let's zoom in on putting the local disks of an S2D node into maintenance mode. This kicks off all sorts of StorageJobs, which is to be expected; they start anyway when the node in maintenance is rebooted. These jobs should run for a while and then finish. At least, on paper.

But this week a lot more seemed to happen than just a repair, even before the node was rebooted. I also noticed a rebalance. This put such a load on the pool's IOPS that the other nodes in the cluster suffered a big performance impact. I did not expect this: in previous maintenance windows, the background storage jobs never had a noticeable performance impact on the remaining nodes. Then again, this was the first time I also put the disks into maintenance mode.

My next thought was that perhaps I simply need to stop the repair/rebalance jobs. I know the node will be rebooted and come back online; then I will take its disks out of maintenance mode so they rejoin the pool, and restart the storage jobs. That would prevent the IOPS storm on the pool during maintenance. Of course, it also means I lose resiliency: no more three-way mirror until all the disks in the pool are back in sync.
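To make the resiliency trade-off concrete, here is a minimal counting sketch. It assumes each mirror copy of a slab lives on a different node (node fault domain), so taking one node's disks offline removes at most one copy per slab; `remaining_fault_tolerance` is a hypothetical helper for illustration, not an S2D API, and it says nothing about how S2D schedules its repair jobs.

```python
def remaining_fault_tolerance(copies: int, unavailable: int) -> int:
    """Hypothetical helper: how many further copy losses a mirrored slab
    can survive while `unavailable` of its `copies` copies are offline.

    A mirror stays readable as long as at least one copy is available,
    so in total it tolerates `copies - 1` lost copies.
    """
    if copies < 1 or unavailable < 0:
        raise ValueError("copies must be >= 1 and unavailable >= 0")
    if unavailable >= copies:
        raise ValueError("no copy available: data is inaccessible")
    return copies - 1 - unavailable


# Three-way mirror with every node healthy: two more failures tolerated.
print(remaining_fault_tolerance(3, 0))
# Worst case with one node's disks in maintenance and repair stopped:
# only one more failure tolerated until the copies are back in sync.
print(remaining_fault_tolerance(3, 1))
```

Under these assumptions, stopping the repair jobs leaves a three-way mirror behaving like a two-way mirror for the duration of the maintenance window.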

So I seem to have to choose the lesser of two evils: suffer the IOPS storm during maintenance, or give up some resiliency during maintenance (which is always the case to some degree, but more so if I forcibly stop the automatic jobs).

Funnily enough, this was never an issue last year, when our S2D clusters were less loaded than they are now. There are Windows Updates waiting that could fix or improve this, but how can I install them while I have this issue? Shutting down all workloads seems like the only option left.

Any thoughts?

Grtz

