Channel: High Availability (Clustering) forum

Problem with virtual disk on 4 node cluster.

Hi Guys



I am going out of my mind. I have been struggling with this for days and cannot find anything that puts me on the right path.

My cluster was powered down while it was starting up, and that left a virtual disk stuck in an "Online Pending" -> "Failed" -> "Online Pending" loop. The cluster then tries to start the disk on another server, so it keeps bouncing around all 4 servers.



I have tried almost every article I could find. When running Get-StorageJob, I have one job that keeps running:

Name   IsBackgroundTask ElapsedTime JobState PercentComplete BytesProcessed BytesTotal
----   ---------------- ----------- -------- --------------- -------------- ----------
Repair True             00:01:25    Running  0               0              45097156608
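
A minimal sketch of how the job could be watched over time, assuming it is run in an elevated PowerShell session on one of the cluster nodes. The 30-second interval and the 20 samples are arbitrary choices of mine, not values from any article:

```powershell
# Sample the storage jobs every 30 seconds, 20 times (about 10 minutes).
# If ElapsedTime falls back towards zero between two samples,
# the Repair job has been restarted.
foreach ($i in 1..20) {
    Write-Host (Get-Date -Format 'HH:mm:ss')
    Get-StorageJob |
        Format-Table Name, JobState, ElapsedTime, PercentComplete, BytesProcessed, BytesTotal -AutoSize |
        Out-Host
    Start-Sleep -Seconds 30
}
```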



It seems that the job restarts every 2-3 minutes. I am getting the following in the event log (sorry for the missing pictures, I was not allowed to post them):

EventID: 1069

Cluster resource 'Cluster Virtual Disk (HyperVDisk1)' of type 'Physical Disk' in clustered role '96fd0e69-9c2d-41c0-92e3-09bdcd126686' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.



EventID: 5142

Cluster Shared Volume 'HyperVdisk1' ('Cluster Virtual Disk (HyperVDisk1)') is no longer accessible from this cluster node because of error '(1460)'. Please troubleshoot this node's connectivity to the storage device and network connectivity.



EventID: 5142

Cluster Shared Volume 'HyperVdisk1' ('Cluster Virtual Disk (HyperVDisk1)') is no longer accessible from this cluster node because of error '(1460)'. Please troubleshoot this node's connectivity to the storage device and network connectivity.



EventID: 1793

Cluster physical disk resource online failed.

Physical Disk resource name: Cluster Virtual Disk (HyperVDisk1)
Device Number: 5
Device Guid: {a75e8b5d-a226-4b0e-b6d4-cde8fffa4d1b}
Error Code: 5008
Additional reason: WaitForVolumeArrivalsFailure



EventID: 1795

Cluster physical disk resource terminate encountered an error.

Physical Disk resource name: Cluster Virtual Disk (HyperVDisk1)
Device Number: 5
Device Guid: {a75e8b5d-a226-4b0e-b6d4-cde8fffa4d1b}
Error Code: 1168
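
For anyone who wants to pull the same events as text instead of screenshots, here is a rough sketch. It assumes the events are written to the System log by the Microsoft-Windows-FailoverClustering provider and that the nodes can be reached remotely. The limit of 50 events per node is arbitrary:

```powershell
# Event IDs quoted in this post.
$ids = 1069, 5142, 1793, 1795

# Assumption: System log, Microsoft-Windows-FailoverClustering provider.
# Errors are suppressed so that a node with no matching events does not stop the loop.
Get-ClusterNode | ForEach-Object {
    Get-WinEvent -ComputerName $_.Name -MaxEvents 50 -ErrorAction SilentlyContinue -FilterHashtable @{
        LogName      = 'System'
        ProviderName = 'Microsoft-Windows-FailoverClustering'
        Id           = $ids
    }
} | Sort-Object TimeCreated |
    Select-Object TimeCreated, MachineName, Id, Message |
    Format-List
```

Sorting by TimeCreated across all nodes should make it easier to see the order in which the disk fails and moves.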



What I have tried:

This article from kreelbits: storage-spaces-direct-storage-jobs-hung



Tried Optimize-StoragePool and Repair-VirtualDisk with no success.



Found a great article from JTpedersen on troubleshooting-failed-virtualdisk-on-a-storage-spaces-direct-cluster



Following that article, I tried to run:
Remove-ClusterSharedVolume -Name "Cluster Virtual Disk (HyperVDisk1)"

Once, I got a message saying the job failed because the disk was moving to another server (not the exact wording).

The normal response is that the command just hangs, and it has now been hanging for more than 24 hours.



To me it seems the problem is that before any command can get hold of the disk, the cluster restarts the storage job and moves the disk to another server, which starts the loop over again.



Thanks in advance.



/Peter





