
Windows 2012 R2 Cluster issues - Guest VMs fail when one specific node hosts CSV


I have a Windows Server 2012 R2 cluster set up with 3 nodes.

2 nodes, vm3 and vm5, have no issues acting as owner of any role, including the CSV volumes, Quorum Disk Witness, and the individual VMs.  

1 node, vm1, has no issues owning any of the individual VM roles, one of the CSV volumes (high-speed-lun), or the Quorum Disk Witness.  However, if vm1 is set as the owner of LUN_1 or LUN_2, any VMs that have their OS vhd(x) file hosted on those LUNs and are not owned by vm1 fail and can't be restarted.

The VMs that

  • a) are owned by vm1 and have their OS vhd(x) files on a LUN that is owned by vm1, or
  • b) are owned by any host node and have their OS vhd(x) files on "high-speed-lun", no matter which node owns "high-speed-lun",

are not affected and have no issues booting or running.  It does not matter whether LUN/CSV ownership fails over automatically or I manually change the owner node to vm1: any running VM that does not fit one of the two descriptions above will immediately die and be unable to restart.
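For reference, this is roughly how I've been checking and moving CSV ownership while testing (a sketch using the standard FailoverClusters cmdlets; "LUN_1" here is the CSV's display name in my cluster, so adjust to your own resource names):

    # Requires the Failover Clustering management tools
    Import-Module FailoverClusters

    # Show which node currently owns each CSV
    Get-ClusterSharedVolume | Select-Object Name, OwnerNode, State

    # Manually move a CSV onto vm1 -- this is the step that kills any
    # guest that doesn't fit descriptions a) or b) above
    Move-ClusterSharedVolume -Name "LUN_1" -Node vm1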

Some scenarios that will hopefully clarify this issue a bit:

  1. vmguest1 and vmguest2 are hosted on the vm1 node and their OS storage is located on LUN_2, which is owned by the vm5 node. This is not a problem and everything works; there are also no issues if this is reversed.
  2. vmguest1 is owned by vm1 and vmguest2 is owned by the vm3 node, and their OS storage is located on "high-speed-lun", which is owned by the vm1 node. This is not a problem and everything works.
  3. vmguest1 is owned by vm1 and vmguest2 is owned by the vm3 node, with both guests' OS storage located on LUN_1, which is owned by the vm1 node. vmguest1 will be fine, while vmguest2 will fail to run/start.
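In case it matters, this is how I've been cross-referencing VM ownership against where each guest's disks live when testing these scenarios (a rough sketch; it assumes the FailoverClusters and Hyper-V PowerShell modules are available on the nodes):

    # Which node owns each clustered VM role
    Get-ClusterGroup | Where-Object GroupType -eq 'VirtualMachine' |
        Select-Object Name, OwnerNode, State

    # Which CSV path each guest's virtual disks actually sit on
    Get-VM -ComputerName vm1, vm3, vm5 | Get-VMHardDiskDrive |
        Select-Object ComputerName, VMName, Path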

When this issue occurs, I see the following errors in the Cluster Events/Event Viewer:

  • Error, Event ID 1069: "Cluster resource 'Virtual Machine vmguest1' of type 'Virtual Machine' in clustered role 'vmguest1' failed. The error code was '0x780' ('The file cannot be accessed by the system.')."
  • Error, Event ID 1205: "The Cluster service failed to bring clustered role 'vmguest1' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role."
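For anyone who wants the raw entries, this is roughly how I pull them from each node (a sketch; event IDs 1069/1205 are logged by the Microsoft-Windows-FailoverClustering provider in the System log):

    # Grab the most recent 1069/1205 failures from every node
    Invoke-Command -ComputerName vm1, vm3, vm5 -ScriptBlock {
        Get-WinEvent -FilterHashtable @{
            LogName      = 'System'
            ProviderName = 'Microsoft-Windows-FailoverClustering'
            Id           = 1069, 1205
        } -MaxEvents 20 | Select-Object TimeCreated, Id, Message
    }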

I know this is a lot of info; I'm just trying to give as clear an outline of the issue as possible up front.

Any thoughts anyone has to help get this all cleaned up would be greatly appreciated.


In the interest of reducing questions about the cluster setup/environment, I'm going to try to get all of the potentially relevant info here in one fell swoop below.

Node info ("vm1", "vm3", "vm5"):

  • all 3 nodes are running 2012 R2,
  • all have the same updates [verified by cluster validation; see the snippet after this list],
  • 2x Xeon E5-2430L hex-core, 64 GB memory,
  • 2x onboard NICs teamed for cluster comms,
  • 2x onboard NICs teamed and assigned to the Hyper-V switch,
  • 4x NICs on individual subnets for communication with the SAN,
  • the only known physical difference between the nodes is that vm1 has its OS drive set up as a 2-disk 558 GB RAID1, while vm3/vm5 have their OS drives set up as 4-disk 1.1 TB RAID10,
  • all AD-joined, with 3 DCs in 2 locations: 2 remote in the satellite office, 1 in the datacenter local to this cluster on separate hardware.  All AD tests/replication/etc. have been run and are, to the best of my knowledge, working properly.
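The validation reruns mentioned above were along these lines (a sketch; note that the 'Storage' category is largely skipped against disks that are online in the cluster, so running it properly needs a maintenance window):

    # Re-run validation across all three nodes without touching online CSVs
    Test-Cluster -Node vm1, vm3, vm5 -Include 'Inventory', 'Network', 'System Configuration'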

Storage hardware ("dcsan"):

  • Dell MD3200i with dual controllers
  • each controller has 4 NICs set up on individual subnets to match how the server NICs are configured
  • one disk group set up as RAID10 across 8 physical 2 TB, 7.2k RPM drives, with 7,430 GB total storage available ("Disk Group 0")
  • one disk group set up as RAID5 across 4 physical 600 GB, 15k RPM drives, with 1,660 GB total storage available ("Disk Group 2")
  • MPIO is configured on each server node (verification sketch below)
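This is roughly how I verified the MPIO claim on each node (a sketch; Get-MPIOSetting comes with the Multipath I/O feature's PowerShell module, and mpclaim is the built-in multipath CLI):

    # Current MPIO timer and path-verification settings on this node
    Get-MPIOSetting

    # Per-LUN path health -- each LUN should list all 4 paths as healthy
    mpclaim.exe -s -d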

Dell MDSM host mappings (see screenshot, actual host names changed for security):


The LUNs are available in Storage -> Disks on each node as follows (LUN name in the screenshot above, LUN size, disk group, assignment, disk number):

  1. High-Speed-lun (HighSpeed1, 1.6 TB, Disk Group 2, Cluster Shared Volume, 4)
  2. LUN_1 (Lun_1, 3.5 TB, Disk Group 1, Cluster Shared Volume, 3)
  3. LUN_2 (LUN_2, 3.5 TB, Disk Group 1, Cluster Shared Volume, 3)
  4. Quorum Witness (Cluster_Quorum, 520 MB, Disk Group 1, Disk Witness in Quorum, 1)
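One more data point that may be useful: how each node is currently accessing each CSV, direct vs. redirected I/O (a sketch using the standard FailoverClusters cmdlet on 2012+; Direct means local block access, while redirected means that node's I/O is funneled through the CSV owner):

    # Per-node access mode for every CSV
    Get-ClusterSharedVolumeState |
        Select-Object Name, VolumeFriendlyName, Node, StateInfo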

Cluster Roles:

    approx 20-25 guest VMs, the majority running 2012 R2, with a few running Ubuntu (14.04-18.04)


