Cluster dies when 3rd node is on

August 20, 2014, 1:04 pm

Hi,

At work we have 3 servers in a cluster (Windows Server 2012 R2). On Monday the cluster failed and started to live migrate boxes to servers which were rebooting. We had a major site outage in which our proxy, Exchange and Lync went down. In Failover Cluster Manager all the virtuals were stuck saying "loading"; the only console which was working properly was Hyper-V Manager. We managed to get everything back up by rebooting each server to allow it to install Windows Updates.

On Tuesday we had a similar outage, which was caused by one of the servers trying to take ownership of a store. The cluster then went into a "zombie state", which only occurred when the 3rd node was on. We now have an option where we can evict the node from the cluster and add it back in.

Any ideas why this might have happened?

↧

How to schedule Cluster logs to be generated for Microsoft Failover Clusters 2008 R2

August 19, 2014, 2:08 am

Hi, as I understand it, cluster logs always have to be generated manually on the cluster nodes.

Is there any way to schedule cluster log generation, so that the logs are always available when we need to analyze an issue?

Kevin

↧

The file cannot be opened because it is in the process of being deleted

August 20, 2014, 11:49 pm

Hi,

I'm cleaning up a WS2012R2 Hyper-V cluster. There is only 1 node in the cluster for the moment. All VMs except 2 have already been moved to another cluster. When I try to delete those 2, I get the message "The file cannot be opened because it is in the process of being deleted".

I want to destroy the cluster, but I can't because the 2 VMs haven't been removed yet. I have tried to remove them via PowerShell with "Remove-ClusterGroup VMName", but this results in the error: The object has been deleted from the cluster.

How can I remove the 2 remaining VMs?

↧

Connection Broker session information

August 22, 2014, 2:56 am

Hello

I was wondering if it is possible to view the active sessions or monitor the session information of the Connection Broker. The Connection Broker decides which server a client connects to and saves this information in case the client disconnects, so that it can be redirected to the same server when it logs in again.

Can this information be viewed?

Thank you

↧

Hyper-V Cluster with Storage Server AH

August 22, 2014, 5:48 am

Hello Everyone,

I have a question about clustering and storage.

I have two servers and what I wanted to do is:

-A cluster of 2 nodes

-The Storage directly on the Nodes

-The storage AH

I don't want a third server as storage because, if the storage server fails, the cluster will stop.

I wanted to know if it is possible to create a 2-node cluster with built-in storage, so that if NODE1 fails, the VMs or data are still available on NODE2.

↧

Proper steps to fail over to another host in a cluster

August 22, 2014, 7:50 pm

Hello,

Pardon my ignorance. What are the proper steps to force a failover to the standby host in a cluster with two nodes?

My secondary host is currently the active host for the cluster name. I would like to force it to fail over to the primary, which is acting as a standby. Thank you in advance.

↧

Build a SQL 2008 R2 Failover clustering

August 23, 2014, 2:10 am

Hi,

I am looking to build a 2-node active/passive SQL Server cluster for testing purposes.

I believe that to build a cluster we need AD, DNS, DHCP and a shared disk.

I have searched the Internet, and so far I haven't found a complete solution for SQL Server 2008 R2 failover clustering.

I have tried building a SQL cluster myself with the information available on the Internet, but every time I get one error or another.

Could someone here help me find a COMPLETE solution for building a 2-node active/passive SQL Server 2008 R2 failover cluster?

Video links or step-by-step screenshots would be much appreciated.

I have used Oracle VM virtual manager and StarWind to create a Windows cluster.

↧

Failed VM in Failover Cluster Manager

August 17, 2014, 8:12 pm

We have a problem that seems to be caused by 3 VMs in our cluster that have failed.
The cluster is a Windows Server 2012 R2 cluster.

The VMs are not in Hyper-V anymore, but they still appear in the Failover Cluster Manager. We have tried removing each VM and we receive the following error message, "The file cannot be opened because it is in the process of being deleted."

We have tried moving the failed VM to another server in the cluster but receive the same message.

Is anyone aware of a way to manually delete a specific VM from a cluster or know a solution to our problem?

Thanks

↧

Cluster Networking - Minimal configuration

August 18, 2014, 1:25 pm

I am looking for the best way to configure our 3-node Hyper-V cluster. The cluster nodes have 2x 1Gbit NICs and 2x 10Gbit NICs. The two 10Gbit NICs are configured for iSCSI, so cluster validation warns that they are disabled:
These paths will not be used for cluster communication and will be ignored. This is because interfaces on these networks are connected to an iSCSI target

This made me start looking at the best way to use my 4 network adapters.

My current configuration is:

1Gig1 - Mgmt and Cluster (10.1.2.0/24 subnet)

1Gig2 - Hyper-V switch for guest VMs (no IP defined on host)

10Gig1 - iSCSI (10.2.21.0/24 subnet, no gateway)

10Gig2 - iSCSI (10.2.22.0/24 subnet, no gateway)

In Failover Cluster Manager, I have "Cluster and Client" set for the 3 networks that are visible to the host machine (but cluster validation now tells me that the 2 iSCSI adapters can't be used).

A few months ago we had several issues with NIC teaming (errors about MAC addresses and a few BSOD crashes for which WinDBG pointed to NIC teaming as the cause), so we moved away from using it. I am not sure whether those issues have since been resolved.

Is there anything "wrong" with the way it is currently set up? Is there a better way to set it up using the 4 network cards and still keep things pretty simple?

James Right Size Solutions

↧

Generic Application fail-over is restarting

August 5, 2014, 12:08 pm

I have set up a two-node cluster on 2012. I am trying to use the generic application to fail over and pick up where it left off, i.e. move what is in memory to the failover server. I cannot get this to work. When it fails over, it restarts the program and anything in memory is lost. I read that the memory was supposed to be written to disk for the failover.

To make it easier to see what is going on, I created a program that just counts to a text file on the C drive. It overwrites itself, so the text file only ever holds the last number. If I run the program and then do a failover after a few minutes (when the text file should be at 100 or so), it actually starts back at one again.

Any ideas?
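For comparison, here is a minimal sketch of a counting test that would survive a process restart. It is not the poster's program: the function name and file location are hypothetical, and it assumes the state file sits on storage that both nodes can reach. The idea is that the program reads the last number back on every step instead of relying on what was in memory.

```python
import os
import tempfile


def next_count(path):
    """Read the last number stored at `path`, add one, write it back, return it."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            last = int(handle.read().strip() or 0)
    except (FileNotFoundError, ValueError):
        last = 0  # no usable state yet, so start from zero
    current = last + 1
    # Write to a temporary file first and then swap it in, so a crash in the
    # middle of a write does not leave a half-written counter behind.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        handle.write(str(current))
    os.replace(tmp_path, path)
    return current


if __name__ == "__main__":
    # Hypothetical location for illustration only. In a cluster this would
    # have to be a path on shared storage, not a node-local drive.
    state_file = os.path.join(tempfile.gettempdir(), "counter_state.txt")
    for _ in range(3):
        print(next_count(state_file))
```

With a counter like this, a restart on the other node would continue from the stored value only if that node can see the same file; a file on each node's own C drive would still start from scratch.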

↧

Network Name Resource Availability - failover cluster error 1196

August 26, 2014, 4:23 am

Hello,

We're getting this error in the event logs of our four-node failover cluster. We tried deleting the Host A record in DNS management, but that did nothing.

Failover cluster event: 1196

"Cluster network name resource 'CAUCrgt8' failed registration of one or more associated DNS name(s) for the following reason: This operation returned because the timeout period expired.
Ensure that the network adapters associated with dependent IP address resources are configured with at least one accessible DNS server."

And this resource http://technet.microsoft.com/en-us/library/cc773529%28v=WS.10%29.aspx did not help in solving this.

Do you guys have any other suggestions we could try to resolve this error?

↧

Validating Windows 2012 R2 Cluster Fail

August 26, 2014, 4:40 am

Dear Reader,

I am trying to build a Windows 2012 DC R2 cluster.

I have 2 AD servers in a different subnet from the one where I am trying to build the cluster, and there is a firewall between those 2 subnets.

We have limited the RPC port range to 50000-50225 and configured the other ports needed for name resolution and AD communication.

I have successfully joined those 2 servers to AD. However, when I try to validate the cluster, I get the error below:

Validate Active Directory Configuration

Connectivity to a writable domain controller from node XX01.XX.X could not be determined because of this error: Could not get domain controller name from machine XX01.

Node(s) XX01.XX.X cannot reach a writable domain controller. Please check connectivity of these nodes to the domain controllers.

-----------------------------------------

After checking the firewall between AD and those 2 Windows servers, I found that the Cluster Service is trying to communicate on dynamic ports, which is denied (because we configured Windows to use dynamic ports between 50000 and 50225). Is there any way to force the failover cluster's dynamic ports to be between 50000 and 50225?

Please let me know your suggestion about this.

↧

Clustered role 'Availability Role' has exceeded its failover threshold

August 26, 2014, 6:00 am

I am getting this alert on SQL 2012 R2 SP1. Could you please tell me how to resolve the alert below, which comes from Windows failover clustering?

Clustered role 'Availability Role' has exceeded its failover threshold. It has exhausted the configured number of failover attempts within the failover period of time allotted to it and will be left in a failed state. No additional attempts will be made to bring the role online or fail it over to another node in the cluster. Please check the events associated with the failure. After the issues causing the failure are resolved the role can be brought online manually or the cluster may attempt to bring it online again after the restart delay period.
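The alert text describes a budget: a limited number of failover attempts within a failover period, after which the role is left failed. The sketch below is only a simplified model of that rule, not the Cluster service's actual implementation; the class name and the numbers used in the example (2 attempts per 6 hours) are hypothetical.

```python
from collections import deque


class FailoverBudget:
    """Allow at most `max_failures` failures within a window of `period_hours`."""

    def __init__(self, max_failures, period_hours):
        self.max_failures = max_failures
        self.period_hours = period_hours
        self.failures = deque()  # timestamps (in hours) of recent failures

    def record_failure(self, now_hours):
        """Record a failure; return True if another failover attempt is allowed."""
        # Forget failures that have fallen out of the window.
        while self.failures and now_hours - self.failures[0] >= self.period_hours:
            self.failures.popleft()
        self.failures.append(now_hours)
        return len(self.failures) <= self.max_failures


# Hypothetical settings: 2 failovers allowed per 6-hour period.
budget = FailoverBudget(max_failures=2, period_hours=6)
for hour in (0, 1, 2, 9):
    print(hour, budget.record_failure(hour))
```

In this model the third failure inside the window exhausts the budget, which corresponds to the role being left in a failed state; once the earlier failures age out of the window, attempts are allowed again.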

↧

CAU: The plug-in argument HotfixRootFolderPath has invalid value

August 25, 2014, 3:46 am

I am trying to apply a hotfix using CAU and have configured the self-updating options. When I come to preview the update for the cluster and select Microsoft.Hotfixplugin as the plugin, I get the following error:

I am unsure why this is being generated or how to correct the problem.

Any ideas?

↧

Persistent Reservation not present on Test Disk 0 from node......

August 26, 2014, 9:03 am

Hi all,

I did some research and found many similar issues in this forum. However, the issue persists.

There is a Windows Server 2012 R2 Hyper-V Failover Cluster in my lab, with HP P2000 storage attached over Fibre Channel. Everything was going well until a problem occurred with one node (let's call it Node01) in the cluster, so I reinstalled Node01. After the fresh installation, the LUN for VMs shows as "unknown" in Disk Management on Node01, as you can see in the following screenshot. However, the LUN for quorum (from the same storage) is OK.

I tried to add Node01 to the existing cluster, and cluster validation gives the following warning.

Failure. Persistent Reservation not present on Test Disk 0 from node <<MY_SERVER_FQDN>> after successful call to update reservation holder’s registration key 0xb.

Test Disk 0 does not support SCSI-3 Persistent Reservations commands needed to support clustered Storage Pools. Some storage devices require specific firmware versions or settings to function properly with failover clusters. Please contact your storage administrator or storage vendor to check the configuration of the storage to allow it to function properly with failover clusters.

If I ignore the warning, keep Node01 in the cluster and run the command Get-ClusterSharedVolumeState, it shows:
Node : Node01
StateInfo : BlockRedirected

↧

Network Configuration Problems

August 26, 2014, 11:51 am

Hi,

I am really, really struggling to get networking correct on a Hyper-V cluster. The public LAN (Management and virtual machine network) is fine. This is purely focusing on the network adapters involved in the cluster.

I have 2 cluster nodes and 4 switches, with 1 server and 2 switches in each building. The switches are HP 2920.

We will call building 1 ServerRoom and building 2 BackupRoom.

Switch 1 in ServerRoom is connected to switch 1 in BackupRoom on port A1 of the switch by fibre optic.

Switch 2 in ServerRoom is connected to switch 2 in BackupRoom on port A1 of the switch by fibre optic.

Switch 1 and 2 in ServerRoom are stacked.

Switch 1 and 2 in BackupRoom are stacked.

Below shows the IP addresses on each server and what ports they are connected to on the switches

This is how my cables go from server to the switches

ServerRoom

Switch  Network           Port  IP Address
sw1     iSCSI Primary     1/a2  10.10.1.1
sw1     Live Mig Primary  1/1   10.10.2.1
sw1     HB Primary        1/2   10.10.3.1
sw2     iSCSI Backup      2/a2  10.10.5.1
sw2     Live Mig Backup   2/1   10.10.6.1
sw2     HB Backup         2/2   10.10.4.1

BackupRoom

Switch  Network           Port  IP Address
sw1     iSCSI Primary     1/a2  10.10.1.2
sw1     Live Mig Primary  1/1   10.10.2.2
sw1     HB Primary        1/2   10.10.3.2
sw2     iSCSI Backup      2/a2  10.10.5.2
sw2     Live Mig Backup   2/1   10.10.6.2
sw2     HB Backup         2/2   10.10.4.2

I have created a trunk between switch 1 and switch 2 on the fibre optic ports:

configure

trunk 1/a1,2/a1 trk1 lacp

I have run the same command on the switch in BackupRoom.

Because I have a primary physical network cable and a backup physical cable in each server for each cluster resource, I have to put these in different subnets; they all use subnet mask 255.255.255.0.

I have created 6 VLANs on each switch stack to keep the traffic separate, but I have then tagged trk1 into each VLAN so I can benefit from the LACP.

These commands are on the switch stack. I have used an IP ending in 5 (10.10.1.5 for example) as the “management IP”, or whatever HP refers to it as, for each VLAN (I am not too good with switches), and in BackupRoom I have used an IP ending in 6 (10.10.3.6 for example).

vlan 2
   ip address 10.10.1.5 255.255.255.0
   untag 1/a2
   tag trk1

vlan 3
   ip address 10.10.2.5 255.255.255.0
   untag 1/1
   tag trk1

vlan 4
   ip address 10.10.3.5 255.255.255.0
   untag 1/2
   tag trk1

vlan 5
   ip address 10.10.5.5 255.255.255.0
   untag 2/a2
   tag trk1

vlan 6
   ip address 10.10.6.5 255.255.255.0
   untag 2/1
   tag trk1

vlan 7
   ip address 10.10.3.5 255.255.255.0
   untag 2/2
   tag trk1

I have repeated the above for BackupRoom switch stack.
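One thing that can be checked mechanically is whether the VLAN interface addresses line up with the server subnets. The sketch below takes the ServerRoom values exactly as posted above and uses Python's standard ipaddress module; the function names are made up for illustration. It flags any subnet that appears on more than one VLAN and any server address that no VLAN interface covers. A finding here may simply reflect a typo in the post rather than in the real switch configuration.

```python
import ipaddress
from collections import defaultdict

# VLAN interface addresses as listed in the post (ServerRoom stack).
VLAN_INTERFACES = {
    2: "10.10.1.5/24",
    3: "10.10.2.5/24",
    4: "10.10.3.5/24",
    5: "10.10.5.5/24",
    6: "10.10.6.5/24",
    7: "10.10.3.5/24",
}

# ServerRoom server addresses from the cabling table in the post.
HOST_ADDRESSES = {
    "iSCSI Primary": "10.10.1.1",
    "Live Mig Primary": "10.10.2.1",
    "HB Primary": "10.10.3.1",
    "iSCSI Backup": "10.10.5.1",
    "Live Mig Backup": "10.10.6.1",
    "HB Backup": "10.10.4.1",
}


def duplicate_subnets(vlan_interfaces):
    """Return {subnet: [vlan ids]} for subnets configured on more than one VLAN."""
    by_subnet = defaultdict(list)
    for vlan_id, cidr in vlan_interfaces.items():
        by_subnet[ipaddress.ip_interface(cidr).network].append(vlan_id)
    return {net: ids for net, ids in by_subnet.items() if len(ids) > 1}


def hosts_without_vlan(host_addresses, vlan_interfaces):
    """Return labels of hosts whose address is in no VLAN interface subnet."""
    networks = [ipaddress.ip_interface(c).network for c in vlan_interfaces.values()]
    return [
        label
        for label, addr in host_addresses.items()
        if not any(ipaddress.ip_address(addr) in net for net in networks)
    ]


for net, ids in duplicate_subnets(VLAN_INTERFACES).items():
    print(f"{net} is configured on VLANs {ids}")
print("No VLAN interface for:", hosts_without_vlan(HOST_ADDRESSES, VLAN_INTERFACES))
```

Run against the values as posted, this reports that 10.10.3.0/24 is configured on both VLAN 4 and VLAN 7, and that the HB Backup address (10.10.4.1) falls in a subnet with no VLAN interface.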

However, I use a product called StarWind to synchronise the storage, and when I use this configuration it seems to crash the product. When I remove the network cables from one of the switches, or remove both the iSCSI cables, the product frees itself up. This would indicate there is something wrong with my switch configuration, as if the network is being flooded (by the way, I enabled spanning tree protocol too). I'm really stuck on this. Can anyone suggest anything?

Thank you

Steve

↧

Microsoft DSM versions

August 27, 2014, 7:07 am

I have a 2-node 2012 R2 cluster and am trying to add another 2012 R2 server.

I fully patched the new server, and when I add it as a node, the validation that runs fails on the DSM QFE versions.

The 2 original units are QFE 16384; the new, fully patched server is QFE 17088.

So I can't add it to the cluster with validation.

I can skip the validation and add it that way, but I need to know whether adding this node with a different MPIO DSM version is OK for a while, so I can move the VMs over and then update and restart the other 2 older servers.

The issue is that I don't have enough resources to put all the VMs on 1 server.

We are a healthcare facility and I don't want the cluster to go down.

Any help would be appreciated.

↧

Cluster Windows 2012 in different subnet

August 27, 2014, 7:51 am

≫ Next: Hyper-v Live Migration not completing when using VM with large RAM

≪ Previous: Microsoft DSM versions

I am preparing to configure 4 Windows 2012 servers in a cluster. Each server has 2 network cards (1 = DATA and 1 = Heartbeat).

Is it possible to configure:

- The DATA network cards of all 4 servers in the same VLAN (e.g. 10.10.10.0/24)

- The Heartbeat network cards of 3 servers in one VLAN (e.g. 192.168.0.0/24) and the Heartbeat network card of the 4th server in another VLAN (e.g. 172.168.0.0/23)?
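As a quick sanity check on the example addresses, the sketch below uses Python's standard ipaddress module to show how the two proposed Heartbeat networks relate. It only inspects the address ranges quoted in the post; it says nothing about what the cluster itself will accept, and the variable names are made up for illustration.

```python
import ipaddress

# Example networks taken from the post.
data_net = ipaddress.ip_network("10.10.10.0/24")
heartbeat_a = ipaddress.ip_network("192.168.0.0/24")
heartbeat_b = ipaddress.ip_network("172.168.0.0/23")

# The two Heartbeat ranges do not overlap, so traffic between them has to be routed.
print("overlap:", heartbeat_a.overlaps(heartbeat_b))

# A /23 spans two adjacent /24 blocks (512 addresses in total).
print("addresses in", heartbeat_b, "=", heartbeat_b.num_addresses)

# 172.168.0.0/23 lies outside the private 172.16.0.0/12 block (172.16 to 172.31).
print("private:", heartbeat_b.is_private)
```

The last check is worth noting if 172.168.0.0/23 is a real plan rather than just an example: that range is publicly routable address space, unlike the other two networks in the post.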

Thanks

↧

Hyper-v Live Migration not completing when using VM with large RAM

August 27, 2014, 1:45 pm

≫ Next: Windows 2012 FSW

≪ Previous: Cluster Windows 2012 in different subnet

Hi,

I have a two-node Server 2012 R2 Hyper-V cluster which uses a 100GB CSV, with 128GB RAM across 2 physical CPUs (approx. 7.1GB used when the VM is not booted), and 1 VM running Windows 7 which has 64GB RAM assigned. The VHD size is around 21GB and the BIN file is 64GB (by the way, do we have to have that, or can we get rid of the BIN file?).

NUMA is enabled on both servers. When I attempt to live migrate, I get event 1155 in the cluster events; the LM starts and gets to 60-something % but then fails. The event details are "The pending move for the role 'New Virtual Machine' did not complete."

However, when I lower the amount of RAM assigned to the VM to around 56GB (56 + 7 = 63GB), the LM seems to work. Any amount of RAM below this allows LM to succeed, but it seems that if the total used RAM on the physical server (including that used for the VMs) is 64GB or above, the LM fails. Is this a coincidence, given that the server has 64GB per CPU?
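The arithmetic behind that observation can be written out as below. This only restates the poster's hypothesis (that the migration succeeds while VM RAM plus host usage stays under the memory attached to one CPU); it is not a confirmed explanation, and the function name is hypothetical.

```python
def fits_in_one_numa_node(vm_ram_gb, host_used_gb, numa_node_gb):
    """Return True if VM RAM plus host usage stays below one NUMA node's memory."""
    return vm_ram_gb + host_used_gb < numa_node_gb


HOST_USED_GB = 7.1       # host usage with the VM not booted, from the post
NUMA_NODE_GB = 128 / 2   # 128GB across 2 physical CPUs

for vm_ram_gb in (56, 64):
    total = vm_ram_gb + HOST_USED_GB
    ok = fits_in_one_numa_node(vm_ram_gb, HOST_USED_GB, NUMA_NODE_GB)
    print(f"VM {vm_ram_gb}GB + host {HOST_USED_GB}GB = {total:.1f}GB, under {NUMA_NODE_GB:.0f}GB: {ok}")
```

With the numbers from the post, 56GB stays under the 64GB mark and 64GB does not, which matches the pass and fail cases described.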

Why would this be?

Many thanks

Steve

↧

Windows 2012 FSW

August 27, 2014, 7:40 am

I am preparing to configure 5 Windows 2012 servers in a cluster with a File Share Witness.

My question: is it possible to configure the share (for the File Share Witness) on one of my Windows 2012 servers?

If yes, do you have a link that explains how?

Thanks

↧