2012 Cluster Network Question

December 5, 2013, 12:40 pm

I have a 3 node cluster and I am experiencing some strange network issues.

I have four 10G adapters in each node: two teamed with LACP for normal network traffic, and two for iSCSI using MPIO.

The cluster throws a ton of errors if I select "Do not allow cluster network communication on this network" on the iSCSI network. I enabled cluster communication on that network a while back, because the guests were failing over every 5 minutes.

I would think that two teamed 10G NICs would have enough throughput for the heartbeat, but I suspect the heartbeat is what causes this.

Any ideas?

Thank you

↧

seeing lots of FailoverClustering Errors in Windows 2012 Datacenter server

December 5, 2013, 1:16 pm

We have a critical issue: our Windows 2012 Datacenter server, which hosts MSMQ, is unable to send messages to the BizTalk servers. I am seeing lots of FailoverClustering errors in the System event log. Earlier the error was:

Cluster network name resource 'XXXXXXXXXXXX' failed registration of one or more associated DNS name(s) for the following reason:
DNS bad key.

Now, when the issue occurs, the error is slightly different:

Cluster network name resource 'XXXXXXXXXXXX' failed registration of one or more associated DNS name(s) for the following reason:
The handle is invalid.

Any suggestions are appreciated.

-Thanks

↧

File Share Witness is not a valid File share path

November 25, 2013, 2:34 pm

I have two Windows 2008 R2 VMs (ESXi) running Double-Take.

We currently have a health check running on our Netscaler to force a failover from the primary VM to the secondary VM if the Netscaler cannot access an SMB share on the primary VM.

This hasn't proven to be a good method of providing fault tolerance, so I am trying to cluster these two VMs using a File Share Witness.

(I should add that my client engineers this solution; I only implement according to their requirements.)

The Witness share has been created on an EMC VNX. My domain account and the domain Cluster Computer Name account both have Full Control permissions on the Share.

I have installed the File Services Role as well as the Failover Clustering Feature on both VMs.

To configure the Quorum, I am selecting Node and File Share Majority (For Clusters with special configuration).

I am providing the IP address and share name in the form of \\<ipaddress>\<sharename>.

The wizard returns \\<ipaddress>\<sharename> is not a valid share path.

The Microsoft documentation states that nothing else should be stored on, or be using, the witness share. Within the shared folder is an .etc folder with Everyone, Unix UID=0x0, and Unix GID=0x0. All three accounts have 'special' permissions: Traverse, List, Read. I have not deleted this folder (and haven't tried to take ownership) because I don't know what the impact could be.

Is it possible that this folder is preventing the share from becoming a Witness Share?

Any other thoughts, ideas?

The client did test this in their lab, but used a file share on another Windows server rather than a NAS.

↧

Cluster Disk i/o Timeout

November 30, 2013, 11:27 am

Hi,

We are stuck on a problem with our private cloud protection. Whenever DPM tries to back up virtual machines, the Cluster Shared Volume I/O times out and the disk disappears for a moment. This causes my virtual machines to reboot unexpectedly and move to different nodes of the cluster.

The configuration of my infrastructure is as follows.

1. Windows Server 2012 cluster, 6 nodes

2. DPM 2012 SP1

Event generated when the backup starts:

Cluster Shared Volume 'Volume2' ('CSV 7TB Cluster Disk Production') is no longer available on this node because of 'STATUS_CLUSTER_CSV_AUTO_PAUSE_ERROR(c0130021)'. All I/O will temporarily be queued until a path to the volume is reestablished.
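
For reference, the hexadecimal value in parentheses can be split into the standard 32-bit NTSTATUS fields (severity, customer flag, facility, code), which helps when searching for the code. This is only a minimal sketch: the function name decode_ntstatus is hypothetical, and it assumes the usual NTSTATUS bit layout.

```python
def decode_ntstatus(value):
    """Split a 32-bit NTSTATUS value into its bit fields.

    Assumed layout: bits 31-30 severity, bit 29 customer flag,
    bits 27-16 facility, bits 15-0 code.
    """
    return {
        "severity": (value >> 30) & 0x3,    # 3 means "error"
        "customer": (value >> 29) & 0x1,    # 0 means Microsoft-defined
        "facility": (value >> 16) & 0xFFF,
        "code": value & 0xFFFF,
    }


# Value taken from the event text above.
fields = decode_ntstatus(0xC0130021)
print({name: hex(number) for name, number in fields.items()})
```

The severity field comes out as 3 (error), with facility 0x13 and code 0x21.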

I have applied the hotfix that Microsoft recommended:

http://support.microsoft.com/kb/2813630/en-us

I have also disabled ODX, because my storage doesn't support this feature.

Please help me resolve this matter.

Best Regards,

Muzammil

Muzammil Ubaray

↧

Windows Failover Cluster (Errors retrieving file shares)

December 6, 2013, 11:23 am

I'm having an issue with Windows Failover Clustering on Windows Server 2012 R2. I have two cluster nodes (nodeA and nodeB). When nodeA is the owner node and I open Failover Cluster Manager on it and go to <clusterName> >> Roles >> <fileserver role> >> Shares tab, it hangs and says that it is loading, indefinitely. However, when I go to nodeB (not the owner node) and open Shares, it shows me all of the shares that I have. Also, when I go to <clusterName> >> Nodes and click on the Roles tab, the information says "There were errors retrieving file shares."

When I make nodeB the owner node, I can no longer view the shares on that machine but can now view them on nodeA.

We also have a test network where I have recreated the machines, the environment and the failover cluster as closely to the production network as I can, and everything works fine in the test network.

↧

cluster failovered often

November 19, 2013, 5:31 pm

Hi all,

We have a two-node Windows 2008 R2 cluster (two VMs on VMware) which is set up for DHCP.

Both the public and private NICs are teamed. The private team has a gateway and DNS configured.

The cluster fails over every other day.

What should I troubleshoot?

Thank you.

↧

Need suggestion on Clustering Books

December 3, 2013, 7:30 am

Hi Team,

Can anyone recommend good books for learning Microsoft Windows Clustering?

Thanks,

S.V.Ramana.

Ramana rao

↧

Different but compatible version of the cluster service software

December 3, 2013, 12:25 pm

We have two physical servers in an FCI and a third server in a different datacenter for DR, with SQL AlwaysOn configured. The third server is a VM running the same version of Windows. We are getting the alert below:

Node 'N1' which is physical established a communication session with node 'N3' which is VM and detected that it is running a different but compatible version of the cluster service software. It is recommended that the same version of the cluster service software be installed on all nodes in the cluster.

Can we ignore this safely?

↧

Server 2012 Cluster nodes hang and VMs lock up, memory leaks and critical stops on both nodes.

December 8, 2013, 9:57 am

Last night my two-node cluster went down for no apparent reason. All VMs (4) were down even though the cluster manager said they were running. The cluster shared volume on my SAN was not accessible through Windows Explorer but the Dell mpio software showed it was connected and the SAN itself showed a connection and did not have any problem. It took me five hours of struggle to get the cluster running again. I had to remotely restart each node several times from another server using the command line because the RDP session would stop responding due to Explorer locking up. I ended up removing the antivirus software from each node but that was in desperation; I don't know if that was the problem or not. It finally started to work again when I manually brought the cluster IP back online, manually moved all resources to node1 and then did a pause and drain of node2 and restarted node2. This error shows up twice in the Application log of both nodes:

Possible Memory Leak. Application (C:\Windows\Cluster\rhs.exe -key SYSTEM\CurrentControlSet\Services\ClusSvc\Parameters\Rhs\0428d6b3-5c3b-4757-bc31-70379129ad89 -parentPid 3060 -initEvent 1dbde958-779b-4cd7-8daa-7c9299d0303c -replyEndpoint OLEAA17D0EF8BDFFAD1F4F33871C878) (PID: 4520) has passed a non-NULL pointer to RPC for an [out] parameter marked [allocate(all_nodes)]. [allocate(all_nodes)] parameters are always reallocated; if the original pointer contained the address of valid memory, that memory will be leaked. The call originated on the interface with UUID ({4b324fc8-1670-01d3-1278-5a47bf6ee188}), Method number (64). User Action: Contact your application vendor for an updated version of the application.

There are also two critical stops logged in the Dell OpenManage logs on each node.

The symptoms are very similar to this technet article for Server 2008 R2:

http://support.microsoft.com/kb/2798093

Both nodes are fully updated with hotfix 2870270.

Can anyone shed some light on this? What went wrong and how do I prevent it from happening again?

↧

can anyone share?

December 8, 2013, 2:21 pm

Hi All,

We have SQL 2005 clustering. Can anyone share their NetBIOS settings for the public NIC?

Thank you.

↧

Failed fail-over when pulling network cable

December 2, 2013, 4:10 am

Hello

Setup:

2-node cluster with physical servers. 3 network teams: 1 Heartbeat, 1 LAN and 1 SAN. The teams were built with the Windows Server 2012 teaming software.

Problem:

All fail-over tests work fine, except "pulling the network cable" on the LAN network. By pulling the cable I mean disabling the NICs that make up the LAN team. That triggers the fail-over from that server, but the IP address in the cluster fails with the following error messages:

"The Cluster service failed to bring clustered service or application 'Cluster Group' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application"

"Cluster resource 'Cluster IP Address' of type 'IP Address' in clustered role 'Cluster Group' failed."

I tried finding relevant information online, but nothing seems to clearly solve the issue. The Cluster Validation completes without any errors at all. The cluster can be brought online manually, but it does not come online by itself.

Are there any views on how the cluster is supposed to handle sudden network losses? Any suggestions?

Regards

Alex

↧

SQL Server Failover Cluster & SAN Mirroring

December 8, 2013, 3:36 am

Hi,

We have set up a SQL failover cluster (Windows Server 2012 Standard Edition). The 2 SQL Servers have access to 2 SAN disks (2 x HP P2000 G3).

We are able to see the 2 SANs as 2 drives in the Windows cluster without a problem. We need to set up SAN mirroring so that if 1 SAN unit fails, the SQL server can retrieve the needed info from the second SAN unit.

We've realized that it's possible to mirror the 2 SAN disks from Windows (by going to Computer Management and mirroring the disks). We tested this, and the mirrored volume seems to be accessible from SQL Server A and SQL Server B, as well as from the cluster disks. Is setting up mirrored SAN disks from Computer Management recommended for that purpose? We are open to any other suggestions.

Thanks

↧

Cluster network name resource failed registration (Event ID 1196)

December 2, 2013, 11:48 am

I recently implemented Cluster Aware Updating on my Windows 2012 cluster. During the setup of the role I selected the option to "Add the CAU clustered role with self-updating mode enabled, to this cluster." The wizard created a computer object "CAUHyper4gf" in AD for this role. Since that time, roughly every 15 minutes, Event ID 1196 is logged with the following error:

"Cluster network name resource 'CAUHyper4gf' failed registration of one or more associated DNS name(s) for the following reason:
This operation returned because the timeout period expired.
.

Ensure that the network adapters associated with dependent IP address resources are configured with at least one accessible DNS server."

The Computer Object is listed in DNS and responds to pings. I haven't been able to find any articles addressing this specific issue. I'm not sure how to correct this issue. What can be done to eliminate these errors?

↧

FailoverCount is not getting reset for QuorumResource in Windows2012 R2 failover clusters

December 10, 2013, 6:07 am

Hi,

I have a two-node failover cluster on Windows Server 2012 R2 with a third-party resource as quorum, with quorum type Node and Disk Majority. When the quorum resource faults, FOC does not fail "Cluster Group" over to the other cluster node. The following lines are seen in the cluster log.

Here is the cluster log from the failed node.

000008bc.000014d8::2013/12/06-11:45:49.591 INFO  [RCM] rcm::RcmGroup::Failover: (Cluster Group)

000008bc.000014d8::2013/12/06-11:45:49.592 WARN  [RCM] Not failing over group Cluster Group, failoverCount 2, failoverThresholdSetting 4294967295, lastFailover 2013/12/06-03:39:54.190

000008bc.000014d8::2013/12/06-11:45:49.592 INFO  [RCM] Will retry online from long delay restart of quoDG in 3600000 milliseconds.

Quorum resource failover policy’s Maximum failover count is set to one.

000008bc.000014d8::2013/12/06-11:45:49.591 INFO  [RCM] resource quoDG: failure count:1, restartAction:2 persistentState:1.

Is there a way to reset this failoverCount? When does FOC increment and reset the failoverCount for a resource?
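
To make the numbers in those log lines easier to read, here is a minimal sketch that pulls them out with regular expressions. The function names are hypothetical and the sample strings are copied from the log above with spacing restored. It only shows that 4294967295 is the largest unsigned 32-bit value (0xFFFFFFFF) and that 3600000 milliseconds is 60 minutes. How the cluster service interprets that threshold value is not something the log states, so treating it as a "not explicitly set" marker would be an assumption.

```python
import re

# Sample lines from the cluster log quoted above.
WARN_LINE = (
    "WARN  [RCM] Not failing over group Cluster Group, failoverCount 2, "
    "failoverThresholdSetting 4294967295, lastFailover 2013/12/06-03:39:54.190"
)
RETRY_LINE = (
    "INFO  [RCM] Will retry online from long delay restart of quoDG "
    "in 3600000 milliseconds."
)


def parse_failover_warning(line):
    """Extract failoverCount and failoverThresholdSetting from a WARN line."""
    match = re.search(
        r"failoverCount (\d+),\s*failoverThresholdSetting (\d+)", line
    )
    if match is None:
        return None
    count, threshold = int(match.group(1)), int(match.group(2))
    return {
        "failover_count": count,
        "threshold": threshold,
        # True when the threshold is the maximum unsigned 32-bit value.
        "threshold_is_uint32_max": threshold == 0xFFFFFFFF,
    }


def retry_delay_minutes(line):
    """Convert the 'in N milliseconds' retry delay to minutes."""
    match = re.search(r"in\s*(\d+)\s*milliseconds", line)
    return None if match is None else int(match.group(1)) / 60000


print(parse_failover_warning(WARN_LINE))
print(retry_delay_minutes(RETRY_LINE))
```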

Thanks in advance

Rakesh

Rakesh Agrawal

↧

ask your expertise.

December 9, 2013, 8:01 am

Windows 2008 R2 and SQL 2005 clustering.
When I run cluster validation, I get the following warning:
"The RegisterAllProviderIP property for network name 'Name: winclustername' is set to 1 For the current cluster configuration this value should be set to 0."
The server has two NICs: one is public and the other is private (heartbeat). Should this be a concern?

Also, when I run ipconfig /all, I see a Microsoft Failover Cluster Virtual Adapter in addition to the public and private NICs. It has a 169.254.X.X address. Is this by design?
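
As background on that address: 169.254.0.0/16 is the IPv4 link-local (APIPA) range, which a host assigns to itself without DHCP. A small sketch for checking whether an address falls in that range is below. The function name and the two sample addresses are hypothetical, since the post does not give the full address.

```python
import ipaddress


def is_link_local(address):
    """Return True if the address is in a link-local range (169.254.0.0/16 for IPv4)."""
    return ipaddress.ip_address(address).is_link_local


# Hypothetical sample addresses, not taken from the post.
for candidate in ("169.254.1.20", "192.168.1.20"):
    print(candidate, is_link_local(candidate))
```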

Thank you.

↧

How to create a local, non-clustered storage pool

December 9, 2013, 3:18 am

Hello,

I have set up a two-node Failover Cluster with a shared SAS DAS. So far so good.

One of the nodes also has internal disks that I wish to use for system backups.

This storage pool should not be clustered, as the disks cannot be seen from the other node. The trouble is that as soon as I create the pool it gets added to the cluster (in failed state).

In fact, the "Storage Pools" window in the server manager will only show me the "clustered storage spaces", with my internal disks in the Primordial pool.

Get-StorageSubSystem will show me both subsystems (Clustered Storage Space on ... + Storage Spaces on node-1) but fails to create a storage pool on the "local" subsystem.

How can I create a local, non-clustered storage pool on internal disks?

Cheers

alex

↧

Failover Cluster & DFS on NAS SRV2012 R2

December 9, 2013, 2:40 am

Dear all,

I've been searching for a NAS solution on which to configure Failover Clustering and DFS. There are too many scenarios and definitions, and I need best practices.

In short:

I have 2 HP servers, each hosting the following.

HP 8G

Domain Controller (Physically)
File Server (Hyper-v)
Exchange Server (Hyper-v)

---------------

HP G6

Additional Domain Controller (Physically)
Additional File Server (Hyper-v)
Additional Exchange Server (Hyper-v)

I believe that I can configure the failover cluster on the servers themselves?

But I need the file server DFS to be placed on the NAS, and I'm not sure about NAS compatibility with 2012 R2 (e.g. WD DX4000, QNAP, Dell), or whether this is the right design. Can anyone help me?

↧

Microsoft Cluster error

December 11, 2013, 2:23 am

Hi,

I am getting the following error in the event logs of a Windows 2008 R2 cluster with SQL 2008. Can anybody help me resolve it?

Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 9/26/2013 7:25:16 AM
Event ID: 1207
Task Category: Network Name Resource
Level: Error
Keywords:
User: SYSTEM
Computer: NODEDB1.NODE.COM
Description:
Cluster network name resource 'SQL Network Name (SQLNODE)' cannot be brought online. The computer object associated with the resource could not be updated in domain 'NODE.COM' for the following reason:
Unable to update password for computer account.

The text for the associated error code is: Access is denied.

The cluster identity 'NODECLUSTER$' may lack permissions required to update the object. Please work with your domain administrator to ensure that the cluster identity can update computer objects in the domain.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<Provider Name="Microsoft-Windows-FailoverClustering" Guid="{BAF908EA-3421-4CA9-9B84-6689B8C6F85F}" />
<EventID>1207</EventID>
<Version>0</Version>
<Level>2</Level>
<Task>19</Task>
<Opcode>0</Opcode>
<Keywords>0x8000000000000000</Keywords>
<TimeCreated SystemTime="2013-09-26T01:55:16.483346600Z" />
<EventRecordID>18934</EventRecordID>
<Correlation />
<Execution ProcessID="4068" ThreadID="6400" />
<Channel>System</Channel>
<Computer>NODEDB1.NODE.COM</Computer>
<Security UserID="S-1-5-18" />
</System>
<EventData>
<Data Name="ResourceName">SQL Network Name (SQLNODE)</Data>
<Data Name="DomainName">NODE.COM</Data>
<Data Name="FailureString">Unable to update password for computer account</Data>
<Data Name="Status">Access is denied.
</Data>
<Data Name="ClusterIdentity">NODECLUSTER$</Data>
<Data Name="BinaryParameterLength">4</Data>
<Data Name="BinaryData">05000000</Data>
</EventData>
</Event>
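
When comparing several of these events, it can help to pull the key fields out of the Event XML instead of reading it by eye. The following is only a sketch using Python's standard xml.etree.ElementTree. The function name summarize_event is hypothetical, and SAMPLE is a shortened copy of the XML above.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <Event> element.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Shortened copy of the event XML from the post.
SAMPLE = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
<EventID>1207</EventID>
<Computer>NODEDB1.NODE.COM</Computer>
</System>
<EventData>
<Data Name="ResourceName">SQL Network Name (SQLNODE)</Data>
<Data Name="DomainName">NODE.COM</Data>
<Data Name="FailureString">Unable to update password for computer account</Data>
<Data Name="Status">Access is denied.
</Data>
<Data Name="ClusterIdentity">NODECLUSTER$</Data>
</EventData>
</Event>"""


def summarize_event(xml_text):
    """Return the event ID, computer name and named EventData fields as a dict."""
    root = ET.fromstring(xml_text)
    summary = {
        "EventID": root.findtext("e:System/e:EventID", namespaces=NS),
        "Computer": root.findtext("e:System/e:Computer", namespaces=NS),
    }
    for data in root.findall("e:EventData/e:Data", NS):
        # Strip the trailing newline that some values (e.g. Status) carry.
        summary[data.get("Name")] = (data.text or "").strip()
    return summary


for key, value in summarize_event(SAMPLE).items():
    print(f"{key}: {value}")
```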

↧

Cluster Aware Updating - Failed to enter maintenance mode.

December 11, 2013, 2:39 am

Hello everyone,

I am trying to run Cluster Aware Updating, and I thought everything went well :-) But that was only on the first of the 2 nodes. When the second node of the cluster tries to enter maintenance mode I get an error: "Node XYZ failed to enter maintenance mode. No retries left." I tried to go through the logs but found only: "InvokeCauRunOperation:Node drain failed on node XYZ. Additional: System.Management.Automation.RemoteException: Node drain failed on node XYZ."

All VMs are clustered.

Analyze cluster updating readiness - successful

I would appreciate any advice!

Thank you very much!

Roman

↧

Hyper-V Failover Cluster - Inconsistent Network Availability

December 11, 2013, 5:58 am

We've got a small cluster with 7 hosts and a dozen or two VMs. For some reason I'm getting inconsistent availability on the cluster networks. The hosts seem to function fine on their own, but there are all kinds of issues with migration, which I'm assuming is because certain hosts think other hosts are unavailable. For example:

Cluster Network 1 - From Host 8

Cluster Network 1 - From Host 10

As far as I can tell, all of the networks are up. I can ping all hosts on all interfaces. What criteria go into determining host availability?

↧