Channel: High Availability (Clustering) forum

Server 2012 DataCenter and Server 2012 DataCenter R2 Clustering


We have a 2012 Datacenter server in our network. We are introducing a 2012 R2 Datacenter server, as well as a SAN and iSCSI, for clustering and high availability of virtual servers. We are currently using Hyper-V on the existing 2012 Datacenter server.

I've built the 2012 R2 Datacenter server and installed Failover Clustering, but when I try to add the 2012 Datacenter server to the cluster, it is not added. The report says this is because one server is 2012 Datacenter and the other is 2012 R2 Datacenter.

Our issue is this: the original server is a production server currently hosting 10 virtual machines. We would prefer to upgrade it to R2, both to have the latest version and to get live migration on failover.

Our options appear to be: 

1. Blow away new server, install 2012 Datacenter non-R2 and cluster. 

2. Do #1, then try to upgrade each server to R2. 

3. Move all virtual machines from the original server to the R2 server, upgrade the original server to R2, and then add it to the cluster.

Which method seems best? What licensing should I be concerned about? If I don't have a 2012 R2 license available, we would need to buy one, correct?


SAN failover, disk timeout, iSCSI and MPIO


Hi

I am testing SAN controller failover. It takes around 2 minutes for the second controller to come online after the first one has failed.

There are some registry settings that can be configured to increase the disk timeout, but they don't seem to work when failover clustering is enabled.

I am testing this on a Hyper-V 2012 R2 failover cluster (both regular clustered disks and CSVs; the same issue occurs on both).

I have changed the following registry settings:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\disk\TimeoutValue = 240

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\mpio\Parameters\PDORemovePeriod = 240

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4d36e97b-e325-11ce-bfc1-08002be10318}\0003\Parameters\LinkDownTime = 60
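The arithmetic behind the question can be made explicit. The sketch below compares each value listed above with the roughly 2-minute (120 s) controller failover, assuming all three settings are expressed in seconds. It is a hypothetical helper, not a diagnostic tool: it does not model the cluster's own disk health checks, and a value shorter than the failover window only marks a candidate to investigate.

```python
# Hypothetical sketch: which configured timeouts are shorter than the observed
# controller failover window? All values are assumed to be in seconds.
OBSERVED_FAILOVER_S = 120  # "around 2 minutes" from the post

# Values copied from the registry settings listed above.
CONFIGURED_TIMEOUTS_S = {
    r"Services\disk\TimeoutValue": 240,
    r"Services\mpio\Parameters\PDORemovePeriod": 240,
    r"Control\Class\{4d36e97b-e325-11ce-bfc1-08002be10318}\0003\Parameters\LinkDownTime": 60,
}


def shorter_than_failover(timeouts, failover_s):
    """Return the names of settings whose value is below the failover duration, sorted."""
    return sorted(name for name, value in timeouts.items() if value < failover_s)


if __name__ == "__main__":
    for name in shorter_than_failover(CONFIGURED_TIMEOUTS_S, OBSERVED_FAILOVER_S):
        print(f"{name} = {CONFIGURED_TIMEOUTS_S[name]} s < {OBSERVED_FAILOVER_S} s failover")
```

With the values in this post, only LinkDownTime (60 s) is shorter than the 120 s failover; whether that is what makes the cluster lose the disks is an assumption to test, not a conclusion.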

But as soon as the second controller comes up, the cluster registers a failure of all the clustered disks and restarts the VMs. I am wondering whether the second controller coming online is somehow triggering the clustered disk failure.

I am seeing the following in the event log.

Connection to the target was lost. The initiator will attempt to retry the connection.

\Device\MPIODisk3 is currently in a degraded state. One or more paths have failed, though the process is now complete.

Ownership of cluster disk 'Cluster Disk 1' has been unexpectedly lost by this node. Run the Validate a Configuration wizard to check your storage configuration.

Thanks

Daniel


Server 2012 Failover Cluster No Disks available / iSCSI


Hi All,

I am testing Failover Clustering on Windows Server 2012, hoping to end up with a clustered file server once I am done.

I am starting with a single node in the cluster for testing purposes, and I have connected a single 100 GB iSCSI LUN to this cluster.

When I right-click Storage -> Disks and then click 'Add Disk', I get "No disks suitable for cluster disks were found."

I get this even if I add a second server to the cluster and connect it to the iSCSI drive as well.

Any ideas?

Getting Error: cluster ip address not added to tcpip properties


I have two 2008 R2 physical servers on the same subnet that have been using NLB for the past 1.5 years. We had a firewall issue, and I took one of the servers out of the cluster for testing, while the other, main server (priority 1) was left serving the virtual IPs. The main server continues to work properly.

The servers have two NICs each, one for NLB and one for regular traffic. Each NIC has its own IP address, and there is also a cluster IP and two virtual IPs.

Error:

When I try to add the second server to the cluster, I first connect to the existing cluster, which works fine. Then I choose Add Host to Cluster, type the name of the server, and select the NLB NIC. It sees the other server and seems to start the process; however, soon afterwards the NLB NIC's status changes from "Internet access" to "Enabled" and the gateway is removed from its settings. I try to add the gateway back, but as soon as I close the settings it disappears again. NLB Manager tells me: cluster ip address (192.#.#.#) not added to tcpip properties. It lists this error 4 times, once for each IP (2 virtual, 1 cluster, and 1 for the dedicated NLB NIC IP). I have also tried adding all the virtual IPs to the NLB NIC's settings and still get exactly the same error. Even the registry under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\Tcpip\Parameters\Interfaces looks correct.
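The NLB Manager error lists each address it expected to find in the NIC's TCP/IP properties but did not. The sketch below reproduces that comparison under stated assumptions: all addresses are hypothetical documentation-range values (the real ones are elided in the post), and the `bound` argument stands in for whatever is actually configured on the NLB NIC. It also checks that every address falls inside the NIC's subnet, since both servers share one subnet.

```python
import ipaddress

# Hypothetical addresses (RFC 5737 documentation range); the post elides the real ones.
NLB_NIC_INTERFACE = ipaddress.ip_interface("192.0.2.10/24")  # dedicated NLB NIC IP and mask
EXPECTED = {
    "dedicated": ipaddress.ip_address("192.0.2.10"),
    "cluster": ipaddress.ip_address("192.0.2.20"),
    "virtual-1": ipaddress.ip_address("192.0.2.21"),
    "virtual-2": ipaddress.ip_address("192.0.2.22"),
}


def missing_addresses(expected, bound):
    """Names of expected addresses that are not among the addresses bound to the NIC."""
    bound_set = set(bound)
    return sorted(name for name, addr in expected.items() if addr not in bound_set)


def outside_subnet(expected, interface):
    """Names of expected addresses that fall outside the NIC's subnet."""
    return sorted(name for name, addr in expected.items() if addr not in interface.network)


if __name__ == "__main__":
    # With nothing bound, all four addresses are reported, mirroring the four errors.
    print(missing_addresses(EXPECTED, []))
    print(outside_subnet(EXPECTED, NLB_NIC_INTERFACE))
```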

Any help would be appreciated. If I can't find a resolution, my next step will be to delete the NLB cluster on the main server and recreate it, but this requires downtime, and I have to make sure it comes back up!

Random Reboots


Hello, I am experiencing random reboots on random servers during the week, at random times. I did some research, and I suspect that "automatic recovery for application health monitoring" is the cause, but I am not sure. Can any expert make a suggestion? Below is a copy of the log generated during the shutdown:

00001028.0000367c::2014/02/25-06:12:34.316 INFO  [RHS] Resource Virtual Machine Configuration Computer-1 called SetResourceLockedMode. LockedModeEnabled1, LockedModeReason0.
0000095c.00003230::2014/02/25-06:12:34.316 INFO  [RCM] HandleMonitorReply: LOCKEDMODE for 'Virtual Machine Configuration Computer-1', gen(0) result 0/0.
0000095c.00003230::2014/02/25-06:12:34.316 INFO  [RCM] Virtual Machine Configuration Computer-1: Flags 1 added to StatusInformation. New StatusInformation 1
00001028.0000367c::2014/02/25-06:12:34.316 INFO  [RHS] Resource Virtual Machine Computer-1 called SetResourceLockedMode. LockedModeEnabled1, LockedModeReason0.
0000095c.00003230::2014/02/25-06:12:34.316 INFO  [RCM] Computer-1: Added Flags 1 to StatusInformation. New StatusInformation 1
0000095c.00001364::2014/02/25-06:12:34.316 INFO  [GEM] Node 6: Sending 1 messages as a batched GEM message
0000095c.000020d0::2014/02/25-06:12:34.316 INFO  [RCM] HandleMonitorReply: LOCKEDMODE for 'Virtual Machine Computer-1', gen(0) result 0/0.
0000095c.000020d0::2014/02/25-06:12:34.316 INFO  [RCM] Virtual Machine Computer-1: Flags 1 added to StatusInformation. New StatusInformation 1
0000095c.00001364::2014/02/25-06:12:34.316 INFO  [GEM] Node 6: Sending 1 messages as a batched GEM message
0000095c.000020d0::2014/02/25-06:12:34.320 INFO  [GEM] Node 6: Sending 1 messages as a batched GEM message
0000095c.000020d0::2014/02/25-06:12:34.320 INFO  [GUM] Node 6: Processing RequestLock 6:321
0000095c.00000f30::2014/02/25-06:12:34.321 INFO  [GUM] Node 6: Processing GrantLock to 6 (sent by 3 gumid: 3827)
0000095c.000020d0::2014/02/25-06:12:34.321 INFO  [GUM] Node 6: executing request locally, gumId:3828, my action: /dm/update, # of updates: 1
0000095c.000020d0::2014/02/25-06:12:34.321 INFO  [GEM] Node 6: Sending 1 messages as a batched GEM message
0000095c.00001b00::2014/02/25-06:12:34.323 INFO  [RCM] HandleMonitorReply: INMEMORY_NODELOCAL_PROPERTIES for 'Virtual Machine Computer-1', gen(0) result 0/0.
0000095c.0000356c::2014/02/25-06:12:34.324 INFO  [GEM] Node 6: Sending 1 messages as a batched GEM message
00001028.0000367c::2014/02/25-06:12:34.927 INFO  [RHS] Resource Virtual Machine Configuration Computer-1 called SetResourceLockedMode. LockedModeEnabled0, LockedModeReason0.
0000095c.0000356c::2014/02/25-06:12:34.927 INFO  [RCM] HandleMonitorReply: LOCKEDMODE for 'Virtual Machine Configuration Computer-1', gen(0) result 0/0.
0000095c.0000356c::2014/02/25-06:12:34.927 INFO  [RCM] Virtual Machine Configuration Computer-1: Flags 1 removed from StatusInformation. New StatusInformation 0
00001028.0000367c::2014/02/25-06:12:34.928 INFO  [RHS] Resource Virtual Machine Computer-1 called SetResourceLockedMode. LockedModeEnabled0, LockedModeReason0.
0000095c.00000140::2014/02/25-06:12:34.928 INFO  [GEM] Node 6: Sending 1 messages as a batched GEM message
0000095c.00000c8c::2014/02/25-06:12:34.928 INFO  [RCM] HandleMonitorReply: LOCKEDMODE for 'Virtual Machine Computer-1', gen(0) result 0/0.
00001028.0000367c::2014/02/25-06:12:34.928 INFO  [RES] Virtual Machine <Virtual Machine Computer-1>: Current state 'Online', event 'VmStopped'
0000095c.00000c8c::2014/02/25-06:12:34.928 INFO  [RCM] Virtual Machine Computer-1: Flags 1 removed from StatusInformation. New StatusInformation 0
0000095c.00000c8c::2014/02/25-06:12:34.928 INFO  [RCM] Computer-1: Removed Flags 1 from StatusInformation. New StatusInformation 0
0000095c.00000140::2014/02/25-06:12:34.928 INFO  [GEM] Node 6: Sending 1 messages as a batched GEM message
00001028.0000367c::2014/02/25-06:12:34.928 INFO  [RES] Virtual Machine <Virtual Machine Computer-1>: State change 'Online' -> 'Offline'
0000095c.00000140::2014/02/25-06:12:34.928 INFO  [GEM] Node 6: Sending 1 messages as a batched GEM message
0000095c.000017a8::2014/02/25-06:12:34.929 INFO  [RCM] rcm::RcmApi::OfflineResource: (Virtual Machine Computer-1, 1)
0000095c.000017a8::2014/02/25-06:12:34.929 INFO  [GUM] Node 6: executing request locally, gumId:3829, my action: /dm/update, # of updates: 1
0000095c.000017a8::2014/02/25-06:12:34.930 INFO  [GEM] Node 6: Sending 1 messages as a batched GEM message
0000095c.000017a8::2014/02/25-06:12:34.931 INFO  [RCM] Res Virtual Machine Computer-1: Online -> WaitingToGoOffline( StateUnknown )
0000095c.000017a8::2014/02/25-06:12:34.931 INFO  [RCM] TransitionToState(Virtual Machine Computer-1) Online-->WaitingToGoOffline.
0000095c.000017a8::2014/02/25-06:12:34.931 INFO  [RCM] rcm::RcmGroup::UpdateStateIfChanged: (Computer-1, Online --> Pending)
0000095c.000017a8::2014/02/25-06:12:34.931 INFO  [RCM] Res Virtual Machine Computer-1: WaitingToGoOffline -> OfflineCallIssued( StateUnknown )
0000095c.000017a8::2014/02/25-06:12:34.931 INFO  [RCM] TransitionToState(Virtual Machine Computer-1) WaitingToGoOffline-->OfflineCallIssued.
0000095c.0000356c::2014/02/25-06:12:34.931 INFO  [GEM] Node 6: Sending 1 messages as a batched GEM message
0000095c.00000de8::2014/02/25-06:12:34.931 INFO  [RCM] ignored non-local state Pending for group Computer-1
0000095c.0000356c::2014/02/25-06:12:34.931 INFO  [GEM] Node 6: Sending 1 messages as a batched GEM message
00001028.000025cc::2014/02/25-06:12:34.931 INFO  [RES] Virtual Machine <Virtual Machine Computer-1>: Current state 'Offline', event 'Offline'
0000095c.0000356c::2014/02/25-06:12:34.932 INFO  [RCM] HandleMonitorReply: OFFLINERESOURCE for 'Virtual Machine Computer-1', gen(0) result 0/0.
0000095c.0000356c::2014/02/25-06:12:34.932 INFO  [RCM] Res Virtual Machine Computer-1: OfflineCallIssued -> OfflineSavingCheckpoints( StateUnknown )
0000095c.0000356c::2014/02/25-06:12:34.932 INFO  [RCM] TransitionToState(Virtual Machine Computer-1) OfflineCallIssued-->OfflineSavingCheckpoints.
0000095c.00000c8c::2014/02/25-06:12:34.932 INFO  [RCM] Res Virtual Machine Computer-1: OfflineSavingCheckpoints -> Offline( StateUnknown )
0000095c.00000c8c::2014/02/25-06:12:34.932 INFO  [RCM] TransitionToState(Virtual Machine Computer-1) OfflineSavingCheckpoints-->Offline.
0000095c.00000c8c::2014/02/25-06:12:34.932 INFO  [RCM] rcm::RcmGroup::UpdateStateIfChanged: (Computer-1, Pending --> Offline)
0000095c.0000356c::2014/02/25-06:12:34.932 INFO  [GEM] Node 6: Sending 1 messages as a batched GEM message
0000095c.00000c8c::2014/02/25-06:12:34.932 INFO  [RCM] moved 0 tasks from staging set to task set.  TaskSetSize=0
0000095c.00000c8c::2014/02/25-06:12:34.932 INFO  [RCM] rcm::RcmPriorityManager::StartGroups: [RCM] done, executed 0 tasks
0000095c.0000356c::2014/02/25-06:12:34.932 INFO  [GEM] Node 6: Sending 1 messages as a batched GEM message
0000095c.00000de8::2014/02/25-06:12:34.932 INFO  [RCM] ignored non-local state Offline for group Computer-1
0000095c.000017a8::2014/02/25-06:12:34.955 INFO  [GUM] Node 6: executing request locally, gumId:3830, my action: /dm/update, # of updates: 1
0000095c.000017a8::2014/02/25-06:12:34.955 INFO  [GEM] Node 6: Sending 1 messages as a batched GEM message
0000095c.000017a8::2014/02/25-06:12:34.957 INFO  [RCM] HandleMonitorReply: INMEMORY_NODELOCAL_PROPERTIES for 'Virtual Machine Computer-1', gen(0) result 0/0.
0000095c.000017a8::2014/02/25-06:12:34.957 INFO  [GEM] Node 6: Sending 1 messages as a batched GEM message
0000095c.00000f30::2014/02/25-06:12:35.315 INFO  [GEM] Node 6: Deleting [3:1497 , 3:1497] (both included) as it has been ack'd by every node
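To follow a resource's lifecycle through a cluster log like this one, it can help to pull out only the [RES] state-change entries. The sketch below assumes each entry sits on a single line in the `pid.tid::timestamp LEVEL  [COMPONENT] message` layout shown here; the regular expression is written against these sample lines and may need adjusting for other log versions, and the function name is illustrative.

```python
import re

# Matches entries such as:
# 00001028.0000367c::2014/02/25-06:12:34.928 INFO  [RES] Virtual Machine <Virtual Machine Computer-1>: State change 'Online' -> 'Offline'
STATE_CHANGE = re.compile(
    r"::(?P<ts>\S+)\s+\w+\s+\[RES\].*?<(?P<resource>[^>]+)>: "
    r"State change '(?P<old>[^']+)' -> '(?P<new>[^']+)'"
)


def state_changes(lines):
    """Yield (timestamp, resource, old_state, new_state) for each [RES] state change."""
    for line in lines:
        match = STATE_CHANGE.search(line)
        if match:
            yield match.group("ts"), match.group("resource"), match.group("old"), match.group("new")


# Two entries copied from the log above.
SAMPLE = [
    "00001028.0000367c::2014/02/25-06:12:34.928 INFO  [RES] Virtual Machine <Virtual Machine Computer-1>: Current state 'Online', event 'VmStopped'",
    "00001028.0000367c::2014/02/25-06:12:34.928 INFO  [RES] Virtual Machine <Virtual Machine Computer-1>: State change 'Online' -> 'Offline'",
]

if __name__ == "__main__":
    for change in state_changes(SAMPLE):
        print(change)
```

In the log above, the only [RES] state change is 'Online' -> 'Offline', and it is immediately preceded by a 'VmStopped' event for the same VM.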

2 node failover cluster power down


I have a two-node failover cluster. When I power down the node that owns the SQL Server instance and its resources, all the resources and services fail over to the other node. When I see that all the resources and services report "Online", I then power down that node as well. I am being told that this is improper because failover may not have completed. Is that correct?

Also, in our two-node failover cluster, is there a proper sequence for restarting the powered-down nodes?
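The "wait until everything reports Online" step described above can be scripted rather than judged by eye. The sketch below is a generic polling loop; `get_states` is a hypothetical callable returning a mapping of resource name to state string (on a real cluster that data would come from something like the FailoverClusters PowerShell cmdlet `Get-ClusterResource`, which is not shown here). Whether "Online" alone is a sufficient signal is exactly what is being questioned above, so treat this as a minimum check, not proof that failover has finished.

```python
import time
from typing import Callable, Mapping


def wait_until_all_online(
    get_states: Callable[[], Mapping[str, str]],
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
) -> bool:
    """Poll get_states() until every resource reports 'Online' or the timeout expires.

    get_states is a hypothetical callable returning {resource_name: state}.
    An empty mapping is treated as "not ready". Returns True if all resources
    were Online before the deadline, otherwise False.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        states = get_states()
        if states and all(state == "Online" for state in states.values()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


if __name__ == "__main__":
    # Simulated snapshots with hypothetical resource names.
    snapshots = iter([
        {"SQL Server": "OnlinePending", "SQL IP Address": "Online"},
        {"SQL Server": "Online", "SQL IP Address": "Online"},
    ])
    print(wait_until_all_online(lambda: next(snapshots), timeout_s=10, poll_s=0))
```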

moving sql to new service application

I currently have two SQL Servers in a cluster. I created a new service/application, and now I want to move the whole SQL Server application into this new one. How can this be done?

MSDTC Service Interrupted due to Low Disk Space in MSDTC Log drive (Cluster 2008)


Hi,

I am working on a SQL Server 2008 cluster with the MSDTC service configured for distributed transactions.

While looking into the cluster logs, I found that the MSDTC service had failed over. I analyzed the Event Viewer logs and found that the MSDTC service was interrupted at a particular time and then restored automatically. Below are the details of the MSDTC event captured:

Event ID: 4103, Source MSDTC Client 2

Description: Service: MSDTC$d02c9ba0-70c4-4b72-994c-30170ab14f2d is still running. Attempt to cleanup the service has failed.

I have gone through the health status of the server and found that the service interruption was caused by high disk space utilization. I have checked the MSDTC log file but haven't found any error indicating an issue on the MSDTC side.

The MSDTC log generally consumes 4 MB, and 5 GB of disk space is currently allocated to the MSDTC service.
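Because the interruption lined up with high disk utilization while the MSDTC log itself is small, a simple free-space alert on the MSDTC drive may help catch the condition before it recurs. A minimal sketch using Python's standard `shutil.disk_usage`; the drive path and the 10% threshold are hypothetical choices, not values from the post.

```python
import shutil


def free_fraction(total_bytes: int, free_bytes: int) -> float:
    """Fraction of the volume that is still free (0.0 to 1.0); 0.0 for an empty total."""
    return free_bytes / total_bytes if total_bytes else 0.0


def check_volume(path: str, min_free_fraction: float = 0.10) -> bool:
    """Return True if the volume holding `path` has at least min_free_fraction free.

    `path` (e.g. the root of the MSDTC drive) and the 10% default are hypothetical.
    """
    usage = shutil.disk_usage(path)
    return free_fraction(usage.total, usage.free) >= min_free_fraction


if __name__ == "__main__":
    print(check_volume("."))
```

As a point of scale, the 4 MB log occupies about 0.08% of the 5 GB allocation (4 / 5120 = 0.00078, taking 1 GB as 1024 MB), so if the drive did fill up, something other than the MSDTC log was probably consuming the space.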

I am still struggling to find the root cause (RCA). Please help.



MS SQL Server 2008 Metro Cluster with same subnet


Hi All,

I want to set up a SQL Server 2008 active/passive cluster for OCS 2007 across two data centers. Both DCs are at the same location: building 1 (DC1) and building 2 (DC2).

I am planning to build a Hyper-V cluster spanning the two DCs, create SQL Server 2008 VMs on it, and then build another Windows Server 2008 cluster with SQL Server on those VMs.

For this solution, I need to use iSCSI-based EMC VNX-e storage in both DCs and replicate between them.

Is there a better solution? Any further information would be appreciated.

win server 2012 two node cluster, local "cliuser" issue


Hello,

I have a two-node Windows Server 2012 STN cluster with a few SQL instances installed on it. Recently I have been seeing these entries in the security event log on both nodes:

An attempt was made to reset an account's password.

Subject:
Security ID: SYSTEM
Account Name:<>$
Account Domain:<>
Logon ID: 0x3E7

Target Account:
Security ID: localmachinename\CLIUSR
Account Name: CLIUSR
Account Domain: localmachinename

==

When I look at the local account on both nodes, I see that the password is set to never expire and cannot be reset. So I am quite confused about how the above could happen. Any advice or ideas would be greatly appreciated.

Thank you

Migration 2008R2 Cluster to 2012R2 - SAN Compatible?


Hello All,

We have a Server 2008 R2 two-node Hyper-V cluster running on an IBM DS3400 SAN.
We would like to upgrade the nodes to Server 2012 R2. The node hardware supports 2012 R2. But IBM hasn't listed support of 2012 R2 for the DS3400 SAN. The latest supported OS is 2008 R2.

My question is: despite this lack of support, how likely is it that 2012 R2 will work with the DS3400 SAN? Is it one of those things that is likely to work, or is it a risky undertaking given the lack of listed compatibility, so that I should leave it alone? A lot of the documentation seems to suggest that if a SAN worked with 2008 R2, it should be fine with 2012 R2, but, as mentioned, that information is quite general.

Thanks!

'Beowulf Cluster' under a Windows environment ?


Hello,

Does anyone have thoughts on our experiment to build a simple 'Beowulf Cluster' in a Windows environment? We currently have four older, nearly identical single-core Dell machines that would fit well into this test. I have been reviewing the MIT Press book "Beowulf Cluster Computing with Linux", published in 2003, but it uses Linux as the basis for the cluster.

I have also been looking at the Ubuntu wiki article "Ubuntu Kerrighed Cluster Guide" at https://wiki.ubuntu.com/EasyUbuntuClustering/UbuntuKerrighedClusterGuide, which gives step-by-step instructions for setting up a Kerrighed cluster.

In 2006, Microsoft introduced Windows Compute Cluster Server; however, it appears to be aimed at high-performance computing clusters and, I think, was meant to replace Linux. It's probably not applicable to this project, and Windows CCS is also fairly expensive for just an experiment.

I am learning as I go with this experimental project and am starting with no knowledge of the subject.

Any help or direction would be greatly appreciated.

Thanks,

Dan

VMs Fail Randomly on 2012 Cluster

Our two-node Server 2012 Hyper-V cluster has an issue where VMs seem to fail randomly for no apparent reason. We are using Dell R900s with an MD3200i SAN, and we have separate networks for iSCSI and heartbeat. On one of the nodes we get error 1069: "Cluster Resource 'Virtual Machine VM200X32' of type 'Virtual Machine' in clustered role 'VM200X32' failed." Below I have listed the relevant cluster log events from around the time of the failure. The cluster has passed validation and is running 50+ test VMs just fine; only a few of them seem to have this issue.

I'm wondering whether anyone might have some input on what the problem could be.

Cluster event log:

2013/08/25-16:41:18.448 WARN  [RHS] Resource Virtual Machine VM200X32 IsAlive has indicated failure.
2013/08/25-16:41:18.463 INFO  [RCM] HandleMonitorReply: FAILURENOTIFICATION for 'Virtual Machine VM200X32', gen(0) result 1/0.
2013/08/25-16:41:18.463 INFO  [RCM] Res Virtual Machine VM200X32: Online -> ProcessingFailure( StateUnknown )
2013/08/25-16:41:18.463 INFO  [RCM] TransitionToState(Virtual Machine VM200X32) Online-->ProcessingFailure.
2013/08/25-16:41:18.463 INFO  [RCM] rcm::RcmGroup::UpdateStateIfChanged: (VM200X32, Online --> Pending)
2013/08/25-16:41:18.463 ERR   [RCM] rcm::RcmResource::HandleFailure: (Virtual Machine VM200X32)
2013/08/25-16:41:18.463 INFO  [RCM] resource Virtual Machine VM200X32: failure count: 0, restartAction: 0 persistentState: 1.
2013/08/25-16:41:18.463 INFO  [RCM] Will queue immediate restart (500 milliseconds) of Virtual Machine VM200X32 after terminate is complete.
2013/08/25-16:41:18.463 INFO  [RCM] Res Virtual Machine VM200X32: ProcessingFailure -> WaitingToTerminate( DelayRestartingResource )
2013/08/25-16:41:18.463 INFO  [RCM] TransitionToState(Virtual Machine VM200X32) ProcessingFailure-->[WaitingToTerminate to DelayRestartingResource].
2013/08/25-16:41:18.463 INFO  [RCM] Res Virtual Machine VM200X32: [WaitingToTerminate to DelayRestartingResource] -> Terminating( DelayRestartingResource )
2013/08/25-16:41:18.463 INFO  [RCM] TransitionToState(Virtual Machine VM200X32) [WaitingToTerminate to DelayRestartingResource]-->[Terminating to DelayRestartingResource].
2013/08/25-16:41:18.463 INFO  [RES] Virtual Machine <Virtual Machine VM200X32>: Current state 'Online', event 'Terminate'
2013/08/25-16:41:18.463 INFO  [RES] Virtual Machine <Virtual Machine VM200X32>: State change 'Online' -> 'Terminated'
2013/08/25-16:41:18.463 INFO  [RCM] ignored non-local state Pending for group VM200X32
2013/08/25-16:41:18.479 INFO  [RCM] HandleMonitorReply: LOCKEDMODE for 'Virtual Machine Configuration VM200X32', gen(0) result 0/0.
2013/08/25-16:41:18.479 INFO  [RCM] Virtual Machine Configuration VM200X32: Flags 1 added to StatusInformation. New StatusInformation 1
2013/08/25-16:41:18.479 INFO  [RCM] VM200X32: Added Flags 1 to StatusInformation. New StatusInformation 1
2013/08/25-16:41:18.479 INFO  [RCM] HandleMonitorReply: INMEMORY_NODELOCAL_PROPERTIES for 'Virtual Machine VM200X32', gen(1) result 0/0.
2013/08/25-16:41:19.275 INFO  [RCM] HandleMonitorReply: LOCKEDMODE for 'Virtual Machine Configuration VM200X32', gen(0) result 0/0.
2013/08/25-16:41:19.275 INFO  [RCM] Virtual Machine Configuration VM200X32: Flags 1 removed from StatusInformation. New StatusInformation 0
2013/08/25-16:41:19.275 INFO  [RCM] VM200X32: Removed Flags 1 from StatusInformation. New StatusInformation 0
2013/08/25-16:41:19.275 INFO  [RCM] HandleMonitorReply: LOCKEDMODE for 'Virtual Machine VM200X32', gen(1) result 0/0.
2013/08/25-16:41:19.275 INFO  [RES] Virtual Machine <Virtual Machine VM200X32>: Current state 'Terminated', event 'VmStopped'
2013/08/25-16:41:19.306 INFO  [RCM] HandleMonitorReply: INMEMORY_NODELOCAL_PROPERTIES for 'Virtual Machine VM200X32', gen(1) result 0/0.
2013/08/25-16:41:19.836 INFO  [RCM] HandleMonitorReply: LOCKEDMODE for 'Virtual Machine VM200X32', gen(1) result 0/0.
2013/08/25-16:41:19.836 INFO  [RCM] HandleMonitorReply: INMEMORY_NODELOCAL_PROPERTIES for 'Virtual Machine VM200X32', gen(1) result 0/0.
2013/08/25-16:41:19.836 INFO  [RES] Virtual Machine <Virtual Machine VM200X32>: State change 'Terminated' -> 'Offline'
2013/08/25-16:41:19.836 INFO  [RCM] HandleMonitorReply: TERMINATERESOURCE for 'Virtual Machine VM200X32', gen(1) result 0/0.
2013/08/25-16:41:19.836 INFO  [RCM] Res Virtual Machine VM200X32: [Terminating to DelayRestartingResource] -> DelayRestartingResource( StateUnknown )
2013/08/25-16:41:19.836 INFO  [RCM] TransitionToState(Virtual Machine VM200X32) [Terminating to DelayRestartingResource]-->DelayRestartingResource.
2013/08/25-16:41:19.836 WARN  [RCM] Queueing immediate delay restart of resource Virtual Machine VM200X32 in 500 ms.
2013/08/25-16:41:20.351 INFO  [RCM] Delay-restarting Virtual Machine VM200X32 and any waiting dependents.
2013/08/25-16:41:20.351 INFO  [RCM-rbtr] giving default token to group VM200X32
2013/08/25-16:41:20.351 INFO  [RCM-rbtr] giving default token to group VM200X32
2013/08/25-16:41:20.351 INFO  [RCM] Res Virtual Machine VM200X32: DelayRestartingResource -> OnlineCallIssued( StateUnknown )
2013/08/25-16:41:20.351 INFO  [RCM] TransitionToState(Virtual Machine VM200X32) DelayRestartingResource-->OnlineCallIssued.
2013/08/25-16:41:20.351 INFO  [RES] Virtual Machine <Virtual Machine VM200X32>: Current state 'Offline', event 'Online'
2013/08/25-16:41:20.351 INFO  [RES] Virtual Machine <Virtual Machine VM200X32>: State change 'Offline' -> 'OnlinePending'
2013/08/25-16:41:20.351 INFO  [RCM] HandleMonitorReply: ONLINERESOURCE for 'Virtual Machine VM200X32', gen(1) result 997/0.
2013/08/25-16:41:20.351 INFO  [RCM] Res Virtual Machine VM200X32: OnlineCallIssued -> OnlinePending( StateUnknown )
2013/08/25-16:41:20.351 INFO  [RCM] TransitionToState(Virtual Machine VM200X32) OnlineCallIssued-->OnlinePending.
2013/08/25-16:41:20.351 INFO  [RCM] HandleMonitorReply: INMEMORY_NODELOCAL_PROPERTIES for 'Virtual Machine VM200X32', gen(1) result 0/0.
2013/08/25-16:41:20.351 INFO  [RCM] HandleMonitorReply: LOCKEDMODE for 'Virtual Machine Configuration VM200X32', gen(0) result 0/0.
2013/08/25-16:41:20.351 INFO  [RCM] Virtual Machine Configuration VM200X32: Flags 1 added to StatusInformation. New StatusInformation 1
2013/08/25-16:41:20.351 INFO  [RCM] VM200X32: Added Flags 1 to StatusInformation. New StatusInformation 1
2013/08/25-16:41:20.367 INFO  [RCM] HandleMonitorReply: INMEMORY_NODELOCAL_PROPERTIES for 'Virtual Machine VM200X32', gen(1) result 0/0.
2013/08/25-16:41:20.694 INFO  [RCM] HandleMonitorReply: INMEMORY_NODELOCAL_PROPERTIES for 'Virtual Machine VM200X32', gen(1) result 0/0.
2013/08/25-16:41:21.911 INFO  [RCM] HandleMonitorReply: LOCKEDMODE for 'Virtual Machine Configuration VM200X32', gen(0) result 0/0.
2013/08/25-16:41:21.911 INFO  [RCM] Virtual Machine Configuration VM200X32: Flags 1 removed from StatusInformation. New StatusInformation 0
2013/08/25-16:41:21.911 INFO  [RCM] VM200X32: Removed Flags 1 from StatusInformation. New StatusInformation 0
2013/08/25-16:41:21.911 INFO  [RCM] HandleMonitorReply: LOCKEDMODE for 'Virtual Machine VM200X32', gen(1) result 0/0.
2013/08/25-16:41:21.911 INFO  [RES] Virtual Machine <Virtual Machine VM200X32>: Current state 'OnlinePending', event 'VmRunning'
2013/08/25-16:41:21.942 INFO  [RCM] HandleMonitorReply: INMEMORY_NODELOCAL_PROPERTIES for 'Virtual Machine VM200X32', gen(1) result 0/0.
2013/08/25-16:41:21.942 INFO  [RES] Virtual Machine <Virtual Machine VM200X32>: 'Virtual Machine VM200X32' successfully started the virtual machine.
2013/08/25-16:41:21.958 INFO  [RES] Virtual Machine <Virtual Machine VM200X32>: State change 'OnlinePending' -> 'Online'
2013/08/25-16:41:21.958 INFO  [RHS] Resource Virtual Machine VM200X32 has come online. RHS is about to report status change to RCM
2013/08/25-16:41:21.958 INFO  [RCM] HandleMonitorReply: ONLINERESOURCE for 'Virtual Machine VM200X32', gen(1) result 0/0.
2013/08/25-16:41:21.958 INFO  [RCM] Res Virtual Machine VM200X32: OnlinePending -> Online( StateUnknown )
2013/08/25-16:41:21.958 INFO  [RCM] TransitionToState(Virtual Machine VM200X32) OnlinePending-->Online.
2013/08/25-16:41:21.958 INFO  [RCM] rcm::RcmGroup::UpdateStateIfChanged: (VM200X32, Pending --> Online)
2013/08/25-16:41:21.958 INFO  [RCM] HandleMonitorReply: INMEMORY_NODELOCAL_PROPERTIES for 'Virtual Machine VM200X32', gen(1) result 0/0.
2013/08/25-16:41:21.958 INFO  [RCM] ignored non-local state Online for group VM200X32
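To quantify how long each failure actually lasts, the timestamp of the IsAlive failure can be diffed against the next "has come online" entry. A minimal sketch, assuming the `YYYY/MM/DD-HH:MM:SS.mmm` timestamp prefix used in this log; the function names and marker strings are illustrative choices taken from the lines above.

```python
from datetime import datetime

TS_FORMAT = "%Y/%m/%d-%H:%M:%S.%f"  # format assumed from the excerpt


def parse_ts(line: str) -> datetime:
    """Parse the leading timestamp of a cluster log line."""
    return datetime.strptime(line.split(" ", 1)[0], TS_FORMAT)


def outage_seconds(lines, failure_marker="IsAlive has indicated failure",
                   recovery_marker="has come online"):
    """Seconds between the first failure line and the first recovery line after it, or None."""
    failed_at = None
    for line in lines:
        if failed_at is None and failure_marker in line:
            failed_at = parse_ts(line)
        elif failed_at is not None and recovery_marker in line:
            return (parse_ts(line) - failed_at).total_seconds()
    return None


# Two entries copied from the log above.
EXCERPT = [
    "2013/08/25-16:41:18.448 WARN  [RHS] Resource Virtual Machine VM200X32 IsAlive has indicated failure.",
    "2013/08/25-16:41:21.958 INFO  [RHS] Resource Virtual Machine VM200X32 has come online. RHS is about to report status change to RCM",
]

if __name__ == "__main__":
    print(outage_seconds(EXCERPT))
```

On these entries the result is 3.51 seconds from the IsAlive failure to the VM being back online, which is consistent with the log's "Will queue immediate restart (500 milliseconds)" entry, i.e. the cluster restarting the resource after the failure.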

Hyper-V 2008 R2 cluster down


Hello,

We have a three-node Hyper-V 2008 R2 failover cluster. Yesterday we had problems, and all VMs on node 3 went down. At 17:00 the backup of the VMs on CSV started (using DPM 2010 with Dell hardware providers), but then one node crashed. It looks like node 3 tries to take ownership of CSV04, but node 1 doesn't accept that. See the cluster log below. Can someone tell me what triggered this, and how to fix it? Many thanks!

It starts with messages saying that the resource types Volume Manager Disk Group and GeoCluster Replicated Disk were not found. These mean nothing to me.

00000af0.00000c24::2014/02/23-17:00:27.738 WARN  Resource type Volume Manager Disk Group not found.
00000af0.00000c24::2014/02/23-17:00:27.738 WARN  Resource type GeoCluster Replicated Disk not found.
00000af0.000017fc::2014/02/23-17:00:35.117 INFO  [RCM] rcm::RcmApi::MoveGroup: (6d07505c-cd56-4354-bb78-f0d452eb7350, 1)
00000af0.000017fc::2014/02/23-17:00:35.117 INFO  [RCM] rcm::RcmGroup::Move: (6d07505c-cd56-4354-bb78-f0d452eb7350, 1)
00000af0.000017fc::2014/02/23-17:00:35.117 INFO  [RCM] rcm::RcmGroup::Move: Bringing group '6d07505c-cd56-4354-bb78-f0d452eb7350' offline first...
00000af0.000017fc::2014/02/23-17:00:35.117 INFO  [RCM] TransitionToState(CSV04) Online-->OfflineCallIssued.
00000af0.000017fc::2014/02/23-17:00:35.117 INFO  [RCM] rcm::RcmGroup::UpdateStateIfChanged: (6d07505c-cd56-4354-bb78-f0d452eb7350, Online --> Pending)
00000af0.000017fc::2014/02/23-17:00:35.117 INFO  [DCM] PreOffline for CSV resource CSV04
00000af0.000017fc::2014/02/23-17:00:35.117 INFO  [DCM] Unmapping volumes for cfs resource CSV04
00000af0.00002330::2014/02/23-17:00:35.117 INFO  [NM] Received request from client address 172.16.0.1.
00000af0.00000d04::2014/02/23-17:00:35.179 INFO  [DCM] Processing message dcm/pause
00000af0.00000d04::2014/02/23-17:00:35.179 INFO  [DCM] Push.AsyncPauseDisk for 7177671d-4b75-4c6e-ad3d-4ff3671ce779
00000af0.00001594::2014/02/23-17:00:35.179 INFO  [DCM] SyncHandler for 7177671d-4b75-4c6e-ad3d-4ff3671ce779
00000af0.00001594::2014/02/23-17:00:35.179 INFO  [DCM] enter_AllGood(7177671d-4b75-4c6e-ad3d-4ff3671ce779) P0..75 P0..150
00000af0.00001594::2014/02/23-17:00:35.179 INFO  [DCM] MappingManager::PauseVolume 'Volume4'
00000af0.00001594::2014/02/23-17:00:35.179 INFO  [DCM] Filter.ChangeState (ctx=2, state=CfsVolumeStatePaused)
00000af0.00002130::2014/02/23-17:00:35.273 INFO  [NM] Received request from client address HV03.
00000af0.00002130::2014/02/23-17:00:35.288 ERR   [RCM] rcm::RcmResControl::DoResourceControl: ERROR_RESOURCE_NOT_ONLINE(5004)' because of 'PreprocessControl(16777765) failed for resource 'CSV04'.'
00000af0.00002130::2014/02/23-17:00:35.288 WARN  [RCM] ResourceControl(STORAGE_GET_SHARED_VOLUME_INFO) to CSV04 returned 5004.
00000af0.00002130::2014/02/23-17:00:35.288 INFO  [NM] Received request from client address HV03.
00000af0.00000c24::2014/02/23-17:00:35.320 ERR   [RCM] rcm::RcmResControl::DoResourceControl: ERROR_RESOURCE_NOT_ONLINE(5004)' because of 'PreprocessControl(16777765) failed for resource 'CSV04'.'
00000af0.00000c24::2014/02/23-17:00:35.320 WARN  [RCM] ResourceControl(STORAGE_GET_SHARED_VOLUME_INFO) to CSV04 returned 5004.
00000f30.00001888::2014/02/23-17:00:35.663 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'
00000af0.00002960::2014/02/23-17:00:35.663 WARN  [RCM] Failed to load restype 'MSMQ': error 21.
00000af0.00002960::2014/02/23-17:00:35.663 WARN  [RCM] rcm::RcmApi::ResTypeControl: ResType MSMQ's DLL is not present on this node.  Attempting to find a good node...
00000f30.00001888::2014/02/23-17:00:35.663 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'
00000af0.00000c24::2014/02/23-17:00:35.663 WARN  [RCM] Failed to load restype 'MSMQ': error 21.
00000f30.00001888::2014/02/23-17:00:35.710 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'
00000af0.00000c24::2014/02/23-17:00:35.710 WARN  [RCM] Failed to load restype 'MSMQ': error 21.
00000f30.00001888::2014/02/23-17:00:35.912 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'
00000af0.00002960::2014/02/23-17:00:35.912 WARN  [RCM] Failed to load restype 'MSMQTriggers': error 21.
00000af0.00002960::2014/02/23-17:00:35.912 WARN  [RCM] rcm::RcmApi::ResTypeControl: ResType MSMQTriggers's DLL is not present on this node.  Attempting to find a good node...
00000f30.00001888::2014/02/23-17:00:35.912 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'
00000af0.00000c24::2014/02/23-17:00:35.912 WARN  [RCM] Failed to load restype 'MSMQTriggers': error 21.
00000f30.00001888::2014/02/23-17:00:35.912 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'
00000af0.00000c24::2014/02/23-17:00:35.912 WARN  [RCM] Failed to load restype 'MSMQTriggers': error 21.
00000af0.00000c24::2014/02/23-17:00:35.928 WARN  Resource type Volume Manager Disk Group not found.
00000af0.00000c24::2014/02/23-17:00:35.928 WARN  Resource type GeoCluster Replicated Disk not found.
00000af0.00002960::2014/02/23-17:00:36.209 INFO  [NM] Received request from client address HV03.
00000f30.000029d4::2014/02/23-17:00:36.599 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'
00000af0.00000c24::2014/02/23-17:00:36.599 WARN  [RCM] Failed to load restype 'MSMQ': error 21.
00000af0.00000c24::2014/02/23-17:00:36.599 WARN  [RCM] rcm::RcmApi::ResTypeControl: ResType MSMQ's DLL is not present on this node.  Attempting to find a good node...
00000f30.000029d4::2014/02/23-17:00:36.599 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'
00000af0.00002960::2014/02/23-17:00:36.599 WARN  [RCM] Failed to load restype 'MSMQ': error 21.
00000f30.000029d4::2014/02/23-17:00:36.599 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'
00000af0.00000c24::2014/02/23-17:00:36.599 WARN  [RCM] Failed to load restype 'MSMQTriggers': error 21.
00000af0.00000c24::2014/02/23-17:00:36.599 WARN  [RCM] rcm::RcmApi::ResTypeControl: ResType MSMQTriggers's DLL is not present on this node.  Attempting to find a good node...
00000f30.000029d4::2014/02/23-17:00:36.599 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'
00000af0.00002960::2014/02/23-17:00:36.599 WARN  [RCM] Failed to load restype 'MSMQTriggers': error 21.
00000f30.000029d4::2014/02/23-17:00:36.614 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'
00000af0.00002960::2014/02/23-17:00:36.614 WARN  [RCM] Failed to load restype 'MSMQ': error 21.
00000f30.000029d4::2014/02/23-17:00:36.630 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'
00000af0.00002960::2014/02/23-17:00:36.630 WARN  [RCM] Failed to load restype 'MSMQTriggers': error 21.
00000af0.00000c24::2014/02/23-17:00:36.911 ERR   [RCM] rcm::RcmResControl::DoResourceControl: ERROR_RESOURCE_NOT_ONLINE(5004)' because of 'PreprocessControl(16777765) failed for resource 'CSV04'.'
00000af0.00000c24::2014/02/23-17:00:36.911 WARN  [RCM] ResourceControl(STORAGE_GET_SHARED_VOLUME_INFO) to CSV04 returned 5004.
00000af0.00002960::2014/02/23-17:00:36.989 ERR   [RCM] rcm::RcmResControl::DoResourceControl: ERROR_RESOURCE_NOT_ONLINE(5004)' because of 'PreprocessControl(16777765) failed for resource 'CSV04'.'
00000af0.00002960::2014/02/23-17:00:36.989 WARN  [RCM] ResourceControl(STORAGE_GET_SHARED_VOLUME_INFO) to CSV04 returned 5004.
00000af0.000014c8::2014/02/23-17:00:39.251 INFO  [DCM] filter.Event ->CfsVolumeStatePaused FromPause for ctx=2 status 00000000
00000af0.00001594::2014/02/23-17:00:39.251 INFO  [DCM] volume paused 'Volume4'
00000af0.000017fc::2014/02/23-17:00:39.391 INFO  [DCM] dcm/pause successfully completed on all nodes
00000af0.000017fc::2014/02/23-17:00:39.391 INFO  [DCM] removing share 7177671d-4b75-4c6e-ad3d-4ff3671ce779-135266304$, status 0
00000af0.00002960::2014/02/23-17:00:39.391 INFO  [NM] Received request from client address 172.16.0.1.
00000af0.000017fc::2014/02/23-17:00:39.438 INFO  [DCM] ClearVolumeStates: resource 'CSV04' states <vector len='2'>
00000af0.000017fc::2014/02/23-17:00:39.438 INFO      <item>1</item>
00000af0.000017fc::2014/02/23-17:00:39.438 INFO      <item>135266304 0</item>
00000af0.000017fc::2014/02/23-17:00:39.438 INFO  </vector>
00001680.000023b4::2014/02/23-17:00:39.438 INFO  [RES] Physical Disk <CSV04>: Offline request.
00001680.00001e04::2014/02/23-17:00:39.438 INFO  [RES] Physical Disk: DriveLetter mask: 0x0
00000af0.000017fc::2014/02/23-17:00:39.438 INFO  [RCM] HandleMonitorReply: OFFLINERESOURCE for 'CSV04', gen(0) result 997.
00000af0.000017fc::2014/02/23-17:00:39.438 INFO  [RCM] TransitionToState(CSV04) OfflineCallIssued-->OfflinePending.
00001680.00001e04::2014/02/23-17:00:39.454 INFO  [RES] Physical Disk <CSV04>: HardDiskpCloseSVIHandles: Exit
00001680.00001e04::2014/02/23-17:00:39.454 INFO  [RES] Physical Disk <CSV04>: VolumeIsNtfs: Volume\\?\GLOBALROOT\Device\Harddisk2\Partition2\ has FS type NTFS
00001680.00001e04::2014/02/23-17:00:39.454 INFO  [RES] Physical Disk <CSV04>: OfflineThread: partition 2 offset 135266304 is a CSV volume, skipping lock volume
00001680.00001e04::2014/02/23-17:00:40.421 INFO  [RES] Physical Disk: ReleaseDisk: stop reserve succeeded on device 2 (sig ceba743a)
00001680.00001e04::2014/02/23-17:00:40.452 INFO  [RHS] Resource CSV04 has come offline. RHS is about to report resource status to RCM.
00000af0.00002960::2014/02/23-17:00:40.452 INFO  [RCM] HandleMonitorReply: OFFLINERESOURCE for 'CSV04', gen(0) result 0.
00000af0.00002960::2014/02/23-17:00:40.452 INFO  [RCM] TransitionToState(CSV04) OfflinePending-->OfflineSavingCheckpoints.
00000af0.00002960::2014/02/23-17:00:40.452 INFO  [RCM] TransitionToState(CSV04) OfflineSavingCheckpoints-->Offline.
00000af0.00002960::2014/02/23-17:00:40.452 INFO  [RCM] rcm::RcmGroup::UpdateStateIfChanged: (6d07505c-cd56-4354-bb78-f0d452eb7350, Pending --> Offline)
00000af0.00000c24::2014/02/23-17:00:40.452 INFO  [RCM] rcm::RcmGum::GroupMoveOperation(6d07505c-cd56-4354-bb78-f0d452eb7350,1)
00000af0.000017fc::2014/02/23-17:00:40.452 WARN  [RCM] rcm::RcmApi::ResourceControl: forwarded, no retry on error 5908
00000af0.000017fc::2014/02/23-17:00:40.452 WARN  [RCM] ResourceControl(GET_CLASS_INFO) to CSV04 returned 5908.
00000af0.00002960::2014/02/23-17:00:40.452 WARN  [RCM] rcm::RcmApi::GetResourceState: retrying: 6d07505c-cd56-4354-bb78-f0d452eb7350, 5908.
00000af0.000017fc::2014/02/23-17:00:40.452 ERR   [RCM] s_RcmRpcGetResourceState: ERROR_CLUSTER_GROUP_MOVING(5908)' because of ''CSV04' is owned by node 1, not 3.'

Print server clustering and current print server


Hi,

I'm reading about server HA/clustering, in particular print server HA/clustering. I haven't got as far as setting anything up, so please bear with my lack of knowledge. I have one question: can you incorporate an existing print server into an HA/cluster environment, or do you need to start from scratch? We currently have a single print server in our VMware environment, which of course includes all the print drivers, ports, etc., and the spool folder. What I have read about adding printers to an HA/cluster environment mentions going through Failover Cluster Manager (MMC) to add printers. If I decided to incorporate our current print server as one of the servers in the cluster, would I need to re-add the printers? Hopefully that makes sense; I'm just trying to clarify in my head the steps I need to go through.

Regards,

Ross






Unknown Issue with 2012 Failover Cluster


Hello

I have two HP ProLiant DL580 G7 hosts with Windows Server 2012 installed, fully updated with the latest Windows updates, drivers, and hardware firmware, and with the Hyper-V role installed.

FC SAN storage is connected.

Each server has 6 NICs; only 2 NICs in each server are connected, and NIC Teaming is enabled on those 2 NICs.

I installed the Failover Clustering feature, and the validation process passed with no failures.

I have one Windows Server 2012 domain controller running at the 2008 R2 AD functional level.

But when I start creating the cluster, it stops with the following errors:

Creating a new computer account (object) for 'HVC' in the domain.
Unable to successfully cleanup.
An error occurred while creating the cluster and the nodes will be cleaned up. Please wait...
An error occurred while creating the cluster and the nodes will be cleaned up. Please wait...
There was an error cleaning up the cluster nodes. Use Clear-ClusterNode to manually clean up the nodes.
There was an error cleaning up the cluster nodes. Use Clear-ClusterNode to manually clean up the nodes.
An error occurred while creating the cluster.An error occurred creating cluster '1'.An operations error occurred
To troubleshoot cluster creation problems, run the Validate a Configuration wizard on the servers you want to cluster.


I enabled the diagnostic log for the Failover Clustering client in Event Viewer and got the following errors:

OpenClusterImpl (1119): Could not connect to cluster.  sc = 1753
ConnectCluster (979): Connect to Local Cluster: ApiGetClusterName failed, sc = 1753.
ConnectCluster (999): Failed to open remote cluster 'HOST1.domain.com', status == 1753.
ConnectRemoteCluster (856): Couldn't resolve RPC binding to cluster 'HOST1.domain.com', Status = 1753
CreateCluster (1883): Create cluster failed with exception. Error = 8224, msg: Failed to create cluster name HVC on DC \\DC.domain.com. Error 8224.
CreateClusterNameCOIfNotExists (6879): Failed to create computer object HVC on DC \\DC.domain.com with OU CN=Computers,DC=domain,DC=com. Error 8224.
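The numeric status codes in these diagnostic lines are standard Win32 error codes. 1753 is EPT_S_NOT_REGISTERED ("There are no more endpoints available from the endpoint mapper"). 8224 is ERROR_DS_OPERATIONS_ERROR, whose message "An operations error occurred" matches the last line of the wizard output above. Below is a small Python sketch that pulls the codes out of lines like these and maps them to symbolic names. The lookup table covers only the codes seen in this thread. The `sc =` / `status ==` / `Error` patterns are taken from the lines shown here and may not cover other log formats.

```python
import re

# Symbolic names for the Win32 / cluster status codes that appear in this thread.
WIN32_CODES = {
    21: "ERROR_NOT_READY",
    997: "ERROR_IO_PENDING",
    1753: "EPT_S_NOT_REGISTERED",
    5004: "ERROR_RESOURCE_NOT_ONLINE",
    5908: "ERROR_CLUSTER_GROUP_MOVING",
    8224: "ERROR_DS_OPERATIONS_ERROR",
}

# Matches "sc = 1753", "status == 1753", "Status = 1753", "Error = 8224", "Error 8224".
# A code glued to an identifier such as "ERROR_NOT_READY(21)" is deliberately not matched.
STATUS_RE = re.compile(r"\b(?:sc|status|error)\s*(?:==|=)?\s*(\d+)", re.IGNORECASE)


def extract_codes(line):
    """Return every status code mentioned in one log line, in order."""
    return [int(code) for code in STATUS_RE.findall(line)]


def decode(code):
    """Return the symbolic name for a code, or a placeholder if it is not in the table."""
    return WIN32_CODES.get(code, f"unknown ({code})")


def decode_log(lines):
    """Return {code: symbolic name} for each distinct code found, in first-seen order."""
    found = {}
    for line in lines:
        for code in extract_codes(line):
            found.setdefault(code, decode(code))
    return found
```

On a Windows machine, `net helpmsg <code>` should print the message text for a Win32 code that is not in the table.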

I didn't find any solution or anyone discussing these errors.

Some people say that I have to install hotfix 976424 from here: http://support.microsoft.com/kb/2784261

But I can't install it on Windows Server 2012; it won't run.

Your help is appreciated :D

Thanks



Daoud Ghannam

Testing failover

I am brand new to MS Failover Clustering. Following the checklists that Microsoft provides, I have created a 2-node failover cluster with Windows 2008 in my test lab and set up SQL Server as the application. When I did a failover test, all the services and applications, disk drives, and networks came up as "Online" or "Up". It sounds like it worked, but does this mean that all of the failover processes have completed? The reason I ask is that I need to take the nodes down so our SQL DBAs can work on the SQL instance. I am told this will cause another failover, so I wonder whether there is a time lag I need to allow for before taking the servers down.
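One way to handle the time-lag question is to wait until every clustered resource reports Online, then allow a short settle period before taking the next node down. Below is a minimal Python sketch of that polling loop. `get_states` is a hypothetical callable returning `{resource_name: state}`; in practice you might back it with the State column of PowerShell's `Get-ClusterResource`. The 30-second settle window and 5-second poll interval are arbitrary assumptions, not documented requirements. The clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time


def wait_until_online(get_states, timeout=300.0, settle=30.0, interval=5.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll until every resource has reported 'Online' continuously for `settle` seconds.

    get_states: hypothetical callable returning {resource_name: state_string}.
    Returns True once the settle period is satisfied, False if `timeout` expires first.
    """
    deadline = clock() + timeout
    stable_since = None  # time at which all resources were first seen Online together
    while True:
        states = get_states()
        now = clock()
        if states and all(state == "Online" for state in states.values()):
            if stable_since is None:
                stable_since = now
            if now - stable_since >= settle:
                return True
        else:
            stable_since = None  # something is not Online yet; restart the settle window
        if now >= deadline:
            return False
        sleep(interval)
```

Resetting `stable_since` whenever any resource drops out of Online means a resource that flaps during the window is not mistaken for a completed failover.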

Cal Miyatake

Can I make a DHCP VM a cluster role?


Hi,

I have created a failover cluster with Windows Server 2012.

I have a DHCP VM.

When I choose Roles >> DHCP Server role to configure that DHCP VM, it says no DHCP server was found!

So my question is: isn't it possible to make a DHCP VM a DHCP failover role?

But if I install the DHCP role on a physical machine, it shows as available in the cluster roles.


istiaq

Error when adding a disk to Cluster Shared Volumes


When adding a disk to Cluster Shared Volumes via Failover Cluster Manager, I get a couple of errors.

Event ID 5145 in System Log:

While adding the disk ('Cluster Disk 1') to Cluster Shared Volumes, setting explicit snapshot diff area association for volume ('\\?\Volume{420e2cc4-4fb4-41be-afb1-65f2ee62457a}\') failed with error 'HrError(0x8004230d)'. The only supported software snapshot diff area association for Cluster Shared Volumes is to self.

Cluster disk resource 'Cluster Disk 1' failed to delete a software snapshot.  The diff area on volume '\\?\Volume{420e2cc4-4fb4-41be-afb1-65f2ee62457a}\' could not be dissociated from volume '\\?\Volume{420e2cc4-4fb4-41be-afb1-65f2ee62457a}\'. This may be caused by active snapshots. Cluster Shared Volumes requires that the software snapshot be located on the same disk.
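The event text states that the only supported software snapshot diff area association for a CSV is to the same volume. Because the disk came from another cluster, one thing worth checking is whether an older shadow copy storage association still points elsewhere. The Python sketch below parses text laid out the way `vssadmin list shadowstorage` prints associations, with a "For volume:" line followed by a "Shadow Copy Storage volume:" line. That layout is assumed from memory, not taken from this thread, so compare it with your own output before relying on it. The second GUID in the test sample is made up.

```python
import re

# Matches the GUID inside "\\?\Volume{...}\"; only the first GUID on a line is used.
GUID_RE = re.compile(r"Volume\{([0-9a-fA-F-]{36})\}")


def parse_shadowstorage(text):
    """Return (for_volume_guid, storage_volume_guid) pairs from vssadmin-style output.

    ASSUMED layout: each association has a "For volume:" line followed by a
    "Shadow Copy Storage volume:" line, each containing a \\?\Volume{GUID}\ path.
    """
    pairs = []
    current_for = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("For volume:"):
            m = GUID_RE.search(stripped)
            current_for = m.group(1).lower() if m else None
        elif stripped.startswith("Shadow Copy Storage volume:") and current_for:
            m = GUID_RE.search(stripped)
            if m:
                pairs.append((current_for, m.group(1).lower()))
            current_for = None
    return pairs


def non_self_associations(pairs):
    """Return the associations whose diff area lives on a different volume."""
    return [(vol, storage) for vol, storage in pairs if vol != storage]
```

Any pair returned by `non_self_associations` for the CSV candidate volume would conflict with the "association to self" rule quoted in the event.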

Any ideas why I'm getting this error? This disk was previously added as a CSV to a different Windows failover cluster, if that matters. Thanks.

Loopback adapters and DSR: DAG Cluster node--which is not Cluster Host--crashes when another node restarts


An all-hardware Exchange 2010 SP3 UR4 DAG cluster is having an issue when the Microsoft Loopback adapter is installed (from Device Manager...Add Legacy Hardware) to support DSR operations with hardware load balancer (HLB).

  • The HLB provides the HA endpoint for RPC Client Access, SMTP, etc. DSR is required to preserve the source IP, which Exchange receive connectors that filter on source IP for security depend on.
  • It is a 5-server DAG, with 3 x production servers at the datacenter and 2 x DAG DR servers located in a DR site.
  • Only the 3 x production servers at the main site have the loopback adapter installed.
  • The loopback/DSR-specific settings such as 'weakhostreceive', etc. are in effect.

The problem only involves the 3 servers in the DAG with loopback adapters.

The issue is that when a DAG member restarts, it sometimes causes the online production cluster node that is not the Cluster Host Server to fail. Consider:

  • DAGNode1, Loopback enabled, Healthy, Is Cluster Host Server
  • DAGNode2, Loopback enabled, Healthy
  • DAGNode3, Loopback enabled, is Restarted

In this scenario, the cluster service on DAGNode2 will experience a loss of network connectivity when DAGNode3 rejoins the cluster (DAGNode2 reports cluster failure on all other nodes) and shortly afterwards the Cluster Service on DAGNode2 will terminate. FailoverClustering 1572 is seen on DAGNode2:

Node 'DAGNode2' failed to join the cluster because it could not send and receive failure detection network messages with other cluster nodes. Please run the Validate a Configuration wizard to ensure network settings. Also verify the Windows Firewall 'Failover Clusters' rules.

Interestingly, if you disable the Loopback on DAGNode3, DAGNode2 immediately rejoins the cluster! Re-enable the Loopback on DAGNode3 and DAGNode2 immediately fails again! After possibly a few more server restarts, the cluster becomes stable again with the Loopback enabled on all production nodes. The status of the loopback (enabled or not) on the Cluster Host does not affect this issue.

As I mentioned, this occurs only on some restarts; usually there is no problem. Also note that the Loopback networks/adapters do not appear in Cluster Manager and are not listed as cluster networks by cluster.exe. The Cluster Validation Wizard passes everything except noting that every node has a duplicate IP on an installed adapter.
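With DSR, the load-balanced virtual IP is configured on each node's loopback adapter. That is presumably why validation reports a duplicate IP on every node. Below is a small Python sketch that takes a hypothetical per-node inventory of (adapter, IP) pairs, lists the addresses configured on more than one node, and checks that all of them sit on the loopback adapter. The node names come from the post. The adapter names and addresses in the test are made up for illustration; you would fill the inventory from your own adapter configuration.

```python
import ipaddress
from collections import defaultdict


def shared_addresses(inventory):
    """Map each IP configured on more than one node to the (node, adapter) pairs using it.

    inventory: hypothetical {node_name: [(adapter_name, ip_string), ...]}.
    """
    seen = defaultdict(list)
    for node, adapters in inventory.items():
        for adapter, addr in adapters:
            seen[ipaddress.ip_address(addr)].append((node, adapter))
    return {ip: users for ip, users in seen.items()
            if len({node for node, _ in users}) > 1}


def only_on_adapter(shared, adapter_name):
    """True if every address shared between nodes lives only on adapters with this name."""
    return all(adapter == adapter_name
               for users in shared.values() for _, adapter in users)
```

If `only_on_adapter(shared, "Loopback")` is False, some address other than the DSR virtual IP is duplicated across nodes. That would be worth ruling out before looking at the loopback itself.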

I'm looking for others who have combined a DSR-based HLB with a CAS/Hub/MBX DAG cluster on the same Exchange computers and were able to run it reliably.

There is an unanswered thread from 2010 on this topic:

http://social.technet.microsoft.com/Forums/windowsserver/en-US/7616b0e5-6fb6-4be7-a859-14baa2e9b925/cluster-network-is-partitioned-due-to-loopback-adapter?forum=winserverClustering

Some questions (any answers are very welcome!):

  • Can I add the Loopback adapter to the cluster configuration so that I can use Cluster.exe to ignore the loopback adapter?
  • Can I prevent other cluster nodes from seeing the loopback adapters in the other nodes? Is there an ‘ignore partner adapter’ setting?
Thank you!


John Joyner MVP-SC-CDM


