Channel: High Availability (Clustering) forum
Viewing all 4519 articles

Windows NLB - Multicast

Hi,

I configured Windows NLB in multicast mode and gave the MAC address to the network team so they could add a static ARP entry on the switches. I am wondering whether this MAC address is dynamic or will stay the same until I change the mode. Or should I go for IGMP mode?
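As far as I know, NLB derives the cluster MAC address from the cluster IP address and the operation mode, so it should only change when one of those changes. Below is a minimal Python sketch of that derivation. The prefixes (02-BF for unicast, 03-BF for multicast, 01-00-5E-7F for IGMP multicast) are the commonly documented scheme and are an assumption here, and `192.0.2.10` is a made-up example address. Compare the result with what NLB Manager shows for your cluster.

```python
import ipaddress

# Assumed NLB MAC prefixes per operation mode (commonly documented scheme).
# Verify against the cluster properties shown by NLB Manager.
_PREFIXES = {
    "unicast": (0x02, 0xBF),
    "multicast": (0x03, 0xBF),
}


def nlb_cluster_mac(cluster_ip: str, mode: str) -> str:
    """Return the expected NLB cluster MAC for an IPv4 cluster address."""
    octets = ipaddress.IPv4Address(cluster_ip).packed  # the 4 address bytes
    if mode == "igmp":
        # IGMP multicast: fixed 01-00-5E-7F, then the last two IP octets.
        mac = bytes([0x01, 0x00, 0x5E, 0x7F]) + octets[2:]
    elif mode in _PREFIXES:
        # Unicast and multicast: two-byte prefix, then all four IP octets.
        mac = bytes(_PREFIXES[mode]) + octets
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return "-".join(f"{b:02X}" for b in mac)


if __name__ == "__main__":
    # 192.0.2.10 is a hypothetical cluster IP used only for illustration.
    for mode in ("unicast", "multicast", "igmp"):
        print(mode, nlb_cluster_mac("192.0.2.10", mode))
```

Under these assumptions, switching from multicast to IGMP multicast would change the MAC, so the static ARP entry would have to be updated at that point.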


Same subnet, but cluster places host in a separate network

Hello,

I have an issue where a host whose IP configuration is on the same subnet as the rest of the hosts keeps getting placed into a separate cluster network. Failover validation doesn't report any issues, and the host can communicate with all of the other hosts on that network.

Checking the cluster logs I see the event where it's occurring:

INFO  [ClNet] Adapter Hyper-V Virtual Ethernet Adapter #3 is still attached to network Cluster Network 1.

And here is the event where it skips attaching to the right network:

INFO  [ClNet] Ignoring configuration entry for cluster network Public (8d603185-4e36-4222-9c92-be4e3a22ac1e) because it has no previous matching adapter. Processing has not yet completed so an adapter may still be found for this network.

Any idea what can be causing this?
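One thing that may be worth ruling out is a subnet mask mismatch, since the cluster groups adapters into cluster networks by subnet (address plus mask), not by address alone. Here is a rough Python sketch of that grouping. The adapter names, addresses and prefix lengths are hypothetical, and the function is only an illustration of the idea, not the cluster's actual algorithm.

```python
import ipaddress
from collections import defaultdict


def group_by_subnet(adapters):
    """Group (name, ip, prefix_len) tuples by the network they fall in."""
    networks = defaultdict(list)
    for name, ip, prefix_len in adapters:
        # strict=False accepts a host address and returns its network.
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        networks[str(net)].append(name)
    return dict(networks)


# Hypothetical adapters: node3 has a /25 mask instead of /24.
EXAMPLE = [
    ("node1", "10.0.0.11", 24),
    ("node2", "10.0.0.12", 24),
    ("node3", "10.0.0.13", 25),
]

if __name__ == "__main__":
    for net, names in group_by_subnet(EXAMPLE).items():
        print(net, names)
```

In this made-up example node3 can still reach the other hosts, yet it lands in its own group because its mask differs.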

WS2K8 R2 Cluster does not detect Generic Service failure

We have a service set up as a Generic Service cluster resource named QTrans-BPPLog. We have the resource set up to be restarted automatically in case of failure.

What's happening is that when this service sometimes fails or crashes, the cluster is unaware of the fact that the service is down and doesn't restart it. If I go to the services.msc applet, I can see that the service is not running. The service process is gone in task manager. However, the cluster administrator still shows the service as online. To get it to restart, I have to bring the resource offline then online again. Can someone help?

Here is an excerpt of the cluster log from one of the times I brought it online and it crashed right away without the cluster noticing. Note that there is another failed resource in this group, but there are no dependencies between that resource and QTrans-BPPLog.

00000d14.00001ea8::2015/06/24-15:26:23.248 INFO  [NM] Received request from client address NCSMCDWTST02.

00000d14.00002134::2015/06/24-15:31:23.131 INFO  [NM] Received request from client address NCSMCDWTST02.

---- I am bringing offline QTrans-BPPLOG, which is not really running but the cluster thinks it's online because it didn't detect the previous failure
00000d14.00002134::2015/06/24-15:31:34.706 INFO  [RCM] rcm::RcmApi::OfflineResource: (QTrans-BPPLog)
00000d14.00002134::2015/06/24-15:31:34.862 INFO  [RCM] TransitionToState(QTrans-BPPLog) Online-->OfflineCallIssued.
00000d14.00002134::2015/06/24-15:31:34.862 INFO  [RCM] rcm::RcmGroup::UpdateStateIfChanged: (ncsmcdwTST-B, Failed --> Pending)
00000d14.00002010::2015/06/24-15:31:34.862 INFO  [RCM] HandleMonitorReply: OFFLINERESOURCE for 'QTrans-BPPLog', gen(2) result 997.
00000d14.00002010::2015/06/24-15:31:34.862 INFO  [RCM] TransitionToState(QTrans-BPPLog) OfflineCallIssued-->OfflinePending.
00000f20.000021a0::2015/06/24-15:31:34.862 INFO  [RES] Generic Service <QTrans-BPPLog>: Service died or not active any more; status = 1062.
---- Now the cluster realized that the service was down, but only when I brought it offline

00000f20.000021a0::2015/06/24-15:31:34.862 INFO  [RES] Generic Service <QTrans-BPPLog>: Service is now offline.
00000f20.000021a0::2015/06/24-15:31:34.862 INFO  [RHS] Resource QTrans-BPPLog has come offline. RHS is about to report resource status to RCM.
00000d14.00002010::2015/06/24-15:31:34.862 INFO  [RCM] HandleMonitorReply: OFFLINERESOURCE for 'QTrans-BPPLog', gen(2) result 0.
00000d14.00002010::2015/06/24-15:31:34.862 INFO  [RCM] TransitionToState(QTrans-BPPLog) OfflinePending-->OfflineSavingCheckpoints.
00000d14.000008ac::2015/06/24-15:31:34.862 INFO  [RCM] TransitionToState(QTrans-BPPLog) OfflineSavingCheckpoints-->Offline.
00000d14.000008ac::2015/06/24-15:31:34.862 INFO  [RCM] rcm::RcmGroup::UpdateStateIfChanged: (ncsmcdwTST-B, Pending --> Failed)

---- bringing QTrans-BPPLog back online...
00000d14.00002134::2015/06/24-15:31:38.139 INFO  [RCM] rcm::RcmApi::OnlineResource: (QTrans-BPPLog)
00000d14.00002134::2015/06/24-15:31:38.201 INFO  [RCM] TransitionToState(QTrans-BPPLog) Offline-->OnlineCallIssued.
00000d14.00002134::2015/06/24-15:31:38.201 INFO  [RCM] rcm::RcmGroup::UpdateStateIfChanged: (ncsmcdwTST-B, Failed --> Pending)
00000d14.00001e80::2015/06/24-15:31:38.217 INFO  [RCM] HandleMonitorReply: ONLINERESOURCE for 'QTrans-BPPLog', gen(2) result 997.
00000d14.00001e80::2015/06/24-15:31:38.217 INFO  [RCM] TransitionToState(QTrans-BPPLog) OnlineCallIssued-->OnlinePending.
00000f20.00002334::2015/06/24-15:31:39.745 INFO  [RES] Generic Service <QTrans-BPPLog>: Service is now running.
00000f20.00002334::2015/06/24-15:31:39.745 INFO  [RHS] Resource QTrans-BPPLog has come online. RHS is about to report status change to RCM
00000d14.00001e80::2015/06/24-15:31:39.745 INFO  [RCM] HandleMonitorReply: ONLINERESOURCE for 'QTrans-BPPLog', gen(2) result 0.
00000d14.00001e80::2015/06/24-15:31:39.745 INFO  [RCM] TransitionToState(QTrans-BPPLog) OnlinePending-->Online.
00000d14.00001e80::2015/06/24-15:31:39.745 INFO  [RCM] rcm::RcmGroup::UpdateStateIfChanged: (ncsmcdwTST-B, Pending --> Failed)
---- QTrans-BPPLOG crashed at 15:31:48, but the cluster doesn't see the failure

00000d14.00002520::2015/06/24-15:34:14.047 INFO  [NM] Received request from client address NCSMCDWTST02.
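For anyone digging through a longer cluster log, a small script can pull out just the state transitions for one resource. This is a minimal Python sketch that assumes the line layout seen in the excerpt above (`TransitionToState(<resource>) <old>--><new>.`). The regular expression and the function name are my own, not part of any cluster tooling.

```python
import re

# Matches [RCM] lines such as:
#   ...::2015/06/24-15:31:34.862 INFO  [RCM] TransitionToState(X) A-->B.
_TRANSITION = re.compile(
    r"::(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+\w+\s+\[RCM\]\s+"
    r"TransitionToState\((?P<res>[^)]+)\)\s+(?P<old>\w+)-->(?P<new>\w+)\."
)


def transitions(lines, resource):
    """Return (timestamp, old_state, new_state) tuples for one resource."""
    found = []
    for line in lines:
        match = _TRANSITION.search(line)
        if match and match.group("res") == resource:
            found.append(
                (match.group("ts"), match.group("old"), match.group("new"))
            )
    return found


# A few lines copied from the excerpt above.
SAMPLE = [
    "00000d14.00002134::2015/06/24-15:31:34.862 INFO  [RCM] TransitionToState(QTrans-BPPLog) Online-->OfflineCallIssued.",
    "00000d14.00002010::2015/06/24-15:31:34.862 INFO  [RCM] TransitionToState(QTrans-BPPLog) OfflineCallIssued-->OfflinePending.",
    "00000f20.000021a0::2015/06/24-15:31:34.862 INFO  [RES] Generic Service <QTrans-BPPLog>: Service died or not active any more; status = 1062.",
    "00000d14.000008ac::2015/06/24-15:31:34.862 INFO  [RCM] rcm::RcmGroup::UpdateStateIfChanged: (ncsmcdwTST-B, Pending --> Failed)",
]

if __name__ == "__main__":
    for ts, old, new in transitions(SAMPLE, "QTrans-BPPLog"):
        print(ts, old, "->", new)
```

If the service really crashed at 15:31:48, a missing Online-->Failed style transition after that time in the filtered output would confirm that the resource monitor never reported the failure.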

Failover Cluster on S-2012 R2 - CNO Issue

The setup has 2 nodes (6 VMs per node, running from clustered storage).
I have a CNO in Active Directory, registered in DNS. On its Security tab, full control is granted to the CNO itself ('CNO-C1$') and to both nodes (1 and 2).

Node 2: Failover Cluster Manager reports no errors currently. (can ping CNO)
Node 1: Failover Cluster Manager reports 1 error of event ID 1207 (can ping CNO)

"The computer object associated with the cluster network name resource 'Cluster Name' could not be updated in domain 'mydomain.contoso.com' during the Resource post online operation.

The text for the associated error code is: There is no such object on the server.

The cluster identity 'CNO-C1$' may lack permissions required to update the object. Please work with your domain administrator to ensure that the cluster identity can update computer objects in the domain."

Each Failover Cluster Manager reports 2 functional nodes and 2 functional NICs on 2 different subnets for communication (1.xxx and 0.xxx).
Clustered storage is currently online for both nodes, and I can currently live migrate and 'shared nothing' migrate.

The error says there is no object... can anyone see a hole here?

Thanks in advance.


DTCProxy is not running: java.net.ConnectException: Connection timed out

Hi All,

While starting the JBoss server, we are hitting the issue below with MSDTC. The database is SQL Server 2008 R2. This is a clustered database environment; MSDTC works fine in non-clustered environments.

2015-06-09 23:48:18,444 ERROR [STDERR] (main) javax.transaction.xa.XAException: DTCProxy is not running: java.net.ConnectException: Connection timed out

2015-06-09 23:48:18,445 ERROR [STDERR] (main) at com.inet.tds.b.a(Unknown Source)

2015-06-09 23:48:18,445 ERROR [STDERR] (main) at com.inet.tds.b.start(Unknown Source)

2015-06-09 23:48:18,445 ERROR [STDERR] (main) at com.inet.tds.e.start(Unknown Source)

2015-06-09 23:48:18,445 ERROR [STDERR] (main) at org.jboss.resource.adapter.jdbc.xa.XAManagedConnection.start(XAManagedConnection.java:213)

Please help.

Issue setting up a File Server role: errors 1254, 1205 and 1069

Hi Everyone

I am currently building a new 2012 R2 file cluster to replace our 2008 R2 file cluster.

On each node (3 in total) I have enabled the following roles and features:

"File Server roles, failover clustering features, File Server Resource management tools and Share and Storage management tools"

Each node has 2 NICs: internal and heartbeat. The heartbeat network is for cluster use only.

I have mapped 2 LUNs to the nodes:

LUN 1: quorum

LUN 2: file storage

Both LUNs can be accessed by all the nodes.

The cluster creation completed successfully without any errors.

In the Configure Role High Availability wizard I chose File Server > File Server for general use.

In Client Access Point I specified the NetBIOS name and IP address.

I selected the available Cluster Disk 2, then on the next wizard screen ("you are ready to configure high availability for file cluster") clicked Next > Finish.

I can see the role was created successfully; however, I can't see the "test" account object created in the OU.

As you can see, the status shows "Failed".

I am able to move the shared cluster disk to another node.

The Add Share option is also greyed out.

Please help.

Many thanks

Clustered role 'Test' has exceeded its failover threshold.  It has exhausted the configured number of failover attempts within the failover period of time allotted to it and will be left in a failed state.  No additional attempts will be made to bring the role online or fail it over to another node in the cluster.  Please check the events associated with the failure.  After the issues causing the failure are resolved the role can be brought online manually or the cluster may attempt to bring it online again after the restart delay period.

The Cluster service failed to bring clustered role 'Test' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

Cluster resource 'Test' of type 'Network Name' in clustered role 'Test' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Storage: Cluster Disk 2
Network Name: test
OU: OU=FileCluster,OU=Servers,OU=,DC=,DC=,DC=
IP Address:
Started: 25/06/2015 9:53:47 p.m.
Completed: 25/06/2015 9:53:49 p.m.
Creating the group test.
Creating File Server resources.
Configuring the cluster storage device.
Configuring File Server resources.
Configuring File Server networking.
Verifying the client access point settings are valid.
Configuring network name resource.
Configuring new IP address resources.
Configuring the dependencies for the IP address resources.
Configuring the network name dependencies.
The client access point has been configured successfully.
Configuring File Server resources.
Creating the highly available file server resource.
Verifying required dependencies are configured.
A File Server has been successfully created.




Network Drops for 30 seconds During Hyper-V Live Migration

I have 3 physical Hyper-V hosts set up with clustered storage. I disabled VMQ because I was getting errors when trying to do live migrations. I have also run the network portion of the cluster validation tests without errors. Whenever I do a live migration from any host to any other host, I lose network connectivity to every VM running on those hosts. During this time a SQL application that is running locks up and freezes for all the users. Many have to use Task Manager to kill the application to get back in, or even reboot their machines to free it up.

I have been doing a ton of reading on network settings and configurations and have made no progress. Any help to point me in a direction to get this solved will be appreciated. I need to be able to do Live Migrations on my cluster storage.

Thanks for any help.


2012 R2 Guest Cluster Network Failure

We have a 2 node guest cluster (2012 R2) using a shared VHDX located on a CSV providing resilient File Server services and it seems to work well....most of the time.

We've had two instances lately where the File Server cluster has failed due to "network" issues where neither node can see each other and both are removed from the cluster. The problem is that we can't see any other VM reporting networking problems at the same time.

We did some firmware and driver updates on the physical nodes recently to resolve a known problem with VMQs and thought that our problems were solved. Unfortunately we had a recurrence of the problem this morning, so we seem to be back to the drawing board.

Has anyone else had similar problems with Guest Clusters in 2012 R2?

Cheers for now

Russell


Cluster IP address fails with error 1077

Hi all,

I have a Windows 2012 R2 failover cluster, and recently I noticed that the cluster name is offline due to a failure of its IP address:

Health check for IP interface 'Cluster IP Address' (address '10.16.18.70') failed (status is '55'). Run the Validate a Configuration wizard to ensure that the network adapter is functioning properly.

I've run the validation wizard (only the network part, because the cluster has been in production for some months) and everything is OK. If I try to bring the IP address online, I receive a failure.

Failover on 2008R2 Fileserver cluster takes 7 minutes when a server reboots for updates

I have a 2008 R2 domain with two 2008 R2 nodes in a cluster. I have 2 questions:

1. If I "validate the cluster" and it is a cluster with several 10 TB disks on it, etc., will it take a long time to validate? I just want to make sure that step does not cause any issues on a running cluster in production.

2. When a node reboots for Windows updates, the cluster keeps responding to ping, but it is basically offline for about 7 minutes. Is there something I can do to speed up the failover process?

Thanks,


Dave




Query on Multi-subnet Failover Cluster Setup

I have set up a Failover Cluster Instance on 2 nodes with Windows Server 2008 and SQL Server 2012, running on SAN storage with a "Node and Disk based Quorum", DTC, and public and private IP communication.

Now I want to set up a multi-subnet cluster by adding another two nodes and a SAN storage from a different subnet.

I have the following questions, for which I am looking for your expert help:

  1. Do we need to enable communication between the public IPs of the site1 servers and the public IPs of the site2 servers, and if so, how?
  2. How can our existing cluster in site1 be stretched to site2 so that DC2 also has an active-passive architecture after a failover to site2?
  3. How will the quorum work in Node and Disk Majority mode in site2?
  4. How will the private IP communication work when the SQL Server 2012 cluster fails over to site2?
  5. Which components should we replicate from site1 to site2 using SAN replication? Will the quorum also get replicated, or should we exclude it from replication?
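On the quorum question, it may help to work through the vote arithmetic. In Node and Disk Majority, every node has one vote and the disk witness has one vote, and the cluster stays up only while more than half of all votes are reachable. Here is a minimal Python sketch, assuming plain static votes (no vote weighting or dynamic adjustment) and a hypothetical 4-node layout with 2 nodes per site.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that is a strict majority."""
    return total_votes // 2 + 1


def has_quorum(nodes_up: int, witness_up: bool, total_nodes: int) -> bool:
    """Node and Disk Majority: each node plus the disk witness has one vote."""
    total_votes = total_nodes + 1  # all nodes plus the disk witness
    present = nodes_up + (1 if witness_up else 0)
    return present >= votes_needed(total_votes)


if __name__ == "__main__":
    # Hypothetical stretched cluster: 4 nodes, 2 per site, 1 disk witness.
    print("votes needed:", votes_needed(4 + 1))
    print("2 nodes without witness:", has_quorum(2, False, 4))
    print("2 nodes with witness:", has_quorum(2, True, 4))
```

With these assumptions, whichever site can still reach the witness disk keeps quorum after the inter-site link breaks, which is why where the witness lives (and whether it is replicated) matters for the design.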

I am new to multi-subnet cluster setup and am looking for your urgent help with my setup.

Recovery steps for a failed single-JBOD SOFS cluster

Just wondering here.

So a single, sole JBOD goes down which is connected to 2 SOFS nodes.

What are the options for quickly spinning up the VMs that resided on the JBOD?

The Hyper-V cluster cannot access the shared storage provided by the SOFS cluster, so what happens? The HV nodes will still be in a clustered state. Is there a way of placing the cluster 'on hold' in order to possibly run VMs from the local storage of the HV nodes?

Windows 2003 Cluster Domain / Change - Request off domain

Hello. This is sort of a weird question. We have a customer that had a two-node cluster. The customer was originally going to move their cluster to a new domain. Before they contacted us, they removed both machines from the domain but haven't rebooted, so it's working okay. Since Microsoft says cluster nodes need to be domain members, I'm pretty sure it will not work after the reboot. They have since changed the request to keep it off the domain, even if that means only 1 server is online. My thought is that even if the 2nd node is marked as failed and the remaining server is active, the cluster services won't start after a reboot. The task asks that I convert it to a non-cluster setup. Does anyone know what they might even mean? I've never heard of converting a cluster to a non-cluster.

In this case, I think they need a new non-clustered server in a workgroup that we can migrate to. Does anyone see another way to get them down to a single, non-domain-member server?

Multi-site cluster with different connections

Hello,

At the moment we have a SQL 2014 cluster in one datacenter, which all of our customers connect to.

In the near future we are going to expand to a second datacenter. We want to move 1 node to this datacenter and create a DMDW connection in between.

We also want customers to connect to this second datacenter, but different customers. The customers connecting to the first datacenter will not connect to the second datacenter.

The customers connecting to the second datacenter will be routed via the DMDW connection to the first node in the first datacenter.

Now my problem: the DMDW connection could break down, and all the customer connections from the 2nd to the 1st datacenter would be lost. In that case I want the customers of the 2nd datacenter to connect to the 2nd node and continue to work.

In the setup I have now, that's not possible, because it would create a split-brain issue. How can I make this work?

Help in creating first Windows Failover Cluster

This is my first attempt in creating a failover cluster. I've followed the instructions on "Failover Cluster Step-by-Step Guide: Configuring Accounts in Active Directory" along with the required ports.

https://technet.microsoft.com/en-us/library/cc731002(v=ws.10).aspx#BKMK_steps_precreating
http://cybergav.in/2013/07/28/windows-server-failover-cluster-port-requirements-for-intra-node-connectivity/

I added the Failover Cluster feature via Manager on both server-dev01 and server-dev02. I ran the validation and everything seemed to check out. I then created the cluster, "server-cdev", via the "Create Cluster Wizard". That didn't go well.

I pulled the logs via the PowerShell command "Get-ClusterLog" and tried looking at them, but can't really make heads or tails of them.

There are some things I don't get, and I'm hoping you can help me out.
 - Why are there random IPv4 and IPv6 addresses in the logs (IPv6 is not enabled) that I know nothing of and that are not in our DNS records?
 - Both ports 137 and 3343 are open on our firewall, but I can't telnet from server-dev01 to server-dev02 (or the other way round) on port 3343 or 137. Does this need to be fixed before running the cluster wizard?
 - I see these two entries but have no idea what they mean (I did see a similar post, but with no answer to the question - https://social.technet.microsoft.com/Forums/windowsserver/en-US/9d25c123-a763-405f-8c20-61da2d4b4390/cluster-creation-error?forum=winserver8gen)
[DCM] DiskControlManager bitlocker load status 126
[API] DmQueryString failed to retrieve the security   descriptor status 2, default security descriptor will be used for authorizing client connections
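On the telnet test mentioned above: telnet only exercises TCP, while, as far as I know, port 137 (NetBIOS name service) is UDP and cluster heartbeats on 3343 are mostly UDP as well, so a failed telnet does not necessarily prove the ports are blocked. For the TCP side, a check like the following Python sketch can be scripted between the nodes. The function name is my own, and the host and port in the comment are just the ones from this post.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example call, run from server-dev01 (not executed here):
#   tcp_port_open("server-dev02", 3343)
```

A False result only says that nothing accepted a TCP connection on that port at that moment; it can also mean that no service is listening yet, which is plausible for 3343 before the cluster service is running.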

I tried posting the logs here but got an error message about the "Body must be 4 - 60000 characters long"

Your help is greatly appreciated..


Live Migration of Server 2012 R2 Remote Desktop Server Disconnects Users

Hi. I have a 2 node Server 2012 R2 failover cluster and amongst the services running on this cluster is a Server 2012 R2 remote desktop server (this has the session host, web access, licensing and connection broker roles). When I move the 2012 R2 remote desktop server to the other cluster node using live migration, users are disconnected from RDS and get an error to say "failed to reconnect session". Users can immediately, manually reinitiate their connection to the server.

I have other 2012 R2 servers running on this cluster, which don't have the remote desktop services roles installed, and these servers do not exhibit the same behaviour of disconnecting users after a live migration.

Have found only a few references to similar issues on a couple of other TechNet discussions, but no resolution etc...

I'm guessing this is not expected behaviour and there is some kind of configuration issue somewhere?

Windows Server 2012 R2 failover cluster access denied

Dear Experts,

I can't access a Windows Server 2012 R2 failover cluster. The error is shown below:

"You do not have administrative privileges on the cluster. Contact your network administrator to request access."

Error code: 0x80070005

Access denied.

PS C:\> Get-ClusterAccess
Get-ClusterAccess : You do not have administrative privileges on the cluster. Contact your network administrator to
request access.
    Access is denied
At line:1 char:1
+ Get-ClusterAccess
+ ~~~~~~~~~~~~~~~~~
    + CategoryInfo          : AuthenticationError: (:) [Get-ClusterAccess], ClusterCmdletException
    + FullyQualifiedErrorId : ClusterAccessDenied,Microsoft.FailoverClusters.PowerShell.GetClusterAccessCommand

The current user is an enterprise administrator.
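For reference, the error code 0x80070005 quoted above can be split into its HRESULT fields. The Python sketch below follows the standard HRESULT layout (top bit is the failure flag, 11 facility bits, low 16 bits carry the code). The helper name is my own.

```python
def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into failure flag, facility and code."""
    value &= 0xFFFFFFFF  # normalise negative inputs to unsigned 32-bit
    return {
        "failure": bool(value >> 31),       # bit 31: severity
        "facility": (value >> 16) & 0x7FF,  # bits 16-26: facility
        "code": value & 0xFFFF,             # bits 0-15: error code
    }


if __name__ == "__main__":
    print(decode_hresult(0x80070005))
```

Facility 7 is FACILITY_WIN32 and Win32 error 5 is ERROR_ACCESS_DENIED, so this is a plain access-denied result wrapped as an HRESULT, not a cluster-specific code.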


Add Node to Cluster - Keyset does not exist

Hi,

I am trying to add a third node to a Windows 2012 failover cluster, but I get the following error.

The server 'DR.domain.com' could not be added to the cluster.
An error occurred while adding node 'DR.domain.com' to cluster 'domain-fc'.

Keyset does not exist

The user I am using to add the node is a Domain Admin, so it is probably not a permissions issue.

All nodes are Windows 2012 R2 VMs on Azure


Usman Shaheen MCTS BizTalk Server http://usmanshaheen.wordpress.com


Clustered Task giving RPC Server unavailable error

Hi everyone,

I have a clustered environment with SQL AlwaysOn. When I try to create a clustered task with the following PowerShell command, I get the error "Register-ClusteredScheduledTask : The RPC server is unavailable."

Register-ClusteredScheduledTask -Cluster MyCluster -TaskName MyResourceSpecificTask -TaskType ResourceSpecific -Resource MyResourceName -Action $action -Trigger $trigger

I've checked that the nodes and the cluster are running, so I'm not sure what's causing this error.

Some or all CSVs will Fail Only on the Weekends. They work just fine during the 5 day week.

Good afternoon,

There is a problem which I have noticed only occurs during the weekends. 

The problem: All or just some CSVs go into the offline state only on the weekends and this of course brings down all virtual machines as they depend on the vhds that are stored on those CSVs.

This is the 2nd weekend where the CSVs have gone offline. There are no problems bringing the CSVs back online, and it takes me about 20 minutes to get all the VMs running again, but this is horrible if I have to deal with it every weekend.

I have looked through the event logs and there is nothing which points to an actual problem. Errors in the Event Viewer include live migration failures and Hyper-V host and cluster errors. These errors all take place at 3am and are related to the CSVs going offline. The Hyper-V witness is one of the CSVs that go offline.

I have a feeling that there might be something happening on the Compellent which hosts the CSVs but I would like to know your thoughts on this. Have you experienced something like this, where the CSVs go offline during the weekends but not during the work day? 

Could this also be a networking error between the cluster nodes and the Compellent, or a driver problem? Maybe or maybe not.

The cluster nodes are all R720 with exactly the same hardware configurations and they are all running Windows Server 2012 R2 and all OS updates have been applied.

Any assistance and suggestions would be helpful.

Thanks in advance!
