Channel: High Availability (Clustering) forum
Viewing all 4519 articles

Windows 2003 Cluster - Resources on evicted node

We had a Windows 2003 cluster environment from which we have now evicted one node (1b). When a user tries to open an RDP connection to the active node (1a), it reports a socket error. The active node (1a) has been rebooted. The issue is that when the user connects to the evicted node (1b), he can see the Q: and Z: drives, which actually reside on the active node (1a). Could someone please let me know why this is happening?



FailoverCount is not getting reset for quorum resource in Windows 2012 R2 failover clusters

Hi,

I have a two-node failover cluster on Windows Server 2012 R2 with a third-party resource as quorum and quorum type Node and Disk Majority. When the quorum resource faults, FOC does not fail over "Cluster Group" to the other cluster node. The following lines appear in the cluster log.

Here is the cluster log from the failed node.

00008bc.000014d8::2013/12/06-11:45:49.591
INFO  [RCM] rcm::RcmGroup::Failover:(ClusterGroup)
 
000008bc.000014d8::2013/12/06-11:45:49.592
WARN  [RCM] Not failing over group ClusterGroup, failoverCount 2,
failoverThresholdSetting 4294967295, lastFailover 2013/12/06-03:39:54.190
 
000008bc.000014d8::2013/12/06-11:45:49.592
INFO  [RCM] Will retry online from long delay restart of quoDG in 3600000
milliseconds.

The maximum failover count in the quorum resource's failover policy is set to one.

000008bc.000014d8::2013/12/06-11:45:49.591
INFO  [RCM] resource quoDG: failure count:1, restartAction:2
persistentState:1.

Is there a way to reset this FailoverCount? When does FOC increment and reset the failover count for a resource?
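
For anyone comparing these values across nodes, the WARN line above can be pulled apart with a small script. This is only a sketch: the field names come from the quoted log, while the function name `parse_rcm_warning` and the joined sample line are illustrative assumptions. Note that 4294967295 is simply the largest unsigned 32-bit value; whether the cluster service treats it as a special "use the default" marker is not confirmed anywhere in this post.

```python
import re

# Sample adapted from the cluster log quoted above; the two physical lines
# are joined here and the spacing is normalized (an assumption of this sketch).
LINE = ("WARN  [RCM] Not failing over group ClusterGroup, failoverCount 2, "
        "failoverThresholdSetting 4294967295, lastFailover 2013/12/06-03:39:54.190")

# \s* between tokens so the pattern also tolerates logs with missing spaces.
PATTERN = re.compile(
    r"Not failing over group\s*(?P<group>\S+?),\s*"
    r"failoverCount\s+(?P<count>\d+),\s*"
    r"failoverThresholdSetting\s+(?P<threshold>\d+),\s*"
    r"lastFailover\s+(?P<last>\S+)"
)

def parse_rcm_warning(line):
    """Return the fields of an RCM 'Not failing over group' line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    threshold = int(match.group("threshold"))
    return {
        "group": match.group("group"),
        "failover_count": int(match.group("count")),
        "threshold": threshold,
        # True when the threshold is the maximum unsigned 32-bit value.
        "threshold_is_uint32_max": threshold == 2**32 - 1,
        "last_failover": match.group("last"),
    }

def ms_to_minutes(milliseconds):
    """Convert a delay in milliseconds (e.g. the retry delay) to minutes."""
    return milliseconds / 60000

if __name__ == "__main__":
    print(parse_rcm_warning(LINE))
    print(ms_to_minutes(3600000), "minutes until the long-delay restart")
```

Running the same parser over the logs from both nodes makes it easier to see whether failoverCount ever drops back after the lastFailover timestamp ages out.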

Thanks in advance

Rakesh


Rakesh Agrawal

Volume added to SQL cluster but could not be found

Dear,

I added a volume to the SQL Server cluster. I can see it under cluster storage and moved it to the SQL Server cluster service, but I cannot find it in the volume list when I try to make a backup.

VSS on File Server Cluster 2012 R2

We just configured a failover cluster and added file services.  We would like to configure the storage with VSS to allow for easier file restores by the end users (the volume is snapped at the SAN hourly).

We have two drives added to the file server role that are connected to both servers via iSCSI.  The failover process works as expected between the servers.

What we don't understand is that when configuring VSS for the disk in the cluster, it appears that you can set the schedule, but no other settings. 

Reading some of the other threads on this topic, it was suggested to create the shadow storage through vssadmin.  When trying to configure, we see this error:

vssadmin add shadowstorage /for=h: /on=i: /maxsize=450gb

Error: Maximum number of shadow copy storage associations already reached

When listing storage:

Shadow Copy Storage association
   For volume: (W:)\\?\Volume{dd03bbcb-5f94-11e3-80be-00505699496f}\
   Shadow Copy Storage volume: (W:)\\?\Volume{dd03bbcb-5f94-11e3-80be-00505699496f}\
   Used Shadow Copy Storage space: 0 bytes (0%)
   Allocated Shadow Copy Storage space: 0 bytes (0%)
   Maximum Shadow Copy Storage space: UNBOUNDED (100%)

Shadow Copy Storage association
   For volume: (H:)\\?\Volume{a4c64ad7-d8b4-47d4-8936-e6151b1bff4b}\
   Shadow Copy Storage volume: (H:)\\?\Volume{a4c64ad7-d8b4-47d4-8936-e6151b1bff4b}\
   Used Shadow Copy Storage space: 944 MB (0%)
   Allocated Shadow Copy Storage space: 3.40 GB (0%)
   Maximum Shadow Copy Storage space: 50.0 GB (10%)

Thank you in advance for any guidance.
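
To make the listing above easier to reason about, here is a small sketch that parses `vssadmin list shadowstorage` style output into records and reports which volumes already have an association. The sample text is copied from this post; the function names (`parse_associations`, `drive_letter`) are illustrative. The listing shows that H: already has an association with itself as the storage volume, which may be related to the error, but that link is an assumption and not something the output proves.

```python
import re

# Listing copied from the post above (raw string keeps the backslashes).
LISTING = r"""
Shadow Copy Storage association
   For volume: (W:)\\?\Volume{dd03bbcb-5f94-11e3-80be-00505699496f}\
   Shadow Copy Storage volume: (W:)\\?\Volume{dd03bbcb-5f94-11e3-80be-00505699496f}\
   Used Shadow Copy Storage space: 0 bytes (0%)
   Allocated Shadow Copy Storage space: 0 bytes (0%)
   Maximum Shadow Copy Storage space: UNBOUNDED (100%)

Shadow Copy Storage association
   For volume: (H:)\\?\Volume{a4c64ad7-d8b4-47d4-8936-e6151b1bff4b}\
   Shadow Copy Storage volume: (H:)\\?\Volume{a4c64ad7-d8b4-47d4-8936-e6151b1bff4b}\
   Used Shadow Copy Storage space: 944 MB (0%)
   Allocated Shadow Copy Storage space: 3.40 GB (0%)
   Maximum Shadow Copy Storage space: 50.0 GB (10%)
"""

def parse_associations(text):
    """Split the listing into one dict per 'Shadow Copy Storage association'."""
    associations = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line == "Shadow Copy Storage association":
            current = {}
            associations.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only; the value may contain more colons.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return associations

def drive_letter(value):
    """Extract 'W' from a value such as '(W:)\\\\?\\Volume{...}\\'."""
    match = re.match(r"\((\w):\)", value)
    return match.group(1) if match else None

if __name__ == "__main__":
    for assoc in parse_associations(LISTING):
        source = drive_letter(assoc["For volume"])
        target = drive_letter(assoc["Shadow Copy Storage volume"])
        limit = assoc["Maximum Shadow Copy Storage space"]
        print(f"{source}: already stores its shadow copies on {target}: (max {limit})")
```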

CSV iSCSI LUN slower than iSCSI LUN on same NetApp filer

This is my first time working with CSV and Windows Server 2012 R2 (looking forward to using SMB 3 and CSV Cache for Hyper-V).

The old production setup uses iSCSI LUNs as VHDX/VHD storage from each Hyper-V host.

Question about a performance issue / degradation:

Using the same test NetApp filer with 2 iSCSI LUNs and testing connection speed with one host:

The CSV share connected to an iSCSI LUN shows a transfer speed of 240 MB/s (copying an ISO file), while a single iSCSI LUN on its own shows our expected 560 MB/s.

All tests ran on the same hardware hosts (3 x HP DL380p Gen8) against the same NetApp filer (all 10 Gbit and new hardware).

Is there a recommended way to test CSV speed on an iSCSI LUN compared to an iSCSI LUN without CSV?
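
To put the two figures in proportion, here is a short worked calculation. It assumes decimal units (1 Gbit/s = 125 MB/s) and ignores iSCSI/TCP overhead, so the line-rate number is an upper bound rather than an achievable target; the variable and function names are only illustrative.

```python
# Measured figures quoted in the post above.
CSV_MB_S = 240.0      # copy through the CSV share
DIRECT_MB_S = 560.0   # copy to the plain iSCSI LUN
LINK_GBIT_S = 10.0    # nominal link speed of the test network

def line_rate_mb_s(gbit_s):
    """Nominal link rate in MB/s, decimal units, no protocol overhead."""
    return gbit_s * 1000 / 8

def percent(part, whole):
    """Express part as a percentage of whole."""
    return part / whole * 100

if __name__ == "__main__":
    link = line_rate_mb_s(LINK_GBIT_S)
    print(f"Nominal link rate:        {link:.0f} MB/s")
    print(f"Direct LUN vs link rate:  {percent(DIRECT_MB_S, link):.1f}%")
    print(f"CSV vs link rate:         {percent(CSV_MB_S, link):.1f}%")
    print(f"CSV vs direct LUN:        {percent(CSV_MB_S, DIRECT_MB_S):.1f}%")
```

In other words, the CSV path delivers well under half of the direct-LUN throughput on the same hardware, so the gap is far larger than normal run-to-run variation in a file copy.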

Server 2012 Cluster nodes hang and VMs lock up, memory leaks and critical stops on both nodes.

Last night my two-node cluster went down for no apparent reason.  All VMs (4) were down even though the cluster manager said they were running.  The cluster shared volume on my SAN was not accessible through Windows Explorer but the Dell mpio software showed it was connected and the SAN itself showed a connection and did not have any problem.  It took me five hours of struggle to get the cluster running again.  I had to remotely restart each node several times from another server using the command line because the RDP session would stop responding due to Explorer locking up.  I ended up removing the antivirus software from each node but that was in desperation; I don't know if that was the problem or not.  It finally started to work again when I manually brought the cluster IP back online, manually moved all resources to node1 and then did a pause and drain of node2 and restarted node2.  This error shows up twice in the Application log of both nodes:

Possible Memory Leak. Application (C:\Windows\Cluster\rhs.exe -key SYSTEM\CurrentControlSet\Services\ClusSvc\Parameters\Rhs\0428d6b3-5c3b-4757-bc31-70379129ad89 -parentPid 3060 -initEvent 1dbde958-779b-4cd7-8daa-7c9299d0303c -replyEndpoint OLEAA17D0EF8BDFFAD1F4F33871C878) (PID: 4520) has passed a non-NULL pointer to RPC for an [out] parameter marked [allocate(all_nodes)]. [allocate(all_nodes)] parameters are always reallocated; if the original pointer contained the address of valid memory, that memory will be leaked. The call originated on the interface with UUID ({4b324fc8-1670-01d3-1278-5a47bf6ee188}), Method number (64). User Action: Contact your application vendor for an updated version of the application.

There are also two critical stops logged in the Dell OpenManage logs on each node.

The symptoms are very similar to those in this Microsoft KB article for Server 2008 R2:

http://support.microsoft.com/kb/2798093

Both nodes are fully updated with hotfix 2870270.

Can anyone shed some light on this?  What went wrong and how do I prevent it from happening again?

Failover Cluster Network Name Failed and Can't be Repaired

I have an issue that seems to be different from any problem others have encountered.

I've scoured everything I can find and nothing has fixed my problem.

The problem starts with the common problem of the cluster network name failing on my 2-node Server 2012 file server cluster.  The computer object was still in AD and appeared to be fine, so it was not the common problem of the object getting deleted somehow.  At the time, there was no other object with that name in the Recycle Bin, so I don't think it was mistakenly deleted and quickly recreated to cover any tracks, so to speak.

Following one guide, I tried to find the registry key that corresponded with the GUID of the object, but neither node in the cluster had it in its registry (which may be part of the problem).

Since it was in the failed state, I tried to do the repair on the object to no avail.

We run a "locked down" DC environment so all computer objects have to be pre-provisioned.  They were all pre-provisioned successfully and successfully assigned during cluster creation.  The cluster was running with no issues for a month or so before this problem came up.

When I do a repair on the object while taking diagnostic logs the following 4609 error appears:

The action 'Repair' did not complete. - System.ApplicationException: An error occurred resetting the password for 'Cluster Name'. ---> System.ComponentModel.Win32Exception: Unknown error (0x80005000)

There appears to be a corresponding 4771 error with failure code 0x18 in the security log of the DC, stating that there was a Kerberos pre-authentication failure for the cluster network name object (Domain\Clustername$).

I believe this is what is causing the repair failure.  All the information I found on security error 4771 pointed either to bad credentials given for a user account or to a fix of rejoining the computer to the domain.  I can't seem to find a way to do this with the cluster network name.  If there's a way, please let me know.

I've tried a number of things, like resetting the object, disabling it, deleting and creating a new object with the same name, deleting that new object and recovering the original, etc...

Can anyone shed some light on what is going on and hopefully how to fix it other than rebuilding the cluster?  I'm quite close to just tearing it down and building it back up, but am hesitant because this cluster is currently in production...

Any help would be appreciated.

Problem in NLB

Hello experts,

I am using Windows Server 2008 R2 SP1 and have configured NLB between two of my servers (say, Node1 and Node2) in unicast mode. The problem I am facing is that pings to my nodes are not continuous, and as a result I keep getting disconnected from the remote sessions on both nodes. Please advise.


Swaprakash..


Need to cluster two virtual webs and few services

Environment

OS = Windows 2003 R2

I want to make an application highly available. It has two virtual websites hosted in IIS and also runs on around 8 Windows services. Please advise.


Any comment will be appreciated. Thanks. Zahid Haseeb.

Urgent: cluster IP address

Windows 2008 R2 clustering
Microsoft Failover cluster virtual adapter
some cluster resources and it shows

Resource                          Group           Node    Status
cluster IP address                cluster group   node1   failed
cluster IP address (172.18.x.x)   cluster group   node1   online

I tried to bring "cluster IP address" online and it fails with an IPv6 address.

I cannot find where IPv6 is configured on the cluster group.
Can anyone help?

Thank you.


Server 2012 R2 failover cluster

Hi Guys,

I created a 2-node cluster using Server 2012 R2. I have VMM 2012 R2 running on both nodes with SQL 2012 SP1.

VMM is configured for high availability and the SQL AlwaysOn feature is also enabled.

Now when I stop the cluster service on one of the nodes, the roles and quorum disk fail over to the other node, but the quorum disk does not come back online. As a result, the cluster fails until I bring the cluster service back up on the current owner (the owner of the quorum disk).

I'm at a loss for what to do, any help will be greatly appreciated.

Richard

How to change quorum configuration in Windows 2003 R2

How do I change the quorum configuration in Windows 2003 R2? I have a two-node cluster and I want to change the quorum setting to the following:

Node and Disk Majority (recommended for clusters with an even number of nodes)




Any comment will be appreciated. Thanks. Zahid Haseeb.

Windows server 2012 hangs when accessing a Cluster Shared Volume

Hi,

We have two Windows Server 2012 Datacenter servers that are members of a Windows cluster and run Hyper-V. Both nodes hang as soon as we try to access the CSV volume (c:\clusterstorage\volume1). When I tried to live-migrate the virtual servers from node1 to node2, it failed. Also, when I restart node1, the server stays stuck on "Please wait for the System Event Notification Service".

Cluster events:

1- Cluster resource 'Virtual Machine' (resource type 'Virtual Machine', DLL 'vmclusres.dll') did not respond to a request in a timely fashion. Cluster health detection will attempt to automatically recover by terminating the Resource Hosting Subsystem (RHS) process running this resource. This may affect other resources hosted in the same RHS process. The resources will then be restarted.

The suspect resource 'Virtual Machine' will be marked to run in an isolated RHS process to avoid impacting multiple resources in the event that this resource failure occurs again. Please ensure services, applications, or underlying infrastructure (such as storage or networking) associated with the suspect resource is functioning properly.

2- The cluster Resource Hosting Subsystem (RHS) stopped unexpectedly. An attempt will be made to restart it. This is usually associated with recovery of a crashed or deadlocked resource.  Please determine which resource and resource DLL is causing the issue and verify it is functioning properly

3- Cluster resource 'Virtual Machine Configuration' of type 'Virtual Machine Configuration' in clustered role  failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

4- Cluster Shared Volume 'Volume1' ('Cluster Disk 2') is no longer accessible from this cluster node because of error 'ERROR_TIMEOUT(1460)'. Please troubleshoot this node's connectivity to the storage device and network connectivity.

5- Cluster Shared Volume 'Volume1' ('Cluster Disk 2') is no longer available on this node because of 'STATUS_IO_TIMEOUT(c00000b5)'. All I/O will temporarily be queued until a path to the volume is reestablished.

IIS does not fail over when I stop IIS

I configured HA and set up an IIS resource (because I have a few virtual websites hosted on it) by following the link below (using the generic script provided by Microsoft). Then I stopped IIS with the iisreset /stop command to verify that a failover happens when the service stops. But the IIS services still do not fail over to the other node.

http://support.microsoft.com/kb/887417/en-us

====================================

I have also noticed one thing. I created a generic service resource for the WWW service. Now when I stop it via "net stop W3svc" (multiple times), the WWW service stops and the cluster group fails over to the other node...

So now I don't need the script provided by the link below:

http://support.microsoft.com/kb/887417/en-us


Any comment will be appreciated. Thanks. Zahid Haseeb.


Automation for adding cluster nodes

Hello guys,

How do you automate adding nodes to a cluster?

My method is not 100% there yet; it still needs some manual interaction after the image is delivered to the server.

I wonder if any of you have managed to automate the whole process.

Happy holidays!

Rafael Bernardes - http://www.cooperati.com.br


server 2012 high availability error

Hello Guys,

I have two servers (Node A and Node B) running Server 2012 R2. The two servers are set up in a 2-node cluster with a 4 GB storage LUN as the witness disk (quorum). Currently, Failover Cluster Manager shows Node B as the owner node of the witness disk. My issue is this: when I stop the cluster service on Node B, ownership of the witness disk fails over to Node A, but the witness disk goes offline and fails to start. When I try to bring it back online it does not come online at all, and I get the error "failed to bring quorum resource online Error 0x8007139a". I then start the cluster service on Node B and try to move the witness disk back to Node B, but it does not move. However, if I stop the cluster service on Node A, the witness disk moves back to Node B as the owner node and comes back online. Throughout all of this, the clustered services keep running while the witness disk is offline.
My second issue: if I shut down Node A, the clustered services keep running and the cluster migrates the services from Node A to Node B, but if I shut down Node B, the clustered services fail and all the services stop.
I would really appreciate your help and input on this. Thanks, guys.
Regards...
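
When searching for an error such as 0x8007139a, it can help to split the value into its standard HRESULT fields first. The sketch below does only that bit arithmetic (severity bit 31, facility in bits 16 to 26, code in the low 16 bits); the function name `decode_hresult` is illustrative, and the sketch deliberately does not attach a symbolic error name to the resulting code.

```python
FACILITY_WIN32 = 7  # facility value used for wrapped Win32 error codes

def decode_hresult(value):
    """Split a 32-bit HRESULT into severity, facility and code fields."""
    return {
        "failure": bool((value >> 31) & 0x1),   # severity bit: 1 means failure
        "facility": (value >> 16) & 0x7FF,      # 11-bit facility field
        "code": value & 0xFFFF,                 # 16-bit status code
    }

if __name__ == "__main__":
    fields = decode_hresult(0x8007139A)
    print(fields)
    if fields["facility"] == FACILITY_WIN32:
        # The low word is then an ordinary Win32 error number.
        print("Win32 error number:", fields["code"])
```

The decimal Win32 number is usually the more productive search term, since system error code tables are indexed that way.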

Hyper-V Failover Cluster - Inconsistent Network Availability

We've got a small cluster with 7 hosts and a dozen or two VMs.  For some reason I'm getting inconsistent availability with the cluster networks.  The hosts seem to function fine on their own, but there are all kinds of issues using migration, which I'm assuming is because certain hosts think other hosts are unavailable. For example:

Cluster Network 1 - From Host 8

Cluster Network 1 - From Host 10

As far as I can tell, all of the networks are up. I can ping all hosts on all interfaces.  What criteria go into determining host availability?





Server 2012 Hyper-V Cluster NIC Teaming Officially Supported?

Our current Hyper-V (Server 2012) setup has us using 3 NICs on each host and cluster node that are set up as external switches with each physical adapter set to a single VLAN (untagged.)  What I would like to do is tag all 3 VLANs on those 3 switch ports then team them and set the teamed adapter up as a single external switch in Hyper-V.  I realize that I will have to set each VM's NIC to go to whatever VLAN it belongs on but this is perfectly fine by me.

I have tested it on a standalone Hyper-V host and it works great.  My question is about our cluster: I am wondering whether it's officially supported by Microsoft.  We just went through a 2-month-long headache (which was never resolved) with premium support, and I'd just like to know what we're getting into before we spend more resources on it.

How to test node failover in Windows 2008 R2 Failover Cluster?

Can anyone give me advice on how to properly test a node failure with a 2-node failover cluster in Windows 2008 R2?

Windows Server 2012 two-node cluster, local "CLIUSR" issue

Hello,

I have a two node Windows Server 2012 STN Cluster with a few SQL instances installed inside it.  Recently in my security event log I see these errors on both nodes:

An attempt was made to reset an account's password.

Subject:
Security ID: SYSTEM
Account Name: <>$
Account Domain: <>
Logon ID: 0x3E7

Target Account:
Security ID: localmachinename\CLIUSR
Account Name: CLIUSR
Account Domain: localmachinename

==

When I look at the local account on both nodes, I see that the password is set to never expire and cannot be changed.  I am quite confused, then, about how the above could happen.  Any advice or ideas would be greatly appreciated.

Thank you
