Channel: High Availability (Clustering) forum
Viewing all 4519 articles

Windows 2012 Cluster with Exchange 2013 DAG


Hi,

I have a question regarding Windows 2012 Failover clustering.

We have Windows 2012 running with an Exchange 2013 DAG: 8 nodes and 1 witness server.

There have been a few instances where one of the nodes lost quorum due to network issues. Whenever that happens, the Cluster service goes into a restart loop (it keeps crashing). I tried setting the Cluster service to manual and then starting it, but it just keeps crashing until I restart the server. After that it works fine, and the node is added back into the quorum without any issues.

My question: is it normal behavior that, if a node loses quorum, the Cluster service keeps restarting until you restart the server? Or is there a way to bring that server back into the quorum without restarting it?
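For readers following the quorum arithmetic, here is a minimal sketch of the plain majority rule applied to the 8 nodes and 1 witness described above. It assumes every node and the witness carry exactly one vote and ignores dynamic quorum and any vote-weight adjustments the cluster may apply; the function names are hypothetical and are not part of any cluster API.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """True if the votes this partition can reach form a majority."""
    return reachable_votes >= votes_needed(total_votes)


# Assumption: one vote per node plus one for the witness (8 + 1 = 9).
TOTAL_VOTES = 8 + 1

print(votes_needed(TOTAL_VOTES))   # majority needed out of 9 votes
print(has_quorum(1, TOTAL_VOTES))  # a node cut off from all others
print(has_quorum(8, TOTAL_VOTES))  # the rest of the cluster plus witness
```

Under this simplified model, a single node isolated by a network fault can never hold a majority on its own, while the remaining nodes keep quorum without it.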

clussvc.exe version 6.2.9200.21268

Error

The Cluster Service service terminated unexpectedly.  It has done this 15 time(s).  The following corrective action will be taken in 60000 milliseconds: Restart the service.

Thanks,



Raman


can not fix corrupt system files


I am running a cluster node health check, but I cannot fix the corrupt files.

PS C:\Users\Administrator.000> Dism /Online /Cleanup-Image /RestoreHealth

Deployment Image Servicing and Management tool
Version: 6.3.9600.17031

Image Version: 6.3.9600.17031

[==========================100.0%==========================]

Error: 0x800f0906

The source files could not be downloaded.
Use the "source" option to specify the location of the files that are required to restore the feature. For more information on specifying a source location, see http://go.microsoft.com/fwlink/?LinkId=243077.

The DISM log file can be found at C:\Windows\Logs\DISM\dism.log

PS C:\Users\Administrator.000> sfc /scannow

Beginning system scan.  This process will take some time.

Beginning verification phase of system scan.
Verification 100% complete.

Windows Resource Protection found corrupt files but was unable to fix some
of them. Details are included in the CBS.Log windir\Logs\CBS\CBS.log. For
example C:\Windows\Logs\CBS\CBS.log. Note that logging is currently not
supported in offline servicing scenarios.
PS C:\Users\Administrator.000>

Please help

Issues with MSMQ over HTTP in Windows Cluster - Not Working


Background:

We have set up a cluster 'net_cluster' and configured a message queuing service 'net_clusterMsmq' in it. Please refer to the screenshot below for the cluster configuration. We have two physical servers in the cluster. The screenshot shows two virtual IP addresses, each of which points to one of the physical servers. We have created non-transactional private queues on the physical servers.



Please see this image https://social.technet.microsoft.com/Forums/getfile/668427

net_clusterMsmq has two Virtual IPs,  VIP1 and VIP2 which point to .NET1 and .NET2 respectively. Currently .NET1 is up and running. 

VIP1 ----> PIP1 (.NET1) (PIP = Physical IP)

VIP2 ----> PIP2 (.NET2)



Please see this image https://social.technet.microsoft.com/Forums/getfile/668451

H$ - This is the storage drive attached to whichever .NET server is active. It contains msmq\storage and msmq\mapping.



Issue

NOTE: Everything works, and has worked well for a long time, if I use OS:\net_clusterMsmq. The problem starts when I use HTTP.

We are having issues with MSMQ over HTTP in a Windows Cluster environment. When the app sends a message to the queue using HTTP:

  1. Outgoing queues on the IIS (Web App) server show the referenced queue in the "Waiting to Connect" state, with "Connection is ready to transfer messages" in the Connection History column. The queued messages stay stuck forever.
  2. IIS logs on the .NET1 server show:
    [VIP1 here] POST /msmq/private$/queuename - 80 - [IIS - Web app server IP here] - 200 0 0 46
    This clearly shows that the POST request from the Web App IIS server was received by the .NET1 server, with status 200. However, the S-IP field shows the virtual IP (VIP1) that points to the .NET1 server. This is because we send requests via the cluster node.
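To make the reading of that IIS log line explicit, here is a small sketch that splits it into named fields. The field order is an assumption inferred from the line as posted (the date and time columns are omitted there), and the two IP addresses are hypothetical placeholders standing in for VIP1 and the Web App server.

```python
# Assumed field order, inferred from the posted line (not from the server's
# actual #Fields header, which was not included in the post).
FIELDS = [
    "s_ip", "cs_method", "cs_uri_stem", "cs_uri_query", "s_port",
    "cs_username", "c_ip", "cs_user_agent",
    "sc_status", "sc_substatus", "sc_win32_status", "time_taken",
]


def parse_iis_line(line: str) -> dict:
    """Split one space-delimited log line into a field-name -> value dict."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


# Hypothetical addresses: 192.0.2.10 stands in for VIP1,
# 192.0.2.50 for the Web App IIS server.
SAMPLE = "192.0.2.10 POST /msmq/private$/queuename - 80 - 192.0.2.50 - 200 0 0 46"

entry = parse_iis_line(SAMPLE)
print(entry["s_ip"], entry["cs_method"], entry["sc_status"])
```

Note that a 200 in `sc_status` only shows that IIS accepted the POST; by itself it does not show what the MSMQ queue manager did with the message afterwards.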


Below is what I have checked/tried so far with no luck:

  1. Checked if port 1801 is listening - Yes
  2. Modified the sample_map.xml file in H$ (the storage drive attached to the active .NET server) as well as in C:\Windows\System32\msmq\mapping, and restarted the MSMQ service, but it didn't work.
    This was done because I found a blog stating that the message request reaches the MSMQ server, but the local queue manager does not recognize the virtual IP (VIP1) and looks for the physical IP (PIP1) in the received message. Since it does not find it, it discards the message.
  3. Added ANONYMOUS LOGON with full rights to the destination queue on the .NET1 server.


NOTE: MSMQ over HTTP works fine in a non-cluster environment, so this is definitely a cluster-specific issue.





Hyper-V data cannot be stored on a disk witness that is not already used by another virtual machine.


I'm trying to move a VM from local to shared storage to make it highly available. The shared storage (S:) was active on the server I moved the files from. I imported it into Hyper-V and selected it as a VM when configuring it as a HA virtual machine role in Failover Cluster Manager. It failed with the following:

There was a failure configuring the virtual machine role for 'TEST'.
The path 'S:\Hyper-V' for a virtual machine configuration is on the disk witness for the cluster. Hyper-V data cannot be stored on a disk witness that is not already used by another virtual machine.

This seems to suggest that there must always be at least one VM on the shared storage before you can import another? How do you get the first VM on there?

Making NLB highly available


I get how to use NLB to cluster services such as web, ftp, etc. and I've successfully set it up before.  I'm now setting up an ADFS farm and I got to wondering how do I make NLB itself highly available?  If my lone NLB server were to fail, I lose access to all the clustered services.  I've googled around for the best answer, but the results are for how to create clusters the NLB will manage.

I am thinking the best answer is to failover cluster my NLB server?  When I search for failover cluster nlb, I just get results talking about the difference between the two types of clustering.  Can you failover cluster NLB?  If so, how?  If not, what is the proper way to make NLB highly available.

Thanks for your help in advance.

2012 R2 Guest Cluster Network Failure


We have a 2 node guest cluster (2012 R2) using a shared VHDX located on a CSV providing resilient File Server services and it seems to work well....most of the time.

We've had two instances lately where the File Server cluster has failed due to "network" issues, where the nodes cannot see each other and both are removed from the cluster. The problem is that we can't see any other VM reporting networking problems at the same time.

We did some firmware and driver updates on the physical nodes recently to resolve a known problem with VMQs and thought that our problems were solved. Unfortunately, we had a recurrence of the problem this morning, so we seem to be back to the drawing board.

Has anyone else had similar problems with Guest Clusters in 2012 R2?

Cheers for now

Russell

Failover Cluster on Server 2012 r2


Hi

Does anyone know whether Failover Clustering is available on Server 2012 R2 Standard?

Thank you!

CNO, VCO clustering in 2012 R2

I built a cluster and the CNO was created, but the VCO was not, and the CNO was not granted the permissions it needs to create objects. Because I will create a listener for SQL Server HA, my question is: how can I automate granting those permissions when the CNO is created, so that I do not have to go through the manual process of adding Create Object, Read, etc. to the CNO?

Live migration of 'Virtual Machine ADVM-01 ' failed. Event ID : 21502


I have an HA cluster running on Windows 2012 R2 with failover clustering configured. It is running Windows 2008, 2008 R2, and 2012 VMs.

Integration Services are already installed. When I try to live migrate a VM to the other node, it fails.

The event viewer shows the error message below.

" Live migration of 'Virtual Machine ADVM-01' failed.

Virtual machine migration operation for 'ADVM-01' failed at migration source 'NODE01'. (Virtual machine ID D840382C-194B-4B4F-8BF5-19552537D0EF)

'ADVM-01' failed to delete configuration: The request is not supported. (0x80070032). (Virtual machine ID D840382C-194B-4B4F-8BF5-19552537D0EF) "

Please advise.


Regards, COMDINI

Quorum Failover Cluster - Windows Server 2008 r2


Hello,

I have a fileserver cluster environment with two members, NODEA and NODEB. The cluster name is FS01.

We have a quorum disk in the cluster that is active on the owner node.

My question is: when we fail over the cluster resources to NODEB, must the quorum disk migrate to NODEB too? If the quorum disk is not migrated immediately, does that mean we have a problem in the cluster?

Another question: my cluster is failing over resources fairly frequently. Is it possible to detect the cause? Is it possible to increase the cluster threshold?

Thank you

Sharepoint Website responding very slow, using windows server 2012 Network Load Balancing


Hello Team,

Greetings for the day!

I have 2 Windows Server 2012 machines with the Network Load Balancing role enabled, and SharePoint 2010 R2 (SharePoint farm) is installed on both servers. I have enabled Network Load Balancing with a total of 5 IPs assigned between those 2 servers.

The SharePoint site is sometimes very slow (1 min 30 sec) and sometimes responds very quickly (10 sec).

I have verified the performance of both servers, which is more than good.

I am not sure how to troubleshoot this further, and I am also not sure how to check which server a request is going to.

Please help me resolve this and optimize the performance.


Paresh Jain

The cluster Resource Hosting Subsystem (RHS) process was terminated and will be restarted


Server 2012 R2 on beefy PowerEdge R720 dual cluster

CAU worked fine on my cluster a long time ago, when there were only a few simple VMs.

As the setup got a bit more complicated (more VMs with more disks, more backend iSCSI storage, some VMs in a cluster setup), CAU just does not work reliably at all.

So this time I ran Windows Update on the hosts by hand (migrated VMs to Host A, updated Host B, restarted, updated again as some updates failed, restarted fresh again, then Resume/Do not fail back roles, to be on the safe side).

Then I selected some running VMs on Host A and live migrated them back to Host B, at which point Host B (the one just updated and freshly rebooted) threw a fit and killed all the VMs that were selected to be moved to it:

"The cluster Resource Hosting Subsystem (RHS) process was terminated and will be restarted. This is typically associated with cluster health detection and recovery of a resource. Refer to the System event log to determine which resource and resource DLL is causing the issue."

The Cluster service did restart and did bring the machines up, but that is less than user friendly. I had to select them one more time to migrate (and this time they did).

But the whole solution just does not feel like an enterprise product (in fact, it would not be acceptable even for home use).

I do not expect any miracle solution, but if anybody has any experience to chip in, it would be appreciated.

Seb

Processor Information for Lizard


I've just put together a small cluster, with all nodes using Intel E5-2665 CPUs and all running Server 2012 R2. I'm trying to benchmark it using Lizard, but it is asking for information on the CPU.

"To calculate the theoretical efficiency of your HPC cluster, Lizard needs to determine the number of floating-point operations per clock cycle that each core in the processor of the head node of your HPC cluster is capable of performing, and uses that information as a reference for the rest of the compute nodes in your cluster. If Lizard cannot automatically determine this information about the processor, you will be required to provide it."

When I look here (https://msdn.microsoft.com/en-us/library/ee146526(v=ws.10).aspx) on the Microsoft help page, it just says to look at the manufacturer's website. I've done that and can't see the information I need.

Does anyone know where to look for the necessary info please?
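As a rough illustration of where such a figure can come from, the sketch below derives floating-point operations per clock cycle from vector width. It assumes the E5-2665 is a Sandy Bridge-EP part with 256-bit AVX that can issue one vector add and one vector multiply per cycle, with 8 cores at a 2.4 GHz base clock; these values are assumptions to check against Intel's specifications for the exact part, and the function names are hypothetical.

```python
def flops_per_cycle(vector_bits: int, element_bits: int, ops_per_cycle: int) -> int:
    """Lanes per vector register times vector operations issued per cycle."""
    lanes = vector_bits // element_bits
    return lanes * ops_per_cycle


def peak_gflops(cores: int, ghz: float, per_cycle: int) -> float:
    """Theoretical peak for one socket, in GFLOPS."""
    return cores * ghz * per_cycle


# Assumptions (verify against the vendor's data for the exact CPU):
# 256-bit AVX, 64-bit doubles, one add plus one multiply issued per cycle.
DP_FLOPS_PER_CYCLE = flops_per_cycle(256, 64, 2)

print(DP_FLOPS_PER_CYCLE)                                # per core, per cycle
print(round(peak_gflops(8, 2.4, DP_FLOPS_PER_CYCLE), 1))  # per socket
```

With those assumptions the per-core figure is 8 double-precision operations per cycle; a CPU with different vector units or fused multiply-add would give a different number.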

Many thanks.

Long time of creating VM checkpoint on file share storage


Hi,

Why does creating a VM checkpoint take so long (~6 min) on file share storage?

The VM has 10 GB of RAM.

For the same VM on a local disk, creating the checkpoint takes <3 s.

When I reduce the VM's RAM to 1 GB, the checkpoint time is <1 min.

While the checkpoint was being created, I looked at Performance Monitor on the file share storage server and the Hyper-V server and found that disk and network activity was at its usual level.
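A quick back-of-the-envelope check of the numbers above, as a sketch only: it assumes the checkpoint time is dominated by writing the VM's memory state to the share and treats 1 GB as 1024 MB. The function name is hypothetical.

```python
def effective_mb_per_s(ram_gb: float, seconds: float) -> float:
    """Average rate if the whole RAM image is written within the given time."""
    return ram_gb * 1024 / seconds


# 10 GB of RAM saved in roughly 6 minutes on the file share.
share_rate = effective_mb_per_s(10, 6 * 60)
print(round(share_rate, 1))  # about 28.4 MB/s

# 1 GB of RAM saved in under 1 minute gives only a lower bound on the rate.
small_vm_floor = effective_mb_per_s(1, 60)
print(round(small_vm_floor, 1))
```

If that assumption holds, the share is absorbing the memory state at well under the rate a typical 1 Gbit/s link can carry, which would be consistent with the observation that network and disk counters did not look busy.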

Thanks

Node being removed from Windows Server 2012 R2 cluster


Windows Server 2012 R2 multi-site cluster with 5 nodes.  Node 1 (at the main site) fails multiple times a day with the same issue in the cluster log as shown below.  All the other nodes log missed heartbeats.  If it were latency I would expect all nodes to fail at one point or another, but node1 is the only one that drops out.  The servers are all the same model with the same drivers.  The switch ports show no errors.  I see no dropped UDP packets in perfmon.  I have checked everything in the following blog: http://blogs.technet.com/b/askcore/archive/2012/02/08/having-a-problem-with-nodes-being-removed-from-active-failover-cluster-membership.aspx and this one: http://blogs.technet.com/b/askcore/archive/2012/07/09/having-a-problem-with-nodes-being-removed-from-active-failover-cluster-membership.aspx.

I do not want to change the cluster delay and threshold values except as a last resort.  The ping times between sites are down in the 3-5ms range even during the issue, which is well below the recommended range.

Does anyone know what causes the failure that seems to kick off the issue?
"Failed to retrieve the results of overlapped I/O: (10054)"


Node1:
00000284.00002498::2015/06/15-05:58:35.919 DBG   [CHANNEL 169.254.x.x:~3343~]/recv: Failed to retrieve the results of overlapped I/O: (10054)
00000284.00002498::2015/06/15-05:58:35.919 DBG   [CHANNEL 169.254.x.x:~3343~] Closing due to error: (0).
00000284.00002498::2015/06/15-05:58:35.919 DBG   [CHANNEL 169.254.x.x:~3343~] Close().
00000284.00002498::2015/06/15-05:58:35.919 WARN  [CHANNEL 169.254.x.x:~3343~] failure, status (0)

All the other nodes:
00003700.000025a4::2015/06/15-05:58:33.825 DBG   [NETFTEVM] FTI NetFT event handler got event: LocalEndpoint 10.x.x.x:~3343~ has missed two consecutive heartbeats from 10.x.x.x:~3343~
00003700.000025a4::2015/06/15-05:58:33.825 DBG   [NETFTEVM] TM NetFT event handler got event: LocalEndpoint 10.x.x.x:~3343~ has missed two consecutive heartbeats from 10.x.x.x:~3343~
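One way to read the two excerpts above is to line up their timestamps. The sketch below pulls the time field out of one line from each excerpt and computes the gap. It assumes both nodes' clocks are synchronized and that both logs use the same time zone; the helper name `log_time` is hypothetical. For reference, 10054 is the Winsock error WSAECONNRESET (connection reset by peer).

```python
import re
from datetime import datetime

# Matches the "::YYYY/MM/DD-HH:MM:SS.mmm " stamp after the process.thread IDs.
STAMP = re.compile(r"::(\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s")


def log_time(line: str) -> datetime:
    """Extract the timestamp from one cluster log line."""
    match = STAMP.search(line)
    if match is None:
        raise ValueError("no timestamp found in line")
    return datetime.strptime(match.group(1), "%Y/%m/%d-%H:%M:%S.%f")


NODE1_LINE = ("00000284.00002498::2015/06/15-05:58:35.919 DBG   "
              "[CHANNEL 169.254.x.x:~3343~]/recv: Failed to retrieve the "
              "results of overlapped I/O: (10054)")
OTHER_LINE = ("00003700.000025a4::2015/06/15-05:58:33.825 DBG   "
              "[NETFTEVM] FTI NetFT event handler got event: LocalEndpoint "
              "10.x.x.x:~3343~ has missed two consecutive heartbeats from "
              "10.x.x.x:~3343~")

gap = (log_time(NODE1_LINE) - log_time(OTHER_LINE)).total_seconds()
print(gap)  # seconds between the missed-heartbeat event and the 10054 error
```

Under those assumptions, the other nodes logged the missed heartbeats about 2.1 seconds before Node1 logged the 10054 reset, so the reset may be a consequence of the heartbeat loss rather than the event that starts it.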


Hi, Is it Possible to Cluster Forest Trust Servers?


Hi, I am having a problem trying to cluster 2 servers from different domains. I have both of them in SCVMM as well, but I was wondering whether it is possible to cluster them too.

Hope to hear good news from anyone.

Thanks a lot.

CSV - System Volume Information Problem


Hi,

Not sure if this is the correct place to ask my question...

We are having an issue with the System Volume Information on one of our Cluster Shared Volumes growing to a large size.
It currently sits at 598GB. 

We have two hosts and three volumes under C:\ClusterStorage\: Volume1, Volume2 and Volume3.

The issue only appears to be happening on Volume2, in the System Volume Information folder, and I am not sure what's causing it.

I have checked Shadow Copies on both hosts and all are disabled.

I am hoping someone could point me in the direction to figure out what is causing the consumption of lots of space.


We run a daily windows backup on each host to separate USB drives also.

Thanks,
Adam



Network Configuration for SOFS & Hyper-V Cluster


Hello,

Nice opportunity for a good laugh here..potentially.

Looking to deploy an SOFS and HV cluster over the coming summer. Going through the network configuration and have come up with the following (link to image below). Each cluster is comprised of 2 nodes, so nothing too heavy on the network side of things. 

SOFS cluster will utilise SMB 3.0 app shares, on a JBOD.

Would welcome suggestions/bad mistakes/judgement calls..

Network Configuration

Live Migration of Server 2012 R2 Remote Desktop Server Disconnects Users


Hi. I have a 2 node Server 2012 R2 failover cluster and amongst the services running on this cluster is a Server 2012 R2 remote desktop server (this has the session host, web access, licensing and connection broker roles). When I move the 2012 R2 remote desktop server to the other cluster node using live migration, users are disconnected from RDS and get an error to say "failed to reconnect session". Users can immediately, manually reinitiate their connection to the server.

I have other 2012 R2 servers running on this cluster, which don't have the remote desktop services roles installed, and these servers do not exhibit the same behaviour of disconnecting users after a live migration.

Have found only a few references to similar issues on a couple of other TechNet discussions, but no resolution etc...

I'm guessing this is not expected behaviour and that there is some kind of configuration issue somewhere?

no disks suitable for cluster disks were found


I have a 2 node cluster and I added a 50GB volume to both nodes, but in Failover Cluster Manager, adding disks does not show the disk.

I can see the disk in Disk Management.
