Channel: High Availability (Clustering) forum
Viewing all 4519 articles

Can we "suspend" a Hyper-V cluster while keeping all VMs running?

We have multiple large Hyper-V clusters with hundreds of VMs, and the network team needs to perform maintenance that will take ~10 minutes with unpredictable network connectivity. The only option I can find is to shut down or save all VMs. The cluster is already tuned to ride out the maximum allowed network failure of 45 seconds (SameSubnetThreshold=30 & SameSubnetDelay=1500). Is there a way to "suspend" the entire cluster so that the VMs keep running (all VHDX files are thick, so no VHDX expansion is needed)? The VMs will also lose network connectivity, but they would handle that the way a standalone server handles losing its network: it is up to the application.

In other words, could we temporarily configure it to take no action and ignore all failures? VMware appears to have a way to pause the entire cluster functionality (disable Host Monitoring) without affecting running VMs.
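The 45-second figure is the product of the two settings: under a simplified model, the cluster tolerates roughly SameSubnetThreshold missed heartbeats sent SameSubnetDelay milliseconds apart. A minimal Python sketch of that arithmetic, extended to show what the same model would require to ride out a ~10-minute outage (it ignores any other timeouts; the function names are illustrative, not a cluster API):

```python
import math


def tolerance_seconds(threshold: int, delay_ms: int) -> float:
    """Approximate heartbeat silence tolerated before a node is treated as down."""
    return threshold * delay_ms / 1000


def threshold_needed(outage_seconds: float, delay_ms: int) -> int:
    """Missed-heartbeat threshold this simple model would need to ride out an outage."""
    return math.ceil(outage_seconds * 1000 / delay_ms)


print(tolerance_seconds(30, 1500))       # 45.0  (the current settings)
print(threshold_needed(10 * 60, 1500))   # 400   (a ~10-minute outage)
```

Under this model, covering 10 minutes at a 1500 ms delay would need a threshold of about 400 missed heartbeats, far beyond the value of 30 used here.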


Problem with deleting custom resource (Other server) type in Windows 2012 R2

Hi

I created a custom resource type DLL based on the SDK sample (ClipbookServer.dll).

To register the resource type on a 2-node cluster, I used Add-CustomResourceType in PowerShell. After running Add-CustomResourceType, ClipbookServer.dll appears in the C:\Windows\Cluster directory on all cluster nodes.

However, after running Remove-CustomResourceType, ClipbookServer.dll is not deleted from C:\Windows\Cluster.

How can I have the file deleted automatically from C:\Windows\Cluster on all cluster nodes?

Windows 2012 R2 DHCP cluster changing mode

In a Windows 2012 R2 DHCP cluster in hot-standby mode, can you switch the roles between the servers? I want the current standby server to become the primary, and the current primary to become the standby, without losing the current zones.

Windows load balancing

Hi guys,

Does Windows Server work as a load balancer, or only for high availability?

I tried NLB with two web servers, but only one of them (the one with the top priority) answers me. So is there no load balancing?

Thanks

Live migrations fail during drain from Cluster-Aware Updating

We're trying to implement Cluster-Aware Updating, but we keep running into issues where virtual machine live migrations fail.

Our clusters have plenty of memory, frequently allowing 2 nodes of a 6-node cluster to be completely free of roles. We kick off CAU; it patches and reboots the nodes that have no roles and then moves on to the others. While draining one of the remaining nodes, it starts live migrations (there are no low-priority roles). Since our maximum number of simultaneous live migrations is 2, we continually get 21501 warnings as it works through the list. Towards the end, and only occasionally, the last few migrations fail with a 21502 because there is not enough memory. This then hangs the drain until someone intervenes manually.

21501
Live migration of 'SCVMM BRMWD-SPDEV02' failed.
Virtual machine migration operation for 'BRMWD-SPDEV02' failed at migration destination 'BRMWD-HYPV02'. (Virtual machine ID 2A4EC899-079C-4355-A503-F097FAF33E2B)
Failed to perform migration on virtual machine 'BRMWD-SPDEV02' because virtual machine migration limit '2' was reached, please wait for completion of an ongoing migration operation. (Virtual machine ID 2A4EC899-079C-4355-A503-F097FAF33E2B)

21502
Live migration of 'Virtual Machine BRMWT-FE01' failed.
Virtual machine migration operation for 'BRMWT-FE01' failed at migration destination 'BRMWD-HYPV02'. (Virtual machine ID 385026E5-7B2F-46EA-ADFE-EF854F76A4FE)
'BRMWT-FE01' could not initialize. (Virtual machine ID 385026E5-7B2F-46EA-ADFE-EF854F76A4FE)
Not enough memory in the system to start the virtual machine BRMWT-FE01 with ram size 2048 megabytes. (Virtual machine ID 385026E5-7B2F-46EA-ADFE-EF854F76A4FE)

I know we could probably work around this by increasing the number of simultaneous live migrations, or even by assigning preferred owners to all VMs to keep the cluster more balanced. I have no evidence for this, but it seems that when a CAU drain is initiated, it picks a single static host to move all VMs to, rather than choosing the best possible node for each migration.

Can someone confirm whether this is accurate, or whether there is any way to change it?
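The suspected difference between a single static destination and a per-migration choice can be illustrated with a toy model. This is a hypothetical Python sketch, not a description of how CAU or the cluster service actually selects destinations: the node names, free-memory figures and VM sizes are invented, and the migration limit of 2 is not modelled.

```python
def drain_static(free_mb: dict[str, int], vms: list[int]) -> int:
    """Send every VM to the node that had the most free memory when the drain started.

    Returns the number of VMs that could not be placed.
    """
    free = dict(free_mb)  # copy so the caller's data is not modified
    target = max(free, key=free.get)
    failures = 0
    for ram in vms:
        if free[target] >= ram:
            free[target] -= ram
        else:
            failures += 1  # analogous to event 21502, "not enough memory"
    return failures


def drain_best_fit(free_mb: dict[str, int], vms: list[int]) -> int:
    """Re-evaluate the destination for every VM, picking the node with the most free memory."""
    free = dict(free_mb)
    failures = 0
    for ram in vms:
        target = max(free, key=free.get)
        if free[target] >= ram:
            free[target] -= ram
        else:
            failures += 1
    return failures


if __name__ == "__main__":
    # Hypothetical free memory (MB) on the remaining nodes and VM sizes (MB).
    nodes = {"node-a": 16384, "node-b": 12288, "node-c": 8192}
    vm_sizes = [4096] * 6 + [2048] * 4
    print("static target failures:", drain_static(nodes, vm_sizes))    # 6
    print("per-migration failures:", drain_best_fit(nodes, vm_sizes))  # 0
```

With these invented numbers, the static strategy fails 6 placements while re-evaluating the destination for each VM places all 10, even though the total free memory is the same in both cases.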


2012R2 SOFS with high Disk Response Times and Hyper-V VM performance

We are running a 2012 R2 cluster with SOFS serving Hyper-V, on Dell hardware. In Event Viewer under Applications and Services Logs > Microsoft > Windows > SMBServer, we're seeing hundreds of repeated SMBServer warning events with Event ID 1020:

File system operation has taken longer than expected.

Client Name: \\[fe80::d199:8860:d21:d7d7]
Client Address: [fe80::d199:8860:d21:d7d7%26]:49353
User Name: XXXXX\CLIUSR
Session ID: 0x80C0400000061
Share Name: \\*\b03a302b-1fdc-4c75-8c79-25d058749253-135266304$
File Name: SHARES\SW01DATAVOL2\ts15075a59.NNN.XXXXXX.com\Virtual Machines\C92B52CC-5739-4747-B6AE-CF4725B0505E\C92B52CC-5739-4747-B6AE-CF4725B0505E.vsv
Command: 11
Duration (in milliseconds): 208159633
Warning Threshold (in milliseconds): 120000

Guidance:

The underlying file system has taken too long to respond to an operation. This typically indicates a problem with the storage and not SMB.
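To put the event's numbers in perspective, here is a small Python sketch that converts them, assuming (as the field labels state) that both values are in milliseconds. The helper name is illustrative.

```python
def describe_ms(ms: int) -> str:
    """Format a millisecond duration as hours, minutes and whole seconds."""
    total_seconds = ms // 1000
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours} h {minutes} min {seconds} s"


duration_ms = 208_159_633  # "Duration (in milliseconds)" from the event
threshold_ms = 120_000     # "Warning Threshold (in milliseconds)" from the event

print(describe_ms(duration_ms))           # 57 h 49 min 19 s
print(describe_ms(threshold_ms))          # 0 h 2 min 0 s
print(round(duration_ms / threshold_ms))  # 1735
```

In other words, this single operation on the .vsv file was reported as taking roughly 58 hours, about 1,735 times the 2-minute warning threshold.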

The Disk Response Time (ms) values shown in Resource Monitor (opened from Task Manager) are very high, currently in the 300-1000 ms range. This is occurring on standalone 2012 R2 Storage Spaces servers as well as on the clustered SOFS. VM performance is very poor, when we can log on at all; most of the time the servers become inaccessible and disconnect the current user. We previously saw this with 2012 R2 clustered Storage Spaces in 2014. Is anyone else aware of this issue?

We had an MS ticket open on it back in 2014, but dropped the case when it became too time-consuming, and returned the hardware. We could never get past the MS Tier 1 and Tier 2 engineers to get the ticket escalated. If I remember correctly, the issue had to do with disk cache flushing. My understanding is that MS created a patch to resolve the issue for another company, but since our ticket wasn't escalated, MS was unaware of our issue until later.

Thanks

Update: back in 2014, when we went to an MS meeting, the term used was "excess disk cache flushes".

This blog post describes the Cluster Shared Volume performance counters: https://blogs.msdn.microsoft.com/clustering/2014/06/05/cluster-shared-volume-performance-counters/

The values of the Cluster CSV File System Flushes perf counter for the 4 volumes are:

401,161; 272,914; 115,836; 778,944

These seem to be very high, but I have no baseline to judge them against.
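Raw counter values are hard to judge without knowing the period they cover. A minimal Python sketch of two things one could derive: each volume's share of the flushes quoted above, and, assuming the counter is a cumulative count (an assumption; check the counter's type in Performance Monitor), an average per-second rate from two samples taken some interval apart. The volume labels and function names are illustrative.

```python
def shares(counts: dict[str, int]) -> dict[str, float]:
    """Fraction of the total flush count accounted for by each volume."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}


def rate_per_second(first: int, second: int, interval_s: float) -> float:
    """Average rate between two samples of a cumulative counter (ASSUMED cumulative)."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return (second - first) / interval_s


# Values quoted above; the volume labels are placeholders.
flushes = {"vol1": 401_161, "vol2": 272_914, "vol3": 115_836, "vol4": 778_944}

for name, share in shares(flushes).items():
    print(f"{name}: {share:.1%}")
```

By this measure the fourth volume accounts for roughly half of all flushes. Whether the absolute numbers are excessive still depends on the time window they cover, which is what the rate helper is for.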


Dave Kreitel


Heartbeat Configuration

For a Windows 2012 cluster, is it necessary to have a private network configuration, i.e., a separate NIC and a separate VLAN? I see conflicting articles. What is Microsoft's stance now?

Windows 2012 R2 Cluster Issue

Dear All,

We are facing an issue on a Windows 2012 R2 cluster. We have configured Windows guest clustering on VMware 5.5 and are getting the error below on one of the cluster disks:

"Cluster resource 'Cluster Disk 4' of type 'Physical Disk' in clustered role 'KMPRODDCTMCSSRV' failed. The error code was '0xaa' ('The requested resource is in use.').

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet."

Regards,

Hakim. B


Hakim.B Sr.System Administrator


Setting up network cards for Failover Clustering

I'm hoping I can get some advice on setting up physical network cards in a failover cluster environment. We have 2 nodes that use SAS to connect to the storage controller. Each node has a 4-port 10G network adapter. My plan is to set up a 2-node failover cluster, teaming NICs 1-2 and creating a converged network to be used for Live Migration, Host Management, and CSV. NICs 3-4 will be teamed for the virtual networks used by the guest VMs.

The alternative is to team all 4 NICs and create converged networks for all 5 networks.

Please advise

Move cluster to another location

Hi

I have built a Windows 2012 R2 cluster and it is all working fine. However, both nodes in the cluster have to be physically moved to another location.

I have never done anything like this.

The new location is just a few meters away. Would the proper approach be to shut down one node first, move it to the new location and bring it back up, and then do the same with the other node?

At the moment the servers don't have any roles or even disks attached, so it is basically a clean cluster with nothing on it. Is it OK to just shut them down and move them? As I said, there is not even a shared disk or a quorum witness; it is just a bare cluster.


Dalibor Bosic

Not able to see cluster resources in the failover cluster manager

We are using Windows Server 2008 R2 and have configured SQL Server cluster services.

Randomly, every 10-15 days, the cluster service hangs, and loading the resource list in Failover Cluster Manager takes a very long time (almost 10-15 minutes). While this is happening, if we try to add a new database in SQL Server, it is not created and the query runs on and on.

From the SQL Server logs we found the wait type PREEMPTIVE_CLUSAPI_CLUSTERRESOURCECONTROL, which suggests the cluster service is hung somewhere.

I'm not sure how to troubleshoot this issue. I have checked the event log for the cluster service but found no clues.

When we restart the server, the cluster service returns to normal, but after 10-15 days it starts behaving the same way again.

How can we find out where and why the cluster service is hanging?

Cluster shared volume disappear... STATUS_MEDIA_WRITE_PROTECTED(c00000a2)

Hi all, I am having an issue that I hope someone can help me with. I recently inherited a 2-node cluster; each node is one half of an ASUS RS702D-E6/PS8, so both nodes should be nearly identical. They are both running Hyper-V Server 2008 R2, hosting some 14 VMs.

Each node is connected over Cat5e to a Promise VessRAID 1830i via iSCSI, using one of each server's onboard NICs. That cluster network is set to Disabled for cluster use (the way I think it is supposed to be, not the way I originally inherited it) and is on its own Class A subnet and its own private physical switch.

The SAN hosts a 30GB CSV witness disk and two 2TB CSV volumes, one for each node, labeled Volume1 and Volume2, with some VHDs on each.

The cluster clients connect to the rest of the company through the virtual ExternalNIC adapters created in Hyper-V Manager; physically, these run over Intel ET Dual Gigabit adapters wired into our main core switch, which is set up with Class C subnets.

I also have a crossover cable connecting the other ports on the Intel ET Dual Port NICs, using a third subnet (Class B), which is configured in Failover Cluster Manager as internal, so there are 3 IPv4 cluster networks in total.

Even though the cluster passes the validation tests with flying colors, I am not convinced all is well. With Hyperv1 (node 1), I can move the CSVs and VMs over to Hyperv2 (node 2), stop the cluster service on node 1, and perform maintenance such as a reboot or installing patches. When it reboots, or when I restart the cluster service to bring it back online, it is well behaved and leaves Hyperv2 the owner of all 3 CSVs (Witness, Volume1 and Volume2). I can then pass them back or split them up any way I like, and at no point is the cluster service interrupted or noticed by users. Yes, I know this is how it is SUPPOSED to work, but...

...if I try the same thing with node 2 (that is, make node 1 the owner of the witness and volumes, migrate all VMs over, stop the cluster service on node 2, do whatever I have to do, and reboot), then as soon as node 2 comes back online it tries to snatch Volume2 back, never succeeds, and the following errors are logged in the cluster event log:

Hyperv1
Event ID: 5120
Source: Microsoft-Windows-FailoverClustering
Task Category: Cluster Shared Volume
Message: Cluster Shared Volume 'Volume2' ('HyperV1 Disk') is no longer available on this node because of 'STATUS_MEDIA_WRITE_PROTECTED(c00000a2)'. All I/O will temporarily be queued until a path to the volume is reestablished.

Followed 4 seconds later by:

Hyperv1
Event ID: 1069
Source: Microsoft-Windows-FailoverClustering
Task Category: Resource Control Manager
Message: Cluster Resource 'Hyperv1 Disk' in clustered service or application '75d88aa3-8ecf-47c7-98e7-6099e56a097d' failed.

- AND -

2 of the following:

Hyperv1
Event ID: 1038
Source: Microsoft-Windows-FailoverClustering
Task Category: Physical Disk Resource
Message: Ownership of cluster disk 'HyperV1 Disk' has been unexpectedly lost by this node. Run the Validate a Configuration wizard to check your storage configuration.

This is followed 1 second later by another 1069, and then by various messages about VMs failing.

If you browse to \\hyperv-1\c$\clusterstorage\ or \\hyperv-2\c$\Clusterstorage\, Volume2 is indeed missing!

This has caused me to panic a few times; the first time I saw it, I thought everything was lost. However, I can get it back by stopping the service on node 1 (or shutting node 1 down), restarting node 2 or the cluster service on node 2, and waiting forever for the disk to be listed as failed; shortly thereafter it comes back online. I can then boot node 1 back up and let it start servicing the cluster again. Node 1 doesn't pull the same craziness node 2 does when it comes online: it leaves all ownership with node 2 unless I tell it to move.

I am very new to clusters. All I know at this point is that this is pretty cool stuff, and my attitude so far has been "if it is running, don't mess with it". But there is a significant amount of money tied up in this hardware, and we should be able to leverage it as needed rather than wonder whether it is going to act up again.

It seems to me that a 'failover' cluster should be far more robust than this...

I can go into much more detail if needed, but I didn't find any other posts on this specific issue, no matter which forum I scoured. I'm obviously looking for advice on how to resolve this, as well as on whether I wired the cluster networks correctly. I am also no longer sure which protocols are bound to which NICs, or what the binding order should be. Could this be what is causing my issue?

I have NVSPBIND and NVSPSCRUB on both boxes if needed.

Thanks!

-LW

Live migration of 'Virtual Machine ADVM-01' failed. Event ID: 21502

I have an HA failover cluster running on Windows 2012 R2. It runs Windows 2008, 2008 R2 and 2012 VMs.

Integration Services are already installed. When I try to live-migrate a VM to the other node, the migration fails.

Event Viewer shows the error message below:

" Live migration of 'Virtual Machine ADVM-01' failed.

Virtual machine migration operation for 'ADVM-01' failed at migration source 'NODE01'. (Virtual machine ID D840382C-194B-4B4F-8BF5-19552537D0EF)

'ADVM-01' failed to delete configuration: The request is not supported. (0x80070032). (Virtual machine ID D840382C-194B-4B4F-8BF5-19552537D0EF) "

Please advise.


Regards, COMDINI

SMB Access denied for Cluster Role Resource

Dear All,

I have a Windows 2008 R2 file server failover cluster in production. As part of a DR failover test, I have created another standalone Windows 2008 R2 server with the File Server role enabled.

Currently the file server disk is replicated to DR with a third-party product. During failover, we take the production cluster role offline and attach the production disk to the standalone DR machine.

Once the disk is attached to the DR host, we change the IP address in the cluster role's DNS "A" record to point to the DR server.

When users try to access their home folder or a shared folder, they get an "access denied" error. We tried both the \\DNS name and the FQDN; both give the access denied error.

When I log on to any workstation or server as local administrator and try to access the same SMB share using the \\DNS name and the FQDN, it works fine.

Any ideas?

Failover Cluster Manager in Windows 10

I am trying to figure out whether there is a way to use Failover Cluster Manager on Windows 10 to connect to a Windows Server 2008 R2 failover cluster.

I know the versions aren't compatible, with Windows 10 being version 10.0 and 2008 R2 being version 6.1. Has anyone tried this, or is anyone having the same issue?

Thanks.




Can't add new node to existing failover cluster

Hi,

I have a problem adding a new node to an existing failover cluster. The existing cluster is a two-node cluster with Node and File Share Majority. I'm using this cluster for a SQL Server AlwaysOn availability group. There are no shared volumes.

When I use the Failover Cluster Manager console, I get this error:

The server 'N3.local' could not be added to the cluster.
An error occurred while adding node 'N3.local' to cluster 'Cluster1'.

The parameter is incorrect

Also i have this error in Application and services log/Microsoft/FailoverClustering-Manager/Diagnostic:

Exception occurred in background operation - System.ApplicationException: An error occurred while adding nodes to the cluster 'Cluster1'. ---> System.ApplicationException: An error occurred while adding node 'N3.local' to cluster 'Cluster1'. ---> System.ComponentModel.Win32Exception: The parameter is incorrect
   --- End of inner exception stack trace ---
   at MS.Internal.ServerClusters.ClusApiExceptionFactory.CreateAndThrow(Cluster cluster, Int32 sc, String format, Object arg0, Object arg1)
   at MS.Internal.ServerClusters.Cluster.AddNode(String nodeName, ClusterActionCallback callback)
   at MS.Internal.ServerClusters.Configuration.AddNodeManagement.AddNodes(ActionArgs actionArgs, ActionUpdateHelper updateHelper)
   --- End of inner exception stack trace ---
   at MS.Internal.ServerClusters.Configuration.AddNodeManagement.AddNodes(ActionArgs actionArgs, ActionUpdateHelper updateHelper)
   at MS.Internal.ServerClusters.Configuration.AddNodeManagement.PerformAddNodes(ActionArgs actionArgs)
   at MS.Internal.ServerClusters.Configuration.ConfigurationBase.PerformActionWrapper(BackgroundOperationStatus backgroundOperationStatus, BackgroundOperationArgs parameter)
   at MS.Internal.ServerClusters.BackgroundOperation`2.BackgroundOperationProc(Object state)


I have tried to add the node from PowerShell and get the same error (The parameter is incorrect). I have also tried removing the Failover Clustering feature and adding it again, but I still get the same error.

Please advise,

Thank you

Node in cluster - status changes to "paused"

We have seven Windows 2012 R2 nodes in a Hyper-V cluster. They are all identical hardware (HP BladeSystem). For a while, we had only six nodes, and there were no problems.

Recently, we added the seventh node, and the status keeps reverting to "paused". I can't find any errors that directly point to why this is happening - either in the System or Application log of the server, in the various FailoverClustering logs, or in the Cluster Event logs. I created a cluster.log using the get-clusterlog command, but if it explains why this is happening, I can't figure it out (it's also a very large file - 150 MB, so it's difficult to determine what lines are the important ones).

As far as I can tell, everything on this new node is the same as the previous ones - the software versions, network settings, etc. The Cluster Validation report also doesn't give me anything helpful.

Any ideas on how to go about investigating this? Even before I can solve the problem, I'd like to at least know when and why the status reverts to paused.
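Since the cluster.log is around 150 MB, one way to narrow it down is to stream it and print only the lines that mention a pause. This is a minimal Python sketch: the keyword pattern (pause, drain, suspend) is an assumption, since the exact wording the cluster service logs for a node being paused isn't given here, and the file is read as UTF-8 with undecodable bytes replaced, which may need adjusting.

```python
import re
import sys
from typing import Iterable, Iterator

# ASSUMPTION: the exact text the cluster service writes when a node is paused
# is not known here, so this pattern is deliberately broad.
DEFAULT_PATTERN = re.compile(r"pause|drain|suspend", re.IGNORECASE)


def matching_lines(
    lines: Iterable[str], pattern: re.Pattern[str] = DEFAULT_PATTERN
) -> Iterator[tuple[int, str]]:
    """Yield (line_number, line) for each line that matches the pattern.

    Accepts any iterable of lines, so an open file is streamed line by line
    instead of being loaded into memory all at once.
    """
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            yield number, line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_cluster_log.py cluster.log
    # ASSUMPTION: the log decodes as UTF-8; undecodable bytes are replaced.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for number, line in matching_lines(log):
            print(f"{number}: {line}")
```

The script name is arbitrary; the printed line numbers make it easy to jump to the surrounding context in the full log and see which events precede the status change.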

Thanks,

David

active-active to active-passive

Hi,
How can I change an active-active cluster role to active-passive?

In other words, I want to change an active server to a passive server.

How to remove a Client Access Point from a cluster?

Hi experts,

On a Windows 2012 cluster, I added an unneeded Client Access Point to my role. How do I remove it?

Regards

Garey
