Channel: High Availability (Clustering) forum
Viewing all 4519 articles

Application hang / no access to C:\ClusterStorage Windows Server 2012 R2


Hello tech-guys

We have a "just another hang" problem.

I am building a 4-node Hyper-V failover cluster. At the moment we have two nodes.

Node 1 & Node 2 Windows Server 2012 R2 Datacenter Eng - Networks:

1. Live-Mig, 2. Management, 3. Cluster, 4. CSV, 5. SAN1, 6. SAN2 (SAN1 + SAN2 = MPIO)

Our SAN server runs OPEN-E DSS 7.

So far I have set up 6 CSV volumes (CSV01, CSV07 to CSV11). Everything works fine: cluster validation shows no errors and migration works.

Now I want to add another CSV volume (CSV02). I connected it over MPIO iSCSI and tested it as a "local" volume F:\. That works fine, with copies running at 850 MB/s. But then I tried to add this volume to the failover cluster and to CSV. As soon as I added the volume to CSV, it was impossible to access C:\ClusterStorage with Explorer, cmd or PowerShell. All Hyper-V roles (VMs) kept working and no resource went offline. No events were logged (except an explorer.exe hang). Nothing! :-( When I disabled CSV02, C:\ClusterStorage was accessible again.

All my iSCSI targets are on the same Open-E cluster. Only this one iSCSI volume causes the problem!

I don't think the iSCSI connection is the problem. In the local test it works fine, and the 6 other CSVs with the same configuration work fine.

Thank you for any help

best

steffen


Clustered role 'Cluster Group' has exceeded its failover threshold.


Hello.

I’m hoping to get some help with a cluster issue I’m having using Windows Storage Server 2012.

When the cluster is created my Cluster Core Resources are all happy and online.

I can move the Cluster Name using “Move Core Cluster Resources” between the two nodes without any problems.

If I select ‘Simulate Failure’ on the IP Address resource, it works the first time.

If I do it again shortly after, it fails and I get Event IDs 1254, 1205 and 1069.

Event ID 1254

Clustered role 'Cluster Group' has exceeded its failover threshold. It has exhausted the configured number of failover attempts within the failover period of time allotted to it and will be left in a failed state. No additional attempts will be made to bring the role online or fail it over to another node in the cluster. Please check the events associated with the failure.  After the issues causing the failure are resolved the role can be brought online manually or the cluster may attempt to bring it online again after the restart delay period.

Event ID 1205

The Cluster service failed to bring clustered service or application 'Cluster Group' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.

Event ID 1069

Cluster resource 'Cluster IP Address' of type 'IP Address' in clustered role 'Cluster Group' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Basically I’m trying to simulate a network failure to make sure the failover kicks in.

If I click on it and ‘Bring Online’ it comes up fine.

Where do I find this threshold policy, and how do I set it to initiate failover if the IP Address resource fails?

Thank you in advance for your help.
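
For reference, the policy that Event ID 1254 talks about is stored as common properties on the clustered role and on each resource. Below is a minimal sketch for reading them, assuming the FailoverClusters PowerShell module is available and that the role and resource carry the names shown in the events above. The function name `Show-FailoverPolicy` is made up for illustration and is not a built-in cmdlet.

```powershell
# Hypothetical helper (not a built-in cmdlet): prints the failover policy
# of a clustered role and the restart policy of one of its resources.
function Show-FailoverPolicy {
    param(
        [string]$GroupName    = 'Cluster Group',       # name taken from the events above
        [string]$ResourceName = 'Cluster IP Address'   # name taken from the events above
    )
    Import-Module FailoverClusters

    # Role level: at most FailoverThreshold failovers within FailoverPeriod (hours).
    Get-ClusterGroup -Name $GroupName |
        Format-List Name, State, FailoverThreshold, FailoverPeriod

    # Resource level: what the cluster does when the resource itself fails.
    Get-ClusterResource -Name $ResourceName |
        Format-List Name, State, RestartAction, RestartThreshold, RestartPeriod, RestartDelay
}

# Usage on a cluster node:
# Show-FailoverPolicy
```

The same properties can be assigned on the objects returned by `Get-ClusterGroup` and `Get-ClusterResource`, so the sketch is only the read-only half of the picture.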

Cluster Failover


Hello,

I have a cluster with two nodes. Is there a way to know exactly when the last failover happened, like an event or something like that?

Thanks & Best Regards,

Fabio S Rodrigues.
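
One place to look is the System event log, where the Cluster service records role and resource failures. Below is a minimal sketch, assuming the events are written by the `Microsoft-Windows-FailoverClustering` provider and that IDs 1069, 1205 and 1254 are the ones of interest. That ID set is an assumption and covers failures rather than planned moves, and the function name is hypothetical.

```powershell
# Hypothetical helper: newest cluster failure events recorded on one node.
function Get-RecentClusterFailureEvent {
    param(
        [string]$ComputerName = $env:COMPUTERNAME,
        [int]$MaxEvents = 20
    )
    $filter = @{
        LogName      = 'System'
        ProviderName = 'Microsoft-Windows-FailoverClustering'
        Id           = 1069, 1205, 1254   # assumed set of interesting event IDs
    }
    # Get-WinEvent returns the newest events first by default.
    Get-WinEvent -ComputerName $ComputerName -FilterHashtable $filter -MaxEvents $MaxEvents |
        Select-Object TimeCreated, Id, MachineName, Message
}

# Usage, once per node:
# Get-RecentClusterFailureEvent -ComputerName NODE1
```

Each node keeps its own System log, so the query would have to be run against both nodes to get the full picture.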

Windows Server 2012 R2 with NLB/OCSP - Core/GUI

I am trying to understand what I have done wrong.

I have two Windows Server 2012 R2 servers with an OCSP cluster and an NLB cluster. I removed the server GUI after configuration. Now I wanted to look up some configuration for my NLB, so I installed the GUI again with

Install-WindowsFeature Server-Gui-Mgmt-Infra,Server-Gui-Shell -Restart -Source c:\mountdir\windows\winsxs

Now I can't launch my OCSP or NLB GUI. I don't see a shortcut in Administrative Tools and I can't launch nlb.exe, for example, but pkiview shows me that OCSP is still functioning.

Looking at Server Manager, the role for OCSP and the feature for NLB are still installed. Do I have to do some additional steps to get the GUI back for my applications?

CSV Online But Missing in Disk Management


Bit of a weird one to me...

I have a CSV on a Server 2012 (non-R2) Hyper-V cluster node that is online and accessible at c:\ClusterStorage\CSVolume1 but it does not show up in Disk Management.

Get-ClusterSharedVolume | select -ExpandProperty SharedVolumeInfo shows no maintenance mode or redirected access.

How can this be?

Thanks,

Dan

Manage MSMQ is missing from Failover Cluster Manager when configured using powershell


Hi,

I am hoping someone will be able to help me, as I have looked on the internet for an answer to this. We deploy a number of servers that are configured using PowerShell. I am in the process of creating a WIN2K8R2 failover cluster with MSMQ. I am able to do this successfully through Failover Cluster Manager with no issues. In addition, I can do this via PowerShell (code listed below) with one caveat.

However, when I create the exact same MSMQ role in PowerShell, I am unable to manage it, because the "Manage MSMQ" option is missing when I right-click the MSMQ service. The settings are the same, including dependencies. The only difference I have been able to find is the icon: Failover Cluster Manager shows the service with a Generic Service icon when it is created in PowerShell, but with the MSMQ icon when it is created in the GUI. I was able to verify this in the registry in HKLM\Cluster\Groups\<GUID>\: GroupType is HEX: 68 for the MSMQ icon and HEX: 270f for the Generic Service icon. When I change it from 270f to 68, the icon changes in Failover Cluster Manager and I am able to open it, but then I get an invalid handle and I am unable to manage it.

This is causing an issue because I want to automate this build and hand it over, but the operators would be unable to manage it except by scripting, which they are not ready for.

Here is the code which I have created in Powershell:

    # Load the failover clustering cmdlets (not loaded automatically on Windows Server 2008 R2)
    Import-Module FailoverClusters

    Write-Host "Configuring MS MSMQ Cluster Failover..."
    $CluName = "Cluster Name"
    # $CluName is a plain string, so it is concatenated directly
    $ClsMSMQName = $CluName + "MSMQ"
    $ClsMSMQResourceName = "MSMQ-" + $ClsMSMQName
    $Response = Read-Host "Enter the IP Address of the Clustered MSMQ"

    # Wrap the entered address in a ClusterParameter tied to the cluster IP resource
    $ClsIpRes = Get-ClusterResource "Cluster IP Address"
    $MSMQIpAddr = New-Object Microsoft.FailoverClusters.PowerShell.ClusterParameter $ClsIpRes,Address,$Response

    # Create the server role (network name, IP address and disk)
    Add-ClusterServerRole -Name $ClsMSMQName -Storage "Cluster Disk" -StaticAddress $MSMQIpAddr.Value
    # Add the MSMQ service to the new server role
    Get-ClusterGroup $ClsMSMQName | Add-ClusterResource -Name $ClsMSMQResourceName -ResourceType "MSMQ"
    # Create dependencies for the MSMQ resource (network name and disk)
    Add-ClusterResourceDependency $ClsMSMQResourceName $ClsMSMQName
    Add-ClusterResourceDependency $ClsMSMQResourceName "Cluster Disk"
    # Start the MSMQ group
    Start-ClusterGroup $ClsMSMQName

You would just have to change "Cluster Disk" and "Cluster Name".

Thank you


Hyper-V with SOFS Appalling Write Performance


Hi,

We are testing our new scale-out file server (SOFS) and are having major issues with performance.

We have 2 clustered servers running the SOFS, attached to a shared JBOD with 8x 4 TB SAS drives. We have pooled the disks, added them to a CSV and begun hosting test VMs on the new share.

When testing the volumes locally we get 1,000 MB/s of read and write throughput in a spanned volume.

When accessing the same volume via the SOFS share we see 400 MB/s of read and 10-50 MB/s of write!!!

This performance is completely unacceptable. Is this due to the SOFS forcing write-through and no write caching?

All servers have quad-port 10 Gb NICs with SMB Direct. The NICs and network infrastructure have been tested and all run at line speed. The performance dip is clearly a result of the SOFS tech.

If we create a standard share and copy between servers we get full speeds again.

All servers have all the latest updates and hotfixes applied.

Any suggestions? This is going to completely kill any hopes we had of using this technology.

Windows Server 2008 R2 SP1 2-Node cluster - Replace failed node


Hi - I have a two-node Windows Server 2008 R2 SP1 failover cluster (DHCP, File, Print) where one of the nodes has failed beyond recovery. What I would like to do is evict the failed cluster node, install a new machine with Windows Server 2008 R2 SP1, reuse the same name and IP address, and then join this machine as a node in the cluster.

Are there any recommended steps for this? I'm mostly thinking about the part of reusing the same name and IP address for the new node (e.g. is there any cleanup needed beyond evicting the node?).


Enfo Zipper
Christoffer Andersson – Principal Advisor
http://blogs.chrisse.se - Directory Services Blog



can you have a cluster within a cluster?


Hi,

I have a Hyper-V Cluster with 6 nodes.

Is it possible to have SQL VMs running within this Hyper-V HA cluster that also host a SQL AlwaysOn cluster?

The SQL DBs will replicate to another server that is not within the Hyper-V cluster.

Thanks

Losing CSV LUNS during back-up


We have a problem during the back-up of our 4-node Hyper-V cluster: the CSVs are placed in redirected mode and do not return to the normal state when the back-up finishes on one of the nodes. When a new back-up is then started on one of the other nodes in the cluster, it should normally not start, because the LUNs are locked by the first host, which still has them in redirected mode. In our case the back-up does start on the other host, and after it tries to claim the LUNs our CSVs disappear from the cluster and the VMs crash. Has anyone experienced the same issue? Is there a solution?

We are using Hyper-V 2008 R2 with HP Data Protector 8.x and the 3PAR VSS Provider.

Strange problems with 2012 R2 Hyper-V Server creating cluster


I wasn't sure which forum to put this in to be honest, as it could be a general server issue, Hyper-V or clustering. 

I'm having a really frustrating issue with Hyper-V Server 2012 R2. I'm currently in the process of upgrading our old 2008 R2 Hyper-V cluster based on HP DL380 G7 servers using iSCSI through a HP P2000 SAN. To do this, I've removed a single node from the old cluster and have rebuilt it using 2012 R2 and created a new fresh cluster with new LUNs on the SAN.

The server has 4 onboard Broadcom NIC ports, and 8 PCI-E card Intel NIC ports. I'm using the latest HP ProLiant Support Pack, which has updated the BIOS and firmware to the latest version, as well as the drivers. I've also manually inserted the latest INF for the Intel NIC.

At first the build goes smoothly, up until the point I create the failover cluster. I start to lose network connectivity to the server, and even processes run locally on the server fail and time out. The symptoms I've come across are:

-Cannot log in through RDP (errors in logs about winlogon terminating unexpectedly)

-Constant error logs about iSCSI disconnecting and reconnecting

-Disk management from client works for a bit and then starts to time out and not connect

-Any changes in Hyper-V Manager from a client times out

-Broadcom Management Suite (BACS 4) times out connecting to localhost when run locally from console

-Powershell runs really slowly when run locally from console

I've disabled offloading, RSS, VMQ and IPv6, and I've tried building the server without the ProLiant Support Pack, using the default Microsoft drivers, but it did exactly the same thing.

I've built the server using a full copy of Server 2012 R2 instead of Hyper-V Server using the exact same build process and it worked fine. 

Anyone got any ideas?


What is the difference between failover clustering and HPC, and can I install SQL Server on HPC for high availability?


I have found that some applications can use failover clustering and HPC at the same time. Can I use HPC for SQL Server high availability? What exactly is the difference?

Cannot add storage to new cluster


Hello,

I have a newly created failover cluster based on Hyper-V Server 2012. I currently only have one node, as this is an upgrade process where I'll add the other nodes as I move virtual machines across from our old cluster.

The cluster consists of a HP DL380 G7 server with a HP P2000 iSCSI SAN. This setup is currently being used by our old cluster too without any problems under Hyper-V Server 2008 R2.

The server is fully patched and updated, and iSCSI and MPIO have been set up, both LUNs are showing (a large CSV volume and a smaller witness volume) in iSCSI and in Disk Management. I've brought the disks online, initialised and formatted them. They're both basic disks with a simple volume in NTFS.

However, when I try to add the storage to the cluster, I get an error about no suitable disks being found. I've tried rebooting the server and deleting and recreating the volumes, but it doesn't help.

Any suggestions?

Kerberos errors on the CNO, error in Server Manager and can't use Veeam


So I think this error is the same one I've had before on Server 2012, where you move the CNO into a different OU and then, after 60 days, when the password for the computer account expires, you get into problems. This is because some permissions make it impossible to reset the password. You could always resolve this by simulating a failure in Failover Cluster Manager and then repairing it.

Now I'm running Server 2012 R2. I read about a bug in which the repair function in FCM was not working correctly, but this was supposed to be fixed in the big update in April, which I have installed.

I can simulate a failure and then repair, but it doesn't seem to make a difference. The CNO is still listed with a Kerberos security error in Server Manager, and I can't connect to the cluster with external programs such as Veeam. I'm getting the feeling that the computer password for the CNO isn't synchronized with the KDC somehow.

At first eventvwr mentioned that it could also be an SPN issue, since it was trying to call the CNO by its HTTP SPN, which wasn't available. Adding this manually didn't make a difference though.

The error I'm getting in eventvwr is 0x80090322 KRB_AP_ERR_MODIFIED.

Anyone got any ideas?


Installing .NetFramework 3.5 in VMs


Hello!

Today I ran into a rather strange issue: when I tried to add .NET Framework 3.5 to my virtual machines (Guest1 and Guest2, clustered) I got this error:

It looks like Windows doesn't see the Windows Server 2012 R2 ISO file that was used for its own installation...

I then tried to add the Windows Backup feature to rule out this possibility and all worked fine:

???

Thank you in advance,

Michael



Live Migrations of VMs with VHD disks / Long blackout times


Hi all, I'm having some issues with live migrations of virtual machines that have .vhd files in our Windows 2012 R2 Hyper-V clusters. When I perform a live migration of a virtual machine that has a .vhd disk file, it randomly pauses at around 60% for about 90 seconds and then completes the migration. While it is paused at or around 60%, the virtual machine no longer responds to pings. If you look in the Hyper-V logs, it posts a 20417 event and says that the VM had an unexpectedly long blackout time of 110 seconds. It's usually around 110 seconds. The live migration will complete, but it's causing problems for applications that run on the virtual machines, especially the SQL servers. It took a while to narrow it down to the point where I noticed this only on VMs with .vhd files. We had over 60 VMs on a Windows 2008 R2 cluster that were migrated over to our Windows 2012 R2 Hyper-V cluster. That's why we still have .vhd files on some VMs.

Is anyone else experiencing this problem? I'm looking for a solution to stop this from occurring. I understand that I can convert .vhd files to .vhdx files, but I've got a lot of VMs that I'd have to do this for.

Any feedback appreciated.
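
To size the conversion work mentioned above, it may help to first list which VMs still have .vhd disks attached. Below is a minimal sketch, assuming the Hyper-V PowerShell module is installed on the machine running it. The function name `Get-LegacyVhdDisk` is hypothetical.

```powershell
# Hypothetical helper: list attached virtual disks that still use the .vhd format.
function Get-LegacyVhdDisk {
    param([string[]]$ComputerName = $env:COMPUTERNAME)

    foreach ($node in $ComputerName) {
        Get-VM -ComputerName $node |
            Get-VMHardDiskDrive |
            # '*.vhd' matches paths ending in .vhd and therefore skips .vhdx files
            Where-Object { $_.Path -like '*.vhd' } |
            Select-Object VMName, ControllerType, Path
    }
}

# Usage across all nodes of the cluster:
# Get-LegacyVhdDisk -ComputerName (Get-ClusterNode).Name
```

Each disk found this way is a candidate for `Convert-VHD`, which is an offline operation, so the conversions would have to be scheduled per VM.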

    

The IRPStackSize parameter on Windows 2012


Hello All:

 I need your help.

 Look at this article: http://support.microsoft.com/default.aspx?scid=kb;EN-US;285089

 It's related to a setting that we use on Windows 2008 R2 on Clustered Servers, with value: 20.

 Details: "The IRPStackSize parameter specifies the number of stack locations in I/O request packets (IRPs) that are used by Windows 2000 Server, by Windows Server 2003, and by Windows XP. You may have to increase this number for certain transports, for media access control (MAC) drivers, or for file system drivers. Each stack uses 36 bytes of memory for each receive buffer. This value is set in the following registry subkey:

  HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters

The default value of the IRPStackSize parameter is 15. The range is from 11 (0xb hexadecimal) through 50 (0x32 hexadecimal)"
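
As a worked example of the figures in the quoted text, here is a minimal sketch that computes the per-receive-buffer memory for a given IRPStackSize and reads the value currently configured on a server. The function names are hypothetical, and the 36-byte figure, the default of 15 and the 11-50 range are taken from the KB text as quoted, not verified for Windows Server 2012.

```powershell
# Hypothetical helper: bytes of stack space per receive buffer, using the
# 36-bytes-per-stack-location figure quoted from the KB article.
function Get-IrpStackBytes {
    param([ValidateRange(11, 50)][int]$IrpStackSize = 15)
    return $IrpStackSize * 36
}

# Hypothetical helper: read the configured value from the registry subkey
# named in the KB text. The value is often absent; in that case the quoted
# default of 15 is returned (an assumption for Windows Server 2012).
function Get-ConfiguredIrpStackSize {
    $key  = 'HKLM:\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters'
    $item = Get-ItemProperty -Path $key -Name IRPStackSize -ErrorAction SilentlyContinue
    if ($null -ne $item) { return [int]$item.IRPStackSize }
    return 15
}

Get-IrpStackBytes -IrpStackSize 15   # 540 bytes with the quoted default
Get-IrpStackBytes -IrpStackSize 20   # 720 bytes with the value used on the 2008 R2 clusters
```

By that arithmetic, going from 15 to 20 costs 180 extra bytes per receive buffer.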

 My question is if this parameter is still valid on Windows 2012 and if there is an official Microsoft article.

 Thanks in advance for your answers and comments.

Regards,


Felipe Román http://feliperoman.wordpress.com

DAG Kerberos Authentication Issue Exchange 2010 on 2008R2 Servers


I have 2 Exchange 2010 servers in a DAG. The witness server is in site A along with one of the Exchange servers. The second Exchange server is in a DR site. The DAG has been functioning fine for 1.5 years. Last weekend, after a scheduled reboot of all 3 servers involved (2 e-mail servers and the witness server), the e-mail server in the DR site could no longer gain access to the witness share directory, according to Failover Cluster Manager. It says to check whether the witness directory is online, etc. Using pings and Explorer, the DR site e-mail server has no problem contacting the witness server and directory. I even re-established the quorum to the same directory, with no issues. A network trace, though, shows Kerberos pre-authentication errors when the Cluster service is started on the DR site e-mail server and it tries to contact the witness server:

(1.4 is the Witness server; 6.5 is the e-mail server in the DR site)

Source       Destination  Protocol  Length  Info
192.168.1.4  192.168.6.5  KRB5      319     KRB Error: KRB5KDC_ERR_PREAUTH_REQUIRED
192.168.6.5  192.168.1.4  TCP       54      26049 > kerberos [FIN, ACK] Seq=235 Ack=266 Win=65792 Len=0
192.168.6.5  192.168.1.4  TCP       66      26050 > kerberos [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
192.168.1.4  192.168.6.5  TCP       60      kerberos > 26049 [ACK] Seq=266 Ack=236 Win=66048 Len=0
192.168.1.4  192.168.6.5  TCP       60      kerberos > 26049 [RST, ACK] Seq=266 Ack=236 Win=0 Len=0
192.168.1.4  192.168.6.5  TCP       66      kerberos > 26050 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1406 WS=256 SACK_PERM=1
192.168.6.5  192.168.1.4  TCP       54      26050 > kerberos [ACK] Seq=1 Ack=1 Win=66048 Len=0
192.168.6.5  192.168.1.4  KRB5      368     AS-REQ
192.168.1.4  192.168.6.5  KRB5      282     KRB Error: KRB5KDC_ERR_PREAUTH_FAILED
192.168.6.5  192.168.1.4  TCP       54      26050 > kerberos [FIN, ACK] Seq=315 Ack=229 Win=65792 Len=0
192.168.1.4  192.168.6.5  TCP       60      kerberos > 26050 [ACK] Seq=229 Ack=316 Win=66048 Len=0
192.168.1.4  192.168.6.5  TCP       60      kerberos > 26050 [RST, ACK] Seq=229 Ack=316 Win=0 Len=0

Thoughts anyone?

SIMPLE QUESTION: HOW TO MIGRATE FROM WINDOWS 2008 R2 + SQL 2012 FAILOVER CLUSTER to WINDOWS SERVER 2012 CLUSTER WITH ALWAYS ON AVAILABILITY GROUP


Hello,

We have a 2-node Windows 2008 R2 Enterprise Edition failover cluster with Fibre shared storage (SAN) running SQL Server 2012 SP1. Below is the current configuration - very simple and classic, I would say everything by the book:

This is what I think we want to achieve:

Objectives:

1. Upgrade Windows Operating System from Windows Server 2008 R2 to Windows Server 2012

2. Migrate to SQL Server 2012 Always On Availability Group (AAG) for High Availability and Disaster Recovery

My question is how to achieve both goals?

If possible I would like to upgrade the OS first. Ideally I would like to upgrade on the same hardware (because it should have minimal impact, with no need to migrate data). If this is not possible, we also have new hardware I can use, but I guess that will have more impact and an actual data migration will be required.

For AAG, what I'm honestly missing is what the name of the second SQL Server would be. Let's say my servers are called DB1 and DB2, and the SQL Server is called DB. If I create an AAG and fail over to the replica server, would the SQL Server name be DB as well?

I know there is lots of documentation on AAG and I went through it, but I cannot find any specific information about names.

Another question I have: would a 3rd server (DB3) be part of the same MSCS cluster, or would it be a separate server? How exactly does failover work? Do I use Failover Cluster Manager to initiate failover?

Sorry for lots of questions, but any information would be appreciated very much.

Thanks!



Host Unreachable intermittently within a Windows Network Load Balancing Cluster


Hi,

We have 2 Windows 2008 R2 servers running multiple IIS web sites and load balanced across Windows Network Load Balancer in unicast mode. Although there are two interfaces in each server, only 1 interface in each server participates in load balancing and other interface is used for a different backup LAN. The problem I am going to mention was not seen within the NLB for almost 1 year.

I have noticed intermittent "host unreachable" detected from NLB in each host from time to time since 3 weeks ago. After servers are rebooted, both hosts can be reached and can be detected from NLB manager. However it becomes unreachable in both servers within minutes and then becomes reachable again after several minutes. This behavior is noticed in the load balancer and pings do not work between the two hosts when the issue occurs. I did a packet capture to see what was going on with ARP message when the issue occurs. ARP entry goes missing in each server when the problem occurs and no ARP replies are returned from each server. But ARP requests are dispatched from both servers when the issue occurs. ARP replies come back after sometime after which hosts become reachable again.

I tried to create a permanent static ARP entry (By copying the MAC address from ARP table when the two hosts are reachable) in each host but that hasn't solved the issue either. It seems like the individual MAC address generated by each host is a virtual one and it doesn't seem to respond when the problem occurs.

However load balancing and web sites are fully functional without any issues even while "host unreachability" issue is detected.

I would appreciate it if someone could help me dig out the real problem.

Thank you.
