Channel: High Availability (Clustering) forum
Viewing all 4519 articles

Redirect print server webpage in a 2008 R2 Failover Cluster.

I am configuring a 2-node 2008 R2 failover cluster with print services as a resource. On a client, I can install a printer by going to the website http://virtualprintserver.domain/printers/, selecting a printer, and clicking Connect.

To avoid showing the default IIS page, I want to redirect http://virtualprintserver.domain to http://virtualprintserver.domain/printers/.

How can I achieve this? I have searched in IIS Manager but could not find anything.

Thank you for your help.



CAS Server not Migrating

Hope you are all doing well. We have two physical servers, HV1 and HV2, and both are part of a VM cluster. Each server hosts a CAS and an MBX HA VM, i.e. HV1 hosts CAS1, CAS2, and HV2 hosts MBX1 and MBX2. A few days ago the CAS1 server crashed. We created a new VM, CAS3, from Failover Cluster Manager > Services and applications. The new VM was created on the CSV storage with the same settings (processors, RAM, NICs) as the crashed CAS1 server, and we recovered it with setup /m:RecoverServer. All went well: the server was recovered successfully, added back to the CAS WNLB, and is working fine. There is just one problem: the CAS3 VM cannot be live migrated, quick migrated, or even moved to the other host server. When we live migrate it, the transfer completes to 100%, but the VM comes back onto the original host with the following error in Cluster Events.

Event ID: 1205

"The Cluster service failed to bring clustered service or application 'CAS3' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application."

Please suggest how I can dig into this error. Where should I start troubleshooting, and which resources might be missing? In Failover Cluster Manager all the resources are green.


Virgo




CSV AND DPM Redirected Access event id 5125 sis driver error

We have 2x DPM 2012 servers which we also need to use to run a Hyper-V cluster.

When we bring the cluster up, everything is fine and all disk validations pass 100%.

As soon as you make the disk a CSV, you get an error that disk access is being redirected over the network.

The error states that there is a problem with the SIS filter driver (Single Instance Storage, which DPM installs to do its dedup).

Any idea how to work around this, or whether DPM on a cluster node is supported?

failover cluster - get disk path ?

Hi

I have created a failover cluster with two disks, Cluster Disk 1 and Cluster Disk 2. Is there any PowerShell command which can get me the path of each disk? For example, the return value should be S:\

Thanks

Sid


sid
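One possible way to get at this, as a rough sketch rather than a confirmed answer: the cluster WMI provider (namespace root\MSCluster) relates each Physical Disk resource to its disk and partitions, and the partition object carries a Path property. The function name Get-ClusterDiskPath is made up for illustration, and the sketch assumes it runs in Windows PowerShell on a cluster node. For a disk without a drive letter, Path may not look like S:.

```powershell
# Hypothetical helper: lists the path of every partition on each clustered Physical Disk.
# Assumes Windows PowerShell on a cluster node with the root\MSCluster WMI namespace available.
function Get-ClusterDiskPath {
    param(
        # '.' means the local node; a remote node may need extra WMI authentication settings.
        [string]$ComputerName = '.'
    )

    $resources = Get-WmiObject -Namespace 'root\MSCluster' -Class 'MSCluster_Resource' `
        -ComputerName $ComputerName -Filter "Type='Physical Disk'"

    foreach ($resource in $resources) {
        # Walk the WMI associations: resource -> disk -> partition.
        foreach ($disk in $resource.GetRelated('MSCluster_Disk')) {
            $disk.GetRelated('MSCluster_DiskPartition') |
                Select-Object @{ Name = 'Resource'; Expression = { $resource.Name } },
                              Path,
                              VolumeLabel
        }
    }
}
```

Calling `Get-ClusterDiskPath | Format-Table -AutoSize` on a node should then list each cluster disk resource next to the path of its partitions.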

HeartBeat Metric

Hello All,

We have a 3-node 2008 R2 Hyper-V cluster. Our setup:

4 Network Adapters

1- ISCSI - NA

2- LAN - Metric - 10000

3- CSV - Metric - 1100

4- HEARTBEAT - Metric - 1200

I know that CSV uses the network with the lowest metric, but I can't find any information on the heartbeat metric. Will the heartbeat use the network with the second-lowest metric, as listed above?

Please advise.

Thank you.
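To make the ordering in the post concrete, here is a small sketch that ranks the three networks by the metrics listed above (iSCSI is left out because it has no metric). The list is hand-built from the post. On a cluster node, the live values would come from Get-ClusterNetwork in the FailoverClusters module instead. The sketch only shows the metric ordering and does not settle which network the heartbeat uses.

```powershell
# Metrics copied from the post; this hand-built list stands in for Get-ClusterNetwork output.
$networks = @(
    New-Object psobject -Property @{ Name = 'LAN';       Metric = 10000 }
    New-Object psobject -Property @{ Name = 'CSV';       Metric = 1100 }
    New-Object psobject -Property @{ Name = 'HEARTBEAT'; Metric = 1200 }
)

# Lowest metric first: this is the order of preference by metric.
$ranked = $networks | Sort-Object Metric

$ranked | Format-Table Name, Metric -AutoSize

# On a cluster node the equivalent query would be:
# Get-ClusterNetwork | Sort-Object Metric | Format-Table Name, Metric, AutoMetric, Role
```

With these numbers the CSV network (1100) sorts first, HEARTBEAT (1200) second, and LAN (10000) last.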

Event 1587

Hi,

I have observed that a cluster with a shared folder as one of its resources fails over automatically for no apparent reason. I have seen this on different clusters, even though the File Server service, storage, and DNS are all healthy.

My question: could a front-end network issue cause this, or what else is making it happen?

Windows NLB Issues in VMware environment

Hi there,

We are planning to use a Windows NLB cluster as a high-availability solution, and found several VMware blog posts describing issues with Windows NLB and unicast network configurations.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006580

I am not sure whether the above issue affects only WS2003, or whether it also exists in WS2012/WS2012 R2.

Any help is appreciated.

Thanks,

Vineeth

Cluster Events Query Incomplete

I have a six-node 2012 R2 Failover Cluster running Hyper-V.  When I go to Cluster Events, it says on the title: "Cluster Events (0 events, query incomplete)". When I query one node at a time, only node 2 returns results, and every other node says "query incomplete". I can read all the event logs separately through remote MMC.  What does this mean, "Cluster Events (query incomplete)", and how do I get all the nodes to report in? 

Multiple identical Windows 2008 Clusters for file sharing, how to create a failover, since DNS CNAME is not an option?

Hello,

We have multiple SAN devices that expose block storage to multiple Windows 2008 Storage Edition clusters, which consume them via iSCSI. To simplify, here is what we have:

machine1+machine2 --> cluster --> storage1 (accessible via \\storage1.domain.name\shares)

machine3+machine4 --> cluster --> storage2 (accessible via \\storage2.domain.name\shares)

The 2 clusters have replication from active (storage1) to standby (storage2), so in theory they both include pretty much all the files.

We have hundreds of applications that point to \\storage1\shares. In case of a disaster, however, it is an immensely laborious task to change all the endpoints from \\storage1\shares to \\storage2\shares, and then back.

I tried to declare a CNAME (\\activestorage) in our internal DNS and point it to the currently active storage system. I tried both an alias and an IP pointing to the cluster. However, when I try to access \\activestorage\shares, I get an error that the network share does not exist. I have been reading about this, and I now understand that CNAMEs are not possible in 2008 and are only allowed in 2012 for this particular purpose.

I also understand that I may have to register a CAP (Client Access Point) to create an "alias" for my storage. However, the CAP approach does not seem to address (or does it?) my initial problem: I am not looking to create an alias for one particular storage cluster, but rather for whichever storage cluster is active. That way, if disaster strikes, I can simply change an IP somewhere, and all the applications will start using the standby storage system.

I realize that there is extra administration in ensuring that the two clusters are identical from a shares perspective, and that is something we are already doing. All I need is a clue as to how to make this architecture work in a less painful way.

Thank you for your help,

Isaac



HyperV Failover Cluster - twice some vms lost network

So I run a 4-node Hyper-V failover cluster, and twice now in months of operation, out of the blue, a portion of the VMs on a node have lost network access (this has happened on two different nodes). I can pause the node so that everything migrates off, and then the VMs are back up and running. After some time I can unpause the node and move the VMs back. I am looking for ideas on what could be causing this.

The servers are Dell PowerEdge 620s with the latest MS patches and Dell drivers and firmware. On the public side I have a 2-NIC team using MS software teaming.

Network Name Resource Availability - failover cluster error 1196 on Hyper-V 2012 R2 nodes

Hello,

We're getting this error in the event logs of our four-node failover cluster. We tried deleting the Host (A) record in DNS management, but that did nothing.

Failover cluster event: 1196

"Cluster network name resource 'CAUCrgt8' failed registration of one or more associated DNS name(s) for the following reason: This operation returned because the timeout period expired.
Ensure that the network adapters associated with dependent IP address resources are configured with at least one accessible DNS server."

And this resource http://technet.microsoft.com/en-us/library/cc773529%28v=WS.10%29.aspx did not help in solving this.

Do you guys have any other suggestions we could try to resolve this error?




NLB - Duplicate packets

Hi All,

I am running Server 2012 R2 and I have configured NLB between two of my webservers.

When I ping the NLB IP address from a Mac or Linux box I notice that I get duplicate packets. I send 10 packets and get 20 returns.  Is this normal?  I would have thought the NLB would send the ping to one address to answer not both.

10 packets transmitted, 10 packets received, +9 duplicates, 0.0% packet loss
round-trip min/avg/max/stddev = 0.222/1.353/10.805/3.238 ms

Thanks.

Cloning a Microsoft Cluster

Hi Experts

We are planning our disaster recovery plan. We have some SAP applications which are difficult to install and configure,

so we think our best option is to clone the entire server to the secondary site. We have a tool named Double-Take which can replicate from server to server.

The setup is as follows:

We have a Windows Server 2008 R2 cluster with two nodes.

SQL Server 2008 R2 and SAP are installed on the cluster.

We want to clone this cluster to the secondary site and start replication with the tool mentioned above.

We want to keep the current server names, but we will change the servers' IP addresses.

My main question is whether we can clone the cluster, and whether there is a document or guide with best practices that we could follow.

Thanks in advance

Need hotfix for BUG Check reboots for 2012R2

I have a 5-node 2012 R2 cluster which will go into production in the next few days, but I am stuck with a bug check issue that frequently reboots at least 3 nodes. I am aware of this bug check and know the hotfixes for 2012 and 2008. However, I am not able to find the hotfix or steps to resolve this issue on 2012 R2.

I need assistance to fix this problem.

Thanks in advance. The debug report is below.

3: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

USER_MODE_HEALTH_MONITOR (9e)
One or more critical user mode components failed to satisfy a health check.
Hardware mechanisms such as watchdog timers can detect that basic kernel
services are not executing. However, resource starvation issues, including
memory leaks, lock contention, and scheduling priority misconfiguration,
may block critical user mode components without blocking DPCs or
draining the nonpaged pool.
Kernel components can extend watchdog timer functionality to user mode
by periodically monitoring critical applications. This bugcheck indicates
that a user mode health check failed in a manner such that graceful
shutdown is unlikely to succeed. It restores critical services by
rebooting and/or allowing application failover to other servers.
Arguments:
Arg1: ffffe8010a271900, Process that failed to satisfy a health check within the
 configured timeout
Arg2: 00000000000004b0, Health monitoring timeout (seconds)
Arg3: 0000000000000005
Arg4: 0000000000000000

Debugging Details:
------------------

***** Kernel symbols are WRONG. Please fix symbols to do analysis.

*************************************************************************
***                                                                   ***
***                                                                   ***
***    Either you specified an unqualified symbol, or your debugger   ***
***    doesn't have full symbol information.  Unqualified symbol      ***
***    resolution is turned off by default. Please either specify a   ***
***    fully qualified symbol module!symbolname, or enable resolution ***
***    of unqualified symbols by typing ".symopt- 100". Note that   ***
***    enabling unqualified symbol resolution with network symbol     ***
***    server shares in the symbol path may cause the debugger to     ***
***    appear to hang for long periods of time when an incorrect      ***
***    symbol name is typed or the network symbol server is down.     ***
***                                                                   ***
***    For some commands to work properly, your symbol path           ***
***    must point to .pdb files that have full type information.      ***
***                                                                   ***
***    Certain .pdb files (such as the public OS symbols) do not      ***
***    contain the required information.  Contact the group that      ***
***    provided you with these symbols if you need this command to    ***
***    work.                                                          ***
***                                                                   ***
***    Type referenced: nt!_KPRCB                                     ***
***                                                                   ***
*************************************************************************

ADDITIONAL_DEBUG_TEXT: 
You can run '.symfix; .reload' to try to fix the symbol path and load symbols.

MODULE_NAME: netft

FAULTING_MODULE: fffff80021089000 nt

DEBUG_FLR_IMAGE_TIMESTAMP:  5215f788

PROCESS_OBJECT: ffffe8010a271900

DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT

BUGCHECK_STR:  0x9E

CURRENT_IRQL:  0

ANALYSIS_VERSION: 6.3.9600.17237 (debuggers(dbg).140716-0327) x86fre

LAST_CONTROL_TRANSFER:  from fffff800a291ac08 to fffff800211dcfa0

STACK_TEXT: 
ffffd001`d60c6938 fffff800`a291ac08 : 00000000`0000009e ffffe801`0a271900 00000000`000004b0 00000000`00000005 : nt!KeBugCheckEx
ffffd001`d60c6940 fffff800`a291a892 : 00000000`00000000 00000000`00000000 ffffd001`d601c180 ffffe001`1f716ec8 : netft+0x2c08
ffffd001`d60c6980 fffff800`210e2810 : ffffd001`d60c6b00 ffffe001`1f716ec8 ffffe001`24e7d220 ffffd001`d601c180 : netft+0x2892
ffffd001`d60c69b0 fffff800`211e0aea : ffffd001`d601c180 ffffd001`d601c180 ffffd001`d6028dc0 ffffe001`24e7d080 : nt!KeRemoveQueueEx+0x3b80
ffffd001`d60c6c60 00000000`00000000 : ffffd001`d60c7000 ffffd001`d60c1000 00000000`00000000 00000000`00000000 : nt!KeSynchronizeExecution+0x2efa


STACK_COMMAND:  kb

FOLLOWUP_IP:
netft+2c08
fffff800`a291ac08 cc              int     3

SYMBOL_STACK_INDEX:  1

SYMBOL_NAME:  netft+2c08

FOLLOWUP_NAME:  MachineOwner

IMAGE_NAME:  netft.sys

BUCKET_ID:  WRONG_SYMBOLS

FAILURE_BUCKET_ID:  WRONG_SYMBOLS

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:wrong_symbols

FAILURE_ID_HASH:  {70b057e8-2462-896f-28e7-ac72d4d365f8}

Followup: MachineOwner
---------


ndraj
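As a small aid for reading the dump above: Arg2 is the health monitoring timeout in seconds, printed in hexadecimal. The sketch below only converts that one value. The variable names are illustrative and nothing here comes from a debugger API.

```powershell
# Arg2 from the bugcheck output above, copied as a hex string.
$timeoutHex = '00000000000004b0'

# Convert the hexadecimal argument to a decimal number of seconds.
$timeoutSeconds = [Convert]::ToInt64($timeoutHex, 16)

# Express the same value in minutes for readability.
$timeoutMinutes = $timeoutSeconds / 60

'{0} seconds = {1} minutes' -f $timeoutSeconds, $timeoutMinutes
```

So 0x4b0 is 1200 seconds: the process named in Arg1 failed its health check for 20 minutes before netft.sys raised the bugcheck.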

How to Perform Forced Manual Failover of Availability Group (SQL Server) and WSFC (Windows Server Failover Cluster) with scripting

I have a scenario with three nodes running Windows Server 2012 Standard, each running an instance of SQL Server 2012 Enterprise. They participate in a single Windows Server Failover Cluster (WSFC) that spans two data centers.

If the nodes in the primary data center become unavailable due to a data center outage, how can I automatically bring up the node in the secondary (disaster recovery) data center with a script?

  • I want to write a script that checks the primary data center by pinging an IP every 5 or 10 minutes.
  • If that IP does not respond, the script should perform a forced manual failover of the availability group (SQL Server) and the WSFC (Windows Server Failover Cluster).

Can you please guide me in writing a script for automatic failover in case of a primary data center outage?
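A rough sketch of the two bullet points above, under loudly stated assumptions: it runs on the DR node itself, the FailoverClusters and SQLPS modules are installed, and every concrete value in the usage line (IP, node name, instance path, availability group name) is a hypothetical placeholder. The function names are invented for illustration. A forced failover with -AllowDataLoss can lose data, and a failed ping can also mean a broken link rather than a dead data center, so treat this as a starting point for discussion and not as a recommended design.

```powershell
# Decide whether enough consecutive ping failures have accumulated to act.
function Test-FailoverNeeded {
    param([int]$ConsecutiveFailures, [int]$Threshold)
    return ($ConsecutiveFailures -ge $Threshold)
}

# Force quorum on the local DR node, then force the availability group over to it.
function Invoke-ForcedFailover {
    param([string]$Node, [string]$AvailabilityGroupPath)

    Import-Module FailoverClusters
    Import-Module SQLPS -DisableNameChecking

    # The cluster service must be stopped before it can be started with forced quorum.
    Stop-Service -Name ClusSvc -Force -ErrorAction SilentlyContinue
    Start-ClusterNode -Name $Node -FixQuorum

    # Forced failover; the new primary may be missing recent transactions.
    # In practice a wait or retry may be needed here until the replica is reachable.
    Switch-SqlAvailabilityGroup -Path $AvailabilityGroupPath -AllowDataLoss -Force
}

# Ping the probe address on a fixed interval and fail over after repeated misses.
function Watch-PrimarySite {
    param(
        [string]$ProbeIp,
        [string]$Node,
        [string]$AvailabilityGroupPath,
        [int]$Threshold = 3,
        [int]$IntervalSeconds = 300
    )

    $failures = 0
    while ($true) {
        if (Test-Connection -ComputerName $ProbeIp -Count 2 -Quiet) {
            $failures = 0
        }
        else {
            $failures++
        }

        if (Test-FailoverNeeded -ConsecutiveFailures $failures -Threshold $Threshold) {
            Invoke-ForcedFailover -Node $Node -AvailabilityGroupPath $AvailabilityGroupPath
            break
        }
        Start-Sleep -Seconds $IntervalSeconds
    }
}

# Example call with placeholder values (not executed here):
# Watch-PrimarySite -ProbeIp '10.0.0.10' -Node 'DRNODE1' `
#     -AvailabilityGroupPath 'SQLSERVER:\Sql\DRNODE1\DEFAULT\AvailabilityGroups\AG1'
```

Requiring several consecutive failures before acting is a deliberate choice in this sketch: a single dropped ping should not trigger a failover that may lose data.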


Reinstall OS on node Failover Cluster

Hello! 

I have a failover cluster with two nodes on Windows Server 2012 R2 with the File Services role.

I need to reinstall the OS on one node. Are there any restrictions? Or can I simply reinstall the operating system, give the node the same name and network settings, and join it to the cluster again?

Thanks


Respectfully yours

Cluster VMs sometime fail while doing an export-vm

I'm using a PowerShell script, run through Task Scheduler, to export some clustered (2012 R2 Hyper-V) VMs.

Every now and then a VM is restarted by the cluster during the Export-VM. The errors found in the event viewer are listed at the end of this message. I cannot see a proper cause for the failure. Is there any way to debug this problem more deeply?

I would also like to know if there is some switch I could set on the cluster resource while doing the Export-VM, to prevent the cluster from trying to restart the VM even if it is not responding for a while during the export.

The powershell script used:

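# Export each clustered VM to the file share; log and send mail if an export fails.
$exportRoot = '\\fileserver\HyperVexport'
$VMS = Get-VM -Name VM1,VM2,VM3 -ErrorAction SilentlyContinue
foreach ($VM in $VMS) {
    $name = $VM.Name
    $target = Join-Path $exportRoot $name
    # Remove the previous export of this VM, if any, before exporting again.
    if (Test-Path $target) { Remove-Item $target -Force -Recurse }
    try {
        # -ErrorAction Stop turns an export failure into an exception for the catch block.
        Export-VM -Name $name -Path $exportRoot -ErrorAction Stop
    }
    catch {
        $date = Get-Date -Format s
        "$date $name Export failed" | Out-File -FilePath c:\hyper-v\scripts\ExportVMs.log -Append
        Send-MailMessage -From "xxx@xx.xx" -To "xxx@xx.xx" -Subject "Export of $name in $env:COMPUTERNAME failed" -SmtpServer mailserver
    }
} # close foreach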

Event logs from the Hyper-V host where the VM is running at the time of the failure:

Time | Log | Event ID | Description
22:56:07 | Applications and Services Logs/Microsoft/Windows/FailoverClustering/Operational | 1637 | Cluster resource 'Virtual Machine VM1' in clustered role 'VM1' has transitioned from state Online to state ProcessingFailure.
22:56:07 | Applications and Services Logs/Microsoft/Windows/FailoverClustering/Operational | 1637 | Cluster resource 'Virtual Machine VM1' in clustered role 'VM1' has transitioned from state ProcessingFailure to state WaitingToTerminate. Cluster resource 'Virtual Machine VM1' is waiting on the following resources: .
22:56:07 | Applications and Services Logs/Microsoft/Windows/FailoverClustering/Operational | 1637 | Cluster resource 'Virtual Machine VM1' in clustered role 'VM1' has transitioned from state WaitingToTerminate to state Terminating.
22:56:07 | Windows Logs/System | 1069 | Cluster resource 'Virtual Machine VM1' of type 'Virtual Machine' in clustered role 'VM1' failed. Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
22:57:07 | Applications and Services Logs/Microsoft/Windows/Hyper-V-High-Availability/Admin | 21128 | 'Virtual Machine VM1' failed to shutdown the virtual machine during the resource termination. The virtual machine will be forcefully stopped.
22:57:07 | Applications and Services Logs/Microsoft/Windows/Hyper-V-High-Availability/Admin | 21119 | 'Virtual Machine VM1' succesfully started the virtual machine during the resource termination. The virtual machine.
22:57:13 | Applications and Services Logs/Microsoft/Windows/FailoverClustering/Operational | 1637 | Cluster resource 'Virtual Machine VM1' in clustered role 'VM1' has transitioned from state Terminating to state DelayRestartingResource.
22:57:13 | Applications and Services Logs/Microsoft/Windows/FailoverClustering/Operational | 1637 | Cluster resource 'Virtual Machine VM1' in clustered role 'VM1' has transitioned from state DelayRestartingResource to state OnlineCallIssued.
22:57:13 | Applications and Services Logs/Microsoft/Windows/FailoverClustering/Operational | 1637 | Cluster resource 'Virtual Machine VM1' in clustered role 'VM1' has transitioned from state OnlineCallIssued to state OnlinePending.
22:57:13 | Applications and Services Logs/Microsoft/Windows/Hyper-V-VMMS/Admin | 14070 | Virtual machine 'VM1' (ID=9510686F-BE3C-4CAA-99A5-EB756ED8DED1) has quit unexpectedly.
22:57:13 | Applications and Services Logs/Microsoft/Windows/Hyper-V-VMMS/Admin | 15190 | 'VM1' failed to take a checkpoint. (Virtual machine ID 9510686F-BE3C-4CAA-99A5-EB756ED8DED1)
22:57:13 | Applications and Services Logs/Microsoft/Windows/Hyper-V-VMMS/Admin | 15140 | 'VM1' failed to turn off. (Virtual machine ID 9510686F-BE3C-4CAA-99A5-EB756ED8DED1)
22:57:13 | Applications and Services Logs/Microsoft/Windows/Hyper-V-VMMS/Admin | 18350 | Export failed for virtual machine 'VM1' (9510686F-BE3C-4CAA-99A5-EB756ED8DED1) with error 'The process terminated unexpectedly.' (0x8007042B).
22:57:17 | Applications and Services Logs/Microsoft/Windows/FailoverClustering/Operational | 1637 | Cluster resource 'Virtual Machine VM1' in clustered role 'VM1' has transitioned from state OnlinePending to state Online.
22:57:17 | Applications and Services Logs/Microsoft/Windows/FailoverClustering/Operational | 1201 | The Cluster service successfully brought the clustered role 'VM1' online.


In the event viewer of VM1 I can only see "The previous system shutdown at ... was unexpected", so it was forcefully shut down, as can be seen from the logs above.
 

Cluster IP address in Failed status

Windows 2012 2-node failover cluster for Hyper-V. 2 LANs in place, one for iSCSI communication. In failover cluster manager, when viewing the summary page cluster core resources, I have my cluster name listed, with two IP addresses under it, 192.168.2.x for LAN, and 10.168.2.x for iSCSI. The second IP is in a "Failed" status.

In the Networks section, I have two cluster networks: the LAN network is enabled for cluster use, and the iSCSI network is disabled for cluster use. Here, both networks show as Up. In the properties of network 2, "Do not allow cluster network communication" is selected. The iSCSI network contains two NICs, each with an IP on the iSCSI subnet. MPIO is handled by the initiator, with Dell's HIT kit layered on top (EqualLogic SAN).

Only problem is, when selecting Move Cluster Resources -> Best Possible Node, I get error 1223, "Cluster IP address resource 'Cluster IP Address 10.168.2.58' cannot be brought online because the cluster network 'Cluster Network 2' is not configured to allow client access. Please use the Failover Cluster Manager snap-in to check the configured properties of the cluster network."

No movement of cluster resources occurs.

Does this indicate a problem, or is this typical of my setup?
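To relate the error text to the network settings described above: each cluster network has a numeric Role, and the message in the post says the IP address resource needs a network that allows client access. The sketch below maps the Role values to their meanings. The helper name Get-ClusterNetworkRoleDescription is made up for illustration, and the commented query assumes the FailoverClusters module on a cluster node.

```powershell
# Hypothetical helper: translate a cluster network Role value into plain text.
function Get-ClusterNetworkRoleDescription {
    param([int]$Role)

    switch ($Role) {
        0 { 'Not used by the cluster' }
        1 { 'Cluster communication only' }
        3 { 'Cluster communication and client access' }
        default { "Unknown role value $Role" }
    }
}

# On a cluster node, the roles of all networks could be listed like this:
# Get-ClusterNetwork | Select-Object Name, Role,
#     @{ Name = 'Meaning'; Expression = { Get-ClusterNetworkRoleDescription ([int]$_.Role) } }
```

With "Do not allow cluster network communication" selected, 'Cluster Network 2' would be expected to report Role 0, which matches the complaint in error 1223 that the network does not allow client access.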


hyper-v cluster live migration takes a long offline time

Hi

We have a Windows 2008 R2 cluster with Hyper-V instances running on it.

It has two nodes. If I do a live migration from node 1 to node 2, it works fine and only takes the VM offline for 1 second or so. But if I do a live migration from node 2 to node 1, it takes much longer: in some tests it took 20 seconds, in most cases 10 seconds, but every time much longer than from node 1 to node 2.

What could be wrong?

Thank you very much.

best regards.

David.

SQL CLUSTER NAME REMAIN IN A FAILED STATE

We have the following configuration in our enterprise;

- Two physical servers ( both running windows server 2008 R2)

- The physical servers have been clustered (CLUSTER1).

- Six Virtual Machines ( two for each tier of our SharePoint 2010 installation)

- Two web front ends, two application servers, two database servers.

- The two database servers (running windows server 2008 R2) have been clustered (SQLCLUSTER2) and point to a SAN.

After an unexpected power failure, SQLCLUSTER2 has refused to come online. The IP address comes "Online" when turned on manually. The disks also come online when turned on manually using "Bring this resource online". But as soon as we try to bring the cluster name online, all other resources go into a "Failed" state and the entire SharePoint portal becomes unavailable. See image below.

[Image: SQL CLUSTER]

Note: We recently had a problem with one of our two domain controllers and had to decommission it. My investigation reveals that the decommissioned DC was the one used to create the Cluster Name Object (CNO) SQLCLUSTER2.

I have been battling to restore the cluster to normal for over a week now. But periodically (after some hours) all the resources turn to a "Failed" state and down goes our SharePoint portal.

Any assistance anyone can offer to help resolve this issue would be very welcome.
