Cluster Shared Volume error after server not shutting down properly

May 14, 2015, 2:34 am

Hi,
We have two IBM x240 servers (we call them server A and server B) connected to an IBM V3700 disk system via Fibre Channel HBAs.

Both servers run Windows Server 2012 R2.

We have implemented a VM cluster, and everything was working well.

Last week both servers went down because of a power outage in my server room.

After turning on server A, the following error appeared:

Windows failed to start. A recent hardware or software change might be the cause.
File: \windows\system32\drivers\msdsm.sys
Status: 0xc0000017
Info: The operating system couldn't be loaded because a critical system driver is missing or contains errors.

After booting with Last Known Good Configuration, we could log in to the system and turn on the clustered virtual machines.

Everything seemed fine at that point.

So I started server B and logged in using the same method as on server A.

I then found that all the VMs shut down or ran into errors because of Cluster Shared Volume errors.

Below are some of the errors captured from the System event log.

* Event 5142, Cluster Shared Volume 'Volume7' ('Cluster Disk 10') is no longer accessible from this cluster node because of error '(1460)'. Please troubleshoot this node's connectivity to the storage device and network connectivity.

* Event 5120, Cluster Shared Volume 'Volume3' ('Cluster Disk 4') has entered a paused state because of '(c00000be)'. All I/O will temporarily be queued until a path to the volume is reestablished.
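
To gather every occurrence of these two events in one place, something like the following could be run on each node. This is a minimal sketch: it assumes the events are still in the System log and that they are logged under the Microsoft-Windows-FailoverClustering source; the seven-day window is an arbitrary choice.

```powershell
# Collect CSV-related cluster events from the System log:
# 5120 = volume entered a paused state, 5142 = volume no longer accessible.
# The 7-day window is an assumption; widen or narrow it as needed.
$filter = @{
    LogName      = 'System'
    ProviderName = 'Microsoft-Windows-FailoverClustering'
    Id           = 5120, 5142
    StartTime    = (Get-Date).AddDays(-7)
}

# -ErrorAction SilentlyContinue keeps the command quiet when no matching events exist.
Get-WinEvent -FilterHashtable $filter -ErrorAction SilentlyContinue |
    Sort-Object TimeCreated |
    Select-Object TimeCreated, Id, MachineName, Message |
    Format-List
```

Comparing the timestamps from both nodes shows whether the volumes pause at the moment the second node joins.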

Now we can only run one server at a time, with the other shut down. If I turn on both servers, the errors return and the server goes down.

Any suggestions? Let me know if you need more information.

Thanks.

↧

can not fix corrupt system files

June 11, 2015, 7:26 am

I am doing a cluster node health check, but I cannot fix the corrupt files.

PS C:\Users\Administrator.000> Dism /Online /Cleanup-Image /RestoreHealth

Deployment Image Servicing and Management tool
Version: 6.3.9600.17031

Image Version: 6.3.9600.17031

[==========================100.0%==========================]

Error: 0x800f0906

The source files could not be downloaded.
Use the "source" option to specify the location of the files that are required to restore the feature. For more information on specifying a source location, see http://go.microsoft.com/fwlink/?LinkId=243077.

The DISM log file can be found at C:\Windows\Logs\DISM\dism.log

PS C:\Users\Administrator.000> sfc /scannow

Beginning system scan. This process will take some time.

Beginning verification phase of system scan.
Verification 100% complete.

Windows Resource Protection found corrupt files but was unable to fix some
of them. Details are included in the CBS.Log windir\Logs\CBS\CBS.log. For
example C:\Windows\Logs\CBS\CBS.log. Note that logging is currently not
supported in offline servicing scenarios.
PS C:\Users\Administrator.000>

Please help
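
The DISM output above asks for a source. A minimal sketch of that retry, assuming Windows Server 2012 R2 installation media matching the installed build is mounted as drive D: and that image index 1 is the installed edition (both are assumptions; the first command lists the indexes so the right one can be picked):

```powershell
# List the images in the WIM to find the index that matches the installed edition.
Dism /Get-WimInfo /WimFile:D:\sources\install.wim

# Retry the repair from local media.
# /LimitAccess stops DISM from contacting Windows Update for the payload.
Dism /Online /Cleanup-Image /RestoreHealth /Source:wim:D:\sources\install.wim:1 /LimitAccess

# Re-run the file check once the component store repair finishes.
sfc /scannow
```

If the media is older than the updates installed on the node, the repair can still fail for files that only exist in later updates.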

↧

Internal netbios traffic on WSFC 2012

June 4, 2015, 6:57 am

Hello all.

Current configuration:
A SQL Server 2012 availability group (AG) is deployed on Windows Server 2012 with two nodes. Each node uses two network interfaces: one for public access and one for the interconnect (heartbeat).

First node:
Eth1 10.16.0.41
Eth2 192.168.10.1

Second node:
Eth1 10.16.0.42
Eth2 192.168.10.2

The second interface on each node (192.168.10.1 and 192.168.10.2) is a private connection dedicated to internal cluster communication.

Our network administrator noticed some odd traffic and suggested blocking it:
traffic from 10.16.0.41, under the cluster user, to the internal address 192.168.10.1, from UDP port 137 to UDP port 137 (netbios-ns); likewise from 10.16.0.42 to 192.168.10.2.

Question:

Why does each node send this traffic to itself?

↧

DTCProxy is not running: java.net.ConnectException: Connection timed out

June 11, 2015, 3:51 am

Hi All,

While starting the JBoss server we are facing the issue below with MSDTC. The database is SQL Server 2008 R2. This is a clustered DB environment; MSDTC works fine in our non-clustered environments.

2015-06-09 23:48:18,444 ERROR [STDERR] (main) javax.transaction.xa.XAException: DTCProxy is not running: java.net.ConnectException: Connection timed out
2015-06-09 23:48:18,445 ERROR [STDERR] (main) at com.inet.tds.b.a(Unknown Source)
2015-06-09 23:48:18,445 ERROR [STDERR] (main) at com.inet.tds.b.start(Unknown Source)
2015-06-09 23:48:18,445 ERROR [STDERR] (main) at com.inet.tds.e.start(Unknown Source)
2015-06-09 23:48:18,445 ERROR [STDERR] (main) at org.jboss.resource.adapter.jdbc.xa.XAManagedConnection.start(XAManagedConnection.java:213)
Please help.

↧

print cluster - want to forward the spool file?

June 17, 2015, 12:53 am

Can anyone give me an idea? I'm not very experienced. We have a two-node print cluster running on Windows Server 2008, and we want to forward the spool file to another printer. The spool file forwarder program we have runs as an application. Where would we need to install it, and which files (and where) would we need to monitor?

Thanks :)

↧

VSS and Failover Clustering

June 17, 2015, 12:05 pm

Hi,

I have a 4-node physical cluster. External storage is fiber attached to an HP disk array.

The cluster runs about 25 virtual machines. All the VMs are stored on CSVs.

I'm trying to understand where VSS puts its snapshots when it is dealing with a failover cluster with VMs.

Does each VM use part of its virtual disks to store the VSS snapshot, or does the cluster snapshot each VM and add the VSS snapshot storage to the CSVs?

Thanks for any help!

Mark

↧

SharePoint website responding very slowly, using Windows Server 2012 Network Load Balancing

June 14, 2015, 10:15 pm

Hello Team,

Greetings for the day!

I have two Windows Server 2012 machines with Network Load Balancing enabled, and SharePoint 2010 R2 (a SharePoint farm) is installed on both. The load balancer has a total of 5 IPs assigned across those two servers.

The SharePoint site is sometimes very slow (1 min 30 sec) and sometimes responds quickly (10 sec).

I have checked performance on both servers, and it is more than good.

I am not sure how to troubleshoot this further, and I am also not sure how to check which server a request is going to.

Please help me resolve this and optimize the performance.

Paresh Jain

↧

Need step-by-step instructions to install a SQL Server 2012 failover cluster

June 18, 2015, 6:08 pm

We need to install a SQL Server 2012 failover cluster within the next 12 hours. If someone could send us step-by-step instructions, it would be a great help.

I am new to SQL cluster installation. We also need help configuring MSDTC with the cluster.

We are running Windows Server 2012.

↧

How to retrieve the GroupComponent information from MSCluster_ClusterSharedVolumeToPartition

June 19, 2015, 4:28 am

I'm trying to find the CSV information by using the below WQL query. I pass in disk partition information to fetch the CSV information.

$csv = Get-WmiObject -Namespace "root/MSCluster" -Query "SELECT GroupComponent FROM MSCluster_ClusterSharedVolumeToPartition where PartComponent='MSCluster_DiskPartition.Name=`"C:\ClusterStorage\Volume1`"'"

Write-Host $csv.Name

Write-Host $csv.VolumeName

But it doesn't seem to be working :(
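
Two things may be getting in the way here: backslashes inside a WQL string literal need to be doubled, and GroupComponent comes back as a WMI object path (a string) rather than as the CSV object, so $csv.Name has nothing to print. A minimal sketch that avoids the WHERE clause entirely and resolves each reference instead. It assumes the association class can be enumerated on this node and that Name and VolumeName are properties of the referenced MSCluster_ClusterSharedVolume instance, which was not verified here:

```powershell
# Enumerate every CSV-to-partition association instead of filtering in WQL,
# which sidesteps escaping the backslashes in the partition path.
$links = Get-WmiObject -Namespace 'root/MSCluster' -Class MSCluster_ClusterSharedVolumeToPartition

foreach ($link in $links) {
    # GroupComponent is an object path string; the [wmi] accelerator
    # resolves it to the actual CSV instance.
    $csv = [wmi]$link.GroupComponent

    # Emit one row per association so the wanted volume can be picked out by eye
    # or with Where-Object.
    [pscustomobject]@{
        CsvName    = $csv.Name
        VolumeName = $csv.VolumeName
        Partition  = $link.PartComponent
    }
}
```

Piping the loop output to Where-Object allows filtering on whichever column turns out to hold the C:\ClusterStorage\Volume1 path.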

↧

Cluster Aware Updating, WSUS, and SCCM

June 19, 2015, 8:46 am

Howdy,

Wasn't sure where to post this so hopefully it works OK here.

We use SCCM to handle pushing patches to all of our workstations and servers. However, we don't have it push to Clustered machines since I don't think it supports Cluster Updating. All of our machines are set to use the SCCM server as their WSUS server but we don't actually approve anything there since SCCM takes care of that for us.

So, what is the best way to go about patching our clustered machines? Do I need a second WSUS server, or can I use the SCCM server?

Here's what I'm thinking so please let me know what I'm missing.

Option 1: Approve updates on the SCCM Server's WSUS program but set all workstations and servers to never check for updates so they don't get pushed out at all. This should not affect SCCM so everything should patch how it works now. However, I could then run CAU and I'm guessing that it would see the approved updates and compare those to the servers and know what they need and would patch them up properly?

Option 2: Have a second WSUS server and only point the Clustered Servers at it. Then I could run CAU against this server and it should work normally. Only problem would be figuring out how I could make sure the same updates were being installed from SCCM and WSUS so all the servers were in sync.

Option 3: I've seen some pretty complicated scripting methods to accomplish Cluster Updating via SCCM using Orchestrator or other tools. I know too little about those to say whether they are easy to set up and get working, but since they bypass WSUS I guess they'd be a way of keeping everything in sync.

Are there any other options or would any of these just not work? I'm guessing #1 would be a No since I generally see people saying to not touch WSUS on the SCCM server or bad things can happen.

Thanks!

↧

File cluster migration from Windows 2003 to 2008

June 19, 2015, 8:02 am

Hi,

We need to migrate a file cluster from Windows Server 2003 to Windows Server 2008.

How can I do that?

↧

troubleshoot cluster service does not start

June 21, 2015, 7:36 am

Hi,

We have a two-node virtual Windows Server 2012 failover cluster, but the cluster service does not start.

Can I get some basic or advanced troubleshooting steps or advice?

Thanks

↧

cluster error 1090 and 7024

June 19, 2015, 9:32 am

Windows Server 2012 R2 Standard

I am recovering an Exchange 2013 Mailbox server and have reformatted this Windows Server 2012 R2 machine. After a restart, the Cluster service is disabled. If I enable and start it, it logs the following errors in Event Viewer. Event 1090:

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          6/19/2015 7:16:17 PM
Event ID:      1090
Task Category: Startup/Shutdown
Level:         Critical
Keywords:
User:          SYSTEM
Computer:      ruh1mb02.ALJOMAIHBEV.com
Description:
The Cluster service cannot be started. An attempt to read configuration data from the Windows registry failed with error '2'. Please use the Failover Cluster Management snap-in to ensure that this machine is a member of a cluster. If you intend to add this machine to an existing cluster use the Add Node Wizard. Alternatively, if this machine has been configured as a member of a cluster, it will be necessary to restore the missing configuration data that is necessary for the Cluster Service to identify that it is a member of a cluster. Perform a System State Restore of this machine in order to restore the configuration data.

and 7024:

Log Name:      System
Source:        Service Control Manager
Date:          6/19/2015 7:16:17 PM
Event ID:      7024
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      ruh1mb02.ALJOMAIHBEV.com
Description:
The Cluster Service service terminated with the following service-specific error:
The system cannot find the file specified.

I would have tried the System State Restore, but there is no system state backup for this particular server.

How does one recover a cluster member in that case?

↧

Windows Server 2012 R2 failover cluster access denied

June 9, 2015, 1:48 am

Dear Experts,

I can't access a Windows Server 2012 R2 failover cluster. The error is shown below:

"You do not have administrative privileges on the cluster. Contact your network administrator to request access."

Error code: 0x80070005

Access denied.

PS C:\> Get-ClusterAccess
Get-ClusterAccess : You do not have administrative privileges on the cluster. Contact your network administrator to
request access.
    Access is denied
At line:1 char:1
+ Get-ClusterAccess
+ ~~~~~~~~~~~~~~~~~
    + CategoryInfo          : AuthenticationError: (:) [Get-ClusterAccess], ClusterCmdletException
    + FullyQualifiedErrorId : ClusterAccessDenied,Microsoft.FailoverClusters.PowerShell.GetClusterAccessCommand

The current user is an enterprise administrator.
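
Being an enterprise administrator may not by itself give rights on the cluster, since cluster access is granted through the cluster's own access list. A hedged sketch of adding an account to that list, to be run in an elevated session on a cluster node under an account that already has full cluster access. CONTOSO\clusteradmin is a placeholder, not an account from this environment:

```powershell
Import-Module FailoverClusters

# Show who currently has access to the cluster (requires existing cluster access).
Get-ClusterAccess

# Grant full control to the account that needs to manage the cluster.
# 'CONTOSO\clusteradmin' is a hypothetical account name.
Grant-ClusterAccess -User 'CONTOSO\clusteradmin' -Full
```

If the same error appears even for an account that is a local administrator on the node, it may be worth retrying from a PowerShell window started with "Run as administrator".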

↧

can not fix corrupt system file

June 22, 2015, 6:15 am

Even after running the DISM online restore, there is still a corrupted file that cannot be fixed. Please help.

2015-06-22 21:38:39, Info                  CBS    This session already attempted mapping cache rebuild, skip.
2015-06-22 21:38:39, Info                  CBS    Failed to find package: Package_2_for_KB3022345~31bf3856ad364e35~amd64~~6.3.1.5 from the index with mapping index packages recently rebuilt, [HRESULT = 0x800f090e - CBS_E_EMPTY_PACKAGE_MAPPING_INDEX]
2015-06-22 21:38:39, Info                  CBS    Failed to get WU category/updateID for package: Package_2_for_KB3022345~31bf3856ad364e35~amd64~~6.3.1.5 [HRESULT = 0x800f090e - CBS_E_EMPTY_PACKAGE_MAPPING_INDEX]
2015-06-22 21:38:39, Info                  CBS    Failed to get the mapping of package: Package_2_for_KB3022345~31bf3856ad364e35~amd64~~6.3.1.5, continue. [HRESULT = 0x800f090e - CBS_E_EMPTY_PACKAGE_MAPPING_INDEX]
2015-06-22 21:38:39, Info                  CBS    Failed to find [HRESULT = 0x800f090e - CBS_E_EMPTY_PACKAGE_MAPPING_INDEX]
2015-06-22 21:38:39, Info                  CBS    Failed to collect payload and there is nothing to repair. [HRESULT = 0x800f0906 - CBS_E_DOWNLOAD_FAILURE]
2015-06-22 21:38:39, Info                  CBS    Failed to repair store. [HRESULT = 0x800f0906 - CBS_E_DOWNLOAD_FAILURE]
2015-06-22 21:38:39, Info                  CBS    Ensure CBS corruption flag is clear
2015-06-22 21:38:39, Info                  CBS
=================================

Checking System Update Readiness.

(p) CSI Payload Corrupt amd64_microsoft-windows-u..ed-telemetry-client_31bf3856ad364e35_6.3.9600.17747_none_90df8130dac08ee0\utc.app.json
Repair failed: Missing replacement payload.
(p) CSI Payload Corrupt amd64_microsoft-windows-u..ed-telemetry-client_31bf3856ad364e35_6.3.9600.17747_none_90df8130dac08ee0\telemetry.ASM-WindowsDefault.json
Repair failed: Missing replacement payload.

2015-06-22 20:57:06, Info                  CSI    000008fc [SR] Could not reproject corrupted file [ml:520{260},l:114{57}]"\??\C:\ProgramData\Microsoft\Diagnosis\DownloadedSettings"\[l:24{12}]"utc.app.json"; source file in store is also corrupted
2015-06-22 20:57:06, Info                  CSI    000008fd Hashes for file member \??\C:\ProgramData\Microsoft\Diagnosis\DownloadedSettings\telemetry.ASM-WindowsDefault.json do not match actual file [l:66{33}]"telemetry.ASM-WindowsDefault.json" :
Found: {l:32 b:ErEvcGxrC5RD30CwVgig/0sasSdfpRLjd18ZiXseYV4=} Expected: {l:32 b:EeQJzlVPvq9GNIcA2FEwrOjEeuDam1G+ol3x61gKasQ=}
2015-06-22 20:57:06, Info                  CSI    000008fe Hashes for file member \SystemRoot\WinSxS\amd64_microsoft-windows-u..ed-telemetry-client_31bf3856ad364e35_6.3.9600.17747_none_90df8130dac08ee0\telemetry.ASM-WindowsDefault.json do not match actual file [l:66{33}]"telemetry.ASM-WindowsDefault.json" :
Found: {l:32 b:ErEvcGxrC5RD30CwVgig/0sasSdfpRLjd18ZiXseYV4=} Expected: {l:32 b:EeQJzlVPvq9GNIcA2FEwrOjEeuDam1G+ol3x61gKasQ=}
2015-06-22 20:57:06, Info                  CSI    000008ff [SR] Could not reproject corrupted file [ml:520{260},l:114{57}]"\??\C:\ProgramData\Microsoft\Diagnosis\DownloadedSettings"\[l:66{33}]"telemetry.ASM-WindowsDefault.json"; source file in store is also corrupted

↧

Migration did not succeed. Not enough disk space at '\'. Windows Server 2012 R2

June 22, 2015, 7:10 am

Hi.
I get the error referenced here:
https://support.microsoft.com/en-us/kb/2913461 but the OS is Windows Server 2012 R2 (not Windows Server 2012).

The machine is one of two Windows Server 2012 R2 Hyper-V failover cluster nodes using CSVs on iSCSI storage.
The VM was created locally in Hyper-V and uses two VHDX files (one for the OS, the other for data) on local disk.
To make the VM highly available, I configured it as a role in Failover Cluster Manager, which worked. I then moved the storage to the CSVs (OS disk on one CSV, data disk on the other) with Move > Virtual Machine Storage, and I got two errors in Event Viewer:

Hyper-V-VMMS Event id 20820

Storage migration for virtual machine 'machinename' (21330etc...) failed with error 'There is not enough space on the disk.' (0x80070070).

Hyper-V-VMMS Event id 20750

Migration did not succeed. Not enough disk space at '\'.

I've already done the same thing for other VMs without any problem, and there is plenty of free space on the CSVs.

So, is the fix also valid for Windows Server 2012 R2?

Thank you

↧

Failover Cluster Manager shows node down/wrong owner

June 22, 2015, 11:14 am

I'm getting up to speed on clustering and ran into something I can't figure out. Any help would be appreciated!

The Hyper-V hosts are running Server 2008 R2. The VMs are running Server 2012 R2.

Failover Cluster Manager is showing VM "FS1" online on host "HV1". However, it says VM "FS2" is offline and lists the wrong owner (shows HV2, but the actual host is HV3).

If I log in to either VM, Failover Cluster Manager shows both nodes online, and everything seems to be working fine. My suspicion is that FCM on the Hyper-V hosts is using the config from a server of the same name that used to reside on HV1. What's the best way to correct this?

↧

Windows 2012 R2 Hyper-V cluster survive a switch reboot

June 22, 2015, 12:23 pm

Hello,

I currently manage a 5 node Windows 2012 R2 Hyper-V cluster using shared iSCSI SAN storage and the virtual machines run on CSVs.

I need to reboot our switch stack which could bring down the networking for up to two minutes. Please note: The iSCSI switches are isolated from these switches so connectivity to iSCSI storage will not be affected during this process.

The NICs are teamed on the Hyper-V hosts but since both switches in the stack need to go down at the same time, this seems irrelevant.

The goal is to avoid shutting down 70+ servers along with the entire cluster during this network outage, and to have the cluster nodes maintain quorum and not go berserk.

Does anyone have any ideas or suggestions on how this can be done?

Is changing the heartbeat interval settings (SameSubnetDelay / SameSubnetThreshold) a viable option in this case?
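
For context on the arithmetic behind those two settings: the time a node can miss heartbeats before it is treated as down is roughly the delay between heartbeats multiplied by the number of misses allowed. A minimal sketch of that calculation; the function name and the example values are illustrative only, and whether the permitted ranges of both properties allow a product above the 120,000 ms needed for a two-minute outage should be checked against the documentation:

```powershell
# Seconds of missed heartbeats tolerated before a node is considered down:
# delay between heartbeats (ms) multiplied by the number of misses allowed.
# Get-HeartbeatToleranceSeconds is a hypothetical helper, not a built-in cmdlet.
function Get-HeartbeatToleranceSeconds {
    param(
        [int]$DelayMs,
        [int]$Threshold
    )
    return ($DelayMs * $Threshold) / 1000
}

# Worked example with assumed values: 1000 ms delay x 10 misses = 10 seconds.
Get-HeartbeatToleranceSeconds -DelayMs 1000 -Threshold 10

# On a cluster node, the live values can be read from the cluster object:
# $c = Get-Cluster
# Get-HeartbeatToleranceSeconds -DelayMs $c.SameSubnetDelay -Threshold $c.SameSubnetThreshold
```

Even if the nodes stay in the cluster, the virtual machines would still be unreachable from the network while the switches are down.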

I have also seen some suggestions on forcing a Disk only Quorum so as long as all nodes can see the Quorum disk, they will stay online. I already have a Quorum disk configured.

I appreciate your help with this.

Regards,

Chris

↧

Server 2012 R2 Witness Disk won't failover

May 25, 2015, 2:12 am

Hi,

I have a 2-node cluster with a shared witness disk for quorum. When I lose the connection to the disk from one node, ownership fails over to the other node, which is what I expect. However, if I test again shortly afterwards, it doesn't fail over; it just goes into the offline state, and I have to move it manually before it comes online.

It doesn't matter how many times I test this; it simply doesn't fail over after that first attempt. However, several hours later (I tested it again the next morning), it again fails over correctly, but subsequent tests bring it offline. I have looked through every tab in the properties for the witness, cluster name and IP address and cannot find anything that could relate to this timeout of several hours. All values are at their defaults, except that I increased the number of failures within the specified time from 1 to 10, which made no difference.

I get the following two events

ID 1038

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it

ID 1069

Cluster resource 'Cluster Disk 1' of type 'Physical Disk' in clustered role 'Cluster Group' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

How can I make the witness disk fail over more than once within several hours? I'm wondering if it's a PowerShell-only setting, but I have no idea what command that would be.
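
One place that may be worth checking is the failover policy of the 'Cluster Group' role itself rather than the disk resource: a clustered role has a FailoverThreshold (maximum failovers) and a FailoverPeriod (window in hours), and a window of several hours would be consistent with the behaviour described. A minimal sketch that only reads those values; whether they explain the symptom is an assumption, and the names 'Cluster Group' and 'Cluster Disk 1' are taken from event 1069 above:

```powershell
Import-Module FailoverClusters

# Failover policy of the core cluster role that owns the witness disk.
Get-ClusterGroup -Name 'Cluster Group' |
    Format-List Name, OwnerNode, State, FailoverThreshold, FailoverPeriod

# Restart policy of the witness disk resource itself, for comparison.
Get-ClusterResource -Name 'Cluster Disk 1' |
    Format-List Name, State, RestartAction, RestartThreshold, RestartPeriod
```

Comparing these with the values shown in the Failover Cluster Manager property tabs shows whether the earlier change from 1 to 10 was applied to the resource or to the role.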

thanks

Steve

↧

Windows 2012 Failover Clusters "An error occurred connecting to the Cluster"

June 8, 2015, 3:46 am

Good Morning

I have four failover clusters: one SQL cluster, one Hyper-V cluster, one IIS cluster (not NLB) and one file cluster, all running Windows Server 2012. The file cluster is fully up to date, with the latest firmware and drivers installed and a couple of Windows hotfixes for clustering. The Hyper-V and SQL clusters are scheduled for updates in the next couple of weeks, but both have been updated within the last couple of weeks.

The problem on all four clusters is that after a period of time we are no longer able to connect to the cluster from any node in it. For example, on the SQL cluster, after a period of time (this might be 3-4 weeks), if I connect to any of the nodes and open Failover Cluster Manager, I am unable to connect to the cluster. Failover Cluster Manager first shows "Connecting to Cluster" and "The operation is taking longer than expected". After about 2-3 minutes an error comes up saying "The operation has failed" and "An error occurred connecting to the cluster '<Cluster-Name>'". If I then click "See details" I get "An error occurred trying to display the cluster information", "One or more errors occurred", "Provider load failure".

This same problem happens on all of our Windows 2012 clusters. We do have a 2012 R2 Hyper-V cluster, but it is managed by a different team and I don't know whether they have the same issue.

It seems to be one of the hosts that causes the problem, normally the host that owns the quorum. If we reboot that host, the cluster fails all the roles over to a different node and is then accessible again. Once the rebooted node comes back online, everything is fine for a period of time until the problem happens again.

If anyone has any suggestions I would be very grateful.

Richard

↧