Failed to failover disk resource after adding it, Registry Cleanup.

February 21, 2014, 9:13 am

Hello Guys,

May I remove the keys under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ClusDisk\Parameters\AvailableDisks\ for disk signatures that are already under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ClusDisk\Parameters\Signatures?

This is on Windows Server 2003 R2.

Aren't the disks under "AvailableDisks" the ones that are not supposed to be part of the cluster, i.e. local disks?

The above is my question, summarized as briefly as I can. The following is just for reference for future searches by other users; it is how I got here:

Thanks,

David

1 - Cluster disk fails to fail over. This happened after removing it from the cluster and re-adding it to another cluster group using a different path (a mount point).

2 - Checked the Cluster.log and found:

00001190.00001978::2014/02/21-15:09:01.137 INFO Physical Disk <Archive Storage 10>: [DiskArb] DisksOpenResourceFileHandle: Attaching to disk with signature da5203aa

00001190.00001978::2014/02/21-15:09:01.137 INFO Physical Disk <Archive Storage 10>: [DiskArb] DisksOpenResourceFileHandle: Disk unique id present trying new attach

00001190.00001978::2014/02/21-15:09:01.137 INFO Physical Disk <Archive Storage 10>: [DiskArb] DisksOpenResourceFileHandle: Retrieving disk number from ClusDisk registry key

00001190.00001978::2014/02/21-15:09:01.137 ERR Physical Disk <Archive Storage 10>: Online: Unable to read disk name.

00001190.00001978::2014/02/21-15:09:01.137 ERR Physical Disk <Archive Storage 10>: Online, DisksOpenResourceFileHandle failed. Error: 87

00001190.00001978::2014/02/21-15:09:01.137 INFO Physical Disk <Archive Storage 10>: Online, setting ResourceState 4 .

3 - Noticed that the registries of the cluster nodes had some differences.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ClusDisk\Parameters\Signatures\

This key should contain the signatures of all disks included in the cluster.

4 - Manually created the signature key and the devicename string value as explained on http://support.microsoft.com/kb/932465/en-us.

5 - The disk could then be failed over.

6 - The disk signature also appears under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ClusDisk\Parameters\AvailableDisks\
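
To see which signatures are listed under both keys, the subkey names can be compared as sets. This is only a minimal sketch in Python, assuming the subkey names under Signatures and AvailableDisks have already been exported into two lists by some other means. The function name is hypothetical, and every sample signature except da5203aa (taken from the Cluster.log above) is made up for illustration.

```python
def overlapping_signatures(signatures, available_disks):
    """Return the disk signatures that appear under both registry keys.

    Names are compared case-insensitively, since hex signatures may be
    written in either case.
    """
    def normalize(names):
        return {n.strip().lower() for n in names if n.strip()}

    return sorted(normalize(signatures) & normalize(available_disks))


# da5203aa comes from the Cluster.log above; the other values are invented.
signatures_key = ["DA5203AA", "1b2c3d4e"]
available_disks_key = ["da5203aa", "9f8e7d6c"]

print(overlapping_signatures(signatures_key, available_disks_key))
```

This only lists the candidates for cleanup. Whether removing them from AvailableDisks is safe is still the open question.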

↧

Cluster Is Failing if the second node is down

February 20, 2014, 8:37 pm

In a cluster of 2 nodes and shared storage, I had to evict node 1 and replace the server physically. After that, failover went fine, with one big problem: if node 2 is off, the cluster fails due to losing the quorum disk, even if the quorum and all resources are hosted on node 1. But if node 1 is down, node 2 keeps hosting the cluster and services.

Is there anything that should be configured on the new node (node 1) concerning the quorum? Or should something be released from node 2 so that the first node can keep hosting it?

By the way, if I turn on node 1 alone, I can only see the shared disks in Disk Management as reserved, but they do not come online.
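
For reference, this is the vote arithmetic I would expect, as a rough Python sketch. It assumes a majority-based quorum with one vote per node plus one vote for the quorum disk; the actual quorum model of this cluster is not stated above, so treat the numbers as an assumption.

```python
def has_quorum(votes_present, total_votes):
    """A strict majority of all configured votes must be present."""
    return votes_present > total_votes // 2


# Assumed configuration: node 1 + node 2 + quorum disk = 3 votes.
TOTAL_VOTES = 3

# One node that can reach the quorum disk holds 2 of the 3 votes.
print(has_quorum(2, TOTAL_VOTES))

# One node that cannot bring the quorum disk online holds only 1 vote.
print(has_quorum(1, TOTAL_VOTES))
```

Under that assumption, node 1 alone should only lose quorum if it cannot get the disk online, which would match the disks showing as reserved.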

Thank you.

↧

Hyper-V Cluster CPU Compatibility

February 21, 2014, 7:59 am

I have two 2012 hosts in an HA Hyper-V cluster that both use the E5-2660 processor. Can I add a new hardware host that has the E5-2660v2 processor without having to tick the "Migrate to a physical computer with a different processor version" option? The instruction sets listed on the Intel website show they are the same.

Just wondered if anyone has any experience with this - because I have none.

↧

node failed over then later failed back

February 21, 2014, 11:45 am

We have a 2-node, disk majority failover cluster on Windows Server 2008 R2.

Recently, we had a data center shutdown. Our active node was shut down and did, in fact, fail over to the inactive node. Five minutes after the active node shut down, the other node (now the active node) also shut down. About 3 hours later, the nodes were powered up. I checked the new active node, and all services and resources still existed on it. Then the other node was powered up.

Later, we checked both nodes and discovered that the active node no longer had the resources and services. They had returned to the original node, making it active again. What could cause this?

Also, is there a log that captures or provides information about when a failover cluster fails over? It may tell me what happened.

↧

Uncluster SQL

February 22, 2014, 12:43 am

Hi,

We have recently decided to uncluster our 2-node active/passive SQL cluster with the intention of then virtualising the remaining single node.

After some research it seems I have 2 options:

1. Convert two node SQL cluster to standalone instance of SQL
2. Retire one node of cluster or convert two node SQL cluster to single node SQL cluster

What exactly is the difference here?

Surely option 2 is the easier one, so what benefit is there in actually performing option 1 instead?

If I simply evict the passive node, I can still operate as a 1-node cluster. In that case, can I then simply virtualise the disks on the active node (including the quorum and the SAN-attached disks)?

↧

Sql Clustering

February 22, 2014, 2:44 am

Hi,

I have node1 and node2. I configured Windows clustering and also configured MSDTC.

SQL Server and SQL Agent are not showing in the other resources.

I ran cluster.exe restype "SQL Server Agent" /create /DLL:SQAGTRES.DLL

Created successfully.

cluster.exe restype "SQL Server" /create /DLL:SQSRVRES.DLL

Created successfully.

They are still not listed under other resources or more resources in failover clustering management.

Rgds,

Udaiyar

↧

cluster validation failed - unable to connect via WMI

February 22, 2014, 2:47 pm

When I try to run a cluster validation on my 2-node cluster, one of the remote nodes shows this:

Failed to validate node xxx. Unable to connect to xxx via WMI. This may be a networking issue or firewall configuration on xxx. Access is denied. (Exception from HRESULT: 0x8..

↧

request time out when pinging sql cluster ip

February 22, 2014, 10:17 am

I am getting a lot of request timeouts when pinging the cluster IP address on my 2-node cluster. But when I do a continuous ping on the server IP address itself, it seems to be OK. The timeouts seem to happen only when pinging the cluster IP.

I created a new cluster IP on the same cluster and pinging it seems to be fine.

The IP with the problem is a SQL cluster IP.

Any idea?

↧

moving sql to new service application

February 22, 2014, 4:57 pm

I currently have 2 SQL servers in a cluster. I created a new service/app and now I want to move the whole SQL Server app to this new app. How can this be done?

↧

'Beowulf Cluster' under a Windows environment ?

February 23, 2014, 10:26 pm

Hello,

Anyone have thoughts on our experiment to build a simple 'Beowulf Cluster' under a Windows environment ? Presently we have four older Dell single core machines, nearly identical, that would fit well into this test. I have been reviewing the book from MIT entitled "Beowulf Cluster Computing with Linux" published in 2003 but that uses Ubuntu as the basis for the cluster.

I have also been looking at articles from the Ubuntu wiki entitled: "Ubuntu Kerrighed Cluster Guide" at https://wiki.ubuntu.com/EasyUbuntuClustering/UbuntuKerrighedClusterGuide with a step by step guide on how to set up a Kerrighed Cluster.

In 2006, Microsoft introduced Windows Compute Cluster Server; however, this appears to be for high-performance computing clusters and I think it was meant to replace Linux. It's probably not applicable to this project. Windows CCS is also fairly expensive for just an experiment.

I am learning my way through this experimental project and therefore am starting with no knowledge on the subject.

Any help or direction would be greatly appreciated.

Thanks,

Dan

↧

Choosing the correct clustering method

February 6, 2014, 3:32 pm

Hi, I currently have a small 3-server environment (Windows Server 2012): 1 PDC and 2 servers running Hyper-V. I have 8 or so Hyper-V machines on one of the servers and use Hyper-V Replica to replicate all of them to the other server. I would like to upgrade this to something that will provide hot migration/failover for the VMs. I was considering a storage cluster with 2 machines for storage and a VM cluster with 2 machines for migration. My confusion is what to cluster: do I cluster the physical machines, the VMs, or both? Note that SQL Server is running on one of the VMs.

Lee

↧

8.3 Name Generation in Cluster

February 24, 2014, 5:31 am

I have a Scale Out File Server based on Server 2012 which is used for storage for our Hyper-V VHDX's. We have two nodes in the Cluster and the underlying storage is a Dell PowerVault MD3220 (SAS).

We have had some VMs randomly shut off and not be recovered by Failover Cluster Manager in our 2012 Hyper-V Cluster. SC Ops Man 2012 indicates that the problem is:

"The SMB client failed to resume a handle on a file share with continuous availability"

This is because:

"This error can occur if the Resume Key Filter did not acknowledge the handle, which indicates Resume Key Filter is not attached to the necessary cluster disk"

My research tells me that this is caused by 8.3 Naming Convention being enabled on the SMB Server's C: partition which is where the CSVs are stored. SMB Transparent Failover does not support 8.3 Naming Convention.

I understand that 8.3 Naming Convention needs to be disabled and the system scanned for any short file names. I am happy to do this but was wondering if there are any implications to doing it. For example, could it stop the File Server from working, or could it rename VHDs and stop the Hyper-V machines from working?

Many thanks

↧

Hyper-V 2008 R2 cluster down

February 24, 2014, 8:14 am

Hello,

We have a 3-node Hyper-V 2008 R2 failover cluster. Yesterday we had problems and all VMs on node 3 went down. At 17:00 the backup started for the VMs that are on CSV (with DPM 2010 and hardware providers for Dell), but then one node crashed. It looks like node 3 tried to take ownership of CSV04 but node 1 did not accept that. See the cluster log below. Can someone tell me what triggered this, and how to fix it? Many thanks!

It starts with messages that some Volume Manager Disk Group and GeoCluster disks are not found. This means nothing to me.

00000af0.00000c24::2014/02/23-17:00:27.738 WARN Resource type Volume Manager Disk Group not found.
00000af0.00000c24::2014/02/23-17:00:27.738 WARN Resource type GeoCluster Replicated Disk not found.
00000af0.000017fc::2014/02/23-17:00:35.117 INFO [RCM] rcm::RcmApi::MoveGroup: (6d07505c-cd56-4354-bb78-f0d452eb7350, 1)
00000af0.000017fc::2014/02/23-17:00:35.117 INFO [RCM] rcm::RcmGroup::Move: (6d07505c-cd56-4354-bb78-f0d452eb7350, 1)
00000af0.000017fc::2014/02/23-17:00:35.117 INFO [RCM] rcm::RcmGroup::Move: Bringing group '6d07505c-cd56-4354-bb78-f0d452eb7350' offline first...
00000af0.000017fc::2014/02/23-17:00:35.117 INFO [RCM] TransitionToState(CSV04) Online-->OfflineCallIssued.
00000af0.000017fc::2014/02/23-17:00:35.117 INFO [RCM] rcm::RcmGroup::UpdateStateIfChanged: (6d07505c-cd56-4354-bb78-f0d452eb7350, Online --> Pending)
00000af0.000017fc::2014/02/23-17:00:35.117 INFO [DCM] PreOffline for CSV resource CSV04
00000af0.000017fc::2014/02/23-17:00:35.117 INFO [DCM] Unmapping volumes for cfs resource CSV04
00000af0.00002330::2014/02/23-17:00:35.117 INFO [NM] Received request from client address 172.16.0.1.
00000af0.00000d04::2014/02/23-17:00:35.179 INFO [DCM] Processing message dcm/pause
00000af0.00000d04::2014/02/23-17:00:35.179 INFO [DCM] Push.AsyncPauseDisk for 7177671d-4b75-4c6e-ad3d-4ff3671ce779
00000af0.00001594::2014/02/23-17:00:35.179 INFO [DCM] SyncHandler for 7177671d-4b75-4c6e-ad3d-4ff3671ce779
00000af0.00001594::2014/02/23-17:00:35.179 INFO [DCM] enter_AllGood(7177671d-4b75-4c6e-ad3d-4ff3671ce779) P0..75 P0..150
00000af0.00001594::2014/02/23-17:00:35.179 INFO [DCM] MappingManager::PauseVolume 'Volume4'
00000af0.00001594::2014/02/23-17:00:35.179 INFO [DCM] Filter.ChangeState (ctx=2, state=CfsVolumeStatePaused)
00000af0.00002130::2014/02/23-17:00:35.273 INFO [NM] Received request from client address HV03.
00000af0.00002130::2014/02/23-17:00:35.288 ERR   [RCM] rcm::RcmResControl::DoResourceControl: ERROR_RESOURCE_NOT_ONLINE(5004)' because of 'PreprocessControl(16777765) failed for resource 'CSV04'.'
00000af0.00002130::2014/02/23-17:00:35.288 WARN [RCM] ResourceControl(STORAGE_GET_SHARED_VOLUME_INFO) to CSV04 returned 5004.
00000af0.00002130::2014/02/23-17:00:35.288 INFO [NM] Received request from client address HV03.
00000af0.00000c24::2014/02/23-17:00:35.320 ERR   [RCM] rcm::RcmResControl::DoResourceControl: ERROR_RESOURCE_NOT_ONLINE(5004)' because of 'PreprocessControl(16777765) failed for resource 'CSV04'.'
00000af0.00000c24::2014/02/23-17:00:35.320 WARN [RCM] ResourceControl(STORAGE_GET_SHARED_VOLUME_INFO) to CSV04 returned 5004.
00000f30.00001888::2014/02/23-17:00:35.663 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'
00000af0.00002960::2014/02/23-17:00:35.663 WARN [RCM] Failed to load restype 'MSMQ': error 21.
00000af0.00002960::2014/02/23-17:00:35.663 WARN [RCM] rcm::RcmApi::ResTypeControl: ResType MSMQ's DLL is not present on this node. Attempting to find a good node...
00000f30.00001888::2014/02/23-17:00:35.663 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'
00000af0.00000c24::2014/02/23-17:00:35.663 WARN [RCM] Failed to load restype 'MSMQ': error 21.
00000f30.00001888::2014/02/23-17:00:35.710 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'
00000af0.00000c24::2014/02/23-17:00:35.710 WARN [RCM] Failed to load restype 'MSMQ': error 21.
00000f30.00001888::2014/02/23-17:00:35.912 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'
00000af0.00002960::2014/02/23-17:00:35.912 WARN [RCM] Failed to load restype 'MSMQTriggers': error 21.
00000af0.00002960::2014/02/23-17:00:35.912 WARN [RCM] rcm::RcmApi::ResTypeControl: ResType MSMQTriggers's DLL is not present on this node. Attempting to find a good node...
00000f30.00001888::2014/02/23-17:00:35.912 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'
00000af0.00000c24::2014/02/23-17:00:35.912 WARN [RCM] Failed to load restype 'MSMQTriggers': error 21.
00000f30.00001888::2014/02/23-17:00:35.912 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'
00000af0.00000c24::2014/02/23-17:00:35.912 WARN [RCM] Failed to load restype 'MSMQTriggers': error 21.
00000af0.00000c24::2014/02/23-17:00:35.928 WARN Resource type Volume Manager Disk Group not found.
00000af0.00000c24::2014/02/23-17:00:35.928 WARN Resource type GeoCluster Replicated Disk not found.
00000af0.00002960::2014/02/23-17:00:36.209 INFO [NM] Received request from client address HV03.
00000f30.000029d4::2014/02/23-17:00:36.599 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'
00000af0.00000c24::2014/02/23-17:00:36.599 WARN [RCM] Failed to load restype 'MSMQ': error 21.
00000af0.00000c24::2014/02/23-17:00:36.599 WARN [RCM] rcm::RcmApi::ResTypeControl: ResType MSMQ's DLL is not present on this node. Attempting to find a good node...
00000f30.000029d4::2014/02/23-17:00:36.599 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'
00000af0.00002960::2014/02/23-17:00:36.599 WARN [RCM] Failed to load restype 'MSMQ': error 21.
00000f30.000029d4::2014/02/23-17:00:36.599 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'
00000af0.00000c24::2014/02/23-17:00:36.599 WARN [RCM] Failed to load restype 'MSMQTriggers': error 21.
00000af0.00000c24::2014/02/23-17:00:36.599 WARN [RCM] rcm::RcmApi::ResTypeControl: ResType MSMQTriggers's DLL is not present on this node. Attempting to find a good node...
00000f30.000029d4::2014/02/23-17:00:36.599 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'
00000af0.00002960::2014/02/23-17:00:36.599 WARN [RCM] Failed to load restype 'MSMQTriggers': error 21.
00000f30.000029d4::2014/02/23-17:00:36.614 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'
00000af0.00002960::2014/02/23-17:00:36.614 WARN [RCM] Failed to load restype 'MSMQ': error 21.
00000f30.000029d4::2014/02/23-17:00:36.630 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'
00000af0.00002960::2014/02/23-17:00:36.630 WARN [RCM] Failed to load restype 'MSMQTriggers': error 21.
00000af0.00000c24::2014/02/23-17:00:36.911 ERR   [RCM] rcm::RcmResControl::DoResourceControl: ERROR_RESOURCE_NOT_ONLINE(5004)' because of 'PreprocessControl(16777765) failed for resource 'CSV04'.'
00000af0.00000c24::2014/02/23-17:00:36.911 WARN [RCM] ResourceControl(STORAGE_GET_SHARED_VOLUME_INFO) to CSV04 returned 5004.
00000af0.00002960::2014/02/23-17:00:36.989 ERR   [RCM] rcm::RcmResControl::DoResourceControl: ERROR_RESOURCE_NOT_ONLINE(5004)' because of 'PreprocessControl(16777765) failed for resource 'CSV04'.'
00000af0.00002960::2014/02/23-17:00:36.989 WARN [RCM] ResourceControl(STORAGE_GET_SHARED_VOLUME_INFO) to CSV04 returned 5004.
00000af0.000014c8::2014/02/23-17:00:39.251 INFO [DCM] filter.Event ->CfsVolumeStatePaused FromPause for ctx=2 status 00000000
00000af0.00001594::2014/02/23-17:00:39.251 INFO [DCM] volume paused 'Volume4'
00000af0.000017fc::2014/02/23-17:00:39.391 INFO [DCM] dcm/pause successfully completed on all nodes
00000af0.000017fc::2014/02/23-17:00:39.391 INFO [DCM] removing share 7177671d-4b75-4c6e-ad3d-4ff3671ce779-135266304$, status 0
00000af0.00002960::2014/02/23-17:00:39.391 INFO [NM] Received request from client address 172.16.0.1.
00000af0.000017fc::2014/02/23-17:00:39.438 INFO [DCM] ClearVolumeStates: resource 'CSV04' states <vector len='2'>
00000af0.000017fc::2014/02/23-17:00:39.438 INFO      <item>1</item>
00000af0.000017fc::2014/02/23-17:00:39.438 INFO      <item>135266304 0</item>
00000af0.000017fc::2014/02/23-17:00:39.438 INFO </vector>
00001680.000023b4::2014/02/23-17:00:39.438 INFO [RES] Physical Disk <CSV04>: Offline request.
00001680.00001e04::2014/02/23-17:00:39.438 INFO [RES] Physical Disk: DriveLetter mask: 0x0
00000af0.000017fc::2014/02/23-17:00:39.438 INFO [RCM] HandleMonitorReply: OFFLINERESOURCE for 'CSV04', gen(0) result 997.
00000af0.000017fc::2014/02/23-17:00:39.438 INFO [RCM] TransitionToState(CSV04) OfflineCallIssued-->OfflinePending.
00001680.00001e04::2014/02/23-17:00:39.454 INFO [RES] Physical Disk <CSV04>: HardDiskpCloseSVIHandles: Exit
00001680.00001e04::2014/02/23-17:00:39.454 INFO [RES] Physical Disk <CSV04>: VolumeIsNtfs: Volume\\?\GLOBALROOT\Device\Harddisk2\Partition2\ has FS type NTFS
00001680.00001e04::2014/02/23-17:00:39.454 INFO [RES] Physical Disk <CSV04>: OfflineThread: partition 2 offset 135266304 is a CSV volume, skipping lock volume
00001680.00001e04::2014/02/23-17:00:40.421 INFO [RES] Physical Disk: ReleaseDisk: stop reserve succeeded on device 2 (sig ceba743a)
00001680.00001e04::2014/02/23-17:00:40.452 INFO [RHS] Resource CSV04 has come offline. RHS is about to report resource status to RCM.
00000af0.00002960::2014/02/23-17:00:40.452 INFO [RCM] HandleMonitorReply: OFFLINERESOURCE for 'CSV04', gen(0) result 0.
00000af0.00002960::2014/02/23-17:00:40.452 INFO [RCM] TransitionToState(CSV04) OfflinePending-->OfflineSavingCheckpoints.
00000af0.00002960::2014/02/23-17:00:40.452 INFO [RCM] TransitionToState(CSV04) OfflineSavingCheckpoints-->Offline.
00000af0.00002960::2014/02/23-17:00:40.452 INFO [RCM] rcm::RcmGroup::UpdateStateIfChanged: (6d07505c-cd56-4354-bb78-f0d452eb7350, Pending --> Offline)
00000af0.00000c24::2014/02/23-17:00:40.452 INFO [RCM] rcm::RcmGum::GroupMoveOperation(6d07505c-cd56-4354-bb78-f0d452eb7350,1)
00000af0.000017fc::2014/02/23-17:00:40.452 WARN [RCM] rcm::RcmApi::ResourceControl: forwarded, no retry on error 5908
00000af0.000017fc::2014/02/23-17:00:40.452 WARN [RCM] ResourceControl(GET_CLASS_INFO) to CSV04 returned 5908.
00000af0.00002960::2014/02/23-17:00:40.452 WARN [RCM] rcm::RcmApi::GetResourceState: retrying: 6d07505c-cd56-4354-bb78-f0d452eb7350, 5908.
00000af0.000017fc::2014/02/23-17:00:40.452 ERR   [RCM] s_RcmRpcGetResourceState: ERROR_CLUSTER_GROUP_MOVING(5908)' because of ''CSV04' is owned by node 1, not 3.'
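
To cut through the noise in a log like this, the WARN and ERR entries can be counted per component. The following is a minimal Python sketch, assuming every line follows the layout seen above (process.thread IDs, timestamp, level, optional [component] tag, message). The regular expression and function names are my own, not part of any cluster tooling.

```python
import re
from collections import Counter

# Layout assumed from the excerpt above:
# <pid>.<tid>::<yyyy/mm/dd-hh:mm:ss.mmm> <LEVEL> [<COMPONENT>] <message>
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-f]{8})\.(?P<tid>[0-9a-f]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>INFO|WARN|ERR)\s+"
    r"(?:\[(?P<component>\w+)\]\s+)?(?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def count_problems(lines):
    """Count WARN and ERR entries per (level, component) pair."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["level"] != "INFO":
            counts[(entry["level"], entry["component"])] += 1
    return counts


sample = [
    "00000af0.00000c24::2014/02/23-17:00:27.738 WARN Resource type Volume Manager Disk Group not found.",
    "00000af0.000017fc::2014/02/23-17:00:35.117 INFO [DCM] PreOffline for CSV resource CSV04",
    "00000af0.00002130::2014/02/23-17:00:35.288 WARN [RCM] ResourceControl(STORAGE_GET_SHARED_VOLUME_INFO) to CSV04 returned 5004.",
    "00000af0.000017fc::2014/02/23-17:00:40.452 ERR   [RCM] s_RcmRpcGetResourceState: ERROR_CLUSTER_GROUP_MOVING(5908)' because of ''CSV04' is owned by node 1, not 3.'",
]

for (level, component), n in count_problems(sample).items():
    print(level, component, n)
```

Lines without a bracketed tag, such as the "Resource type ... not found" warnings, are counted with component None.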

↧

Cannot clear "Current read-only" on pass through disk

February 24, 2014, 8:41 am

This is not the end of the world, but it's very annoying so I'm hoping somebody can explain what's going on or perhaps suggest some additional troubleshooting.

Here's the scenario: Server 2012 Core VM with the file services role installed and running as a role on a Server 2012 Hyper-V failover cluster. IDE 0 is a standard VHDX in clustered storage. There are 2 pass through disks on SCSI targets 1 and 2. I proceed to attempt to add a 3rd pass-through disk:

1. Create a 750GB LUN
2. Mask the LUN to the cluster
3. Online the new disk on a cluster node
4. Initialize the disk on the node
5. Offline the disk
6. Using Failover manager, add the disk as new Available Storage in the cluster
7. Using Failover manager to modify the file server VM's settings, add the new storage to SCSI target 3
8. Using diskpart on the VM, clear the readonly flag from the disk

At this point, I was able to create a partition on the disk using new-partition with the -usemaximumsize flag, but I was unable to format it. It turns out the new partition size was 0 bytes. So I went back into diskpart and lo and behold, although the readonly flag is cleared, the "Current readonly status" on the disk is still yes.

To test the issue, I offlined the disk in the VM, removed it from the virtual SCSI chain, removed it from cluster storage and then onlined it in the owner node. I was able to partition it, format it and create an empty folder, so it is not flagged readonly at the host or SAN.

So I offlined it on the node and added it back to cluster available storage and then added it back to SCSI target 3 on the VM.

Again, I removed the readonly flag from the disk and again it cleared, but the disk's "current" status remained "Yes" and I was unable to manipulate the disk.

Stop/start vds did nothing and as this is a production server I could not restart it midday.

So I offlined the disk in the VM and removed it from SCSI target 3, then added it to SCSI target 4. This time, when I online it in the VM and use diskpart to clear readonly, both readonly and "current" readonly clear just fine and now the pass-through disk is operating as expected alongside the other 2 pass-through disks on the server.

Any ideas what went wrong in all this or how I can clear SCSI target 3 for another disk without having to restart the VM?

↧

Cluster Updating Readiness Results

February 24, 2014, 11:48 am

I am configuring Cluster-Aware Updating on a 5-node cluster. At this time I'm not enabling Self-Updating mode. Instead I'm going to use Remote-Updating mode. In the Cluster Aware Updating console I ran Analyze cluster updating readiness. I have two warnings that I can safely ignore (about local machine proxy and CAU clustered role not being enabled). But there is an error that I'm stuck on. Rule ID 13 gives me an error saying, "The configured CAU plug-in must be registered on all failover cluster nodes."

The resolution says to ensure that the configured CAU plug-in is installed on all the cluster nodes. I ran Get-CauPlugin | fl -Property * on each node and each of them returned the expected Microsoft.WindowsUpdatePlugin and Microsoft.HotfixPlugin listings.

Any ideas on how I can troubleshoot and get the error cleared?

↧

SQL 2012 std Cluster/ W2012R2 on ESXi 5.1

February 25, 2014, 10:26 am

From reading the posts, it appears that ESXi 5.1 does support MSCS on W2012 server.

We are trying to set up a SQL 2012 cluster (not AlwaysOn, because we don't have an Enterprise license) on two VM hosts running W2012 R2 server.

I assume the scenario above is supported on ESXi v5.1, but I am not sure whether setting up a SQL 2012 cluster on VMware is a good idea compared with using a physical server with direct-attached disks.

Is there good documentation for the setup?

Thanks again


↧

How to make 2 Node File Cluster with SAS Disks

February 25, 2014, 2:46 am

Hello,

I can't find any detailed, specific information or an answer to this question.

I have 2 servers, say HP, with 4 SAS disks each: 1 for the OS (2012R2Data) and 3 available.

I want to create a fault-tolerant SMB 3.0 share for my Hyper-V nodes, and to build a Hyper-V FT cluster afterwards.

I see in the requirements that SAS disks will work, but while creating the cluster I cannot get the cluster to see the disks from both servers (6 free units in total).

If it is done via an iSCSI target, both servers have access to the targets and the cluster sees all the disks available through iSCSI. But how do I make a SAS disk available to the other server?

Is what I need possible with this configuration? Is it an option to make each server both an iSCSI target and an initiator? (That would be complicated and slow, I think.)

So, what have I misunderstood? How do I make an FT file cluster with 2 servers with SAS drives?

Any info about this SPECIFIC config would be welcome, as would any general step-by-step guides (please do not link me to guides for other configs, such as iSCSI or additional servers; I have seen a lot of them :()

thanks

↧

Windows 2012 Hyper-v Cluster Live Migration Failure

January 22, 2013, 12:07 pm

This is a fairly new deployment. When I first started having this issue, I found http://support.microsoft.com/kb/2779204, which suggests that gpupdate /force will temporarily resolve the issue. That worked for me for several weeks, but no longer. It was always temporary and I had to do it at least once a day, but even that doesn't work anymore and my Hyper-V guests are stuck. I also suspect that if I have a Hyper-V host failure, the guests on the failed system will not come back up on the operational host.

KB2779204 suggests adding NT Virtual Machine\Virtual Machines to the entries for Log on as a Service. I created a dedicated OU for the virtual hosts (as suggested by the KB article) and moved them to the new OU. I then added NT Virtual Machine\Virtual Machines to the entries for Log on as a Service, ran gpupdate /force, and waited minutes, then hours. It still doesn't work. I get 2 slightly different errors depending on which server I initiate the live migration from.

If from the server that currently is running the guest (HV1):

Live migration of 'Virtual Machine vg1' failed.

Virtual machine migration operation for 'dc4' failed at migration source 'HV1'. (Virtual machine ID 140C8893-44EC-481B-B8D5-52FCB8D422DC)

The Virtual Machine Management Service failed to establish a connection for a Virtual Machine migration with host 'HV2': General access denied error (0x80070005).

The Virtual Machine Management Service failed to establish a connection for a Virtual Machine migration because the destination host rejected the request: General access denied error (0x80070005).

If from the server attempting to move the guest to (HV2):

Live migration of 'Virtual Machine vg1' failed.

Virtual machine migration operation for 'vg1' failed at migration destination 'HV1'. (Virtual machine ID 140C8893-44EC-481B-B8D5-52FCB8D422DC)

'vg1' Failed to create Planned Virtual Machine at migration destination: Logon failure: the user has not been granted the requested logon type at this computer. (0x80070569). (Virtual machine ID 140C8893-44EC-481B-B8D5-52FCB8D422DC)
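
For anyone searching on these codes: both are HRESULTs that wrap a plain Win32 error. A small Python sketch of the decoding follows, using the standard HRESULT bit layout (failure bit, facility, 16-bit code). The function name and the two-entry lookup table are just for illustration.

```python
def decode_hresult(hresult):
    """Split a 32-bit HRESULT into (failed, facility, code)."""
    failed = bool(hresult & 0x80000000)   # top bit set means failure
    facility = (hresult >> 16) & 0x1FFF   # which subsystem raised it
    code = hresult & 0xFFFF               # the underlying error code
    return failed, facility, code


FACILITY_WIN32 = 7

# Only the two Win32 codes that appear in the errors above.
WIN32_NAMES = {
    5: "ERROR_ACCESS_DENIED",
    1385: "ERROR_LOGON_TYPE_NOT_GRANTED",
}

for value in (0x80070005, 0x80070569):
    failed, facility, code = decode_hresult(value)
    if facility == FACILITY_WIN32:
        name = WIN32_NAMES.get(code, "unknown Win32 error")
    else:
        name = "not a Win32 error"
    print(hex(value), failed, facility, code, name)
```

So the source side reports a generic access denied (5), while the destination side reports the more specific logon type failure (1385), which fits the Log on as a Service right discussed in the KB article.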

↧

Not able to control TCP/IP services through NLB of windows server 2008 R2

February 26, 2014, 1:34 am

Hi,

I am not able to stop/break the TCP/IP connection, even after stopping the cluster node configured in NLB.

I can also still see the TCP/IP connection when running netstat from the command prompt.

↧

how to enable a failed ip address(?)

February 26, 2014, 1:38 am

Windows Server 2008 R2

We have a cluster for an Exchange DAG with two IP addresses (one local, the other remote). Usually the local IP address is online and the remote IP address is not. This morning I found that the remote one had come online and the local one was offline. Because I was in a hurry, I just rebooted the remote server to force the DAG to use the local IP address. It worked.

My question is: without rebooting the other server, how can I enable the offline IP address?

↧