Channel: High Availability (Clustering) forum
Viewing all 4519 articles
Browse latest View live

S2D StoragePool and Virtual disk size

Hi, All.

I'm testing a failover cluster with S2D enabled on WS2019. I have 3 VMs, each with two 5 GB HDDs. I've created a failover cluster, enabled S2D, and created a storage pool.

PS C:\Windows\system32> Get-ClusterS2D

CacheMetadataReserveBytes : 34359738368
CacheModeHDD              : ReadWrite
CacheModeSSD              : WriteOnly
CachePageSizeKBytes       : 16
CacheState                : Disabled
Name                      : s2d-cluster2
ScmUse                    : Cache
State                     : Enabled

PS C:\Windows\system32> Get-StorageSubsystem *cluster* | Get-PhysicalDisk

DeviceId FriendlyName        SerialNumber                     MediaType CanPool OperationalStatus HealthStatus Usage       Size
-------- ------------        ------------                     --------- ------- ----------------- ------------ -----       ----
3002     VMware Virtual disk 6000c29503bcbdb2cf84ec2867ea371b HDD       True    OK                Healthy      Auto-Select 5 GB
1002     VMware Virtual disk 6000c29a55adf0ba2fb870ad3a9dfd32 HDD       True    OK                Healthy      Auto-Select 5 GB
3001     VMware Virtual disk 6000c291b4372413c4f7feaa10fe9beb HDD       True    OK                Healthy      Auto-Select 5 GB
2002     VMware Virtual disk 6000c2993762bd6617b3cd5eef1ff9d0 HDD       True    OK                Healthy      Auto-Select 5 GB
2001     VMware Virtual disk 6000c292a3946d8f1b33bb0f716ecd44 HDD       True    OK                Healthy      Auto-Select 5 GB
1001     VMware Virtual disk 6000c294f797902a11829554da27c6ae HDD       True    OK                Healthy      Auto-Select 5 GB


Questions about space allocation:
1. After creating the pool I see only 26.9 GB of free space. What is AllocatedSize? And where has the remaining 1.6 GB gone?

PS C:\Windows\system32> Get-StoragePool

FriendlyName OperationalStatus HealthStatus IsPrimordial IsReadOnly    Size AllocatedSize
------------ ----------------- ------------ ------------ ----------    ---- -------------
Primordial   OK                Healthy      True         False        70 GB       29.9 GB
S2D_4        OK                Healthy      False        False      26.9 GB        1.5 GB
Primordial   OK                Healthy      True         False        70 GB       29.9 GB

2. I tried to create a 3 GB virtual disk with mirror resiliency. I expected the pool to shrink by 6 GB, but the actual footprint is 8 GB.

PS C:\Windows\system32> New-VirtualDisk -StoragePoolFriendlyName "S2D_4" -FriendlyName disk1 -ResiliencySettingName Mirror -NumberOfDataCopies 2 -ProvisioningType Fixed -Size 3GB

FriendlyName ResiliencySettingName FaultDomainRedundancy OperationalStatus HealthStatus Size FootprintOnPool StorageEfficiency
------------ --------------------- --------------------- ----------------- ------------ ---- --------------- -----------------
disk1        Mirror                1                     OK                Healthy      3 GB            8 GB            37.50%


PS C:\Windows\system32> Get-StoragePool

FriendlyName OperationalStatus HealthStatus IsPrimordial IsReadOnly    Size AllocatedSize
------------ ----------------- ------------ ------------ ----------    ---- -------------
Primordial   OK                Healthy      True         False        70 GB       29.9 GB
S2D_4        OK                Healthy      False        False      26.9 GB        9.5 GB
Primordial   OK                Healthy      True         False        70 GB       29.9 GB


3. Next I tried to create a 500 MB virtual disk with mirror resiliency. I expected the pool to shrink by 1000 MB, but the footprint is again 8 GB.

PS C:\Windows\system32> New-VirtualDisk -StoragePoolFriendlyName "S2D_4" -FriendlyName disk2 -ResiliencySettingName Mirror -NumberOfDataCopies 2 -ProvisioningType Fixed -Size 500MB

FriendlyName ResiliencySettingName FaultDomainRedundancy OperationalStatus HealthStatus Size FootprintOnPool StorageEfficiency
------------ --------------------- --------------------- ----------------- ------------ ---- --------------- -----------------
disk2        Mirror                1                     OK                Healthy      3 GB            8 GB            37.50%


PS C:\Windows\system32> Get-StoragePool

FriendlyName OperationalStatus HealthStatus IsPrimordial IsReadOnly    Size AllocatedSize
------------ ----------------- ------------ ------------ ----------    ---- -------------
Primordial   OK                Healthy      True         False        70 GB       29.9 GB
S2D_4        OK                Healthy      False        False      26.9 GB       17.5 GB
Primordial   OK                Healthy      True         False        70 GB       29.9 GB

Another question: if I try to create a virtual disk from the GUI, most of the options are absent, and I can only set the name and size of the new virtual disk. For example, I can't set the resiliency to two-way mirror.

Please help: what did I do wrong?
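
A quick way to sanity-check the figures in this post is to redo the arithmetic. The sketch below uses only numbers copied from the outputs above; the variable names are made up for illustration. The efficiency line assumes StorageEfficiency is Size divided by FootprintOnPool, which is consistent with the 37.50% shown. It does not explain why the pool allocates space the way it does.

```powershell
# All values are in GB and are copied from the outputs in this post.
$diskCount   = 6                 # 3 VMs x 2 disks each
$diskSizeGB  = 5
$poolSizeGB  = 26.9              # Size of S2D_4 from Get-StoragePool
$allocatedGB = 1.5, 9.5, 17.5    # AllocatedSize: empty pool, after disk1, after disk2

# Raw capacity of the six disks.
$rawGB = $diskCount * $diskSizeGB

# Raw capacity minus Size minus the initial AllocatedSize
# (the "missing" space asked about in question 1).
$unaccountedGB = [math]::Round($rawGB - $poolSizeGB - $allocatedGB[0], 1)

# How much AllocatedSize grew after each New-VirtualDisk call.
$growthGB = for ($i = 1; $i -lt $allocatedGB.Count; $i++) {
    $allocatedGB[$i] - $allocatedGB[$i - 1]
}

# Assumption: StorageEfficiency = Size / FootprintOnPool (3 GB / 8 GB here).
$efficiencyPct = 3 / 8 * 100

"Raw capacity:     $rawGB GB"
"Unaccounted for:  $unaccountedGB GB"
"Growth per disk:  $($growthGB -join ', ') GB"
"Efficiency (3/8): $efficiencyPct %"
```

Both New-VirtualDisk calls grew AllocatedSize by the same amount, which matches the identical FootprintOnPool values reported for disk1 and disk2.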

Failover cluster very slow

Hi everyone,

I took over admin tasks after one of the admins left. He was responsible for the failover cluster, and I heard that users complained about it a lot. The main issue is that it is very, very slow. I logged in and checked how it is set up.

It is a 2-node cluster with a witness disk.

Under Networks it has 2 networks, both allowing cluster and client traffic. There is no dedicated heartbeat network for cluster communication. Both NICs are in NIC Teaming mode.

Storage: 2 CSVs are configured and each node is the owner of one. The storage type is SAN, and it is directly attached to the servers.

How do I troubleshoot the slow performance? What should I think about, and where should I start?

If you need more info, just ask. Thank you all.
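
As a possible starting point, the network and CSV setup described above can be listed from PowerShell. This is only a sketch: it assumes the FailoverClusters module is available on a cluster node, and that the nodes run Windows Server 2012 R2 or later for Get-ClusterSharedVolumeState (the post does not say which version is in use).

```powershell
# Run on one of the cluster nodes.
Import-Module FailoverClusters

# Role: 0 = not used by the cluster, 1 = cluster communication only,
#       3 = cluster and client traffic.
Get-ClusterNetwork | Format-Table Name, Role, Address, AddressMask, State

# Which adapter on which node belongs to each cluster network.
Get-ClusterNetworkInterface | Format-Table Node, Network, Name, State

# Shows, per node, whether each CSV is accessed directly or through
# redirected I/O (assumption: Windows Server 2012 R2 or later).
Get-ClusterSharedVolumeState | Format-Table Name, Node, StateInfo
```

A CSV that reports redirected access on a node sends that node's I/O over the cluster network instead of straight to the SAN, so it is one thing worth ruling out before looking deeper.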


VMs Unable to Live Migrate

I have a Failover Cluster running on two Server 2012 R2 Datacenter nodes hosting our Hyper-V environment.  Recently, we have run into an issue where the VMs won’t migrate to the opposite node unless the VM is rebooted or the Saved State data is deleted.  The VMs are stored either on an SOFS volume on a separate FO Cluster or a CSV volume both nodes are connected to.  The problem occurs to VMs in either storage location.

Testing I’ve done is below.  Note that I only list one direction, but the behavior is the same moving in the opposite direction, as well:

- Live Migration: if a VM is on Node1 and I tell it to Live Migrate to Node2, it begins the process in the console and for a split second shows Node2.  It immediately flips back to Node1.  If the VM has rebooted since the last migration, it will go ahead and migrate to Node2.  It will not migrate back until the VM has been rebooted again.  The Event Log shows IDs 1205 and 1069.  1069 states “Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.”  All resources show Online in Powershell.

- Quick Migration: I initiate a Quick Migration and the VM will move from Node1 to Node2, but will fail to start on Node2.  Checking the Event Log I see Event IDs 1205 and 1069.  1069 states “Cluster resource 'Virtual Machine IDF' of type 'Virtual Machine' in clustered role 'IDF' failed. The error code was '0xc0370027' ('Cannot restore this virtual machine because the saved state data cannot be read. Delete the saved state data and then try to start the virtual machine.').”  After deleting the Saved State Data, the VM will start right up and can be Live or Quick Migrated once.

- Shutdown VM and Quick Migration: I have not had an occasion of this method fail so far.

- Rebooting the Nodes has had no discernable effect on the situation.

- I’ve shut down a VM and moved its storage from SOFS to the CSV and still have the same issues as above.  I moved the VHDX, the config file, and saved state data (which was empty while the VM was powered down) to the CSV.

Items from the FO Cluster Validation Report:
1. The following virtual machines have referenced paths that do not appear accessible to all nodes of the cluster. Ensure all storage paths in use by virtual machines are accessible by all nodes of the cluster.
Virtual Machines Storage Paths That Cannot Be Accessed By All Nodes 
Virtual Machine       Storage Path      Nodes That Cannot Access the Storage Path 
VM1                       \\sofs\vms         Node1

I’m not sure what to make of this error as most of the VMs live on this SOFS share and are running on Nodes1 and 2.  If Node1 really couldn’t access the share, none of the VMs would run on Node1.

2. Validating cluster resource File Share Witness (2) (\\sofs\HVQuorum).
This resource is configured to run in a separate monitor. By default, resources are configured to run in a shared monitor. This setting can be changed manually to keep it from affecting or being affected by other resources. It can also be set automatically by the failover cluster. If a resource fails it will be restarted in a separate monitor to try to reduce the impact on other resources if it fails again. This value can be changed by opening the resource properties and selecting the 'Advanced Policies' tab. There is a check-box 'run this resource in a separate Resource Monitor'.

I checked on this and the check-box is indeed unchecked and both Nodes report the same setting (or lack thereof).

3. Validating cluster resource Virtual Machine VM2.
This resource is configured to run in a separate monitor. By default, resources are configured to run in a shared monitor. This setting can be changed manually to keep it from affecting or being affected by other resources. It can also be set automatically by the failover cluster. If a resource fails it will be restarted in a separate monitor to try to reduce the impact on other resources if it fails again. This value can be changed by opening the resource properties and selecting the 'Advanced Policies' tab. There is a check-box 'run this resource in a separate Resource Monitor'.

Validating cluster resource Virtual Machine VM3.
This resource is configured to run in a separate monitor. By default, resources are configured to run in a shared monitor. This setting can be changed manually to keep it from affecting or being affected by other resources. It can also be set automatically by the failover cluster. If a resource fails it will be restarted in a separate monitor to try to reduce the impact on other resources if it fails again. This value can be changed by opening the resource properties and selecting the 'Advanced Policies' tab. There is a check-box 'run this resource in a separate Resource Monitor'.

I can’t find a place to see this check-box for the VMs.  The properties on the roles don’t contain the ‘Advanced Policies’ tab.

All other portions of the Validation Report are clean.

So far, I haven’t found any answers in several days of Google searching and trying different tactics.  I’m hoping someone here has run into a similar situation and can help steer me in the right direction to get this resolved.  The goal is to be able to Live Migrate freely so I can reboot the Nodes one at a time for Microsoft Updates without having to bring down all the VMs in the process.
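
Regarding the missing 'Advanced Policies' tab for the VMs: the same setting is exposed as the SeparateMonitor property of each cluster resource, so it can at least be inspected from PowerShell. A minimal sketch, assuming the FailoverClusters module on one of the nodes; the resource name is the one quoted in the validation report above.

```powershell
# Run on one of the cluster nodes.
Import-Module FailoverClusters

# List every resource with its separate-monitor setting.
Get-ClusterResource | Format-Table Name, ResourceType, OwnerGroup, SeparateMonitor

# Inspect a single resource named in the validation report.
$res = Get-ClusterResource -Name 'Virtual Machine VM2'
$res.SeparateMonitor

# To move it back to the shared monitor, the property can be set to 0.
# Left commented out on purpose: the cluster may have set it automatically
# after an earlier failure of this resource.
# $res.SeparateMonitor = 0
```

This only shows the current setting; it is not a confirmed fix for the migration failures described above.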




Can't add new node to existing failover cluster

Hi,

I have a problem adding a new node to an existing failover cluster. The existing failover cluster is a two-node cluster with node and file share majority. I'm using this cluster for a SQL AlwaysOn Availability Group. There are no shared volumes.

When I use the Failover Cluster Manager console I get this error:

The server 'N3.local' could not be added to the cluster.
An error occurred while adding node 'N3.local' to cluster 'Cluster1'.

The parameter is incorrect

I also have this error in Applications and Services Logs/Microsoft/FailoverClustering-Manager/Diagnostic:

Exception occurred in background operation - System.ApplicationException: An error occurred while adding nodes to the cluster 'Cluster1'. ---> System.ApplicationException: An error occurred while adding node 'N3.local' to cluster 'Cluster1'. ---> System.ComponentModel.Win32Exception: The parameter is incorrect
   --- End of inner exception stack trace ---
   at MS.Internal.ServerClusters.ClusApiExceptionFactory.CreateAndThrow(Cluster cluster, Int32 sc, String format, Object arg0, Object arg1)
   at MS.Internal.ServerClusters.Cluster.AddNode(String nodeName, ClusterActionCallback callback)
   at MS.Internal.ServerClusters.Configuration.AddNodeManagement.AddNodes(ActionArgs actionArgs, ActionUpdateHelper updateHelper)
   --- End of inner exception stack trace ---
   at MS.Internal.ServerClusters.Configuration.AddNodeManagement.AddNodes(ActionArgs actionArgs, ActionUpdateHelper updateHelper)
   at MS.Internal.ServerClusters.Configuration.AddNodeManagement.PerformAddNodes(ActionArgs actionArgs)
   at MS.Internal.ServerClusters.Configuration.ConfigurationBase.PerformActionWrapper(BackgroundOperationStatus backgroundOperationStatus, BackgroundOperationArgs parameter)
   at MS.Internal.ServerClusters.BackgroundOperation`2.BackgroundOperationProc(Object state)


I have tried to add the node from PowerShell, with the same error (the parameter is incorrect). I have also tried to remove the Failover Cluster role and add it again, but I'm still getting the same error.

Please advise,

Thank you

High Availability Cluster without Shared Storage

Hi Experts, 

I've been doing some research on how to achieve this goal and what's the best practice.

We are planning to do a high availability cluster for our server running the following services.

  1. Active Directory
  2. DNS and DHCP
  3. File Server

Currently we have one fully operational Windows Server 2016 machine running on a Dell R530. Since we have two Dell servers, we want to configure an HA cluster for downtime protection. We want to set it up so that a main server does all the work and a backup server takes over without downtime if the main one fails.

But most of the references I found related to our goal involved a shared network. What I want to know is:

  • Why is it recommended to have shared storage?
  • Is it possible to configure HA without shared storage?
  • If it is possible, what are the risks of not having shared storage?

Thank you in advance experts.

File Server Clustering between Two Domain Controller

Hi all,

Is it possible to cluster a file server between two Active Directory domain controllers?

As of now our server is still standalone. We will soon add another domain controller to our domain for fault tolerance in case the server fails.

Our current server runs the following services, which we want to make redundant. That's why we want to add a new server.

  • Active Directory
  • DNS
  • DHCP
  • File Server

From my research, high availability for Active Directory and DNS will be achieved once we add another domain controller to our current domain. And for DHCP there's a feature called DHCP Clustering.

But for the file server, I haven't found any clear guidance on how to achieve this.

Thanks in advance for your advice.

Error with cluster-aware updating

I currently have one cluster with two nodes.  I manually apply updates to the nodes with cluster-aware updating.  The last several updates have gone fine on node 1 but I get an error "partially failed" on node 2.  The description is "Node "xxx" failed to leave maintenance mode".  The node is up and running, all vm's hosted on it are fine as well.  Does anyone have any idea why I'm getting this error and how to resolve it?  Nothing has changed on the cluster or the nodes that I'm aware of.

Setting up generic service as Active-Active cluster

Hi All,

I am going to configure MySQL (as a generic service) as an Active-Active cluster in WSFC.

Ref: http://www.clusterdb.com/mysql/mysql-with-windows-server-2008-r2-failover-clustering

Data on both nodes (Master & Slave) will be replicated by "MySQL Replication", similar to an "MS Availability Group".

However, the service will only run as "Active-Passive": the service is "Running" on the owner node but "Stopped" on the passive node. In this case the slave node cannot replicate because its service is stopped.

One idea I had to keep the service running on both nodes is a script that monitors the service every 10 seconds and automatically starts it if it is stopped. Is there any solution that keeps the service "Running" even after a switchover?

Thanks for your assistance :)


Why Clustering Domain controllers is a bad approach?

Hi Experts,

I would like to ask for your insights on why it is bad to cluster domain controllers and what the risks are. I have read some forums and pages about this, but I can't seem to get a clear picture.

I understand that a DC doesn't need to be clustered for failover. But what if there are services in our environment that need to be clustered, such as a File Server?

Thanks in advance.

Need to run Cluster Service with an Domain account Need Help!!!!

Hello Experts,

I have Windows Server 2016 installed on both my nodes, which are part of a failover cluster. I am running a VB Script role for high availability of my application. My VB script calls a PowerShell script, which reads an XML file that stores the encrypted username and password of our application. Below is the command used to generate the XML file with the credentials. Since the command was run as the domain user (I was logged in as the domain user), the file can be read only by that domain user.

$credential = Get-Credential
$credential | Export-CliXml -Path "C:\My\Secrets\myCred.xml" 
So whenever my VB script, which runs in the failover cluster Generic Script role, calls my PowerShell script to read the above file using the command below:

$credential = Import-CliXml -Path "C:\My\Secrets\myCred.xml" 

The cluster is unable to find the file to read it and extract the credentials.

My requirement is simple: to run the cluster service as the same domain user, so that the XML file is accessible and can be read from the failover cluster Generic Script role.

Also, is there a way I can call the ps1 script using the domain account from the Generic Script (VB script)?

Hope this makes sense!! Thanks in advance!!

Generic Script Role (VB Script:)
Function Online( )
    PScmd = "powershell.exe -executionpolicy bypass -file " & ROOTFOLDERPATH & "\" & "StartCommPoints.ps1"
    Dim WshShell
    Set WshShell = CreateObject("WScript.Shell")
    rv = WshShell.Run(PScmd, , True)
    Dim http
    Set http = wGet( "update&online" & SERVICE  )
    Online = 0
End Function

PowerShell Script (StartCommPoints.ps1):
$credential = Import-CliXml -Path "C:\Program Files (x86)\Philips\IBE\IBEInstaller\HighAvailability\Scripts\rhapsody.xml" 
$cred =New-Object System.Management.Automation.PSCredential  ($credential.UserName, $credential.Password)

Whenever I try changing the Log On account for the Cluster Service, I get the error below. I made sure the domain user has all the required permissions.

'The Cluster Service service failed to start due to the following error:  A privilege that the service requires to function properly does not exist in the service account configuration. You may use the Services Microsoft Management Console (MMC) snap-in (services.msc) and the Local Security Settings MMC snap-in (secpol.msc) to view the service configuration and the account configuration.'

Please Help!!!

Thanks,

Surabhi
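
One thing that may be worth checking here, offered as a sketch rather than a confirmed diagnosis: on Windows, Export-Clixml protects the password inside a PSCredential with the Data Protection API, so the file can normally be decrypted only by the same user account on the same computer that created it. A script launched by the cluster runs under the account the cluster service uses, and a file created on one node would not decrypt on the other. The snippet below demonstrates the round trip and shows the identity the script actually runs as; the user name, password, and temp file are placeholders, not values from this post.

```powershell
# Placeholder credential, for demonstration only.
$secure = ConvertTo-SecureString 'not-a-real-password' -AsPlainText -Force
$cred   = New-Object System.Management.Automation.PSCredential ('demoUser', $secure)

# Export and re-import as the SAME user on the SAME machine: this works.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'demoCred.xml'
$cred | Export-Clixml -Path $path
$roundTrip = Import-Clixml -Path $path

# Identity the script is running under. Writing a line like this to a log
# file from StartCommPoints.ps1 would show which account the cluster uses.
$whoAmI = [Environment]::UserDomainName + '\' + [Environment]::UserName

"User name read back: $($roundTrip.UserName)"
"Running as:          $whoAmI"

# Clean up the demo file.
Remove-Item -Path $path
```

If the logged identity is not the domain user that created the XML file, the import failure would be expected behaviour rather than a file-permission problem.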


Server 2016 HV Cluster - Storage migration for virtual machine ' failed with error 'Unspecified error' (0x80004005).

Storage is on the same array (different volume) that the server connects to

VM in question has some disks on Volume1 & some on Volume2

So I want to consolidate them all into Volume1

And I am presented with this Unspecified error (I have had enough of this totally amateurish Microsoft approach to something that should be a business solution!)

I can disconnect the disk, move it by "hand", and re-attach it, and it works perfectly fine, but it is stupid to have Live Storage Migration and not be able to use it!

Seb


Microsoft NLB not working correctly

Hi,

I have setup an NLB-cluster containing 2 servers.

As an example:

Cluster called: corpweb (IP: 10.0.0.3)

Servers called corpweb01 (IP 10.0.0.1) and corpweb02 (IP 10.0.0.2)

All three are registered in DNS. If I browse to corpweb, I reach the user GUI as expected. However, if I stop the host corpweb01 in the NLB cluster, I am still able to reach that server using the NLB name. How come? I am not able to reach the server corpweb02, even though it is green in the NLB GUI. So the cluster solution doesn't seem to work at all, except that I can use the DNS name of the cluster to reach server1. Any suggestions would be appreciated.

[SOLVED] Disk Manager Hanging on Clustered (iSCSI) hosts

Just wanted to let you know about a problem I was having recently and its fix.

Server 2016, Hyper-V hosts, clustered, iSCSI storage

Disk Manager would hang when I would open it.   Also, I couldn't "connect" to any VM via Failover Cluster Manager nor Hyper-V Manager.   If I would reboot both nodes in the 2-node cluster, it would work as expected for a short time.   I have 8 of these 2-node clusters.  It was happening on all 8 clusters.

After working with MS support, we were able to determine that it was coming from our Avocet IP KVM.   The Avocet have Virtual Disk capabilities (basically, it has the option to mount a disk via the USB port through the KVM).   There was a recursive call in the Virtual Disk Service for this VD.   I was able to disable the VD via the KVM management console to fix the problem.

Error Cluster Aware Update(CAU) in Windows 2012 FO Cluster

Hi,

I have configured self-updating CAU for a 2-node Windows 2012 FO Cluster. I use "Microsoft.HotfixPlugin" to install hotfixes on the cluster nodes.

In my lab there are 3 Windows 2012 servers: 1 AD server and 2 cluster nodes (W2016-N1 & W2016-N2). There is a shared folder on the AD server (BLUEAD01 - 10.10.10.10) with "Full Permission" for the cluster administrator (user name: Cluadmin; both nodes are logged in as the Blue.Local\Cluadmin user). I suspect the issue is related to WUSA and some conflict with security patches that have the *.msu extension.

Please check attached snap. If you have any way out, please let me know.

-Suddhaman


Cannot create checkpoint when shared vhdset (.vhds) is used by VM - 'not part of a checkpoint collection' error

We are trying to deploy 'guest cluster' scenario over HyperV with shared disks set over SOFS. By design .vhds format should fully support backup feature.

All machines (HyperV, guest, SOFS) are installed with Windows Server 2016 Datacenter. Two HyperV virtual machines are configured to use a shared disk in .vhds format (located on an SOFS cluster formed of two nodes). The SOFS cluster has a share configured for applications, and HyperV uses the \\sofs_server\share_name\disk.vhds path to the SOFS remote storage. The guests are configured with the 'File server' role and the 'Failover clustering' feature to form a guest cluster. There are two disks configured on each of the guest cluster nodes: 1 - a private system disk in .vhdx format (OS) and 2 - the shared .vhds disk on SOFS.

While trying to make a checkpoint for guest machine, I get following error:

Cannot take checkpoint for 'guest-cluster-node0' because one or more sharable VHDX are attached and this is not part of a checkpoint collection.

Production checkpoints are enabled for VM + 'Create standard checkpoint if it's not possible to create a production checkpoint' option is set. All integration services (including backup) are enabled for VM.

When I delete .vhds disk of shared drive from SCSI controller of VM, checkpoints are created normally (for private OS disk).

It is not clear what a 'checkpoint collection' is or how to add the shared .vhds disk to such a collection. Please advise.

Thanks.

Hyper-Stretched Cluster with storage Replica, Windows 2016 Node Error

We have four HPE nodes in a HyperV-Stretched Cluster with two HPE 2040 Storage.

One of the nodes has logged the error below twice in 9 days. This disturbs the whole cluster, bringing the volumes online and offline and pausing the VMs.

Error      3/8/2019 9:18:19 AM     FailoverClustering           1230       Resource Control Manager

A component on the server did not respond in a timely fashion. This caused the cluster resource 'Virtual Machine XXXXX' (resource type 'Virtual Machine', DLL 'vmclusres.dll') to exceed its time-out threshold. As part of cluster health detection, recovery actions will be taken. The cluster will try to automatically recover by terminating and restarting the Resource Hosting Subsystem (RHS) process that is running this resource. Verify that the underlying infrastructure (such as storage, networking, or services) that are associated with the resource are functioning correctly.

The same error was repeated on 03/17/2019    3:08 PM

Up But Isolated Cluster Node

I'm running Server 2016 fully patched in a 5 node cluster.  Hyper-V and S2D for a hyper-converged solution running a few hundred VMs.  Two days ago one of my nodes decided that it wanted to be cranky.  This caused the roles to rearrange on the systems and ended up putting one of my healthy nodes in the "Isolated" state.  Root cause for the other node that went out to lunch is still unknown and is being researched separately.  However, this other healthy node has been stuck in an online but isolated state.  See screenshot.  I've seen plenty of examples where the node is offline and isolated, typically because of a network problem (the network looks fine: I have three separate NICs with separate switches/VLANs/IP space).  I can live migrate VMs, and my S2D storage is fully healthy on the cluster.  No issues using this node, but I don't like the "isolated" state.  I ran the cluster validation test for networking and it returns healthy.  No warnings or errors in the validation test.  Event logs show that the node went isolated, but in the same second I have a follow-up event that it's no longer isolated.  These events exist on all nodes in the cluster, so there is no reason why it should be isolated.  I'm sure if I rebooted this node (or even restarted the cluster service) that it would come back online as healthy, but another node in the cluster is having hardware issues, so that's not an option at the moment.  Any thoughts would be appreciated on how to remove the isolated state.  The end of the PowerShell command shows it all.
State is Up. StatusInformation is Isolated...
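
Since the screenshot did not survive, here is a sketch of commands that would show the two fields quoted above, along with the Server 2016 resiliency and quarantine settings that govern the Isolated state. It assumes the FailoverClusters module on a Server 2016 cluster node and is meant for inspection only; it does not clear the state.

```powershell
# Run on one of the cluster nodes.
Import-Module FailoverClusters

# State and StatusInformation for every node (the fields quoted in the post).
Get-ClusterNode | Format-Table Name, State, StatusInformation

# Cluster-wide settings that control how long a node may stay isolated
# and when it is quarantined.
Get-Cluster | Format-List ResiliencyLevel, ResiliencyDefaultPeriod, QuarantineThreshold, QuarantineDuration
```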


2 Node Cluster with local storage

I am not a server expert here, but I am a general admin, so I'll try to keep up ;) I know that versions of this question have been asked, but the answers vary. Hypothetically, I have a server, with a 1TB OS drive, and a 10TB data drive. We are going to run SQL and a proprietary application that keeps the Database updated. We want to make this a highly available server, so we have an identical server that we want to use (along with the first one) to create a two node cluster. 

From what I am reading, there are people saying that I cannot create a cluster using servers with local storage, and must use a SAN or NAS, others are saying yes I can. But there is no definitive answer I've found, and I do not want to spend the money to go from hypothetical to reality, if it can't be done.

The problem with a shared storage like NAS, is that it creates a single point of failure. If the NAS fails (power failure, board failure, fire, etc.), then the Database server cluster fails, and that defeats the whole point of the cluster. Clustering is all about eliminating single points of failure, so NAS is not the solution. SAN seems to me like clustered storage, and makes me wonder why I can't just eliminate the complication of the extra hardware, and utilize the two perfectly good arrays I already have in the existing nodes I'm clustering. A SAN in this case just seems like redundant overkill, and a waste of money. I already have the high availability of the two server cluster, and their identical internal storage arrays, with all the storage I need. Why can't I use them? 

Is there a way to cluster two servers, with their own locally attached storage? For example, can I take server A with a C: and a locally attached (eg internally configured RAID array) D: drive, and make server B (configured identically) work in a cluster with server A? And can it be made to work with the OS on C:, and the database on the internal D:, without a NAS or a SAN?

Go...
