High Availability (Clustering) forum

S2D on 2019 - Perhaps a bug


Hi,

I have a homelab with two Dell R710 servers, each with 6 HDDs and 2 NVMe drives.

I have configured two virtual disks, created as nested mirror-accelerated parity, but if I suspend a node and reboot it, all vdisks go offline with the error:

The pack does not have a quorum of healthy disks.

This was working fine on the same servers on 2016.

But if I run this before the reboot, all vdisks stay online as they should:

Get-StorageScaleUnit -FriendlyName $Env:COMPUTERNAME | Enable-StorageMaintenanceMode
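
For context, the full sequence I run around a reboot looks roughly like this — a sketch following the documented drain pattern, with the last two lines run once the node is back up:

# Drain roles and put this node's disks into maintenance mode before rebooting
Suspend-ClusterNode -Name $Env:COMPUTERNAME -Drain
Get-StorageScaleUnit -FriendlyName $Env:COMPUTERNAME | Enable-StorageMaintenanceMode
Restart-Computer
# After the node is back up: take the disks out of maintenance and resume the node
Get-StorageScaleUnit -FriendlyName $Env:COMPUTERNAME | Disable-StorageMaintenanceMode
Resume-ClusterNode -Name $Env:COMPUTERNAME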

Anyone else have the same experience?

BR

Martin


Cluster Aware Updating (CAU) and Windows Defender definition updates


Our Cluster Aware Updating (CAU) is configured and doing a fantastic job of patching cluster nodes automatically, and with no downtime.

We have CAU set to a weekly cadence and Windows Defender updates are of course available every week.

One odd behavior we are seeing is that when CAU detects only Windows Defender updates it still goes through the motions of pausing, draining, patching, and resuming servers in the cluster.

This seems like overkill for Defender definition updates, which can be applied with no server downtime.

Is there anything we can configure or any open feature request for CAU to handle patching differently if only Windows Defender definition updates are found?
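
Would pre-scanning and skipping definition-only runs be a reasonable workaround? A sketch, assuming a cluster named 'CLUSTER01' and that the scan results expose an UpdateTitle property (an assumption; worth checking with Get-Member first):

# Preview which updates CAU would apply, then only run if something beyond definitions is pending
$updates = Invoke-CauScan -ClusterName 'CLUSTER01' -CauPluginName 'Microsoft.WindowsUpdatePlugin'
# Hypothetical filter: treat anything that is not a Defender/definition update as worth a full run
$nonDefinition = $updates | Where-Object { $_.UpdateTitle -notmatch 'Defender|Definition' }
if ($nonDefinition) { Invoke-CauRun -ClusterName 'CLUSTER01' -Force }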

Unable to get failover cluster working with 2-node Hyper-V setup


Hi 

We have a two-node failover cluster: Node A and Node B,

connected to an MSA 2040 via 4 SAS connectors.

As soon as Node A restarts, Node B is unable to see the storage and all the VMs restart rather than fail over.

1- Both nodes can see the cluster storage from their C: drives.

2- Currently the owner node is A.

3- VMs can be live migrated with no issues.

3.1- Unable to see how the storage is mapped to the Hyper-V nodes, as there isn't any iSCSI initiator configured (see the sketch below).

4- It was set up by previous employees.

5- I have been in the field quite a long time but am new to storage.

6- Tried various forums but unable to get this resolved.
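
For reference, a diagnostic sketch I can run on each node to see how the LUNs are presented (with an MSA 2040 attached over SAS, the LUNs should appear as local SAS disks, which would explain why there is no iSCSI initiator to find):

# Run on each node: confirm the MSA LUNs are visible over the SAS bus
Get-Disk | Where-Object BusType -eq 'SAS' | Select-Object Number, FriendlyName, OperationalStatus, Size
# Check what the cluster itself sees
Get-ClusterResource | Where-Object ResourceType -eq 'Physical Disk'
Get-ClusterSharedVolume | Select-Object Name, State, OwnerNode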

Any help would be highly appreciated.

VMs on nodes going into a locked or hung state.


I have a brand new 6-node Hyper-V cluster running Server 2019 Core. Everything is updated hardware-wise from the manufacturer (HPE) and the connected SAN (NetApp), and everything is updated software-wise from HPE and Microsoft. I am also using Veeam Backup 9.5 Update 4a.

What I have had happen on more than one occasion now is that a VM or two will get into a hung or locked state. My backup shows it has failed, and I cannot live migrate or shut down the individual VM, either from the guest itself or from the host node via Task Manager. The problem also causes my other VMs on that node to not be able to live migrate (it gets stuck at 3%). My only recourse has been to restart the server from the command line. Today when it happened, I had to hardware-reset the server to get it to reboot, as even after an hour it was still draining roles.

My first inclination is to blame this on Veeam, and I will take this up with them if I encounter the issue again after updating to 9.5 Update 4b today... but I wanted to see if anyone else has had this or might provide some insight into what could be hosing the entire node over one errant VM?
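
One last-resort idea I'm considering instead of rebooting the whole node, offered as a sketch ('StuckVM' is a placeholder name): kill only the hung VM's worker process.

# Each VM gets its own vmwp.exe worker, with the VM GUID on its command line
$vmId = (Get-VM -Name 'StuckVM').Id.ToString()
$proc = Get-CimInstance Win32_Process -Filter "Name='vmwp.exe'" | Where-Object { $_.CommandLine -match $vmId }
# Forcibly ends just that VM's worker, leaving the other VMs on the node alone
Stop-Process -Id $proc.ProcessId -Force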

Windows Admin Center: Missing sddcres.dll


Hello,

I have recently spun up a 3-node failover cluster with the S2D and Hyper-V roles installed, configured, and actively working. Windows Admin Center is pointing me to an article that states it relies on a set of APIs that are not included in Server 2016. However, when I run the command posted in the article, it fails and displays that "C:/windows/cluster/sddcres.dll" doesn't exist. According to the article, the libraries are downloaded on 2016 if the 05-2018 KB is installed. I've verified that all 3 nodes are on 07-2019 (just ran CAU to ensure it was installed on all nodes, and it was successful). This still didn't fix the error. So I downloaded the update directly from the Microsoft Update Catalog just in case... and the installer returns a message that "this update is not applicable".
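
For reference, the command in question is presumably the one from the WAC hyper-converged prerequisites (quoted from memory, so treat it as a sketch), plus a per-node check that the DLL actually exists (node names are placeholders):

# Register the SDDC Management resource type that WAC relies on
Add-ClusterResourceType -Name 'SDDC Management' -Dll "$env:SystemRoot\Cluster\sddcres.dll" -DisplayName 'SDDC Management'
# Quick check that the DLL is present on every node
Invoke-Command -ComputerName NODE1, NODE2, NODE3 { Test-Path "$env:SystemRoot\Cluster\sddcres.dll" }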

During the deployment of these nodes I didn't see anything that specifically mentioned 'Hyper-Converged' or a setting I needed to toggle to indicate that. As far as I'm aware the term Hyper-Converged just describes the configuration of the architecture (S2D+Hyper-V on boxes in a Cluster).

Everything in the cluster validation is coming back valid, and I've verified that S2D is functional (NVMe drives are "Journals" and my HDD/SSD pool is correctly displaying as Capacity & Performance).

Any recommendations?



Host showing as unmonitored or isolated in the failover cluster


Hi Experts,

We keep encountering the same type of issue in my failover cluster environment: "your host XYZ is in an unmonitored state or isolated in the cluster".

Due to this error, all the VMs belonging to the particular host were restarted or shut down. I created support tickets with Microsoft 2-3 times, but we did not get any findings or a solution from them. I restart my host and then it is OK again.

Kindly advise me.

In my failover cluster, we have 4 hosts and we are using Server 2016.
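
In case it's relevant, these are the Server 2016 VM compute resiliency settings that govern the Isolated/Unmonitored states; a sketch for inspecting them before changing anything:

# Show the resiliency and quarantine settings that control node isolation behaviour
Get-Cluster | Format-List ResiliencyLevel, ResiliencyDefaultPeriod, QuarantineThreshold, QuarantineDuration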

Thanks in advance.


ejaz

Unable to make a storage pool


Hello all :)

I'm currently in the process of teaching myself about Server 2019 and some of the technologies I haven't had a chance to play with before.

The one that I am trying at the moment is creating a file server using failover clustering.
I am able to create the cluster (LAB-CLUSTER01) using 3 servers (LAB-S03, LAB-S04 and LAB-S05) running Server 2019 DC Core.

I have created 3 storage pools before creating the cluster (S03-SP, S04-SP and S05-SP). Each pool is made of 4 virtual SSDs, presented as a single drive.

All of this is running on ESXi 6.5

The storage pools are all running happily without issue, but I am unable to access them from the cluster. The error is given below:

Failed to bring the resource 'S03-SP' online.

The device does not recognize the command.

Looking at the physical disks tab in cluster manager, they are all marked as 'Becoming Ready'

Once I have tried to add a pool to the cluster, I am then no longer able to access it from within Windows Server Manager.
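
My guess is that pools built from node-local virtual disks can't be arbitrated by the cluster unless every node sees the same physical disks; a diagnostic sketch to run on each node:

# What does each node think of the pools and the underlying disks?
Get-StoragePool | Select-Object FriendlyName, IsClustered, HealthStatus, OperationalStatus
Get-PhysicalDisk | Select-Object FriendlyName, CanPool, OperationalStatus, HealthStatus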

Would it be possible for someone to advise what is causing this and what can be done (if anything) to fix it?

Many thanks
Tom


Cluster resource 'Virtual Machine VMNAME' of type 'Virtual Machine' in clustered role 'VMNAME' failed.


Hello!

I have a Hyper-V failover cluster with 3 nodes.

NODE1: Windows SRV2016

NODE2: Windows SRV2016

NODE3: Windows SRV2019

There are 30 VMs in the failover cluster. I can move VMs with live migration to all nodes except one. One of the VMs can be live migrated from NODE2 to NODE1 and from NODE1 to NODE2, but I can't move it from NODE1 or NODE2 to NODE3, and I get the following error:

Event id: 1069

Cluster resource 'Virtual Machine VMNAME' of type 'Virtual Machine' in clustered role 'VMNAME' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Event id: 1205

The Cluster service failed to bring clustered role 'VMNAME' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

What could be the problem?
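
For what it's worth, this is what I can check on the mixed 2016/2019 cluster, as a diagnostic sketch ('VMNAME' as above):

# Compare node OS builds, the cluster functional level, and the VM configuration version
Get-ClusterNode | Select-Object Name, State, MajorVersion, MinorVersion, BuildNumber
(Get-Cluster).ClusterFunctionalLevel     # stays at the 2016 level until Update-ClusterFunctionalLevel is run
Get-VM -Name 'VMNAME' | Select-Object Name, Version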

Thank You.


Windows Server 2016 failover cluster virtual machine is in locked state


Please, can anyone provide a solution?

Hyper-V Replication stopped unexpectedly, and from that time onwards an automatically created checkpoint has been stuck showing a status of 7%. I am not able to manage that VM; it shows in Failover Cluster Manager with a status of (locked), and I am not able to remove the checkpoint.
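
Once the VM drops out of the locked state, the usual cleanup would be roughly this sketch ('VMNAME' is a placeholder):

# Remove the stuck checkpoint and confirm the merge finishes
Get-VMSnapshot -VMName 'VMNAME' | Remove-VMSnapshot
Get-VM -Name 'VMNAME' | Select-Object Name, State, Status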

How to cluster Windows Server 2016 across two different hardware vendors (Dell vs Lenovo servers)


I have a question about Clustering between two different hardware companies.

I have a Lenovo x3650 M5 (5462) server running Windows Server 2016.

Now I have another server, the Dell R740, which also runs Windows Server 2016.

My question is whether I can run Windows Server 2016 failover clustering across the Lenovo and Dell servers.
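
As far as I understand, mixed-vendor hardware is acceptable as long as cluster validation passes; would that be the deciding factor here? A sketch (node names are placeholders):

# Run the validation wizard against both prospective nodes
Test-Cluster -Node 'LENOVO-01', 'DELL-01'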

Thanks for technical advice


Need help extending a clustered shared volume



Hello everyone,

I am new to Cluster Shared Volumes in Server 2012 R2. I am trying to expand or create a new volume.

I have tried to use diskpart to expand the V$, but I keep getting an error that there is no space available to extend.

This volume is on a 12 TB SAN, and I can see 1.2 TB available.

Does anyone know what I am missing? I don't know why I can see the space as available on the machine, but not within diskpart.
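
One guess: diskpart reports no space when the partition already fills its LUN; the 1.2 TB free on the SAN isn't usable until the LUN itself is grown on the array. Assuming that's the case, a sketch of the grow sequence (disk and partition numbers are placeholders; find yours with Get-Disk and Get-Partition):

# Rescan so Windows sees the new LUN size, then grow the partition to the maximum
Update-HostStorageCache
$max = (Get-PartitionSupportedSize -DiskNumber 5 -PartitionNumber 2).SizeMax
Resize-Partition -DiskNumber 5 -PartitionNumber 2 -Size $max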


Problem running Update-ClusterFunctionalLevel on Server 2019


Hi

I have in-place upgraded a 2-node SQL cluster (from Server 2016 Std. to Server 2019 Std.). The whole process worked as expected.

Now I want to run Update-ClusterFunctionalLevel, but it is returning the following error:

Update-ClusterFunctionalLevel : You do not have administrative privileges on the cluster. Contact your network administrator to request access.
    Access is denied
At line:1 char:1
+ Update-ClusterFunctionalLevel
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : AuthenticationError: (:) [Update-ClusterFunctionalLevel], ClusterCmdletException
    + FullyQualifiedErrorId : ClusterAccessDenied,Microsoft.FailoverClusters.PowerShell.UpdateClusterFunctionalLevelCommand


In the Microsoft-Windows-FailoverClustering/Diagnostic event log it gives me the following error:

EventID: 2051

Description: [CORE] mscs::ClusterCore::VersionUpgradePhaseTwo: (5)' because of 'Gum handler completed as failed'

I think all the permissions are correct, but I can't find the root cause. Can you please help me?
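
What I plan to verify next, as a sketch (run from an elevated PowerShell session):

# Who actually has access to the cluster, and what level is it currently at?
Get-ClusterAccess
(Get-Cluster).ClusterFunctionalLevel     # 9 = 2016 level; Update-ClusterFunctionalLevel raises it to 10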



Some cluster networks with unavailable status

Hello. When we set up the failover cluster on Windows Server 2012 R2, all networks were "Up"; however, we realized that we were not able to do live migration. We checked the Cluster Networks section and saw several interfaces with a status of "Not available". However, when we test access to these interfaces, they are normal and accessible. We have already checked antivirus and firewall on all cluster servers (nodes); there is no restriction in the antivirus, and the firewall is disabled.

Screenshot attached.

NOTE: I already did what is on http://blog.mpecsinc.ca/2010/03/nic-binding-order-on-server-core-error.html

NOTE 2: This is only happening on some interfaces of "Cluster Network 3", "Cluster Network 2" and "Cluster Network 1"; all interfaces are "Up".
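
For reference, a diagnostic sketch comparing what the cluster itself reports for each network and interface:

# Role: 0 = none, 1 = cluster only, 3 = cluster and client
Get-ClusterNetwork | Select-Object Name, Role, State
Get-ClusterNetworkInterface | Select-Object Name, Node, Network, State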

Guest file server cluster constant crashes


Hi

I have set up a guest file server cluster on Windows Server 2019. The cluster crashes constantly, becoming very slow and finally crashing all my hypervisor servers...

Hypervisor infrastructure:

  • 3 hosts running Windows Server 2019 LTSC Datacenter
  • iSCSI storage, 10 Gb, with 11 LUNs
  • cluster validation passes all tests

Guest file server cluster, 2 VMs with the same config:

  • Generation 2 VMs with Windows Server 2019 LTSC
  • 4 virtual CPUs
  • 8 GB of non-dynamic RAM
  • 1 SCSI controller
  • primary hard drive: VHDX format, SCSI controller, ID 0
  • empty DVD drive on SCSI controller, ID 1
  • 10 VHDS disks on SCSI controller, IDs 2 to 11, same ID on each node
  • 1 network card on a virtual switch routing to 4 physical teamed network cards
  • Cluster validation passes all tests except the network, with one failure point flagged for non-redundancy.


After some time, the cluster becomes very slow, crashes, and makes all my hypervisors crash. The only error returned by Hyper-V is that some LUNs became unavailable due to a timeout, with this message:

Cluster Shared Volume 'VSATA-04' ('VSATA-04') has entered a paused state because of 'STATUS_IO_TIMEOUT(c00000b5)'. All I/O will temporarily be queued until a path to the volume is reestablished.

I have checked every single parameter in the VM and Hyper-V config and chased every hint the logs gave me, but nothing helps and the crashes remain...
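
One thing I can still capture when the slowdown starts, as a sketch: whether CSV I/O on each host is running direct or redirected.

# Redirected I/O on a CSV is often the first visible symptom of a storage path problem
Get-ClusterSharedVolumeState | Select-Object Name, Node, StateInfo, FileSystemRedirectedIOReason, BlockRedirectedIOReason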

Sorry for my poor English; it is not my native language.

Zero Downtime File Server - Would this setup work?


Hello everybody,

I was given the task to plan a redundant file storage environment that can compensate for the failure of any component without service interruption. This is a field I have little experience with, so I want to confirm that the concept I am working on actually works. I don't have the resources to build a test system at the moment either, making this a very theoretical construct.

I want to use a Windows failover cluster with a Scale-Out File Server role installed. Three physical servers with limited storage space for only the operating system are supposed to be the nodes of this cluster (three so as to avoid using a file share witness). A single SAN storage solution, attached to the individual nodes via Fibre Channel, will provide the storage space for the file server. The SAN storage itself has all components built in redundantly, eliminating the need to provide a second storage unit and manage the synchronization of both.

The clients are expected to then connect to the file service provided by the cluster which is then (transparently) handled by any of the nodes and, in case of failure of this node (e.g. loss of power), instantly taken over by another without interruption or considerable delay.
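
To make the concept concrete, a minimal sketch of the planned build (every name, address, and group below is a placeholder):

# Three-node cluster with a Scale-Out File Server role on top
New-Cluster -Name 'FS-CLUS' -Node 'NODE1', 'NODE2', 'NODE3' -StaticAddress '10.0.0.50'
Add-ClusterScaleOutFileServerRole -Name 'SOFS01'
# Continuously available SMB share on a CSV carved from the Fibre Channel SAN
New-SmbShare -Name 'Apps' -Path 'C:\ClusterStorage\Volume1\Apps' -ContinuouslyAvailable $true -FullAccess 'DOMAIN\AppClients'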

In case it is important: The file server is supposed to host files of different applications including resources and configurations. These applications are not run on the server, but on clients. They are executed FROM the server share though, so constant and uninterrupted file provision is required, otherwise the applications will eventually crash. Executing from the server share is mandatory.

Now, as I mentioned, my experience with this is rather limited, and while the concept is based on what I read in MS documentation, I would like to ask you for confirmation that this works or, in case it doesn't, advice on what to do differently.

Additionally, as far as I understand, running a domain controller role on the same server that is running a Scale-Out File Server role is not possible, or at least not recommended. Is this still valid for Server 2019, and if so, is there a way to achieve the goal of zero-downtime file provisioning on the same device that is running a DC, or does it have to be separate machines?

Thanks in advance!


Windows Server 2016 cluster system Failover Cluster Validation Report shows error on the CNO


Hi All,

I'm having an issue with my Windows Server 2016 cluster system.
It consists of 2 nodes, let's say Node1 (showing as down) and Node2 (up).

Node1 can ping Node2 and vice versa, but I'm not sure why it is showing as down.

The Failover Cluster Validation Report shows an error only on the below CNO:

  • The cluster network name resource 'PRDSQL-CLUS01' has issues in the Active Directory. The account could have been disabled or deleted. It could also be because of a bad password. This might result in a degradation of functionality dependent on the cluster network name. Offline the cluster network name resource and run the repair action on it. 
    An error occurred while executing the test.
    The operation has failed. An error occurred while checking the state of the Active Directory object associated with the network name resource 'Cluster Name'.

    Access is denied
This is the error logged from the Failover Cluster Manager.

Event ID 1069

Cluster resource 'Cluster Name' of type 'Network Name' in clustered role 'Cluster Group' failed. Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Event ID 1688
Cluster network name resource detected that the associated computer object in Active Directory was disabled and failed in its attempt to enable it. This may impact functionality that is dependent on Cluster network name authentication.
Network Name: Cluster Name
Organizational Unit:
Guidance: Enable the computer object for the network name in Active Directory.

The virtual cluster front end called PRDSQL-CLUS01 is reporting that it is disabled in Active Directory, as per the above error.
I have tried:

• Taking the virtual endpoint offline and running a repair, but the errors state "File not Found" and "Error Displaying Cluster Information".
• Creating a blank role. SQL and CAU are still working; it is only the front-end failover cluster virtual network name AD account (CNO) that is having the issue.
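
What I can still check directly, as a sketch (the AD lookup assumes the RSAT ActiveDirectory module is available):

# Inspect the network name resource's private properties, then the AD object itself
Get-ClusterResource -Name 'Cluster Name' | Get-ClusterParameter
Get-ADComputer -Identity 'PRDSQL-CLUS01' -Properties Enabled | Select-Object Name, Enabled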

Any help would be greatly appreciated.

Thanks,


/* Server Support Specialist */

Cluster IP keeps switching


Dear All,

I have a cluster node with 2 IPs: one active and the other passive. When I do an NSLOOKUP I get both IPs; when I ping the cluster name, it resolves to the passive IP, not the active one (the passive IP is not pingable; request timed out). I deleted both A records in DNS and it worked fine for a while, but then it went back to the passive IP again. What I need is for a ping of the cluster name to hit the active IP.
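
Could this be the multi-subnet DNS registration behaviour? A sketch of the usual change, assuming the network name resource is the one registering both IPs (the resource name is a placeholder; take the name resource offline and back online afterwards):

# Register only the online IP in DNS and shorten the record TTL
Get-ClusterResource -Name 'Cluster Name' | Set-ClusterParameter -Multiple @{ RegisterAllProvidersIP = 0; HostRecordTTL = 300 }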

Thank you 

Cluster shared storage issue


Hi

I have a Windows Server 2012 R2 cluster with 7 drives shared from SAN storage.

Now I am not able to open any of the 7 drives from any node; I get the error below:

c:\ClusterStorage\Volume1 is not accessible.

The referenced account is currently locked out and may not be logged on to.
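
Since the error points at a locked-out account, a sketch for checking the cluster's computer objects in AD ('CLUSTER-CNO' is a placeholder; requires the RSAT ActiveDirectory module):

# Is the cluster name object (or a node account) disabled or locked out?
Get-ADComputer -Identity 'CLUSTER-CNO' -Properties LockedOut, Enabled | Select-Object Name, Enabled, LockedOut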

Not able to rebuild cluster, issue with disks?


Hi all,

I have two Windows Server 2012 R2 servers (DB1A and DB1B) where a failover cluster + SQL Server Availability Groups used to work. But something went wrong (I don't really know what, maybe an aggressive GPO) and the cluster was totally dead.

When I try to rebuild it, I get this kind of warning:

List Disks To Be Validated
Physical disk ab780ec8 is visible from only one node and will not be tested. Validation requires that the disk be visible from at least two nodes. The disk is reported as visible at node: DB1A
Physical disk ab780ec0 is visible from only one node and will not be tested. Validation requires that the disk be visible from at least two nodes. The disk is reported as visible at node: DB1A
No disks were found on which to perform cluster validation tests. To correct this, review the following possible causes:
* The disks are already clustered and currently Online in the cluster. When testing a working cluster, ensure that the disks that you want to test are Offline in the cluster.
* The disks are unsuitable for clustering. Boot volumes, system volumes, disks used for paging or dump files, etc., are examples of disks unsuitable for clustering.
* Review the "List Disks" test. Ensure that the disks you want to test are unmasked, that is, your masking or zoning does not prevent access to the disks. If the disks seem to be unmasked or zoned correctly but could not be tested, try restarting the servers before running the validation tests again.
* The cluster does not use shared storage. A cluster must use a hardware solution based either on shared storage or on replication between nodes. If your solution is based on replication between nodes, you do not need to rerun Storage tests. Instead, work with the provider of your replication solution to ensure that replicated copies of the cluster configuration database can be maintained across the nodes.
* The disks are Online in the cluster and are in maintenance mode.
No disks were found on which to perform cluster validation tests.

When I open Failover Cluster Manager, I can see the two nodes but can't see anything in the Roles folder, nor under Disks.

Of course, SQL Server Availability Groups is not possible:


The local node is not part of quorum and is therefore unable to process this operation. This may be due to one of the following reasons:
•   The local node is not able to communicate with the WSFC cluster.
•   No quorum set across the WSFC cluster.

I'm a bit lost. It would be great if someone could help.
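
A sketch for comparing which disks each node can actually see, since the validation warning says both disks are visible from DB1A only:

# Run the same inventory on both nodes and compare by UniqueId
Invoke-Command -ComputerName DB1A, DB1B -ScriptBlock {
    Get-Disk | Select-Object @{ n = 'Node'; e = { $env:COMPUTERNAME } }, Number, FriendlyName, UniqueId, OperationalStatus
}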

Live Migration and Workgroup Cluster on Windows Server 2019


Hi ,

I found the following document about live migration and workgroup clusters on Windows Server 2016:

https://techcommunity.microsoft.com/t5/Failover-Clustering/Workgroup-and-Multi-domain-clusters-in-Windows-Server-2016/ba-p/372059

I understand that live migration is not supported, while quick migration is. Is it the same on Windows Server 2019, or are there any plans for it?

