Windows NLB cluster only one node remains active always even after rebooting

February 2, 2017, 10:02 pm

≫ Next: CSV using S2D Storage Spaces Direct not working

Hi Team,

I have server A and server B running in windows NLB cluster ,running exchange 2013 CAS roles,anti virus is runing on server A and not installed on server B,the cluster IP is currently getting connected to server A which has anti virus.
For troubleshooting an issue i want to make the server B active to confirm if anti virus is casuing the issue,but the problem is even if i restart server A ,server B becomes the active node only untill server A restarts sucessfully,once server A is up it becomes the active node again,how do i make node B active?

TechGUy,System Administrator.

↧

CSV using S2D Storage Spaces Direct not working

February 5, 2017, 8:53 am

≫ Next: Active Directory detached cluster creation failed - MSG_AUTH_PACKAGE::KerberosAuth failed with status 2148074254

≪ Previous: Windows NLB cluster only one node remains active always even after rebooting

Hi,

I have two servers running Server 2016 Data Center and I am trying to set up a high-availability fail-over environment using Storage Spaces Direct (S2D) for a bunch of Hyper-V VMs that will be running on the two servers.

I have created the cluster, enabled storage spaces direct, created the S2D storage pool, created two virtual disk from the storage pool (formatted as ReFS) and then created a Volume in each Virtual Disk. There is a file share witness configured on a local server. Cluster Validation is all passes (except for an unsigned driver for logmein display mirror, and a complaint that one of the servers is on a slightly different version of the windows defender definitions).

On both servers, I can see that I have c:\clusterstorage\volume1\ and c:\custerstorage\volume2\

However, when I shut one of the servers down, the other server looses visibility of one of the c:\custerstorage\volumeX\ folders.

When I have VMs stored in c:\custerstorage\volume1\ and ..\volume2\ and both nodes are up, everything is fine and I can do live migrations between hosts. But when one of the servers is shut down the live migration happens but then when the other server looses sight of one of the c:\custerstorage\volumeX\ folders the VM locks up and stops working (not surprisingly).

It's my first time using S2D so, does this sound correct? I would have thought that even if I take one of the nodes down, the other node should still retain visibility of both its c:\custerstorage\volume1\ and ..\volume2\ folders thanks to the magic of S2D.

Is there any trouble-shooting I can do to figure out why this isn't working?

Possibly relevant: I am building this system in my lab for a client. The client has a single SBS2011 server on their LAN at a different site. I have created a site to site VPN from my lab LAN to the customer's LAN. All NICs on the servers (excluding the 2 x 10 GbE NICs used for SMB/S2D/etc) have their DNS settings pointing to the customer's DC over the site-to-site VPN. The hyper-v hosts are connected to the customer's domain in this way.

↧

Active Directory detached cluster creation failed - MSG_AUTH_PACKAGE::KerberosAuth failed with status 2148074254

February 6, 2017, 1:53 am

≫ Next: Cluster Migration - Cluster operating system rolling upgrade vs. Setting up a new Cluster?

≪ Previous: CSV using S2D Storage Spaces Direct not working

I'm having trouble creating active directory detached cluster. From log I can see there is problem with authentication, but both users are the local admin accounts with same password. Here is log file extract:

Excerpt from log file:

00001488.0000046c::2017/02/06-09:41:10.610 INFO  [ACCEPT] 0.0.0.0:~3343~: Accepted inbound connection from remote endpoint 10.32.6.201:~58602~.
00001488.00001890::2017/02/06-09:41:10.610 INFO  [SV] New real route: local (10.32.6.200:~3343~) to remote  (10.32.6.201:~58602~).
00001488.00001890::2017/02/06-09:41:10.610 INFO  [SV] Got a new incoming stream from 10.32.6.201:~58602~
00001488.00001890::2017/02/06-09:41:10.610 ERR   [SM] Sponsor: Setting package MSG_AUTH_PACKAGE::KerberosAuth failed with status 2148074254
00001488.00001890::2017/02/06-09:41:10.610 WARN  mscs::ListenerWorker::operator (): (-2146893042)' because of 'Status'
00001488.00001890::2017/02/06-09:41:13.892 INFO  [NM] Received request from client address PLSD001.
00001488.00001890::2017/02/06-09:41:13.892 WARN  [VER] Could not read version data from database for node PLSD002 (id 2).
00001488.00001890::2017/02/06-09:41:13.892 WARN  [VER] Could not read version data from database for node PLSD002 (id 2).

There is no Active Directory present and no DNS server.

Can someone tell me what am I doing wrong?

Tnx, Robert.

↧

Cluster Migration - Cluster operating system rolling upgrade vs. Setting up a new Cluster?

June 27, 2018, 5:05 pm

≫ Next: Always On VPN NLB/high Availability Solutions

≪ Previous: Active Directory detached cluster creation failed - MSG_AUTH_PACKAGE::KerberosAuth failed with status 2148074254

Cluster Migration - Cluster operating system rolling upgrade vs. Setting up a new Cluster?

All- I currently have a Server 2012r2 cluster running with two nodes. I have just received two new hosts/servers with server 2016, I was wondering what would be the best option, Rolling Upgrade or just setting up a new cluster all together, and migrating the VM's to the new cluster.

-Side Note worried about different CPU versions affecting live migration during the upgrade period, and only migrating hosts at this time. Storage migration will take place after.

Notes on the migration

1. Old Server 2012 R2 nodes would be retired / DR use after the migration would be complete.

2. Processors would still be Intel but new versions - 2014 Xeons vs 2018 Xeons.

a. Current Cluster Processor Types - Intel Xeon E5-2640 v2

b. New Host Processor Types- Intel® Xeon® Silver 4116

↧

Always On VPN NLB/high Availability Solutions

June 21, 2018, 2:48 am

≫ Next: Storage Spaces Direct - Node Fault Tolerance

≪ Previous: Cluster Migration - Cluster operating system rolling upgrade vs. Setting up a new Cluster?

Hello All

I have a question i'm hoping some one can answer

we are looking at implementing Always On VPN on behalf of our desktop support team and I have been asked to advise on the design. it seems pretty straight forward until I hit the fact that the implementation of load balancing but i am struggling to an advise on this. the issue I see is 2 fold

1. the Remote access server. we have a cisco network load balancer so could configure them as separate and use this to Load balance from a network prospective, but my concern then is for the connection validation etc. (the risk of the users swapping servers mid connection and ending up in a validation loop) we could cluster this but i'm not sure RAS supports clustering.

2.for the Radius NPS having two separate ones configured isn't practical and what I read I don't believe it will be a supported set-up anyway. but I can't seem to find anything to confirm if Radius and NPS support clustering either (I suspect if they do it's just active/passive not active/active as they want.)

my preferred solution would be the NLB for the RAS and cluster for the Radius. does anyone know if this is supported or do I need to look at other options

↧

Storage Spaces Direct - Node Fault Tolerance

June 28, 2018, 8:53 am

≫ Next: Storage Spaces Direct -- virtual disks shown as "no redundancy" and "unhealthy"

≪ Previous: Always On VPN NLB/high Availability Solutions

We have built a 3 node S2D cluster using SuperMicro 6029U-TRT SuperServer. Each node has 2 Intel P4600 NVMe drives for cache and 8 10TB SATA drives (Seagate 3.5" 10TB,7.2K RPM,SATA 6Gb/s,256MB,512E,Helium) for capacity.

Everything ran through cluster validation and built fine. When testing, the storage pool stays online if one node fails, but if you fail a second node the storage pool goes offline.

Doing get-StorageEnclosure shows 3 enclosures.

Using failover cluster manager, I can see the disks in each node.

This morning I rebuilt it trying to change the Cluster Fault Domain settings. Then when I re-enabled the cluster I did get this message:

Performing operation 'Set rack fault tolerance on the S2D pool. This is normally recommended on setups with multiple
racks' on Target 'HVCL3'.

I thought that since I set each node to be in a different chassis and different rack, that maybe the fault tolerance would have been fixed, but it still only withstood a single node failure.

The only other issue I have found close to mine is:
https://social.technet.microsoft.com/Forums/windowsserver/en-US/4fc1fb86-61fa-4976-8b3f-9e314586fef8/storage-spaces-direct-cluster-virtual-disk-goes-offline-when-rebooting-a-node?forum=winserverClustering

James - Right Size Solutions

↧

Storage Spaces Direct -- virtual disks shown as "no redundancy" and "unhealthy"

September 21, 2017, 8:29 am

≫ Next: Change IP address on Failover Cluster and Hosts

≪ Previous: Storage Spaces Direct - Node Fault Tolerance

for no apparent reason I have a s2d virtual disk which has an operationalstatus of "no redundancy" and healthstatus "unhealthy". Running repair-virtualdisk gives this response:

PS C:\Users\administrator.PAO2K> get-virtualdisk "volume3" | repair-virtualdisk
repair-virtualdisk : There is not enough redundancy remaining to repair the virtual disk.
Activity ID: {a366982d-b602-4deb-8f6d-c8ec59c12217}
At line:1 char:29+ get-virtualdisk "volume3" | repair-virtualdisk+                             ~~~~~~~~~~~~~~~~~~+ CategoryInfo          : NotSpecified: (StorageWMI:ROOT/Microsoft/...SFT_VirtualDisk) [Repair-VirtualDisk], CimEx
   ception+ FullyQualifiedErrorId : StorageWMI 50001,Repair-VirtualDisk

PS C:\Users\administrator.PAO2K>

so how to fix? this virtual disk is tiered with triple mirror performance tier and double parity capacity tier, and is an ReFS volume.

↧

Change IP address on Failover Cluster and Hosts

June 29, 2018, 3:17 am

≫ Next: 2-Node Stretch Cluster with Storage Replica?

≪ Previous: Storage Spaces Direct -- virtual disks shown as "no redundancy" and "unhealthy"

Hello,

So, Im in a situation with a Failover Cluster 2016 with 2 hosts, where I want to change ip from :

cluster ip : 10.11.12.20

host1 ip : 10.11.12.21

host2 ip : 10.11.12.22

cluster ip : 10.40.12.20

host1 ip : 10.40.12.21

host2 ip : 10.40.12.22

How should I do this?

I've looked through these articles :

https://blogs.technet.microsoft.com/chrad/2011/09/16/changing-hyper-v-cluster-virtual-ip-address-vip-after-layer-3-changes/

https://social.technet.microsoft.com/Forums/windowsserver/en-US/a640a75b-52e1-43c1-a1fb-acbc142c614a/how-to-change-hyperv-cluster-ip-address?forum=winserverClustering

http://bartvanvugt.blogspot.com/2012/01/change-ip-address-hyper-v-cluster.html

But havent found the entire solution yet - so hopefully, someone in here can help me :)

↧

2-Node Stretch Cluster with Storage Replica?

June 23, 2018, 6:38 am

≫ Next: Storage Spaces Direct / Cluster Virtual Disk goes offline when rebooting a node

≪ Previous: Change IP address on Failover Cluster and Hosts

I want to create a 2-node stretch cluster with Storage Replica. I was able to setup the replica but once I create the cluster it breaks the storage replica. I'm not sure if that is by design or not. Is there something I am missing that prevents this type of scenario? If I create the cluster first it does not allow me to create the replica with an error that it cannot find the volume on the source server that it must be a CSV on the cluster or added to a role on the cluster. Neither of those are possible without the Replica running between the cluster nodes.

↧

Storage Spaces Direct / Cluster Virtual Disk goes offline when rebooting a node

March 1, 2018, 4:36 am

≫ Next: Hyper-V cluster - Performance Problems with Cluster share Volume

≪ Previous: 2-Node Stretch Cluster with Storage Replica?

Hello

We have several Hyper-converged einvoronments based on HP ProLiant DL360/DL380.
We have 3 Node and 2 Node Clusters, running with Windows 2016 and actual patches, Firmware Updates done, Witness configured.

The following issue occurs with at least one 3 Node and one 2 Node cluster:
When we put one node into maintenance mode (correctly as described in microsoft docs and checked everything is fine) and reboot that node, it can happen, that one of the Cluster Virtual Disks goes offline. It is always the Disk Performance with the SSD only storage in each environment. The issue occurs only sometimes and not always. So sometimes I can reboot the nodes one after the other several times in a row and everything is fine, but sometimes the Disk "Performance" goes offline. I can not bring this disk back online until the rebooted node comes back online. After the node which was down during maintenance is back online the Virtual Disk can be taken online without any issues.

We have created 3 Cluster Virtual Disks & CSV Volumes on these clusters:
1x Volume with only SSD Storage, called Performance
1x Volume with Mixed Storage (SSD, HDD), called Mixed
1x Volume with Capacity Storage (HDD only), called Capacity

Disk Setup for Storage Spaces Direct (per Host):
- P440ar Raid Controller
- 2 x HP 800 GB NVME (803200-B21)
- 2 x HP 1.6 TB 6G SATA SSD (804631-B21)
- 4 x HP 2 TB 12G SAS HDD (765466-B21)
- No spare Disks
- Network Adapter for Storage: HP 10 GBit/s 546FLR-SFP+ (2 storage networks for redundancy)
- 3 Node Cluster Storage Network Switch: HPE FlexFabric 5700 40XG 2QSFP+ (JG896A), 2 Node Cluster directly connected with each other

Cluster Events Log is showing the following errors when the issue occurs:

Error 1069 FailoverClustering
Cluster resource 'Cluster Virtual Disk (Performance)' of type 'Physical Disk' in clustered role '6ca63b55-1a16-4bb2-ac53-2b23619e258a' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Warning 5120 FailoverClustering
Cluster Shared Volume 'Performance' ('Cluster Virtual Disk (Performance)') has entered a paused state because of 'STATUS_NO_SUCH_DEVICE(c000000e)'. All I/O will temporarily be queued until a path to the volume is reestablished.

Error 5150 FailoverClustering
Cluster physical disk resource 'Cluster Virtual Disk (Performance)' failed. The Cluster Shared Volume was put in failed state with the following error: 'Failed to get the volume number for \\?\GLOBALROOT\Device\Harddisk10\ClusterPartition2\ (error 2)'

Error 1205 FailoverClustering
The Cluster service failed to bring clustered role '6ca63b55-1a16-4bb2-ac53-2b23619e258a' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

Error 1254 FailoverClustering
Clustered role '6ca63b55-1a16-4bb2-ac53-2b23619e258a' has exceeded its failover threshold. It has exhausted the configured number of failover attempts within the failover period of time allotted to it and will be left in a failed state. No additional attempts will be made to bring the role online or fail it over to another node in the cluster. Please check the events associated with the failure. After the issues causing the failure are resolved the role can be brought online manually or the cluster may attempt to bring it online again after the restart delay period.

Error 5142 FailoverClustering
Cluster Shared Volume 'Performance' ('Cluster Virtual Disk (Performance)') is no longer accessible from this cluster node because of error '(1460)'. Please troubleshoot this node's connectivity to the storage device and network connectivity.

Any hints / inputs appreciated. Had someone something similar?

Thanks in advance

Philippe

↧

Hyper-V cluster - Performance Problems with Cluster share Volume

June 29, 2018, 1:22 am

≫ Next: Cluster Validation in Windows Server 2016

≪ Previous: Storage Spaces Direct / Cluster Virtual Disk goes offline when rebooting a node

Hi Experts,

we are in the process of implementing SQL server 2016 2-node (Node1 and Node2) traditional fail-over cluster on Hyper-V guest VMs with Clustered shared volume and Quorum with Disk witness.

Each VM we have two Disks C:\ and D:\
We have cluster shared Volume (Cluster Disk) of 250 GB like I:\ and J:\ for SQL server files
We have one cluster shared Volume (Cluster Disk) of 25 Gb for Quorum as Q:\
All the local disks, Cluster volumes I:\ and Q:\ are carved out from the same 4 TB Data store available at Esxi hos

Now we are Testing Disk performance and Results are showing bad performance on clustered shared volumes compared to local dedicated volumes.

What could be the problem and are there any ways to improved the cluster shared volumes performance.

Ramesh M

↧

Cluster Validation in Windows Server 2016

July 2, 2018, 8:28 am

≫ Next: Problem with failover clustering and iSCSI target in Windows Server 2016

≪ Previous: Hyper-V cluster - Performance Problems with Cluster share Volume

Gents,

I've upgraded 3-node Hyper-V cluster from Windows Server 2012 R2 to Windows Server 2016.

And a bit confused a new interface of the "Validate Cluster" wizard.

I don't see any choice of the disk for Disk's cluster tests as it was in Windows Server 2012 R2.

In Win 2012 I can choose "Run all test" and wizard will ask you which disk you will use for failover testing. I don't see the same in Win 2016.

Can any one explain if I choose "Run all tests" in Win 2016, will be there disruption ?

I don't see any explanation in the help. All articles are about Win 2012.

Thanks

mcse^4

↧

Problem with failover clustering and iSCSI target in Windows Server 2016

June 4, 2018, 11:53 am

≫ Next: Windows 10 offline network drive file server cluster name nightmare

≪ Previous: Cluster Validation in Windows Server 2016

I have a problem with failover clustering and iSCSI target in Windows Server 2016.

I often demo the configuration of Failover Clustering for students and customers. And for that I have built a small lab consisting of a Domain Controller and three servers functioning as my cluster nodes.

All Servers run Windows Server 2016 (1607) and have been fully updated with the latest patches. All servers are virtual machines running in Hyper-V on Windows 10 v1709/1803.

The Domain Controller functions as Domain Controller and I have also added the iSCSI role service on that as well. And yes, I know that is not the way to do but this is only a test/demo environment. I have created two iSCSI disk, one for the Witness (1 GB) and one for Shared Storage (100 GB).

My three nodes also run Windows Server 2016 (1607) and is configured with two network cards, one for management (LAN) and one for cluster traffic. I have installed the Failover Clustering feature on these servers and are using the built-in iSCSI initiator to connect to the shared storage published by my iSCSI target running on the Domain Controller. And this is where I run into the first problem.

In order to give the nodes (servers) access to the storage they need to be added to theInitiators list from within Server Manger ->File and Storage Services -> iSCSI. If I choose theQuery initiator computer for ID options and then click Browse (in order to browse for the names of the nodes) Server Manager crashes and I get a window stating that Server Manager has stopped working. If I look in the Event Viewer I see the following events:

Log Name: Application

Source: .NET Runtime

Date: 04-06-2018 17:36:52

Event ID: 1026

Task Category: None

Level: Error

Keywords: Classic

User: N/A

Computer: cph-dc-01.ad.petzfeed.com

Description:

Application: ServerManager.exe

Framework Version: v4.0.30319

Description: The process was terminated due to an unhandled exception.

Application: ServerManager.exe

Framework Version: v4.0.30319

Description: The process was terminated due to an unhandled exception.

Exception Info: System.ArgumentException

at Microsoft.FileServer.Management.Plugin.Dialogues.BrowseDSObjectsNativeMethods+IDsObjectPicker.Initialize(DSOP_INIT_INFO ByRef)

at Microsoft.FileServer.Management.Plugin.Dialogues.BrowseDSObjectsDialog.ShowDialog(System.Windows.Forms.IWin32Window, PickerTypes, Boolean, StartScope, ProviderTypes, System.String, System.Security.SecureString)

at Microsoft.FileServer.Management.Plugin.Services.DialogService.ShowPickDialog(PickerTypes, Microsoft.FileServer.Management.Framework.ComputerName, StartScope, ReturnSourceTypes, System.String ByRef)

at Microsoft.FileServer.Management.Plugin.Services.DialogService.ShowPickComputerDialog(Microsoft.FileServer.Management.Framework.ComputerName, Microsoft.FileServer.Management.Framework.ComputerName ByRef)

at Microsoft.FileServer.Management.Plugin.Dialogues.AddInitiatorIdSectionDescriptor+<>c__DisplayClass41_0.<.ctor>b__0()

at MS.Internal.Commands.CommandHelpers.CriticalExecuteCommandSource(System.Windows.Input.ICommandSource, Boolean)

at System.Windows.Controls.Primitives.ButtonBase.OnClick()

at System.Windows.Controls.Button.OnClick()

at System.Windows.Controls.Primitives.ButtonBase.OnMouseLeftButtonUp(System.Windows.Input.MouseButtonEventArgs)

at System.Windows.RoutedEventArgs.InvokeHandler(System.Delegate, System.Object)

at System.Windows.RoutedEventHandlerInfo.InvokeHandler(System.Object, System.Windows.RoutedEventArgs)

at System.Windows.EventRoute.InvokeHandlersImpl(System.Object, System.Windows.RoutedEventArgs, Boolean)

at System.Windows.UIElement.ReRaiseEventAs(System.Windows.DependencyObject, System.Windows.RoutedEventArgs, System.Windows.RoutedEvent)

at System.Windows.UIElement.OnMouseUpThunk(System.Object, System.Windows.Input.MouseButtonEventArgs)

at System.Windows.RoutedEventArgs.InvokeHandler(System.Delegate, System.Object)

at System.Windows.RoutedEventHandlerInfo.InvokeHandler(System.Object, System.Windows.RoutedEventArgs)

at System.Windows.EventRoute.InvokeHandlersImpl(System.Object, System.Windows.RoutedEventArgs, Boolean)

at System.Windows.UIElement.RaiseEventImpl(System.Windows.DependencyObject, System.Windows.RoutedEventArgs)

at System.Windows.UIElement.RaiseTrustedEvent(System.Windows.RoutedEventArgs)

at System.Windows.Input.InputManager.ProcessStagingArea()

at System.Windows.Input.InputManager.ProcessInput(System.Windows.Input.InputEventArgs)

at System.Windows.Input.InputProviderSite.ReportInput(System.Windows.Input.InputReport)

at System.Windows.Interop.HwndMouseInputProvider.ReportInput(IntPtr, System.Windows.Input.InputMode, Int32, System.Windows.Input.RawMouseActions, Int32, Int32, Int32)

at System.Windows.Interop.HwndMouseInputProvider.FilterMessage(IntPtr, MS.Internal.Interop.WindowMessage, IntPtr, IntPtr, Boolean ByRef)

at System.Windows.Interop.HwndSource.InputFilterMessage(IntPtr, Int32, IntPtr, IntPtr, Boolean ByRef)

at MS.Win32.HwndWrapper.WndProc(IntPtr, Int32, IntPtr, IntPtr, Boolean ByRef)

at MS.Win32.HwndSubclass.DispatcherCallbackOperation(System.Object)

at System.Windows.Threading.ExceptionWrapper.InternalRealCall(System.Delegate, System.Object, Int32)

at System.Windows.Threading.ExceptionWrapper.TryCatchWhen(System.Object, System.Delegate, System.Object, Int32, System.Delegate)

at System.Windows.Threading.Dispatcher.LegacyInvokeImpl(System.Windows.Threading.DispatcherPriority, System.TimeSpan, System.Delegate, System.Object, Int32)

at MS.Win32.HwndSubclass.SubclassWndProc(IntPtr, Int32, IntPtr, IntPtr)

at MS.Win32.UnsafeNativeMethods.DispatchMessage(System.Windows.Interop.MSG ByRef)

at System.Windows.Threading.Dispatcher.PushFrameImpl(System.Windows.Threading.DispatcherFrame)

at System.Windows.Window.ShowHelper(System.Object)

at System.Windows.Window.ShowDialog()

at Microsoft.FileServer.Management.Plugin.Services.DialogService.ShowAddInitiatorIdDialog(Microsoft.FileServer.Management.Framework.ComputerName, Microsoft.FileServer.Management.Plugin.Model.InitiatorId ByRef)

at Microsoft.FileServer.Management.Plugin.PropertyPages.IscsiTargetInitiatorsPropertySectionDescriptor.<.ctor>b__4_0()