A practical guide for VMware HA

I would like to give a brief overview of VMware HA as it stands today in the 4.1 release. Let me first say that VMware high availability (HA) and VMware vMotion are two separate functions. VMware HA restarts virtual machines after a failure, while vMotion is the live migration of a running virtual machine from one host to another. One question I get often is "Don't the virtual machines just vMotion over to another host server in the event of a host failure?". In the event of a failure, vMotion does not get involved. HA will restart the virtual machine on an available host in the cluster, provided there are resources available for your virtual machine to use. There is a lot to explain about this process, but we will get there soon. VMware HA is a big topic and you could write a book about it, but let's look at some of the key features of High Availability according to VMware:

  • Automatic detection of server failures. VMware HA automates the monitoring of physical server availability.  HA detects physical server failures and initiates the new virtual machine restart on a different physical server in the resource pool without human intervention.
  • Automatic detection of operating system failures. VMware HA detects operating system failures within virtual machines by monitoring heartbeat information. If a failure is detected, the affected virtual machine is automatically restarted on the same server.
  • Smart failover of virtual machines to servers with best available resources (requires VMware DRS). Automate the optimal placement of virtual machines restarted after server failure.
  • Scalable high availability across multiple physical servers. Supports up to 32 nodes in a cluster for high application availability. VMware HA has the same limits for virtual machines per host, hosts per cluster, and virtual machines per cluster as vSphere.
  • Resource checks. Ensure that capacity is always available in order to restart all virtual machines affected by server failure. HA continuously and intelligently monitors capacity utilization and reserves spare capacity to be able to restart virtual machines.
  • Proactive monitoring and health checks. VMware HA helps VMware vSphere users identify abnormal configuration settings detected within HA clusters. The VMware vSphere Client interface reports relevant health status, potential error conditions, and suggested remediation steps. The Cluster Operational Status window displays information about the current VMware HA operational status, including the specific status and errors for each host in the VMware HA cluster.
  • Enhanced isolation address response. Ensures reliability in confirming network failure by allowing multiple addresses to be pinged before declaring that a node is isolated in the cluster.

____________________________________________________________________________

Automatic detection of server failures.

This key feature describes the very nature of VMware High Availability. Monitoring takes place between all nodes in a VMware cluster, and the cluster has a hierarchy just like any organized structure: there are primary nodes and secondary nodes. Up to five hosts act as primaries, and one of those five acts as the (active) master that coordinates the other four. What happens if the master dies? Well, one of the remaining four primaries in the cluster is elected as the new master, and a secondary node is then promoted into the group to become a primary. This maintains the hierarchy of five primary nodes.

How do you tell which hosts in your cluster are primary nodes? As of the 4.1 release, you can tell from the GUI which are primary and which are secondary! I am a GUI fan, but I also like the CLI (command line interface). Under the cluster Summary tab, there is a "Cluster Operational Status" pop-up that shows you the role of each host and any HA configuration issues. But it will only display the role of a host if there is an HA configuration issue with a host in the cluster. If you have no issues, you get a blank gray screen like the one shown below. It would have been nice to get a good clean list of the roles for each host from here.

You may also run a PowerCLI script to pull which hosts are primary and secondary. All hosts and vCenter must be at 4.1. It would look something like this…

# Initialize the report array (the original one-liner never did this)
$report = @()
$clusterName = "your cluster name here"
Get-Cluster $clusterName | %{
    # Pull the HA (DAS) advanced runtime info for the cluster
    $info = $_.ExtensionData.RetrieveDasAdvancedRuntimeInfo()
    $_ | Get-VMHost | %{
        $row = "" | Select Name,Role
        $row.Name = $_.Name
        # The runtime info lists primary hosts by short name, so compare against the short name
        $row.Role = &{if($info.DasHostInfo.PrimaryHosts -contains ($_.Name.Split('.')[0])){"primary"}else{"secondary"}}
        $report += $row
    }
}
$report

A big thanks goes out to LucD on the VMware communities for slapping this together for me. You can also dive into the CLI by logging into any host in your cluster.

So what if I only have two nodes in my cluster, or only 5? Well, all hosts are then considered primaries, with one of them being the master. Once you add the 6th host to the cluster, that node becomes your very first secondary node. Like I stated in the beginning, there is a monitoring process that takes place between all the hosts in the cluster, and it is independent of vCenter. Each host knows which hosts are the 5 primaries and which are secondaries. This takes place thanks to the HA agent installed on each host when you configure a cluster for HA. This sometimes gets confused with the vpxa agent, which is configured when you initially connect a host to vCenter. The vpxa agent (or vmware-vpxa service) talks to the hostd service, which relays information from the ESX kernel. Hostd is built into ESX, but vpxa is installed when you connect a host server to vCenter. I will dive into all the services running on an ESX host in a later post. Remember though, the information that the HA agent retains is stored in RAM.

Now that I have explained which agent is communicating all this HA information, let's take a look at where we can view the results of this communication and how we can manipulate it. Note that some of these methods are unsupported and you try them at your own peril. Another warning: more complex clusters with advanced settings equate to more information you need to track, which equates to more work, which may lead to less sleep. Why would I want to go through all this hassle to configure complex clusters anyway? Maybe you have a 32 node cluster and you want some way to manage which hosts are primaries in your cluster. You may also have a blade infrastructure in which you need to ensure primary hosts stay separated across blade chassis. If you keep all of your primary hosts on one particular blade chassis, you have what's known as a "failure domain". How can I change which hosts are the primaries and secondaries?

  1. Today in vSphere 4.1, there is an unsupported method in the GUI (thanks to Mr Duncan at Yellow Bricks for the suggestion) that allows you to specify your primary hosts in the cluster. The advanced HA option is das.preferredPrimaries = ESXhost1, ESXhost2, ESXhost3, ESXhost4, ESXhost5. You will probably want to use the FQDN of the hosts for proper resolution, but you may also specify the IP address of your hosts. Again, this is an unsupported option (see the PowerCLI sketch at the end of this section).
  2. The supported way is actually the most labor intensive way, especially if you have a 32 node cluster! Like I mentioned before, when the first 5 nodes are added to an HA cluster, those are considered the primaries; all other nodes in the cluster are considered secondaries. So a 32 node cluster gives you 27 possible secondaries. If a primary host dies, which of those 27 hosts is promoted to primary? That election process is random. To find out which one is now a primary, you would need to cat /var/log/vmware/aam/aam_config_util_listnodes.log. So if I want a certain host in my cluster to become a primary, I would need to enter and exit maintenance mode on hosts, then check the listnodes.log to see which node is now the primary. But how can you keep the hosts from your previous attempts from becoming primaries again? You can't. You either have to leave them in maintenance mode until the correct host is elected as a primary, or you can try option 3. Also note that the election process takes place when you reconfigure HA from the cluster level or when a host is removed / disconnected from the cluster.
  3. The limit of 5 primary nodes in the cluster is a soft limit. This doesn't mean you should go bananas and add 32 primary nodes to your cluster. This option falls under the category of "unsupported" and "more complex environments = more overhead = less sleep". Did I mention I like to get my well earned sleep at night, knowing that my clusters are safe and sound in the datacenter? 🙂 Enough of this sleepy talk! To promote a 6th node (again, unsupported), open the AAM shell on a host by running ./Cli from /opt/vmware/aam/bin. From the prompt, issue the command "promoteNode host1.burdweiser.com", using the FQDN of your host server. To demote one of the primaries, issue the command "demoteNode host2.burdweiser.com".

These options are the only ways to view and manipulate the primary and secondary nodes in the cluster. In a later post I will go over slot sizing, the origins of VMware HA from Legato, and advanced options at the cluster level.
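Here is the PowerCLI sketch mentioned in option 1. It is a hedged example of pushing the unsupported das.preferredPrimaries value through the vSphere API instead of the GUI; the cluster name and host names are placeholders, and the option itself remains unsupported no matter how you set it.

$cluster = Get-Cluster "your cluster name here"

# Build the unsupported das.preferredPrimaries advanced option (placeholder host names)
$option = New-Object VMware.Vim.OptionValue
$option.Key = "das.preferredPrimaries"
$option.Value = "esx01.example.com,esx02.example.com,esx03.example.com,esx04.example.com,esx05.example.com"

# Wrap it in an HA (DAS) config spec and reconfigure the cluster
$spec = New-Object VMware.Vim.ClusterConfigSpecEx
$spec.DasConfig = New-Object VMware.Vim.ClusterDasConfigInfo
$spec.DasConfig.Option = @($option)

# $true means "modify" - only the settings present in the spec are changed
$cluster.ExtensionData.ReconfigureComputeResource_Task($spec, $true)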

____________________________________________________________________________

Automatic detection of operating system failures.

I do not recall how long this feature has been around. By default this option is disabled. You can configure this option for all virtual machines or just certain virtual machines. Keep in mind that the heartbeat for this HA feature is not sent via the NIC or any other virtual device; it is relayed to the hostd service on the ESX host by VMware Tools in your VM. If hostd does not receive heartbeats from VMware Tools, it can also check the disk I/O for the VM as a secondary measure, and the disk I/O check interval is configurable. This is essentially HA for the guest operating system: with VM monitoring the virtual machine is restarted in place on the same host, while it still takes a host failure for HA to restart it on another host. For a detailed guide on the options you have with VM and application monitoring, see the VMware Availability Guide.
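There is no dedicated PowerCLI cmdlet for VM monitoring in 4.1 that I am aware of, but as a rough sketch you can flip it on through the API. Treat this as an assumption to verify against the Availability Guide; the cluster name is a placeholder.

$cluster = Get-Cluster "your cluster name here"

# Enable VM monitoring (guest heartbeat monitoring) for the whole cluster.
# Valid values are vmMonitoringDisabled, vmMonitoringOnly and vmAndAppMonitoring.
$spec = New-Object VMware.Vim.ClusterConfigSpecEx
$spec.DasConfig = New-Object VMware.Vim.ClusterDasConfigInfo
$spec.DasConfig.VmMonitoring = "vmMonitoringOnly"

$cluster.ExtensionData.ReconfigureComputeResource_Task($spec, $true)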

In release 4.1, you can now monitor applications within your virtual machines! But "you must first obtain the appropriate SDK (or be using an application that supports VMware Application Monitoring) and use it to set up customized heartbeats for the applications you want to monitor". To do this, you must use a tool like Hyperic. I've gotta be honest, when I first saw this feature I thought I would be able to automatically restart services via VMware Tools. That is not the case; you must purchase an application that supports VMware Application Monitoring. It is a nice addition, but it requires another product to use.

 

____________________________________________________________________________

Smart failover of virtual machines to servers with best available resources

If you have sized your clusters properly and you have resources available to restart virtual machines after a host failure, the Distributed Resource Scheduler (DRS) will relocate virtual machines to other hosts in the cluster if it is deemed necessary to balance the cluster according to your migration threshold settings. HA will restart your virtual machines on other hosts in the cluster according to the admission control settings you have configured. This HA failover has the potential to create an imbalance of resources used across your cluster (RAM and CPU). DRS gathers metrics over time (every 5 minutes) to gauge any imbalance in the cluster. HA actually has a new feature in release 4.1 that helps curb this resource fragmentation, which we will talk about soon.
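None of that rebalancing happens if DRS is turned off or left in manual mode, of course. As a quick, hedged PowerCLI sketch (the cluster name is a placeholder, and the exact parameter names may vary by PowerCLI build):

# Make sure DRS is enabled and fully automated so it can clean up after an HA restart
Get-Cluster "your cluster name here" |
    Set-Cluster -DrsEnabled:$true -DrsAutomationLevel FullyAutomated -Confirm:$false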

Since we are talking about new features in the 4.1 release, one of the newest and greatest features is the affinity / anti-affinity rules in DRS. You can now create groups for your hosts to separate VM's. Before, all you could do was create rules to either keep certain VM's together or keep them separated. This is something I believe everyone running blade architectures has been waiting on for a long time, especially if you are looking to keep VM's separated by blade chassis.

Confused? Well, if I have a failover application XYZ and I want to make sure I am fully redundant across my blade architecture, I need to make sure the VM's stay separated across different blade chassis. Let's say application XYZ is installed on two VM's and the application itself has a built-in failover feature. If VM1 with the XYZ application in chassis 1 fails (or the host fails), then VM2 with the XYZ application needs to take over. If VM2 is sitting on chassis 1, then you just lost the XYZ application (and your company could be losing money by the second!). But if you had VM2 placed on chassis 2, then everything would be safe. Along come the new host groupings in DRS. You can tell DRS to keep VM's separated across these host groups. So in chassis 1 I have 6 hosts and I create group 1. In chassis 2, I have 6 hosts for group 2. All hosts are part of the same cluster. You simply create a new VM anti-affinity rule that says "keep these two VM's separated across these groups".
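To give a rough idea of what sits behind those host groups, here is a hedged PowerCLI sketch of the 4.1 API objects involved: a host group, a VM group, and a VM-to-host rule. Every name in it (cluster, group, host, and VM names) is a placeholder, and for real chassis separation you would repeat the pattern for the second chassis and the second VM.

# Hedged sketch: build a DRS host group, a VM group and a VM-to-host rule through the vSphere 4.1 API
$cluster = Get-Cluster "your cluster name here"
$spec = New-Object VMware.Vim.ClusterConfigSpecEx

# Host group for the blades in chassis 1
$hostGroup = New-Object VMware.Vim.ClusterHostGroup
$hostGroup.Name = "Chassis1-Hosts"
$hostGroup.Host = @(Get-VMHost "esx01.example.com","esx02.example.com" | %{ $_.ExtensionData.MoRef })
$hostGroupSpec = New-Object VMware.Vim.ClusterGroupSpec
$hostGroupSpec.Operation = "add"
$hostGroupSpec.Info = $hostGroup

# VM group holding the first copy of application XYZ
$vmGroup = New-Object VMware.Vim.ClusterVmGroup
$vmGroup.Name = "XYZ-VM1"
$vmGroup.Vm = @((Get-VM "VM1").ExtensionData.MoRef)
$vmGroupSpec = New-Object VMware.Vim.ClusterGroupSpec
$vmGroupSpec.Operation = "add"
$vmGroupSpec.Info = $vmGroup

# Rule that keeps the XYZ-VM1 group on the chassis 1 hosts
$rule = New-Object VMware.Vim.ClusterVmHostRuleInfo
$rule.Name = "XYZ-VM1-on-Chassis1"
$rule.Enabled = $true
$rule.VmGroupName = "XYZ-VM1"
$rule.AffineHostGroupName = "Chassis1-Hosts"
$ruleSpec = New-Object VMware.Vim.ClusterRuleSpec
$ruleSpec.Operation = "add"
$ruleSpec.Info = $rule

$spec.GroupSpec = @($hostGroupSpec, $vmGroupSpec)
$spec.RulesSpec = @($ruleSpec)
$cluster.ExtensionData.ReconfigureComputeResource_Task($spec, $true)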

Keep these two things in mind when creating these affinity / anti-affinity rules in DRS.

  1. Let's say VM1 (with application XYZ) is on a host that fails. HA will restart VM1 on the next available host in the cluster, unless you have specified a failover host in your HA admission control settings. This means that VM1 could be restarted on the very same host that VM2 is running on.
  2. DRS evaluates the cluster every 5 minutes to check for an imbalance in resources, and for rule violations! So if VM1 happens to be restarted on the same host as VM2, DRS will move it back to chassis 1 (or group 1).

Pretty magical huh? Before, there was a little overhead in keeping track of where the VM's might have moved to, even with anti-affinity rules in place for the VM's. You could of course create a PowerCLI script to run and report the location of certain VM's. It would look like this: Get-VM | Select Host, Name | Sort Name, Host. That will just give you a quick list, but you might want to use something a little cleaner if you are dealing with hundreds or thousands of VM's.

The timing of a VM failover in an HA event has not really changed in release 4.1. Just to review – if your cluster setting is "shut down" for virtual machines (the default in 4.1) during a host isolation response, the VM will be restarted at the 15 second mark on another host in the cluster.

____________________________________________________________________________

Scalable high availability across multiple physical servers

This simply states the fact that you can have 32 hosts in a VMware cluster. For the full list of maximums within the cluster, please visit the VMware Configuration Maximums document. Even if you max out those pretty new host servers, you still have to keep in mind the number of virtual machines you can host in the cluster and scale the resources for each virtual machine appropriately.

Pay close attention to the HA admission controls that make sense for your environment.  Don't take the lazy road and choose to disable HA admission controls. Most admins do this when rolling out the first clusters in vCenter but forget to go back and scale things appropriately. Disabling the HA admission controls allows you to "Power on VM's that violate availability constraints". Doing this is like overcrowding a train.

Yes, you can overload a host just like the examples in these pictures. You can pack a ton of VM's on a host, but things will slow to a crawl. By default, only 32 VM's will power up on a host at one time, unless you have created restart priority levels for your VM's. HA will continue to restart the remaining VM's (if you have enough leftover resources in your cluster), and DRS will eventually even things out if needed. Be careful how you overprovision! Just because you see free space in your cluster doesn't mean you should take it all!
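If you want to double check from PowerCLI that admission control is still switched on for a cluster, a minimal sketch (the cluster name is a placeholder) looks like this:

# Keep HA admission control enabled so HA holds failover capacity in reserve
Get-Cluster "your cluster name here" |
    Set-Cluster -HAAdmissionControlEnabled:$true -Confirm:$false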

____________________________________________________________________________

Resource checks

Before the 4.1 release, a failed over VM could be granted more resource shares than what was available on the host, causing a real drag on resources until DRS balanced things out. Remember that HA calculates resources based on VM's that are restarted after an HA event.

To help avoid the over crowded train scenario above, VMware retooled the way it does the HA failover. Now in the 4.1 release, before the failed virtual machine is restarted on another host, HA will actually create a "test" virtual machine identical to your failed VM to test for available resources. When HA determines that resources are available for this test VM, it is deleted and your failed VM is restarted on the host. This process allows for better placement of failed virtual machines and reduces fragmentation of resources.

There are not many details in any documents on this process of creating and destroying a test VM. What I've been able to find out so far is that extra storage is not required; the process is just a simulation by HA, and the test VM is never actually powered on.

 

____________________________________________________________________________

Proactive monitoring and health checks

As mentioned before, the "Cluster Operational Status" window provides a clear view of any HA misconfiguration issues in the cluster. The HA agent polls the health of the cluster every 10 seconds by default (configurable with the das.sensorPollingFreq option) and reports any HA issues it finds to vCenter.

____________________________________________________________________________

Enhanced isolation address response

This feature is not new to release 4.1; it was introduced in vCenter 2.0.2. This function is used by the ESX (or ESXi) host when it is unable to contact other hosts in the cluster. All hosts in a cluster send "heartbeats" to each other every second. If a host stops receiving heartbeats, on the 13th second it decides it is possibly isolated and pings its isolation address (the default isolation address is the gateway of the Service Console). On the 14th second, if there is still no response, it triggers the isolation response (so das.failuredetectiontime - 2 and - 1, with the default of 15 seconds). This is important because when you increase das.failuredetectiontime to 20 seconds, those checks happen on the 18th and 19th second instead. You can have up to 10 isolation addresses, but for every isolation address you add you should add at least 2 seconds to das.failuredetectiontime. If you use 2 isolation addresses, 20 seconds in total is enough; if you increase it to 4, you should have at least 25 seconds in total as the das.failuredetectiontime.

What happens in this case? The default rule for VM's is to shut down (a graceful shutdown if VMware Tools is installed) and the virtual machines are restarted on other hosts in the cluster that are still considered alive. This is of course only possible if at least one of the 5 primary hosts in your cluster is not in an isolated state. If a VM has not finished the shutdown process after 300 seconds, it is powered off (like pulling the power cord on a physical box). This timeout is configurable via das.isolationshutdowntimeout.

The isolation address VMware is referring to is a redundant address that can be pinged in the event heartbeats are no longer received over the service console (management network). With the default timing, this address is pinged around the 13 second mark, just before VM's start to be restarted on other hosts in the HA cluster. Now, if you change your VM isolation response from "shut down" to "leave powered on", the host will retain the VMFS locks on the vmdk files for its VM's. There are some considerations for this that will be addressed later, and your storage configuration can produce different results if you are using NFS, FCoE or iSCSI, which can lead to the infamous "split brain" scenario. Why do this? Perhaps your host is really not down and it simply cannot ping your isolation address. If that is the case, your host will still consider itself isolated, but the VM's will continue to run. The other hosts in the cluster will attempt to start the VM's from the isolated host, but the file locks in VMFS will prevent this. If the host is truly down, the locks on the VM's files in VMFS will be released and the VM's will be restarted on other hosts.

The important thing to take away from this highlighted feature is that HA heartbeats have a backup: an isolation address that can be reached, "just in case". How do you configure an isolation address? You simply add a das.isolationaddressX entry in the HA advanced options. It is also recommended to have a secondary service console on all hosts in the cluster. Redundancy everywhere is not a bad thing.
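As a rough example, the advanced options for two isolation addresses could look like the values below. The addresses are placeholders; note that das.failuredetectiontime is entered in milliseconds, while das.isolationshutdowntimeout is in seconds.

das.isolationaddress1 = 192.168.1.254
das.isolationaddress2 = 10.10.10.254
das.failuredetectiontime = 20000
das.isolationshutdowntimeout = 300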

VMworld 2010 backpacks

Check out the new 2010 VMworld backpack!

Last year's green backpack really didn't mesh with the colors of my motorcycle. I'm really looking forward to getting my hands on the new one this year. I think they really went for style and usability on this one, so everyone will be proud to flash the VMware logo. All of the backpacks have been great, but I really like this one!

New USB passthrough support in vSphere 4.1

I came across this new feature in the release notes, so I had to try it myself. And there is no need for VMDirectPath support! The virtual machine I am testing with is Windows 7 x64. I tested the same procedure on multiple Windows operating systems with no issues.

The process is pretty simple. First, add a USB controller to your virtual machine.
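If you would rather script that first step, here is a hedged sketch of the same controller add through the vSphere API; the VM name is a placeholder.

# Add a USB controller to the virtual machine (placeholder VM name)
$vm = Get-VM "Win7-Test"

$devSpec = New-Object VMware.Vim.VirtualDeviceConfigSpec
$devSpec.Operation = "add"
$devSpec.Device = New-Object VMware.Vim.VirtualUSBController
$devSpec.Device.Key = -1   # temporary key for a device that does not exist yet

$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.DeviceChange = @($devSpec)
$vm.ExtensionData.ReconfigVM_Task($spec)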

Now, add the USB device. Edit the virtual machine again after you have added the controller and select "USB device".

Now, select a USB device

I went bananas and used 3 different USB sticks to see if it would be hit or miss. I had no problems with any of the 3, although you may want to check the compatibility matrix for an official list from VMware.

In the device manager, everything shows up perfectly and the Windows auto launch executes.

I open Windows Explorer and the test folder I created before I plugged in the USB device shows up just fine.

Make sure to check the VMware KB Article 1022290 for special considerations. What are the use cases for this? For one, if you require software and a USB dongle for licenses. You could use a USB device as a backup media until it is replicated somewhere else. This could save a little on SAN costs. But when it comes down to it, large enterprises may not want to manage a plethora of USB devices hanging off of a host.

I did notice that there are a few posts out there that mention vCenter is a requirement for this new feature. I have not found that to be the case. I pored through the admin guide and it does walk you through the setup process via vCenter, but I had no problems adding a USB device directly from the host. Of course, if you want to leverage the vMotion feature with this USB passthrough device, you will need vCenter.

Time to move the vCenter DB to an enterprise level application

Ahh, customers love getting something for free. But like all good free things, they come to an end eventually. Let’s take a look at the SQL Express database that is bundled with the installation of vCenter. Many have relied on this as a good start to virtualization, but it is sometimes overlooked by management even during an enterprise deployment. A well planned virtual infrastructure deployment must include an enterprise database application. In my opinion, if your company is investing in vSphere Enterprise or Enterprise Plus, you should be investing in an enterprise database application that can handle an expanding business.

So SQL Express has some limitations:

  • There is a 4GB limit on database size (this limit does not apply to the log file size).
  • It is limited to one CPU (it can be installed on a multi-CPU system, but only 1 CPU is utilized), so features like parallel query execution are not supported.
  • There is a 1GB memory limit for the buffer pool.
  • You are limited to a starter schema – you cannot run the scheduled jobs via the provided SQL scripts.
  • VMware recommends it only for small deployments of roughly 5 hosts and 50 virtual machines.

There is no hard-coded limit on the number of users that can access the database, but the resource limits can cause performance issues. I am not trying to discourage anyone from using SQL Express; it is a great product with tons of features, but it is intended for small or initial deployments. SQL Express does make it easy to scale up to an enterprise edition.

I am no DBA, but there are a ton of advantages to installing your vCenter DB on an enterprise database application. You can cluster your database with other SQL servers and it can be placed in a SQL farm. There are too many features to list within this post, so visit the Microsoft SQL site to learn more.

Can you virtualize the vCenter database? Most admins frown upon placing a key component of vCenter within its own infrastructure. This is similar to the argument over whether to virtualize the vCenter server itself. I would say, if you are virtualizing tier 1 applications like MS Exchange or other SQL databases, why should the vCenter database be any more special? So in my opinion, absolutely you can virtualize the vCenter database. There is nothing in the vSphere best practices guide that says you need the vCenter database on a physical server, but it would be nice for VMware to mention housing it within the virtual infrastructure. Remember to follow all VMware recommendations when building the SQL server for the vCenter database; they apply to physical and virtual machines alike. It is up to you to size your virtual infrastructure resources accordingly.

Keep in mind that according to the VMware compatibility matrix, vCenter also supports DB2 and Oracle databases. Also note that VMware Update Manager does not have DB2 listed as a supported database. When it comes to Update Manager, VMware recommends separating its database from the vCenter database. So choose your database application wisely.

vSphere web access going the way of the dinosaur after 4.1

Say it ain't so! I was never a big fan of the web access portion of vCenter, but I assume there are a few companies still out there leveraging it. Take customers who have Linux machines that need to manage virtual machines: there is still no word on a Linux vCenter client to install. Maybe after the 4.1 release there will be a client for operating systems besides Windows? For now, you will need to either RDP to a workstation that has the vCenter client or install Windows.

So from the release notes of version 4.1: "vSphere 4.1 is the last product release for vSphere Web Access. As a best practice, VMware recommends that you use the vSphere Client, which contains all the functionality of Web Access. Because vSphere Web Access is no longer being developed, support for this product is provided on a best effort basis."

Ever since vSphere 4.0 was released, web access has been considered "experimental", but VMware support would still troubleshoot errors on a best-effort basis. For customers with vSphere versions that support web access (experimental), you will still receive best-effort troubleshooting. In version 3.5, web access was considered fully supported. I still have not found a reason why web access went to experimental in version 4.0.

In vSphere 4.0, the limit for concurrent connections to vCenter is now 40. The vCenter client was never meant to be a common tool for the entire department to use for accessing virtual machines; it is a management tool. Using this tool to its maximum will take its toll on the vCenter server itself, not to mention all the network traffic from those console windows using the KVM protocol. You are also limited to 25 console connections (a soft limit). The impacts of multiple connections to vCenter will be discussed at a later date.

Removing web access from vCenter strips out one more application that could be considered a security risk, which is welcomed by all, I would imagine. It looks like web access will remain on the host servers though. We will see if anything changes with the ESXi-only platform in future releases.

Credit for finding this in the release notes goes out to my colleague Mr Rosebury.

Microsoft VDA licensing and VMware

Since I am a Microsoft / VMware guy, for my very first post I would like to talk a little bit about Microsoft's new VDA licensing model. This was published by Microsoft back on July 1st, 2010. I have to admit that I did not see the change coming in any press releases, but it is a welcome change. As a Microsoft and virtualization guy, I have to keep on top of these things! By now, everyone has become accustomed to purchasing Windows Datacenter edition (per physical socket) for host servers to cover as many Microsoft OS'es as you can cram into a box. But what about the endpoints that access these virtual servers? Most of the companies I have worked for have had Microsoft Software Assurance across all Microsoft platforms, which covers you when accessing all those virtual machines you crammed within your hosts. But what about an environment where different departments, or even different projects within a company, buy pockets of Microsoft licenses for workstations with no Microsoft SA? What a nightmare! Especially if we are talking about thousands of workstations. How about we toss in some Linux machines or iPhones that want to access Microsoft VM's? In the IT field, we can't expect everything to be smooth from company to company.

Along comes Microsoft VDA (Virtual Desktop Access). OK, so I've got 200 Linux machines that want to access Microsoft VM's. Let's create a PO for VDA licenses, which is going to run us $100 per Linux machine = $20,000 per year. Now we have another 300 Linux based thin clients? That's going to be another $30,000 per year. We also have another 100 Windows XP workstations that were purchased with retail licenses for the financial department (and no Microsoft SA), so we'll need another PO for $10,000 per year. Or you have the option to purchase Microsoft SA for those Windows XP workstations. As far as I know, there is no option to purchase Microsoft Software Assurance for non-Windows operating systems (correct me if I'm wrong).

The perks with this new licensing model:

  1. Rights for the primary user to access corporate VDI desktops from non-corporate PCs, such as internet cafes and home PCs. This is huge! Since I am the main user of my Linux based thin client at work, I can now go home and use my desktop to access my virtual machine at work! Thanks Microsoft!
  2. You are now permitted to access up to 4 virtual machines concurrently, even from your home PC!
  3. If you purchased Microsoft SA, you no longer need to buy VECD to access the virtual machines. It is included with SA. That saves $23 per device per year. So if I have 5000 Windows desktops covered under SA that need to access virtual machines, I just saved $115,000 in cost. Thanks Microsoft!

But if you have Microsoft SA covering your XP workstation at work and you want to use your iPhone, or your rigged-up toaster at home running Mac OS, to access your virtual machine at work, you will need to purchase a VDA license ($100 per year) for those devices not covered by SA. So even if you purchase a thin client with a non-Microsoft OS installed, or you wire up your refrigerator's LCD screen with Ubuntu, you still need to purchase a Microsoft VDA license to access the Microsoft virtual machine you need to get to. Yes, one of the goals of virtualization is to cut costs, but there is no getting around the licensing models. Everyone wants to get paid. I believe the Microsoft licensing model has always been a confusing love/hate relationship for most people, but it is nice when they attempt to simplify the process. Every vendor has a licensing model; you just have to wrap your head around how it falls into your VDI deployment strategy.