Category: Virtualisation

Attending AWSome day online conference 2019

AWSome Day was a free online conference and training event, sponsored by Intel, that provided a step-by-step introduction to the core AWS (Amazon Web Services) services. It was free and open to everyone, and it took place online on 26 March 2019. The agenda covered broad topics such as AWS Cloud Concepts, AWS Core Services, AWS Security, AWS Architecting, and AWS Pricing and Support. It is particularly interesting for IT managers, system engineers, system administrators and architects who are eager to learn more about cloud computing and how to get started on the AWS cloud. I do have some experience in managing AWS servers and even host my own server. However, I registered for the free training to refresh my knowledge and to get more exposure to areas such as AWS pricing, which I knew nothing about. Another nice touch is that attendees receive a certificate of attendance and 25 USD of AWS credits. Pretty cool, right?

Right from the beginning, I knew this was going to be interesting. I encountered a minor problem whilst signing in, so I sent a mail to support and it was resolved immediately. Once connected to the lobby, it was pretty easy to attend and follow the online conference. After some minutes, Steven Bryen, head in the AWS Cloud, delivered the keynote speech.

There was also an online challenge and I scored 25,821 on the Trivia Leaderboard.

On the “Ask an Expert” tab, I was mostly interested in the Man on the Side (MOTS) attack. They referred me to the WAF section on AWS. Another interesting link is the whitepaper of the AWS Overview of Security guidelines. AWS also offers comprehensive security across all the layers: SSL, DDoS, Firewall, HSM and Networking. I also asked some questions about metrics and monitoring at the application level, for example on MariaDB, and discovered RDS Performance Insights. For applications on EC2, containers and Lambda, X-Ray looks very promising. Apart from virtualization, it’s good to note that AWS also provides containerization services.

The event was pretty enriching. The panel in the question area knew their subject well. I discovered a lot by participating in the AWSome Day. I’m looking forward to AWS certifications in the near future.


VMware vSphere High Availability Basics

VMware vSphere HA is one of the core features of a cluster, so let’s look at it more precisely. High Availability (HA) enables a cluster of ESXi hosts to work together so that they can provide a higher level of availability for virtual machines than a single ESXi host can by itself. In brief, the High Availability feature is provided by pooling the virtual machines and the ESXi hosts in the cluster for protection. Examples of what it protects against are host failures, host isolation and application crashes. The requirements for HA are a minimum of two hosts, vCenter Server and shared storage.

Photo Credits: VMware.com

One ESXi goes down

By default, HA uses the management network (Service Console/Management Network VMkernel connections). Let’s take a scenario where there are three ESXi hosts in a cluster. If a physical server (an ESXi host) goes down, its VMs will be restarted on the other ESXi hosts. We can also set up applications to be started on the other physical server. Of the three physical servers in the cluster, one is elected as master. The master keeps track of the other ESXi hosts through their heartbeats, which are exchanged over the management network. The master always expects to receive heartbeat responses from the other ESXi hosts.

Only the management network went down

If at any moment the master detects that a host is down, it reports this to vCenter Server and all of that host’s VMs are powered on on the other ESXi hosts. What is more interesting is that if only the management network goes down while another network, such as the datastore network, is still working, this is referred to as an isolation incident. In that case, the host can still show the master that it is active through the datastore heartbeat. The VMs will therefore not be powered on on other ESXi hosts, because it is an isolation incident.

Only the Datastore network went down

Now, what if only the datastore network went down and not the management network? The master will still receive heartbeat messages from the other ESXi hosts, but no data is reaching the datastore. Another element included in HA is VMCP (VM Component Protection), a component that detects whether a VM still has access to its datastore. If the datastore heartbeat reports failures, the VMs will be powered on on other ESXi hosts where the datastore heartbeat is still alive.

In all three scenarios, HA implies downtime, as the VMs are restarted on other ESXi hosts, but this usually takes only minutes. Another point to keep in mind is that HA applies only to the physical host. For example, if a particular VM encounters a BSOD or a kernel panic, HA will not know about it because the physical server (ESXi host) is still communicating with the master.
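The three scenarios above can be summarised as a small decision table. The following is only a sketch of the behaviour described in this post, not of FDM's actual logic; the names `HostState`, `classify_host` and `vms_restarted_elsewhere` are made up for illustration.

```python
from enum import Enum


class HostState(Enum):
    HEALTHY = "healthy"
    ISOLATED = "isolated"
    STORAGE_LOST = "storage lost"
    FAILED = "failed"


def classify_host(network_heartbeat: bool, datastore_heartbeat: bool) -> HostState:
    """Classify a host from the two heartbeat signals the master observes."""
    if network_heartbeat and datastore_heartbeat:
        return HostState.HEALTHY
    if datastore_heartbeat:
        # Management network is silent but the datastore heartbeat is alive.
        return HostState.ISOLATED
    if network_heartbeat:
        # Host answers on the management network but cannot reach storage.
        return HostState.STORAGE_LOST
    return HostState.FAILED


def vms_restarted_elsewhere(state: HostState) -> bool:
    """As described above: restart on failure or storage loss, not on isolation."""
    return state in (HostState.FAILED, HostState.STORAGE_LOST)


for net, ds in [(True, True), (False, True), (True, False), (False, False)]:
    state = classify_host(net, ds)
    print(f"network={net!s:5} datastore={ds!s:5} -> {state.value:12} "
          f"restart={vms_restarted_elsewhere(state)}")
```

Keeping the classification separate from the restart decision makes it easy to change one without the other, for example if the isolation behaviour is configured differently in a given cluster.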

How does the election of the master take place?

When HA is enabled in vSphere, the election process takes around 10-15 seconds. While HA is being enabled, an agent called FDM (Fault Domain Manager) is installed to provide HA. Its logs can be checked at /var/log/fdm.log. The election is decided by an algorithm with two rules. Under the first rule, the host with access to the greatest number of datastores wins.

Now, what if all ESXi hosts see the same number of datastores? There will be a tie. This is where the second rule kicks in, i.e. the host with the lexically highest Managed Object ID (MOID) is chosen. Note that in vCenter Server every object has a MOID; examples of objects are ESXi servers, folders, VMs, etc. Lexical comparison means that the MOIDs are compared as strings, character by character, rather than as numbers. Care must be taken when attempting to rig this election because lexically here means, for example, that host-99 is in fact higher than host-100.
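To make the two rules concrete, here is a minimal sketch in Python. It assumes the election boils down to exactly the two rules described above; the `Host` class, the `elect_master` function and the sample hosts are hypothetical and are not part of any VMware API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    moid: str             # Managed Object ID, e.g. "host-99"
    datastore_count: int  # number of datastores the host can access


def elect_master(hosts: list[Host]) -> Host:
    """Pick the master using the two rules described above."""
    # Rule 1: the host that sees the most datastores wins.
    # Rule 2 (tie-break): the highest MOID, compared as a plain string.
    return max(hosts, key=lambda h: (h.datastore_count, h.moid))


hosts = [Host("host-100", 4), Host("host-99", 4), Host("host-28", 3)]
print(elect_master(hosts).moid)  # host-99
```

Because the MOIDs are compared as strings, "host-99" beats "host-100": the comparison stops at the first differing character, and "9" sorts after "1".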

What IF …. ?

 

So what if vCenter Server goes down after setting up HA? 

The answer is that HA will still work, and it now has the capacity to power on the vCenter Server. FDMs are self-sufficient to carry on the election process as well as to start the vCenter Server. The FDM agents run on the ESXi hosts, not inside the vCenter Server.

Enable and Configure vSphere HA
 
I will be using the free labs provided by VMware to set up HA.
 
1. The first action is to choose the cluster, then click on ‘Actions’, then ‘Settings’.
 
Photo Credits: VMware.com

2. Choose ‘vSphere Availability’ on the left, then click on ‘Edit’.

Photo Credits: VMware.com

3. Click on ‘Turn ON vSphere HA’.

Photo Credits: VMware.com

4. Choose the ‘Failures and Responses’ option and enable ‘VM and Application monitoring’.

Photo Credits: VMware.com

5. Under ‘Admission control’, check the ‘Cluster resource percentage’ option.

Photo Credits: VMware.com

6. Click on ‘Heartbeat Datastores’ and select ‘Automatically select datastores accessible from the host’.

Photo Credits: VMware.com

7. From the ‘Summary’ tab, click on ‘vSphere Availability’; it should show vSphere HA: Protected.
 
Photo Credits: VMware.com
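As a side note on the ‘Cluster resource percentage’ option chosen in step 5, the amount of capacity to keep free can be estimated with simple arithmetic. The sketch below assumes all hosts are identically sized and that the reservation is simply the share of hosts allowed to fail; the function name is hypothetical and the figure vSphere actually computes for a real cluster may differ.

```python
import math


def reserved_failover_percent(host_count: int, failures_to_tolerate: int = 1) -> int:
    """Estimate the share of cluster resources to reserve for failover.

    Assumption: every host contributes the same CPU and memory, so losing
    `failures_to_tolerate` hosts removes that fraction of the cluster.
    """
    if host_count < 2:
        raise ValueError("HA needs at least two hosts")
    if not 0 < failures_to_tolerate < host_count:
        raise ValueError("failures_to_tolerate must be between 1 and host_count - 1")
    # Round up so the reservation never falls short of a full host.
    return math.ceil(100 * failures_to_tolerate / host_count)


for hosts in (2, 3, 4, 8):
    print(f"{hosts} hosts -> reserve {reserved_failover_percent(hosts)}%")
```

For the three-host cluster used as the scenario in this post, tolerating one host failure would mean keeping roughly a third of the cluster's resources free.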
 
 
 

ESXi installation on my Dell Laptop and hands on VMware Labs

If you are wondering why I would install a bare-metal hypervisor on a laptop, I assure you it is for educational and testing purposes only. I found it quite difficult to get this done. After some research, it looks like the motherboard of my Dell Inspiron n5110 will not allow me to install ESXi 6.x. Probably some drivers are missing or the motherboard does not support it.

Here is what my processor looks like in the configuration menu on VMware vSphere Center.

Anyway, I was able to inject some network drivers (VIB files) into the ESXi 5.0 image, which allowed me to install ESXi 5.0 on the laptop. You can follow the instructions at the link on how to make your unsupported NIC work with ESXi. Once installed, VMware provides a two-month free trial before you need to purchase a license.

Another way of experimenting with VMware vSphere is to deploy a lab from labs.hol.vmware.com. It is very easy to deploy labs and access the VMware vSphere Web Client. All credentials are available in the readme.txt file found on the desktop. A lab manual is also shown alongside while you work in the lab environment.

I am sure this will help anyone get into hands-on labs quickly, and it is a nice start for beginners.


Recovery Data and Applications with Zerto – Part 2

The flexibility of Zerto Virtual Replication (ZVR) means that we have multiple options for data recovery depending on what is specifically needed in each use case. ZVR enables data mobility by adding offsite cloning to the toolkit. You also have the ability to restore specific files and folders. If a critical folder is inadvertently erased, the clock can be rewound, saving time and money.

JLFR (Journal Level File Restore) is a powerful feature that extends Zerto’s protection features to allow recovery of individual files or folders. Restoring a file uses the same checkpoint system to facilitate point-in-time selection and recovery. Any files in the journal can be recovered with the journal sizing tool. JLFR requires NTFS or FAT and is thus compatible with Windows only.
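Point-in-time selection can be pictured as picking the most recent checkpoint taken before the moment the file was lost. The sketch below only illustrates that idea and says nothing about how Zerto stores or indexes its journal; the function name and the sample timestamps are made up.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional


def latest_checkpoint_before(checkpoints: list[datetime],
                             target: datetime) -> Optional[datetime]:
    """Return the most recent checkpoint at or before `target`, or None."""
    ordered = sorted(checkpoints)
    index = bisect_right(ordered, target)
    return ordered[index - 1] if index else None


# Hypothetical checkpoints, five minutes apart.
checkpoints = [
    datetime(2019, 3, 26, 9, 0),
    datetime(2019, 3, 26, 9, 5),
    datetime(2019, 3, 26, 9, 10),
]
deleted_at = datetime(2019, 3, 26, 9, 7)  # when the folder was erased
print(latest_checkpoint_before(checkpoints, deleted_at))  # 2019-03-26 09:05:00
```

If the target time is earlier than every checkpoint, the function returns `None`, which corresponds to the file no longer being recoverable from the journal.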

To restore a file, click on “Actions” and then on the “Restore File” button.

Photo Credits: Zerto.com

Select the ZFileServer and click on Next.

Photo Credits: Zerto.com

Files can be restored from the available checkpoints.

Photo Credits: Zerto.com

The disk that stores the files or folders to recover, whether physical or virtual, now needs to be mounted. Only one disk at a time can be recovered.

Photo Credits: Zerto.com

After that, the mount process can start.

Photo Credits: Zerto.com

As usual, the running tasks are shown on the dashboard. There are a few ways to start restoring files and folders from the mounted disk, including from the monitoring tab or from the open folder icon on the right under the running tasks.

For example, there is a tasks button at the bottom.

Photo Credits: Zerto.com

The browse button allows you to search for the files and restore them. Once finished, click on the unmount button.

Photo Credits: Zerto.com

Another feature is cloning, which creates a copy of the virtual machines as they were at a specific point-in-time checkpoint.

Select the VPG name from the VPGs tab, which opens the VPG in another tab. Click on More, then on Offsite Clone.

Photo Credits: Zerto.com

Once on the Offsite Clone tab, click on Select a checkpoint.

Photo Credits: Zerto.com

For example, by selecting the latest checkpoint and the name of the Datastore, we can start the cloning operation.

Another recovery option is the backup restore. This is done by clicking on Actions, then on Restore Backup. Again, a few options need to be set: the restore plan (which is the name of the VRA), the restore point, the VM settings and the Power On option.

Photo Credits: Zerto.com

Finally, click on the Restore button.

Photo Credits: Zerto.com

Restoring from backup allows you to leverage scheduled and unscheduled backups that extend the protection offered by the journal. Cloning extends ZVR by giving you multiple copies of your machines as they looked at a very specific point in time. With the file restore feature, you can extend the functionality of ZVR across the full spectrum of disaster recovery options, ranging from recovery of an entire virtualized datacenter to recovery of just one single file.


Recovery Data and Applications with Zerto – Part 1

By doing a failover test with Zerto, we can confirm that, in a real disaster or disruption, everything is configured correctly and works as expected. Because our VMs are in a VPG, an entire multi-VM application can be rigorously tested without any interruption to that same application in production.

We can start a failover test by clicking on the Failover button at the bottom right. Tick the VPG and click on Next to continue.

Photo credits: Zerto.com

The execution parameters are those that were set up in the VPGs, for example the boot sequence and checkpoint dates. The Failover Test section is where you can start the failover test.

The failover test creates VMs in a sandbox using the test network defined in the VPG settings. All testing is written to scratch volumes. The longer the test, the more space is consumed. At the end of the test, ZVR will power on the test VMs and do so in the correct boot order if one was specified.

The test will keep writing to scratch volumes until either:

  • The hard journal storage limit is reached
  • It’s manually stopped.

Photo credits: Zerto.com

Since Zerto automates the test cleanup, you should only stop a test from within Zerto rather than from the vSphere client. In a live environment, you would then verify the results of the test in the recovery site and ensure each VM is performing as expected. Assuming a successful test, you can come back to the ZVM and click “Stop” under the running tasks section.

Photo credits: Zerto.com

The report tab provides details on the tests that were run. This can be used to confirm test success or failure and to aid in compliance. The recovery reports can also be exported as PDF.

A live failover can also be performed. Here is where you can toggle from test to live failover.

Photo credits: Zerto.com

I then choose the VPG and click on Next.

Photo credits: Zerto.com

Then I click on the checkpoint field to choose the date.

Photo credits: Zerto.com

As mentioned, the date can be chosen, or a recovery can be performed from the latest backup.

Photo credits: Zerto.com

You can also choose if you want to auto-commit, auto-rollback or none.

Auto-commit – Selecting Auto-Commit means that after a designated time (Default is 0 minutes), Zerto will commit the failover which promotes the failed over VMs to the new live production servers. Once the failover is committed, the DR servers will need to be failed back to production once the production site is restored to keep any changes made on the servers while failed over. To complete this, Reverse Replication will need to be enabled to replicate the changes from the target site back to the production site.

Auto-Rollback – The Auto-Rollback option allows you to designate a time after the Live Failover (Default 10 minutes) for the failover to be rolled back to production. This works similarly to a Test Failover, as you have a window to test your servers and applications and then undo the changes. This will also remove any changes that were made on the servers while at the DR site and does not require reverse replication.

Photo credits: Zerto.com

None – If you set ‘None’ for the commit policy, you will have the option to either Rollback or Commit the failover later in time. This may be used in a situation where your production site is down, but could possibly be brought back online quickly. You have the option to commit the failover if you do not foresee a time when production will be back online. However, if the problem is quickly fixed, you can perform a Rollback.
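The three commit policies can be summarised as a small decision function. This is only a sketch of the behaviour described above, not Zerto's implementation or API; the function name, the policy strings and the `manual_choice` parameter are hypothetical, and a manual choice is modelled only for the ‘None’ policy.

```python
from typing import Optional


def resolve_failover(policy: str, minutes_elapsed: int, timeout_minutes: int,
                     manual_choice: Optional[str] = None) -> str:
    """Return 'commit', 'rollback' or 'pending' for a live failover."""
    if policy == "none":
        # Nothing happens automatically; wait for the operator to decide.
        return manual_choice if manual_choice in ("commit", "rollback") else "pending"
    if policy not in ("auto-commit", "auto-rollback"):
        raise ValueError(f"unknown commit policy: {policy}")
    if minutes_elapsed < timeout_minutes:
        # Still inside the window in which the failover can be checked.
        return "pending"
    return "commit" if policy == "auto-commit" else "rollback"


print(resolve_failover("auto-commit", 0, 0))        # commit
print(resolve_failover("auto-rollback", 5, 10))     # pending
print(resolve_failover("auto-rollback", 10, 10))    # rollback
print(resolve_failover("none", 60, 0))              # pending
print(resolve_failover("none", 60, 0, "rollback"))  # rollback
```

With the default of 0 minutes, Auto-Commit commits straight away, whereas Auto-Rollback with its 10-minute default leaves a window for testing before the changes are undone.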

After setting the parameters in the “Execution Parameters” settings, the failover can start.

Photo credits: Zerto.com

The successful failover test can be viewed on the dashboard. A move (or migration) is a more graceful operation than a failover since it is a planned outage. It’s great for failbacks, preventive maintenance and site/hardware migrations. ZVM will gracefully power down the VMs and then, as they are shutting down, grab the very latest copy of the data and use that instead of the journal. To move a VPG, click on “Actions” and then “Move VPG”.

Photo credits: Zerto.com

Then follow the same steps by selecting the VPGs, but this time, in the execution parameters, choose to shut down the VMs and click on “Move”.

Photo credits: Zerto.com

After ZVR has finished the commit and processed the VPGs, the move is done and we are back to the green circle, which means the SLA has been met and the operation was successful.