
vCenter Server Disconnecting From vCloud Director

Following on from my post on vCloud Director constantly syncing inventory I wanted to address a second point that could cause the underlying connection issue.

In the current revision of vCloud Director (5.1 and 5.1.1) there is an issue that may present itself as vCD disconnecting from vCenter at random times coupled with connection alerts from vCloud Director such as the email alert shown below.

vCloud Director is trying to reconnect to the vCenter Server “vcenter.domain.com”.
When vCloud Director reconnects, it will send another email alert.

Further information on the error can be found in the log /opt/vmware/vcloud-director/logs/vcloud-container-info.log.
Look for the following error: ORA-01013: user requested cancel of current operation.
You can do this as follows.
# less /opt/vmware/vcloud-director/logs/vcloud-container-info.log
Then press / and type user requested cancel of current operation to jump to the place in the log where this entry is recorded.
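
If you would rather script the search than page through the log with less, the same check can be sketched in a few lines of Python. This is only an illustration: the function name and the two sample log lines are made up, and the only thing taken from the real log is the error text itself.

```python
from typing import Iterable, List, Tuple

# The text of the Oracle error we are looking for in vcloud-container-info.log.
NEEDLE = "user requested cancel of current operation"


def find_error_lines(lines: Iterable[str], needle: str = NEEDLE) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line that contains the needle."""
    hits = []
    for number, text in enumerate(lines, start=1):
        if needle in text:
            hits.append((number, text.rstrip("\n")))
    return hits


# Made-up sample lines, only to show the shape of the result.
sample = [
    "2013-02-01 12:00:00 | INFO | connecting\n",
    "2013-02-01 12:00:05 | ERROR | ORA-01013: user requested cancel of current operation\n",
]
for number, text in find_error_lines(sample):
    print(number, text)
```

Against the real log you would pass an open file handle instead of the sample list, since a file object also yields one line at a time.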

As detailed in my previous post, you can change the connection time to give vCD time to reconnect to the vCenter Server. However, there is another workaround available that involves modifying the vCloud Director database to remove entries with a null completion date, whose number keeps creeping up and triggers this disconnect in the first place.

To do this I suggest you stop your vCloud Director cells first and make sure to back up the vCloud Director database. These instructions are for Oracle.

    1. Quiesce the services of the cells using the cell-management-tool and then stop the services with service vmware-vcd stop as described here.
    2. Back up the Oracle database.
    3. Open an SSH connection to the Oracle server, type sqlplus, then provide the vCloud Director database username and password. (Hint: the username is vcloud.)
    4. Run the following commands at the SQL> prompt.

SQL> select count(*) from task_inv where (status = 2 OR status = 3) AND completion_date is null;

This will return a number, probably in the tens or hundreds of thousands. It is this backlog of entries that causes vCloud Director to time out during the synchronization process, so we need to run a series of commands to reduce it.
Run these commands to fix this.

A. Get the list of all vc_ids in the setup.
SQL> select distinct vc_id from task_inv;

B. For each vCenter in the setup, repeat the following steps. Replace <vc_id> with one of the values returned by the query above (quoted as a string literal if it is not numeric).
1. Get the maximum managed object value for that vCenter.
SQL> select substr(moref, 6) from (select * from task_inv where vc_id = <vc_id> order by to_number(substr(moref, 6)) desc) where rownum = 1;

This will give result_1. Next we need to do some basic maths. We will keep the ‘top’ 1000 entries per VC and delete the rest.

2. result_1 minus 1000 = result_2

3. Using the above result_2, run the following, replacing <result_2> with the number you calculated.

SQL> delete from task_inv where (status = 2 OR status = 3) AND completion_date is null AND vc_id = <vc_id> AND to_number(substr(moref, 6)) < <result_2>;

4. Commit the change.
SQL> commit;

5. Finally run the original query to see if the number has gone down.
SQL> select count(*) from task_inv where (status = 2 OR status = 3) AND completion_date is null;
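
The arithmetic in steps B.1 and B.2 can be sketched in Python as a sanity check before you run the delete. This is a minimal illustration, not part of the procedure: the function names and the sample morefs are hypothetical, and it assumes morefs have a five-character prefix such as task-, which is what substr(moref, 6) in the queries above implies.

```python
from typing import Iterable


def moref_number(moref: str) -> int:
    """Numeric part of a moref, mirroring to_number(substr(moref, 6)) in the SQL."""
    # Oracle's substr is 1-based, so position 6 corresponds to Python index 5.
    return int(moref[5:])


def delete_threshold(morefs: Iterable[str], keep: int = 1000) -> int:
    """result_2: the highest moref number (result_1) minus the entries to keep."""
    result_1 = max(moref_number(m) for m in morefs)
    return result_1 - keep


# Hypothetical morefs for a single vCenter (one vc_id).
morefs = ["task-250000", "task-249500", "task-120", "task-7"]
threshold = delete_threshold(morefs)
print(threshold)  # 249000

# Rows whose number is below the threshold are the ones the delete would match
# (provided they also have status 2 or 3 and a null completion_date).
print([m for m in morefs if moref_number(m) < threshold])
```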

Don’t forget to restart the cell services on vCD with service vmware-vcd start, and tail cell.log to watch the progress of the cell service restarting: tail -f /opt/vmware/vcloud-director/logs/cell.log

The count will continue to creep up until VMware fix this in an update, currently due as release version 5.1.2 at the end of April 2013.

Hidden VMware Snapshots

You may find from time to time that a snapshot removal fails and that the delete all option is not working.  What you are left with is a virtual machine running off of the snapshot disks whereas vCenter may think that the virtual machine has no snapshots.
What does this mean and how can I avoid it?  Well first let me explain how the VMware snapshot process works and what should happen.

How Snapshots Work

A snapshot of a virtual machine is a point-in-time image of its current state and data. The state is the virtual machine’s current power state, and the data is made up of all the files that make up the virtual machine including memory, disk, network cards, USB devices and so on.

A snapshot can be created simply through the use of the vSphere Client and the vSphere Web Client by right clicking on a virtual machine and selecting Snapshot>Create Snapshot.  You are then presented with the following options.

Name – Name for the snapshot.
Description – Description of the snapshot.
Snapshot the virtual machine’s memory – All the memory in active use on the virtual machine is written to a memory dump file (vmsn file) that is included in the snapshot.
Quiesce guest file system (Needs VMware Tools installed) – The quiescing process tells the operating system to write transactions out of the memory buffers and in-memory cache to the disk so that the virtual machine can have a consistent state that can be recovered from.
Virtual Machine Snapshot

When the snapshot is created an additional disk, called a child disk or delta disk, is added to the virtual machine. It is made up of the files <vm-name>-<number>.vmdk and <vm-name>-<number>-delta.vmdk.
Virtual Machine Files
The <vm-name>-<number>-delta.vmdk file is a hidden file that will not show up in the datastore browser. You can however view it by connecting to the ESXi host either through SSH or through the vMA (vSphere Management Assistant). Here is an example of the same datastore location through a remote SSH connection.
Remote SSH connection to host
Snapshot child disks are sparse disks that use a copy-on-write mechanism: a block is only written to the child disk when the virtual machine changes it. Existing data is not replicated, which can save quite a bit of space.

In the illustration below the hashed blocks represent changed data blocks and the white blocks represent empty space due to the sparse layout of the disks.
Copy on Write Disk Layout
Some additional files are created with the snapshot: the virtual machine snapshot database <vm-name>.vmsd and the virtual machine memory state file <vm-name>.vmsn. The snapshot database file <vm-name>.vmsd contains the snapshot information and is where the Snapshot Manager gets its information from. It is a plain-text file that can prove useful when trying to troubleshoot snapshot issues.

Here is an output of the snapshot .vmsd file associated with the example virtual machine.

.encoding = "UTF-8"
snapshot.lastUID = "1"
snapshot.current = "1"
snapshot0.uid = "1"
snapshot0.filename = "Demo-VM01-Snapshot1.vmsn"
snapshot0.displayName = "Demo-Snapshot01"
snapshot0.description = "Example Snapshot"
snapshot0.type = "1"
snapshot0.createTimeHigh = "316405"
snapshot0.createTimeLow = "-1275531695"
snapshot0.numDisks = "1"
snapshot0.disk0.fileName = "Demo-VM01.vmdk"
snapshot0.disk0.node = "scsi0:0"
snapshot.numSnapshots = "1"
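
Because the .vmsd file is plain key = "value" text, it is easy to read with a script when troubleshooting. The following Python sketch shows one way to do it; the parse_vmsd function is a hypothetical helper, and the sample is a subset of the output above.

```python
from typing import Dict


def parse_vmsd(text: str) -> Dict[str, str]:
    """Turn the key = "value" lines of a .vmsd file into a dictionary."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")
        entries[key.strip()] = value.strip().strip('"')
    return entries


# A subset of the example .vmsd output shown above.
sample = '''\
.encoding = "UTF-8"
snapshot.current = "1"
snapshot0.uid = "1"
snapshot0.filename = "Demo-VM01-Snapshot1.vmsn"
snapshot0.displayName = "Demo-Snapshot01"
snapshot0.disk0.fileName = "Demo-VM01.vmdk"
snapshot.numSnapshots = "1"
'''

info = parse_vmsd(sample)
print(info["snapshot0.displayName"], info["snapshot0.disk0.fileName"])
```

Comparing snapshot.numSnapshots and the disk file names in this dictionary with what is actually in the virtual machine directory is a quick way to spot a mismatch between the Snapshot Manager and the files on disk.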

Snapshots are controlled through the VMware API using the following operations.

CreateSnapshot – Creates the snapshot. This is labelled as ‘Take Snapshot’ in the vSphere Client.
RemoveSnapshot – Removes the snapshot and deletes the associated <vm-name>-<number>.vmdk and <vm-name>-<number>-delta.vmdk disks. This is labelled as ‘Delete’ in Snapshot Manager in the vSphere Client.
RevertToSnapshot – This option takes the running state of the virtual machine back to the state of the last snapshot, and changes made since are lost. You can save the current state of the virtual machine by taking another snapshot first, should you need to return to the currently active state of the virtual machine. This is labelled as ‘Go to’ in Snapshot Manager in the vSphere Client.
RemoveAllSnapshots – This option removes all the snapshots by writing the active state of the child disk into the parent disk. Prior to vSphere 4 Update 2, if there are multiple snapshots and thus multiple child disks, each child disk writes its contents into its parent disk all the way up the chain until the child disks have written all their changes into their parent disks. At this point all the child disks are deleted.

Think about what that means for a second: if you have lots of large snapshots, you also need to ensure there is enough free space to accommodate them during the RemoveAllSnapshots process.

As an example, let’s say that your virtual machine has 4 snapshots on it, which are left in place whilst some work is carried out on the server, and these snapshots grow in size as follows.

Original disk – 100GB
Snapshot one – 10GB
Snapshot two – 20GB
Snapshot three – 10GB
Snapshot four – 20GB

When the RemoveAllSnapshots API is called the four snapshots would roll up, so four would roll into three, then three into two, then two into one and finally one into the original disk.  What was originally a 100GB virtual machine disk is suddenly a machine with a potential size requirement of 240GB!

Thankfully that is no longer the case with vSphere 4 Update 2 or later. The change made was that the snapshots roll up starting with the closest disk, so snapshot one rolls into the original disk, then two into the original disk, then three and finally four. This means that not only is space saved during RemoveAllSnapshots, but data is also only written once rather than repeatedly during each snapshot roll up.
This is labelled as ‘Delete All’ in Snapshot Manager in the vSphere Client.
Consolidate – The consolidate option was added in vSphere 5 and is there to allow you to write back the child disks that may have become disassociated from the Snapshot Manager due to a failed RemoveSnapshot or RemoveAllSnapshots command.  This failure can be caused by a time out during the write back of the child disks to the parent disks.
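
To make the RemoveAllSnapshots space example above concrete, here is a rough Python model of the two roll-up orders. It is a worst-case sketch under stated assumptions, not VMware's actual algorithm: no changed blocks overlap between snapshots, the original disk is fully allocated so it does not grow, and child disks are only deleted at the very end. The function names are hypothetical.

```python
from typing import List


def chain_merge_peak(base_gb: int, snapshots_gb: List[int]) -> int:
    """Peak space when each child rolls into its parent (pre vSphere 4 Update 2)."""
    sizes = list(snapshots_gb)
    # Walk from the newest snapshot back towards the oldest.
    for i in range(len(sizes) - 1, 0, -1):
        sizes[i - 1] += sizes[i]  # the parent absorbs all of the child's blocks
    # All child disks still exist until the final roll into the original disk.
    return base_gb + sum(sizes)


def direct_merge_peak(base_gb: int, snapshots_gb: List[int]) -> int:
    """Peak space when each snapshot rolls straight into the original disk."""
    # No child disk ever grows, so the peak is simply what is already on disk.
    return base_gb + sum(snapshots_gb)


snapshots = [10, 20, 10, 20]  # snapshot one to four from the example, in GB
print(chain_merge_peak(100, snapshots))   # 260
print(direct_merge_peak(100, snapshots))  # 160
```

Under these assumptions the old chained behaviour peaks at 260GB. The 240GB figure quoted in the example corresponds to the same sum without counting snapshot four's own 20GB, so treat either number as an order-of-magnitude warning rather than an exact requirement.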

A virtual machine may show up in the vSphere Client as requiring consolidation with a Needs Consolidation alert on the summary tab of the virtual machine.
Virtual machine disk consolidation needed
There is also a Needs Consolidation column in the virtual machines view from any higher level in vCenter, such as the cluster level.
Needs Consolidation Column

Orphaned Snapshots

What may happen is that the Snapshot Manager thinks the consolidation process is complete, so you do not get an error in the vSphere Client about the virtual machine requiring consolidation. However, when you check the .vmx file, or select the option to edit settings and view the location of the virtual machine disk files, you may see that the disk is actually called <vm-name>-<number>.vmdk. If this is the case, look in the datastore browser and you will see the <vm-name>-<number>.vmdk files.
Virtual Machine Files
You can also open an SSH connection to the host to view the <vm-name>-<number>.vmdk and <vm-name>-<number>-delta.vmdk files by listing the contents of the virtual machine’s directory. You can do this with the following commands.
# cd /vmfs/volumes/<datastorename>/<VirtualMachineName>
# ls -lah
Remote SSH connection to host

Here you will see all the disk files, including the hidden flat disks (<vm-name>-flat.vmdk). The flat disks hold the actual virtual machine disk data; the ‘plain’ .vmdk files are descriptor files pointing to the flat disk file.
If you see that the VM is running from a snapshot delta you have several options.

Option 1 – Clone the virtual machine. A nice simple fix. To ensure a consistent state of the virtual machine you will need to shut the machine down before starting the clone, otherwise the cloned VM will be in the state that the original virtual machine was in when the initial snapshot was taken at the start of the clone process. Please note this snapshot is a crash-consistent snapshot, one without the option to quiesce the disk or snapshot the memory, so any items on the virtual machine not committed to disk will be lost.
Option 2 – Take and delete a snapshot in the vSphere Client. With this option the snapshot removal will also perform the consolidate action and write the additional delta child disks back to the original parent disk. Should you try this option and find the snapshot removal doesn’t fix it, either try shutting the virtual machine down first or select the option to Quiesce guest file system whilst taking the snapshot.
Option 3 - Take and delete a snapshot using an SSH connection to the host.  You may find that the snapshot removal still doesn’t work using the vSphere Client.  If so try the same process from the command line.  Use these steps as a guide.

Step 1 – List out the VMID of the virtual machines on the host
# vim-cmd vmsvc/getallvms

Alternatively use grep to list just the virtual machine you are looking for. In my example I use
# vim-cmd vmsvc/getallvms | grep Demo

Here is the output.
22     Demo-VM01    [EQL03-SHARED05] Demo-VM01/Demo-VM01.vmx
windows7Server64Guest       vmx-08

Step 2 – Verify if the snapshot exists
# vim-cmd vmsvc/snapshot.get [VMID]

Here is the output.
# vim-cmd vmsvc/snapshot.get 22
Get Snapshot:
|-ROOT
--Snapshot Name : Demo-Snapshot01
--Snapshot Id : 1
--Snapshot Desciption :
--Snapshot Created On : 2/1/2013 12:11:49
--Snapshot State : powered off

Step 3 – Create a new snapshot
# vim-cmd vmsvc/snapshot.create [VmId] [snapshotName] [snapshotDescription] [includeMemory] [quiesced]

Here is the output.
# vim-cmd vmsvc/snapshot.create 22 Demo-Snapshot02 "Snapshot Demo 2 Two" 0 0
Create Snapshot:

Step 4 – Remove all the snapshots  (Labelled as Delete all in Snapshot Manager)
# vim-cmd vmsvc/snapshot.removeall [VMID]

Here is the output.
# vim-cmd vmsvc/snapshot.removeall 22
Remove All Snapshots:

Run a directory list command ls -lah to confirm that the snapshots have all been removed.
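
If you need to repeat steps 1 to 4 for several virtual machines, the VMID lookup and the command lines can be generated rather than typed. The Python sketch below only parses text and builds argument lists; it does not run anything on the host. The function names are hypothetical, and the parser assumes the virtual machine name contains no spaces, as in the example output above.

```python
from typing import List, Optional


def find_vmid(getallvms_output: str, vm_name: str) -> Optional[int]:
    """Find the VMID for a VM name in 'vim-cmd vmsvc/getallvms' output."""
    for line in getallvms_output.splitlines():
        fields = line.split()
        # Data lines start with a numeric VMID followed by the VM name.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1] == vm_name:
            return int(fields[0])
    return None


def snapshot_commands(vmid: int, name: str, description: str) -> List[List[str]]:
    """Argument lists for steps 2 to 4: get, create (no memory, no quiesce), remove all."""
    vmid_s = str(vmid)
    return [
        ["vim-cmd", "vmsvc/snapshot.get", vmid_s],
        ["vim-cmd", "vmsvc/snapshot.create", vmid_s, name, description, "0", "0"],
        ["vim-cmd", "vmsvc/snapshot.removeall", vmid_s],
    ]


# The example output from step 1, joined onto one line.
sample = (
    "22     Demo-VM01    [EQL03-SHARED05] Demo-VM01/Demo-VM01.vmx   "
    "windows7Server64Guest       vmx-08\n"
)
vmid = find_vmid(sample, "Demo-VM01")
for command in snapshot_commands(vmid, "Demo-Snapshot02", "Snapshot Demo 2 Two"):
    print(" ".join(command))
```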

You can also take and remove snapshots using the vSphere CLI or vSphere Management Assistant (vMA) and PowerCLI. The vSphere CLI and vMA use the same commands as above; you just need to specify the remote server that you want to perform the checks against.

For example, run this to take a snapshot of a virtual machine running on an ESXi host through vCenter Server.
> vmware-cmd -H <vCenter Server> -T <ESXi host> -U <user_name> -P <password> <path_to_vmx_file> createsnapshot <name> <description> <quiesce 0|1> <memory 0|1>

PowerCLI can use the following commands to take a snapshot.
> New-Snapshot [-Name] <Snapshot_Name> [-Description <Description_Of_Snapshot>] [-Memory] [-Quiesce] [-VM] <Virtual_Machine_Name> [-Server <vCenter_Server>]

Checking for virtual machine disk locks

Should any <vm-name>-<number>.vmdk delta disks remain, the next step is to see if any virtual machine disks have locks on them. For this you can use the vmkfstools command and look at the current mode of the relevant .vmdk file.
A virtual machine disk can be in one of four modes.

mode 0 = no lock.
mode 1 = exclusive lock. This will be the case if the virtual machine is powered on and in use. A powered on virtual machine will also have an up to date modification date on the .vmdk file.
mode 2 = read-only lock. This will be the case for the <vm-name>-flat.vmdk of a running virtual machine with snapshots.
mode 3 = multi-writer lock. This will be the mode of the vmdk if it is used for Microsoft cluster disks or Fault Tolerance virtual machines.

Ensure you are in the relevant virtual machine directory and use the following actions to perform these checks.

Step 1 – Check the mode state of the virtual machine flat disk file (<vm-name>-flat.vmdk)
# vmkfstools -D <vm-name>-flat.vmdk

Here is the output of the demo VM with a snapshot in place.
# vmkfstools -D Demo-VM01-flat.vmdk

Lock [type 10c00001 offset 159152128 v 123, hb offset 3244032
gen 25, mode 2, owner 00000000-00000000-0000-000000000000 mtime 1190286 nHld 1 nOvf 0]
RO Owner[0] HB Offset 3244032 50b60d57-e9cb48dc-9d82-984be10fc230
Addr <4, 346, 95>, gen 106, links 1, type reg, flags 0, uid 0, gid 0, mode 600
len 42949672960, nb 0 tbz 0, cow 0, newSinceEpoch 0, zla 3, bs 1048576

As you can see the base disk is in read only mode because all changes are currently being written to the snapshot delta disk.
If I run the same command on the snapshot delta disk I get the following.

# vmkfstools -D Demo-VM01-000001-delta.vmdk

Lock [type 10c00001 offset 262713344 v 152, hb offset 3244032
gen 25, mode 1, owner 50b60d57-e9cb48dc-9d82-984be10fc230 mtime 1190281 nHld 0 nOvf 0]
Addr <4, 598, 134>, gen 147, links 1, type reg, flags 0, uid 0, gid 0, mode 600
len 86016, nb 1 tbz 0, cow 0, newSinceEpoch 0, zla 1, bs 1048576

This disk is in exclusive lock mode because the virtual machine is switched on and its changes are being written to this disk. You can see which host holds the lock by looking at the owner field: its last segment (984be10fc230 here) is the MAC address of a network adapter on the host that owns the lock.
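
Reading the mode and owner out of the vmkfstools -D output can also be scripted. The Python sketch below is a minimal illustration: parse_lock is a hypothetical helper, and the regular expression assumes the owner field has the same four-segment layout as the example output above.

```python
import re
from typing import Optional, Tuple

# The four lock modes described above.
LOCK_MODES = {
    0: "no lock",
    1: "exclusive lock",
    2: "read-only lock",
    3: "multi-writer lock",
}

# Matches e.g. "mode 1, owner 50b60d57-e9cb48dc-9d82-984be10fc230".
_LOCK_RE = re.compile(
    r"mode (\d+), owner [0-9a-f]{8}-[0-9a-f]{8}-[0-9a-f]{4}-([0-9a-f]{12})"
)


def parse_lock(output: str) -> Optional[Tuple[int, str]]:
    """Return (mode, owner MAC address) from 'vmkfstools -D' output, or None."""
    match = _LOCK_RE.search(output)
    if match is None:
        return None
    mode = int(match.group(1))
    raw = match.group(2)
    # The last 12 hex digits of the owner field are the MAC address.
    mac = ":".join(raw[i:i + 2] for i in range(0, 12, 2))
    return mode, mac


# The lock lines from the delta disk example above.
sample = (
    "Lock [type 10c00001 offset 262713344 v 152, hb offset 3244032\n"
    "gen 25, mode 1, owner 50b60d57-e9cb48dc-9d82-984be10fc230 mtime 1190281 nHld 0 nOvf 0]\n"
)
mode, mac = parse_lock(sample)
print(LOCK_MODES[mode], mac)
```

An owner of all zeros simply comes back as 00:00:00:00:00:00, which is what you would expect to see once the lock has been released.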

Step 2 – Shut the virtual machine down to see if the lock gets released
Here is the output following a shutdown of the virtual machine.

# vmkfstools -D Demo-VM01-flat.vmdk

Lock [type 10c00001 offset 159152128 v 124, hb offset 3244032
gen 25, mode 0, owner 00000000-00000000-0000-000000000000 mtime 1190723 nHld 0 nOvf 0]
Addr <4, 346, 95>, gen 106, links 1, type reg, flags 0, uid 0, gid 0, mode 600
len 42949672960, nb 0 tbz 0, cow 0, newSinceEpoch 0, zla 3, bs 1048576

As you can see the mode is 0 on the demonstration virtual machine meaning that the machine disk is not locked by another device.  Once the mode is 0 you should be able to take a snapshot and remove a snapshot successfully.

Step 3 – Forcefully remove the lock
If you find that the mode is anything other than 0 then another device is locking the disk. This may be another host or, depending on your backup software, your backup server. If the file is still locked, the owner field will show the MAC address of the lock holder. If that MAC address belongs to your backup server, restarting the backup server should release the lock. If it belongs to another host, you will need to unregister the virtual machine from the current host and re-register it on the host with the corresponding MAC address. Once you have registered it on the appropriate host, try to power it on. If it still fails, check whether the virtual machine still has a World ID assigned to it on the host identified as the owner of the lock.

# esxcli vm process list

Demo-VM01
World ID: 3657905
Process ID: 0
VMX Cartel ID: 3670192
UUID: 42 36 06 d4 0f 1b 35 61-17 aa f9 4b 8d 6c e1 78
Display Name: Demo-VM01
Config File: /vmfs/volumes/4fe306c8-b1c504a6-a734-984be10fb3e4/Demo-VM01/Demo-VM01.vmx

The world ID number (3657905) is the Virtual Machine Monitor (VMM) for vCPU 0.  Run the following command to force the virtual machine to stop by killing the process.

# esxcli vm process kill --type soft --world-id 3657905

If the virtual machine does not appear in the esxcli vm process list output, it is not running on this host.
If this is the case, or you are not able to kill the process, you can restart the management agents or reboot the host to release the lock.
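
The lookup of the World ID can be sketched in Python as well. This only parses the text and builds the kill command as an argument list; nothing is executed. The function names are hypothetical, and the parser assumes that entries for different virtual machines are separated by a blank line.

```python
from typing import Dict, List


def world_ids(process_list_output: str) -> Dict[str, int]:
    """Map each Display Name to its World ID from 'esxcli vm process list' output."""
    result = {}
    for block in process_list_output.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.strip().partition(": ")
            if sep:  # only "Key: value" lines; the bare name line is skipped
                fields[key] = value
        if "Display Name" in fields and "World ID" in fields:
            result[fields["Display Name"]] = int(fields["World ID"])
    return result


def soft_kill_command(world_id: int) -> List[str]:
    """The soft kill command shown above, as an argument list."""
    return ["esxcli", "vm", "process", "kill", "--type", "soft", "--world-id", str(world_id)]


# The example output shown above.
sample = """Demo-VM01
World ID: 3657905
Process ID: 0
VMX Cartel ID: 3670192
UUID: 42 36 06 d4 0f 1b 35 61-17 aa f9 4b 8d 6c e1 78
Display Name: Demo-VM01
Config File: /vmfs/volumes/4fe306c8-b1c504a6-a734-984be10fb3e4/Demo-VM01/Demo-VM01.vmx
"""

ids = world_ids(sample)
print(" ".join(soft_kill_command(ids["Demo-VM01"])))
```

A name that is missing from the returned dictionary is the scripted equivalent of the virtual machine not running on this host.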

It is worth noting that you can use the k command in esxtop to kill a running virtual machine process. SSH to the host and perform the following.

Step 1 – Run esxtop by typing esxtop
Step 2 – Press c to switch to the CPU resource utilization screen (This is the default view)
Step 3 – Press Shift+f to display the list of fields
Step 4 – Press c to add the column for the Leader World ID
Step 5 – Identify the target virtual machine by its Name and Leader World ID (LWID)
Step 6 – Press k
Step 7 – At the World to kill prompt, type in the Leader World ID from step 5 and press Enter
Step 8 – Wait up to 30 seconds and validate that the process is no longer listed

How to Upload your own Virtual Machines to the StratoGen vCloud Platform

One of the most common questions our customers ask is ‘How do I upload my own VM images to your platform?’ – here’s a step by step guide.

Step 1 > Have your exported Virtual Machine files ready!

Remember, the StratoGen vCloud Platform will only allow you to upload files in the .OVF format. If you exported your virtual machine in the .OVA format, unfortunately you will need to redo the export process, ensuring you select the OVF (Multiple Files) option.

Step 2 > Log in to your StratoGen vCloud Director account

Using a supported browser, connect to your URL, as provided by your StratoGen representative at the time of sign up.  Enter your username and password to login to your account.

Step 3 > Select the ‘Catalogs’ tab

The initial homepage for your cloud is displayed. Now click on the ‘Catalogs’ tab.

Step 4 > Create a new Organization Catalog

To add your OVF files to a catalog, you’ll first need to create one. Click on the green ‘+’ to add a new catalog:

New vCloud Catalogue

Give the catalog a name, and step through the Wizard.

Step 5 > Upload OVF Files

Once you’ve created your catalog, open it up, click on the upload button and then browse to the location that you exported the VMware Image file to, which in my case was the desktop:

Upload VMware Image OVF File

Select ONLY the .ovf file – the export process will have created a couple of other files but you don’t need to worry about these – the import process will pick them up:

Upload OVF
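
The reason you only select the .ovf file is that it is an XML descriptor that lists the other exported files in its References section. If you want to check that those companion files are sitting next to the descriptor before you upload, a sketch like the following can list them. The function name and the sample descriptor are made up for illustration; the namespace is the OVF 1.x envelope namespace.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace used by OVF 1.x envelope documents.
OVF_NS = "http://schemas.dmtf.org/ovf/envelope/1"


def referenced_files(ovf_xml: str) -> List[str]:
    """Return the file names listed in the References section of an OVF descriptor."""
    root = ET.fromstring(ovf_xml)
    hrefs = []
    for file_el in root.iter(f"{{{OVF_NS}}}File"):
        href = file_el.get(f"{{{OVF_NS}}}href")
        if href:
            hrefs.append(href)
    return hrefs


# A made-up, heavily trimmed descriptor; a real export contains much more.
sample = """<Envelope xmlns="http://schemas.dmtf.org/ovf/envelope/1"
          xmlns:ovf="http://schemas.dmtf.org/ovf/envelope/1">
  <References>
    <File ovf:href="Demo-VM01-disk1.vmdk" ovf:id="file1"/>
  </References>
</Envelope>"""

print(referenced_files(sample))
```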

Click the ‘upload’ button in the catalogue wizard, and then wait for the import to the catalog to complete:

Upload VMware Image 4

The time taken to upload the Image will depend on the size, and the speed of your upload connection.

Step 6 > Deploy Virtual Machine from uploaded image

Once the upload has completed we are then able to deploy the .ovf into your vCloud Virtual Datacentre. In my demo account, I already have a couple of vApps set up, so I’m going to add the uploaded VM to one of my existing vApps.

To do this, open up the vApp, and then click on the ‘Add Virtual Machine’ button. The catalog wizard should pop up, defaulting to the catalog that you just created, and you should see your virtual machine in it. Select the VM, then click the ‘Add’ button and then click ‘Next’:

Deploy VMware Image 1

In the next step we need to give the VM a network connection – choose either the Direct Internet Connection for external connectivity, or import into an existing Internal Network structure – if applicable. Once the network has been added, click next through to the end of the wizard, and the VM Image will begin deploying. If I go back to my vApp diagram I can see the uploaded image is now deployed in my vApp alongside the other servers which were there previously:

Deployed VMware Image

That’s it! I can now power up the machine, and my VMware Image has been moved from my local cloud onto the StratoGen VMware Hosting platform.

Troubleshooting File Uploads

If you experience issues trying to upload .OVF files, they will most likely be caused by browser incompatibilities or Java related issues.

Browser Compatibility List (Correct for vCloud Director 1.5.1)

This table outlines support for browsers on Microsoft Windows operating systems:

This table outlines support for browsers on Linux operating systems:

Java Version 7 Issues

We have seen various issues with browsers using Java version 7. Typical problems include the ‘browse’ button not working when trying to load the file selector. If you experience a similar issue we recommend that you install and use Java JRE version 6 instead*. It is best to install both the 32-bit and 64-bit versions. You can download the latest Java 6 from this URL:

http://www.oracle.com/technetwork/java/javase/downloads/jre-6u32-downloads-1594646.html

*Please check that this won’t impact any existing applications that you already have installed.