Archive for category VMware

Arrays with duplicate ID’s received by SRA

Problem –

Ran into this issue when configuring the HDS SRA 2.0 for SRM with a couple of AMS2300 arrays. Everything was installed (SRM, vCenter, the SRA, HDS CCI etc.) and a HORCM instance was set up on each server as required, but when trying to configure the SRA I kept getting this error –

“Arrays with duplicate ID’s received by SRA”.

Solution –

After digging through the error logs in C:\Documents and Settings\All Users\Application Data\VMware\VMware vCenter Site Recovery Manager\Logs, I noticed that the SRA was identifying the array on the first command LUN and then attempting to do it again, at which point everything failed and I got the error message.

This ended up being because the HORCM file was incorrectly configured. Under the HORCM_CMD section of my HORCM0.conf file I had the command LUNs listed one after the other, like this –

#/****************** For HORCM_CMD ****************************

HORCM_CMD

#dev_name                                       #dev_name                                       #dev_name

\\.\PHYSICALDRIVE1

\\.\PHYSICALDRIVE2

This caused the SRA to scan the first device and return the array ID; it then went to the second device, which reported the same array ID and caused all my issues. The correct way of doing it is to add your command devices on a single line, like this –

#/****************** For HORCM_CMD ****************************

HORCM_CMD

#dev_name                                       #dev_name                                       #dev_name

\\.\PHYSICALDRIVE1                       \\.\PHYSICALDRIVE2
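For anyone new to CCI, the HORCM_CMD section sits alongside the other sections of the instance configuration file. Below is a rough sketch of how a complete HORCM0.conf might be laid out with both command devices on the one line; the ports, group name, device names, serial number and LDEV value are purely illustrative placeholders, not values from my environment –

#/****************** For HORCM_MON ****************************
HORCM_MON
#ip_address         service        poll(10ms)     timeout(10ms)
localhost           11000          1000           3000

#/****************** For HORCM_CMD ****************************
HORCM_CMD
#dev_name                                       #dev_name
\\.\PHYSICALDRIVE1                       \\.\PHYSICALDRIVE2

#/****************** For HORCM_LDEV ****************************
HORCM_LDEV
#dev_group          dev_name       Serial#        CU:LDEV(LDEV#)
SRM-GROUP           VMFS_LUN01     85012345       00:10

#/****************** For HORCM_INST ****************************
HORCM_INST
#dev_group          ip_address                    service
SRM-GROUP           recovery-site-server          11001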

Once I made the change and restarted the HORCMINST service, I was able to detect the array and all paired LUNs. Happy Days!
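If you run the instance from the command prompt rather than as a Windows service, restarting it and checking that it can now see the array looks something like the following (assuming CCI is installed in the default C:\HORCM location and the instance number is 0) –

C:\HORCM\etc>horcmshutdown 0
C:\HORCM\etc>horcmstart 0
C:\HORCM\etc>raidqry -l

raidqry -l simply lists the arrays visible to the running instance, which is a quick sanity check before going back to the SRM array manager configuration.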


Installing vCenter on Server 2008 R2

There are a couple of small gotchas to be aware of when installing vCenter on a Windows Server 2008 R2 machine. The main reason for this is that R2 is 64-bit only. The issues aren’t with the installation of vCenter itself so much as with getting the installer to recognise the connection to the SQL backend server.

vCenter requires a 32-bit DSN. The problem on R2 is that if you set up a DSN through Start > Administrative Tools > Data Sources (ODBC), the vSphere installer won’t be able to see it, as the created DSN will be a 64-bit connection.

In addition to a 32-bit DSN, vCenter also requires the SQL (2005 or 2008) native client. The confusing part is that you need to download and install the x64 version of the native client and then use the procedure below to set up the connection.

So here is a quick guide on installing vCenter 4.0 on a Windows 2008 R2 server.

Prerequisites –

Procedure –

  1. Download the native client from the above link
  2. Install the SQL 2008 native client with default settings
  3. From the Run command, launch %systemdrive%\Windows\SysWoW64\Odbcad32.exe
  4. Select the System DSN tab
  5. Choose the SQL Server Native Client driver
  6. Run through the rest of the DSN connection setup as normal.
  7. Run the vCenter installation as normal and the installer should show your DSN as a database connection option.

Note – If you go back to Start > Administrative Tools > Data Sources (ODBC), no DSNs will appear. This is normal, as they have been created using the 32-bit tool and so will only show up when you launch it from %systemdrive%\Windows\SysWoW64\Odbcad32.exe
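If you want to confirm the 32-bit DSN really was created, one quick sanity check (a sketch, assuming a System DSN and the usual registry location for 32-bit ODBC settings on 64-bit Windows) is to look under the Wow6432Node branch; the DSN name vCenterDSN below is just a placeholder –

reg query "HKLM\SOFTWARE\Wow6432Node\ODBC\ODBC.INI\ODBC Data Sources"
reg query "HKLM\SOFTWARE\Wow6432Node\ODBC\ODBC.INI\vCenterDSN"

The first command lists the 32-bit System DSNs by name and driver, the second shows the server and driver details for the one you created.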


How to install the HP CIM providers on ESXi

Quick guide on installing the HP ESXi CIM providers

Prerequisites

Procedure

Note – Bold text (the ESXiHOST and bundle.zip values below) should be replaced with information relevant to your environment.

  1. Copy the bundle to c:\temp
  2. Place the target host in maintenance mode
  3. Open the vSphere CLI command line
  4. Change to the bin directory by typing cd bin
  5. Run vihostupdate.pl --server ESXiHOST --install --bundle c:/temp/bundle.zip
  6. Reboot the host
  7. Check the installation – run vihostupdate.pl --server ESXiHOST --query (the full sequence is sketched below)
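For convenience, steps 1, 5 and 7 can be strung together from a command prompt once the host is in maintenance mode. This is only a sketch built from the commands above; the share path, host name and vSphere CLI install path are placeholders/assumptions rather than anything from the original procedure –

copy \\fileserver\share\bundle.zip c:\temp\bundle.zip
cd /d "C:\Program Files (x86)\VMware\VMware vSphere CLI\bin"
vihostupdate.pl --server ESXiHOST --install --bundle c:/temp/bundle.zip
vihostupdate.pl --server ESXiHOST --query

Remember to reboot the host between the install and the query, and to take it out of maintenance mode afterwards.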

You should now have additional information on hardware status in the vSphere hardware status tab.


Error during the configuration of the host: Cannot open volume: /vmfs/volumes/

I came across this error after setting up a flexible volume (FlexVol) on a NetApp filer and trying to connect to it from a vSphere ESX host.

I had run through the wizard and made sure that each of the ESX hosts had read/write access as well as “Root” access to the export/volume, but I continued to get this error:

“Error during the configuration of the host: Cannot open volume: /vmfs/volumes/270c64e8-cae0114b”

Originally I thought it might be because I had underscores in my export name, as described in the VMware KB article. I changed the export name to something without an underscore, but still no dice.

It turns out that even if you specify Unix security when creating the export, the NetApp filer will create it using the NTFS security style by default! Why it sets the underlying security style to NTFS on an NFS export created with Unix security is beyond me. It may be down to the version of Data ONTAP on our old dev filer (7.2.6.1).

To change this through the web management interface you need to do the following (the equivalent console command is sketched after the list):

  1. Log on to your filer’s web management interface
  2. Expand “Volumes”
  3. Expand “Qtrees”
  4. Click “Manage”
  5. Click on the “NTFS” link next to the export you need to change
  6. Change the security style to “Unix” and then apply
  7. Go back to your host and try adding the NFS volume
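If you prefer the filer console (or an SSH session) to the web interface, the same change can be made with the qtree security command. A quick sketch, assuming a 7-mode filer and a volume called vol_nfs_vm, which is just a placeholder name –

filer> qtree status vol_nfs_vm
filer> qtree security /vol/vol_nfs_vm unix
filer> qtree status vol_nfs_vm

The first qtree status shows the current security style (ntfs in my case), and running it again after the change should report unix.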

I recommend getting familiar with this NetApp and vSphere best practices document as well.


Unable to create File system, please see VMkernel log for more details

I came across an error recently and thought I would share my solution.

While configuring an iSCSI-based vSphere 4 implementation, I came across the following error when I went to create a VMFS volume on a LUN that I had just presented and discovered:

“Unable to create File system, please see VMkernel log for more details”. I took a look at the log but couldn’t decipher the messages.

I had configured everything to use jumbo frames, including the physical switches (or so I thought), and I could not get it to work. I could see the LUN and all the paths to the LUN but could not create a VMFS volume. After changing everything back to the standard 1500 MTU, everything worked like a charm.

After double-checking everything, it turned out that the network guy had not enabled jumbo frames on the switch ports, even though he assured me he had! So I asked him to make the changes and I switched everything back to jumbo frames again on the VMware side (the VMkernel ports, the vSwitch etc.).
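For reference, on ESX 4.0 the jumbo frame settings on the VMware side are made from the command line rather than the GUI, on both the vSwitch and the VMkernel port (the VMkernel port generally has to be recreated with the larger MTU). The commands below are only a rough sketch of the sort of thing involved; the vSwitch name, port group name and IP details are placeholders rather than the values from this environment –

esxcfg-vswitch -m 9000 vSwitch1
esxcfg-vmknic -a -i 192.168.50.11 -n 255.255.255.0 -m 9000 "iSCSI-VMkernel"
esxcfg-vswitch -l
esxcfg-vmknic -l

The last two commands just list the vSwitch and VMkernel NIC configuration so you can confirm the MTU of 9000 has actually taken effect; the MTU needs to match end to end across the switch ports, the vSwitch and the VMkernel ports.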

Once that was done I could create my VMFS volume and it’s been working great ever since.

It was odd that I could “see” the storage and the presented LUNs with a misconfigured network but could not “use” it until everything was spot on.


VMware HA and 3Com Stacked Switches

After going through various “Signoff tests” with a customer today I came across an interesting “Gotcha”. I was running through the availability/fault tolerance tests when we came to simulating a switch failure and confirming that network and storage (iSCSI) connectivity was unaffected and virtual machines continued to run.

The network configuration consisted of 2x 3Com SuperStack 5500 switches configured as a stack. I haven’t had a lot of 3Com exposure, but they seem to do the job, and being stackable allows for redundant switches and the ability to configure link aggregation between the two switches and the ESX hosts. I normally work with ProCurve switches, where this configuration is not possible, so it was good to try something new.

So anyway, I pulled the power on one of the switches and, after a few lost pings and a bit of VM unresponsiveness while ESX did its NMP magic, everything returned to normal. So it was yet another tick in the box for that test.

The “Gotcha” came when we plugged the switch back in. As the previously powered-off switch began to boot, everything started going pear-shaped. VMs began shutting down and the hosts reported HA errors.

What had happened is that as the second switch boots and joins the stack, the other switches go offline and re-organise themselves and their unit IDs. This resulted in each host not being able to ping either the gateway or the secondary isolation address and beginning the default “power off VMs when isolated” routine.

There are two options to overcome this issue of VMs shutting down when this happens –

  1. Increase the das.failuredetectiontime advanced setting to something longer than the time it takes the switches to sort themselves out (60 seconds ought to do it; a sketch of the setting follows this list). This fixes the issue but will cause a longer outage if anything really does fail.
  2. Change the default isolation response to “Leave Virtual Machines Powered On”. This stops the virtual machines dying, however the host will need to be “Re-enabled for HA” afterwards.
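For reference, the setting in option 1 lives in the cluster’s HA configuration (right-click the cluster > Edit Settings > VMware HA > Advanced Options). A sketch of the sort of entry involved, where the 60000 value is simply an example based on the 60 seconds mentioned above –

das.failuredetectiontime = 60000

The value is in milliseconds, so 60000 gives HA a 60-second window before it declares a host isolated and triggers the isolation response (the default is 15000, which is where the 15 seconds mentioned below comes from).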

Of course this fixes the issue of virtual machines powering down, but not the problem of losing complete connectivity to the VMs for more than 15 seconds. So it was decided with the client that the most appropriate action would be to arrange a maintenance outage before adding a failed or new switch to the stack.
