Archive for January, 2010
I am not a Linux guy. I know enough to make my way around the ESX service console but that’s about it. A Linux guy would not have spent hours figuring out the issue I had recently!
Anyway, I have been setting up a little SRM lab and wanted to configure a couple of NetApp ONTAP simulators for the storage replication component. The simulator needs to run on a Linux host machine. I managed to get my host VMs running with Ubuntu 9.10. You can find the installation steps here if you are interested.
I attempted to mount the ONTAP simulator ISO to run the install, but even though Linux told me it was mounted I could not get access to it. I mounted another ISO to make sure I wasn't doing something wrong, and that one worked fine.
So I mounted the ONTAP simulator ISO on a Windows VM and I could access the files. I decided to copy the files to a share and access the installation files over the network.
After copying the install files over the network to the Linux VM I ran the installation, which failed. I kept getting a "Cannot find an installed simulator nor an install package" error. I spent hours trying to figure out what the hell was wrong.
After digging through the setup.sh file I noticed that it looks for sim.tgz as the installation package. The file was in the directory and I could manually un-tar it, so the file wasn't corrupt. Then it finally dawned on me! The files I copied across from the Windows server were all in CAPITAL LETTERS. I changed every file's name to lower case and it worked!
I know Linux is case sensitive but I assumed setup would just run without my adjusting anything. At some point during the copying of the files to Windows and then to Linux, all the file names were changed to capital letters.
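If anyone runs into the same thing, a simple shell loop will lower-case everything in the install directory. This is just a sketch; the directory and file names below are hypothetical stand-ins for wherever you copied the install files:

```shell
# Demo of the rename fix. /tmp/ontap-sim-install and the file names are
# hypothetical; the mkdir/touch lines just simulate the upper-cased copies.
dir=/tmp/ontap-sim-install
mkdir -p "$dir"
touch "$dir/SETUP.SH" "$dir/SIM.TGZ"

cd "$dir"
for f in *; do
  # Build the lower-case name and rename only if it actually differs
  lc=$(printf '%s' "$f" | tr '[:upper:]' '[:lower:]')
  [ "$f" = "$lc" ] || mv -- "$f" "$lc"
done
ls
```

After the loop, setup.sh finds sim.tgz and the install runs.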
I came across an error recently and thought I would share my solution.
While configuring an iSCSI-based vSphere 4 implementation I came across the following error when I went to create a VMFS volume on a LUN that I had just presented and discovered.
"Unable to create File system, please see VMkernel log for more details". I took a look at the log but could not decipher the messages.
I had configured everything to use jumbo frames including the physical switches (or so I thought) and I could not get it to work. I could see the LUN and all the paths to the LUN but could not create a VMFS volume. After changing everything back to the standard 1500 MTU everything worked like a charm.
After double checking everything it turned out that the network guy had not enabled jumbo frames on the switch ports, even though he assured me he had! So I asked him to make the changes and I switched everything back to jumbo frames again on the VMware side (VMkernel ports, the vSwitch etc.).
Once that was done I could create my VMFS volume and it’s been working great ever since.
It was odd that I could "see" the storage and presented LUNs with a misconfigured network, but I could not "use" them until everything was spot on.
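One way to catch this sort of mismatch before it bites is to send a jumbo-sized ping with the don't-fragment bit set from the ESX host to the iSCSI target. The target address below is made up, so substitute your own:

```
# From the ESX service console: 8972 bytes of payload plus 28 bytes of
# ICMP/IP headers adds up to a full 9000-byte frame.
# 192.168.254.10 is a hypothetical iSCSI target address.
vmkping -d -s 8972 192.168.254.10
```

If any device in the path is still at a 1500 MTU this ping fails while a normal vmkping still succeeds, which points the finger straight at the network.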
I recently implemented my first EqualLogic SAN and I must say that I am now a bit of a fan of the product. That's quite a statement coming from me as I have never been a fan of Dell. If I am being honest I probably only disliked them because it's cool not to like Dell.
It is dead easy to setup and configure and the management console (web based) is responsive and intuitive.
However, there is one major flaw as I see it, and that is their snapshot implementation. Yes, snapshots are easy to set up and manage, and can be used for application-consistent snapshots (using the host integration tools), but they have one fatal flaw: the EqualLogic arrays preserve a full 16MB page in a snapshot no matter how little of the data within it has changed.
As an example of why this is bad, consider an Exchange 2007 server that needs to make 10 x 8KB random writes to the database. Assuming none of the 8KB blocks reside in the same 16MB page, you will need 160MB of snapshot space (10 x 16MB) to cover 80KB worth of writes. You are going to need a lot of spare space on your array if you plan on using snapshots.
This (I assume) is most likely why snapshot reservation is set to 100% by default, meaning a 100GB LUN needs 200GB of available space. Add RAID parity and hot spares into the mix and you are looking at less than 50% usable space!
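The arithmetic is easy to sanity-check. A quick sketch using the numbers from the Exchange example above:

```shell
# Worst-case snapshot overhead: 10 random 8 KB writes, each assumed to
# dirty a different 16 MB snapshot page.
writes=10
write_kb=8
page_mb=16
data_kb=$((writes * write_kb))    # actual changed data in KB
snap_mb=$((writes * page_mb))     # snapshot space consumed in MB
echo "${data_kb} KB of writes consumes ${snap_mb} MB of snapshot space"
```

That is 160MB of snapshot space for 80KB of changed data, a worst-case amplification of 2048 to 1.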
I would still consider using it for non-database LUNs like a file server, as writes are more likely to be sequential and thus won't accentuate the issue as much.
Despite this flaw, I still like the product, as long as you don't plan on making extensive use of snapshots.
After going through various “Signoff tests” with a customer today I came across an interesting “Gotcha”. I was running through the availability/fault tolerance tests when we came to simulating a switch failure and confirming network and storage (iSCSI) was unaffected and virtual machines continued to run.
The network configuration consisted of 2x 3Com SuperStack 5500 switches configured as a stack. I haven't had a lot of 3Com exposure but they seem to do the job, and being stackable allows for redundant switches and the ability to configure link aggregation between the two switches and the ESX hosts. I normally work with ProCurve, where this configuration is not possible, so it was good to try something new.
So anyway, I pulled the power on one of the switches and, after a few lost pings and a bit of VM unresponsiveness while ESX did its NMP magic, everything returned to normal. So it was yet another tick in the box for that test.
The "Gotcha" came when we plugged the switch back in. As the previously powered off switch began to boot, everything started going pear shaped. VMs began shutting down and the hosts reported HA errors.
What had happened is that as the second switch boots and joins the stack, the other switches go offline and re-organise themselves and their unit IDs. This resulted in each host not being able to ping either the gateway or the secondary isolation address, and beginning the default "power off VMs when isolated" routine.
There are two options to overcome this issue of VMs shutting down when this happens –
- Increase the das.failuredetectiontime advanced setting to something longer than the time it takes the switches to sort themselves out (60 seconds, i.e. 60000 milliseconds, ought to do it). This fixes the issue but will cause a longer outage if anything really does happen.
- Change the default isolation response to "Leave Virtual Machines Powered On". This fixes the virtual machines dying, however the host will need to be "Re-enabled for HA" afterwards.
Of course this fixes the issue of virtual machines powering down, but not the problem of losing complete connectivity to the VMs for more than 15 seconds. So it was decided with the client that the most appropriate action would be to arrange a maintenance outage before adding a failed or new switch to the stack.
I recently moved to the UK and left at home most of the "tools" I have gathered over the years.
As happens more often than not, I needed a tool and I didn't have it on me. I wanted to build a Server 2003 template VM, but before installing the OS I wanted to align the system partition so that the offset was 64 and not the default 63 that Server 2003 uses (Server 2008 now uses an offset of 1024 by default on any disk larger than 4GB). You should read this if you don't know why setting the partition alignment is so important. The white paper is written for ESX 3 but is still relevant.
To make a long story short I downloaded all the components and created my BartPE ISO. Now I have done this in the past and it worked great but this time I ran into several issues including diskpart not starting with “The disk management services could not complete the operation” and “DiskPart encountered an error starting the COM services”.
After several failed attempts I had a brainwave! I already had a Windows 2008 VM up and running, so why not use it to do the alignment? Setting alignment on non-system disks is easy on 2003, as you can just use the diskpart command "Create Partition Primary Align=X" before you have any data on the drive. Server 2008 makes it even easier by setting an alignment of 1024 by default when configuring new partitions.
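For reference, that diskpart approach on a running 2003 box looks something like this. The disk number and drive letter are hypothetical, and the disk must not contain any data yet:

```
REM From a command prompt on Server 2003 (SP1 or later), run diskpart, then:
DISKPART> select disk 1
DISKPART> create partition primary align=64
DISKPART> assign letter=E
DISKPART> exit
```

The catch, of course, is that you can't do this to the system partition from within the OS you are installing, hence the workaround below.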
To enable me to set the alignment using the 2008 server I did the following –
1. Removed the 2003 VM from the inventory (but left the vmdk on the storage)
2. Edited the properties of the 2008 server and added the 2003 server vmdk as an additional drive
3. Fired up disk manager and let Windows 2008 discover and format the drive with a single partition
4. Removed the disk by editing the properties of the 2008 VM (shut it down first)
5. Navigated to the VMX (and vmdk) of the template and registered it with vCenter again
6. Mounted the 2003 R2 ISO and started the install
7. When I got to the point where setup asks to format the drive, I told it not to format the drive
8. Finished the installation
9. Once the VM was up and running I confirmed the alignment and sure enough it was the default 1024 that Server 2008 uses
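For anyone wanting to confirm the alignment themselves, one way that I believe works on both 2003 and 2008 is wmic from a command prompt:

```
REM StartingOffset is reported in bytes: 1048576 (1MB) corresponds to the
REM 1024 alignment that Server 2008 uses; 32256 is the old 63-sector offset.
wmic partition get Name, StartingOffset
```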
I will create my BartPE ISO again when I get time but for now I know it works and that every 2003 VM that gets deployed from the template will be aligned and thus perform better and stress the SAN less.
I recently needed to configure an ESXi 4.0 server to use the new multi-pathing capability along with jumbo frames. I have done this in the past with “classic” ESX using this fantastic post by Chad Sakac and friends which uses the ESXCFG commands within the service console. As ESXi doesn’t have a service console I had to do a little research to figure it out. The information is out there but it took a bit of finding so I decided to post the process here.
Yes I know I can hack ESXi so I can use SSH and use the regular esxcfg commands. However since VMware keep telling us that the service console is going away, I figure I may as well do it “right” and use the vSphere CLI.
On a side note, I originally tried to accomplish this using PowerCLI (based on PowerShell) but ran into issues with setting up vmknics and their MTU settings. I also couldn't bind the VMkernel ports to the iSCSI HBA. There is most likely a way of doing it using PowerCLI, but in the end I found it was easier to use the regular vSphere CLI.
This guide assumes that you have already –
- Got a reasonable knowledge of ESX already
- Have read this post and understand the concepts
- Enabled Jumbo frames on the relevant physical switch ports
- Ensured that your iSCSI target and server supports jumbo frames
- That you have a base install of ESXi 4.0 up and running
- You know the name of your iSCSI HBA (Should be something like vmhba33)
- That you are able to substitute anything between the < > brackets with your own relevant information
To get the job done we will be using a combination of the vSphere CLI, which you can get from here, and the vSphere Windows client, which you can get by connecting to the IP of your ESXi host using your web browser.
Step 1 – Create the vSwitch and set the MTU
In this section we will create the vSwitch and assign the physical NICs that will be used for iSCSI traffic using the GUI, then switch to the CLI to set the MTU to 9000 (jumbo frames), as that can't be done using the GUI.
- Log into the ESX host with the vSphere Client
- Create a vSwitch and take a note of its name (ie “vSwitch1”)
- Attach the NICs you intend to use for iSCSI traffic. Be sure these are plugged into switch ports with jumbo frames enabled. In this example I am using two NICs.
1. If you choose all the defaults you will end up with a port group on the vSwitch. You can safely delete that as you don’t need it.
2. If you haven’t already, install the vSphere CLI and choose all the defaults
3. Fire up the vSphere CLI command prompt from the start menu
4. The command prompt defaults to c:\Program Files\Vmware\Vmware vSphere CLI\. Change to the "bin" directory. You should now be at c:\Program Files\Vmware\Vmware vSphere CLI\bin.
5. To configure the switch we just created with jumbo frame support type –
vicfg-vswitch.pl -server <ESX host name> -m 9000 <vSwitch name>
eg. vicfg-vswitch.pl -server ESX01 -m 9000 vSwitch1
6. To confirm it worked correctly run the following –
vicfg-vswitch.pl -server <ESX host name> -l
eg. vicfg-vswitch.pl -server ESX01 -l
Your switch should appear with an MTU of 9000.
- Keep the prompt open as we will be using it a few more times yet
Step 2 – Set up VMkernel ports with jumbo frame support
We have to do this part entirely from the CLI, as we can't create vmknics in the GUI and set the MTU later on like we did with the vSwitches. The MTU can only be set at the creation of a VMkernel port.
1. Before you can create the vmknics and assign them an IP address and MTU setting, you first need to create a port group with the name that you intend to use for each VMkernel port. For each VMkernel port type –
vicfg-vswitch.pl -server <ESX host name> -add-pg <port group name> <vSwitch name>
eg. vicfg-vswitch.pl -server ESX01 -add-pg iSCSI_1 vSwitch1
2. To confirm it worked type –
vicfg-vswitch.pl -server <ESX host name> -l
eg. vicfg-vswitch.pl -server ESX01 -l
The new port groups should now be listed against the vSwitch.
3. Now create the VMkernel ports and attach them to the relevant port groups by typing –
vicfg-vmknic.pl -server <ESX host name> -add -ip <IP address> -netmask <subnet mask> -p "<port group name>" -mtu 9000
eg. vicfg-vmknic.pl -server ESX01 -add -ip 192.168.254.12 -netmask 255.255.255.0 -p "iSCSI_1" -mtu 9000
4. To confirm it worked type –
vicfg-vmknic.pl -server <ESX host name> -l
eg. vicfg-vmknic.pl -server ESX01 -l
Step 3 – Binding the VMkernel ports to the physical NICs
At this point we need to switch back to the GUI and configure each VMkernel port so that it only uses one active adapter. This allows the NMP driver within ESX to handle all the load balancing and failover. Once that is done we go back to the command line one more time and then the job is done.
1. Connect to your ESXi host with the vSphere Client
2. Go to the properties of the vSwitch that you have created.
3. Highlight the first VMkernel port and click edit, then go to the "NIC Teaming" tab.
4. Check the "Override vSwitch failover order" box
5. Move all but one of the physical adapters from the "active" list to the "unused" list. Do this for each VMkernel port so that each one uses a different physical adapter.
6. Go back to the CLI prompt and "bind" each VMkernel port to the iSCSI initiator by running the following command –
esxcli -server <ESX host name> swiscsi nic add -n <vmk name> -d <vmhba name>
eg. esxcli -server ESX01 swiscsi nic add -n vmk1 -d vmhba34
7. To confirm it worked run the following-
esxcli -server <ESX host name> swiscsi nic list -d <vmhba name>
eg. esxcli -server esx01 swiscsi nic list -d vmhba34
You should see a whole bunch of details (IP, MTU etc.) for each VMkernel port that is bound to the iSCSI HBA.
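For reference, here are all the CLI commands from the steps above in one place. The host name, port group names, IP addresses and vmhba number are the example values used in this post (the second port group, IP and vmk name are my extrapolations for a two-NIC setup), so substitute your own:

```
REM Run from the vSphere CLI prompt (the bin directory).
vicfg-vswitch.pl -server ESX01 -m 9000 vSwitch1
vicfg-vswitch.pl -server ESX01 -add-pg iSCSI_1 vSwitch1
vicfg-vswitch.pl -server ESX01 -add-pg iSCSI_2 vSwitch1
vicfg-vmknic.pl -server ESX01 -add -ip 192.168.254.12 -netmask 255.255.255.0 -p "iSCSI_1" -mtu 9000
vicfg-vmknic.pl -server ESX01 -add -ip 192.168.254.13 -netmask 255.255.255.0 -p "iSCSI_2" -mtu 9000
esxcli -server ESX01 swiscsi nic add -n vmk1 -d vmhba34
esxcli -server ESX01 swiscsi nic add -n vmk2 -d vmhba34
esxcli -server ESX01 swiscsi nic list -d vmhba34
```

Remember that the NIC teaming changes in step 3 still have to be done in the GUI between the vmknic creation and the binding.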
Wrapping it up
So that's it. If everything worked you should now be able to point your jumbo frame enabled ESXi iSCSI initiator at your target and run a discovery. Each target device should now have at least two paths to the storage. Keep in mind that you can only have a maximum of 8 paths to a device when using iSCSI on ESX.
Once you can see your LUNs you should be able to configure the NMP driver to use Round Robin for each of the accessible devices.
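If you want to do that last part from the CLI as well, the esxcli nmp namespace can set the path selection policy per device. The device ID below is a made-up example; use the IDs from your own device list:

```
REM List the devices and their current path selection policies
esxcli -server ESX01 nmp device list

REM Set Round Robin on a device (the naa ID here is hypothetical)
esxcli -server ESX01 nmp device setpolicy --device naa.6090a01234567890 --psp VMW_PSP_RR
```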
First off, the word "Virtualisation" is not spelt with a "z" in Australia, so chances are you are here because you are an Aussie or a Pom and searched for something relevant using the correct spelling. If you are neither of those breeds then I have no idea how you got here!
Secondly the plan for this site at the moment is simply a repository for documenting anything related to virtualisation and storage that I run into during my day job. As I learn something new, figure something out or re-discover something that I had forgotten I plan on putting it here.
If someone gets something out of it or I save someone some time then that's great. However, at this stage I am under no illusion that I am going to be an industry leading blogger. Plus I don't use Twitter, which all the "elite bloggers" seem to, so that puts me at a disadvantage almost immediately. That and they are probably way smarter than I am anyway!