I have been meaning to take a look at the EMC Rainfinity product for a long time now but never got around to it. EMC have now released it as a virtual appliance which you can get from Chad’s site here. The link also provides info on the product as well as installation steps.
The VE edition is free for non production use but there is a paid version that could be a great solution for SMB clients who have data archiving requirements if the price is right. The biggest benefit I see for SMB clients is reducing the amount of data that needs to be backed up.
I’m not sure what targets and destination storage it is able to work with just yet but after having a play with it I will post again with my findings.
Detailed info on the following best-practice round-up can be found here.
- The SMVI server component should be installed on the vCenter server
- Transient data such as the guest operating system swap file, temp files and page files, should be moved to a separate virtual disk on a separate data store to prevent large snapshots due to high rate of change
- Set vDisks that house temp/transient data to “Independent Persistent” to ensure SMVI does not attempt to snap the volume that the vDisk resides on
- Licences required for SMVI
- Any required NFS, iSCSI, or FC protocol licences
- When vCenter is installed on a VM, the database associated with vCenter must not be installed on a virtual machine protected by SMVI, to avoid timeout issues. Where possible use a physical SQL server for vCenter
- It is recommended to configure data store level backups instead of VM backups where possible to reduce admin overhead and increase restore options.
- SMVI XML configuration files need to be on shared storage or backed up often to allow for recovery
- Set the startup type of the SMVI service to Manual if you may need to restore the SMVI virtual machine from an image-level backup. This is important because:
“When the virtual machine running SnapManager for Virtual Infrastructure is backed up by SMVI, the current state of the SMVI backup workflow is captured within the Snapshot copies. Should the virtual machine be restored at a later date, SMVI within the virtual machine assumes that the host has failed in mid-workflow and attempts to resume backups at the point at which the Snapshot copy was taken. This can cause backups to run multiple times even if they were originally successful.”
- After a restore of the SMVI VM, remove the contents of the %PROGRAMFILES%\NetApp\SMVI\server\crash directory and start the SMVI service
- In an environment experiencing heavy disk I/O, disabling VMware (VSS) snapshots will greatly increase both the speed and success rate of backups. This will result in crash-consistent backups only, which is generally only an issue for database or e-mail servers.
- If a VM requires an RDM (a requirement for using SnapDrive for SQL) it should be configured in physical mode. This will enable the non-RDM virtual disks (i.e. the system partition) to be backed up using the VMware Tools VSS integration.
- Create a separate storage admin account for SMVI to interact with the filer (don’t use the root account)
- Enable and use SSL on the filer to ensure passwords are encrypted
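The last two points can be done from the filer console. Here is a rough sketch in Data ONTAP 7-mode syntax; the role name, group name, and capability list are my own assumptions, so check them against the SMVI documentation for your environment:

```
# create a restricted role/group/user for SMVI instead of using root
useradmin role add smvi_role -a login-http-admin,api-*
useradmin group add smvi_group -r smvi_role
useradmin user add smvi -g smvi_group

# generate certificates and enable SSL so credentials are encrypted in transit
secureadmin setup ssl
options httpd.admin.ssl.enable on
```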
I am not a Linux guy. I know enough to make my way around the ESX service console but that’s about it. A Linux guy would not have spent hours figuring out the issue I had recently!
Anyway I have been setting up a little SRM lab and I wanted to configure a couple of NetApp ONTAP simulators for the storage replication component. The simulator needs to be run on a Linux host machine. I managed to get my host VMs running with Ubuntu 9.10. You can find the installation steps here if you are interested.
I attempted to mount the ONTAP simulator ISO to run the install but for some reason even though Linux was telling me it was mounted I could not get access to it. I mounted another ISO to make sure I wasn’t doing something wrong and it seemed to work fine.
So I mounted the ONTAP simulator ISO on a windows VM and I could access the files. I decided to copy the files to a share and access the installation files over the network.
After copying the install files over the network to the Linux VM I ran the installation which failed. I kept getting a “Cannot find an installed simulator nor an install package” error. I spent hours trying to figure out what the hell was wrong.
After digging through the setup.sh file I noticed that it looks for sim.tgz as the installation package. The file was in the directory and I could manually un-tar it, so the file wasn’t corrupt. Then it finally dawned on me! The files I copied across from the Windows server were all in CAPITAL LETTERS. I changed every file’s name to lower case and it worked!
I know Linux is case sensitive but I assumed setup would just run without adjusting anything. I think at some point during the copying of the files to Windows and then to Linux, all the file names were changed to capital letters.
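If you hit the same thing, the renaming is easy to script rather than doing it by hand. A minimal sketch (the target directory is whatever path you copied the install files to):

```shell
# lowercase_all: rename every file in the given directory to lower case
lowercase_all() {
  for f in "$1"/*; do
    base=$(basename -- "$f")
    lc=$(printf '%s' "$base" | tr '[:upper:]' '[:lower:]')
    [ "$base" = "$lc" ] || mv -- "$f" "$1/$lc"
  done
}

# usage: lowercase_all /mnt/simfiles
```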
I came across an error recently and thought I would share my solution.
While configuring an iSCSI based vSphere 4 implementation I came across the following error when I went to create a VMFS volume on a LUN that I had just presented and discovered.
“Unable to create File system, please see VMkernel log for more details”. I took a look at the log but could not decipher the messages.
I had configured everything to use jumbo frames including the physical switches (or so I thought) and I could not get it to work. I could see the LUN and all the paths to the LUN but could not create a VMFS volume. After changing everything back to the standard 1500 MTU everything worked like a charm.
After double checking everything it turned out that the network guy had not enabled jumbo frames on the switch ports, even though he assured me he had! So I asked him to make the changes and I changed everything back to jumbo frames again on the VMware side (VMkernel ports and the vSwitch etc.)
Once that was done I could create my VMFS volume and it’s been working great ever since.
It was odd that I could “see” the storage and presented LUNS with a misconfigured network but I could not “use” it until everything was spot on.
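For reference, the jumbo-frame changes on the ESX 4 side look roughly like this from the service console. The vSwitch name, port group, and addresses below are examples, not the ones from this environment:

```
# set a 9000-byte MTU on the vSwitch
esxcfg-vswitch -m 9000 vSwitch1

# VMkernel ports can't have their MTU changed in place; delete and recreate
esxcfg-vmknic -d iSCSI1
esxcfg-vmknic -a -i 10.0.0.11 -n 255.255.255.0 -m 9000 iSCSI1

# verify, then test the path end to end with a large, unfragmented ping
esxcfg-vmknic -l
vmkping -d -s 8972 10.0.0.20
```

The vmkping test with the don’t-fragment flag is the quickest way to catch a switch port that isn’t actually passing jumbo frames.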
I recently implemented my first EqualLogic SAN and I must say that I am now a bit of a fan of the product. That’s quite a statement coming from me as I have never been a fan of DELL. If I am being honest I probably only disliked them because it’s cool not to like DELL.
It is dead easy to setup and configure and the management console (web based) is responsive and intuitive.
However there is one major flaw as I see it, and that is their snapshot implementation. Yes, they are easy to set up and manage, and can be used to do application-consistent snapshots (using the host integration tools), but they have one fatal flaw. The EqualLogic arrays preserve a full 16 MB page in the snapshot no matter how little data within that page has changed.
As an example of why this is bad, consider an Exchange 2007 server that needs to make 10 x 8 KB random writes to the database. Assuming none of the 8 KB blocks reside in the same 16 MB page, you will need 160 MB of snapshot space (10 x 16 MB) to allow for 80 KB worth of writes. You are going to need a lot of spare space on your array if you plan on using snapshots.
This (I assume) is most likely why snapshot reservation is set to 100% by default, meaning a 100 GB LUN needs 200 GB of available space. Add RAID parity and hot spares into the mix and you are looking at less than 50% usable space!
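The worst-case arithmetic above can be sketched as:

```shell
# Worst-case EqualLogic snapshot growth: each changed 16 MB page is preserved whole
page_kb=$((16 * 1024))   # snapshot page size: 16 MB
writes=10                # number of random writes
write_kb=8               # size of each write: 8 KB

changed_kb=$((writes * write_kb))   # actual changed data
reserved_kb=$((writes * page_kb))   # assumes every write lands in a different page

echo "${changed_kb} KB changed -> $((reserved_kb / 1024)) MB of snapshot space"
# prints: 80 KB changed -> 160 MB of snapshot space
```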
I would still consider using it for non-database LUNs, like a file server, as writes are more likely to be sequential and thus won’t accentuate the issue as much.
Despite this flaw, I still like the product, as long as you don’t plan on making extensive use of snapshots.
After going through various “Signoff tests” with a customer today I came across an interesting “Gotcha”. I was running through the availability/fault tolerance tests when we came to simulating a switch failure and confirming network and storage (iSCSI) was unaffected and virtual machines continued to run.
The network configuration consisted of 2x 3Com SuperStack 5500 switches configured as a stack. I haven’t had a lot of 3Com exposure but they seem to do the job, and being stackable allows for redundant switches and the ability to configure link aggregation between the two switches and the ESX hosts. I normally work with ProCurve, where this configuration is not possible, so it was good to try something new.
So anyway I pulled the power on one of the switches and, after a few lost pings and a bit of VM unresponsiveness while ESX did its NMP magic, everything returned to normal. So it was yet another tick in the box for that test.
The “Gotcha” came when we plugged the switch back in. As the previously powered-off switch began to boot, everything started going pear shaped. VMs began shutting down and the hosts reported HA errors.
What had happened is that as the second switch boots and joins the stack, the other switches go offline and re-organise themselves and their unit IDs. This resulted in each host not being able to ping either the gateway or the secondary isolation address, and beginning the default “power off VMs when isolated” routine.
There are two options to overcome this issue of VMs shutting down when this happens –
- Increase the das.failuredetectioninterval advanced setting to something longer than the time it takes the switches to sort themselves out (60 seconds ought to do it). This fixes the issue but will cause a longer outage if anything really does happen.
- Change the default isolation response to “Leave Virtual Machines Powered On”. This stops the virtual machines dying; however, the host will need to be “Re-enabled for HA”
Of course this fixes the issue of virtual machines powering down but not the problem of losing complete connectivity to the VMs for more than 15 seconds. So it was decided with the client that the most appropriate action would be to arrange a maintenance outage before adding a failed or new switch to the stack.
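One thing worth noting about the first option: das.failuredetectioninterval is specified in milliseconds (the default is 15000, i.e. 15 seconds), so 60 seconds in the HA advanced options looks like this (the isolation address is just an example of the related setting):

```
das.failuredetectioninterval = 60000
das.isolationaddress1 = 192.168.1.254
```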
I recently moved to the UK and left most of my “tools” at home that I have gathered over the years.
As happens more often than not, I needed a tool and I didn’t have it on me. I wanted to build a Server 2003 template VM, but before installing the OS I wanted to align the system partition so that the offset was 64 KB and not the 63-sector default that Server 2003 uses (Server 2008 now uses an offset of 1024 KB by default on any disk larger than 4 GB). You should read this if you don’t know why setting the partition alignment is so important. The white paper is written for ESX 3 but it is still relevant.
To make a long story short I downloaded all the components and created my BartPE ISO. Now I have done this in the past and it worked great but this time I ran into several issues including diskpart not starting with “The disk management services could not complete the operation” and “DiskPart encountered an error starting the COM services”.
After several failed attempts I had a brain wave! I already had a Windows 2008 VM up and running, so why not use it to do the alignment? Setting alignment on non-system disks is easy on 2003, as you can just use the diskpart command “Create Partition Primary Align=X” before you have any data on the drive. Server 2008 makes it even easier by setting an alignment of 1024 KB by default when configuring new partitions.
To enable me to set the alignment using the 2008 server I did the following –
1. Removed the 2003 VM from inventory (but left the vmdk on the storage)
2. Edited the properties of the 2008 server and added the 2003 server vmdk as an additional drive
3. Fired up Disk Management and let Windows 2008 discover and format the drive with a single partition
4. Removed the disk by editing the properties of the 2008 VM (shut it down first)
5. Navigated to the VMX (and vmdk) of the template and registered it with vCenter again
6. Mounted the 2003 R2 ISO and started the install
7. When I got to the point where setup asks to format the drive, I told it not to format the drive
8. Finished the installation
9. Once the VM was up and running I confirmed the alignment, and sure enough it was the default 1024 KB that Server 2008 uses
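For a non-system data disk, the diskpart approach mentioned earlier looks roughly like this (the disk number and drive letter are examples; the align value is in KB):

```
rem inside diskpart, on Server 2003 SP1 or later
select disk 1
create partition primary align=64
assign letter=E
exit
```

You can confirm the offset afterwards with “wmic partition get Name, StartingOffset” – an aligned partition shows a StartingOffset that is a multiple of 65536.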
I will create my BartPE ISO again when I get time but for now I know it works and that every 2003 VM that gets deployed from the template will be aligned and thus perform better and stress the SAN less.