On VMware vSphere and driver/firmware issues


I have spent the better part of this year planning and preparing to migrate some large databases onto virtual machines running on top of VMware vSphere.

While working through specs and other stuff I read up on loads and loads of forums, white papers, guides and anything else I could find on the subject.

In my research I started to find more and more posts that mentioned issues with drivers and/or firmware on VMware hosts, and not specific to any one vendor. Of course this worried me somewhat. So I did some more research on this.

My conclusion after reading through a lot of blog posts and speaking to multiple experts on VMware ESXi was very simple. Since we are virtualizing larger and larger mission-critical systems, we are pushing the hardware a lot harder than we normally have. And when we push the hardware to 70%, 80% or even 100% utilization, flaws that used to stay hidden become visible in a way they were not back when systems only used 30-40% of the resources available to the operating system.

Just thought I should write this down… especially since I am watching one of my DB hosts push its CPU hard! 🙂


Using LVM to migrate between arrays (and raw device mapped LUNs to VMFS backed ones)


Recently I have been working on a project that requires me to migrate a few multi-terabyte databases from physical machines to virtual ones.

Since we were lucky enough that the LUNs for those databases were hosting LVM-backed file systems, I was able to present the LUNs as RDMs to the VMware virtual machines, create new virtual hard disks and use the magical pvmove command to migrate the data.

The total downtime for each database is around 5-15 minutes, mostly because we have to present the LUNs to the virtual machine, mount the file systems and then chown the database files to a new uid/gid. After that is done the databases are started.

Once the database was verified to work as expected we created new virtual hard disks, ran pvcreate on them and added them to the volume group we were migrating.

After that we just fire up a trusty screen session (or tmux or whatever!) and run the mythical command: pvmove -i 10 -v /dev/oldlun /dev/newlun.

When that command finishes we remove the old LUN from the volume group with vgreduce, run pvremove on it and then remove the LUN from the virtual machine (you might want to run echo 1 >/sys/block/lunname/device/delete before you do that), unmap the LUN from the ESXi hosts and we are done!
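The whole sequence can be sketched roughly like this. The device names (/dev/sdb for the old RDM LUN, /dev/sdc for the new virtual disk) and the volume group name (dbvg) are hypothetical placeholders, and the RUN wrapper just prints each command so the sketch reads as a dry run; drop it to execute for real:

```shell
#!/bin/sh
# Dry-run sketch of the RDM-to-VMFS migration described above.
# /dev/sdb (old RDM LUN), /dev/sdc (new VMFS-backed disk) and
# the volume group "dbvg" are hypothetical placeholders.
RUN() { echo "$@"; }   # print instead of execute; remove to run for real

# 1. Initialize the new virtual disk and add it to the volume group
RUN pvcreate /dev/sdc
RUN vgextend dbvg /dev/sdc

# 2. Move all extents off the old LUN (inside screen/tmux;
#    -i 10 prints progress every 10 seconds)
RUN pvmove -i 10 -v /dev/sdb /dev/sdc

# 3. Retire the old LUN
RUN vgreduce dbvg /dev/sdb
RUN pvremove /dev/sdb

# 4. Tell the kernel to forget the device before unmapping the LUN
#    (when running for real, run this echo directly, not via RUN)
RUN "echo 1 > /sys/block/sdb/device/delete"
```

Since pvmove works online, steps 1-4 all happen while the database is up; only the initial re-presenting of the LUNs needs downtime.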

The biggest reason for us not to use RDMs is that the flexibility we get from native virtual disks pretty much nullifies any performance gains we might get (with emphasis on might) from RDMs (although I have yet to see any performance loss from using VMFS). And when we finally make the jump to vSphere 6.x I can migrate those virtual disks straight to VVols.

The only sad thing in our case is that with this method we are stuck on EXT3, since the file systems are migrated over from old RHEL5 machines. I’m not sure I want to recommend an in-place migration from EXT3 to EXT4 on 6-16TB file systems 😀 (at the very least make sure you have a full backup available before testing this!).
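For the record, the conventional in-place EXT3-to-EXT4 conversion looks roughly like this. The logical volume path is a hypothetical placeholder and the RUN wrapper prints the commands instead of executing them, because I would only ever run this against a full backup:

```shell
#!/bin/sh
# Dry-run sketch of an in-place EXT3-to-EXT4 conversion.
# /dev/dbvg/dbdata is a hypothetical logical volume; a failed
# fsck here can cost you the file system, so back up first.
RUN() { echo "$@"; }   # print instead of execute

RUN umount /dev/dbvg/dbdata
# Enable the EXT4 on-disk features on the existing file system
RUN tune2fs -O extents,uninit_bg,dir_index /dev/dbvg/dbdata
# A full fsck is mandatory after changing features; expect it to
# take a very long time on a 6-16TB file system
RUN e2fsck -fD /dev/dbvg/dbdata
# From here on the file system must be mounted as ext4
RUN mount -t ext4 /dev/dbvg/dbdata /db
```

Note that existing files keep their old EXT3 block mapping; only newly written files get extents, which is one more reason this buys you less than a freshly created EXT4 file system.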


My faith in vendor support has been restored!


Recently I had to use the support of two different vendors we have started using more in the last year or so.

I have had my share of dealing with support at various large enterprise software and hardware vendors. And I have had my share of “uhh… have you turned it on and off again” with a $50,000 server, which I was not so willing to reboot just so a first-level support agent could go through his script (yes, I am an evil customer!).

So, the first case was with a hardware vendor. I had mentally prepared myself for a fight with first-level support. I opened a case, gave them as much detail as I could and went on with my day.
In about 30 minutes someone contacted me (this was not a system-down issue so I just opened a “normal” ticket) and gave me access to a site where I could upload the relevant hardware logs. Around 20 minutes later I got a response from a very knowledgeable person who gave me a solution. Case closed in less than two hours.

Another case I had recently was while installing and configuring new machines to host our databases. I had a question I needed answered so I could finish building out the master image I would use to clone our fancy new database virtual machines.
My experience with our previous OS vendor for our database servers had been horrible: slow responses, and pretty much every ticket got the classic “have you tried to turn it off and on again”. So a case was opened where I laid out my question. Again, I prepared myself for loads and loads of script-based answers and pretty much gave up any hope of getting a real one.
Well, to say the least, I got a reply in about two hours asking for some more info, and after providing it I had a well-backed answer from a very knowledgeable person within another 20-30 minutes. And this was a support ticket with a very low priority.

My faith in support has been restored. HP and Oracle, keep up the good work!


What is the single most important thing Oracle VM is missing?

I have been going through the Oracle VM feature set.

As a virtualization solution it actually looks pretty good. The cost is at the lower end and it seems pretty feature-complete.

But there is one thing missing…one huge feature.

VM snapshot-based backups. I’m betting that if they added an API for taking snapshot-based backups and made a deal with Veeam to support it, quite a lot of system administrators would take a harder look at using Oracle VM for at least some projects (i.e. virtualizing Oracle applications).

From my standpoint this is my biggest issue. Oracle has already gotten servers from multiple vendors certified (see the HCL) and seems to be playing nicely with the larger hardware vendors (IBM/Lenovo, HP, Cisco).

Oracle – your techs might want to take a hard look at this feature – it would actually help you guys gain a larger market share in data center virtualization!

Just my two cents!


3PAR StoreServ 7000 – Peer Persistence links

Howdy all,

I have been testing a 3PAR Peer Persistence setup using two 3PAR StoreServ 7200c, dual interconnected fabrics between sites and a multi-site VMware cluster (although only one node per site).

It works flawlessly!

Being able to take an array offline (disruptively, by removing power to the controller shelf) with the only effect being a ~10 second “delay” for the virtual machines (while Peer Persistence fails the VMFS volumes over) is pretty awesome.

We are still missing VVOL support for replicated volumes (and vMSC) but hopefully it will come later this year.

Here are the most important links on Peer Persistence:
VMware.com (KB article 2055904)
HP’s own Implementing vSphere Metro Storage Cluster using HP 3PAR Peer Persistence

And for an added bonus – HP 3PAR SSMC 2.1 makes Peer Persistence configuration easy as 1-2-3 by Techazine.com

The whole thing takes about 1 hour to configure when you know what you are doing and adding a new volume to the Peer Persistence configuration is a snap.


3PAR StoreServ 7000 – Zoning best practices

Howdy all,

Recently I started working with HP 3PAR StoreServ 7000 series storage. While installing the arrays I went looking for best-practice zoning information and seemed to find conflicting advice. But after reading this document you see that HP recommends either:

  1. One initiator to one target
  2. One initiator to multiple targets

Check out the section called “FC hosts zoning” and things should be pretty clear! 🙂


A whitepaper on IBM’s HyperSwap


Check out this whitepaper on IBM’s HyperSwap. The product is somewhat 1.0 and a little crippled (from my point of view anyway), but a large giant told me that most of the complicated stuff will be fixed in the next version…


HP OneView – Quick overview


If you are looking for some information on HP OneView I recommend that you take a look at this site.

It is amazing how fast the development of OneView is progressing. I started using it back in January when we installed two HP C7000’s and 13 blades. Since then I have met a VP from the group that owns the OneView product inside HP, who was very clear on where HP is taking the product and how much HP is listening to customers regarding features and development.

Now that version 2.0 is about to be released (FAQ here) I am really excited to see the new features, especially the ones around storage (3PAR and the SAN). Having a single pane of glass to manage and monitor your whole infrastructure is great. The only “bad” thing is that OneView is, of course, mostly bound to HP products. It can monitor some third-party switches, but only with a limited feature set.

Now, I guess we will have to wait a few more weeks for 2.0! 🙂


Disclaimer: I do not work for HP or an HP partner. My views here are my own and have nothing to do with my employer!