On VMware vSphere and driver/firmware issues

Hi,

I have spent the better half of this year planning and preparing to migrate some large databases onto virtual machines running on top of VMware vSphere.

While working through specs and other details I read through loads and loads of forum threads, white papers, guides and anything else I could find on the subject.

In my research I kept finding posts that mentioned issues with drivers and/or firmware on VMware hosts, not specific to any one vendor. Of course this worried me somewhat, so I did some more research.

My conclusion after reading through a lot of blog posts and speaking to multiple experts on VMware ESXi was very simple. Since larger and larger mission-critical systems are being virtualized, we are pushing the hardware a lot harder than we have done before. And when we push the hardware to 70%, 80% or even 100% utilization, flaws that used to stay hidden become far more visible than they were back when systems only used 30-40% of the resources available to the operating system.

Just thought I should write this down… especially since I am watching one of my DB hosts pushing its CPU hard! 🙂

Bgrds,
Finnur

Using LVM to migrate between arrays (and raw device mapped LUNs to VMFS backed ones)

Hi,

Recently I have been working on a project that requires me to migrate a few multi-terabyte databases from physical to virtual machines.

Since we were lucky enough that the LUNs for those databases hosted LVM-backed filesystems, I was able to present the LUNs as RDMs to the VMware virtual machines, create new virtual hard disks and use the magical pvmove command to migrate the data.

The total downtime for each database is around 5-15 minutes, mostly spent presenting the LUNs to the virtual machine, mounting the file systems and then chowning the database files to a new uid/gid. After that is done the databases are started.
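For illustration, the downtime window looks roughly like this (the volume group, mount point and user/group names are made up for the example):

    # present the RDM LUNs to the VM, then activate and mount the volume group
    vgchange -ay vgdb                 # activate the imported volume group
    mount /dev/vgdb/lv_data /data     # mount the LVM-backed filesystem
    chown -R dbuser:dbgroup /data     # remap the files to the new machine's uid/gid
    # ...then start the database and verify it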

Once the database had been verified to work as expected, we created new virtual hard disks, ran pvcreate on them and added them to the volume group we were migrating with vgextend.
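Assuming the new virtual disk shows up as /dev/sdc (an example name) and the volume group is called vgdb, that step is just:

    pvcreate /dev/sdc        # initialize the new VMFS-backed disk as an LVM physical volume
    vgextend vgdb /dev/sdc   # add it to the volume group we are migrating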

After that we just fire up a trusty screen session (or tmux or whatever!) and run the mythical command: pvmove -i 10 -v /dev/oldlun /dev/newlun.
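Something like this, with the device names being examples:

    screen -S pvmove                          # detachable session, so the move survives a dropped SSH connection
    pvmove -i 10 -v /dev/oldlun /dev/newlun   # -i 10 reports progress every 10 seconds, -v is verbose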

When that command finishes we remove the old LUN from the volume group with vgreduce and run pvremove on it. Then we remove the LUN from the virtual machine (you might want to run echo 1 >/sys/block/lunname/device/delete before you do that), unmap the LUN from the ESXi hosts and we are done!
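Spelled out with the same example names as above (sdX standing in for the old LUN's kernel device name):

    vgreduce vgdb /dev/oldlun               # drop the old LUN from the volume group
    pvremove /dev/oldlun                    # wipe the LVM label off the old LUN
    echo 1 > /sys/block/sdX/device/delete   # make the kernel forget the device before removing it
    # ...then remove the RDM from the VM and unmap the LUN from the ESXi hosts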

The biggest reason for us not to use RDMs is that the flexibility we get from native virtual disks pretty much nullifies any performance gains we might get (with emphasis on might) from RDMs (although I have yet to see any performance loss from using VMFS). And when we finally make the jump to vSphere 6.x I can migrate those virtual disks straight to VVols.

The only sad thing in our case is that with this method we are stuck on EXT3, since the file systems were migrated over from old RHEL5 machines. I’m not sure I want to recommend that anyone run an EXT3-to-EXT4 migration on 6-16TB file systems 😀 (at least make sure you have a full backup available before testing this!).

Bgrds,
Finnur

My faith in vendor support has been restored!

Hi,

Recently I had to contact support at two different vendors that we have started using more in the last year or so.

I have had my share of dealing with support at random large enterprise software and hardware vendors. And I have had my share of “uhh… have you turned it off and on again” with a $50,000 server, which I was not so willing to reboot just so a first-level support agent could work through his script (yes, I am an evil customer!).

So, the first case was with a hardware vendor. I had mentally prepared myself for a fight with first-level support. I opened a case, gave them as much detail as I could and went on with my day.
In about 30 minutes someone contacted me (this was not a system-down issue, so I had just opened a “normal” ticket) and gave me access to a site where I could upload the relevant hardware logs. Around 20 minutes later I got a response from a very knowledgeable person who gave me a solution. Case closed in less than two hours.

The other case came up recently while I was installing and configuring new machines to host our databases. I had a question I needed answered so I could finish building out the master image which I would use to duplicate to our new fancy database virtual machines.
My experience with our previous OS vendor for our database servers had been horrible: slow responses, and pretty much every reply was the classic “have you tried to turn it off and on again”. So, a case was opened where I laid out my question. Again, I prepared myself for loads and loads of script-based answers and had pretty much given up any hope of getting a real answer.
Well, to say the least, I got a reply in about two hours asking for some more info, and after providing it I had a well-backed answer from a very knowledgeable person within another 20-30 minutes. This was a support ticket with a very low priority.

My faith in support has been restored. HP and Oracle, keep up the good work!

Bgrds,
Finnur