My experience with Nimble Storage in a POC I did nearly 9 months ago

About this time last year I was part of a project where we were looking at replacing the primary storage systems in our infrastructure.

While we did not end up going with Nimble Storage at the time, we did do a POC of the CS700 arrays (which have now been replaced by the more powerful CS7000).

Since we live on a small island in the north Atlantic, companies do not normally ship out a fully configured solution so when the Nimble reps actually asked me if I wanted to do a POC before I even asked them about availability of such a program I was pretty amazed. And when they were actually willing to ship a fully configured solution (two CS700 arrays) to us without any commitment I was even more amazed. Of course, since Nimble is a small company in the storage world this is probably the only way for them to get the larger companies to give them a chance when competing against the big dogs (EMC, HPE, NetApp).

When we were considering storage vendors we did not even think of Nimble Storage. I actually saw them at a VMUG meeting here in Iceland where they had a presentation about the solution. The presentation was on a Thursday evening and I spent the days after wondering if the claims they laid out were actually true (X IOPS per array with 11 NLSAS disks and 4 SSDs).

I sent the sales guy who was at the presentation a couple of emails over the weekend asking him questions about the solution. He got me in touch with a pre-sales technician who was able to answer my questions quickly and gave me a lot of stuff to think about.

Well, long story short – we ended up testing the CS700 arrays with ~200TB usable space and ~7TB of SSD cache. The technician came on a Tuesday morning at ~09:00, and about 2 hours after we started installing the array in the first datacenter we were actually done installing both arrays in two different sites and could start migrating data to the arrays. The technician went through the basic stuff, but since the interface is so simple (and, well, the solution itself is just amazingly simple as a whole) he left us at ~14:00 if I am not mistaken. Nimble Storage shipped the arrays to us (and back) free of charge.

We moved loads of dev and test databases (and even some production ones later in the POC) onto the system and started having some fun. It was obvious that even though the system only uses NLSAS drives as the backend storage, the Adaptive Flash/CASL secret sauce makes it capable of pushing some extreme IOPS/throughput in comparison to a traditional array. Performance was pretty good. However – since our workload is not only IOPS based but also throughput based we did actually hit a small wall (please note that this was with the older CS generation, they have newer arrays out now that fix this somewhat). We actually have a window in our environment where we push more than ~1.6GB/sec (gigabytes) for quite a while (however, we were just maxing out the FC interfaces in the hosts/arrays – we later found out that we could push ~3GB/s on a faster array/faster network). And while the solution actually powered through things nicely, we were not happy with the max throughput available at the time.

While we were doing the POC I interacted with the support team often, especially when we were looking at the throughput issue. I had access to InfoSight during the POC and the information gathered there was very helpful. First we had a small latency issue – and the Nimble tech was able to point out that the issue was not in the Nimble array but in our VMware hosts. This ended up being a driver issue – so we got that fixed. After that the support technicians went through a lot of data analysis (we had multiple phone meetings with support during this case) and they helped us understand where the bottlenecks we were hitting were and why the system did not perform the way we were expecting.

After finding out that the product wasn’t a fit for us at the time they were understanding and didn’t hold any grudge against us, even though they had spent quite a lot of time trying to get things working. We packed the systems in the packaging they arrived in and got things shipped out.

The saddest part was that while the solution didn’t work out for us, I have never had as pleasant a support experience as I did during this ~30 day POC (and I have sadly had to deal with the support of multiple vendors, and most of the time it can be quite tedious). When we were debugging the host (latency) related issue they never tried to argue that this was not their problem (of course this was a POC and maybe they were just being extra nice, I don’t really know), and I never got the feeling they were trying to push the problem over to anyone else.

Thumbs up for Nimble – I hope they will keep up the good work!

Bgrds,
Finnur

Vendor bashing in the storage space….

I’ve been spending some time reading up on different storage vendors in the last 6-12 months (yes…my personal interests are probably different from yours)…..and it has been horrible to see employees of different companies fighting things out on forums.

Now – most of the time I have been searching for performance-related stuff in customer stories on those forums, and without exception, when I find something interesting there is always an employee of a competitor in the forum thread yelling something bad about the competition. Yes….we get it, your product must be superior and without any bugs.

What bothers me the most is that I have had sales people talking some shit about other vendors directly to me. And often, those sales people have actually sold me or a company where I am working or have been working something that maybe was not the best solution at the time.

Vendors/Resellers: Can we please leave the bashing at home? I have no interest in buying a product from someone that spends his day talking crap about a competing product. It really just tells me that I should stay away from doing business with you!

If someone has actually decided to buy an IBM storage solution instead of something else because a vendor rep called NetApp “NetCrapp”, one might wonder if that decision was actually made on the correct terms…….

Laters!
Finnur

HPE OneView 2.0

Hi!

Recently we upgraded from OneView 1.20 to OneView 2.0. The 2.0 version has been out for a while but we didn’t upgrade right away. Luckily though – because there were some serious issues with the upgrade from 1.x to 2.0 with the first release. Earlier this year they released an updated version, 2.00.07, that fixed those issues so one can safely upgrade 🙂

The biggest feature I was waiting for was the ability to move a server profile from one hardware type to another. For example: an HPE BL460c G9 with one mezzanine adapter is one type and a BL460c G9 with two mezzanine cards is another one. Before 2.0, a server profile created for one type of hardware could not be reused for another. Imagine how surprised I was when I was adding an HBA to a few blades and I had to recreate those profiles from scratch! But now – this is as easy as 1-2-3 since you just move the server profile to the new hardware type and start your server on either the same blade with the added HBA or move it to another generation or type of blade. A small (and dare I say – a little bit weird) issue, but fixing it makes life just a little bit easier when adding new hardware to your blades (managed by HPE OneView).

Server Profile templates are also brilliant – you simply create a VMware host template, then create server profiles based off that template. If you need to make a change to the BIOS settings on all your VMware hosts, you can do so on the template and the changes should be pushed down to all your servers.

Another feature I like a lot is the addition of Smart Update Tools, which gives you the ability to upgrade drivers as well as firmware from OneView. Previously you had to shut down the machine, change the FW baseline to a newer version, wait for OneView to start up HPE SUM and finish its business, and then go into the OS and upgrade the drivers manually (or by running SUM locally on the OS). I have not tested it yet but I am planning to do so later on. Cool feature at least.

So far – cannot complain! 🙂

Bgrds,
Finnur

Migrating Solaris 10 to zones hosted on Solaris 11 – How I learned to (somewhat) like Solaris!

Hi,

NOTE: There is not much technical info here!

Anyone who has ever worked with me has probably gotten a pretty good idea about how I somewhat dislike most proprietary UNIX systems. Sure, it probably has most to do with the fact that I am very fond of Linux (which was the first *NIX system I ever played around with). Although I have never pretended to be a UNIX expert, I have had to learn far more than I ever thought I would about HP-UX, AIX and Solaris. Yes – I am the guy who wants to migrate everything to Linux (except when the other platform is the better tool for the job).

However, I often end up being one of the few guys who has any experience with any kind of *NIX system, so if there is an old UNIX machine around it will probably end up in my hands at some point.

Now – I have an application in a dev/test/prod environment, running on a couple of Sun M4000’s and a single T4-1 machine, that is one of those cases. This application is expected to live for years to come and the machines were showing their age. After spending a night replacing the motherboard in the prod machine with the one from the test machine last winter I now had a good case for a hardware replacement (not that we didn’t really have a good case before – it just made things move a little bit higher on the priority list). And after a couple of meetings we decided to go for a couple of T7-1 machines….and to get a contractor to help us out with the migration.

The T7-1’s are probably overkill for what we had to do but since we needed SPARC machines it was the best option – and I was pretty amazed after receiving the quote for the hardware. The pricing was far less than I thought – even with 24×7 3 Year hardware and software support. And yes….they actually weigh a lot less than those damn M4000s!

We finally got to work and the contractor set up a plan for us. The machines would be installed with Solaris 11 (Oracle VM Server for SPARC really, I guess), and run three global zones (LDOMs) – one for each instance of the application. We would then migrate each Solaris 10 install into a zone on its global zone.

This was the first time I did any real work on Solaris 11, and I had someone helping me out who was more than capable with Solaris. Long story short, with the help of our contractor we quickly had the three LDOMs ready for action. The contractor showed me some ldm magic (ahhh, hello ldm migrate!) and I have got to give it to Oracle that they have done wonders with Logical Domains and migration. We played around with the vHBA stuff but it seems it is still a bit buggy so we mapped the disks through the ldm interface instead. The guys at Oracle might want to fix that though – it would make disk management very easy. If I remember correctly, IBM has already mastered this with the VIOS (Virtual IO Server).

But – I cannot praise the migration process highly enough. Solaris 10 zones on Solaris 11 are pretty darn cool. A native Solaris tool was used to create a backup (flar) of the source operating system installation. It is then restored into a Solaris 10 zone hosted on a Solaris 11 global zone. Data is then migrated (we used dd for the raw devices, and for the rest we either copied the data over NFS or took a storage snapshot on our SAN and copied the data with the usual tools between the LUNs, from the UFS filesystem over to a new ZFS one).
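As a rough illustration of the flar-and-restore step, here is a minimal dry-run sketch. It assumes the "native Solaris tool" was flarcreate and that the zone is built from the SYSsolaris10 template; the zone name, archive path and zonepath are made-up placeholders, and the flags are from my reading of the Oracle Solaris 10 Zones documentation, so check them against the man pages on your own boxes. By default it only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch of a Solaris 10 -> Solaris 10 branded zone migration.
# All names and paths passed in are hypothetical placeholders.
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Run on the Solaris 10 source host: archive the installed system as a flar.
# usage: archive_source ARCHIVE_NAME /path/to/output.flar
archive_source() {
    run flarcreate -S -n "$1" -L cpio "$2"
}

# Run on the Solaris 11 global zone: configure, install from the flar, boot.
# usage: restore_into_zone ZONE /path/to/archive.flar /zonepath
restore_into_zone() {
    zone="$1"; flar="$2"; zonepath="$3"
    # In dry-run mode the quoting around the zonecfg subcommands is not shown.
    run zonecfg -z "$zone" "create -t SYSsolaris10; set zonepath=$zonepath" &&
    run zoneadm -z "$zone" install -p -a "$flar" &&
    run zoneadm -z "$zone" boot
}
```

Calling archive_source s10-app /net/nfs/s10-app.flar on the old box and restore_into_zone s10zone /net/nfs/s10-app.flar /zones/s10zone on the new one prints the plan; set DRY_RUN=0 only once the printed commands look right for your setup.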

After fixing up some symlinks and some permissions we were able to start the application in about 2 hours. Rinse, repeat for each instance. However, I must admit that we had some issues with the first try of migrating the test system (which was the first system we tried to migrate from an M4000, the dev was on the T4-1) so we had to give it another try.

With the migration done, we have had the application running on the new hardware for about a month now and man, those beasts fly! I have done some reading on Solaris 11 and working with it ain’t all that bad. IPS makes installing and patching packages a breeze. (Man…I wish we had IPS on Solaris 10, since we still have to manually update those Solaris 10 zones!)

The point of this post (moral of the story?): I thought I would never say this – but I have actually learned to like Solaris 11 somewhat. It has come a long way since Solaris 10, and I really think both Solaris and SPARC are going to be here for a long time to come…..at least in the enterprise.

……(and this fondness of Solaris has probably something to do with the fact I was actually working with someone who actually knew what they were doing :-))

Bye!
Finnur

Interesting blog post on Oracle hangs (and hanganalyze/systemstate)

Hi,

Saw a colleague post this blog post earlier which gives some pointers on how to debug hangs in Oracle databases. At least it gets you looking in the right direction!

Bgrds,
Finnur

STOR2RRD to the rescue once again!

Hi,

A few weeks ago we decided to upgrade the firmware on a couple of storage systems. Everything seemed to go as planned but after the upgrade we started to notice some latency issues.

I won’t go into detail about what was wrong (hint: HDD firmware can cause some serious issues!) – but this tool, STOR2RRD, has saved us loads and loads of time – and is as simple as any storage monitoring tool can be. It ain’t perfect but it pointed us in the right direction and helped with getting this specific issue solved.

Cannot recommend this tool enough (and it is free and open source licensed under the GPL v3!).

Bgrds,
Finnur

A quick way to back up your LDM config on SPARC hardware

Hi,

A quick and dirty way to back up the LDM configuration of each domain into a separate XML file:
DATE=`date +%Y%m%d`; for i in `ldm list-domain|grep active|awk '{ print $1 }'`;do ldm list-constraints -x $i>$i-`hostname`-$DATE.xml;done

NOTE: This worked on a Solaris 11.3 box….YMMV!
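If you want something slightly less dirty, the same idea can be wrapped in a function. This is only a sketch: it assumes STATE is the second column of the ldm list-domain output (so that "inactive" domains are not picked up by accident, which a plain grep for "active" would do), and the function name and output directory argument are my own invention.

```shell
#!/bin/sh
# backup_ldm_configs [OUTDIR]
# Writes one constraints XML file per active domain into OUTDIR (default: .).
# Assumes "ldm list-domain" prints NAME in column 1 and STATE in column 2.
backup_ldm_configs() {
    outdir="${1:-.}"
    stamp=$(date +%Y%m%d-%H%M%S)
    host=$(uname -n)    # same value as `hostname` on most systems
    for dom in $(ldm list-domain | awk '$2 == "active" { print $1 }'); do
        # Overwrite rather than append, so each file holds a single XML document.
        ldm list-constraints -x "$dom" > "$outdir/$dom-$host-$stamp.xml" || return 1
    done
}
```

Run it as backup_ldm_configs /var/tmp/ldm-backup (any existing directory will do) from the control domain.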

Bgrds,
Finnur

Check out this deep dive with Frank Denneman on NUMA!

Hi,

Just a quick shoutout to Frank Denneman and his deep dive articles on NUMA. A must-read for everyone working with servers!

Bgrds,
Finnur

azure-cli and Azure DNS!

Hi!

I have been looking at using Azure DNS for my domains for a couple of weeks. So, yesterday morning when my SO was still sleeping I went in for the kill and researched options on how I could import BIND zone files without having access to a machine with PowerShell.

Hello azure-cli! A magical little tool that runs on pretty much all platforms capable of running node.js. My interest in Azure just skyrocketed after finding out MS actually has spent some time working on this brilliant little tool. A quick “npm -g install azure-cli” from my MacOSX laptop got me going.

But back to the Azure DNS magic.

First – make sure you have a copy of your BIND zone file.
Next, switch to azure-cli ARM config mode: azure config mode arm
Next, create the domain in Azure DNS: azure network dns zone create myresourcegroup myawesomedomain.com
Next, import the BIND zone file: azure network dns zone import myresourcegroup myawesomedomain.com myawesomedomain.com.txt (where myawesomedomain.com.txt is my BIND zone file)

Now, check out what NS servers you were assigned with the following command:
azure network dns record-set show myresourcegroup "myawesomedomain.com" "@" NS
Then go to your domain registrar control panel and point your domain to the Azure DNS servers listed by the above command.
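If you have more than one domain to move, the steps above can be strung together in a small function. This is just a sketch using the same azure-cli commands as above; the function name and the AZURE variable are my own, and by default it only echoes the commands instead of running them.

```shell
#!/bin/sh
# Dry run by default: AZURE="echo azure" prints each command instead of running it.
# Set AZURE=azure to actually call azure-cli.
AZURE="${AZURE:-echo azure}"

# usage: import_zone RESOURCE_GROUP ZONE_NAME ZONE_FILE
import_zone() {
    rg="$1"; zone="$2"; zonefile="$3"
    $AZURE config mode arm &&
    $AZURE network dns zone create "$rg" "$zone" &&
    $AZURE network dns zone import "$rg" "$zone" "$zonefile" &&
    # Show the NS records so they can be entered at the registrar.
    $AZURE network dns record-set show "$rg" "$zone" "@" NS
}
```

For example: import_zone myresourcegroup myawesomedomain.com myawesomedomain.com.txt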

You can get more info from this MS documentation site.

And….if you have a huge domain estate then you might be interested in knowing that the Men&Mice Suite actually supports managing DNS records in Azure DNS! 🙂

Bgrds,
Finnur

All Flash…and other SAN stuff

Hi all,

This year I have (among other things) been working on migrations from older storage arrays to some newer all flash arrays. Man, those babies scream!

It was awesome to see the latency drop from 5-10ms+ to under 1ms….at times less than 0.4ms. You also start to see the flaws of older filesystems (read: ext3).

In the process we also moved from 8Gbit FC to 16Gbit FC to be able to push more bandwidth as well as IOPS.

Some unforeseen problems (which everyone should also look out for:))…….It can often be very easy to saturate the back-end bandwidth on your arrays if everything is doing 16Gbit….and you run backup jobs on top of some huge batch processes plus your normal workload 🙂
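A back-of-the-envelope way to see how quickly this happens is to compare what the host ports can ask for with what the array back-end can deliver. The sketch below uses the nominal FC figures (roughly 800 MB/s per 8Gbit port and 1600 MB/s per 16Gbit port, per direction); the host count and the back-end number in the example are invented, so plug in your own.

```shell
#!/bin/sh
# Nominal per-port, per-direction FC throughput in MB/s (8 -> 800, 16 -> 1600).
fc_port_mbps() {
    echo $(( $1 * 100 ))
}

# usage: host_demand_mbps HOSTS PORTS_PER_HOST LINK_GBIT
# Worst case: every host port pushing line rate at the same time.
host_demand_mbps() {
    echo $(( $1 * $2 * $(fc_port_mbps "$3") ))
}

# usage: oversub_pct DEMAND_MBPS BACKEND_MBPS
# Host demand as a percentage of back-end bandwidth (integer division).
oversub_pct() {
    echo $(( $1 * 100 / $2 ))
}
```

With 4 hosts, 2 ports each at 16Gbit, host_demand_mbps 4 2 16 gives 12800 MB/s; against a hypothetical 6000 MB/s back-end, oversub_pct 12800 6000 gives 213, i.e. the hosts can ask for about twice what the array can serve.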

Just some words of advice – when you move onto faster equipment you often move problems from one place to another!

Bgrds,
Finnur