NetApp’s ‘Shining’ Moment – its Capacity Guarantee Program - Around the Storage Block Blog -
By Jim Haberkorn

NetApp has a huge usable capacity issue in many environments that it tries desperately to hide but at the same time seems driven to confess as if subconsciously trying to purge some unresolved guilt.  As further proof of this I submit NetApp's 50% capacity guarantee program for VMware environments.  http://media.netapp.com/documents/virtualization-guarantee-faq.pdf

If I had to guess, there is a group within NetApp that writes their technical white papers and there is another group that does their marketing and responds to blogs - and these two groups hate each other and are constantly trying to get each other in trouble.  But what is working in NetApp's favor is that I think most people are like me and can't read every word of every document that crosses their desk, so these NetApp contradictions go mainly unchallenged.  But, in NetApp's case, when I do get around to thoroughly reading their marketing papers I sometimes feel like Shelley Duvall in The Shining when she finally gets a peek at Jack Nicholson's book.

I'll give you an example: my colleague Craig Simpson wrote a blog correctly pointing out that the NetApp 50% guarantee stipulated that the comparison configuration had to use RAID-10 and the NetApp one had to use RAID-DP with a 14+2 RAID stripe, and that this single stipulation accounted for 43% out of the 50% capacity guarantee: for the same usable capacity, RAID-10 needs two raw disks for every data disk, while a 14+2 RAID-DP stripe needs only 16 for every 14.  I thought that point extremely significant but I didn't notice too many other bloggers on the subject picking it up.  In fact, I noticed one blogger correcting Craig and saying he must have meant RAID-5 and not RAID-10 - clearly that blogger hadn't read the fine print in the NetApp guarantee.  Which reinforces my belief that most people just don't have the time to read these NetApp papers with any degree of detail.

So, back to NetApp.  What really was being guaranteed in their program?  When you strip away the RAID red herring, the real guarantee was that NetApp's dedupe technology would save the customer the remaining 7% on an extremely dedupe-friendly dataset in a VMware environment (see the list of stipulations on page 2 of the program paper).
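For readers who want to check the RAID arithmetic themselves, here is a minimal sketch in Python. It assumes equal-size disks and the same usable capacity on both sides, and it ignores spares and any file system overhead; the function and variable names are mine, not anything from the NetApp program paper.

```python
def raw_disks_per_usable_disk(data_disks: int, redundancy_disks: int) -> float:
    """Raw disks needed for each disk's worth of usable capacity."""
    return (data_disks + redundancy_disks) / data_disks


# RAID-10: every data disk has one mirror, so 2 raw disks per usable disk.
raid10 = raw_disks_per_usable_disk(1, 1)

# RAID-DP with a 14+2 stripe: 16 raw disks for every 14 usable disks.
raid_dp = raw_disks_per_usable_disk(14, 2)

# Share of the RAID-10 raw capacity saved purely by the change in RAID layout.
raid_saving = 1 - raid_dp / raid10

# What is left of the 50% guarantee, in percentage points of RAID-10 raw capacity.
left_for_dedupe = 0.50 - raid_saving

print(f"Saving from RAID layout alone: {raid_saving:.1%}")      # 42.9%
print(f"Left for dedupe to deliver:    {left_for_dedupe:.1%}")  # 7.1%
```

Rounded, those are the 43% and 7% figures quoted above.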

And then the next obvious question is:  could this NetApp program really be just another subconscious confession, another desperate plea to be caught?  In other words, does this program have anything to say about NetApp usable capacity?  I think it does.

Consider this:  NetApp advertises a dedupe efficiency of 70% for typical VMware environments (http://www.stemmer.de/service/workshops/sbb2008sep/download/netapp.pdf; page 6).  Now, I don't want to be accused of piling on here, so even though the program restrictions appear to me to lift the dataset out of the range of 'typical' into 'extremely dedupe friendly', let's stick with the 70% number.

If NetApp can only guarantee a 7% capacity savings but in fact has deduped the volume down to 30% of its original size, I would say that puts the underlying NetApp usable capacity efficiency in the range of, well...I'll let someone else do the math.  Actually, someone else has done the math - NetApp.  See page 10 of this NetApp white paper: http://media.netapp.com/documents/NetAppFAS3170ESRPStorageSolution.pdf  and pages 24-25 of this one: http://media.netapp.com/documents/tr-3431.pdf  and the note at the top of page 11 of this IBM Redbook (note: IBM OEMs NetApp filers - and my thanks to the contributor on an EMC blog who pointed out this link): http://www.redbooks.ibm.com/redpapers/pdfs/redp4287.pdf

These papers all have something to say about NetApp usable capacity in three different environments, though not for VMware.  But based on NetApp's also limiting its capacity guarantee to only 7% for a deduped VMware configuration, I think you can justifiably conclude that NetApp's usable capacity woes span all sorts of applications.  And here's the final kicker:  to qualify for the program you have to first buy NetApp's Implementation and Deployment services and NetApp VMware Implementation Services (page 2).  So NetApp gets the extra professional services revenue from every customer who wants to try out the guarantee and then only pays out the extra disks to those customers who can't get an overall 7% improvement on their deduped NetApp VMware environments - environments that NetApp advertises as typically deduped by 70%.  Where is Jack Nicholson when you need him?

Jim Haberkorn


Posted 12-09-2008 3:40 PM by CalvinZ

Comments

Alex McDonald wrote re: NetApp’s ‘Shining’ Moment – its Capacity Guarantee Program
on 12-09-2008 9:02 PM

I laughed like a drain!

And then I got a little more than annoyed, because you're basically at it, and you can't keep your story straight from one post to the next.

I'm calling you out on the papers you quote, which all refer to snapshot reserve. Your previous post you said "Notice that I did not bring up snap reserve space since I don’t consider that wasted space." Hmm. Cheap trick.

And then I ask myself, where does HP keep its snapshot blocks, I wonder. If you were using snapshots on HP kit, how big would you recommend your snapshot reserve to be? We'll do whatever you do -- for the same effect.

And now, Father Jim, I've got that confession you want me to make. We guarantee a 50% saving for VMware on RAID-10, because RAID-5 just isn't good enough to protect hundreds to thousands of VM images. And I am sorry, you'll have to buy some services. Once. Unlike HP's dumb as a box of rocks offering, where you just have to keep buying disks.

Because you don't have dedupe to save space, dual parity RAID protection, snapshots and clones that don't hit performance and, more to the point, a guarantee for any %age at all.

How much space do a couple of thousand server images take on an EVA? I've a calculator if you run out of fingers and toes. www.dedupecalc.com

Chuck Hollis wrote re: NetApp’s ‘Shining’ Moment – its Capacity Guarantee Program
on 12-09-2008 10:37 PM

Hi Jim

This is one area where EMC and HP can agree -- customers deserve better than these sorts of marketing gimmicks.  I took them to task the very day it came out.  

The NetApp bloggers didn't have much to say about my analysis, since -- well -- I was right.

You missed a key aspect -- all the exclusions regarding the data that's not applicable.  No Exchange.  No database of any sort.  Nothing that's already compressed, like PowerPoint or PDFs.

As I looked through those exclusions, I had a really tough time coming up with a "what's left?" target.

Not to mention all the other arcane configuration exclusions and restrictions you'll find in their technical white paper.

The customers I've talked to laugh and say "what else would you expect from them?" and perhaps we're all taking this too seriously.  

I think both HP and EMC take the customer relationship very seriously.  

We're both shocked when another vendor doesn't.

Cheers!

-- Chuck

Cleanur wrote re: NetApp’s ‘Shining’ Moment – its Capacity Guarantee Program
on 12-10-2008 1:40 PM

Talking of dedupe, you know people in glass houses shouldn't throw stones. Witness the catastrophe that could have ensued on those many thousands of deduped images across many thousands of filers.

thezendiary.blogspot.com/.../vmware-and-nfs-on-netapp-filers.html

vmwaretips.com/.../nfs-datastores-and-what-was-their-big-issue

Being at the bleeding edge as an administrator is never a good policy, nor is sacrificing availability for performance and time to market.

Alex McDonald wrote re: NetApp’s ‘Shining’ Moment – its Capacity Guarantee Program
on 12-10-2008 6:06 PM

@cleanur; this was related to a VMware bug (VMware SR195302591, fixed by patch ESX350-200808401-BG) for VMs on NFS only, which caused a freeze of the VM when taking a VMsnap (not a NetApp snapshot). There was at one point a workaround to turn locking off, but a VM could be incorrectly re-started on another node in a cluster if connectivity was lost to the service console.  This opened a window of opportunity for data corruption to that VM (and only that VM). On August 4th, customers were advised that locking should be enabled at all times. kb.vmware.com/.../search.do

This was related to NFS VMsnaps and the workaround, and was unrelated to NetApp snapshots or deduplication.

We deduplicate identical blocks; if a block is updated by one VM, it is "unduplicated" and the other VMs retain the original deduped block. I fail to see how dedupe comes into this at all.

Jim Haberkorn wrote re: NetApp’s ‘Shining’ Moment – its Capacity Guarantee Program
on 12-10-2008 6:31 PM

Thanks for your comments, Alex, my son.  

Laughter and confession are both good for the soul.  Now all you have to do is disband your 50% guarantee program, take the Mercer/Wyman paper off your website, give your tech writers a pay raise for telling the truth, and then go and sin no more.   Father Jim  

P.S.  Here’s your space reservation formula (at the very bottom) cut and pasted from page 25 of the second NetApp white paper I linked to in my original blog above.  Notice I didn’t calculate in the 15% for growth either.  I don’t like to pile on.  But if you think I was playing a cheap trick, then by all means add in the snapshot space if you think it helps your cause.  Notice that each of the snaps (called backups in the formula) has its own separate 10% space reservation.

P.P.S.  Thanks for clearing up the issue about why RAID-5 was excluded from the guarantee.  All along I thought it was because NetApp was trying to stack the deck in its favor.  I see - it’s because of the RAID-DP data protection advantage.  So NetApp then will honor the 50% guarantee vs. a RAID-6 configuration?  Right?  No.  But what’s the reason now. Let’s pick one:

  • RAID-6 isn’t used enough
  • Isn’t fast enough
  • Isn’t NetApp enough

-----------------------------------------------------------------------------

Database Volume Sizing

Formula:

Minimum database volume size = ( ( 2 * LUN size ) + ( number of online backups * data change percentage * max database size ) )

Example:

- Using the database LUN size of 611GB + 15% for growth

611GB + 15% = 703GB

- 10% data change between backups

- 7 backups kept online

Minimum database volume size = ( ( 2 * 703 ) + ( 7 * 10% * 611GB ) ) = 1,834GB
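The sizing formula quoted above is easy to reproduce. The following is only an illustrative Python sketch of that formula as it appears in the excerpt; the function and parameter names are my own, and note that the worked example applies the change rate to the 611GB database size rather than to the grown 703GB LUN.

```python
def min_database_volume_gb(lun_size_gb: float, online_backups: int,
                           change_rate: float, max_db_size_gb: float) -> float:
    """Minimum volume size per the quoted formula:
    (2 * LUN size) + (online backups * data change rate * max database size)."""
    return 2 * lun_size_gb + online_backups * change_rate * max_db_size_gb


# 611GB database LUN plus 15% growth, rounded to whole GB as in the example.
lun_with_growth_gb = round(611 * 1.15)

# 7 online backups, 10% data change between backups, 611GB database.
volume_gb = min_database_volume_gb(lun_with_growth_gb, 7, 0.10, 611)

print(f"LUN size with growth: {lun_with_growth_gb}GB")   # 703GB
print(f"Minimum volume size:  {round(volume_gb):,}GB")   # 1,834GB
```

In other words, the volume is sized at roughly three times the 611GB database before any RAID or spare overhead is counted.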

Cleanur wrote re: NetApp’s ‘Shining’ Moment – its Capacity Guarantee Program
on 12-11-2008 12:16 AM

Alex,

"This was related to NFS VMsnaps and the workaround, and was unrelated to NetApp snapshots or deduplication."

I never said it was. The point I was making was that bleeding edge using the latest supported features can be a heady combination. But it can also become a very uncomfortable place very quickly, especially for the Admin who's effectively underwriting the solution and carrying the can for recovery.

The root cause of the corruption is not actually fixed by the patch you mention; the patch appears merely to fix a performance problem with VMware snap deletion on NFS. The main cause of corruption appears to be a workaround for the performance issue noted on NFS when deleting snaps. The workaround published by yourselves in an earlier version of the TR-3428 best practice documentation used NFS.LockDisable=1 to disable the locking mechanisms, allowing corruption to occur under specific circumstances. The supplied patch does not resolve this issue alone; it also requires the previous NetApp best practice NFS.LockDisable=1 to be reversed to NFS.LockDisable=0 in order for the issue to be fully resolved.

vmwaretips.com/.../nfs-datastores-and-what-was-their-big-issue

"So…why did NetApp initially recommend disabling NFS Locks? Well, in their initial testing the removal of VMware Snapshots took a long time over the NFS client (has to do with VMDK quiescing, lock removals, appending, etc… the one area NFS lacks because it is not a block level protocol). Their fix for the best practice guide was to remove NFS locks, but it sounds now like they didn’t do enough research before recommending this (and neither did I before implementing)."

I'm not placing this solely at anyone's door; it looks like a number of factors were involved and it could potentially have affected many more vendors had they made the same recommendation. Besides, it looks like the dust has just about settled now. I was merely trying to make the point that just because something's feasible, it's not always immediately desirable.

blogs.netapp.com/.../vmware-over-net.html

blogs.netapp.com/.../the-world-is-ta.html

To NetApp's credit, all of those snaps probably saved a few red faces during the recovery operations.

You got me on the dedupe, I must have been having a blond moment.

Cheers

Jim Haberkorn wrote re: NetApp’s ‘Shining’ Moment – its Capacity Guarantee Program
on 12-11-2008 4:03 PM

Thanks Chuck for your comments.  Yeah, being right in these blog discussions is always the best defense.  And a thick skin doesn’t hurt either.  Jim Haberkorn

Jim Haberkorn wrote re: NetApp’s ‘Shining’ Moment – its Capacity Guarantee Program
on 12-11-2008 5:39 PM

Thanks Cleanur for contributing to the blog.  My hope for this blog has always been that it would be constructive for readers. I appreciate the thoroughness of your comments.   And, you too, Alex, thanks for your constructive response to Cleanur.  Jim

Calvin Zito wrote re: NetApp’s ‘Shining’ Moment – its Capacity Guarantee Program
on 12-11-2008 10:46 PM

Hi everyone -

Jim had a comment that was a bit long so I decided (that's what I get to do as the owner of this blog) to use it as a follow-up post.  You can find it at this URL: www.communities.hp.com/.../netapp-s-shining-moment-its-capacity-guarantee-program-follow-up.aspx

Thanks,

Calvin

Andy Steele wrote re: NetApp’s ‘Shining’ Moment – its Capacity Guarantee Program
on 12-16-2008 2:05 AM

I had a client tell me that the NetApp A-SIS dedupe feature puts the saved space into a sort of 'reservation' state which, although freed up, should not be used.  So although space is available, the claim was that it should not be reused.  Slam me for rumour mongering if you must, I'm just looking for validation either way.

Mike Riley wrote re: NetApp’s ‘Shining’ Moment – its Capacity Guarantee Program
on 02-16-2009 10:34 PM

Andy,

I don't consider that rumor mongering.  It's an honest question.  You will see competitive solutions with this type of approach, namely a reservation reserve area where data will be moved before it is deduped.  (Defeats the purpose of dedupe IMHO if you have to allocate an additional reserve area that can't be used for anything else).

There is no such thing as an a-sis reserve or "reservation state" with NetApp.  If you dedupe on NetApp (SAN or NAS) the space can absolutely be re-used.  Heck, it's encouraged.  Dedupe on primary storage has been a huge catalyst for success in areas such as VMware because customers can boil off massive amounts of redundant data (e.g. think about the "free space" on the C: drive of a multitude of VMs) and then use that re-claimed space to deploy additional VMs.  NetApp leverages the same block-sharing technique for a non-duplication approach.  For more on that, check out the FlexClone feature demonstrated in this YouTube clip: www.youtube.com/watch.

Jim Haberkorn wrote re: NetApp’s ‘Shining’ Moment – its Capacity Guarantee Program
on 02-23-2009 4:26 PM

Hi Mike,

Thanks for taking the time to respond to Andy's question. For any readers intrigued by your recommendation to use NetApp dedupe, I would refer them to my opening comments in this blog regarding NetApp's dedupe/capacity guarantee program. Also, if any customer is interested in proving or disproving NetApp's dedupe performance claims, a simple IOmeter test should do the trick. Cheers, Jim

Thomas wrote re: NetApp’s ‘Shining’ Moment – its Capacity Guarantee Program
on 03-25-2009 10:43 AM

As a customer who has to decide which system to use for the next - let's say - 5 years... which system should I use?

I know that you cannot answer the question here in this blog. Overall, I feel a bit helpless. I read about "best practices", "available free space", "dedupe just working for 70%" and all this stuff...

So... what exactly is the difference between an HP EVA (e.g. 4400) and a NetApp solution? Is it true that NetApp is the only vendor that supports direct NFS access? What would be a "killer application" for each product?

Thank you for answers.

Regards,

Thomas

Jim Haberkorn wrote re: NetApp’s ‘Shining’ Moment – its Capacity Guarantee Program
on 04-07-2009 7:20 AM

Hi Thomas,

Thanks for your questions. Sorry for the delayed response - there hadn't been any traffic on this blog for over a month and I stopped checking it.

It’s impossible to advise you on which HP system would be best for you to buy without knowing more details about your environment. However, I can give you a short answer on what is the difference between a NetApp filer and an EVA: A NetApp solution is two servers each with the approximate compute power of a DL585, joined together in a very primitive active-passive cluster, with software RAID, and a built-in proprietary operating system that imposes a huge usable capacity tax in many environments and is so inflexible that even LUNs have to be treated as files.

While the EVA, on the other hand, is exactly what it claims to be: It is a bona fide mid-range disk array with state of the art block technology and the easiest management in its class.

And thanks again for writing, you actually gave me the inspiration for my next blog. I’m going on holiday, but in a couple of weeks, look for a blog on why I refer to a NetApp cluster as an ‘active-passive’ cluster - in direct contradiction to NetApp's claims of having an 'active-active' cluster.

Best regards,

Jim    

John Martin wrote re: NetApp’s ‘Shining’ Moment – its Capacity Guarantee Program
on 04-07-2009 11:50 AM

Thomas,

Speaking as a NetApp employee, I would clearly counsel you to purchase NetApp, but as this is an HP blog, that would be really poor form.

If you really want to get this clarified, get an HP SE and a NetApp SE (not necessarily in the same room at the same time, though that might be interesting), tell them your current and projected requirements, and let them give you current and projected costs and benefits for their offerings. I'm confident that both vendors will give you a good solution, but if you don't at least look at the NetApp offering in detail, you will be doing yourself a disservice.

NetApp is not the only company with a VMware certified NFS solution, though NetApp almost certainly has more experience than any other vendor. To be absolutely sure, check the VMware compatibility guide for what is supported from which vendor.

Regards

John Martin

Jim Haberkorn wrote re: NetApp’s ‘Shining’ Moment – its Capacity Guarantee Program
on 04-07-2009 2:35 PM

Hi John,

Thanks for your response - a civil and gracious answer - and frankly, I don't see too many of those on blogs these days. I have already made a few comments on NetApp COO in previous blogs, so I'll let your point about comparing the two products rest for now. Have a good day.

Jim

Matt wrote re: NetApp’s ‘Shining’ Moment – its Capacity Guarantee Program
on 04-09-2009 1:48 PM

What a misleading blog post.

We're a SME that implemented a small NetApp SAN last year and we couldn't be happier. It has a host of fantastic features but de-dupe is right at the /top/ of the list. I should add at the outset that we have absolutely nothing to gain by promoting NetApp; we're just a very satisfied customer.

In our environment we're achieving a de-duplication rate of over 70% across all data (a typical mix of VMware, user and app data) - that's more than a tripling of usable space.

The couple of other NetApp customers I know show similar savings using de-dupe, and most (unpromoted) testimonials on the internet demonstrate similar results.

Even if you take into account the disk space lost to the various parity and protection mechanisms NetApp implement (which I assume is your "huge usable capacity problem", and are all aimed at protecting your data I might add), we're still achieving over 50% space saving compared to storing the same data on any other type of storage I can think of.

No comparisons-to-RAID10 trickery, no but-your-dataset-is-special arguments, just a plain and simple space saving of over 50%, and the $$ saved by not having to buy extra disks.

Calvin Zito wrote re: NetApp’s ‘Shining’ Moment – its Capacity Guarantee Program
on 04-09-2009 5:46 PM

Hi Matt,

We always appreciate the perspective of end users.  Jim (our blogger who wrote this post) is on vacation through early next week.  Check back the middle of next week as I'm sure Jim will reply to you then.  Thanks, Calvin

Jim Haberkorn wrote re: NetApp’s ‘Shining’ Moment – its Capacity Guarantee Program
on 04-14-2009 10:16 AM

Hi Matt,

Thanks for your comments. First: I would like to know what you found misleading about this blog in regards to the NetApp capacity guarantee program. Was I wrong about all the caveats? Was my RAID-comparison math wrong? If I am in error about something, I would be pleased to correct it.  I think, though, if I had made a mistake, a NetApp employee would have jumped in and pointed it out by now. But, in any case, please let me know.

Second, as an aside: the argument you use below in justifying and accepting the ~20% lost space because "it is aimed at protecting your data" is not the first time I've heard that argument. I heard it the first time from a NetApp storage architect who is now working for HP, and who was relating to me how some NetApp sales reps tried to explain away their usable capacity problems. When you think about it, though, isn't that argument a little strange? The NetApp space reservation policies are indeed there for a very necessary purpose, and though NetApp will refer to them as 'best practices' they are much more serious than that. They are necessary to compensate for a flaw in the NetApp technology - a flaw no other array has, and one that could result in the loss of data. So, in a sense, it is a tax on the customer, for a design flaw that NetApp cannot remove from their operating system.

At no place in my blog will you ever catch me saying that every NetApp customer is unhappy or that every NetApp customer sees every NetApp problem. What I do say is that with the NetApp technology there is a very complicated relationship between capacity, performance, and time, and that NetApp customers rarely know ahead of time if, when, and how this complicated balancing act will hit them. If you are one of the lucky ones that is doing well, then I am happy for you. If that should change for you at any time in the future, then we would consider it a privilege if you would consider HP for your next storage purchase.  

Best regards,

Jim

Neal Wingenbach wrote re: NetApp’s ‘Shining’ Moment – its Capacity Guarantee Program
on 04-28-2009 5:53 AM

There is a NetApp useable space calculator online available for anyone to download.  

nicholasbernstein.com/calc.php

I just did a calculation of 60 x 300GB FC drives.  With 2 spares, it yields 9852 GB usable from 16800 GB raw.  That's approx 57%.  That's not incredible, but it's fair.  So why the debate?

Reserving part of that usable capacity for snapshots (which are online backups) provides tremendous benefits.  Thin provisioning is free, and can be done anytime by simply unchecking a box.  Deduplication is free, and done on a per LUN basis.  That 57% is starting to look a lot better, huh?

How is performance on a NetApp snapshot?  I don't believe they use a copy-on-first-write type snapshot so it doesn't suffer from the same performance problems of other legacy storage.

Jim Haberkorn wrote re: NetApp’s ‘Shining’ Moment – its Capacity Guarantee Program
on 05-13-2009 1:02 PM

Hi Neal,

Thanks for your comment. Sorry it has taken so long to respond, but I just got back from holiday this morning. You're not the first person who has mentioned these NetApp capacity-calculation tools. I suppose in the interest of fairness I should give you the same response I gave to the others - that these calculators are marketing aids and are not the tools NetApp installation people use to correctly size a NetApp environment. With any other vendor, I normally wouldn't make a big deal out of this - but with NetApp you really need to be careful. Here's why: With NetApp, usable capacity is heavily dependent on the performance you are willing to live with. NetApp performance typically degrades over time. The way NetApp recommends solving that is to add disks. As far as I could tell, the calculator you reference does not take this into consideration. Also, NetApp has 'guidelines' that give a range of how much space should be reserved at the LUN, volume, and aggregate levels. In a 'paper' configuration, these guidelines are easily ignored or minimized. I've seen NetApp do it both ways.

In regards to snapshot performance, you are right about other vendors doing a copy-on-first-write whereas NetApp doesn't, but it would be misleading to say that, as a result, the NetApp snapshot implementation is faster. NetApp's WAFL file system optimizes for writes, while copy-on-first-write was designed to optimize for reads. So, the question is: Which would you prefer in your environment? Since most apps are read intensive, the industry has pretty much decided that optimizing for reads is the best way to go.

Best regards,

Jim          
