Around the Storage Block Blog
Making Sense of WAFL - part 2

By Karl Dohm, HP Storage Architect

Today I'm taking a few minutes to respond to some of the comments on my initial post, Making Sense of WAFL.

Apparently in that post I unwittingly opened up a few of NetApp's old wounds, which have been extensively hashed through previously in public forums. Looking through the responses, NetApp has done a nice job of trying to deflect some of these problems by releasing nice-looking, apparently credible documentation.

For those who are biased NetApp's way, or are enamored with the technical ways of WAFL, there may be nothing I can say to convince you otherwise.  But for those with an open mind, read on.

The problems we are talking about here are at the core of WAFL, and are clearly not easy to fix - or they would already be fixed.  NetApp is not unique in having problems, of course; all array vendors have their strong and weak points.  But to assert that WAFL has no weaknesses around fragmentation, performance, and capacity utilization defies common sense.  The old wounds are there for a reason.

Let's take a look at the Avanade white paper.  It glows with enthusiasm about how the FAS3050c performs in MS Exchange based environments.  Further detail from the paper's author can be found in an interview here.  Peeling back the onion a bit, we see that this paper was created shortly after the creation of a business partnership between NetApp and Avanade.  Evidence of this partnership can be found here.

The IOmeter baseline performance data cited in the paper is interesting and worth exploring.  In the words of the white paper's author, the IOmeter test against the FAS3050c had "two goals: to validate that our environment was set up correctly and to assess overall performance of the FAS3050c".

The report is exceptionally loose about describing the setup.  The transfer sizes used for IOmeter are claimed to range from 0.5KB to 64KB, but there is no indication of the weight applied to each portion of this range.  There is no mention of percent reads vs writes or percent random vs sequential.  It also doesn't discuss MPIO policy or the HBA queue depth setting.  There is no indication whether OnTap Exchange extents are enabled.  Worst of all, and unique to NetApp, it doesn't define the history of writes and therefore the level of fragmentation on the LUN.

I like IOmeter because it's a relatively simple test to run that is available for anyone to try, since it's in the public domain.  Given this open invitation to compare results with Avanade, it made sense to give the described test a try and see what happens.

It turns out that no matter what combination of the unspecified test parameters I tried, I could never get into the ballpark of results claimed in this white paper. 

So to illustrate with an example, I decided to keep things as simple as possible.  Running a typical Exchange 2008 simulation load of 8KB transfer size, 80% random, 60% read, IOmeter queue depth of 128, MPIO round robin, Exchange extents enabled, HBA max queue depth of 254, a 20x 15K spindle RAID-DP aggregate, and letting the LUN settle through its fragmentation period, the throughput settles at 19MB/s at an average latency of 52msec.
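For anyone who wants to compare numbers on equal footing, here is a minimal sketch (not the tooling used for the test) that writes the workload down as data and converts a throughput figure into I/Os per second.  The class and function names are made up for illustration, and the conversion assumes 1 MB = 1024 KB, which may not match how IOmeter reports MB/s.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Test parameters as stated in the post (field names are illustrative)."""
    transfer_kb: float   # transfer size per I/O, in KB
    random_pct: int      # percent of I/Os that are random
    read_pct: int        # percent of I/Os that are reads
    queue_depth: int     # IOmeter outstanding I/Os


def iops(throughput_mb_s: float, transfer_kb: float) -> float:
    """Convert MB/s to I/Os per second, assuming 1 MB = 1024 KB."""
    return throughput_mb_s * 1024 / transfer_kb


exchange_sim = Workload(transfer_kb=8, random_pct=80, read_pct=60, queue_depth=128)
fas_iops = iops(19, exchange_sim.transfer_kb)
print(f"FAS3050c: {fas_iops:.0f} IOPS at 8KB transfers")
```

Under those assumptions, 19MB/s at 8KB transfers works out to roughly 2,400 I/Os per second.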

The white paper claimed the FAS3050c runs 48 MB/s at 30msec latency, which is a world of difference.

So what gives?  One of several things has happened.  Perhaps I could not successfully piece together how to run this test from the information given.  It would be great to get clarification from NetApp on how to properly run this test and recreate the results.  Another explanation is that the results cannot be recreated without some special internal-use-only tuning parameters.  Or perhaps there is no way to recreate these results at all.

An EVA4400, run with the same workload, delivers approximately 39MB/s at 25msec average latency.  That's about twice the throughput on a workload that is mostly random, meaning the bottleneck is supposed to be at the spindles.  Apparently on the FAS the bottleneck is somewhere else.
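As a rough sanity check on the two sets of figures quoted above, the sketch below applies Little's law (average I/Os in flight = I/O rate x average latency) to each result and computes the throughput ratio.  The function names are illustrative, and it assumes 8KB transfers and 1 MB = 1024 KB; it is a back-of-the-envelope check, not a model of either array.

```python
def iops(throughput_mb_s: float, transfer_kb: float = 8) -> float:
    """Convert MB/s to I/Os per second, assuming 1 MB = 1024 KB."""
    return throughput_mb_s * 1024 / transfer_kb


def outstanding_ios(throughput_mb_s: float, latency_ms: float,
                    transfer_kb: float = 8) -> float:
    """Little's law: average I/Os in flight = I/O rate x average latency."""
    return iops(throughput_mb_s, transfer_kb) * latency_ms / 1000


# (MB/s, average latency in msec) as quoted in the post
results = {"FAS3050c": (19, 52), "EVA4400": (39, 25)}

for name, (mb_s, ms) in results.items():
    print(f"{name}: {iops(mb_s):.0f} IOPS, "
          f"{outstanding_ios(mb_s, ms):.1f} I/Os in flight")

ratio = results["EVA4400"][0] / results["FAS3050c"][0]
print(f"EVA4400 / FAS3050c throughput ratio: {ratio:.2f}")
```

Both results imply roughly 125 I/Os in flight, which sits close to the IOmeter queue depth of 128, so each throughput and latency pair is at least internally consistent.  The ratio comes out at a little over 2, which is where "about twice the throughput" comes from.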

Incidentally, this FAS3050c LUN degraded about 10% in MB/s throughput as the fragmentation settled out.  That isn't such a big number, but recall that this test is mostly random I/O.  The sequential read portion, if looked at in isolation, degrades much more.  That is why NetApp introduced Exchange extents.

As in my previous post, if you don't believe what I am saying, give it a try.  Unlike my colleagues at NetApp, I gave you enough information here to run the test. 

Barring a sound explanation from NetApp, it seems to me that there is reason to doubt the credibility of the white papers and test results that NetApp is producing.


Posted 11-08-2008 12:04 AM by CalvinZ

Comments

kostadis roussos wrote re: Making Sense of WAFL - part 2
on 11-08-2008 6:27 AM

Karl,

There are several issues with this post even if I disregard the innuendo about NetApp's integrity, ethics and business practices.

What I find frustrating is that your argument can be structured in the following way:

1. I, Karl, the HP storage architect, know WAFL is flawed architecturally

2. My experimental evidence demonstrates this point

I could just as easily argue:

1. I, Kostadis the NetApp architect, know that WAFL is not flawed technically

2. But I know your experiment is flawed and shows nothing

But the problem is that you're an HP employee, and it's important to prove your point, which is that NetApp storage is deeply flawed for Exchange workloads in spite of our commercial success. And I am a NetApp employee who is not about to describe in detail proprietary information about how exactly WAFL works so as to be successful while delivering unique data management features. And so we are at an impasse.

And so I can never prove my case to your satisfaction, and you can never prove your case to my satisfaction.

And the fact that we are at an impasse is okay, because I never believed benchmarks are how customers should buy storage systems, and I never believed that performance should be the only criterion for an Exchange solution. I believe things like the ability to create consistent point-in-time copies efficiently with minimal disruption to Exchange, efficient storage-based replication, the ability to use less storage with no compromise on performance or resiliency (RAID-DP), and tools that simplify the Exchange admin's life, like SnapManager for Exchange, to be far more valuable than just raw performance.

So this performance argument is an interesting one, but at the end of the day it is only one piece of the overall Exchange solution puzzle.

But I will make an observation: last I checked, you never worked inside of the WAFL code base. Your name was never brought up as a WAFL architect.  I do not recall seeing your name on a patent application. You have never actually seen how WAFL is structured. So frankly, how you can presume to make assertions that WAFL is architecturally deeply flawed is a mystery to me.

And I will further observe that I never argued that the Avanade paper was a benchmark. The whitepaper shows how a significant Exchange expert is willing to recommend NetApp storage. That's the significance of that whitepaper.

The benchmark, for what it's worth, was the SPC-1 number. And the benchmark demonstrates our system behavior under random IO workloads that stress the disk subsystem. The result's intended purpose is to deal with commentary such as yours.

And finally, I am somewhat surprised that you compared the performance of a 2 year old storage system (FAS3050c), with 2 year old chipsets and a significantly smaller amount of memory, to HP's brand new storage array.

cheers,

kostadis

Pat Cimprich wrote re: Making Sense of WAFL - part 2
on 11-10-2008 7:31 PM

Hi Karl,

I've responded to some of your comments and questions regarding our NetApp testing. I've posted the response on Avanade's blog site in order to maintain some formatting in tables.

You can find the post here: blog.avanadeadvisor.com/.../12107.aspx

James wrote re: Making Sense of WAFL - part 2
on 12-18-2008 1:03 PM

I'm a SQL DBA and one of my companies has drunk the Kool-Aid.  Performance is the #1 priority, and since our DBs were put on this NetApp device, we've had nothing but problems.  So yes, performance is one piece, but it is a very large slice in my world.

Karl Dohm wrote re: Making Sense of WAFL - part 2
on 12-18-2008 4:54 PM

Hi James,

Thanks for the comment.  See the 4th post www.communities.hp.com/.../making-sense-of-wafl-part-4.aspx for quite a bit more detail on this topic.

I agree, performance is only one of the important aspects to consider.  I plan to get to some of the other main considerations for enterprise class customers (like yourself) in future posts.

dennis wrote re: Making Sense of WAFL - part 2
on 01-08-2009 8:31 PM

I was unfortunate enough to purchase one of NetApp's basic iSCSI SANs. I wish someone like you had been around to tell me that I could get better performance out of Openfiler, a FOSS distribution. It had for a time biased me against iSCSI in general. I now see that it may have been instead a poor choice of product. I am also experiencing what you have described, in that the more space is occupied / carved into a LUN, the worse it appears to perform.

hp_and_netapp wrote re: Making Sense of WAFL - part 2
on 07-14-2009 8:18 PM

I can tell you as an HP and NetApp customer that I really don't care about high performance.  I do care about performance and getting it consistently, but I'm just not part of that 1% that needs crazy speeds and feeds, nor will I buy storage based on that... especially for my Exchange.

Fast is fast enough and my users don't care if they get data.... this fast or....... this fast.

We also have several SQL installations and have had no problems with performance.  Oh, did I forget to mention that this is all running on NetApp storage?

Our preferred server vendor is HP and for storage we go NetApp.  A correctly architected environment will not present performance problems.

Every server and storage vendor sucks.  It just so happens that HP and NetApp suck the least.

CalvinZ wrote re: Making Sense of WAFL - part 2
on 07-15-2009 3:29 PM

Thanks for your feedback on what you're looking for and that performance isn't important to you.  Different customers have different use cases and it's always interesting to hear about what they are.  Personally, I care a lot about Exchange performance - for me, Exchange is very important and I don't want my productivity to go out the window because of slow performance, especially when I can get over 200 messages a day.

I'd tend to disagree with your comment that "Every server and storage vendor....".  Are there pluses and minuses - absolutely.  Are some better than others - yes again.  If your premise that they all sucked were true, then in our industry a new company would rise to the top and wipe all of the rest out.  Obviously, a lot of storage and server vendors are doing some things right, but with lots still to improve upon.

Thanks again...
