Why NetApp Gets Blogged - Around the Storage Block Blog -
Why NetApp Gets Blogged

By Jim Haberkorn

Before I begin, I don't know how other readers react to it, but whenever I see someone get frustrated and lose their temper on a blog, it's usually because deep inside they know they are losing. Maybe that's not how it is portrayed in movies and books, but in real life, that's how it works.

For those who have come in late to this discussion (see our previous post titled "Are NetApp performance claims logical"), this entire exchange I am having with Alex from NetApp actually started last November, when I analyzed a NetApp white paper attacking the HP StorageWorks EVA and pointed out its inconsistencies and highly questionable tactics. At the same time, an HP engineer started a blog series in which he discussed performance testing he'd completed on a NetApp filer vs. an EVA. I'm pointing everyone now to the last of the four blogs our engineer posted, titled "Making sense of WAFL Part 4", as this one more specifically addresses the performance issue Alex raised in the blog comments I referenced above. I have made the comment several times that HP won that discussion. I stand by that, but in the end, it's up to every reader to decide for themselves.

Now, Alex asked if I would explain why NetApp's having a file system optimized for writes has any bearing on NetApp storage performance. Here is the answer: The NetApp file system - WAFL - is optimized for writes. Basically, WAFL will write to the nearest available free space to where the disk head is. I believe Alex conceded that. The rest of the arrays that we have tested in our lab take a different tack; they optimize for reads by updating blocks in place. And why? Because in block environments most applications are read intensive as opposed to write, and you want to keep the blocks for databases as contiguous as possible so they can be read faster with a minimum of disk head movement. Oracle, for example, depends heavily on locality of reference for its performance. This NetApp write-optimization was designed into the NetApp file system many years before they ever contemplated going into block storage, and the trade-off is that their read performance is impacted relative to the competition. Now, I don't consider this single point to be the final word on NetApp filer performance - all storage systems have their peculiarities - but coupled with some of NetApp's other design decisions and some of the testing we've presented, I think we are building a pretty good case in regard to their block performance behavior.
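To make the locality argument concrete, here is a deliberately simplified sketch. It is a toy model, not a description of how WAFL or any EVA firmware actually places blocks: the function names, the block counts, and the assumption that free space begins right after the file are all hypothetical. It only compares how far a disk head would travel on a sequential read after random overwrites, under "update in place" versus "write the new copy to the next free block". It ignores the cost of the writes themselves, as well as read-ahead, caching, and any later reallocation.

```python
import random


def sequential_seek_distance(layout):
    """Total head travel (in blocks) when reading logical blocks 0..n-1 in order.

    layout[i] is the physical position of logical block i.
    """
    return sum(abs(b - a) for a, b in zip(layout, layout[1:]))


def simulate(n_blocks=1000, n_overwrites=500, seed=1):
    """Return (in_place_distance, write_anywhere_distance) after random overwrites."""
    rng = random.Random(seed)
    # Both layouts start contiguous: logical block i lives at physical block i.
    in_place = list(range(n_blocks))
    write_anywhere = list(range(n_blocks))
    next_free = n_blocks  # assumption: free space starts right after the file
    for _ in range(n_overwrites):
        logical = rng.randrange(n_blocks)
        # Update in place: the physical position does not change, so nothing to do.
        # Write anywhere: the new copy lands in the next free physical block.
        write_anywhere[logical] = next_free
        next_free += 1
    return (sequential_seek_distance(in_place),
            sequential_seek_distance(write_anywhere))


if __name__ == "__main__":
    in_place, anywhere = simulate()
    print("update in place :", in_place, "blocks of head travel")
    print("write anywhere  :", anywhere, "blocks of head travel")
```

In this toy model the in-place layout stays contiguous no matter how many overwrites occur, while the write-anywhere layout scatters. Whether that scattering matters in practice depends on everything the model leaves out.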

So, forget for a second the tests that each vendor runs and publishes on their own product, forget that all of us are paid to ensure our company's success in the market, forget that we all like to win debates; the point I keep hammering home in my blog posts through various arguments, some technical and some logical, is that the block performance, usable capacity, and cost of ownership claims NetApp makes about its technology do not add up. They don't add up in our lab when we test them, and they don't add up logically when we analyze their technology. Oh, every vendor puts its best foot forward - we all expect that, but, in my opinion, some of NetApp's claims deserve special attention.

Finally, there are two things that NetApp has done over the past several years that, in my opinion, have seriously hurt their credibility among people who know storage. One was the capacity guarantee program that I have previously blogged about, and the second was their Wyman/Mercer cost of ownership white paper attacking the HP EVA, which I also blogged about.

If any reader wants to understand why I question some of NetApp's claims about itself, those two blogs would be a good place to start.  

Best regards,

Jim


Posted 07-16-2009 9:46 PM by CalvinZ

Comments

kostadis roussos wrote re: Why NetApp Gets Blogged
on 07-17-2009 5:26 AM

yes, WAFL is write optimized.

And you know what, that is the right trade off for this century, even if it was, possibly, the wrong tradeoff for the last century.

I'll just cut and paste my entire blog entry for your benefit, where I admit, horror of horrors, that WAFL is write optimized, and then explain why that is the right tradeoff.

Maybe some of your more technical readers will be able to follow my reasoning.

The traditional storage array (the TLA in what follows) is optimized for read operations. The array minimizes the amount of work it has to do on a read by putting the disk block exactly where the client thinks it is. This means on a read, the array has to do a trivial lookup to get the disk block.

However, the downside is that the array makes the write operations more expensive.

When you factor in RAID and the impact of RAID on write performance, the TLA offers a poor value proposition.

But the poor write performance was okay as long as the read operations dominated.

Enter bigger memories

The problem with the TLA architecture was that it was designed in a pre-64 bit era, and in a pre-flash era. In that era, the servers connected to the storage array were memory starved. Because they were memory starved, the buffer cache on the server was quite small, resulting in more IO operations for reads.

In the new 64 bit era, and especially with the availability of Flash, the servers are no longer memory starved. You can now have more volatile memory than you ever had before. What that means is that the average server is rarely doing a read operation, but instead is mostly doing write operations.

The impact on the TLA is that the workload shifts from being dominated by read operations to a bigger mix of write operations.

The problem, then, is that the poorly optimized path is now a bigger piece of the overall workload mix.
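The shift described above is simple arithmetic, and a minimal sketch may help. It assumes that host-side cache absorbs a fraction of application reads while every write eventually reaches the array. The function name and the example numbers are hypothetical, chosen only for illustration, and are not measurements from any system.

```python
def array_write_fraction(app_read_fraction, host_cache_hit_rate):
    """Fraction of the I/O arriving at the array that is writes.

    app_read_fraction:   share of application I/O that is reads (0.0 to 1.0)
    host_cache_hit_rate: share of those reads served from server memory (0.0 to 1.0)
    Assumption: all application writes are eventually sent to the array.
    """
    reads_reaching_array = app_read_fraction * (1.0 - host_cache_hit_rate)
    writes_reaching_array = 1.0 - app_read_fraction
    total = reads_reaching_array + writes_reaching_array
    if total == 0:
        return 0.0  # nothing reaches the array at all
    return writes_reaching_array / total


if __name__ == "__main__":
    # An application doing 80% reads, with progressively larger host caches.
    for hit_rate in (0.0, 0.5, 0.9):
        share = array_write_fraction(0.8, hit_rate)
        print(f"host cache hit rate {hit_rate:.0%}: array sees {share:.0%} writes")
```

Under these assumptions, an application that is 80% reads presents the array with 20% writes when there is no host cache, and with roughly 71% writes when the host cache absorbs 90% of the reads.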

Or more to the point, that poor write behavior around RAID-6 suddenly becomes a very big issue.

Which is why, after all, the TLA vendor is recommending RAID-10.

Their poor write performance is forcing them to throw more hardware at the problem.

And their poor write performance is forcing application administrators to look at alternative storage architectures.
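The RAID point can be sketched with the commonly cited small-write penalties for arrays that update in place: 2 back-end I/Os per write for RAID-10, 4 for RAID-5, and 6 for RAID-6 (read-modify-write of data plus parity). These are rule-of-thumb figures for partial-stripe random writes, not measurements of any particular product, and full-stripe writes avoid the read-modify-write entirely. The function name and workload numbers are hypothetical.

```python
# Rule-of-thumb back-end I/Os generated by one small random front-end write.
WRITE_PENALTY = {"RAID-10": 2, "RAID-5": 4, "RAID-6": 6}


def backend_iops(frontend_iops, write_fraction, raid_level):
    """Back-end disk I/Os per second needed to serve a front-end workload.

    Assumes every read costs one back-end I/O (no array cache hits) and every
    write pays the full small-write penalty for the given RAID level.
    """
    reads = frontend_iops * (1.0 - write_fraction)
    writes = frontend_iops * write_fraction
    return reads + writes * WRITE_PENALTY[raid_level]


if __name__ == "__main__":
    for write_fraction in (0.2, 0.7):
        for level in ("RAID-10", "RAID-6"):
            load = backend_iops(1000, write_fraction, level)
            print(f"{write_fraction:.0%} writes on {level}: {load:.0f} back-end IOPS")
```

With these assumptions, 1,000 front-end IOPS at 20% writes costs about 1,200 back-end IOPS on RAID-10 and 2,000 on RAID-6; at 70% writes the figures are about 1,700 and 4,500. That widening gap is the reason a write-heavy mix pushes an update-in-place array toward RAID-10 and more spindles.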

Enter write optimized storage arrays

Write optimized storage arrays, like ONTAP, are designed for this new world order. The downside, of course, is that the read operations are more expensive, but if the mix shifts between read and write, then that’s a reasonable tradeoff.

And as I’ve said before you can solve the read performance problem …

blogs.netapp.com/.../how-flash-killed-the-tla.html

Alex McDonald wrote re: Why NetApp Gets Blogged
on 07-17-2009 12:32 PM

There's a real 1990s flavour to your arguments, none of which have been backed up by any half decent research. Where did you get this one from?

"Because in block environments most applications are read intensive as opposed to write"

That flies in the face of all the research about workload trends, block or otherwise, that I've seen.  

And no, I'm not angry, I'm Scottish. Where I come from, my comments on your previous efforts would be interpreted as sarcasm. (BTW, what's with the psychoanalysis? Just completed a "Freud for IT Specialists" 101 course or something?)

It's a low form of wit, I know, but sometimes effective -- if it's understood.

Jim Haberkorn wrote re: Why NetApp Gets Blogged
on 07-17-2009 6:14 PM

Hi Kostadis,

Welcome aboard. I was intrigued when you wrote,  “Maybe some of your more technical readers will be able to follow my reasoning.” It reminded me of something Einstein once said, “It should be possible to explain the laws of physics to a barmaid.” Well, marketing people have been known to spend time in bars so maybe there is hope for us.  

But, just so you know, I’m gearing my remarks to the more marketing-oriented readers and can only hope that the engineers reading this aren’t left behind.    

I’m sure you are aware that not every highly technical, NetApp savvy person in the world agrees with you. In fact, one particular ex-NetApp Oracle guy, now working for EMC, has constructed a pretty good argument vs. NetApp write-optimization even in this ‘new century’: oraclestorageguy.typepad.com/.../oracle-backup-1.html

Now, looking at your response from a marketing perspective, I was intrigued by what you said about NetApp’s built-in write optimization, which has been a feature of WAFL since it was released in 1993: “And you know what, that is the right trade off for this century, even if it was, possibly, the wrong tradeoff for the last century.”

So, that raises several marketing questions. But first, I want to make it clear: if your argument is to be believed, then there is no ‘possibly’ about it: Your write-optimized design was definitely the wrong technology for most of NetApp’s existence.

Marketing observations:

1. Back in the ‘old’ century (1993-2003 before NetApp introduced RAID-6) did NetApp ever admit that their write-optimization was ‘possibly’ the wrong technology choice? No, don’t believe so. As a matter of fact, back then NetApp had some very ingenious arguments as to why the rest of the industry was way out of step with its read-optimization.  

2. So now it is the ‘new century’ and NetApp, just by sheer engineering luck, happened back in 1993, when it was selling departmental-level NAS storage, to pick the right design choice for today’s performance-hungry workloads? Okay, if you want that to be your argument, that’s fine with me.

3. But I think you’ll understand when I say that your argument would sound a lot more credible if back in the ‘old century’ your company had been a little more upfront about the problems customers faced with ‘write-optimization’.  

But all this, while interesting and a bit comical, is slightly beside the point. You’ve come in late on this ongoing blog discussion and so you haven’t been following the arguments from the beginning. The issue for me isn’t whether or not NetApp has some good features. They do. And the issue isn’t whether or not they have justifications for some of their design decisions. I’m sure they do. The issue is, in my opinion, can I trust NetApp to tell me and the customers we both sell to, a reasonably accurate story about themselves, their technology, and the competition.

So, since this is the first time you have responded to one of my blogs, do you mind if I ask you to take the ‘NetApp Blog-Responder Credibility Litmus Test’ below?

Single test question: Do you believe NetApp’s Mercer/Wyman cost-of-ownership white paper attacking the EVA, a version of which has been on the NetApp website since 2004, to be fair, accurate, and credible?

If your answer is ‘yes’ feel free to state so publicly on this blog along with any supporting facts you may have. If it is ‘no’ then you don’t have to say anything.  

Best regards,

Jim    

Alex McDonald wrote re: Why NetApp Gets Blogged
on 07-17-2009 11:27 PM

Jeff Browning left NetApp in November 2005, and as far as I'm aware, he didn't work in engineering; he "roamed the world closing deals with top enterprise accounts. I also performed this function with respect to Oracle on open systems storage." Prior to that he was in technical marketing.

Kostadis, by comparison, is an engineer with several years' experience at NetApp, and is bang up to date. blogs.netapp.com/.../biography.html . Jeff's mistake is to construct a "straw man" scenario and knock it down; I have never seen a one-disk WAFL system, but that is exactly what he describes. No doubt it has the problems he outlines too.

As to your marketing questions, perhaps a light hearted reply might be appropriate, as they seem just a tad pointless given the current date.

1. Yes, NetApp had a really bad system for reads prior to 2003. It was awful, and only sold in bucketloads because we were laser focussed on selling to companies that had sequential write-only workloads.

2. Yes, luck played the most important part in NetApp's success up to 2003. Lucky beyond our wildest dreams; obviously, the competition wasn't as lucky, and they foolishly insisted on selling to folks that wanted to randomly read their data as well.

3. We were lucky. All our customer workloads were sequential write-only in those days.

As to the litmus test for Kostadis. A bit unfair asking an engineer about a TCO report, so I'll do so. IIRC the Mercer/Wyman cost of ownership report is here. media.netapp.com/.../ar1038.pdf I think HP has something to hide if it thinks this is an unfair report, but I'll let my readers be the judge of it.

Jim, hope that closes this particular issue.

On to more substantive stuff. As you're playing the part of Einstein's barmaid, I'm not asking you to explain it, but perhaps you could tell me where you got the information for this statement;  

"Because in block environments most applications are read intensive as opposed to write"

That flies in the face of all the research about workload trends, block or otherwise, that I've seen.    

Keep it up, btw; this is fun! A bit more substance might help your readers though.

kostadis roussos wrote re: Why NetApp Gets Blogged
on 07-18-2009 6:40 AM

Hi!

Actually Jim, I have a lot to say.

If you don't like our marketing, take it up with our marketing team.

But I've got an issue with you. You made some specific technical claims, that are bogus. And I'm going to call you on that.

I am not a marketing person, but I've had the misfortune of interacting with bad lawyers. And you're playing legal games with me.

So you're saying that in the 1990's, when we were a mostly-NAS company, we sucked for high performance workloads. Except that in the markets we were in (and hell, we must have done something right, because we grew to about 1 billion dollars before the world collapsed) we won because of our performance. Our tagline at the time was FAST, SIMPLE and RELIABLE.

So maybe in the 1990's we didn't have the best SAN platform.

Hell we didn't have a SAN platform until 2003.

And who cares.

It's not 1990.

It's not 2003.

It's July 2009.

In this century, *right now* when people have problems to be solved *today* we've got the right technology and the EVA is the *wrong* technology.

You're like the buggy manufacturer observing that cars used to go slow and used to be expensive and therefore are worse than buggies. It's not about the past, it's about today.

cheers,

kostadis

Jim Haberkorn wrote re: Why NetApp Gets Blogged
on 07-20-2009 6:31 AM

Hi Alex,

In regard to your two responses, let me summarize your key points:

You weren’t angry, you were just being Scottish.

Jeff Browning isn’t qualified to speak even though he was once paid to close deals for NetApp, and didn’t leave NetApp until two years after all these important changes you claim were made in NetApp technology.

Nobody cares what I think, but even so, you feel moved to keep responding to my comments.

Kostadis is a senior NetApp engineer and therefore can’t read a COO white paper by his own company and make any judgments as to its fairness.

The NetApp Cost of ownership paper that, among other shenanigans, compares the space required for its parity RAID against its competitors’ RAID-1 and forgets to mention that fact, is a fine piece of work.

NetApp, as a company, prior to 2003, because of its product’s ‘awful’ read performance, honorably avoided selling into read-intensive storage environments.

And you would like a bit more substance in my responses because, presumably, yours are filled with substance, sound logic, wit, grace, and just plain good sense.

Okay. Got it. Now, one thing that puzzled me was your reference to Psychology 101.  If you were referring to my opening statement that when people lose their temper in a debate, it usually means they are losing the argument, then I have to ask: Since mine was the opening statement in a new blog and since I never mentioned your name, how did you come to assume I was referring to you?

And by the way, my comments weren’t based on Psychology, but rather on a lifetime of observing human nature…and of reading several years-worth of blog responses from NetApp enthusiasts.

Best regards,

Jim

Jim Haberkorn wrote re: Why NetApp Gets Blogged
on 07-20-2009 7:20 AM

Hi Kostadis,

I have a few rules for this blog. Since you just came on, I don’t blame you for not knowing them:

1. If you are going to dish it out, you have to be willing to take it – without complaining.

2. If you play engineering games with a marketing person, you have to expect a quid pro quo. Again, no complaining.

3. And, most of all, no ‘feigned outrage’. Of all the games NetApp bloggers use to score points, 'feigned outrage’ is, in my opinion, the least constructive in a discussion between adults.

Now, on to my response:

Go back and read my comments. Not everything you said in your original response was wrong, but if you can’t see the obvious contradictions that were mixed in with your technical arguments after I explained them to you once, then I’m not going to repeat myself.

Now, since you came in late, you might double-check what started this whole ruckus with NetApp back in November of last year and go read that COO paper your company has on their website attacking the EVA. Now, I’m not expecting you to come back with a confession, but if you read it, you’re going to be embarrassed for your company. And if you have any regard for your company’s credibility in the market place, you’ll quietly use your influence to have it taken off your website.

With that said, thanks for getting involved, as we now have a chance to really accomplish something. At HP we have tested NetApp filers and we believe that the model we have is really slow compared to the comparable EVA we’ve tested against. It’s slow in reads AND writes. Now, we’ve had a chance to discuss our findings with Pat, the Avanade guy who does your competitive testing, and we are more convinced than ever that we are right.

However, being the honest company we strive to be, we will say that we are still open to the possibility that we are doing something wrong in how we set up your unit. If you will engage with us and help us identify if we are doing something wrong with our testing, we would be grateful. I have no desire to say anything untrue about NetApp. I think I’ve got plenty of other fair and impactful arguments to use in sales situations with NetApp, and I have no problem with taking the performance slides I use out of my NetApp presentation if they are not accurate.

But, right now, this is what our research and testing tell us: Yeah, applications, coupled with advances in server technology (BTW, thanks for sharing that info with HP – the biggest server vendor in the world), are swinging to be a closer mix of reads and writes, more than they were even two years ago. But we don’t believe it’s crossed over yet. We still think the smart design would tilt towards reads. However, even though the EVA is still optimized for reads, and NetApp filers are optimized for writes, the EVA is faster than your filer in both for a comparably sized array.  

So there’s my offer and my statement. Are you game? If not, you and Alex need to graciously step off this blog.

Best regards,

Jim

Jim Haberkorn wrote re: Why NetApp Gets Blogged
on 07-23-2009 8:11 AM

Dear Readers,

Now, I’m not ready, yet, to say the NetApp bloggers have thrown in the towel on my offer above – I’m still hoping they will say ‘yes’ –  but I am a little surprised at how long they are taking to answer. In the past, they’ve responded to my blogs with the speed of a chameleon’s tongue.

I mean, given the assertive, self-confident tone with which they explained to me what an idiot I was on the subject, I would have thought they’d have accepted my offer before my finger lifted from the enter key.

Be honest now, I’ll bet most of you reading this blog, even the NetApp enthusiasts, were, like me, expecting an immediate answer along the lines of, “Yeah, baby, yeah, bring it on,” or something like that.

Anyway, I'm still holding out hope they will say 'yes'. For all I know they have so many volunteers willing to take up my offer that they are now in a big fight back at headquarters over who gets to do it.

In any case, while we are waiting for an answer, let me position my offer to NetApp in a little more detail.

The reason I made the offer is that I and HP have absolutely nothing to lose by making it. I have no desire to say anything inaccurate or even misleading about NetApp performance. If our performance testing turns out to be wrong, it will be because of an honest mistake on our part, and we would gladly be willing to correct it. Like I said before: I have 97 slides on NetApp of which only 10 have anything to do with performance. I can lose those ten and still have plenty to say in a competitive situation.

I'll give the NetApp bloggers through the weekend to respond, and then I'll be posting another blog on NetApp performance.

Best regards,

Jim

Sjon wrote re: Why NetApp Gets Blogged
on 07-23-2009 2:51 PM

How many slides do you have on HP when you're in a competitive situation with NetApp...?

Jim Haberkorn wrote re: Why NetApp Gets Blogged
on 07-23-2009 3:43 PM

Hi Sjon,

If I understand your question correctly, I think you are asking, "In a sales situation vs. NetApp, do I spend any time talking about HP, or is it all about NetApp?"

I'm not a sales rep. When I visit customers and discuss NetApp, it is with a sales rep and a pre-sales person. They typically handle the HP stuff and I talk about NetApp. And by the way, I never show all 97 slides. I pick the ones that matter most to that customer.

Let me add though: The amount of time we spend talking about a competitor is in direct proportion to how honestly the competitor sales team delivers their message. Obviously, if the competitors are making outlandish claims about themselves then we are forced to spend more time setting the record straight.

Best regards,

Jim

Lee Razo wrote re: Why NetApp Gets Blogged
on 07-23-2009 5:40 PM

For goodness sake Jim, why don't you guys just publish an updated SPC-1 result and get it over with once and for all?

Surely it would require half the time and effort of these voluminous blog postings - and the full SPC-1 reports that result will require far less time just to read.

Assuming EVA performance is as good as you say it is, then it'll be a win/win. You'll get your kudos, we'll get our time back.

Alex McDonald wrote re: Why NetApp Gets Blogged
on 07-23-2009 10:15 PM

Pray, what outlandish claims do we make?

Jim Haberkorn wrote re: Why NetApp Gets Blogged
on 07-24-2009 6:19 AM

Hi Lee,

Even the SPC won’t claim it’s the only valid indicator of storage performance. And no one I know takes it that way. Most people are not able to run an SPC test on their own; therefore, for those customers interested in confirming performance results for themselves, it’s much more practical to propose a test using tools that can all be found in the public domain. And that is what we’ve done. If NetApp truly has confidence that its SPC results are an accurate reflection of its performance, then it should have no objections to having its product subjected to a simple test via a publicly available load generator.

So, I have claimed that our NetApp test results show a filer that is very slow when compared to a comparable array, and, by the way, has other negative anomalies as well – ones we’ve seen in no other array we’ve ever tested. NetApp claims that we are mistaken. We have responded by offering to show our configuration, etc., and to let them point out what we did wrong. So far, yours is the only response – thank you for that – but your response is to solve a problem we have when testing a NetApp filer by further testing our own array. That to me seems to be avoiding the real issue.

And I have a better idea. Just tell us what we are doing wrong in our testing to get the poor results we get with NetApp.

There’s the offer. What’s the problem?

Best regards,

Jim

John Martin wrote re: Why NetApp Gets Blogged
on 07-24-2009 6:29 AM

@JIM - At HP we have tested NetApp filers and we believe that the model we have is really slow compared to the comparable EVA we’ve tested against. It’s slow in reads AND writes

Do you have any details? Hard to say what you've done wrong without any details. Even EMC's comparative "benchmarks" include enough detail to find out where things have been skewed. On the other hand, why not just use SPC-1? Audited, independent, and if the EVA is as good as you say, you'll have a result you can be proud of.

Jim Haberkorn wrote re: Why NetApp Gets Blogged
on 07-24-2009 6:51 AM

Hi Alex,

What? Blogs aren’t working for you? Do you prefer your outlandish claims in an Excel spreadsheet? Question: Will your email provider accept gigabyte-sized files?

Hate to change the subject, but: The clock is ticking. You have until midnight Sunday to accept my offer. It’s the best deal you’ll ever get from a competitor. If you don’t, I’ll have one more outlandish claim to add to my NetApp list: The outlandish claim that NetApp has nothing to hide in regard to its performance.

Best regards,

Jim

john wrote re: Why NetApp Gets Blogged
on 07-24-2009 9:00 AM

Looks like the usual stalling for time and obfuscation to me!

Jim, you need to bring the NetApp boys and this discussion back on track.

Jim Haberkorn wrote re: Why NetApp Gets Blogged
on 07-24-2009 2:58 PM

Hi John Martin,

It doesn't make any difference what benchmark we use as long as it is fair. I've already explained why there are advantages to running a benchmark with tools found in the public domain. Maybe you posted your comment before you saw mine.

With that said, I've read some of your previous blog responses, and your tone struck me as that of someone trying to be reasonable, so I think we might be able to work together on this. So, here's my question: Are you accepting my offer? If we share with you our configuration, etc., do you have access to a FAS 2050 so we can double-check each other's results? If so, let me know, and I'll start feeding you the info on this blog along with some ways I think we can get to the bottom of this whole question.

Best regards,

Jim  

Jim Haberkorn wrote re: Why NetApp Gets Blogged
on 07-24-2009 3:02 PM

Hi John,

Thanks for your comment. It's like herding cats. I'm doing my best, but it's not easy.

Best regards,

Jim

Alex McDonald wrote re: Why NetApp Gets Blogged
on 07-24-2009 5:36 PM

No, I'm not accepting an offer based on your insistence that you have a problem with our systems that makes us liars. Open a support call.

That's final.

Please send the gigabyte sized excel spreadsheet to my email address. If it gets bounced, I'll (a) know the size and (b) be able to make arrangements inside NetApp for it to be sent to me by some other means.

kostadis roussos wrote re: Why NetApp Gets Blogged
on 07-25-2009 5:45 AM

Jim,

For about 8 years of my career I worked on performance analysis. My job was to measure, tune, and improve performance. I worked on a 128 scheduler tuning context-switch time; on a streaming media device, I tuned the performance of our storage and packet scheduling; I invented the first competitive measure for streaming media caches; and I worked on file-system performance for NetApp's first generation file system cache.

The only thing that I learned was that the only way you can compare competitor performance is through a standard benchmark. Everything else is opinion. Furthermore, I learned that benchmarks are a piece of the puzzle. Customers must understand their workload and their requirements and take time to evaluate storage systems before they purchase them.

I also learned as an engineer that thinking you can understand the behavior of your competitor's system, and the hard trade-offs behind it, is very, very dangerous. And assuming that a particular problem is hard or impossible is also dangerous.

So the answer to your request is the following. NetApp has published benchmarks for both the SPC-1 and SFS workloads. These are industry standard audited benchmarks.

I will not spend time trying to debug and analyze your benchmark.

If you want to compare performance, pick a standard benchmark, and talk about it.

The only reason I have engaged is that you are saying things that are factually incorrect. They don't reflect our system design, they don't reflect how our software behaves. They are the opinions of a marketing person who has been misinformed.

You choose to believe that read-ahead and caching don't work and can't be made to work. You choose to believe that we're incapable of solving the hard technical problems.

The short answer is they can, and we have solved the hard problems, and the industry standard benchmarks bear this out.

kostadis

Jim Haberkorn wrote re: Why NetApp Gets Blogged
on 07-27-2009 11:04 AM

Hi Alex,

No need for a support call. Your local service guy is doing a great job. Besides being slow, expensive, and eating up too much capacity, our NetApp filer is working spot-on.

I'm sorry, the spreadsheet on NetApp 'outlandish claims' is growing daily. It's now over a terabyte. Before I send it, can you confirm you have dedupe on your laptop?

Best regards,

Jim

Jim Haberkorn wrote re: Why NetApp Gets Blogged
on 07-27-2009 11:13 AM

Hi Kostadis,

You’ve got a wonderful resume. Whoever buys NetApp is going to get a very experienced engineer.

Now this is going to come as a shock to you, but I did not take time off of my marketing job to personally do the NetApp performance testing. We actually had engineers do it – guys every bit as qualified as yourself.

Now to your points:  

I especially liked your comment about how “dangerous” it is for any company to even think they can properly performance tune a competitor’s array. And I wondered as I read it, if that is what the NetApp engineers were saying to themselves when they, in March of last year, published an SPC result on an EMC CX3-40. So let me get this straight: You won’t take the time to work with HP to find out where we went wrong in our very simple benchmarks, even though I’ve got half of NetApp tied up responding to me, but you will turn an entire engineering team loose on producing 137 pages of SPC results on a competitor’s array, even though, according to you, it is impossible for them to do it accurately. Am I the only one here who sees the contradiction? And the irony?

Also, about your point where you claim that I: “…choose to believe that read-ahead and caching don't work and can't be made to work.” Didn’t you know? Practically every array uses read ahead and caching, and has for years. So, the fact that you have it, does not give you an advantage in a performance test – at most, it brings you to parity. However, now that I think of it, if you really do believe yours is the only array in the industry with those two features then some of your arguments start to make more sense.    

Now, as far as the facts you say I’ve gotten wrong. I stated 11 facts and you didn’t contradict any of them, though you did choose to defend one of them – your write-optimized architecture. No problem. You’re allowed to do that. But my point is that it really hasn’t been my facts you’ve been objecting to, just my conclusion – the one where I state that an EVA is much faster than a comparable sized NetApp filer? If so, I gave you a fair chance to defend yourself and to debunk us – and you refused.

Finally, let’s talk about the SPC. To me, your fixation on the SPC seems to betray a lack of confidence in its applicability to anything else but the SPC benchmark itself. Our tests on NetApp are, indeed, simpler than the SPC, but with the advantage that they can be easily replicated by customers using free tools found in the public domain. Wisely done, simplicity in testing should not be a cause for re-ordering the relative performance rank of various arrays, but of confirming it.

Best regards,

Jim  

John Martin wrote re: Why NetApp Gets Blogged
on 07-27-2009 2:24 PM

Unfortunately I don't have access to a 2050, though I do have access to a 3040 and a 2020 (I'm based in the colonies). Having said that, I'm really not sure what the offer is.

If you're looking for how to make a FAS have the same performance characteristics as a similarly specced EVA for a synthetic workload, then I probably can't help you.

Some workloads will work much better on the FAS (I'd argue most of them, but then again consider the source :-) ), and some will work better on the EVA.

Comparative results with single threaded sequential workloads tend to favor array architectures that use algorithmic mapping vs ones that use tabular or dynamic mapping. Similarly, workloads that are completely dominated by random reads across a really large working set will also tend to favour arrays using algorithmic mapping techniques.

NetApp's ability to dynamically map the logical block location to any physical block in the array has allowed us to continually improve the performance of those arrays across a large number of real world workloads (i.e. our customers). As a result our choices of where to put the data initially, how to predict read activity and, most importantly, how to adapt to multiple simultaneous workloads are where our approach really starts to show its benefits. I don't mean to be rude, but when I did a quick review of the IOPS/spindle achieved by various vendors in the old Microsoft ESRP results, NetApp had one of the highest IOPS/spindle whereas the EVAs had one of the worst. Does this mean EVAs are bad machines? No, but it does show that for that particular workload (Exchange 2003), NetApp's design choices worked better.
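To make the distinction between algorithmic and dynamic mapping concrete, here is a minimal sketch. It is purely illustrative: the function and class names are hypothetical, the allocator is a simple bump pointer invented for the example, and nothing here reflects the actual code of any NetApp, HP or other array.

```python
# Illustrative only: neither approach below reflects any vendor's real code.

def algorithmic_map(lba, num_disks, stripe_blocks):
    """Compute (disk, block_on_disk) from the logical block address by arithmetic."""
    stripe_index, offset_in_stripe = divmod(lba, stripe_blocks)
    disk = stripe_index % num_disks          # stripes rotate across the disks
    row = stripe_index // num_disks          # how many full rotations came before
    return disk, row * stripe_blocks + offset_in_stripe


class DynamicMap:
    """Table-driven mapping: every write lands on the next free physical block."""

    def __init__(self):
        self.table = {}      # logical block -> physical block
        self.next_free = 0   # hypothetical allocator: a simple bump pointer

    def write(self, lba):
        physical = self.next_free
        self.next_free += 1
        self.table[lba] = physical   # an overwrite simply remaps the logical block
        return physical

    def read(self, lba):
        return self.table[lba]


if __name__ == "__main__":
    # Algorithmic: the location never changes, so sequential LBAs stay adjacent.
    print([algorithmic_map(lba, num_disks=4, stripe_blocks=2) for lba in range(4)])

    # Dynamic: random logical writes become sequential physical writes,
    # but a later overwrite moves the block away from its old neighbours.
    m = DynamicMap()
    for lba in (5, 3, 5):
        m.write(lba)
    print(m.read(5), m.read(3))
```

With the algorithmic scheme a sequential read always finds its blocks where the arithmetic says they are; with the table-driven scheme the layout depends on write history, which is why read prediction and caching matter more there.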

I don't know how you're doing your tests. Maybe you're running DD from /dev/null, or IOMeter using 100% random IOPS with 512K non-aligned reads over the entire provisioned capacity. I've seen both approaches used by customers, though exactly what relation either has to a real world workload, or what it's meant to prove, is nebulous to say the least.

All I can say is that for performance in the real world, NetApp's choice of heavily optimising random writes to reduce the I/O load on the back end, thus allowing the predictive read caching via read-sets to do its job, has worked very well indeed for almost every customer I've talked to.

If you're interested in ways of maximising the performance of a 2050, may I recommend Steve Daniels excellent technical report tr-3647 on our whitepaper website, or perhaps either of the latest 2050 based ESRP configurations which can be found here www.netapp.com/.../ms-esrp-fas2000.html

A few other pointers

1. Design your aggregates to match your workloads; in general one large aggregate is better than two small ones. If you're running a database/Exchange-like load, consider having more disks in the database aggregate attached to one controller, and a smaller number of disks in the aggregate for logs on the other controller.

2. Don't use dedicated disks for the root volume.

3. Reduce the hot-spares to 1 for each controller if you're running with less than 20 disks per controller (i.e. recruit as many spindles to the workload as you can).

4. Run the latest version of OnTAP.

5. Align your IOPS on multiples of 4K boundaries like real world best practices and don't use 512-byte requests in iometer, because that's just being silly.

6. If your workload really is unusual, there are some specific optimisations you can tune, but most people don't run really unusual configurations. Also, if you're only testing SAN/iSCSI performance, don't forget to change the performance setting on the 2050 that accelerates SAN only configurations. This kind of thing is covered in the technical report I pointed to earlier.

7. Try comparing something you believe is equivalent to a 3040 using something like, say, SPC, which is independently audited with peer reviews. That way HP, EMC and NetApp will all have something we can compare each other to.
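Pointer 5 above can be checked mechanically before a test run. The following is a rough sketch, assuming a 4 KB back-end block size as the comment implies; the helper names are hypothetical and are not part of Iometer or any NetApp tool.

```python
def is_aligned(offset_bytes, size_bytes, block_bytes=4096):
    """True when a request starts on a block boundary and covers whole blocks."""
    return offset_bytes % block_bytes == 0 and size_bytes % block_bytes == 0


def blocks_touched(offset_bytes, size_bytes, block_bytes=4096):
    """Number of back-end blocks a request overlaps (size_bytes must be > 0)."""
    first = offset_bytes // block_bytes
    last = (offset_bytes + size_bytes - 1) // block_bytes
    return last - first + 1


if __name__ == "__main__":
    # An aligned 8 KB request covers exactly two 4 KB blocks.
    print(is_aligned(0, 8192), blocks_touched(0, 8192))      # True 2
    # The same 8 KB request shifted by 512 bytes spills into a third block.
    print(is_aligned(512, 8192), blocks_touched(512, 8192))  # False 3
```

The misaligned case is the one that hurts: the array has to touch an extra block for the same amount of user data, and a partial-block write generally means reading the rest of the block first.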

Steve Duplessie wrote re: Why NetApp Gets Blogged
on 07-27-2009 8:01 PM

Fellas, finding this most interesting - and I say most because I'm only smart enough to follow about 80% of it.  What isn't addressed in this excellent debate, at least that I've seen thus far, is "why does it matter?"  What I mean is this: block performance (or file for that matter) ONLY is an issue if, in fact, it is an issue.  An SSD is much faster at block writes or reads than either of the systems in question; does that mean everyone should abandon all EVAs or FAS systems?  If I'm a happy NetApp shop because 80% of my workload is file based, and the remainder block - and thus unifying storage is far more important to me than the performance of the blocks themselves - or if it simply doesn't matter - that the NetApp block performance is "good enough" for my needs, the question seems moot.

If the argument is simply one of "mine is better than yours" without any context, I'm not sure either of you is really doing yourselves a service, as the only ones enjoying it are idiots like me and probably not customers or prospects of your respective offerings.

For what it's worth, I don't believe I've ever been asked, by anyone, if the EVA is faster than the NetApp box (or vice versa) in the last 8 years. I think someone did ask me about 9 years ago, but I'm sure they are probably dead by now.  Or they stopped caring.  Or they have moved on to higher intellectual pursuits that don't make anybody any money, such as debating politics......

John Martin wrote re: Why NetApp Gets Blogged
on 07-28-2009 12:53 AM

Steve,

  from my personal perspective there are a couple of reasons for responding.

1. Our integrity and honour have been called into question, and not just by HP. In debating terms these are "ad hominem" attacks, which as Douglas Walton states are easy to put forward and difficult to refute, so we need to go into what sometimes appears as unnecessary levels of detail to defend ourselves.

In sales situations this tactic is sometimes referred to as "ratholing" the competition, and I've seen it used against us over and over again, even to the point where one vendor starts recycling other vendors' misinformation even after it's been demonstrably disproven.

Since joining NetApp, this kind of attack has been "de rigueur" for most competitors. It's annoying because, as you pointed out, in most cases the issue is pretty much irrelevant; nonetheless these attacks serve to create a level of mistrust of NetApp which, given that we are emotional creatures, affects our decision making processes.

When I first joined, our policy was "don't react, the truth will make itself known". While remaining above the fray has a certain appeal, it allowed a lot of unsubstantiated FUD to become "truth" in many people's minds. Since then we've decided to counteract what we perceive to be unsubstantiated misinformation via social media, and this lets me occasionally say things like "I'm as mad as hell, and I'm not going to take this any more!" It might make me emotionally unbalanced and cause my wife to frown at me as I write blog comments when I should be changing my son's dirty pants, but I think occasionally it's warranted, and sometimes it makes for interesting reading.

2. Like Jim said, "we all like to win debates" :-)

kostadis roussos wrote re: Why NetApp Gets Blogged
on 07-28-2009 1:21 AM

Jim,

You asked a set of questions:

  1. Do you build your LUNs on top of a file system?

  2. Does your file system write only to free block space rather than updating the old blocks?

  3. Do you ship a de-fragmentation tool on every NetApp filer?

  4. Do you use software RAID?

  5. Does your file system spread the metadata over the entire disk system?

  6. Does WAFL require you to build secondary inode trees if the file is bigger than 64KB?

  7. Can IOs to a particular file be serviced by only a single controller in a NetApp cluster?

  8. Do NetApp snaps reside in the same disk group as the primary volume?

  9. Is only a quarter of your NVRAM usable?

 10. Is the processing power of your largest filer, rated by NetApp to handle 1176 disks, based on a maximum 8 x 2.6 GHz AMD dual core Opteron processors - about the same processing power as a Proliant DL785?

 11. Is your file system optimized for writes?  

And you made the *specific* offer to remove anything that was factually incorrect.

So I request that you remove #1 (see: blogs.netapp.com/.../why-wafl-is-not.html).

cheers,

kostadis

kostadis roussos wrote re: Why NetApp Gets Blogged
on 07-28-2009 1:34 AM

Jim,

In the post with the original 11 points you said the following:

"Now, I am sure that every one of those eleven engineering decisions was made by NetApp for a logical reason from their perspective. But I am also sure that every one of them has a negative impact on NetApp performance, and the cumulative effect is pretty significant in most real-world environments.  And those features put NetApp at a huge performance disadvantage in a storage world dominated by vendors who don't require the extra overhead of having to build LUNs on top of file systems, or don't need to ship a de-fragmentation tool with their array, or don't use software RAID."

I have explained, here blogs.netapp.com/.../raid-dp-better-performance-with-less-hardware-and-more-availability.html why WAFL performs better than a traditional RAID array.

I have also explained here why the appeal to architecture orthodoxy (more vendors do it this way therefore it is better) is the last defence of the dinosaur: blogs.netapp.com/.../proven-architec.html

So we are at an impasse, you believe your 11 points completely characterize a system with several millions of lines of code with several thousand customers and with several thousand developers.

You believe that those points are so salient that any other evidence is made irrelevant by those 11 points and that you can infer everything about our system from those 11 points.

And I don't agree. And I realize I don't care.

NetApp continues to provide significant value to customers. NetApp continues to perform well in mission critical environments. Arguing with you will not increase the world's understanding of our system which to me is the most important activity of my blog.

I refuse to continue this discussion any further, because I do not agree with your assertions and I can not argue with assertions.

cheers,

kostadis

Jim Haberkorn wrote re: Why NetApp Gets Blogged
on 07-28-2009 4:38 PM

Hi John Martin,

Just for the record, I appreciate your candor and the tone of your response. I also read your recent response to engineer Karl Dohm on his WAFL blog. Thanks again for taking our offer seriously and for making those helpful comments.  

You did point out a few things to us that we weren’t doing but that we will now try. We had though, already incorporated in our testing all the major points you made. One point: Our Iometer pattern in the blog posts was the one suggested by your test engineer at Avanade. But in all other cases we, of course, always ran block-aligned, and never used anything as small as 512B. Minimum block size was always 8KB.

There weren’t any major surprises in your comments. But, nonetheless, we will try the things you suggested and see how much of a difference it makes. We’ll let you know how it turns out.

You did say, though, that you weren’t sure what kind of a ‘offer’ I was proposing. Originally, I was hoping that someone at NetApp had access to a FAS2050 because that’s what we did most of our testing on. My plan was to share with you our configuration and numbers, then ask you to duplicate our config and result. Then after applying whatever extra tuning you normally do, share with us whatever you did and the new numbers you got. And then we at HP apply your tweaks, if any, to our FAS. Sounded reasonable to me.

The reason why we originally picked the FAS2050 to purchase and test was because our marketing folks said that was the one the EVA4400 usually competed against. But then also, after we looked at capacity, NVRAM size, processor size, etc., it did seem to be the right match.

Now, with all that said. We have a FAS2050. But you don’t. But here’s the challenge anyway: Run a simple 8KB, 50% reads, 100% random, block-aligned access pattern with a cap of 40 spindles (that’s about all the disks we’ve got) – any way you want to set it up and with any tweaks you want to do. If you have a great ESRP result then you should be great at this since this is the basic Exchange database access pattern. Do some random and sequential read and write tests. Then tell us your results and the tweaks you did and we will try to replicate it. Does that sound fair? Is there a way you can do this?
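For anyone who wants to reason about the access pattern in that challenge without a load generator to hand, here is a small sketch that produces a request list with the stated properties: 8 KB transfers, 50% reads, 100% random, block-aligned. The function name, the tuple layout and the LUN size in the usage lines are assumptions made for the example; this is not how Iometer builds its patterns internally.

```python
import random


def make_workload(num_requests, lun_bytes, io_bytes=8192, read_fraction=0.5, seed=0):
    """Return a list of (op, offset, size) tuples for a random, aligned workload."""
    rng = random.Random(seed)            # seeded so two parties can replay the same list
    num_slots = lun_bytes // io_bytes    # every offset is a whole multiple of io_bytes
    requests = []
    for _ in range(num_requests):
        offset = rng.randrange(num_slots) * io_bytes
        op = "read" if rng.random() < read_fraction else "write"
        requests.append((op, offset, io_bytes))
    return requests


if __name__ == "__main__":
    # Hypothetical 1 GiB test LUN; print the first few requests.
    for request in make_workload(5, lun_bytes=1 << 30):
        print(request)
```

Fixing the seed is the useful part for a shared test: both sides can generate the identical sequence and compare results against the same requests rather than against a description of them.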

I was surprised to read in your response to Steve that you felt so worked up about things. I'd actually come to think of you as a voice of calm reason. Personally, I think you've handled yourself very well in these discussions. But I will add - this world of blogging is rough and tumble. It helps to have a very thick skin.  

Now, I'm going to try and say this as respectfully as I can, but if you think I have offended NetApp's honor, then you need to read your company's own Wyman/Mercer white paper and its less than professional methodology for attacking the EVA and CLARiiON. If you can, you should read the Mercer version that was on your website up until last year, and the Wyman version that's on there now. I refer to both versions in my blog from last November.  www.communities.hp.com/.../netapp-apparently-still-lags-in-cost-of-ownership.aspx  

Best regards,  Jim  

Jim Haberkorn wrote re: Why NetApp Gets Blogged
on 07-28-2009 4:39 PM

Hi Steve,

Thanks for your comments. To answer your question of ‘why does it matter?’:

  1. In my experience, many customers are greatly concerned about performance. Less so in NAS than in the block space, but it’s a definite competitive issue. When customers stop asking for POCs and live performance tests on their data, I suppose we’ll have to find something else to blog about, but that hasn’t happened, yet. Now, with the EVA, we’ve put a lot of our engineering into the ease of management, and we would love the customer discussions to center on more subjective issues such as that, but time and time again, we have customers wanting to talk about performance. I think part of it is that customers are being pulled by so many choices that it helps if they can have some numbers on which to base their decision.
  2. I do agree though, that in most cases, as long as two storage systems have performance within some reasonable range of each other, for most customers it should be a moot point.
  3. When we test NetApp gear, we can’t get near their numbers. And to be honest, we are genuinely curious as to why.      www.communities.hp.com/.../making-sense-of-wafl.aspx

Best regards, Jim

Jim Haberkorn wrote re: Why NetApp Gets Blogged
on 07-28-2009 4:41 PM

Hi Kostadis,

Ah…my friend. Someday you and I are going to agree on something but today is not the day. I read your blog and also Dave Hitz’s (blogs.netapp.com/.../is-wafl-a-files.html ), who as an engineer wrote WAFL, where he says that WAFL is not a file system but that it ‘contains’ a file system. So…my point #1 below states that your filer builds LUNs on top of a file system. Okay…it builds LUNs on top of the file system ‘contained’ in WAFL.

What? What?

As to your second comment, well, you can’t say I haven’t given you a fair chance to make your points. And between Karl Dohm’s WAFL blog  and this blog from me, I guess we’ve had our fair chance, too.

All the best. No hard feelings.  Jim

Geert wrote re: Why NetApp Gets Blogged
on 07-28-2009 8:22 PM

Sjeees, Jim.

You keep on spinning your 'WAFL is a filesystem' forever in the hope it will be true some day.

Kostadis (and Dave) explained it very clearly; it has grown to a block virtualization layer that can be accessed through file semantics and block semantics, either way.

Stop spending time on pounding home your point that WAFL is (still) a file system, just because it started out that way (in case you haven't noticed; we're over 15 years further in time by now). It is starting to look silly (already quite some time ago).

Then again - to Steve's point - why does it matter? WAFL performs well in many many real world environments, with Snapshot and RAID-6 enabled by default. If you can't get it to work just like our customers can get it to work it sure tells a lot about your 'engineers'...

John Martin wrote re: Why NetApp Gets Blogged
on 07-29-2009 8:34 AM

Jim,

    Personally I'm not a fan of synthetic benchmarks using iometer (even if only because of bugs in iometer), for a bunch of reasons which I'll talk about in more detail in the storage efficiency blog when I get the time. In principle I think things like Jetstress for Exchange and Orion for Oracle are probably better tools to use for creating SAN based benchmark workloads. For filesystem based benchmarks IOzone is probably your best publicly available tool.

As an FYI, I did crank up a workload similar to the one you asked about on my 3040 with a 1TB RAW LUN on an old Dell 2150 on Win2K3 via 1Gbit iSCSI across a 10 disk 15K aggregate (8 data drives) I've had in our lab for about 2 years without reallocating, and got between 7000-11,000 IOPS @28ms average response time across a range of settings without tweaking any of the array (I did however up the outstanding I/O requests to 128, but only used one worker thread).

Given the number of spindles I'm using, this result ended up being so good that I suspect I may have done something wrong, or there is some unintended amount of caching going on at the host. Getting 800-1,370 IOPS per data disk is great, but if I were reading this from an outsider's perspective, I'd be more than a little skeptical.
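The skepticism above can be put into numbers with two quick checks. This is only a sketch: the helper names are hypothetical, and the second check assumes a steady state in which all 128 outstanding I/Os were kept in flight for the same run that produced the 28 ms average. The comment reports a range of settings, so that assumption may well not hold.

```python
def iops_per_disk(total_iops, data_disks):
    """Average load each data disk would have to sustain."""
    return total_iops / data_disks


def littles_law_iops(outstanding_ios, avg_latency_seconds):
    """Steady-state throughput implied by Little's Law: IOPS = concurrency / latency."""
    return outstanding_ios / avg_latency_seconds


if __name__ == "__main__":
    # Figures quoted in the comment: 7000-11,000 IOPS over 8 data drives.
    print(iops_per_disk(7000, 8))    # 875.0
    print(iops_per_disk(11000, 8))   # 1375.0

    # 128 outstanding I/Os at a 28 ms average response time.
    print(round(littles_law_iops(128, 0.028)))  # 4571
```

The per-disk figures land close to the range quoted in the comment and are well beyond what a single 15K drive is normally credited with for uncached random I/O, which supports the suspicion of caching somewhere. Under the stated assumptions, 128 outstanding I/Os at 28 ms also imply roughly 4,571 IOPS rather than 7,000 to 11,000, so the three reported figures probably did not all come from the same setting.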

The reason we use audited public benchmarks like SPC-1 and SpecSFS is because it avoids a lot of these potential "how did they get that?" questions, though I'll grant you that they are a little hard to run up in your average customer environment. I don't think the "run this iometer workload" approach is that informative either, but it has started me thinking about what the "happy medium" would look like. Maybe an SPC-1 "light", or simultaneously running a mixture of Jetstress, Orion and IOZone workloads: all interesting stuff for propeller heads like me, though I'm not sure it really proves anything.

From my experience, a customer asks for a configuration that will run a certain set of workloads, professionals review those workloads, compare them against sizing tools and create configurations they believe will work, and quotes go out. Usually the best quote wins. Theoretical questions of benchmark performance don't often make that much difference.

P.S. The "mad as hell" comment was meant to be a funny reference to "Network". It didn't come off with the intended humor; maybe I should stick to my day job.

Alex McDonald wrote re: Why NetApp Gets Blogged
on 07-29-2009 2:31 PM

"I was surprised to read in your response to Steve that you felt so worked up about things. I'd actually come to think of you as a voice of calm reason. Personally, I think you've handled yourself very well in these discussions. But I will add - this world of blogging is rough and tumble. It helps to have a very thick skin."

<phut ping> My irony meter just blew a fuse, and the auxiliary bathos dial I was testing has just blown clear across the room. You owe me a new one.

shaun wrote re: Why NetApp Gets Blogged
on 07-29-2009 2:37 PM

@Geert, despite all the bluster made by Kostadis re Write Anywhere File Layout (WAFL) not being a file system, which incidentally was timed to perfection given the proverbial thorn in the side emulation was proving for NetApp, I and many others just don't buy it. At its heart WAFL is a filesystem with support for different protocols layered on top. I don't dispute the fact that WAFL has evolved greatly since its introduction, but the timing of the blog and the arguments used were just a little too convenient for the NetApp cause. If it's not, as NetApp now claim, a file system, then why the legal debacle over Sun's Zettabyte File System? Why the need for defragmentation and file system tools etc.? And why does Kostadis keep talking about a better than Fibre Channel array? Surely if there's no emulation there and it's pure block then you have a fibre channel array like everyone else.

Alex McDonald wrote re: Why NetApp Gets Blogged
on 07-29-2009 4:55 PM

@Shaun

I really don't understand what "thorn in the side" this might be for NetApp; it's the technology that allows us to do what we do. Nor what you mean by emulation; if it walks like a duck, talks like a duck, it's generally a duck. Nor what you mean by a filesystem with support for protocols; the only protocols that are file systems are NFS and CIFS. iSCSI and FC are block based, and WAFL is, not so oddly given that it deals with block based disks, block based too. It has no filesystem semantics.

At the end of the day, it's a FC array. Hey, it runs FC apps! It just happens it can do iSCSI and NAS out of the box too.

Legal issues I can't discuss.

Jim wrote re: Why NetApp Gets Blogged
on 07-29-2009 7:28 PM

Hi Shaun,

Thanks for your comment. I find that 'logical' arguments like yours are sometimes the most effective. And some of the best logical arguments are often phrased as simple questions.

Best regards,

Jim  

Jim wrote re: Why NetApp Gets Blogged
on 07-29-2009 7:36 PM

Hi Alex,

Welcome back. I knew it was you without even reading your name.

Best regards,

jim

Jim wrote re: Why NetApp Gets Blogged
on 07-29-2009 7:46 PM

Hi John Martin,

I see you responded on Karl Dohm's blog and are now engaged with him. That's the right place to be as you work out the performance issues we're having on our filer.

Also, sorry I missed the humor. Keep at it. I'm usually not that dense.

Best regards,

Jim  

Jim wrote re: Why NetApp Gets Blogged
on 07-29-2009 8:42 PM

Hi Geert,

All I said was that NetApp filers build LUNs on top of a filesystem.

Now, your counter-argument is to say that I'm wrong because WAFL is NOT a file system, even though NetApp has been calling WAFL a file system since 1993, and even though every NetApp white paper, including ones published this month, refer to it as a file system, and even though per Shaun's comments above there is a huge lawsuit with Sun over Zettabyte, and even though you can do a Google search on 'WAFL file system' and get 54,000 hits, and even though the Wikipedia entry, written by a NetApp guy, says it's a file system, EVEN SO - WAFL is not a file system and you've got two blog posts to prove it.

Okay, if that's your argument.

Best regards,

Jim

Geert wrote re: Why NetApp Gets Blogged
on 07-30-2009 11:43 AM

All I said was that (not me, but) both Kostadis and Dave explained why WAFL is NO LONGER *just* a file system.

Your counter-arguments are references to a legacy name - simply because WAFL was named a file system in the early 90's -, a bunch of Google links to that same legacy name (we all know it's difficult to change information on the Net once it's put there), a reference to a lawsuit you know we can't say much about but we all know has nothing to do with whether they're "file systems"....

You rather simply choose to "not buy" a perfectly understandable, well worded, technical explanation straight from the horses' mouth(s) and continue your "LUNs on top of a file system" rant because that suits you better.

Okay, if that is your argument....

Best regards,

Geert

PS. Then again to Steve's point, why does this all matter?

If you can't get our systems to work, but our customers can, it sure must be blamed on the "file system", right?

John Martin wrote re: Why NetApp Gets Blogged
on 07-30-2009 2:13 PM

You know this is almost worth a thesis on "Semiotics and the Art of IT Marketing", though hard to do in a blog response I'll do my best.

You say "Filesystem", I say "Fine grained storage virtualisation layer". Throwing around words like "emulated" in the pejorative sense with respect to block access is interesting, and possibly ironic on a blog with significant discussions about LeftHand and EVA, considering the V in EVA stands for Virtual.

So why do we spend time talking about whether WAFL is a filesystem or not? Apart from the technical fine points (and to be clear I completely agree with Kostadis there), it's also an exercise in semiotics to help differentiate LUNs on top of WAFL vs LUNs on top of something like, say, NTFS or ext3, which are the filesystems most people are familiar with.

By using words like "emulated" and "LUNs on top of a filesystem", this creates an association in the minds of many with the way that you might run a Linux server in Vmware on a Windows workstation with the "LUN" (in this case a thin provisioned vmdk), sitting as a file on top of NTFS. I do this on a regular basis, and yes the performance can sometimes be as bad as you might expect. This is partly because I'm working on a crummy 5.4K laptop SATA drive and partly because my NTFS filesystem, despite the best efforts of jkdefrag, fragments horribly and rather rapidly thanks to outlook, google desktop, my day to day workload, and some rather odd decisions about freespace allocation that Windows seems to make. From my experience, NTFS truly deserves the crown as the King of fragmentation despite what some unnamed marketing material might say otherwise.

Using WAFL as the data layout engine for LUNs is not like this. Kostadis has gone into reasonably lengthy detail to explain exactly how and why this is so; if you read it for its informative content rather than looking for debating points, you'll find out why. VMware also runs "LUNs" (vmdks) on top of a filesystem (VMFS); this isn't an issue for VMware for exactly the same reason it's not an issue for NetApp. The engineering that has gone into these data layout engines ensures very high performance, which has been proven over and over again with benchmarks from both organisations. Now VMFS and WAFL have some different characteristics: the allocation granularity in VMFS is coarser than in WAFL, and VMFS allows many systems to access a single datastore simultaneously, whereas WAFL allows for high performance thin provisioning and deduplication. The thing they have in common is that from a user's perspective it is more accurate to think of them as fine grained storage virtualisation layers, rather than as filesystems in the common sense of the word.

What does this mean for WAFL? Should we change the name to take out that pesky F which stands for File, to stop people thinking about it in the wrong way? Hmm, well, the options aren't that good. Changing it to Write Anywhere Data Layout turns it into WADL .. I can see Mr Hollis having a field day with that one: looks like a duck, quacks like a duck, walks like a duck, oh no, don't buy NetApp, it means you'd be putting your LUNs on top of a duck and that has to be a bad thing, right? In any case Dave Hitz coined the term WAFL, and we like Dave, he's one of the good guys. I like the term WAFL. WAFLs are tasty pieces of yummy goodness that you can put into a toaster, and goodness knows that here at NetApp we love our appliances: you plug them in and they work, just like a FAS Array :-)

So it's late now for me, and I've finished swapping my Manly Stories, in the morning I'm making WAFLs !!

Jim wrote re: Why NetApp Gets Blogged
on 07-31-2009 5:25 AM

Hi Geert,

Since you want to keep discussing this, I should mention that you forgot to respond to my point about all the references to ‘WAFL file system’ that are found today in even your most current white papers. My position is that before I’d give the competition a hard time about referring to WAFL as a file system, I’d have cleaned up my own website first. I mean, according to you guys it hasn’t been ‘just a file system’ for years, so you’ve had enough time. And while I don’t expect you to clean up old references on the web, if you consider your point to be important and you want it to be taken seriously, then your own white papers would have been the right place to start, not here.  

Also, your technical arguments were not lost on me. I just thought that there was a reasonable sized hole in them – and one, by the way, that Shaun, noticed, too, i.e., that your LUNs have continued their same file-like behavior over the years regardless of what you say you’ve done to WAFL.

And now, as to your point about ‘WAFL’ being more than a file system, you need to go back a dozen responses and read my response to Kostadis where I already accepted the point that ‘WAFL contained’ a file system. We’ve got enough to argue about without continuing to argue over things that have already been accepted.

And by the way, the NetApp systems we own are working perfectly. We even got better numbers on some of the tests than your own test engineer at Avanade got. However, those numbers were still way below what we are seeing from comparable arrays in the industry - not just our own.  

Best regards,

Jim  

Dimitris Krekoukias wrote re: Why NetApp Gets Blogged
on 10-07-2009 8:55 AM

I wonder, in HP's benchmark: How were the LUNs created? Were they properly aligned? Was the "windows" LUN type selected? Was SnapDrive used to create them?

Let's also mention that behemoths like Oracle, Yahoo, BT, Siemens, Mercedes, T-Systems etc. buy hundreds of PB of NetApp gear.

If it didn't work, they wouldn't be buying the stuff.

D
