Making sense of WAFL - Around the Storage Block Blog
Making sense of WAFL

By Karl Dohm, HP Storage Architect

The Extensible NetApp blog (http://blogs.netapp.com/extensible_netapp) contains some posts describing WAFL. It sums WAFL up as an internal component which...

...provides mechanisms for building file-system semantics, it manages the on-disk format, it manages the free and allocated space, and provides a logical and physical volume manager.

and further makes the argument that it is not a file system, but rather an essential part of one. Fair enough. Calling it a file system might be technically incorrect, and the distinction might be splitting hairs, but given the amount of confusion across vendors in the industry, it is likely that this common misunderstanding emanated from NetApp's own documentation.

Whether the details of the FAS internals are technically part of WAFL, or part of OnTap, or part of something else, the main point is that none of that is particularly relevant to the Storage Administrator.    What matters most is performance, space efficiency, and ease of use. 

From this point on I'm not going to split hairs and distinguish between WAFL and the remainder of the FAS internal software/firmware - because no one but NetApp architects really cares. For the sake of simplicity, let's call it all WAFL.

WAFL is a rather unique approach to organizing data on the spindles and controlling the flow of data to the spindles. No doubt WAFL can come across as impressive in sales presentations because it is very different from the approach used by EMC, IBM, and HP. In this series of posts, we will explore the other side of WAFL, highlighting some of the problems that WAFL brings to the Storage Administrator, none of which we expect NetApp will fully acknowledge.

Today let's touch on fragmentation. Some in the industry say WAFL is "fragmentation by design". I didn't make it up, but like WAFL being called a file system, it's one of those things you tend to hear when the conversation is about NetApp. This statement strikes me as accurate because WAFL tries to do full stripe writes whenever it can, meaning that it prioritizes writing non-sequential blocks in the same stripe over the read-modify-write operations associated with RAID-4 or RAID-DP parity calculation.
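To make the "fragmentation by design" idea concrete, here is a deliberately simplified toy model in Python. It is not WAFL's actual allocator: the names `simulate` and `contiguous_fraction` are made up for illustration, and the model assumes that every overwrite goes to the next free physical slot and that freed space is never reused. It ignores stripes, parity, and any reallocation. All it shows is how random overwrites in a write-anywhere layout break up the physical ordering of logically sequential blocks.

```python
import random


def simulate(num_blocks=1000, overwrites=5000, seed=0):
    """Toy write-anywhere model (an assumption, not WAFL's real allocator).

    Returns a list where index = logical block, value = physical block.
    """
    rng = random.Random(seed)
    # Fresh LUN: logical block i starts out at physical block i.
    location = list(range(num_blocks))
    next_free = num_blocks
    for _ in range(overwrites):
        lba = rng.randrange(num_blocks)
        # The new copy goes to the next free slot instead of in place.
        location[lba] = next_free
        next_free += 1
    return location


def contiguous_fraction(location):
    """Fraction of logically adjacent pairs that are also physically adjacent."""
    pairs = len(location) - 1
    adjacent = sum(1 for a, b in zip(location, location[1:]) if b == a + 1)
    return adjacent / pairs


if __name__ == "__main__":
    print("fresh layout:", contiguous_fraction(simulate(overwrites=0)))
    print("after random overwrites:", contiguous_fraction(simulate()))
```

In this toy model a fresh layout is fully contiguous, and the contiguous fraction falls sharply once most blocks have been overwritten at random. How closely a real FAS tracks this depends on details the model leaves out.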

Translating that to the world of the Storage Administrator, this means that the throughput of the FAS gradually degrades over time when the workload has a random component. Most real world workloads have a random component. Applications along the lines of Microsoft Exchange present a nightmare situation for the FAS. The throughput degradation can be significant, and throughput can be unpredictable because it varies depending on the history of writes.

For those who question this assertion as being somehow biased, try the following. Take a FAS system and create a new volume with a new LUN. Baseline the system by running a sequential read workload and measuring the result. Notice that the number is already not very impressive. Next, run a few hours of random workload, say 8KB 50/50 R/W, which is similar to MS Exchange. Now try the sequential read load again and observe the new throughput. Chances are you will have some new questions for your NetApp sales rep.
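The experiment above can be sketched as a small Python script. This is only an outline under stated assumptions: `path` is a hypothetical test file or device on the LUN, the function names are invented for this example, and the script goes through the host's file layer, so caching on the host can distort the numbers. A dedicated load generator pointed at the raw LUN would give more trustworthy results. The 8KB block size and 50/50 read/write mix come from the description above.

```python
import os
import random
import time

BLOCK = 8 * 1024  # 8KB I/O size, as described in the text


def sequential_read_mb_per_s(path, chunk=1024 * 1024):
    """Read the file front to back and return throughput in MB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            total += len(data)
    elapsed = time.perf_counter() - start
    return total / (1024 * 1024) / elapsed


def random_mixed_io(path, seconds, read_fraction=0.5, seed=0):
    """Issue random 8KB reads and writes for about `seconds`; return the op count."""
    rng = random.Random(seed)
    blocks = os.path.getsize(path) // BLOCK
    payload = os.urandom(BLOCK)
    ops = 0
    deadline = time.perf_counter() + seconds
    with open(path, "r+b", buffering=0) as f:
        while time.perf_counter() < deadline:
            # Pick a random block-aligned offset inside the file.
            f.seek(rng.randrange(blocks) * BLOCK)
            if rng.random() < read_fraction:
                f.read(BLOCK)
            else:
                f.write(payload)
            ops += 1
        os.fsync(f.fileno())  # push the writes out before re-measuring
    return ops


def run_experiment(path, random_seconds):
    """Baseline sequential read, random phase, then sequential read again."""
    before = sequential_read_mb_per_s(path)
    random_mixed_io(path, random_seconds)
    after = sequential_read_mb_per_s(path)
    return before, after
```

Calling `run_experiment(path, random_seconds)` with a random phase of a few hours returns the sequential read throughput before and after, which are the two numbers to compare.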

Next time we will discuss the benefit of reallocation, NetApp's answer to fragmentation, and explore how much this really helps the problem.


Posted 11-03-2008 7:39 PM by CalvinZ

Comments

kostadis roussos wrote re: Making sense of WAFL
on 11-04-2008 12:02 AM

Hi!

There are three ways to read this blog post.

1. FAS systems are not appropriate for Exchange.

Rather than argue my case myself, I'll let Avanade argue that our systems are appropriate for Exchange.

media.netapp.com/.../Avanade_Testing_Center_NetApp_Whitepaper_Exchange.pdf

2. FAS systems perform poorly over time.

We have an SPC-1 benchmark which measures a workload similar to the one you describe. This particular SPC-1 benchmark measured our performance over a protracted period of time.

www.storageperformance.org/.../a00062_NetApp_FAS3040-48hr-sustain_executive-summary.pdf

3. FAS systems perform poorly as capacity gets reduced.

Alex McDonald put together a response to that argument.

blogs.netapp.com/.../finding-a-pair.html

Look, at the end of the day, trying to make assertions about WAFL behavior by reverse engineering our architecture is a dangerous game. Techniques and mechanisms for solving those problems exist. And we have smart people working on solving them.

cheers,

kostadis

Sjon wrote re: Making sense of WAFL
on 11-04-2008 8:55 AM

This must be a joke.

Riser wrote re: Making sense of WAFL
on 11-06-2008 4:03 PM

Thank you for this post. It is a clear example of misdirection (although you thought you had hit something).

Your statement "Translating that to the world of the Storage Administrator, this means that gradually the throughput of the FAS degrades over time when the workload has a random component.  Most real world workloads have a random component.  Applications along the line of Microsoft Exchange present a nightmare situation for the FAS." actually describes an ideal situation for FAS, as it is optimized for random workloads.

No matter how powerful your missile, it is useless if you aim wrong.

I will give you an exercise.

1. Set up a realistic random IO test.

2. Take a timing.

3. Scramble your data.

4. Take a timing.

Repeat steps 3 and 4 many times.

I hope you learn something about how irrelevant fragmentation is for random IO.

CalvinZ wrote re: Making sense of WAFL
on 11-08-2008 12:13 AM

Instead of trying to address the comments here (a couple of which had no substance), Karl posted a follow-up that you can find at www.communities.hp.com/.../making-sense-of-wafl-part-2.aspx
