Making sense of WAFL – Part 3 - Around the Storage Block Blog -
Making sense of WAFL – Part 3

By Karl Dohm, HP Storage Architect

Sorry for the delay; I'm finally getting back to this thread with the third installment. For the previous posts, see Making Sense of WAFL and Making Sense of WAFL Part 2. In this series we are trying to seek technical truths in the highly varying posts about NetApp performance, capacity utilization, and usability.

I want to thank Patrick for his post as it was very beneficial in helping to figure out how the Avanade tests were run.  Clearly these tests were run with careful attention to detail.

As I said previously, I do like the notion of doing IOMeter-based throughput tests to compare arrays. Relatively speaking, it is simple to configure, there is little ambiguity, anyone out there who is listening can repeat the test, and it offers us a fair opportunity to compare various arrays in an apples-to-apples fashion. IOMeter can be configured to push nearly any load, so if someone has a favorite workload we can focus on that. Most other approaches are a bit too loose for me, leaving too much room for interpretation and variation.

Patrick mentions the true test, i.e. that there are many happy NetApp customers who are running Exchange. There is truth to this of course, but it isn't a good basis of comparison because every major array vendor has happy Exchange customers. However, it's reasonable to say that these installations can't know what they don't know.

I'm not saying you can't run Exchange successfully with NetApp. In fact I'm sure you can. The open question is whether the user gets good value in choosing NetApp to run Exchange. Are they perhaps buying more iron than they need in order to handle their workload? So if it's ok with everyone, let's stick with simple IOMeter to probe this further and keep things from getting too hazy.

Unfortunately I don't have access to a FAS3070, so I reran the test as described by Patrick on a FAS3050c. My numbers for 128 threads came in at 25.5 MB/s and 2950 IOPS with an average response time of 43 msec - on a completely defragmented LUN (best possible state). This is still a long way off from the Avanade reported results of 48.1 MB/s with an average response time of 29.1 msec on a fragmented LUN.
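As a rough sanity check on these figures, Little's Law relates outstanding IOs, average response time, and IOPS. The sketch below assumes each of the 128 IOMeter threads keeps exactly one IO outstanding; that is an assumption for illustration, not something stated in the test description, and the function name is hypothetical.

```python
def littles_law_iops(outstanding_ios, avg_response_sec):
    """Estimate steady-state IOPS from Little's Law: N = X * R, so X = N / R.

    outstanding_ios: number of IOs in flight (assumed: one per IOMeter thread).
    avg_response_sec: average response time in seconds.
    """
    return outstanding_ios / avg_response_sec


# Figures quoted in the post; 128 outstanding IOs is an assumption.
fas3050c_estimate = littles_law_iops(128, 0.043)    # 43 msec response time
avanade_estimate = littles_law_iops(128, 0.0291)    # 29.1 msec response time

print(f"FAS3050c estimate: {fas3050c_estimate:.0f} IOPS (measured: 2950)")
print(f"Avanade estimate:  {avanade_estimate:.0f} IOPS")
```

Under that assumption the FAS3050c estimate lands close to the measured 2950 IOPS, so those numbers look internally consistent, while the Avanade response time would correspond to roughly half again as many IOPS.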

Rather than comparing this result to our entry level EVA again, I'll just assume that I've done something wrong. So, please help me iterate further toward understanding what it takes to achieve the results reported in the paper.

One interesting clue is Patrick's mention that MPIO was not used in the test. I find this unexpected, as it means either that all the load was run down a single 2Gb FC path to the FAS, or that IOMeter was somehow configured to drive load down multiple paths to the same LUN, or that some other multipathing product was used. Given the heavy, random nature of the load, I would have expected the use of multiple paths just in case the host port became a bottleneck. Perhaps this discrepancy is a starting point for figuring out what is different in the environment.
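To put the single-path question in perspective, here is a minimal sketch comparing the reported throughput against nominal Fibre Channel link bandwidth. It assumes the commonly quoted nominal figures of about 200 MB/s per direction for 2Gb FC and 400 MB/s for 4Gb FC; the table and function names are illustrative only.

```python
# Nominal per-direction payload bandwidth, in MB/s (commonly quoted figures).
NOMINAL_FC_MB_PER_SEC = {"2Gb": 200.0, "4Gb": 400.0}


def link_utilization(throughput_mb_per_sec, link="2Gb"):
    """Return the fraction of nominal link bandwidth consumed by a workload."""
    return throughput_mb_per_sec / NOMINAL_FC_MB_PER_SEC[link]


# Throughput figures quoted in the post.
avanade_on_2gb = link_utilization(48.1, "2Gb")
fas3050c_on_4gb = link_utilization(25.5, "4Gb")

print(f"Avanade result on one 2Gb path: {avanade_on_2gb:.1%} of nominal bandwidth")
print(f"FAS3050c result on one 4Gb path: {fas3050c_on_4gb:.1%} of nominal bandwidth")
```

This only looks at raw bandwidth. It says nothing about per-port queue depth or small-block IOPS limits, which is where a single path would be more likely to matter for a heavy random load.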

Here is some more config info. I'm using a ProLiant DL380 G5 running Windows Server 2003 with an Emulex 4Gb/s dual-ported HBA through 4Gb Brocade switches. The Emulex max queue depth is set wide open to 254. I am using MPIO round robin (RR) to give the FAS the benefit of having multiple paths share the load. The ONTAP version is 7.2.2.


Posted 11-22-2008 5:03 AM by CalvinZ

Comments

Pat Cimprich wrote re: Making sense of WAFL – Part 3
on 11-24-2008 5:15 PM

Hi Karl,

I've posted some further clarifications regarding our testing and my test configuration. You can find those results on my blog here: blog.avanadeadvisor.com/.../12133.aspx.

Pat

John Martin wrote re: Making sense of WAFL – Part 3
on 12-01-2008 5:27 AM

If you're throwing a workload at the system that is designed to be spindle constrained, you'd also need to provide the following information before someone can help answer any performance related questions.

Number and type of disks

RAID Group configuration

How many aggregates

OnTap tuning options (e.g. volume options, and WAFL Setflags etc)

Cluster mode configuration (e.g. Single System Image or other)

NetApp Host Utilities revision on the client

Multipath configuration

etc

You might also want to generate a perfstat report and upload it to a publicly available ftp site for an expert to analyse. That will help isolate exactly what needs to be done to improve the performance of your system.

For example, it's quite possible that if you have a relatively small number of spindles and a cache-unfriendly workload, you'll be better off allocating all of those spindles to a single aggregate rather than creating one aggregate for each controller. Also, if you are planning to run the 3050 in a SAN-only configuration, you might consider some of the tuning optimisations for those workloads too.

If you are running a properly licensed machine that has a serial number registered to HP, then you should have access to a NOW account. If so, you might consider opening a call with NetApp's global support centre; they're more than happy to help customers get the best out of their NetApp kit. I'm sure that HP wouldn't run a system in contravention of a software license agreement, so you should be good to go.

Also, the 3050 and the 3070 are very different beasts: the 3070 has roughly 400% more processing power, twice the memory, 4Gbit FC, 400% larger NVRAM, faster chipsets, etc. I wouldn't expect a 3050 to perform to the same level as a 3070.

Lastly, you should consider upgrading to OnTap 7.3 which improves performance in a number of ways. One of the advantages of determining the block allocation at the time the data is written is that you can continually improve the algorithms for laying out the data.

I hope this helps.

John
