By Karl Dohm, HP Storage Architect
Sorry for the delay; I'm finally getting back to this with the third installment in this thread. For the previous posts, see Making Sense of WAFL and Making Sense of WAFL Part 2. In this series we are trying to seek technical truths in the highly varying posts about NetApp performance, capacity utilization, and usability.
I want to thank Patrick for his post as it was very beneficial in helping to figure out how the Avanade tests were run. Clearly these tests were run with careful attention to detail.
As I said previously, I do like the notion of doing IOMeter based throughput tests to compare arrays. Relatively speaking it is simple to configure, there is little ambiguity, anyone out there who is listening can repeat the test, and it offers us a fair opportunity to compare various arrays in an apples to apples fashion. IOMeter can be configured to push nearly any load, so if someone has a favorite workload we can focus on that. Most other approaches are a bit too loose for me, leaving too much room for interpretation and variation.
Patrick mentions the true test, i.e. that there are many happy NetApp customers who are running Exchange. There is truth to this of course, but it isn't a good basis of comparison because every major array vendor has happy Exchange customers. However, it's reasonable to say that these installations can't know what they don't know.
I'm not saying you can't run Exchange successfully with NetApp. In fact I'm sure you can. The question looking for an answer is whether the user gets good value in choosing NetApp to run Exchange. Are they perhaps buying more iron than they need in order to handle their workload? So if it's ok with everyone, let's stick with simple IOMeter to probe this further and keep things from getting too hazy.
Unfortunately I don't have access to a FAS3070, so I reran the test as described by Patrick on a FAS3050c. My numbers for 128 threads came in at 25.5 MB/s and 2950 IOPS with an average response time of 43 msec - on a completely defragmented LUN (best possible state). This is still a long way off from the Avanade reported results of 48.1 MB/s with an average response time of 29.1 msec on a fragmented LUN.
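As a rough sanity check on my own figures, the three numbers can be tested against each other with Little's Law (throughput = outstanding I/Os / average response time). The sketch below is only illustrative and rests on two assumptions that are not stated above: that each of the 128 threads keeps exactly one I/O outstanding, and that 1 MB means 10^6 bytes. The function names are made up for this example.

```python
def iops_from_littles_law(outstanding_ios: int, avg_response_s: float) -> float:
    """Little's Law: throughput = concurrency / average response time."""
    return outstanding_ios / avg_response_s


def avg_io_size_bytes(mb_per_s: float, iops: float,
                      bytes_per_mb: int = 1_000_000) -> float:
    """Average transfer size implied by a bandwidth and an I/O rate."""
    return mb_per_s * bytes_per_mb / iops


# Figures from the FAS3050c rerun described above.
measured_iops = 2950
measured_mb_per_s = 25.5
avg_response_s = 0.043

# ASSUMPTION: 128 threads means 128 I/Os outstanding at all times.
outstanding_ios = 128

predicted_iops = iops_from_littles_law(outstanding_ios, avg_response_s)
# ASSUMPTION: MB is taken as 10^6 bytes, not 2^20.
implied_io_size = avg_io_size_bytes(measured_mb_per_s, measured_iops)

print(f"IOPS predicted by Little's Law: {predicted_iops:.0f}")
print(f"IOPS measured:                  {measured_iops}")
print(f"Implied average I/O size:       {implied_io_size:.0f} bytes")
```

Under those assumptions the predicted rate lands within about 1% of the measured 2950 IOPS, so the response time, thread count, and IOPS figures are at least consistent with one another.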
Rather than comparing this result to our entry level EVA again, I'll just make an assumption that I've done something wrong. So, please help me further iterate in understanding what it takes to achieve the results as reported in the paper.
One interesting clue Patrick mentioned was that MPIO was not used in the test. I find this unexpected, as it means either that all the load was run down a single 2Gb FC path to the FAS, or that IOMeter was somehow configured to drive load down multiple paths to the same LUN, or that some other multipathing product was used. Given the heavy, random nature of the load, I would have expected the use of multiple paths just in case the host port became a bottleneck. Perhaps this disconnect will help us figure out what is different between the environments.
Here is some more config info. I'm using a ProLiant DL380-G5 running Windows Server 2003 with a dual-ported Emulex 4Gb/s HBA through 4Gb Brocade switches. The Emulex max queue depth is set wide open to 254. I am using MPIO RR to give the FAS the benefit of having multiple paths share the load. ONTAP is 7.2.2.
Posted 11-22-2008 5:03 AM by CalvinZ