<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://www.communities.hp.com/online/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Reality Check: Server Insights : multi-core processor</title><link>http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/multi-core+processor/default.aspx</link><description>Tags: multi-core processor</description><dc:language>en</dc:language><generator>CommunityServer 2008.5 SP1 (Build: 31106.3070)</generator><item><title>I received a new HPC Multi-core server today – documenting the configuration</title><link>http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/2008/12/04/i-received-a-new-hpc-multi-core-server-today-documenting-the-configuration.aspx</link><pubDate>Thu, 04 Dec 2008 14:36:00 GMT</pubDate><guid isPermaLink="false">964d1d0f-bea0-4201-a2aa-8aa369a35a46:86889</guid><dc:creator>d-field</dc:creator><slash:comments>0</slash:comments><wfw:commentRss xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://www.communities.hp.com/online/blogs/reality-check-server-insights/rsscomments.aspx?PostID=86889</wfw:commentRss><comments>http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/2008/12/04/i-received-a-new-hpc-multi-core-server-today-documenting-the-configuration.aspx#comments</comments><description>&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;One of the complexities in benchmarking applications is defining the server configuration adequately.&amp;nbsp; The purpose of a benchmark is to provide guidance - to demonstrate how to obtain a given performance from a specific application workload.&amp;nbsp; This guidance is not useful if the performance cannot be reproduced.&amp;nbsp; To make benchmark results reproducible, it is necessary to define the server configuration, specifying everything that can affect performance.&amp;nbsp; From a hardware point of view, this is not difficult - specify the model numbers of every component in each server, then do the same for the storage and networking components.&amp;nbsp; Some of this information, such as the processor model number, is available on-line.&amp;nbsp; Other information, such as the model or speed of memory DIMMs, is usually not accessible on-line, but this data is important.&amp;nbsp; For example, some current x86 servers have an option of 667MHz or 800MHz DIMMs, and this choice can affect application performance considerably.&lt;/p&gt;
&lt;p&gt;Identifying the software components can be difficult, since you need to know which components affect the performance of the workload.&lt;/p&gt;
&lt;p&gt;And the most obscure configuration area is firmware - in some cases, versions of firmware have a big impact on workload performance.&amp;nbsp; It is rarely necessary to document the firmware version of the server, but it is a good idea to document firmware versions of networking components.&lt;/p&gt;
&lt;p&gt;Next, it is important to know how the quantities of specific components affect performance.&amp;nbsp; Performance varies with the number of disks internal to the servers, the number of controllers connecting the server to external disks, the number and topology of network switches, etc.&amp;nbsp; &lt;/p&gt;
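&lt;p&gt;As a rough sketch of what such a configuration record could look like, the snippet below collects hardware models, component quantities, software versions and firmware versions in one structure and writes it out as JSON next to the benchmark results.&amp;nbsp; The class name, field names and every value shown are placeholders chosen for illustration, not a format used by any particular benchmark.&lt;/p&gt;

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class BenchmarkConfig:
    """Everything that could affect a benchmark result, in one record."""
    hardware_models: dict = field(default_factory=dict)    # component name to model number
    component_counts: dict = field(default_factory=dict)   # component name to quantity
    software_versions: dict = field(default_factory=dict)  # package name to version
    firmware_versions: dict = field(default_factory=dict)  # device name to firmware version


# Placeholder values only; substitute what is actually installed.
config = BenchmarkConfig(
    hardware_models={"processor": "EXAMPLE-CPU-MODEL", "dimm": "EXAMPLE-800MHZ-DIMM"},
    component_counts={"dimms": 16, "internal_disks": 8, "network_switches": 1},
    software_versions={"operating_system": "EXAMPLE-OS-1.0"},
    firmware_versions={"network_switch": "EXAMPLE-FW-1.0"},
)

# Serialize with sorted keys so two records can be compared line by line.
record = json.dumps(asdict(config), indent=2, sort_keys=True)
print(record)
```

&lt;p&gt;Storing one such record per benchmark run makes it easier to see which configuration difference goes with which performance difference.&lt;/p&gt;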
&lt;p&gt;One important variable is the number of memory DIMMs.&amp;nbsp; The number of DIMMs affects performance in two ways - the total amount of memory on the server, and the memory performance.&amp;nbsp; It is useful to run the workload using the maximum number of DIMMs, then repeat the benchmark using ½ as many DIMMs.&amp;nbsp; Memory is expensive, and it is very useful to know how the workload performance varies with memory configuration.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://www.communities.hp.com/online/aggbug.aspx?PostID=86889" width="1" height="1"&gt;</description><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/HPC/default.aspx">HPC</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/servers/default.aspx">servers</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/ProLiant/default.aspx">ProLiant</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/multi-core+processor/default.aspx">multi-core processor</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/performance+measurement/default.aspx">performance measurement</category></item><item><title>Virtualization Never Looked So Good</title><link>http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/2008/11/21/virtualization-never-looked-so-good.aspx</link><pubDate>Fri, 21 Nov 2008 21:33:00 GMT</pubDate><guid isPermaLink="false">964d1d0f-bea0-4201-a2aa-8aa369a35a46:86735</guid><dc:creator>R_Palmer</dc:creator><slash:comments>0</slash:comments><wfw:commentRss 
xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://www.communities.hp.com/online/blogs/reality-check-server-insights/rsscomments.aspx?PostID=86735</wfw:commentRss><comments>http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/2008/11/21/virtualization-never-looked-so-good.aspx#comments</comments><description>&lt;p&gt;Virtualization has become a mainstay, even a business imperative, within most data centers today.&amp;nbsp; Today, every server manufacturer is claiming leadership in virtualization platforms, so the real question you need to ask is, &amp;quot;Why HP?&amp;quot;&amp;nbsp; Well, let me give you a few of the key reasons why HP is the right choice for your trusted advisor when it comes to virtualization.&amp;nbsp; &lt;/p&gt;
&lt;p&gt;Whether it&amp;#39;s for server consolidation, workload balancing or security and disaster recovery purposes, the dramatic growth of virtual machine implementation is mind boggling.&amp;nbsp; This is not necessarily coupled to the comfort level of IT managers though, as there continues to be a disarray of management tools and interfaces required to manage both physical and virtual machines.&amp;nbsp; HP has taken this challenge head on with our new Insight Control Environment (ICE) which gives you a single pane of glass to monitor and manage both your virtual machines and physical machines simultaneously.&lt;/p&gt;
&lt;p&gt;On November 17&lt;sup&gt;th&lt;/sup&gt;, we introduced the new HP ProLiant DL385 G5p server based on the new Quad-Core AMD Opteron 2300 processor (code named &amp;quot;Shanghai&amp;quot;).&amp;nbsp; The HP ProLiant DL385 G5p is designed and optimized specifically for virtualization and consolidation environments.&amp;nbsp; In fact, we looked at every facet of the product in our development effort with a few key principles in mind:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Provide a strong investment protection value for existing AMD Opteron customers&lt;/li&gt;
&lt;li&gt;Improve the overall performance in the same, or lower, power envelope&lt;/li&gt;
&lt;li&gt;Optimize the virtualization performance &lt;/li&gt;
&lt;li&gt;Increase the number of virtual machines on a single platform&lt;/li&gt;
&lt;li&gt;Simplify and improve the security of deploying virtual machines&lt;/li&gt;&lt;/ol&gt;
&lt;p&gt;The new DL385 G5p is now available; it met and exceeded these objectives and arrived ahead of schedule.&amp;nbsp; With the new 45nm &amp;quot;Shanghai&amp;quot; processor and its 6MB L3 cache, we have achieved higher performance and maintained Socket F compatibility, all within a substantially lower power envelope.&amp;nbsp; HP, unlike some of our competitors, has designed the DL385 G5p to provide our customers with the flexibility to choose between single-processor and dual-processor configurations, without sacrificing any of the improvements offered by Shanghai.&amp;nbsp; We have optimized the real estate inside of the 2U chassis to deliver twice the memory, twice the NICs and 6 times the storage capacity of the previous-generation DL385.&amp;nbsp; With 16 DDR2 DIMM sockets, you can reach 128GB of memory using 8GB high-performance DIMMs.&amp;nbsp; This allows for significantly more virtual machines to run on this single platform.&amp;nbsp; &lt;/p&gt;
&lt;p&gt;In addition, with virtual machine deployment, networking capabilities are critical.&amp;nbsp; In the new DL385 G5p, we integrated 4 gigabit Ethernet ports to free up the PCI-Express slots to meet your flexible design requirements and increase the bandwidth for the virtual machines running on the server.&amp;nbsp; The flexibility of the PCI bus slot configuration is another key benefit of the HP ProLiant DL385 G5p.&amp;nbsp; With up to 6 PCI-e slots, additional networking, storage and application specific I/O adapters can be added to improve the virtualization experience.&amp;nbsp; There are many customers who require legacy PCI-X controller support and some that are looking for the latest and greatest x16 PCI-e controller support.&amp;nbsp; In the DL385 G5p, we have given those customers the flexibility to configure the PCI bus slots to meet their unique application and environment needs.&lt;/p&gt;
&lt;p&gt;Storage performance is a key differentiator in the market today, even more so in virtualization environments.&amp;nbsp; That&amp;#39;s why we increased the internal disk storage on the DL385 G5p to sixteen 2.5&amp;quot; SAS or SATA disk drives, or six 3.5&amp;quot; SAS or SATA disk drives.&amp;nbsp; This provides up to 6TB of internal disk storage to be shared between the virtual machines with improved performance when compared to external shared disk solutions.&amp;nbsp; By bringing the disks closer to the processors, bandwidth bottlenecks are minimized and overall storage performance is improved.&lt;/p&gt;
&lt;p&gt;This server is built with efficiency in mind!&amp;nbsp; The HP ProLiant DL385 G5p helps save on energy costs with the industry&amp;#39;s highest efficiency power supplies that currently power the complete portfolio of HP ProLiant servers.&amp;nbsp; Finally, with the internal USB slot, you can utilize an iHypervisor on a USB key that allows you to deploy virtual machines securely and easily.&lt;/p&gt;
&lt;p&gt;In closing I&amp;#39;d like to quote Paul Gottsegen, VP of Marketing for HP&amp;#39;s Industry Standard Servers division, from a recent interview: &lt;i&gt;&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;&lt;i&gt;&amp;quot;Customers can drive down costs through new &amp;quot;Shanghai&amp;quot;-based HP ProLiant servers that set new levels of power efficiency and performance.&amp;nbsp; HP has experienced unparalleled success over the past four years working with AMD in bringing AMD Opteron processor-based platforms to customers of all sizes. Early results indicate &amp;quot;Shanghai&amp;quot; is a winner.&amp;quot;&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;HP continues to lead the market in delivering more ProLiant servers featuring AMD Opteron processors than any other system manufacturer.&amp;nbsp; In fact, HP has shipped over four times more than IBM and 1.5 times more than Dell &lt;i&gt;(IDC- Q208 Server Tracker, Oct.08)&lt;/i&gt;.&amp;nbsp; Have confidence in choosing HP as your trusted advisor as we continue to set the bar for performance, energy efficiency and optimization in virtualization environments.&amp;nbsp; &lt;/p&gt;
&lt;p&gt;For more information click here on the &lt;a href="http://www.hp.com/servers/proliantDL385G5p"&gt;DL385 G5P server&lt;/a&gt; link, or listen to the replay from &lt;a href="http://www.youtube.com/watch?v=nncq0SoiQfE"&gt;Paul Gottsegen at the AMD&lt;/a&gt; Shanghai launch event on November 13&lt;sup&gt;th&lt;/sup&gt;.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://www.communities.hp.com/online/aggbug.aspx?PostID=86735" width="1" height="1"&gt;</description><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/servers/default.aspx">servers</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/virtualization/default.aspx">virtualization</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/x86/default.aspx">x86</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/ProLiant/default.aspx">ProLiant</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/multi-core+processor/default.aspx">multi-core processor</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/AMD/default.aspx">AMD</category></item><item><title>I received a new HPC Multi-core server today – Live from SC08</title><link>http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/2008/11/18/i-received-a-new-hpc-multi-core-server-today-live-from-sc08.aspx</link><pubDate>Tue, 18 Nov 2008 22:59:00 GMT</pubDate><guid isPermaLink="false">964d1d0f-bea0-4201-a2aa-8aa369a35a46:86675</guid><dc:creator>d-field</dc:creator><slash:comments>0</slash:comments><wfw:commentRss 
xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://www.communities.hp.com/online/blogs/reality-check-server-insights/rsscomments.aspx?PostID=86675</wfw:commentRss><comments>http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/2008/11/18/i-received-a-new-hpc-multi-core-server-today-live-from-sc08.aspx#comments</comments><description>&lt;p class="MsoNormal" style="MARGIN:0in 0in 0pt;"&gt;&lt;font size="3"&gt;&lt;font face="Times New Roman"&gt;This week, I am attending the annual SC08 conference.&lt;span style="mso-spacerun:yes;"&gt;&amp;nbsp; &lt;/span&gt;For 2 decades, the tradition in HPC has been to announce and demonstrate new products at the SC conference, and there is a lot to absorb.&lt;span style="mso-spacerun:yes;"&gt;&amp;nbsp; &lt;/span&gt;&lt;/font&gt;&lt;/font&gt;&lt;/p&gt;
&lt;p class="MsoNormal" style="MARGIN:0in 0in 0pt;"&gt;&lt;font face="Times New Roman" size="3"&gt;For me, the event started with a 2-day HP user conference, attracting some of HP’s largest HPC customers.&lt;span style="mso-spacerun:yes;"&gt;&amp;nbsp; &lt;/span&gt;It was a very full 2 days, with lectures on many subjects from customers, Intel, AMD, software development companies, and of course HP.&lt;span style="mso-spacerun:yes;"&gt;&amp;nbsp; &lt;/span&gt;Even though I work here, I learned a lot about projects in other parts of the company.&lt;/font&gt;&lt;/p&gt;
&lt;p class="MsoNormal" style="MARGIN:0in 0in 0pt;"&gt;&lt;font face="Times New Roman" size="3"&gt;At the SC08 show, I saw my 1&lt;sup&gt;st&lt;/sup&gt; HP POD – portable optimized datacenter – a 40-foot-long shipping container on the outside, and a self-contained computer facility on the inside.&lt;span style="mso-spacerun:yes;"&gt;&amp;nbsp; &lt;/span&gt;This excellently-designed mobile facility can contain 3500 servers or 12,000 disk drives, in standard 19” racks.&lt;span style="mso-spacerun:yes;"&gt;&amp;nbsp; &lt;/span&gt;Just provide a flat location, power, chilled water, and a network cable, and you have a new computer room.&lt;/font&gt;&lt;/p&gt;
&lt;p class="MsoNormal" style="MARGIN:0in 0in 0pt;"&gt;&lt;font face="Times New Roman" size="3"&gt;To make the week more interesting, AMD announced the new Opteron Shanghai processor – higher clock speed, bigger cache, faster Northbridge, and improved performance:power ratio.&lt;/font&gt;&lt;/p&gt;
&lt;p class="MsoNormal" style="MARGIN:0in 0in 0pt;"&gt;&lt;font face="Times New Roman" size="3"&gt;One of the good things about big companies is that they spawn startups.&lt;span style="mso-spacerun:yes;"&gt;&amp;nbsp; &lt;/span&gt;This year, a startup company announced a new HPC architecture at SC08.&lt;span style="mso-spacerun:yes;"&gt;&amp;nbsp; &lt;/span&gt;Nearly all the employees are my friends and previous HP co-workers.&lt;span style="mso-spacerun:yes;"&gt;&amp;nbsp;&amp;nbsp; &lt;/span&gt;Check out &lt;/font&gt;&lt;a href="http://www.conveycomputer.com/"&gt;&lt;font face="Times New Roman" size="3"&gt;www.conveycomputer.com&lt;/font&gt;&lt;/a&gt;&lt;/p&gt;&lt;font face="Times New Roman" size="3"&gt;&amp;nbsp;&lt;/font&gt; 
&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://www.communities.hp.com/online/aggbug.aspx?PostID=86675" width="1" height="1"&gt;</description><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/HPC/default.aspx">HPC</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/cooling/default.aspx">cooling</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/power/default.aspx">power</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/container/default.aspx">container</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/servers/default.aspx">servers</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/datacenter/default.aspx">datacenter</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/HP+POD/default.aspx">HP POD</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/pod/default.aspx">pod</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/x86/default.aspx">x86</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/multi-core+processor/default.aspx">multi-core processor</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/AMD/default.aspx">AMD</category></item><item><title>AMD launches quad-core Shanghai 
processors</title><link>http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/2008/11/17/amd-launches-quad-core-shanghai-processors.aspx</link><pubDate>Mon, 17 Nov 2008 01:47:00 GMT</pubDate><guid isPermaLink="false">964d1d0f-bea0-4201-a2aa-8aa369a35a46:86632</guid><dc:creator>aimeeschoaf</dc:creator><slash:comments>0</slash:comments><wfw:commentRss xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://www.communities.hp.com/online/blogs/reality-check-server-insights/rsscomments.aspx?PostID=86632</wfw:commentRss><comments>http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/2008/11/17/amd-launches-quad-core-shanghai-processors.aspx#comments</comments><description>&lt;p&gt;&lt;a title="&amp;quot;Shanghai&amp;quot; written in Chinese" href="http://en.wikipedia.org/wiki/Image:Zh-Shanghai.svg"&gt;&lt;/a&gt;This is an exciting time for our partner, AMD, as they launch their newest family of processors, and HP is pleased that our ProLiant portfolio will offer new computing capabilities to customers of all sizes.&amp;nbsp; The new &amp;quot;Shanghai&amp;quot; products (Quad-Core AMD Opteron 2300 processors) will help HP customers gain greater computing power and business results when they build their infrastructure on HP ProLiant ML, DL and BL servers.&amp;nbsp; Whether they are using virtualization technology to optimize their IT assets or they are consolidating their server inventory, HP ProLiant servers featuring the Quad-Core AMD Opteron 2300 processors will help customers do more business in a smaller footprint.&amp;nbsp; &lt;/p&gt;
&lt;p&gt;Around the globe, customers using AMD Opteron-based HP ProLiant servers value investment protection, consistent performance improvement, power efficiency and optimized design elements that enhance their virtualization strategies.&amp;nbsp; The Quad-Core AMD Opteron 2300 processors deliver on these expectations with the larger 6MB L3 cache, Socket F compatibility, decreased power envelope, improved performance/watt and enhanced virtualization indexing.&amp;nbsp; Watch for news on HP ProLiant offerings featuring the Quad-Core AMD Opteron 2300 processors.&amp;nbsp; &lt;/p&gt;
&lt;p&gt;- Rich Palmer&lt;/p&gt;
&lt;p&gt;Director, HP Technology Strategy&lt;/p&gt;
&lt;p&gt;Industry Standard Servers&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://www.communities.hp.com/online/aggbug.aspx?PostID=86632" width="1" height="1"&gt;</description><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/servers/default.aspx">servers</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/x86/default.aspx">x86</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/BladeSystem/default.aspx">BladeSystem</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/ProLiant/default.aspx">ProLiant</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/multi-core+processor/default.aspx">multi-core processor</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/AMD/default.aspx">AMD</category></item><item><title>I received a new HPC Multi-core server today – How to measure the power usage</title><link>http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/2008/11/12/i-received-a-new-hpc-multi-core-server-today-how-to-measure-the-power-usage.aspx</link><pubDate>Wed, 12 Nov 2008 00:38:00 GMT</pubDate><guid isPermaLink="false">964d1d0f-bea0-4201-a2aa-8aa369a35a46:86582</guid><dc:creator>d-field</dc:creator><slash:comments>0</slash:comments><wfw:commentRss xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://www.communities.hp.com/online/blogs/reality-check-server-insights/rsscomments.aspx?PostID=86582</wfw:commentRss><comments>http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/2008/11/12/i-received-a-new-hpc-multi-core-server-today-how-to-measure-the-power-usage.aspx#comments</comments><description>&lt;p&gt;Given that it is important to measure 
power usage and correlate it to application performance, how do you measure the power?&lt;/p&gt;
&lt;p&gt;We use 2 different methods - one for rack-mounted servers and another for blade servers. &amp;nbsp;The rack-mounted servers do not provide power meters, so we bought a power meter.&amp;nbsp; We plug the server into the power meter, so we are measuring the total power used. &amp;nbsp;Then, with a simple PC interface, we allow the application user on the server to obtain continuous power data which is easy to correlate with the applications.&amp;nbsp; &lt;/p&gt;
&lt;p&gt;This is easy for the users, but it requires planning and logistics and some work by our system managers, to connect the meter to the right server at the right time.&lt;/p&gt;
&lt;p&gt;We often want to measure the power of a cluster running one HPC application in parallel, and it is usually sufficient to measure the power of any one server in the cluster running the application.&lt;/p&gt;
&lt;p&gt;It is easier to measure power on an HP blade enclosure, since the enclosure contains power measurement capability and provides this data in a usable way.&amp;nbsp; The available data includes the total enclosure power and also the power used by each blade server and each fan in the enclosure.&amp;nbsp; We integrated this information with the Platform Computing LSF job scheduler. &amp;nbsp;Now, users of our blade servers submit their jobs via LSF and automatically receive their power usage data as part of the job.&lt;/p&gt;
&lt;p&gt;Next week, I expect to post a message from the SC08 conference.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://www.communities.hp.com/online/aggbug.aspx?PostID=86582" width="1" height="1"&gt;</description><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/HPC/default.aspx">HPC</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/power/default.aspx">power</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/servers/default.aspx">servers</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/management/default.aspx">management</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/BladeSystem/default.aspx">BladeSystem</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/ProLiant/default.aspx">ProLiant</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/multi-core+processor/default.aspx">multi-core processor</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/performance+measurement/default.aspx">performance measurement</category></item><item><title>I received a new HPC Multi-core server today – Intro to power measurement</title><link>http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/2008/11/06/i-received-a-new-hpc-multi-core-server-today-intro-to-power-measurement.aspx</link><pubDate>Thu, 06 Nov 2008 00:02:00 GMT</pubDate><guid isPermaLink="false">964d1d0f-bea0-4201-a2aa-8aa369a35a46:86515</guid><dc:creator>d-field</dc:creator><slash:comments>0</slash:comments><wfw:commentRss 
xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://www.communities.hp.com/online/blogs/reality-check-server-insights/rsscomments.aspx?PostID=86515</wfw:commentRss><comments>http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/2008/11/06/i-received-a-new-hpc-multi-core-server-today-intro-to-power-measurement.aspx#comments</comments><description>&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Until a couple of years ago, when we referred to performance measurement of an application, we meant the amount of time that it took the job to run vs. the specific resources it used - number of cores, number of servers if you are using a cluster, the specific characteristics of the server cores and memory and other server specs, plus IO/storage resources and specs.&amp;nbsp; &lt;/p&gt;
&lt;p&gt;Basically, we only measured one thing, the elapsed time of the job.&amp;nbsp; Then, using the resources and specs, we computed lots of things - throughput efficiency, parallel scalability and efficiency, performance per core or per server, IO metrics, etc, etc.&lt;/p&gt;
&lt;p&gt;Now, we make an additional measurement - power utilization, which we correlate in time with the execution of an application.&amp;nbsp; We want to know the average power used during the execution of a single job, and we also look at the variation of power during a job, and the maximum power used.&lt;/p&gt;
&lt;p&gt;Of course, lots of people measure power used by computers.&amp;nbsp; But, since most of these people are system managers or system designers, they don&amp;#39;t have a reason to correlate power with specific applications and compute jobs. &amp;nbsp;They want to know the average and peak power used to run their overall workload, so they can plan for current and future power requirements. &amp;nbsp;This is important work, but it does not give them the ability to optimize their workload.&lt;/p&gt;
&lt;p&gt;If you measure the average power used during the execution of one compute job, and you multiply that power by the elapsed time of the job, you have Application Energy - the electrical energy used to run that specific job. &amp;nbsp;This is a very convenient quantity, since it gives you a single number that relates power usage to compute jobs. &amp;nbsp;You can use Application Energy to optimize your workload, just as you use elapsed time.&lt;/p&gt;
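&lt;p&gt;The arithmetic can be sketched in a few lines.&amp;nbsp; The example below assumes the power meter delivers a list of (seconds since job start, watts) readings; it integrates them over time, which is equivalent to multiplying the average power by the elapsed time, and also reports the peak power.&amp;nbsp; The function name, the sample format and the readings are invented for illustration.&lt;/p&gt;

```python
def application_energy(samples):
    """Summarize power samples taken while one job ran.

    samples: list of (seconds_since_job_start, watts) pairs in time order.
    """
    if len(samples) in (0, 1):
        raise ValueError("need at least two power samples")
    elapsed = samples[-1][0] - samples[0][0]
    # Integrate power over time with the trapezoid rule to get joules.
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        joules += 0.5 * (p0 + p1) * (t1 - t0)
    return {
        "elapsed_s": elapsed,
        "average_w": joules / elapsed,  # average power times elapsed time is the energy
        "peak_w": max(watts for _, watts in samples),
        "energy_j": joules,
        "energy_kwh": joules / 3.6e6,   # 1 kWh is 3.6 million joules
    }


# Made-up readings: 200 W at the start, 300 W at 10 s and at 20 s.
summary = application_energy([(0, 200.0), (10, 300.0), (20, 300.0)])
print(summary)
```

&lt;p&gt;For these made-up readings the job runs for 20 seconds at an average of 275 W, so its Application Energy is 5500 joules.&lt;/p&gt;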
&lt;p&gt;A couple of examples:&lt;/p&gt;
&lt;p&gt;1. &amp;nbsp;You can measure Application Energy for a given set of applications on two or more different server models, and then select the more energy-efficient model. &amp;nbsp;You can use this Application Energy comparison together with an elapsed time comparison, and then make speed vs. energy tradeoffs. &amp;nbsp;If a job runs 30% faster but consumes 50% more Application Energy on Server A than it does on Server B, which is the better choice for your requirements?&lt;/p&gt;
&lt;p&gt;2. &amp;nbsp;We are also using Application Energy to determine the most efficient way to run applications which run in parallel on a cluster of servers - a common way to run HPC codes. &amp;nbsp;&amp;nbsp;For one common HPC application, we ran the same job at 3 levels of parallelization and compared the elapsed times and Application Energy. &amp;nbsp;We showed that the job used only 4% more Application Energy running 32-way-parallel (on 32 cores) vs. 16-way parallel. &amp;nbsp;But the job used 20% more Application Energy running 64-way-parallel vs. 32-way-parallel. &amp;nbsp;In other words, there is very little energy cost using 32 cores and returning the results to the user much faster vs. using 16 cores. &amp;nbsp;But there is a substantial energy cost to use 64 cores, which returns the results even faster.&lt;/p&gt;
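&lt;p&gt;To put the three runs in example 2 on one scale, the step-to-step percentages can be chained so that each level is expressed relative to the 16-way run.&amp;nbsp; This is only a small sketch using the two percentages quoted above; the function name is made up, and elapsed times are left out because they are not given here.&lt;/p&gt;

```python
def chain_relative_energy(levels, step_increases):
    """Express Application Energy at each level relative to the first level.

    levels: parallelism levels in increasing order, e.g. [16, 32, 64].
    step_increases: fractional increase from each level to the next.
    """
    relative = {levels[0]: 1.0}
    for prev, cur, increase in zip(levels, levels[1:], step_increases):
        relative[cur] = relative[prev] * (1.0 + increase)
    return relative


# 4% more energy going from 16-way to 32-way, 20% more going from 32-way to 64-way.
relative = chain_relative_energy([16, 32, 64], [0.04, 0.20])
for cores in sorted(relative):
    print(cores, round(relative[cores], 3))
```

&lt;p&gt;Chained this way, the 64-way run uses roughly 25% more Application Energy than the 16-way run.&lt;/p&gt;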
&lt;p&gt;Does anyone find this interesting, or agree (or disagree) with this approach?&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://www.communities.hp.com/online/aggbug.aspx?PostID=86515" width="1" height="1"&gt;</description><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/HPC/default.aspx">HPC</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/scale-out/default.aspx">scale-out</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/power/default.aspx">power</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/servers/default.aspx">servers</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/ProLiant/default.aspx">ProLiant</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/multi-core+processor/default.aspx">multi-core processor</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/performance+measurement/default.aspx">performance measurement</category></item><item><title>I received a new HPC Multi-core server today – Running on a subset of cores</title><link>http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/2008/10/29/i-received-a-new-hpc-multi-core-server-today-running-on-a-subset-of-cores.aspx</link><pubDate>Wed, 29 Oct 2008 20:57:00 GMT</pubDate><guid isPermaLink="false">964d1d0f-bea0-4201-a2aa-8aa369a35a46:86382</guid><dc:creator>d-field</dc:creator><slash:comments>0</slash:comments><wfw:commentRss 
xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://www.communities.hp.com/online/blogs/reality-check-server-insights/rsscomments.aspx?PostID=86382</wfw:commentRss><comments>http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/2008/10/29/i-received-a-new-hpc-multi-core-server-today-running-on-a-subset-of-cores.aspx#comments</comments><description>&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;In some situations, it is useful to leave some of the cores on a server unused.&amp;nbsp; Most processors do not have sufficient memory BW to support a memory BW-intensive code running on all cores, so such codes do not &amp;quot;scale&amp;quot; perfectly.&amp;nbsp; There are 2 common ways to define scaling: by a throughput workload (multiple serial jobs running at once), or by a single parallel job.&amp;nbsp; &lt;/p&gt;
&lt;p&gt;For a throughput workload: if scaling is perfect, then an 8-core server runs 8 simultaneous copies of a serial job in the same time as it runs one copy.&amp;nbsp; &lt;/p&gt;
&lt;p&gt;For a single parallel job: if scaling is perfect, then an 8-core server runs an 8-way-parallel job 8 times faster than the same job run serially.&lt;/p&gt;
&lt;p&gt;Most HPC jobs cannot scale perfectly, so the extra cores deliver less than their full share of performance.&amp;nbsp; Even so, in most cases the server completes more total work (jobs per day) using all of its cores than it does with some cores unused.&amp;nbsp; So why would we consider leaving some cores idle?&amp;nbsp; The primary reason is the cost of running licensed applications.&amp;nbsp; Many HPC applications are licensed on a per-core basis, although the cost may not be linear with the number of cores.&amp;nbsp; It is useful to compare the per-core job performance to the per-core license cost to determine the best performance-to-cost operating point.&lt;/p&gt;
&lt;p&gt;Using a subset of the cores may be useful, but doing so correctly is difficult.&amp;nbsp; You need to know the architecture of your server.&amp;nbsp; Here is an example of an HP ProLiant server containing two Intel Xeon Harpertown quad-core processors.&amp;nbsp; &lt;/p&gt;
&lt;p&gt;-Each processor has a separate connection to the memory system.&lt;/p&gt;
&lt;p&gt;-Each processor has 4 cores.&amp;nbsp; Each pair of cores shares a data cache.&amp;nbsp; The 4 cores share the processor&amp;#39;s memory BW.&lt;/p&gt;
&lt;p&gt;If you draw a picture of this, you will see that not all combinations of cores are equal in terms of cache size and memory BW resources.&lt;/p&gt;
&lt;p&gt;Let&amp;#39;s say we want to run a workload consisting of one parallel job, using only 4 cores in the server.&amp;nbsp; The best performance is obtained using 2 cores on each processor, selecting cores which do not share a cache (so that each core has a shared cache entirely to itself).&amp;nbsp; The next-best performance uses 2 cores per processor, selecting cores which share a cache.&amp;nbsp; The 3&lt;sup&gt;rd&lt;/sup&gt;-best performance uses 3 cores on one processor and 1 core on the other processor.&amp;nbsp; The worst performance is obtained using all 4 cores on one processor.&lt;/p&gt;
&lt;p&gt;If we want to run a single parallel job on only 2 cores in the server, there are three possible choices.&amp;nbsp; The best performance uses 1 core per processor.&amp;nbsp; The next-best performance uses 2 cores on one processor, selecting cores which do not share a cache.&amp;nbsp; Third best uses 2 cores on one processor, selecting cores which share a cache.&lt;/p&gt;
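&lt;p&gt;Here is a minimal Python sketch of that selection rule: spread the job across processors first, then across shared caches within a processor.&amp;nbsp; The core numbering in TOPOLOGY is an assumption made for illustration (cores 0-3 on processor 0, cores 4-7 on processor 1, adjacent pairs sharing a cache); it is not the numbering of any particular server.&amp;nbsp; pick_cores and interleave are hypothetical helper names.&lt;/p&gt;
&lt;pre&gt;
```python
from collections import defaultdict
from itertools import zip_longest

# ASSUMED numbering, for illustration only: core -> (processor, shared cache).
TOPOLOGY = {
    0: (0, 0), 1: (0, 0), 2: (0, 1), 3: (0, 1),
    4: (1, 2), 5: (1, 2), 6: (1, 3), 7: (1, 3),
}

def interleave(groups):
    """Round-robin merge: first item of every group, then second items, etc."""
    merged = []
    for row in zip_longest(*groups):
        merged.extend(item for item in row if item is not None)
    return merged

def pick_cores(topology, n):
    """Choose n cores, spreading over processors first, then over caches."""
    caches_by_processor = defaultdict(lambda: defaultdict(list))
    for core, (processor, cache) in sorted(topology.items()):
        caches_by_processor[processor][cache].append(core)
    per_processor = [
        interleave(list(caches.values()))
        for _, caches in sorted(caches_by_processor.items())
    ]
    return sorted(interleave(per_processor)[:n])

print(pick_cores(TOPOLOGY, 4))  # 2 cores per processor, no shared cache
print(pick_cores(TOPOLOGY, 2))  # 1 core per processor
```
&lt;/pre&gt;
&lt;p&gt;On Linux, a call such as os.sched_setaffinity(0, pick_cores(TOPOLOGY, 4)) would restrict the current process to the chosen cores, provided the topology table matches the real machine.&lt;/p&gt;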
&lt;p&gt;How big is the performance difference based on these choices?&amp;nbsp; The answer depends on the application, the specific input data, and other factors.&amp;nbsp;&amp;nbsp; But here is an example, for a moderately parallel HPC application.&amp;nbsp; Running the code in parallel using one of its standard performance benchmarks, it runs 4 times faster using all cores (8-way-parallel) than using 1 core (serial).&lt;/p&gt;
&lt;p&gt;Running 4-way-parallel using the above choices of 4 cores, the performance varies from 3.4 (best case) to 2.9 (next-best case) to 2.4 (third-best case) to 2.3 (worst case) times faster than serial.&lt;/p&gt;
&lt;p&gt;Running 2-way-parallel using the above choices of 2 cores, the performance is either 2.0, 1.8, or 1.5 times faster than serial.&lt;/p&gt;
&lt;p&gt;Clearly we can lose a lot of performance if we do not select the cores carefully!&lt;/p&gt;
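&lt;p&gt;To put numbers on that, the short Python sketch below takes the speedups quoted above and computes two things: the parallel efficiency (speedup divided by cores used) and the fraction of the best-case speedup that the worst placement gives up.&amp;nbsp; The function names are mine; the speedup figures are the ones from this post.&lt;/p&gt;
&lt;pre&gt;
```python
# Speedups over serial, copied from the measurements above.
# Keys are core counts; values are listed from best to worst placement.
SPEEDUPS = {
    2: [2.0, 1.8, 1.5],
    4: [3.4, 2.9, 2.4, 2.3],
    8: [4.0],
}

def efficiency(speedup, cores):
    """Fraction of perfect scaling achieved (1.0 means perfect)."""
    return speedup / cores

def placement_loss(speedups):
    """Fraction of the best-case speedup lost by the worst placement."""
    return (max(speedups) - min(speedups)) / max(speedups)

for cores, speedups in SPEEDUPS.items():
    effs = ", ".join(f"{efficiency(s, cores):.2f}" for s in speedups)
    print(f"{cores} cores: efficiency {effs}; "
          f"worst placement loses {placement_loss(speedups):.0%}")
```
&lt;/pre&gt;
&lt;p&gt;With these figures, the worst 2-core placement gives up 25% of the best-case speedup, and the worst 4-core placement gives up roughly 32%.&lt;/p&gt;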
&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;This analysis would be different for different processors or different server configurations.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://www.communities.hp.com/online/aggbug.aspx?PostID=86382" width="1" height="1"&gt;</description><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/HPC/default.aspx">HPC</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/servers/default.aspx">servers</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/ProLiant/default.aspx">ProLiant</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/multi-core+processor/default.aspx">multi-core processor</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/performance+measurement/default.aspx">performance measurement</category></item><item><title>I received a new HPC Multi-core server today – Measuring memory BW</title><link>http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/2008/10/21/i-received-a-new-hpc-multi-core-server-today-measuring-memory-bw.aspx</link><pubDate>Tue, 21 Oct 2008 14:39:00 GMT</pubDate><guid isPermaLink="false">964d1d0f-bea0-4201-a2aa-8aa369a35a46:86221</guid><dc:creator>d-field</dc:creator><slash:comments>0</slash:comments><wfw:commentRss xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://www.communities.hp.com/online/blogs/reality-check-server-insights/rsscomments.aspx?PostID=86221</wfw:commentRss><comments>http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/2008/10/21/i-received-a-new-hpc-multi-core-server-today-measuring-memory-bw.aspx#comments</comments><description>&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;The lmbench memory latency benchmark gave us a lot of information about the new system.&amp;nbsp; Next, we ran the STREAM memory BW benchmark suite.&lt;/p&gt;
&lt;p&gt;Before running HPC codes, we needed to ensure that hardware multi-threading is disabled, if the server supports it.&amp;nbsp; This feature allows each core on the server to appear to the OS as if it were 2 cores.&amp;nbsp; For many applications, this technique increases throughput.&amp;nbsp; But for nearly every HPC application, a server has too little memory BW to share it among additional hardware threads.&lt;/p&gt;
&lt;p&gt;We then measured memory BW with the industry-standard benchmark, STREAM.&lt;/p&gt;
&lt;p&gt;There are four benchmarks in the STREAM suite: COPY, ADD, SCALE, and TRIAD.&amp;nbsp; There are multiple ways to run each of these benchmarks.&amp;nbsp; We started by running each benchmark in serial (1 non-threaded copy of each benchmark), one at a time.&amp;nbsp; This shows us the maximum amount of memory BW that one core can consume.&lt;/p&gt;
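&lt;p&gt;For readers who have not seen STREAM, here is a rough Python rendering of its four kernels and of the way bytes moved are converted to a BW figure.&amp;nbsp; This is only a sketch of the arithmetic, assuming 8-byte doubles; it is not the STREAM source code, and the BW it prints is limited by the Python interpreter, not by the memory system.&amp;nbsp; Use the real STREAM benchmark for measurements.&lt;/p&gt;
&lt;pre&gt;
```python
import time

BYTES_PER_DOUBLE = 8
# Arrays touched (read + written) per loop iteration in each kernel.
ARRAYS_TOUCHED = {"COPY": 2, "SCALE": 2, "ADD": 3, "TRIAD": 3}

def copy(a):
    return [x for x in a]

def scale(c, q):
    return [q * x for x in c]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def triad(b, c, q):
    return [x + q * y for x, y in zip(b, c)]

def bandwidth_mb_per_s(kernel, n, seconds):
    """STREAM-style accounting: bytes moved divided by elapsed time."""
    return ARRAYS_TOUCHED[kernel] * BYTES_PER_DOUBLE * n / seconds / 1e6

n = 1_000_000
b, c = [1.0] * n, [2.0] * n
start = time.perf_counter()
a = triad(b, c, 3.0)
elapsed = time.perf_counter() - start
print(f"TRIAD: {bandwidth_mb_per_s('TRIAD', n, elapsed):.0f} MB/s (interpreter-bound)")
```
&lt;/pre&gt;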
&lt;p&gt;Next, we measured memory BW consumed by multiples of 2 cores, up to the maximum number of cores.&amp;nbsp; There are 2 obvious ways to do this: &amp;nbsp;run multiple simultaneous copies of the serial benchmark, or run a single benchmark that is multi-threaded using SMP parallelism (via OpenMP).&amp;nbsp; We usually run the SMP-parallel STREAM benchmarks.&lt;/p&gt;
&lt;p&gt;We ran each SMP-parallel STREAM benchmark from 2 cores to the maximum, in 2 different arrangements:&lt;/p&gt;
&lt;p&gt;-Packed - use all cores on each processor before going to next processor&lt;/p&gt;
&lt;p&gt;-Cyclic - alternate cores among the processors&lt;/p&gt;
&lt;p&gt;In Packed mode, the memory BW usually increases monotonically with the number of cores.&lt;/p&gt;
&lt;p&gt;In Cyclic mode, the memory BW has a zig-zag pattern and often approaches the maximum memory BW of the system when only one core per processor is used.&lt;/p&gt;
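&lt;p&gt;The two arrangements can be written down as core lists.&amp;nbsp; The Python sketch below assumes, purely for illustration, that cores are numbered contiguously within each processor (processor 0 owns cores 0-3, processor 1 owns cores 4-7); real core numbering varies by server model.&amp;nbsp; packed_order and cyclic_order are hypothetical helper names.&lt;/p&gt;
&lt;pre&gt;
```python
def packed_order(processors, cores_per_processor):
    """Fill every core on one processor before moving to the next."""
    return [p * cores_per_processor + c
            for p in range(processors)
            for c in range(cores_per_processor)]

def cyclic_order(processors, cores_per_processor):
    """Alternate among processors, taking one core from each in turn."""
    return [p * cores_per_processor + c
            for c in range(cores_per_processor)
            for p in range(processors)]

# Core lists for 2, 4, 6, and 8 threads on a 2-processor, 8-core server.
for threads in (2, 4, 6, 8):
    print(threads, packed_order(2, 4)[:threads], cyclic_order(2, 4)[:threads])
```
&lt;/pre&gt;
&lt;p&gt;The first N entries of each list are the cores to which the N benchmark threads would be bound.&lt;/p&gt;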
&lt;p&gt;By observing these measurements, it is possible to determine how the memory BW is allocated to the processors.&amp;nbsp; Each processor may have a unique path to the memory system and therefore an amount of memory BW that is (roughly) independent of the memory usage of the other processors.&amp;nbsp; Or, two or more processors may share a path to memory.&lt;/p&gt;
&lt;p&gt;Normally, the maximum measurement for each benchmark occurs when all cores in the server are used.&amp;nbsp;&amp;nbsp; But we have seen strangely architected systems which reach a maximum memory BW using a subset of the cores, after which memory BW declines with additional cores.&lt;/p&gt;
&lt;p&gt;One interesting bit of information is obtained by comparing the memory BW consumed by all the cores on one processor with the BW consumed by 1 core.&amp;nbsp; This tells us the fraction of the processor&amp;#39;s memory BW that a single core can consume.&amp;nbsp; In many cases, one core can consume the entire memory BW available to the processor, and one core can often consume a significant fraction of the total system&amp;#39;s memory BW.&amp;nbsp; When applications requiring high memory BW are run on such a server, it is impractical to run the application on more than one core per processor (or even more than one core per server).&lt;/p&gt;
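&lt;p&gt;As a sketch of that comparison in Python: the two BW values below are placeholders chosen only to show the arithmetic, not measurements from any server, and the function names are hypothetical.&amp;nbsp; The second function makes the simplifying assumption that the BW demand of several cores adds up linearly.&lt;/p&gt;
&lt;pre&gt;
```python
import math

def single_core_fraction(bw_one_core, bw_full_processor):
    """Share of one processor's measured memory BW that a single core consumes."""
    return bw_one_core / bw_full_processor

def cores_until_saturation(bw_one_core, bw_full_processor):
    """Smallest core count whose combined demand meets the processor's BW
    (assumes per-core demand simply adds up)."""
    return math.ceil(bw_full_processor / bw_one_core)

# PLACEHOLDER values in GB/s, for illustration only.
one_core, full_processor = 3.0, 4.0
print(f"one core uses {single_core_fraction(one_core, full_processor):.0%} "
      f"of the processor's memory BW")
print(f"saturated at {cores_until_saturation(one_core, full_processor)} cores")
```
&lt;/pre&gt;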
&lt;p&gt;All these measurements may seem excessive, but memory BW is often the performance limiter in HPC applications.&amp;nbsp; If an application developer knows the STREAM information, the application can be run in an optimal way on a server - using a subset of the cores to run the application, without demanding more memory BW per core than the system can provide.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://www.communities.hp.com/online/aggbug.aspx?PostID=86221" width="1" height="1"&gt;</description><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/HPC/default.aspx">HPC</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/servers/default.aspx">servers</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/ProLiant/default.aspx">ProLiant</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/multi-core+processor/default.aspx">multi-core processor</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/performance+measurement/default.aspx">performance measurement</category></item><item><title>I received a new HPC Multi-core server– Learning from standard benchmarks</title><link>http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/2008/10/15/i-received-a-new-hpc-multi-core-server-learning-from-standard-benchmarks.aspx</link><pubDate>Wed, 15 Oct 2008 22:07:00 GMT</pubDate><guid isPermaLink="false">964d1d0f-bea0-4201-a2aa-8aa369a35a46:86159</guid><dc:creator>d-field</dc:creator><slash:comments>0</slash:comments><wfw:commentRss 
xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://www.communities.hp.com/online/blogs/reality-check-server-insights/rsscomments.aspx?PostID=86159</wfw:commentRss><comments>http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/2008/10/15/i-received-a-new-hpc-multi-core-server-learning-from-standard-benchmarks.aspx#comments</comments><description>&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Now that we have configured the hardware components and firmware settings in a known and hopefully optimum way, it is time to run the 1&lt;sup&gt;st&lt;/sup&gt; performance test.&amp;nbsp; Personally, I like to run the memory latency benchmark lmbench first, since there is a lot to learn from it.&amp;nbsp; This benchmark measures the time required to move different amounts of data from the caches and memory to a core.&lt;/p&gt;
&lt;p&gt;A modern server has multiple levels of caches - 2 or 3 levels.&amp;nbsp; The highest-numbered level (the last one before memory) may be shared among several cores.&amp;nbsp; The output of lmbench shows a plateau for each level of cache, revealing both the latency and the size of that cache.&amp;nbsp; If the system has NUMA memory organization, then lmbench shows the latency for each of the NUMA &amp;quot;hops&amp;quot;, as the data traverses the system topology.&lt;/p&gt;
&lt;p&gt;Usually, the latency of each cache level is a fixed number of processor clock cycles.&amp;nbsp; It is both interesting and important to know this number.&amp;nbsp; It&amp;#39;s interesting, because it allows you to compare different cache architectures, even if the systems you are comparing have different clock speeds.&amp;nbsp; For example, it is interesting to me that the 1&lt;sup&gt;st&lt;/sup&gt; level cache latency of several modern servers, with very different architectures, is 3 cycles.&lt;/p&gt;
&lt;p&gt;And it is important, because you are not always sure what clock speed your system is running at, and you can compute it using lmbench.&amp;nbsp; Or, you might encounter the problem we hit yesterday - on a 2-processor pre-production server, the two processors were running at 2 different clock speeds!&amp;nbsp; This is of course not good - a user observed strange performance, but lmbench took the mystery out of the problem and told us exactly what was happening.&lt;/p&gt;
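&lt;p&gt;The clock-speed calculation is simple division: if a cache level takes a known number of cycles and the benchmark reports its latency in nanoseconds, then cycles divided by nanoseconds gives the clock in GHz.&amp;nbsp; The Python sketch below uses the 3-cycle 1&lt;sup&gt;st&lt;/sup&gt;-level latency mentioned above; the nanosecond values are hypothetical, not measurements, and the function names are mine.&lt;/p&gt;
&lt;pre&gt;
```python
def clock_ghz(latency_ns, latency_cycles):
    """Clock frequency implied by a cache latency known in both ns and cycles."""
    return latency_cycles / latency_ns

def clocks_match(latency_ns_a, latency_ns_b, tolerance=0.02):
    """True if two processors show the same latency within a relative tolerance."""
    difference = abs(latency_ns_a - latency_ns_b)
    return tolerance >= difference / max(latency_ns_a, latency_ns_b)

L1_CYCLES = 3  # 1st-level cache latency in cycles, as discussed above

# HYPOTHETICAL 1st-level latencies in ns, one per processor.
processor_0_ns, processor_1_ns = 1.0, 1.2
print(f"processor 0: {clock_ghz(processor_0_ns, L1_CYCLES):.2f} GHz")
print(f"processor 1: {clock_ghz(processor_1_ns, L1_CYCLES):.2f} GHz")
print("clocks match:", clocks_match(processor_0_ns, processor_1_ns))
```
&lt;/pre&gt;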
&lt;p&gt;You can run lmbench in different ways, and each provides additional understanding.&amp;nbsp; &lt;/p&gt;
&lt;p&gt;The &amp;quot;stride&amp;quot; method runs sequentially through memory addresses and provides best-case latency.&amp;nbsp; &lt;/p&gt;
&lt;p&gt;The &amp;quot;random&amp;quot; method accesses memory addresses randomly, showing the worst-case latency.&amp;nbsp; It is very useful to know the best-case and worst-case latencies - if their ratio is small, then memory performance is somewhat predictable.&amp;nbsp; If the ratio is large, prediction becomes difficult.&lt;/p&gt;
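&lt;p&gt;The difference between the two access patterns is easiest to see as a pointer chase, in which each load depends on the result of the previous one.&amp;nbsp; The Python sketch below builds a strided chain and a random chain and walks them.&amp;nbsp; It illustrates the access patterns only: it is not lmbench&amp;#39;s code, the function names are hypothetical, and timings taken in Python would reflect the interpreter, not the memory system.&lt;/p&gt;
&lt;pre&gt;
```python
import random

def stride_chain(n, stride):
    """Slot i points a fixed stride ahead, wrapping at the end of the array."""
    return [(i + stride) % n for i in range(n)]

def random_chain(n, seed=0):
    """A single random cycle through all n slots (seeded for repeatability)."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    chain = [0] * n
    for here, there in zip(order, order[1:] + order[:1]):
        chain[here] = there
    return chain

def chase(chain, start, steps):
    """Follow the chain; each load depends on the result of the previous one."""
    position = start
    for _ in range(steps):
        position = chain[position]
    return position

n = 64
print(chase(stride_chain(n, 1), 0, n))  # sequential walk, ends where it began
print(chase(random_chain(n), 0, n))     # random walk over the same slots
```
&lt;/pre&gt;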
&lt;p&gt;If you set lmbench to access memory in units of the VM page size, then you can observe the impact of TLB misses on latency.&lt;/p&gt;
&lt;p&gt;And, if you run lmbench simultaneously on cores which share a cache, you learn about the behavior of the shared cache.&lt;/p&gt;
&lt;p&gt;Memory latency is a great tool if you need to unravel the architecture of a new server!&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://www.communities.hp.com/online/aggbug.aspx?PostID=86159" width="1" height="1"&gt;</description><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/HPC/default.aspx">HPC</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/scale-out/default.aspx">scale-out</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/servers/default.aspx">servers</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/ProLiant/default.aspx">ProLiant</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/multi-core+processor/default.aspx">multi-core processor</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/performance+measurement/default.aspx">performance measurement</category></item><item><title>I received a new HPC Multi-core server today - Initial test, part 2</title><link>http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/2008/10/09/i-received-a-new-hpc-multi-core-server-today-initial-test-part-2.aspx</link><pubDate>Thu, 09 Oct 2008 18:36:00 GMT</pubDate><guid isPermaLink="false">964d1d0f-bea0-4201-a2aa-8aa369a35a46:86083</guid><dc:creator>d-field</dc:creator><slash:comments>0</slash:comments><wfw:commentRss 
xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://www.communities.hp.com/online/blogs/reality-check-server-insights/rsscomments.aspx?PostID=86083</wfw:commentRss><comments>http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/2008/10/09/i-received-a-new-hpc-multi-core-server-today-initial-test-part-2.aspx#comments</comments><description>&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;Today&amp;#39;s work had to be done before we could do any code performance testing.&amp;nbsp; First, we configured the memory.&amp;nbsp; I received half as many memory DIMMs as the system can hold, and I searched for the person who could tell me which memory slots to use for best performance.&amp;nbsp; After this step was completed, we tackled the firmware.&amp;nbsp; Firmware contains optional settings which impact performance.&amp;nbsp; We first determined the optional settings that are available for this server.&amp;nbsp; We looked for options like NUMA/interleave, snoop filter, multi-threading, adjacent-sector prefetch, hardware prefetch, etc.&lt;/p&gt;
&lt;p&gt;Eventually, we will test the performance&amp;nbsp;of all values of each of these settings.&amp;nbsp; For most of the options, we will get lucky, and one value&amp;nbsp;will be&amp;nbsp;definitely better than all others.&amp;nbsp; &lt;/p&gt;
&lt;p&gt;But for some options, the best setting of the option depends on the applications being run.&amp;nbsp; Memory organization is a good example.&amp;nbsp; For a system with uniform memory architecture (UMA), the time to obtain a cache line from memory is roughly independent of which core accessed it.&amp;nbsp;&amp;nbsp; There are no settings needed for this organization.&lt;/p&gt;
&lt;p&gt;For a NUMA system, each region of memory is &amp;quot;closer&amp;quot; to the cores in one processor than to the cores in the others.&lt;/p&gt;
&lt;p&gt;Some applications run better in NUMA mode, and others in UMA mode.&lt;/p&gt;
&lt;p&gt;In some systems with NUMA memory organizations, a firmware option exists to run the system either way - use the NUMA organization, or interleave the memory among all processors to create a roughly uniform memory organization.&amp;nbsp; When this option exists, we run many applications both ways.&lt;/p&gt;
&lt;p&gt;We also needed to determine the way that cores are numbered on each processor. Core numbering is not always consistent from one model to the next, and to measure performance on a subset of the cores in a server, it&amp;#39;s necessary to know which cores you are using - which cores are on which processor; which cores share a cache.&lt;/p&gt;
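&lt;p&gt;On a Linux system, one way to look up this numbering is to read the CPU topology files that the kernel exposes under /sys/devices/system/cpu.&amp;nbsp; The Python sketch below is a minimal example under that assumption; it is not the procedure we used on this server, and parse_cpu_list and read_topology are hypothetical helper names.&amp;nbsp; It treats the highest-index cache directory of each core as the shared cache of interest.&lt;/p&gt;
&lt;pre&gt;
```python
from pathlib import Path

def parse_cpu_list(text):
    """Expand a Linux CPU list such as '0-3,8' into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.extend(range(int(first), int(last or first) + 1))
    return cpus

def read_topology(sysfs=Path("/sys/devices/system/cpu")):
    """Map each core to (processor id, cores sharing its highest-index cache)."""
    topology = {}
    for cpu_dir in sorted(sysfs.glob("cpu[0-9]*")):
        package = cpu_dir / "topology" / "physical_package_id"
        caches = sorted((cpu_dir / "cache").glob("index[0-9]*"))
        if not package.exists() or not caches:
            continue  # this core does not expose the files we need
        shared = caches[-1] / "shared_cpu_list"
        if not shared.exists():
            continue
        topology[int(cpu_dir.name[3:])] = (
            int(package.read_text()),
            parse_cpu_list(shared.read_text()),
        )
    return topology

print(parse_cpu_list("0-3,8"))
```
&lt;/pre&gt;
&lt;p&gt;Calling read_topology() on the server under test returns a table from which you can tell which cores are on which processor and which cores share a cache.&amp;nbsp; On a system without these files it returns an empty table.&lt;/p&gt;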
&lt;p&gt;Having completed this work and established a baseline of configurations and settings, we can start to do some performance measurements.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://www.communities.hp.com/online/aggbug.aspx?PostID=86083" width="1" height="1"&gt;</description><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/HPC/default.aspx">HPC</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/servers/default.aspx">servers</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/ProLiant/default.aspx">ProLiant</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/multi-core+processor/default.aspx">multi-core processor</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/performance+measurement/default.aspx">performance measurement</category></item><item><title>I received a new HPC Multi-core server today - Initial performance testing</title><link>http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/2008/10/01/i-received-a-new-hpc-multi-core-server-today-initial-performance-testing.aspx</link><pubDate>Wed, 01 Oct 2008 23:15:00 GMT</pubDate><guid isPermaLink="false">964d1d0f-bea0-4201-a2aa-8aa369a35a46:85990</guid><dc:creator>d-field</dc:creator><slash:comments>0</slash:comments><wfw:commentRss xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://www.communities.hp.com/online/blogs/reality-check-server-insights/rsscomments.aspx?PostID=85990</wfw:commentRss><comments>http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/2008/10/01/i-received-a-new-hpc-multi-core-server-today-initial-performance-testing.aspx#comments</comments><description>&lt;p&gt;After the operating system boots on a 
new server model, it&amp;#39;s time to start performance testing.&amp;nbsp; I&amp;#39;m hoping to get some comments on this, since there are many different ways to proceed. &lt;/p&gt;
&lt;p&gt;Personally, I am not a fan of industry standard benchmarks, but I think they are the best starting point for new product testing.&amp;nbsp; For an HPC server, I want to check the basics, to ensure that the system meets its design goals - memory BW, memory/cache latency, 64-bit floating point math, and filesystem IO are the first measurements.&amp;nbsp; We use STREAM, lmbench, LINPACK, and IOZONE standard benchmarks. &lt;/p&gt;
&lt;p&gt;We compare the measurements to older models and to the goals of the new product.&amp;nbsp; Since our group didn&amp;#39;t design the product, we don&amp;#39;t know if there are interesting but undocumented features which enhance or limit performance.&amp;nbsp; I have a name for the process of performance testing of a new product - discovery engineering.&amp;nbsp; We study the external behavior of a system and try to understand the design features which affect HPC performance.&lt;/p&gt;
&lt;p&gt;Each of these standard benchmarks provides information about the server.&amp;nbsp; In numerous cases, this information has solved performance mysteries in real codes.&amp;nbsp; It is hard to solve a performance problem with a large, complex application.&amp;nbsp; The standard benchmarks are a simpler starting point for problem-solving.&lt;/p&gt;
&lt;p&gt;But we don&amp;#39;t rely on industry standard benchmarks.&amp;nbsp; By design, each one tests a subset of a server&amp;#39;s performance characteristics.&amp;nbsp; I&amp;#39;ve had little success predicting the performance of HPC ISV applications based on the standard benchmark results, since real applications require a balance of all the performance features in the system.&amp;nbsp; &lt;/p&gt;
&lt;p&gt;After using the standard benchmarks to assure ourselves that the system is running correctly, we can move on - measure performance on real ISV applications, and experimenting with multi-core processor configurations!&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://www.communities.hp.com/online/aggbug.aspx?PostID=85990" width="1" height="1"&gt;</description><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/HPC/default.aspx">HPC</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/servers/default.aspx">servers</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/BladeSystem/default.aspx">BladeSystem</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/ProLiant/default.aspx">ProLiant</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/multi-core+processor/default.aspx">multi-core processor</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/performance+measurement/default.aspx">performance measurement</category></item><item><title>I received a new HPC Multi-core server today!</title><link>http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/2008/09/23/i-received-a-new-hpc-multi-core-server-today.aspx</link><pubDate>Tue, 23 Sep 2008 19:16:00 GMT</pubDate><guid isPermaLink="false">964d1d0f-bea0-4201-a2aa-8aa369a35a46:84868</guid><dc:creator>d-field</dc:creator><slash:comments>0</slash:comments><wfw:commentRss 
xmlns:wfw="http://wellformedweb.org/CommentAPI/">http://www.communities.hp.com/online/blogs/reality-check-server-insights/rsscomments.aspx?PostID=84868</wfw:commentRss><comments>http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/2008/09/23/i-received-a-new-hpc-multi-core-server-today.aspx#comments</comments><description>&lt;p&gt;&amp;nbsp;&lt;/p&gt;
&lt;p&gt;&lt;i&gt;About this blog series&lt;/i&gt; - This is the 1&lt;sup&gt;st&lt;/sup&gt; posting of a series which describes the experiences of engineers who test the performance of HPC servers and server clusters at HP.&lt;/p&gt;
&lt;p&gt;My name is Dave Field.&amp;nbsp; I lead an engineering group at HP - we measure the performance of new HP servers.&amp;nbsp; In addition to the common industry-standard benchmarks, we concentrate on the performance of real HPC ISV applications.&amp;nbsp; In the 20+ years we have done this work, we have seen many server architectures.&amp;nbsp; These days, HPC clusters of servers using multi-core processors occupy most of our energy.&lt;/p&gt;
&lt;p&gt;We evaluate the performance of new server products, so receiving a new server model is a common occurrence.&amp;nbsp; This has been an especially rich year for new products - this is the 14&lt;sup&gt;th&lt;/sup&gt; new HP server we&amp;#39;ve tested this year, with at least one more to go before the year is over.&amp;nbsp; HP servers for HPC span the range of industry-standard processors - Intel Xeon and Itanium 2, and AMD Opteron.&amp;nbsp; (In HP terminology, the processor is the physical component which plugs into the system board.&amp;nbsp; A processor contains one or more cores, or CPUs.)&amp;nbsp; And for each processor type, there are specific models with different architectural features.&lt;/p&gt;
&lt;p&gt;Since we test pre-production, or prototype, computers, it&amp;#39;s not quite true that I received a server - we usually receive new product kits.&amp;nbsp; Testing new products can be very interesting, but to get to the interesting part, there are inevitably a number of problems to solve.&amp;nbsp; We need to turn the kit into a working computer, then ensure that the performance meets the product specs, before we can do meaningful performance evaluation.&amp;nbsp; These initial steps are lessons in patience and expectation-setting, during which I always meet some new people who will help in problem-solving.&lt;/p&gt;
&lt;p&gt;The new server kit usually contains the server enclosure, system board, and processors.&amp;nbsp; To turn the kit into a computer, we need to obtain three layers of stuff - supporting hardware (the right DIMMs, network interfaces, and disks), firmware, and operating system.&amp;nbsp;&amp;nbsp; &lt;/p&gt;
&lt;p&gt;Firmware is in flux during the pre-production period, and each version of pre-production firmware changes the server&amp;#39;s performance.&amp;nbsp; Usually the processors are pre-production versions, tied to specific firmware revs.&amp;nbsp; Most of the performance data collected on these early versions will be discarded.&amp;nbsp; But if we don&amp;#39;t get some measurements now, we can&amp;#39;t influence the product.&amp;nbsp; Sometimes we identify performance issues which can be fixed before production release - so this is a very satisfying part of the job.&lt;/p&gt;
&lt;p&gt;These days, the current versions of the major Linux distributions work out-of-the-box on new server models.&lt;/p&gt;
&lt;p&gt;When the operating system boots, we can begin to measure performance!&lt;/p&gt;
&lt;p&gt;&amp;nbsp;&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;&lt;img src="http://www.communities.hp.com/online/aggbug.aspx?PostID=84868" width="1" height="1"&gt;</description><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/HPC/default.aspx">HPC</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/servers/default.aspx">servers</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/BladeSystem/default.aspx">BladeSystem</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/ProLiant/default.aspx">ProLiant</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/multi-core+processor/default.aspx">multi-core processor</category><category domain="http://www.communities.hp.com/online/blogs/reality-check-server-insights/archive/tags/performance+measurement/default.aspx">performance measurement</category></item></channel></rss>