Wednesday, 13 June 2012

Turning the Cloud into a Supercomputer: Windows Azure and High Performance Compute goes mass market

Introduction

When you’ve been a proponent of the Microsoft technology stack for many years you tend to see a lot of technologies come and go. Some stick around, evolve and end up changing the nature of the enterprise; others fall by the wayside as the market decides that it doesn’t need them. Eleven years ago one of us (Richard Conway) spoke at PDC 2001 on .NET My Services, which was by all accounts ahead of its time but didn’t offer the open flexibility we’ve come to expect through evolving web standards. That’s because at the time the standards didn’t exist. Nowadays, it’s a brave new world and we have the luxury of application building blocks such as WCF, which embody the evolved standards.
Windows Azure is a PaaS (Platform as a Service). The word Windows is attached here and is highly relevant, as this is a next generation of Windows, offering an alternative programming and deployment model for applications. It’s not one size fits all, and there are many hurdles left before the “cloud” becomes mainstream. Fears and suspicions have to be allayed, and this will only come with time, a good record of performance and diverse use cases.
This article is about one aspect of Windows Azure which is not fully understood and which, in our view, changes the nature of how companies and individuals will do business. As hosting costs fall and scalability becomes achievable for all new market entrants, intellect and time to market are just as likely to create the next Facebook as capital. This article discusses, with examples, how the cloud can be turned into a vast calculation matrix by applying parallel computational capabilities through the provision of on-demand resources.
Elastacloud has spent the past year building applications for the Windows Azure HPC Scheduler SDK. We’ve presented on the topic the country over, both virtually and in person, and we’ll be presenting some more at an upcoming conference we’re running on the 22nd June, sponsored by Microsoft and others. HPC is traditionally a high-value infrastructure product, and Microsoft’s offering has been to enable HPC and condition clusters across an enterprise. When the HPC team began their analysis of the cloud they made it an extension of an on-premise model with the idea of “bursting to the Cloud”. Within the last 12 months HPC has become a wholly cloud-deployable product. Services and applications across a range of business lines can now use the sheer power and elasticity of the cloud to perform very mathematically intensive operations across a large number of nodes, without the need for any on-premise deployment. The upshot is that HPC now falls within the purview of developers as opposed to infrastructure. Therefore, here are a few lessons to get you started!

Simple Architectural Patterns

HPC is used for mathematically intensive workloads across various lines of business. For example, Monte Carlo simulations are a perfect candidate for an HPC workload. These are “embarrassingly parallel” applications which can be spread over multiple compute nodes, so that sometimes billions of calculations can be executed independently and the results aggregated when they’re complete. Other compute-intensive workloads which lend themselves well to HPC are frame rendering, encoding, transcoding, compressing, encrypting, statistical operations, engineering research, medical checks, and image and video pattern matching, to name but a few.
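To make “embarrassingly parallel” concrete, here is a minimal sketch of a Monte Carlo estimate of pi. It runs on one machine using Parallel.For rather than on an HPC cluster, and the class and method names (MonteCarloSketch, CountHits, EstimatePi) are ours, purely for illustration. The point is that each partition works with its own random number generator and shares nothing with the others, so the partitions could just as well be tasks on separate compute nodes, with the totals aggregated at the end.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class MonteCarloSketch
{
    // Counts how many of `samples` random points in the unit square
    // fall inside the quarter circle of radius 1.
    public static long CountHits(int seed, int samples)
    {
        var rng = new Random(seed); // one generator per partition, nothing shared
        long hits = 0;
        for (int i = 0; i < samples; i++)
        {
            double x = rng.NextDouble();
            double y = rng.NextDouble();
            if (x * x + y * y <= 1.0) hits++;
        }
        return hits;
    }

    // Runs the partitions independently, then aggregates the results.
    public static double EstimatePi(int partitions, int samplesPerPartition)
    {
        var hits = new long[partitions];
        Parallel.For(0, partitions, p =>
        {
            // Each partition writes only to its own slot, so no locking is needed.
            hits[p] = CountHits(p + 1, samplesPerPartition);
        });
        long total = hits.Sum();
        return 4.0 * total / ((double)partitions * samplesPerPartition);
    }

    public static void Main()
    {
        Console.WriteLine(EstimatePi(8, 1000000));
    }
}
```

Because the seeds are fixed per partition, repeated runs give the same estimate regardless of how the partitions are scheduled, which is a useful property when the same work is later spread across nodes.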
Fig 1 Using a Parametric Sweep
The simplest pattern is also the most powerful. It is called a parametric sweep: we prepare our data in advance and use it to build a meaningful workload, for example a pricing calculation. Each iteration grabs the next available unlocked record in the database or storage and executes against it. The results can be left in the same database or in another one, and a client can connect to this asynchronously. Whilst this is both quick and powerful, it’s not very flexible.
Fig 2 - Makeup of a Parametric Sweep application
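As a rough sketch of what a single parametric sweep task might look like, the console program below takes one sweep index on the command line and processes the corresponding prepared input. Everything here is an assumption made for illustration: the in-memory Notionals array stands in for the database or storage, the Price calculation is hypothetical, and we assume each task instance is handed a distinct index as its argument instead of claiming the next unlocked record itself.

```csharp
using System;

public static class SweepWorkerSketch
{
    // Hypothetical prepared inputs; in the pattern described above these
    // would be records in a database or storage, prepared in advance.
    static readonly double[] Notionals = { 100.0, 250.0, 400.0, 1000.0 };

    // Hypothetical "pricing" step: applies a flat 5% uplift to one input.
    // Sweep indices are 1-based.
    public static double Price(int sweepIndex)
    {
        if (sweepIndex < 1 || sweepIndex > Notionals.Length)
            throw new ArgumentOutOfRangeException(nameof(sweepIndex));
        return Notionals[sweepIndex - 1] * 1.05;
    }

    public static int Main(string[] args)
    {
        int index;
        if (args.Length != 1 || !int.TryParse(args[0], out index))
        {
            Console.Error.WriteLine("usage: SweepWorker <index>");
            return 1;
        }

        // A real task would write the result back to storage for a client
        // to collect later; here we just print it.
        Console.WriteLine("{0}: {1}", index, Price(index));
        return 0;
    }
}
```

Every task runs the same executable and differs only in its index, which is what makes the pattern quick to build but inflexible: the work has to be expressible as a fixed set of independent, pre-prepared items.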
The good news is that the HPC team has enabled SOA in their Azure implementation! Developers can make familiar use of WCF for embarrassingly parallel workloads by creating services and registering them with the HPC runtime. HPC hosts the WCF service and forwards all calls through its host process. In practice this means developers can build WCF services with well-published interfaces and access them through asynchronous proxies created using svcutil or Visual Studio.
Fig 3 shows the makeup of a SOA application in HPC. This approach is a lot more flexible because it takes a model that application developers already know and makes it parallel. The most important difference is the asynchronous messaging between client and service: the client calls methods on the service proxy and gets results back as each individual task completes. This means that data can be returned to the client rather than the client having to fetch and index it.
Fig 3 – Makeup of a SOA application
Another type of application is an MPI application. This is beyond the scope of this article; suffice it to say that these types of applications put the “High Performance” in HPC. In principle they are written in C and can provide a much richer execution experience using the MS-MPI runtime. These applications function best on reserved hardware where cores have access to shared memory and are located on the same switch, which Windows Azure does not currently guarantee.

Building an HPC application

An HPC “cluster” is made up of the following:
  • Head node
  • Compute nodes
  • Web front end
  • Broker nodes
The “head node” receives all input and external requests and executes applications that are deployed on a compute node. It “schedules” “jobs” which are comprised of many “tasks” across the cluster. It is aware of the cluster’s behaviour at all times through various monitoring mechanisms to allow it to work out which cluster compute nodes it should be scheduling to.
The web front end is used to view the job scheduling and determine whether jobs have succeeded or failed. It is a secure ASP.NET Web Forms application which shows all relevant information about the cluster and the jobs it’s running. For newcomers to HPC this is the best insight into how the cluster functions and the information it takes to run a job. For example, when a SOA application executes it simply loads a certain type of job template with the required environment variables, so it’s fair to say that at a fundamental level it is still a job with a set of underlying tasks.
Fig 4 – Showing the “physical” makeup of a cluster
To illustrate how powerful, yet easy, building a SOA application for HPC is, we’ll look at a modified example from the Windows Azure Platform Training Kit.
To begin, ensure you have the latest version of the Windows Azure SDK installed, which includes the Windows Azure HPC Scheduler SDK.
Next, create a new WCF project called DivisibleByTwoService and add the following interface:
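using System.ServiceModel;

// Contract exposed to HPC clients: one operation per request.
[ServiceContract]
public interface IDivisibleByTwoService
{
  // Returns the remainder of num divided by two (0 for even input).
  [OperationContract]
  int Mod2(int num);
}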
Followed by this implementation:

// ConcurrencyMode.Multiple lets the host dispatch calls on several threads;
// this is safe here because the service holds no state.
[ServiceBehavior(ConcurrencyMode = ConcurrencyMode.Multiple)]
public class DivisibleByTwoService : IDivisibleByTwoService
{
    public int Mod2(int num)
    {
        return num % 2;
    }
}
In order to make HPC aware of the service we need to create some very specific configuration and call the configuration file in our project DivisibleByTwo.config.
We won’t look at the ServiceModel bindings, but here is the configuration of a broker:
<microsoft.Hpc.Broker>
  <monitor messageThrottleStartThreshold="4096"
           messageThrottleStopThreshold="3072"
           loadSamplingInterval="1000"
           allocationAdjustInterval="30000"
           clientIdleTimeout="300000"
           sessionIdleTimeout="300000"
           statusUpdateInterval="15000"
           clientBrokerHeartbeatInterval="20000"
           clientBrokerHeartbeatRetryCount="3" />
  <services>
    <brokerServiceAddresses>
      <add baseAddress="net.tcp://localhost:9091/Broker"/>
      <add baseAddress="http://localhost/Broker"/>
      <add baseAddress="https://localhost/Broker"/>
    </brokerServiceAddresses>
  </services>
  <loadBalancing messageResendLimit="3"
                 serviceRequestPrefetchCount="1"
                 serviceOperationTimeout="86400000"
                 endpointNotFoundRetryPeriod="300000"/>
</microsoft.Hpc.Broker>
Broker nodes allow requests to be forwarded to the service. The three base addresses correspond to three types of transport (NetTcp, Http and Https), and the bindings that we haven’t shown here reflect these.

Consuming an application

Once all of this is ready we should create a new console application to consume our new service. We can add a Service Reference to our new project so that a service proxy is created for us.
Once this is done we can just reference the proxy classes in code.
Now we can write some code to consume our service.
Firstly we need to add references to the two assemblies Microsoft.Hpc.Scheduler and Microsoft.Hpc.Scheduler.Session and add the respective using statements.
Our client will send 100 requests, each carrying a number; our service will divide each one by two and send back the remainder.
To begin we need to create a Session by referencing the endpoint address of the head node in the cloud and the service name. We need to supply a username/password which we’ll set when we deploy the cluster and use the WebAPI transport scheme which sends batched requests to the head node via SOAP and XML.
SessionStartInfo info = new SessionStartInfo("myhpcazureservice.cloudapp.net", "DivideByTwoService");
info.TransportScheme = TransportScheme.WebAPI;
info.Username = "elastacloud";
info.Password = "AC@mplexPass3";
 
The CreateSession method will be used to set up a session for sending requests to the cluster.
 
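int count = 0; // number of responses received so far

using (Session session = Session.CreateSession(info))
{
    AutoResetEvent done = new AutoResetEvent(false);
    using (BrokerClient<IDivisibleByTwoService> client = new BrokerClient<IDivisibleByTwoService>(session))
    {
        // Register the callback that runs once per response.
        // Mod2Response is assumed to be the message type the proxy generates alongside Mod2Request.
        client.SetResponseHandler<Mod2Response>((response) =>
        {
            try
            {
                int ud = response.GetUserData<int>();   // state passed as the second argument to SendRequest
                int reply = response.Result.Mod2Result; // remainder returned by the service
            }
            catch (SessionException ex)
            {
                Console.WriteLine("SessionException while getting responses in callback: {0}", ex.Message);
            }

            // Signal the main thread once all 100 responses have arrived.
            if (Interlocked.Increment(ref count) == 100)
            {
                done.Set();
            }
        });

        for (int i = 0; i < 100; i++)
        {
            client.SendRequest<Mod2Request>(new Mod2Request(i), i);
        }

        client.EndRequests();
        done.WaitOne();
    }

    // Close the session while it is still in scope, before it is disposed.
    session.Close();
}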
For readability most of the logging has been stripped out of the code. The asynchronous WCF client code should be largely self-explanatory. Looking at the for loop, we can see 100 requests being queued by the client, but they are not actually sent on the wire until EndRequests is called. When all of the responses have been received in the handler, the wait handle is signalled, execution continues and the Session is closed. Two members need special attention: GetUserData<>, which returns the iteration number for the request (or any state data passed in as the second parameter to the SendRequest method), and the Result.Mod2Result property, which returns the integer result from the service.
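The counting pattern in the response handler is worth isolating, since it is easy to get wrong. The sketch below reproduces it without any HPC types, so it can be run on its own: Task.Run stands in for the broker delivering responses on arbitrary threads, and the names ResponseCountingSketch, Run and onResponse are ours, not part of the SDK. Interlocked.Increment makes the count safe across threads, and only the callback that observes the final count signals the waiting thread.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ResponseCountingSketch
{
    // Simulates `expected` responses arriving on thread-pool threads and
    // returns how many of them carried a remainder of 1 (odd inputs).
    public static int Run(int expected)
    {
        if (expected <= 0) return 0; // nothing to wait for

        int count = 0;    // responses seen so far
        int oddCount = 0; // responses whose result was 1

        using (var done = new AutoResetEvent(false))
        {
            // Stand-in for the response handler: userData plays the role of
            // GetUserData<int>(), result the role of Result.Mod2Result.
            Action<int, int> onResponse = (userData, result) =>
            {
                if (result == 1) Interlocked.Increment(ref oddCount);

                // Exactly one callback sees the final count and signals.
                if (Interlocked.Increment(ref count) == expected) done.Set();
            };

            for (int i = 0; i < expected; i++)
            {
                int n = i; // copy the loop variable before capturing it
                Task.Run(() => onResponse(n, n % 2));
            }

            done.WaitOne(); // block until every callback has run
        }
        return oddCount;
    }

    public static void Main()
    {
        Console.WriteLine(Run(100)); // prints 50
    }
}
```

A plain `count++` in the handler would be a race, because responses can arrive concurrently; that is why the client code above uses Interlocked.Increment as well.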

Deploying an Application

The Windows Azure HPC Scheduler SDK provides a simple developer tool called AppConfigure which can be used to deploy the packaged HPC software.
Fig 5 – A typical HPC deployment
The Windows Azure HPC Scheduler SDK comes with the correct .cscfg and .csdef configuration for the various types of nodes. It also includes an ASP.NET website, reached at https://{mydomain}.cloudapp.net/portal, which requires login credentials to proceed.
AppConfigure generates a management certificate which can be uploaded to a subscription to perform the deployment. It will create a storage account and database which the scheduler will use to manage and monitor all activity on the deployed cluster.
The deployment for HPC is fairly complicated and so takes a reasonable amount of time to complete; you should expect to wait up to 40 minutes. The underlying mechanism is simply the Service Management API together with several executable files supplied within the Scheduler SDK, which do things such as populate databases and Table Storage.
Fig 6 – The AppConfigure application

Summary

The Windows Azure HPC Scheduler represents a shift in the emphasis of the value proposition of the cloud. The cloud can no longer be thought of simply as a place to deploy applications; it is also a place to build parallel applications which span multiple nodes to execute tasks in record time.

About Elastacloud

Elastacloud are a wholly Windows Azure consultancy. They have years of experience migrating applications to Windows Azure including open source, heavy messaging and other stubborn applications. For the last year they have been dedicated to HPC on Azure and have released their first HPC product, the Big Compute Marketplace, to a limited audience. Contact them at @elastacloud and http://www.elastacloud.com/. They are founders of the UK Windows Azure User Group and are running a Microsoft-sponsored conference on the 22nd June with Scott Guthrie as a keynote speaker.
Richard Conway and Andy Cross
Elastacloud
http://blog.elastacloud.com/
Richard is Director of Elastacloud, a Microsoft partner providing cloud consultancy and product development for Windows Azure and HPC Server, and co-founder of the UK Windows Azure User Group. Andy is Head of Product Development at Elastacloud and co-founder of the UK Windows Azure User Group.
