Programming Windows HPC Server – Using the SOA Model


Conventionally, HPC/Parallel problems can be roughly divided into the following
two categories[1][2]:

- Data Parallel: these applications divide the input data into a number of
completely independent parts. The same computation is performed on each part,
and some kind of post-processing is usually needed after the computations.

- Task Parallel: these are jobs whose functionality can be divided into many
small tasks, each of which can be executed on one CPU core. These tasks may or
may not need to communicate with each other.

There is another special kind of parallel problem (orthogonal to
data/task parallel):
- Embarrassingly Parallel: for these applications, little or no effort is
required to separate the problem into a number of small tasks, each of which
runs on one CPU core. No post-processing, or only very lightweight
post-processing, is needed (little or no cooperation among tasks).

Windows HPC SOA Programming Overview (from Microsoft)


On Windows HPC Server 2008, the programming model for Embarrassingly Parallel
applications (especially web-based ones) is referred to as the SOA Model. In
this model, the client sends requests to a service broker, and the broker
forwards these requests to service instances. Service instances never talk to
each other, and all communication among these components is service oriented
(more specifically, WCF based).

[SOA Programming Model Workflow, from Microsoft]


[3] is detailed documentation on this topic. In this article, I will instead
demo a SOA-based PI (3.1415926535...) calculation application that uses the
Monte Carlo method[6] on a real Windows HPC cluster.

When I say "real Windows HPC cluster", I mean:
1. The cluster has multiple nodes (6 nodes: 1 head, 1 broker, 4 compute).
2. The cluster has a dedicated AD domain and network, totally separate from
the client/dev machines and their network environment.
3. You use a server called the "Boundary Server" to access the HPC cluster.
The boundary server has NICs connected to both the cluster's private network
and the corp/enterprise network.
4. You develop and debug your application on the boundary server, not on the
cluster head node (let the head node focus on serving job requests).
5. Your corp network domain account is different from your domain account on
the HPC cluster's private network.
6. In the following sections, I assume the cluster environment is already set
up correctly.
[These environment assumptions are more complex than those in [3], but they
are closer to a real production environment.]

The Monte Carlo PI value calculation contains two parts: the server part and
the client part.

The server part is a pure WCF service: you define the interface/contract and
implement it. The core logic is as follows:
1. It is a .NET/C# class library project, and a typical WCF service
application.
2. It hashes the local machine IP and the current date/time into a random
seed, so that service instances on different nodes (or started at different
times) are unlikely to share a seed.
3. It uses the .NET built-in random number generator to produce a series of
random numbers for many independent Monte Carlo experiments. Each experiment
generates a random point within a fixed range and checks whether it falls
inside a circle of fixed radius.
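
The experiment in step 3 rests on a simple area ratio. As a sketch, assume the points are uniformly distributed over a square of side 2r centred on the origin (the same ratio holds for a single quadrant of side r), and write N_in and N_out for the InCount and OutCount values returned by the service:

```latex
\frac{N_{\text{in}}}{N_{\text{in}} + N_{\text{out}}}
  \approx \frac{\pi r^{2}}{(2r)^{2}} = \frac{\pi}{4}
\quad\Longrightarrow\quad
\pi \approx 4 \cdot \frac{N_{\text{in}}}{N_{\text{in}} + N_{\text{out}}}
```

This is why the service only has to return two counters: the client can sum them over all sub-tasks and apply the factor of 4 once at the end.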

PiCalcServer Core Logic

    // Requires: using System; using System.Net;
    public PiCalcResult Calc(UInt64 scale)
    {
        // Use system time and machine IP to hash out the seed for random number generation.
        long ticks = DateTime.Now.Ticks;
        IPHostEntry host = Dns.GetHostEntry(Dns.GetHostName());
        byte[] addr = host.AddressList[0].GetAddressBytes();
        ticks += addr[0];
        ticks += addr[1] * 256;
        ticks += addr[2] * 256 * 256;
        ticks += (long)addr[3] * 256 * 256 * 256;         // widen first so the product cannot overflow int
        Random rand = new Random(unchecked((int)ticks));  // keep only the low 32 bits as the seed

        // Result init
        PiCalcResult calcResult = new PiCalcResult();
        calcResult.InCount = 0;
        calcResult.OutCount = 0;

        // Do the Monte Carlo exercise
        Int32 x = 0, y = 0;
        for (UInt64 i = 0; i < scale; ++i)
        {
            x = rand.Next(RAND_RANGE_MIN, RAND_RANGE_MAX);
            y = rand.Next(RAND_RANGE_MIN, RAND_RANGE_MAX);

            // Compare the unrounded distance: rounding it first would count points
            // up to half a unit outside the circle as hits.
            double d = Math.Sqrt((double)x * (double)x + (double)y * (double)y);
            if (d <= RAND_RANGE_MAX)
            {
                calcResult.InCount++;
            }
            else
            {
                calcResult.OutCount++;
            }
        }

        return calcResult;
    }



After implementing the WCF service, you deploy it to the Windows HPC cluster.
This takes two steps:
1. Compose a service configuration file. (The PiCalcService.config file is
contained in the source code package; see [3] for a detailed explanation of
the fields.)

    <microsoft.Hpc.Session.ServiceRegistration>
      <service assembly="%CCP_HOME%App\PiCalcServer.dll"
               contract="PiCalcServer.IPiCalcServer"
               type="PiCalcServer.PiCalcServer"
               architecture="x86">
        <environmentVariables>
          <add name="PATH" value="%MY_SERVICES_HOME%Bin"/>
        </environmentVariables>
      </service>
    </microsoft.Hpc.Session.ServiceRegistration>


2. Copy the binary and configuration files to each compute node:

    clusrun xcopy /y \\FileServer\PrjDir\PiCalcServer.dll "c:\Program Files\Microsoft HPC Pack\App"
    clusrun xcopy /y \\FileServer\PrjDir\PiCalcServer.Config "c:\Program Files\Microsoft HPC Pack\ServiceRegistration"


To check whether the deployment succeeded:
1. Go to Start Menu -> HPC Pack -> HPC Cluster Manager -> Diagnostics ->
Tests -> SOA -> SOA Service Configuration Report and run this test.
2. Go to Diagnostics -> Test Results, which lists the detailed results of the
test from step 1. If the deployment succeeded, the report shows the service
name, binary, interface, implementation and target architecture.

The other part of the solution is the client application.

- It is both a WCF client application and an HPC cluster application:
1. It is a normal .NET/C# console/WinForms application that calls the remote
WCF service and the HPC scheduler service.
2. As with any WCF client application, you use the svcutil tool to generate
the WCF client proxy class (async style is used here) and add it to your
client project.

    svcutil PiCalcServer.dll
    svcutil *.wsdl *.xsd /async /language:C# /out:PiCalcServerProxy.cs


3. When developing an SOA application, you create a session with the HPC
cluster, get the SOA broker service endpoint from the session, and call the
WCF service through this endpoint.
4. The overall client logic is: create a session, divide the computation into
sub-tasks, send the requests, collect the partial results from the sub-tasks,
and compute the final result.
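
The divide/collect/combine flow in step 4 can be tried without a cluster. The following is a minimal local sketch, not the article's client: it assumes each service instance can be imitated by a .NET Task, and the names LocalPiSketch, Partial, Calc and Estimate, as well as the fixed seeds, are hypothetical.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Local stand-in for the cluster: each Task plays the role of one service instance.
public static class LocalPiSketch
{
    // Partial result of one sub-task (mirrors the InCount/OutCount idea of PiCalcResult).
    public struct Partial
    {
        public ulong InCount;
        public ulong OutCount;
    }

    // One sub-task: draw 'scale' random points in the unit square and count
    // how many fall inside the quarter circle of radius 1.
    public static Partial Calc(ulong scale, int seed)
    {
        var rand = new Random(seed);
        var result = new Partial();
        for (ulong i = 0; i < scale; i++)
        {
            double x = rand.NextDouble();
            double y = rand.NextDouble();
            if (x * x + y * y <= 1.0) result.InCount++;
            else result.OutCount++;
        }
        return result;
    }

    // Divide the work, run the sub-tasks concurrently, then aggregate.
    public static double Estimate(ulong totalScale, int taskCount)
    {
        ulong scale = totalScale / (ulong)taskCount;
        Task<Partial>[] tasks = Enumerable.Range(0, taskCount)
            .Select(i => Task.Run(() => Calc(scale, 12345 + i))) // hypothetical fixed seeds
            .ToArray();
        Task.WaitAll(tasks);

        ulong totalIn = 0, totalOut = 0;
        foreach (var t in tasks)
        {
            totalIn += t.Result.InCount;
            totalOut += t.Result.OutCount;
        }
        // Divide by the points actually sampled, in case totalScale
        // is not an exact multiple of taskCount.
        return 4.0 * totalIn / (totalIn + totalOut);
    }
}
```

For example, `LocalPiSketch.Estimate(4000000, 4)` splits four million experiments across four tasks. On the cluster the same three phases apply, except that each sub-task is a WCF request routed by the broker.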

PiCalcClient Core Logic

    //
    // Create a session object that specifies the head node to which to connect and the name of
    // the WCF service to use.
    //
    SessionStartInfo ssInfo = new SessionStartInfo(schedulerHost, serviceName);
    ssInfo.Username = clusterHeadUser;
    ssInfo.Password = clusterHeadPassword;
    ssInfo.ResourceUnitType = Microsoft.Hpc.Scheduler.Properties.JobUnitType.Core;
    ssInfo.MinimumUnits = 2;
    ssInfo.MaximumUnits = 1000;

    Console.WriteLine("Creating a session ...");
    using (Session session = Session.CreateSession(ssInfo))
    {
        Console.WriteLine("Session creation done!");
        Console.WriteLine("Session's Endpoint Reference:{0}", session.EndpointReference.ToString());
        int nodes = session.ServiceJob.AllocatedNodes.Count;

        //
        // Binds session to the client proxy using NetTcp binding (specify only NetTcp binding). The
        // security mode must be Transport and you cannot enable reliable sessions.
        //
        System.ServiceModel.Channels.Binding myTcpBinding = new NetTcpBinding(SecurityMode.Transport, false);
        myTcpBinding.ReceiveTimeout = maxTimeOut;
        myTcpBinding.SendTimeout = maxTimeOut;
        PiCalcServerClient calcServerClient = new PiCalcServerClient(myTcpBinding, session.EndpointReference);
        calcServerClient.ClientCredentials.Windows.ClientCredential.UserName = wcfClientUser;
        calcServerClient.ClientCredentials.Windows.ClientCredential.Password = wcfClientPassword;

        //
        // There is no way to get the accurate allocated core count, just assume each node has
        // avgCoresPerNode cores.
        //
        timeBegin = DateTime.Now;
        int taskCount = nodes * avgCoresPerNode;
        asyncCalcCount = taskCount;
        for (int i = 0; i < taskCount; i++)
        {
            UInt64 scale = totalScale / (UInt64)taskCount;
            calcServerClient.BeginCalc(scale, AsyncCalcCallback, new CalcReqContext(calcServerClient, i));
        }
        asyncCalcDone.WaitOne();
        Console.WriteLine("All sub tasks done!");
        timeEnd = DateTime.Now;

        calcServerClient.Close();
        Console.WriteLine("========================================");
        Console.WriteLine("totalIn:{0}, totalOut:{1}", totalIn, totalOut);
        Console.WriteLine("the mc pi value:{0}", (totalIn + 0.0) / (totalScale + 0.0) * 4);
        // File time is measured in 100 ns units, so this prints whole seconds.
        Console.WriteLine("the total time used:{0}",
            (timeEnd.Value.ToFileTime() - timeBegin.Value.ToFileTime()) / (10 * 1000 * 1000));
        Console.WriteLine("Please enter any key to continue...");
        Console.ReadLine();
    }
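
The listing relies on AsyncCalcCallback, asyncCalcCount and asyncCalcDone, which live in the source package and are not shown here. As a rough sketch of how such a completion counter is commonly written (an assumption, not the author's code), the hypothetical PartialResultCollector below sums partial counts from concurrent callbacks and releases the waiting thread when the last one arrives. It assumes taskCount is greater than zero.

```csharp
using System;
using System.Threading;

// Hypothetical helper: thread-safe aggregation of partial results
// plus a "last one out signals the event" completion counter.
public sealed class PartialResultCollector
{
    private long totalIn;
    private long totalOut;
    private int pending;   // sub-tasks that have not reported yet
    private readonly ManualResetEvent done = new ManualResetEvent(false);

    public PartialResultCollector(int taskCount)
    {
        pending = taskCount;
    }

    // Called once per finished sub-task, possibly from several threads at once.
    public void Report(long inCount, long outCount)
    {
        Interlocked.Add(ref totalIn, inCount);
        Interlocked.Add(ref totalOut, outCount);
        // The last callback to arrive releases the waiting main thread.
        if (Interlocked.Decrement(ref pending) == 0)
        {
            done.Set();
        }
    }

    // Blocks until every sub-task has reported.
    public void Wait()
    {
        done.WaitOne();
    }

    public long TotalIn { get { return Interlocked.Read(ref totalIn); } }
    public long TotalOut { get { return Interlocked.Read(ref totalOut); } }
}
```

In a WCF callback of this style, one would typically call the proxy's matching End method (EndCalc for a BeginCalc generated by svcutil /async) to obtain the result, then pass its counters to Report.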


NOTE:
1. This is just the core logic; for the full code, see the source code
package.
2. serviceName is the service configuration file name without the .config
extension. schedulerHost is the machine name of the head node of the HPC
cluster.
3. clusterHeadUser/clusterHeadPassword are used to log in to the head node to
submit jobs, while wcfClientUser/wcfClientPassword are used to log in to the
compute nodes to access the WCF services. Both should be set explicitly in a
real cluster environment. The two accounts are the same in most cluster
environments, but they differ from the domain account you use to log in to the
corp network.
4. If a "Can't find file - Microsoft.Hpc.Scheduler.Store.dll" exception is
raised when running the client, install the Windows HPC Pack Client Utilities.
The Windows HPC Pack SDK alone is not enough for developing/running HPC
applications.
5. It takes some time (about 1 minute in my environment) to establish a
session with the HPC cluster.
6. You can see the job status, node heat map, etc. in HPC Cluster Manager
while the application is running. The Cluster Manager is also helpful for
investigating errors.

Typical Client Application Console Output

    Creating a session ...
    Session creation done!
    Session's Endpoint Reference:net.tcp://dit840-013:9087/broker/206
    Sub Task[3] Done = In:263541326, Out:72002994
    Sub Task[2] Done = In:263543455, Out:72000865
    ......
    ......
    Sub Task[10] Done = In:263544094, Out:72000226
    Sub Task[15] Done = In:263534741, Out:72009579
    All sub tasks done!
    ============================================================
    totalIn:4216696722, totalOut:1152012398
    the mc pi value:3.14168387800455
    the total time used:1244
    Please enter any key to continue...


You can increase the number of experiments to get a more precise value of PI,
at the cost of a longer running time.
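
How much precision extra experiments buy can be estimated with the standard binomial error formula. This is a sketch that assumes the N experiments are independent and uniformly distributed, which the article's seeding scheme only approximates:

```latex
\hat{\pi} = 4\hat{p}, \qquad
\operatorname{sd}(\hat{\pi}) = 4\sqrt{\frac{p(1-p)}{N}}, \quad p = \frac{\pi}{4}
\quad\Longrightarrow\quad
\operatorname{sd}(\hat{\pi}) \approx \frac{1.64}{\sqrt{N}}
```

For the N = 5368709120 experiments in the sample output (totalIn + totalOut), this gives a statistical error of roughly 2.2e-5. Because the error shrinks only as 1/sqrt(N), each additional correct decimal digit costs about 100 times more experiments.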

full source code download
http://code4cs.googlecode.com/files/McPiCalc.zip

Some Personal Observations:
1. Windows HPC Cluster provides convenient management tools and utilities,
which make deploying and managing a mid-sized computing cluster (several
hundred nodes) very easy.
2. The Windows SOA programming model greatly simplifies the development of
certain kinds of HPC applications.
3. The built-in security features of Windows HPC add complexity to development
and deployment, and performance may degrade when large amounts of data move
among nodes. This overhead buys very little: does security really matter in a
private computing cluster?
4. The HPC SOA programming model is very similar to the so-called "web server
farm" architecture. But as a general programming platform, it does not solve
the head/broker node fail-over problem in a very elegant and scalable way.
5. The Windows HPC scheduler is too general purpose and too centralized, which
makes session creation very time-consuming. As a result, an SOA application
spends a long time on initialization.
6. Although it is called "SOA" and uses the popular WCF technology, the HPC
SOA architecture is not suitable for web applications at all (especially for
scaling purposes). Microsoft describes the target scenario as "interactive
applications", which mainly include Monte Carlo problems, ray tracing, Excel
calculation add-ins and BLAST searches.

[Reference]
[1] http://www.cs.mu.oz.au/498/notes/node39.html
[2] http://computing.llnl.gov/tutorials/parallel_comp/#Models
[3] Microsoft Official SOA doc
[4] Submit jobs to head node in another AD
[5] Call WCF services hosted on other nodes with specific client credentials (domain username/password)
[6] Monte Carlo Method
[8] From Sequential to Parallel Code Using Windows Hpc (Doc, Code)
