Distributed Computation using ADE


Uses features new to Analytica 4.6. Requires Analytica Enterprise and ADE running on each server.


This application illustrates how to run a Monte Carlo simulation in parallel, using multiple cores on the same computer, and multiple server computers. The example achieves a 10-fold speed up, using 16 ADE instances on 2 computers with 8 cores each. It uses the COM Integration feature (introduced in Analytica 4.6) with a master instance running Analytica Enterprise calling multiple instances of ADE (Analytica Decision Engine), evaluating the same model on multiple cores on multiple computers. We applied this to a large model developed for and used by the North West Power Council for planning future electric grid capacity.

Distributed-computing-speedup.png

This graph shows the elapsed time required to complete the full Monte Carlo simulation as a function of the number of parallel ADE instances performing the simulation.

This page describes how this is done. It provides an real-life example of how to call COM objects from Analytica. You may find this useful even if you are using COM to do something much different than this, or you might be motivated to use ADE for parallel simulation.

Hardware and Licenses needed

This project uses:

  • Two high-end computers running Windows Server 2012. Each server has 8 cores and more than enough RAM to run 8 instances of the model. (The model itself evaluates in less than 1GB of RAM, and each server has 16GB).
    Note: Any edition of Windows except for Windows 7+ Home edition should work here.
  • A high-speed network connecting the two computers.
  • One edition of Analytica Enterprise or better, release 4.6 or later.
  • Two ADE licenses, one for each computer.

Distributing the computation

The master model runs in Analytica. The primary output of the model is computed by Monte Carlo simulation, usually with a sample size of 750. When run sequentially (in a single instance of Analytica), the computation takes about 50 seconds to complete. The model repeats this calculation repeatedly inside an optimization, which typically runs for 8 hours.

For distributed computation, the Master model calls COMCreateObject to instantiate up to 8 instances of ADE on each computer. Each instance loads the model, synchronizes the inputs, and each instance runs the Monte Carlo simulation using Ceil(750/N) samples, where N is the number of ADE instances. The ADE instances perform this computation concurrently. The samples for the final results are copied back to the master model and concatenated to get the full 750 samples.

The remainder of this page takes you through the steps and Analytica code in detail.

Specifying how the computation is distributed

To begin, we decide how many computers and how many cores the computation will be distributed over, and how that will be done, and encode that information in the model. This logic is simply deciding how the calculation shall be divided, but not using COM yet.

An index is set up with the names of the computers where ADE can be run:

Index Host_name := ['localhost', 'NWDistComp2']

The model was arranged so that the user can select how many parallel ADE instances to use. The first 8 instances are "dispatched" to the first host, the next 8 to the second host, etc. The code for deciding how many instances to assign to each computer is as follows:

Variable Cores_to_use := 16 { A user input }
Variable Max_cores_per_host := Table(Host_name)(8, 8)
Variable Num_ADE_Clients := Dispatch(Cores_to_use, Max_cores_per_host, Host_name)

When evaluated, Num_ADE_Clients is an array indexed by Host_name with a number showing how many ADE instances will run on each computer. With Cores_to_use set to 16, this will be 8 and 8. When you set Cores_to_use to 10, Num_ADE_Clients is 8 and 2.

This next index will index the ADE instances after we instantiate them.

Index ADE_instance := 1..Cores_to_use

And this array specifies which computer each instance of ADE will run on:

Variable Host_of_ADE_instance := StepInterp(Cumulate(Number_of_ADE_client, Distributed_Host_Nam), Distributed_Host_Nam, ADE_Instance, Distributed_Host_Nam)

Host_of_ADE_instance in an array indexed by ADE_instance where the first 8 cells are 'localhost' and the second eight cells are 'NWDistComp2'. So if you want to know where the 7th instance of ADE is running, you can look it up directly in this array.

The next variable specifies how many Monte Carlo samples each instance of ADE should run.

Variable Client_sample_size := Ceil(Size(Run)/Cores_to_use)

The master model uses the full sample size (e.g., 750), whereas each ADE client will use a smaller sample size. The master model uses the built-in Run index to store the full sample, but as we pull the data in from each client, the DC_Run index corresponds to the shorter index used by each client.

Index DC_Run := 1..Client_sample_size

The following mapping from each client sample into the full run index makes it easy to transform between the master and clients:

Variable Run_to_ADE_map := Min([sampleSize, (ADE_instance-1)*Client_sample_size + DC_Run])
Run to ade map.png

The result of Run_to_ADE_map shown here tells us, for example, that the 332nd sample in the full result will be the 3rd sample computed by the 8th ADE instance.

Instantiating the ADE clients

Up to this point, none of the code from the previous section has actually made use of COM -- we've simply defined how we intent to break the full computation into pieces, and those pieces map back to the full computation. In this section, we instantiate the ADE clients.

The objective here is to have a variable named ADE that holds an array of the ADE clients, indexed by ADE_instance. Setting this up involves quite a bit more than just a call to COMCreateObject, since we also want to have each instance load the model, set its own sample size and other simulation parameters, and synchronize the values of inputs. There are some important subtleties to pay attention to.

The ADE instances are created like this:

Variable ADE := COMCreateObject("ADE4.CAEngine", server: Host_of_ADE_instance)

The text "ADE4.CAEngine" identifies which COM object is being instantiated. The «server» parameter specifies where (i.e., on which computer) the ADE instance shall live. Since Host_of_ADE_instance is an array of up to 16 computer names, array-abstraction kicks in and instantiates all 16 instances at the same time.

The variable ADE will be used from various places in our model, and some variables might request its Mid-value, whereas other variables might request its uncertain, or Sample-value. This presents a problem, because Analytica will evaluate its definition in Mid-mode the first time the Mid-value is requested, and then will evaluate it again in Sample-mode the first time the uncertain value is requested. If that happens, we'll end up with twice as many ADEs as desired, and they won't be the same instances. To avoid this problem, we split the instantiation across two variables:

Variable Instantiation_of_ADE := COMCreateObject("ADE4.CAEngine", server: Host_of_ADE_instance)
Variable ADE := Mid(Instantiation_of_ADE)

No other variable in the model uses Instantiation_of_ADE. All code that communicates with the ADE instances uses the variable ADE only. This ensures that Instantiation_of_ADE is only evaluated in mid-mode.

We also load the model into each instance of ADE, so we augment the Definition of Instantiation_of_ADE. As a first cut, we could do this as:

Var a := COMCreateObject( "ADE4.CAEngine", server: Host_of_ADE_instance );
a->OpenModel(Model_file_path);
a

Since the local variable a contains an array of ADE instances, array-abstraction calls the OpenModel method for each of them. These two lines of code perform an impressive amount of work, instantiating 16 ADEs across two different computers via DCOM, and instructing all 16 to load the model. (The model, of course, must be on the disk on each computer already). However, we can do better. Written as shown above, the call to OpenModel occurs sequentially, meaning Analytica tells the first ADE to open the model, and waits for that to complete before telling the second instance to open the model. Analytica's COM Integration allows you to make these 16 calls concurrently when written as follows.

Var a := COMCreateObject("ADE4.CAEngine", server: Host_of_ADE_instance);
COMCallMethod(a, "OpenModel", Model_file_path, concurrent: True);
a

The model contains a flag to indicate whether the computation should be done sequentially (in Analytica only) or via distributed computation:

Variable Use_distributed_comp := Checkbox(1) { A user input }

It is extremely important that this flag be cleared in each ADE instance; otherwise, each ADE will attempt to spawn its own computation to 16 more ADEs, causing an explosion of ADE instances that will consume all available resources on each computer before you can realize what is happening. So, the instantiation is augmented further.

 If AnalyticaEdition="ADE" Then Error("Recursive launching of ADE from ADE");
 Var a := COMCreateObject( "ADE4.CAEngine", server: Host_of_ADE_instance);
 COMCallMethod(a, "OpenModel", Model_file_path, concurrent: True);
 a->Get("Use_distributed_comp")->SetAttribute("Definition", 0);
 a

Here we are making use of methods exposed by the ADE API -- CAEngine::OpenModel, CAEngine::Get, and CAObject::SetAttribute. It is not worth performing the CAObject::SetAttribute in parallel, since it is such a quick and simple operation, and is done only once per instance.

When you load a model into Analytica or ADE and then evaluate a result with uncertainty, a pseudo-random number generator is used to generate Monte Carlo sequences in such a way that the same sequence will occur again if you close Analytica, restart, and perform the same steps. So the "random" simulations are actually "deterministic". We say that the random seed in initialized to the same starting value each time you load a model. If you allow each ADE instance to perform a Monte Carlo simulation, they will produce the exact same random numbers, which is not what we want. Hence, we need to initialize the random seed of each ADE instance to something different in each instance:

a->Get("RandomSeed")->SetAttribute("Definition", @ADE_instance)

This line illustrates an interesting aspect of array-abstraction. Both a and @ADE_instance are indexed by ADE_instance, each having 16 elements. The line causes 16 calls to occur, one to each instance, with each being set to a different scalar. It is also preferable not to use Median Latin Hypercube in each of the clients, so if the model is saved with MLHS, we'll bump this to Random Latin Hypercube.

a->Get("SampleType")->SetAttribute("Definition", Min([1, a->Get("SampleType")->Result]));

Putting all this together, we have the following definition of Instantiation_of_ADE

 If AnalyticaEdition="ADE" Then Error("Recursive launching of ADE from ADE");
 Var a := COMCreateObject("ADE4.CAEngine", server: Host_of_ADE_instance);
 COMCallMethod(a, "OpenModel", Model_file_path, concurrent: True);
 a->Get("Use_distributed_comp")->SetAttribute("Definition", 0);
 a->Get("RandomSeed")->")->SetAttribute("Definition", @ADE_instance);
 a->Get("SampleType")->SetAttribute("Definition", Min([1, a->Get("SampleType")->Result]));
 a

In Variable ADE, we'll set the client sample size and synchronize all the model inputs. This later step is done so that if the user has changes the values of any inputs since loading the model into memory, these get synchronized in each of the ADE clients before we run the simulations. So, the Definition of ADE is

 Var a := Mid(Instantiation_of_ADE);
 a->Get("SampleSize")->SetAttribute("Definition", Client_sample_size);
 a->Get(Identifier Of (Model_inputs))->SetAttribute("Definition", Definition Of (Model_inputs));
 a

The third line in this expression pretty impressive in that it actually synchronizes hundreds of user inputs in all the ADE instances in one swoop with a very terse expression. Once again, this magic occurs from array-abstraction, which here abstracts over both the ADE_instance index as well as the Model_inputs index.

The Model_inputs is an index with the MetaOnly attribute set, containing a list of all the variables in the model that have associated user input nodes. These are set by a Button having the following OnClick attribute.

 Model_inputs := (
 	MetaIndex formnodes :=
 		HandleFromIdentifier(SplitText(TextTrim(EvaluateScript("List FormNode")), '\s+', re: 1));
 	Original Of Subset(Definition Of (formnodes) = "0")			
 );

The code in this OnClick is a bit advanced, but basically it locates all the user input nodes in the model and makes a list of the corresponding variables.

In this section, we instantiated the desired number of ADE instances across multiple machines, instructed each to load the model, configured the sample size, gave each instance a different random seed, ensured that each instance doesn't spawn ADEs, ensured that the variable ADE refers to the same instances in both sample- and mid-modes, and synchronized the values for all model inputs across all instances. An amazing amount of work is accomplish with various one-line expressions, each one providing a valuable example of how array-abstraction iterates transparently over COM calls. The OpenModel method was called concurrently, since it is a time-consuming operation, but all the remaining method calls here are quick small calls that did not warrant the need for concurrency.

Distributed Monte Carlo evaluation

The previous section instantiated the ADEs across multiple computers and configured the model in each instance for distributed computation, including the step of setting the client sample size. In this section, we instruct all the ADE instances to compute their portion of the computation and return the final results. The master model then concatenates these results to obtain the full Monte Carlo sample of the result variable.

The result variable of interest in the model is named Objective1. We split this into three separate variables as follows:

 Variable Serial_objective := { The original definition of Objective1 }
 Variable DC_objective      { specified below }
 Objective Objective1 := If Use_distributed_comp Then DC_Objective Else Serial_objective

The downstream model uses Objective1 as before.

To retrieve the smaller Monte Carlo simulations from each ADE instance, we do the following.

Variable Result_from_each_ADE:=
 Var obj := ADE->Get(Identifier of Objective1) 
 obj->ResultType := 2; { sample }
 Var theCall := COMCallMethod(obj, "ResultTable", concurrent: True);
 COMCallMethod( theCall->Result, "GetSafeArray", resultIndex: DC_Run, concurrent: True)

There are multiple interesting things going on in this definition. Starting with the first line, this is functionally the same as

Var obj := ADE->Get("Objective1");

By using Identifier Of Objective1 instead of "Objective1", the line will automatically adapt if you rename Objective1 to something else.

The second line sets the ResultType property in the ADE API to tell it we want the sample result.

The third line launches the calculation, but does so asynchronously. The COMCallMethod function returns immediately after starting the calculation in every ADE instance. The result, stored in theCall, is itself a COM object that represents a computation that is in-progress. In this case, it is an array containing 16 such objects, since there are 16 calls concurrently in progress.

In the fourth line, theCall->Result waits until all 16 instances have completed their calculation. The result of this is a CATable COM object. The GetSafeArray method of CATable is then called concurrently, transmitting the result data back from each ADE instance to the master model. The call to COMCallMethod specifies not only that this is to be done concurrently, but also that the result is indexed by our DC_Run index. The final result is an array indexed by DC_Run and ADE_instance, containing the smaller Monte Carlo sample computed by each ADE instance.

The next step is to concatenate these to end up with a single large Monte Carlo sample indexed by Run.

Variable DC_objective :=
ConcatRows(Mid(Result_from_each_ADE), ADE_instance, DC_Run, Run2)[Run2 = Run]

where

Index Run2 := 1..Size(ADE_instance)*Size(DC_Run)

This reindexing to Run2 is done for the case where the large is not an exact multiple of the smaller sample size -- for example, when you divide a sample size of 750 across 16 parallel clients, each client computes 47 samples, but 47*16 = 752, so two extra samples get dropped.

Summary

This example of distributing a massive Monte Carlo computation across multiple computers has been detailed here as a real example of using the COM Integration facility in Analytica. The specific example will undoubtedly be of interest to anyone who is looking for ways of speeding up immense calculations; however, it also illustrates a large number of elements of the COM Integration feature, thus (I hope) making it a relevant example even if your interest in a different use of COM with different components.

See Also

Comments


You are not allowed to post comments.