Using MATLAB MDCS, Generic Scheduler and Source control with Azure & VSTS

MATLAB® is a hugely popular platform used by students and educators to analyse and design systems and products.

Typical uses of MATLAB in academia are as follows:

Creating a cluster using MATLAB Distributed Computing Server (MDCS)

MATLAB Distributed Computing Server runs on the cluster nodes and includes a built-in job scheduler that supports batch jobs, parallel computations and distributed large data.

MATLAB Distributed Computing Server Generic Scheduler

The generic scheduler interface lets MATLAB submit jobs to a third-party job scheduler in place of the job scheduler built into MATLAB Distributed Computing Server.

MATLAB and Cloud

Until now, each customer typically had to do significant work to create the environment, configure the software and provide the required cluster management capabilities. This can now all be done simply and quickly by using Microsoft Azure to host MATLAB Distributed Computing Server clusters.

MATLAB Azure Resource Manager Templates

The Microsoft Azure team have now made it much easier and quicker to use Microsoft Azure for MATLAB Distributed Computing Server clusters by making available a set of Azure Resource Manager templates and scripts.

Compared to provisioning and using on-premises hardware for MATLAB Distributed Computing Server clusters, using Microsoft Azure has many benefits:

  1. A wide range of virtual machines (VMs) are available. VMs can be chosen according to the requirements of your application and how soon you need the work completed – choose the number of cores, amount of memory, processor speed, network interconnect and so on.
  2. One or more clusters can be created on-demand. Create clusters when required using the number and type of VMs you need, run your workloads, then delete them when you are done.
  3. Only pay for the compute you consume. Azure compute is billed by the minute, so you only pay for what you use, whether that is under an hour or for a few days.
  4. Leverage scale you never would have had access to on-premises. You can provision large numbers of VMs and/or high-performance VMs for your most demanding workloads or for time-critical jobs. Create the clusters only when needed and delete them when the work is complete. You’re no longer constrained to a fixed number and type of servers.
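To make the per-minute billing point concrete, here is a minimal Python sketch of how the compute cost of a short-lived cluster might be estimated. The function name is hypothetical and the hourly rate in the example is a made-up placeholder, not an Azure price. Real bills also include items such as storage, networking and licensing, which this ignores.

```python
import math


def estimate_cluster_cost(vm_count, runtime_minutes, hourly_rate_per_vm):
    """Estimate compute cost for a cluster that is billed by the minute.

    hourly_rate_per_vm is a placeholder supplied by the caller;
    it is not a real Azure price.
    """
    if vm_count < 0 or runtime_minutes < 0 or hourly_rate_per_vm < 0:
        raise ValueError("inputs must be non-negative")
    # Per-minute billing: round the runtime up to whole minutes.
    billed_minutes = math.ceil(runtime_minutes)
    per_minute_rate = hourly_rate_per_vm / 60.0
    return vm_count * billed_minutes * per_minute_rate


if __name__ == "__main__":
    # 16 workers for 45 minutes at a hypothetical rate of 1.20 per VM-hour.
    print(round(estimate_cluster_cost(16, 45, 1.20), 2))
```

Because the cluster is deleted when the work is done, the cost scales with the minutes actually used rather than with the size of a permanently provisioned cluster.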

“Running MATLAB Distributed Computing Server on a cluster of Azure VMs provides user-friendly, high performance computing at very low cost compared to the total cost of ownership of providing an equivalent MATLAB Distributed Computing Server capability on on-premises servers deployed in our own data centres. The users of MATLAB Distributed Computing Server are able to start and stop the Azure-based cluster, or even just portions of it, using some simple PowerShell scripts, thereby keeping Azure billing costs to a minimum.”

James Mann
Solution Architect
Aberdeen Asset Management PLC

Using the new templates and scripts, one or more MATLAB Distributed Computing Server clusters can be created by specifying a small set of configuration parameters, such as the number of worker VMs and the size of worker VMs – then the network is configured, the VMs created and the MATLAB Distributed Computing Server is configured. Script commands allow clusters to be listed, paused, resumed and deleted. Licensing is handled by the MathWorks Hosted License Manager, which includes support for on-demand licensing as well as perpetual or annual licensing.
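As an illustration of what "a small set of configuration parameters" might look like, the sketch below builds an Azure Resource Manager deployment parameters file in Python. The parameter names (workerVmCount, workerVmSize) and the function name are hypothetical. The real names are defined by the templates themselves, so check their documentation before relying on this.

```python
import json

# Schema identifier used by ARM deployment parameter files.
PARAMETERS_SCHEMA = (
    "https://schema.management.azure.com/schemas/2015-01-01/"
    "deploymentParameters.json#"
)


def build_cluster_parameters(worker_count, worker_vm_size):
    """Return a parameters document for a cluster deployment.

    The keys under "parameters" are hypothetical placeholders, not the
    names used by the published MATLAB templates.
    """
    if worker_count < 1:
        raise ValueError("a cluster needs at least one worker VM")
    return {
        "$schema": PARAMETERS_SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {
            "workerVmCount": {"value": worker_count},
            "workerVmSize": {"value": worker_vm_size},
        },
    }


if __name__ == "__main__":
    params = build_cluster_parameters(4, "Standard_D4_v2")
    print(json.dumps(params, indent=2))
```

Keeping the parameters in a small file like this makes it easy to create several clusters that differ only in the number and size of their worker VMs.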

Running MATLAB Job Schedulers

MATLAB also supports third-party job schedulers, in addition to the job scheduler included with MATLAB Distributed Computing Server. We have now added support to enable Azure Batch to be used as a job scheduler with MATLAB.

Further information, links to the templates and scripts, and detailed documentation are available in the Windows Virtual Machines documentation and on GitHub at https://www.github.com/azure

Implementing the MDCS Generic Scheduler using the Azure Batch service

Azure Setup

  1. Get your MATLAB installation files. These steps are accurate as of May 2016, but the MathWorks website may change its layout in the future.

    • Log into your MathWorks account, go to the My Account page, and click the Download Products button.
    • Choose the version of MATLAB you want, and select the Windows (64-bit) option.
    • Select the MATLAB Distributed Computing Server tab, and click the button to download the installer.
    • Open the installer, and choose the Log in with a MathWorks account option.
    • Accept the license agreement and enter your MathWorks account credentials.
    • Select the Download Only option. Select the Windows (64-bit) option, and choose a path to download the installation files to.
    • Create a .zip file called MDCS.zip to contain the downloaded files. Make sure the .zip does not have a top-level folder: setup.exe should be at the top level of the archive.
  2. Supply your file installation key and create the Batch setup package.

    • Log into your MathWorks account, go to the My Account page, and click the View My Licenses button.
    • Select your MATLAB Distributed Computing Server license, navigate to the Activation and Installation tab, and click the Get File Installation Key button.
    • Select your product version, and a long string of numbers should appear. Copy it to your clipboard.
    • In the BatchMdcsAppPackage folder in this package, open installer_input.txt.
    • In the second line of the file, paste your file installation key after the "=" character. Ensure there are no spaces between the "=" character and the beginning of your file installation key.
    • Put all the files from the BatchMdcsAppPackage folder in a .zip file called BatchMdcs.zip. The .zip should contain just the files, without a top-level folder.
  3. Create an Azure Batch account following the steps here: https://azure.microsoft.com/documentation/articles/batch-account-create-portal/

  4. Create an Azure Storage account following the steps here: https://azure.microsoft.com/documentation/articles/storage-create-storage-account/#create-a-storage-account
    This Storage account will be used to store the MATLAB job data.

    • Create a file share under this account.
  5. Create an Azure Storage account to house the MATLAB installation and setup files.

    • Log into https://portal.azure.com and create a new Storage account.
    • Navigate to the Batch account you created in step 3 and link it to this new Storage account.
  6. Upload 2 application packages. For more information on application packages, see the following article: https://azure.microsoft.com/documentation/articles/batch-application-packages/

    • Add a new application to your Batch account:
      • Application Id: mdcs
      • Version: 1
      • Application Package: Enter the path to the MDCS.zip you created in step 1.
    • Add another new application:
      • Application Id: batchmdcs
      • Version: 1
      • Application Package: Enter the path to the BatchMdcs.zip you created in step 2.
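Steps 1 and 2 above both call for a .zip with no top-level folder, and step 2 needs the file installation key to follow the "=" on the second line with no spaces. A minimal Python sketch of both operations follows. The function names are hypothetical, and the code assumes the second line of installer_input.txt has the form name= as the step describes. It has not been run against the actual project files.

```python
import os
import zipfile


def set_installation_key(input_path, key):
    """Write key directly after '=' on the second line of the file."""
    with open(input_path, "r", encoding="utf-8") as f:
        lines = f.read().splitlines()
    if len(lines) < 2 or "=" not in lines[1]:
        raise ValueError("expected 'name=' on the second line")
    name = lines[1].split("=", 1)[0]
    # strip() guarantees no spaces between '=' and the key.
    lines[1] = name + "=" + key.strip()
    with open(input_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")


def zip_without_top_folder(source_dir, zip_path):
    """Zip the contents of source_dir so its files sit at the archive root.

    zip_path should be outside source_dir so the archive does not
    try to include itself.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(source_dir):
            for filename in files:
                full_path = os.path.join(root, filename)
                # Store paths relative to source_dir, not including it.
                arcname = os.path.relpath(full_path, source_dir)
                zf.write(full_path, arcname)


def has_file_at_top_level(zip_path, filename):
    """Return True if filename is stored at the root of the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return filename in zf.namelist()
```

For example, after calling zip_without_top_folder on the download folder from step 1, has_file_at_top_level("MDCS.zip", "setup.exe") gives a quick check that the archive has the expected layout before it is uploaded.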

Building and Installing the Toolbox

  1. Install Visual Studio 2015. You can download the free Community edition here: https://www.visualstudio.com/products/visual-studio-community-vs

  2. Install MATLAB and the Parallel Computing Toolbox. MATLAB R2015a and Parallel Computing Toolbox 6.6 have been verified to work with this project.

  3. Open src\MatlabBatchLib.sln in Visual Studio.

  4. Build the solution. If you see errors, please ensure that the required NuGet packages are downloaded.

  5. Create and install the MATLAB toolbox

    • Open MATLAB on your local machine.
    • Under the toolbar's Home tab, go to the Resources section and select Add-Ons > Package Toolbox.
    • In the Toolbox Folder section of the toolbar, click the + button and select this project's "toolbox" folder.
    • Name the package Batch MDCS Scheduler.
    • Click the Package button.
    • Double click the .mltbx file that was created and install the toolbox.

Batch Cluster Setup

  1. Create a Generic Cluster Profile in MATLAB.

    • Open MATLAB on your local machine.
    • In the "Home" tab, select Parallel > Manage Cluster Profiles.
    • Click the Import button, and select the BatchProfile.settings file at the root of this project.
    • Once imported, edit the top section's JobStorageLocation, NumWorkers, and License Number properties as appropriate for your setup. Also edit the WORKERS section as appropriate for your setup.
  2. Update getBatchConfigs.m

    • In MATLAB, navigate to the directory where you installed the toolbox (ex: C:\Users\johndoe\Documents\MATLAB\Toolboxes\Batch MDCS Scheduler) and open getBatchConfigs.m.
    • Fill in your Batch account information.
    • Fill in your Storage account information.
  3. Execute the storeBatchCredentials function.

    • From the Azure portal, get the keys for your data Storage account and your Batch account. You will be prompted for these keys by the storeBatchCredentials function.
    • The keys you supply will be stored in the Windows Credential Manager on your local machine. All future interactions with the Batch service will use the stored credentials.
    • If the keys to your Batch and Storage account are regenerated, simply rerun storeBatchCredentials and supply the new values.
  4. Use the pool helper functions to manage pools in the Batch service. See each function's help text for more information.

    • batchCreatePool
    • batchListPools
    • batchResizePool
    • batchDeletePool
  5. Choose which Batch pool to use with your cluster, and set the ClusterPoolId property in getBatchConfigs.m.

  6. Run your MATLAB workflow against the Batch cluster profile.

Known Issues:

  • Creating an interactive session with parpool is not currently supported.
  • The Start Task does not validate that the MATLAB installation was successful.

Testing

To verify your Batch cluster setup, go to the Home tab of the MATLAB toolbar and select Parallel > Manage Cluster Profiles. Select your Batch Profile, and click the Validate button. MATLAB will run some test jobs.

NOTE: The final job will fail because it attempts to create an interactive parpool session. This is not yet supported.

NOTE: The validation suite includes 2 communicating jobs, so the Batch pool you use for validation must have the MaxTasksPerComputeNode property set to 1.

Adding source control to MATLAB

 

The Microsoft Azure team have also released a GitHub project that provides TFS Version Control integration in MATLAB and Simulink. For background on source control integration in MATLAB and Simulink, see:
https://www.mathworks.com/help/matlab/matlab_prog/about-mathworks-source-control-integration.html
https://www.mathworks.com/help/simulink/ug/write-a-source-control-adapter-with-the-sdk.html

The project can be downloaded from https://github.com/Azure/tfs-matlab-connector

This project is built on top of the TFS JAVA SDK, which is part of the Team Explorer Everywhere project:
https://github.com/Microsoft/team-explorer-everywhere

Installing and using the app

Open MATLAB and double click the "TFS Version Control Integration.mlappinstall" file. Follow the prompts to install the app. See appReadMe.txt for more details on setup and usage.

Building the source code with Ant
  1. Install MATLAB, at least version R2014a.
  2. Install the Java 8 Development Kit and the Java 7 Runtime. The MATLAB SDK uses Java 1.7 as its target, so in theory JDK 1.7 should be sufficient, but only the JDK 8 + JRE 7 configuration has been validated.
    https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
    https://www.oracle.com/technetwork/java/javase/downloads/jre7-downloads-1880261.html
  3. Install Apache Ant(TM) version 1.9.7. https://ant.apache.org/bindownload.cgi
  4. Set the JAVA_HOME environment variable. ex:
    • (Windows) SET JAVA_HOME=C:\Program Files\Java\jdk1.8.0_65
    • (Mac) export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_65.jdk/Contents/Home
    • (Linux) export JAVA_HOME=~/java/jdk1.8.0_65
  5. Ensure that the JAVA_HOME bin directory and the Ant bin directory are on the PATH. ex:
    • (Windows) SET PATH=C:\Program Files\apache-ant-1.9.7\bin;%JAVA_HOME%\bin;%PATH%
    • (Mac) PATH=~/apache-ant-1.9.7/bin:$JAVA_HOME/bin:$PATH
    • (Linux) PATH=~/apache-ant-1.9.7/bin:$JAVA_HOME/bin:$PATH
  6. Navigate to the src directory and run:
    • ant compile
      Note that the build.xml file needs the path to your MATLAB installation and the Java 7 runtime directory containing rt.jar. Default Windows values are provided in the file, but you can overwrite them on the command line:
    • ant compile "-Dmatlab.root.dir=D:/MATLAB/R2015a" "-Djre7.lib.dir=D:/jre7/lib"
  7. To delete all build output, run:
    • ant clean
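Most build failures at this stage come from a missing JAVA_HOME or from java or ant not being on the PATH. The Python sketch below checks for those problems and assembles the ant command line with the optional overrides from step 6. The function names are hypothetical helpers, not part of the project. The property names matlab.root.dir and jre7.lib.dir are the ones shown in the command above.

```python
import os
import shutil


def check_build_environment(env=None):
    """Return a list of problems with JAVA_HOME and PATH (empty if none)."""
    env = os.environ if env is None else env
    problems = []
    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not os.path.isdir(os.path.join(java_home, "bin")):
        problems.append("JAVA_HOME has no bin directory: " + java_home)
    search_path = env.get("PATH", "")
    for tool in ("java", "ant"):
        # shutil.which returns None when the tool cannot be found.
        if shutil.which(tool, path=search_path) is None:
            problems.append(tool + " was not found on PATH")
    return problems


def ant_command(target, matlab_root=None, jre7_lib=None):
    """Build the ant argument list, adding -D overrides only when given."""
    cmd = ["ant", target]
    if matlab_root:
        cmd.append("-Dmatlab.root.dir=" + matlab_root)
    if jre7_lib:
        cmd.append("-Djre7.lib.dir=" + jre7_lib)
    return cmd


if __name__ == "__main__":
    for problem in check_build_environment():
        print("warning:", problem)
    print(" ".join(ant_command("compile", "D:/MATLAB/R2015a", "D:/jre7/lib")))
```

Running the script prints any warnings followed by the command to run from the src directory. Leaving out the two optional arguments falls back to the defaults in build.xml.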