End to End Loan ChargeOff Prediction Built Using Azure HDInsight Spark Clusters and SQL Server 2016 R Service

Debt is ubiquitous in today's world. Borrower uses access to credit to pay for housing, education, transport and medical services while lender relies on debt to fuel for its revenue. A charged off loan is a loan that is declared by a creditor (usually a lending institution) that an amount of debt is unlikely to be collected, usually when the loan repayment is severely delinquent by the borrower.

Given that high chargeoff has negative impact on lending institution financial, loan manager often monitors loan chargeoff risk very closely prevent loans from getting charged off. Lending institutions can use ML to learn patterns from the past chargeoff loan data and make predictions for the immediate future. The prediction will provide actionable insights to loan manager to form a personalized plan with the borrower to save the loan from charging off.

Today we are excited to announce an end to end Loan ChargeOff Prediction Solution on Cortana Intelligence Solutions Gallery!  This solution can be deployed using two different technology stacks depending on your needs: (1) SQL Server 2016 R Services and (2) HDInsight Spark Clusters. Both platforms demonstrate the power of R in parallelized and server compute power of HDInsight and SQL Server. With a few clicks, you can explore the end to end solution in your Azure subscription and run the R scripts that perform data transformation, feature engineering, data scoring, and model operationalization. The code is also available in GitHub, if you wish to download and run in your own machine.

Using simulated data that contains lending institution customer demographic information, loan details and payment history records, the solution applies 5 different models to learn the data pattern and uses the best performant model to do data scoring. We will also walk through how the model can be operationalized so that business application can consume the scoring result. Finally, a PowerBI report will showcase the loan chargeoff history data and prediction for the next three months.

For the Database or IT administrator, this solution simplifies the platform and analytics toolset deployment with just a few clicks. If the administrator selects SQL Server R Service solution, the Deploy option will provision a Microsoft Data Science VM (DSM) that comes loaded with SQL Server 2016 and Microsoft R Server. If the  administrator selects HDInsight Spark solution, the Deploy option will provision HDInsight Spark clusters version 3.6 that comes loaded with Microsoft R Server 9.1.0 that contain 2 head nodes, 2 worker nodes, and 1 edge node.

For the data scientist, the solution template incorporates the scripts that perform data processing, feature selection/engineering, model training and evaluation. For the SQL Server R Service deployment option, the process launches a PowerShell script which invokes T-SQL stored procedures with R code embedded in them. For the HDI deployment option, R code will be run from a browser connecting to the RStudio Server on the HDI Spark cluster's edge node and publish the scoring model as a web service. In both platforms, MicrosoftML's industry-leading algorithms are used for model training and feature selection. This template also offers different data sizes for data scientist to explore. Raw data is deployed to the VM as csv files.

For the loan manager, a PowerBI report displays the prediction data which could be further drilled down by states and branches will help loan managers make actionable decision to offer personalized incentive to borrower so that borrower will continue to make loan payment and prevent the loan from charging off.

We have made the entire code that powered this solution free to use and customizable. Deploy the Loan Chargeoff Solution with Azure HDInsight or Loan Chargeoff Solution with SQL Server 2016 and try it out today!

Comments (0)

Skip to main content