SQL R Services optimization for concurrent execution of sp_execute_external_script


Tips and Trick for SQL R Services

With SQL Server 2016, we have introduced in-database analytics by bringing R closer to the database. This allows the compute to happen closer to the data,  and also leverage the power of SQL Server including resource governance. For production scenario, our guideline includes embedding ‘R’ scripts inside sp_execute_external_script (SPEES), which internally spawns processes for R analytics.

To achieve better concurrency and throughput in parallel execution of SPEES, we recommend customer to test different process pool size to determine the optimum setting for their workload. By default 6 processes are enabled and it takes 100 ms to start a new process in Windows. For a large concurrent request, we should following entry in rlauncher.config to get preemptively increase the pool size.

PROCESS_POOL_SQLSATELLITE_GROWTH=20

If customer is running 10 concurrent connections to SPEES with MAXDOP of 4, we recommend setting the pool size to at least 10 X 4 i.e. 40. In general, the optimum configuration depends on the workload, so they should be tested for specific use case.

Comments (5)

  1. Andy Hogg says:

    Great to see more information on R in SQL Server being published, please keep up the good work.
    One interesting thing though, on my machine I consistently see 5 R processes instantiated and not 6.

    1. Hi Andy,
      6 processes are created, butone of the processes are assigned to the execution of your script (MAXDOP etc., may have an impact here), and that process is torn down after the execution. That’s most likely why you see 5 RTerm processes. I am at the moment writing a series of blog-posts about SQL Server R Services, and you can find a post covering this particular subject here: http://www.nielsberglund.com/2017/04/13/microsoft-sql-server-r-services-internals-iii/

  2. Peter Cuttance says:

    I am aware that when ScaleR was initially developed, the focus was on parallel processing for individual analyses — i.e. splitting a job across multiple processors.

    However, my focus is not on large analyses using large data sets, but on running a large number of analyses (jobs) each using a relatively small dataset.

    Can the ScaleR function, or another function be used to create queues for each of a the processors in a set — e.g. if we run a Server with 4 cores, is there functionality available when running either R Server, or R in-database (SQL Server 2016) — I’ve not yet been able to work this out from the documentation I have read, as I have not seen any reference to the Use Case I have.

    1. There isn’t a direct function in ScaleR which can allow you to create queues for multiple execution of SPEES, however you can invoke them from PowerShell in parallel or use SQL agent etc.

Skip to main content