| title | Deploy R code in stored procedures |
|---|---|
| description | Embed R language code in a SQL Server stored procedure to make it available to any client application having access to a SQL Server database. |
| ms.prod | sql |
| ms.technology | machine-learning |
| ms.date | 03/15/2019 |
| ms.topic | conceptual |
| author | dphansen |
| ms.author | davidph |
| ms.custom | seo-lt-2019 |
| monikerRange | >=sql-server-2016||>=sql-server-linux-ver15||=sqlallproducts-allversions |
[!INCLUDEappliesto-ss-xxxx-xxxx-xxx-md]
When using the R and Python features in SQL Server Machine Learning Services, the most common approach for moving solutions to a production environment is to embed the code in stored procedures. This article summarizes the key points for SQL developers to consider when operationalizing R code using SQL Server.
Traditionally, integrating data science solutions has meant extensive recoding to meet performance and integration requirements. SQL Server Machine Learning Services simplifies this task because R and Python code can run inside SQL Server and be called through stored procedures. For more information about the mechanics of embedding code in stored procedures, see:
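As a minimal sketch of the embedding pattern, the following stored procedure wraps a short R script with `sp_execute_external_script`. The table `dbo.iris_data` and the procedure name are hypothetical placeholders:

```sql
-- Hypothetical sketch: embed an R script in a stored procedure.
CREATE OR ALTER PROCEDURE dbo.GetIrisSummary
AS
BEGIN
    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'
            # InputDataSet is the data frame built from @input_data_1;
            # assigning OutputDataSet returns rows to the caller
            OutputDataSet <- aggregate(SepalLength ~ Species,
                                       data = InputDataSet, FUN = mean)',
        @input_data_1 = N'SELECT Species, SepalLength FROM dbo.iris_data'
        WITH RESULT SETS ((Species NVARCHAR(50), MeanSepalLength FLOAT));
END;
```

Any client that can execute `EXEC dbo.GetIrisSummary;` receives the R output as an ordinary result set.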
A more comprehensive example of deploying R code into production by using stored procedures can be found in Tutorial: R data analytics for SQL developers.
Converting your R code to run in SQL Server is easier if you make some optimizations beforehand in the R or Python code. These include avoiding data types that cause problems, avoiding unnecessary data conversions, and rewriting the R code as a single function call that can be easily parameterized. For more information, see:
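Once the R logic is a single function call, its arguments map naturally onto stored procedure parameters via `@params`. A hedged sketch, in which the procedure name, the `dbo.sales` table, and the `forecast_sales` R function are all hypothetical:

```sql
-- Hypothetical sketch: expose an R function argument as a procedure parameter.
CREATE OR ALTER PROCEDURE dbo.ForecastSales
    @horizon INT
AS
BEGIN
    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'
            # All R logic lives in one parameterized function call
            forecast_sales <- function(df, h) {
                head(df, n = h)   # placeholder for the real model logic
            }
            OutputDataSet <- forecast_sales(InputDataSet, horizon)',
        @input_data_1 = N'SELECT SaleDate, Amount FROM dbo.sales',
        @params = N'@horizon INT',
        @horizon = @horizon
        WITH RESULT SETS ((SaleDate DATE, Amount FLOAT));
END;
```

Keeping the R side to one entry-point function means the T-SQL wrapper rarely needs to change when the model internals do.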
Because you can run R or Python from a stored procedure, you can execute scripts from any application that can send a T-SQL statement and handle the results. For example, you might retrain a model on a schedule by using the Execute T-SQL task in Integration Services, or by using another job scheduler that can run a stored procedure.
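For example, scheduled retraining can be wired up with SQL Server Agent. This partial sketch (job name, database, and the `dbo.RetrainModel` procedure are hypothetical, and the schedule and server-assignment steps are omitted) shows the idea:

```sql
-- Hypothetical sketch: run a retraining procedure as a SQL Server Agent job step.
USE msdb;
EXEC dbo.sp_add_job @job_name = N'Weekly model retrain';
EXEC dbo.sp_add_jobstep
    @job_name  = N'Weekly model retrain',
    @step_name = N'Retrain',
    @subsystem = N'TSQL',
    @command   = N'EXEC MyDatabase.dbo.RetrainModel;';
```

Any other scheduler that can submit a T-SQL batch works equally well, since the entry point is just a stored procedure call.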
Scoring is an important task that can easily be automated, or started from external applications. You train the model beforehand, using R or Python in a stored procedure, and save the model in binary format to a table. Then the model can be loaded into a variable as part of a stored procedure call, using one of these options for scoring from T-SQL:
- Real-time scoring, optimized for small batches
- Single-row scoring, for calling from an application
- Native scoring, for fast batch prediction from SQL Server without calling R
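As an illustration of the native scoring option, the following sketch loads a serialized model from a table and scores with the T-SQL `PREDICT` function. The `dbo.models` and `dbo.rentals` tables, model name, and output column are hypothetical, and native scoring assumes the model was serialized in the supported RevoScaleR format:

```sql
-- Hypothetical sketch: load a stored model and score without calling R.
DECLARE @model VARBINARY(MAX) =
    (SELECT model FROM dbo.models WHERE model_name = N'rental_rx_linmod');

SELECT d.RentalId, p.Score
FROM PREDICT(MODEL = @model, DATA = dbo.rentals AS d)
WITH (Score FLOAT) AS p;
```

Because `PREDICT` runs entirely inside the database engine, no R runtime is invoked at scoring time.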
This walkthrough provides examples of scoring using a stored procedure in both batch and single-row modes:
See these solution templates for examples of how to integrate scoring in an application:
Although the open-source R language has known limitations with regard to large datasets, the RevoScaleR package APIs included with SQL Server Machine Learning Services can operate on large datasets and benefit from multi-threaded, multi-core, multi-process in-database computations.
If your R solution uses complex aggregations or involves large datasets, you can leverage SQL Server's highly efficient in-memory aggregations and columnstore indexes, and let the R code handle the statistical computations and scoring.
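One way to apply this division of labor is to do the aggregation in the `@input_data_1` query, so R only ever sees the reduced rowset. A sketch, with the `dbo.order_lines` table and column names as hypothetical placeholders:

```sql
-- Hypothetical sketch: aggregate in SQL, then hand the reduced set to R.
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'
        # R receives only one pre-aggregated row per product
        OutputDataSet <- data.frame(cor = cor(InputDataSet$avg_qty,
                                              InputDataSet$avg_price))',
    @input_data_1 = N'
        SELECT ProductId,
               AVG(Quantity)  AS avg_qty,
               AVG(UnitPrice) AS avg_price
        FROM dbo.order_lines
        GROUP BY ProductId'
    WITH RESULT SETS ((cor FLOAT));
```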
For more information about how to improve performance in SQL Server Machine Learning, see:
The same R code that you run against [!INCLUDEssNoVersion] data can be used against other data sources, such as Spark over HDFS, when you use the standalone server option in SQL Server Setup, or when you install the non-SQL-branded product, Microsoft Machine Learning Server (formerly known as Microsoft R Server):