| title | Step 6: Operationalize the Model| Microsoft Docs | ||
|---|---|---|---|
| ms.custom | |||
| ms.date | 05/25/2017 | ||
| ms.prod | sql-server-2017 | ||
| ms.reviewer | |||
| ms.suite | |||
| ms.technology |
|
||
| ms.tgt_pltfrm | |||
| ms.topic | article | ||
| applies_to |
|
||
| dev_langs |
|
||
| ms.assetid | |||
| caps.latest.revision | 2 | ||
| author | jeannt | ||
| ms.author | jeannt | ||
| manager | jhubbard |
In this step, you'll learn to operationalize the models that you trained and saved in the previous step. Operationalize in this case means "deploying the model to production for scoring". This is easy to do if your Python code is contained in a stored procedure. You can then call the stored procedure from applications, to make predictions on new observations.
You'll learn two methods for calling a Python model from a stored procedure:
- Batch scoring mode: Use a SELECT query to provide multiple rows of data. The stored procedure returns a table of observations corresponding to the input cases.
- Individual scoring mode: Pass a set of individual parameter values as input. The stored procedure returns a single row or value.
The stored procedure PredictTipSciKitPy uses the scikit-learn model. This stored procedure illustrates the basic syntax for wrapping a Python prediction call in a stored procedure.
-
The name of the model to use is provided as input parameter to the stored procedure.
-
The stored procedure will then load the serialized model from the database table
nyc_taxi_models.table, using the SELECT statement in the stored procedure. -
The serialized model is stored in the Python variable
modfor further processing using Python. -
The new cases that need to be scored are obtained from the [!INCLUDEtsql] query specified in
@input_data_1. As the query data is read, the rows are saved in the default data frame,InputDataSet. -
This data frame is passed to the
predict_probafunction of the logistic regression model,mod, which was created by using scikit-learn model. -
The
predict_probafunction (probArray = mod.predict_proba(X)) returns a float that represents the probability that a tip (of any amount) will be given. -
The stored procedure also calculates an accuracy metric, AUC (area under curve). Accuracy metrics such as AUC can only be generated if you also provide the target label (i.e. the tipped column). Predictions do not need the target label (variable
y), but the accuracy metric calculation does.Therefore, if you don't have target labels for the data to be scored, you can modify the stored procedure to remove the AUC calculations, and simply return the tip probabilities from the features (variable
Xin the stored procedure).
CREATE PROCEDURE [dbo].[PredictTipSciKitPy] (@model varchar(50), @inquery nvarchar(max))
AS
BEGIN
DECLARE @lmodel2 varbinary(max) = (select model from nyc_taxi_models where name = @model);
EXEC sp_execute_external_script
@language = N'Python',
@script = N'
import pickle;
import numpy;
import pandas;
from sklearn import metrics
mod = pickle.loads(lmodel2)
X = InputDataSet[["passenger_count", "trip_distance", "trip_time_in_secs", "direct_distance"]]
y = numpy.ravel(InputDataSet[["tipped"]])
probArray = mod.predict_proba(X)
probList = []
for i in range(len(probArray)):
probList.append((probArray[i])[1])
probArray = numpy.asarray(probList)
fpr, tpr, thresholds = metrics.roc_curve(y, probArray)
aucResult = metrics.auc(fpr, tpr)
print ("AUC on testing data is: " + str(aucResult))
OutputDataSet = pandas.DataFrame(data = probList, columns = ["predictions"])
',
@input_data_1 = @inquery,
@input_data_1_name = N'InputDataSet',
@params = N'@lmodel2 varbinary(max)',
@lmodel2 = @lmodel2
WITH RESULT SETS ((Score float));
END
GOThe stored procedure PredictTipRxPy uses a model that was created using the revoscalepy library. It works much the same way as the PredictTipSciKitPy procedure, but with some changes for the revoscalepy functions.
CREATE PROCEDURE [dbo].[PredictTipRxPy] (@model varchar(50), @inquery nvarchar(max))
AS
BEGIN
DECLARE @lmodel2 varbinary(max) = (select model from nyc_taxi_models2 where name = @model);
EXEC sp_execute_external_script
@language = N'Python',
@script = N'
import pickle;
import numpy;
import pandas;
from sklearn import metrics
from revoscalepy.functions.RxPredict import rx_predict_ex;
mod = pickle.loads(lmodel2)
X = InputDataSet[["passenger_count", "trip_distance", "trip_time_in_secs", "direct_distance"]]
y = numpy.ravel(InputDataSet[["tipped"]])
probArray = rx_predict_ex(mod, X)
probList = []
for i in range(len(probArray._results["tipped_Pred"])):
probList.append((probArray._results["tipped_Pred"][i]))
probArray = numpy.asarray(probList)
fpr, tpr, thresholds = metrics.roc_curve(y, probArray)
aucResult = metrics.auc(fpr, tpr)
print ("AUC on testing data is: " + str(aucResult))
OutputDataSet = pandas.DataFrame(data = probList, columns = ["predictions"])
',
@input_data_1 = @inquery,
@input_data_1_name = N'InputDataSet',
@params = N'@lmodel2 varbinary(max)',
@lmodel2 = @lmodel2
WITH RESULT SETS ((Score float));
END
GOThe stored procedures PredictTipSciKitPy and PredictTipRxPy require two input parameters:
- The query that retrieves the data for scoring
- The name of a trained model
In this section you'll learn how to pass those arguments to the stored procedure to easily change both the model and the data used for scoring.
-
Define the input data and call the stored procedures for scoring as follows. This example uses the stored procedure PredictTipSciKitPy for scoring, and passes in the model's name and query string
DECLARE @query_string nvarchar(max) -- Specify input query SET @query_string=' select tipped, fare_amount, passenger_count, trip_time_in_secs, trip_distance, dbo.fnCalculateDistance(pickup_latitude, pickup_longitude, dropoff_latitude, dropoff_longitude) as direct_distance from nyctaxi_sample_testing' EXEC [dbo].[PredictTipSciKitPy] 'SciKit_model', @query_string;
The stored procedure returns predicted probabilities for each trip that was passed in as part of the input query. If you are using SSMS (SQL Server Management Studio) for running queries, the probabilities will appear as a table in the Results pane. The Messages pane outputs the accuracy metric (AUC or area under curve) with a value of around 0.56.
-
To use the revoscalepy model for scoring, call the stored procedure PredictTipRxPy, passing in the model name and query string.
EXEC [dbo].[PredictTipRxPy] 'revoscalepy_model', @query_string;
Sometimes, instead of batch scoring, you might want to pass in a single case, getting values from an application, and get a single result based on those values. For example, you could set up an Excel worksheet, web application, or Reporting Services report to call the stored procedure and provide inputs typed or selected by users.
In this section, you'll learn how to create single predictions by calling a stored procedure.
- Take a minute to review the code of the stored procedures PredictTipSingleModeSciKitPy and PredictTipSingleModeRxPy, which are included as part of the download. These stored procedures use the scikit-learn and revoscalepy models, and perform scoring as follows:
- The model's name and multiple single values are provided as input. These inputs include passenger count, trip distance, and so forth.
- A table-valued function,
fnEngineerFeatures, takes input values and converts the latitudes and longitudes to direct distance. Lesson 4 contains a description of this table-valued function. - If you call the stored procedure from an external application, ensure that the input data matches the required input features of the Python model. This might include casting or converting the input data to a Python data type, or validating data type and data length.
- The stored procedure creates a score based on the stored Python model.
Here is the definition of the stored procedure that performs scoring using the scikit-learn model.
CREATE PROCEDURE [dbo].[PredictTipSingleModeSciKitPy] (@model varchar(50), @passenger_count int = 0,
@trip_distance float = 0,
@trip_time_in_secs int = 0,
@pickup_latitude float = 0,
@pickup_longitude float = 0,
@dropoff_latitude float = 0,
@dropoff_longitude float = 0)
AS
BEGIN
DECLARE @inquery nvarchar(max) = N'
SELECT * FROM [dbo].[fnEngineerFeatures](
@passenger_count,
@trip_distance,
@trip_time_in_secs,
@pickup_latitude,
@pickup_longitude,
@dropoff_latitude,
@dropoff_longitude)
'
DECLARE @lmodel2 varbinary(max) = (select model from nyc_taxi_models2 where name = @model);
EXEC sp_execute_external_script
@language = N'Python',
@script = N'
import pickle;
import numpy;
import pandas;
# Load model and unserialize
mod = pickle.loads(model)
# Get features for scoring from input data
X = InputDataSet[["passenger_count", "trip_distance", "trip_time_in_secs", "direct_distance"]]
# Score data to get tip prediction probability as a list (of float)
probList = []
probList.append((mod.predict_proba(X)[0])[1])
# Create output data frame
OutputDataSet = pandas.DataFrame(data = probList, columns = ["predictions"])
',
@input_data_1 = @inquery,
@params = N'@model varbinary(max),@passenger_count int,@trip_distance float,
@trip_time_in_secs int ,
@pickup_latitude float ,
@pickup_longitude float ,
@dropoff_latitude float ,
@dropoff_longitude float',
@model = @lmodel2,
@passenger_count =@passenger_count ,
@trip_distance=@trip_distance,
@trip_time_in_secs=@trip_time_in_secs,
@pickup_latitude=@pickup_latitude,
@pickup_longitude=@pickup_longitude,
@dropoff_latitude=@dropoff_latitude,
@dropoff_longitude=@dropoff_longitude
WITH RESULT SETS ((Score float));
END
GOHere is the definition of the stored procedure that performs scoring using the revoscalepy model.
CREATE PROCEDURE [dbo].[PredictTipSingleModeRxPy] (@model varchar(50), @passenger_count int = 0,
@trip_distance float = 0,
@trip_time_in_secs int = 0,
@pickup_latitude float = 0,
@pickup_longitude float = 0,
@dropoff_latitude float = 0,
@dropoff_longitude float = 0)
AS
BEGIN
DECLARE @inquery nvarchar(max) = N'
SELECT * FROM [dbo].[fnEngineerFeatures](
@passenger_count,
@trip_distance,
@trip_time_in_secs,
@pickup_latitude,
@pickup_longitude,
@dropoff_latitude,
@dropoff_longitude)
'
DECLARE @lmodel2 varbinary(max) = (select model from nyc_taxi_models2 where name = @model);
EXEC sp_execute_external_script
@language = N'Python',
@script = N'
import pickle;
import numpy;
import pandas;
from revoscalepy.functions.RxPredict import rx_predict_ex;
# Load model and unserialize
mod = pickle.loads(model)
# Get features for scoring from input data
X = InputDataSet[["passenger_count", "trip_distance", "trip_time_in_secs", "direct_distance"]]
# Score data to get tip prediction probability as a list (of float)
probArray = rx_predict_ex(mod, X)
probList = []
probList.append(probArray._results["tipped_Pred"])
# Create output data frame
OutputDataSet = pandas.DataFrame(data = probList, columns = ["predictions"])
',
@input_data_1 = @inquery,
@params = N'@model varbinary(max),@passenger_count int,@trip_distance float,
@trip_time_in_secs int ,
@pickup_latitude float ,
@pickup_longitude float ,
@dropoff_latitude float ,
@dropoff_longitude float',
@model = @lmodel2,
@passenger_count =@passenger_count ,
@trip_distance=@trip_distance,
@trip_time_in_secs=@trip_time_in_secs,
@pickup_latitude=@pickup_latitude,
@pickup_longitude=@pickup_longitude,
@dropoff_latitude=@dropoff_latitude,
@dropoff_longitude=@dropoff_longitude
WITH RESULT SETS ((Score float));
END
GO-
To try it out, open a new Query window, and call the stored procedure, typing parameters for each of the feature columns.
-- Call stored procedure PredictTipSingleModeSciKitPy to score using SciKit-Learn model EXEC [dbo].[PredictTipSingleModeSciKitPy] 'linear_model', 1, 2.5, 631, 40.763958,-73.973373, 40.782139,-73.977303 -- Call stored procedure PredictTipSingleModeRxPy to score using revoscalepy model EXEC [dbo].[PredictTipSingleModeRxPy] 'revoscalepy_model', 1, 2.5, 631, 40.763958,-73.973373, 40.782139,-73.977303
The seven values are for these feature columns, in order:
- passenger_count
- trip_distance
- trip_time_in_secs
- pickup_latitude
- pickup_longitude
- dropoff_latitude
- dropoff_longitude
-
The output from both procedures is a probability of a tip being paid for the taxi trip with the above parameters or features.
In this tutorial, you've learned how to work with Python code embedded in stored procedures. The integration with [!INCLUDEtsql] makes it much easier to deploy Python models for prediction and to incorporate model retraining as part of an enterprise data workflow.