| title | Use Python with revoscalepy to Create a Model| Microsoft Docs | |
|---|---|---|
| ms.custom | | |
| ms.date | 04/14/2017 | |
| ms.prod | sql-server-2016 | |
| ms.reviewer | ||
| ms.suite | ||
| ms.technology | | |
| ms.tgt_pltfrm | ||
| ms.topic | article | |
| caps.latest.revision | 1 | |
| author | jeannt | |
| ms.author | jeannt | |
| manager | jhubbard |
This code sample demonstrates how to create a linear regression model using one of the algorithms in the revoscalepy package for Python, which is part of Microsoft Machine Learning Services.
The revoscalepy package for Python contains objects, transformations, and algorithms similar to those provided by the RevoScaleR package for the R language. With this library, you can create a compute context, move data between compute contexts, transform data, and train predictive models using popular algorithms such as logistic and linear regression, decision trees, and more.
For more information, see What is revoscalepy?
Important
To run Python code in SQL Server, you must have installed SQL Server 2017 CTP 2.0 with the feature Machine Learning Services with Python. Other versions of SQL Server do not support Python integration.
To run this code, execute the sample as a Python script from the command line, or run it in a Python development environment that includes the Python integration components provided in this release.
This sample contains the following steps:
- Import the libraries and functions you need
- Create the connection to SQL Server and create data source objects for working with the data
- In your Python code, modify the data so that it can be used by the rxLinMod algorithm (imported in this sample as rx_lin_mod_ex)
- Call rxLinMod and define the formula used to train the model
- Generate a set of predictions based on the original data set
- Create a summary based on the predicted values
All operations are performed using an instance of SQL Server as the compute context.
In general, the process of calling Python in a remote compute context closely resembles the process of using R in a remote compute context.
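Before the full sample, here is a minimal standalone sketch of how the connection string used in the connection step is assembled. The helper name `build_connection_string` is hypothetical and is not part of revoscalepy; the environment variable `RTEST_SQL_SERVER` and the database name `RevoTestDb` are taken from the sample that follows, and you would likely substitute your own server and database.

```python
import os


def build_connection_string(server=None, database="RevoTestDb"):
    """Assemble a trusted-connection ODBC string like the one in the sample.

    Hypothetical helper for illustration only; not a revoscalepy function.
    """
    if server is None:
        # Fall back to the local default instance (".") when the variable is unset.
        server = os.getenv("RTEST_SQL_SERVER", ".")
    return (
        "Driver=SQL Server;"
        "Server={0};Database={1};Trusted_Connection=True;".format(server, database)
    )


# "myserver" is a placeholder server name.
print(build_connection_string("myserver"))
```

Keeping the server name in an environment variable lets the same script run against different SQL Server instances without editing the code.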
from revoscalepy.computecontext.RxComputeContext import RxComputeContext
from revoscalepy.computecontext.RxInSqlServer import RxInSqlServer
from revoscalepy.computecontext.RxInSqlServer import RxSqlServerData
from revoscalepy.functions.RxLinMod import rx_lin_mod_ex
from revoscalepy.functions.RxPredict import rx_predict_ex
from revoscalepy.functions.RxSummary import rx_summary
from revoscalepy.utils.RxOptions import RxOptions
from revoscalepy.etl.RxImport import rx_import_datasource
import os
def test_linmod_sql():
    sqlServer = os.getenv('RTEST_SQL_SERVER', '.')

    connectionString = 'Driver=SQL Server;Server=' + sqlServer + ';Database=RevoTestDb;Trusted_Connection=True;'
    print("connectionString={0!s}".format(connectionString))

    dataSource = RxSqlServerData(
        sqlQuery = "select top 10 * from airlinedemosmall",
        connectionString = connectionString,
        colInfo = {
            "ArrDelay" : { "type" : "integer" },
            "DayOfWeek" : {
                "type" : "factor",
                "levels" : ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
            }
        })

    computeContext = RxInSqlServer(
        connectionString = connectionString,
        numTasks = 1,
        autoCleanup = False
        )

    #
    # import data source to avoid factor levels
    #
    data = rx_import_datasource(dataSource)
    print(data)

    #
    # run linmod
    #
    linmod = rx_lin_mod_ex("ArrDelay ~ DayOfWeek", data = data, compute_context = computeContext)
    assert (linmod is not None)
    assert (linmod._results is not None)
    print(linmod)

    #
    # predict results
    #
    data = rx_import_datasource(dataSource)
    del data["ArrDelay"]
    predict = rx_predict_ex(linmod, data = data)
    assert (predict is not None)
    print(predict._results)

    #
    # do a summary
    #
    summary = rx_summary("ArrDelay ~ DayOfWeek", data = dataSource, compute_context = computeContext)
    assert (summary is not None)
    print(summary)

test_linmod_sql()
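To build intuition for the formula `ArrDelay ~ DayOfWeek`, the following standalone sketch may help. When a linear model has a single categorical (factor) predictor, the least-squares fitted value for each level is the mean of the response within that level. The sketch computes those per-level means in plain Python. It is not how revoscalepy fits the model, the function name `fitted_values_by_level` is hypothetical, and the rows are made-up illustrative values rather than data from airlinedemosmall.

```python
from collections import defaultdict

# Factor levels, matching the colInfo definition in the sample.
LEVELS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Made-up (DayOfWeek, ArrDelay) rows for illustration only.
rows = [
    ("Monday", 6), ("Monday", 10),
    ("Tuesday", -2), ("Tuesday", 4),
    ("Friday", 15),
]


def fitted_values_by_level(rows, levels):
    """Return the mean response for each factor level that appears in rows.

    For a model with one categorical predictor, these group means are the
    least-squares fitted values.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, delay in rows:
        if day not in levels:
            raise ValueError("Unknown factor level: {0}".format(day))
        totals[day] += delay
        counts[day] += 1
    # Preserve the declared level order; skip levels with no observations.
    return {day: totals[day] / counts[day] for day in levels if day in counts}


fitted = fitted_values_by_level(rows, LEVELS)
print(fitted)
```

Levels with no observations are omitted here because no mean can be computed for them from the data alone.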