title: Use Python with revoscalepy to Create a Model | Microsoft Docs
ms.custom: SQL2016_New_Updated
ms.date: 04/14/2017
ms.prod: sql-server-2016
ms.reviewer:
ms.suite:
ms.technology: r-services
ms.tgt_pltfrm:
ms.topic: article
caps.latest.revision: 1
author: jeannt
ms.author: jeannt
manager: jhubbard

Use Python with revoscalepy to Create a Model

This code sample demonstrates how to create a linear regression model using one of the algorithms in the revoscalepy package for Python, with Microsoft Machine Learning Services.

The revoscalepy package for Python contains objects, transformations, and algorithms similar to those provided by the R language's RevoScaleR package. With this library, you can create a compute context, move data between compute contexts, transform data, and train predictive models using popular algorithms such as logistic and linear regression, decision trees, and more.

For more information, see What is revoscalepy?

Prerequisites

Important

To run Python code in SQL Server, you must have installed SQL Server 2017 CTP 2.0 with the feature Machine Learning Services with Python. Other versions of SQL Server do not support Python integration.

To run this code, execute the sample as a Python script from the command line, or use a Python development environment that includes the Python integration components provided in this release.

Sample Code

This sample contains the following steps:

  1. Import the libraries and functions you need
  2. Create the connection to SQL Server and create data source objects for working with the data
  3. In your Python code, modify the data so that it can be used by the rxLinMod algorithm (called as rx_lin_mod_ex in the sample)
  4. Call rx_lin_mod_ex and define the formula to train the model
  5. Generate a set of predictions based on the original data set
  6. Create a summary of the data using the same formula
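
Step 3 depends on treating DayOfWeek as a categorical (factor) column with a fixed, ordered list of levels, which is what the colInfo argument in the sample declares. As a rough illustration of the idea only, the following plain-Python sketch maps each value to its position in the level list. The helper name encode_factor and the input values are hypothetical, and this is not how revoscalepy represents factors internally.

```python
# Levels copied from the colInfo definition in the sample.
DAY_LEVELS = ["Monday", "Tuesday", "Wednesday", "Thursday",
              "Friday", "Saturday", "Sunday"]


def encode_factor(values, levels):
    """Map each value to its zero-based position in `levels`.

    Values that are not declared levels map to None.
    (Hypothetical helper for illustration; not part of revoscalepy.)
    """
    index = {level: i for i, level in enumerate(levels)}
    return [index.get(value) for value in values]


# Made-up input values; "Funday" is deliberately not a declared level.
codes = encode_factor(["Tuesday", "Sunday", "Monday", "Funday"], DAY_LEVELS)
print(codes)
```

Declaring the levels up front keeps the encoding stable even when a query such as "select top 10" returns only some of the days.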

All operations are performed using an instance of SQL Server as the compute context.

In general, calling Python in a remote compute context works much like calling R in a remote compute context.
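
The sample reads the server name from the RTEST_SQL_SERVER environment variable, falls back to "." (the local machine), and concatenates the connection string by hand. A minimal sketch of the same logic as a reusable function might look like the following. The function name build_connection_string and its parameters are assumptions made for illustration; only the resulting string format comes from the sample.

```python
import os


def build_connection_string(server=None, database="RevoTestDb"):
    """Assemble a trusted-connection string in the format the sample uses.

    Hypothetical helper: if `server` is not given, it is read from the
    RTEST_SQL_SERVER environment variable, defaulting to "." as in the sample.
    """
    if server is None:
        server = os.getenv("RTEST_SQL_SERVER", ".")
    parts = [
        ("Driver", "SQL Server"),
        ("Server", server),
        ("Database", database),
        ("Trusted_Connection", "True"),
    ]
    # Each key=value pair is terminated with a semicolon.
    return "".join("{0}={1};".format(key, value) for key, value in parts)


print(build_connection_string(server="."))
```

Keeping the pieces in a list makes it easier to point the script at a different server or database without editing a long string literal.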

from revoscalepy.computecontext.RxComputeContext import RxComputeContext
from revoscalepy.computecontext.RxInSqlServer import RxInSqlServer
from revoscalepy.computecontext.RxInSqlServer import RxSqlServerData
from revoscalepy.functions.RxLinMod import rx_lin_mod_ex
from revoscalepy.functions.RxPredict import rx_predict_ex
from revoscalepy.functions.RxSummary import rx_summary
from revoscalepy.utils.RxOptions import RxOptions
from revoscalepy.etl.RxImport import rx_import_datasource


import os

def test_linmod_sql():
    sqlServer = os.getenv('RTEST_SQL_SERVER', '.')
    
    connectionString = 'Driver=SQL Server;Server=' + sqlServer + ';Database=RevoTestDb;Trusted_Connection=True;'
    print("connectionString={0!s}".format(connectionString))
    
    dataSource = RxSqlServerData(
        sqlQuery = "select top 10 * from airlinedemosmall", 
        connectionString = connectionString,
        colInfo = { 
            "ArrDelay" : { "type" : "integer" }, 
            "DayOfWeek" : { 
                "type" : "factor", 
                "levels" : ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
            }
        })

    computeContext = RxInSqlServer(
        connectionString = connectionString,
        numTasks = 1,
        autoCleanup = False
        )

    #
    # import data source to avoid factor levels
    #        
    data = rx_import_datasource(dataSource)
    print(data)

    #
    # run linmod
    #
    linmod = rx_lin_mod_ex("ArrDelay ~ DayOfWeek", data = data, compute_context = computeContext)
    assert (linmod is not None)
    assert (linmod._results is not None)
    print(linmod)

    #
    # predict results
    # 
    data = rx_import_datasource(dataSource)
    del data["ArrDelay"]
    predict = rx_predict_ex(linmod, data = data)
    assert (predict is not None)
    print(predict._results)

    #
    # do a summary
    #
    summary = rx_summary("ArrDelay ~ DayOfWeek", data = dataSource, compute_context = computeContext)
    assert (summary is not None)
    print(summary)

test_linmod_sql()
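
For intuition about what the formula ArrDelay ~ DayOfWeek asks for: when the only predictor is a single factor, an ordinary least-squares fit predicts the mean of the response within each level that appears in the data. The sketch below shows that calculation in plain Python. It does not call revoscalepy, the rows are made-up values rather than data from airlinedemosmall, and group_means is a hypothetical helper.

```python
from collections import defaultdict


def group_means(rows, factor, response):
    """Return the mean of `response` for each observed level of `factor`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[factor]] += row[response]
        counts[row[factor]] += 1
    return {level: totals[level] / counts[level] for level in totals}


# Made-up illustrative rows; NOT taken from airlinedemosmall.
rows = [
    {"DayOfWeek": "Monday", "ArrDelay": 10},
    {"DayOfWeek": "Monday", "ArrDelay": 20},
    {"DayOfWeek": "Tuesday", "ArrDelay": 4},
    {"DayOfWeek": "Tuesday", "ArrDelay": 8},
    {"DayOfWeek": "Tuesday", "ArrDelay": 0},
]

fitted = group_means(rows, "DayOfWeek", "ArrDelay")
print(fitted)
```

Comparing per-day means computed this way against the predictions printed by the sample can serve as a quick sanity check, assuming the model was fit on the same rows.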

See Also

Deploy and Consume Python Models