Skip to content

Latest commit

 

History

History
98 lines (69 loc) · 6.53 KB

File metadata and controls

98 lines (69 loc) · 6.53 KB
title Step 2: Import Data to SQL Server using PowerShell | Microsoft Docs
ms.custom
ms.date 05/25/2017
ms.prod sql-server-vnext-ctp2
ms.reviewer
ms.suite
ms.technology
r-services
ms.tgt_pltfrm
ms.topic article
applies_to
SQL Server 2017
dev_langs
Python
TSQL
ms.assetid
caps.latest.revision 2
author jeannt
ms.author jeannt
manager jhubbard

Step 2: Import Data to SQL Server using PowerShell

In this step, you'll run one of the downloaded scripts, to create the database objects required for the walkthrough. The script also creates most of the stored procedures you'll use, and uploads the sample data to a table in the database you specified.

Create SQL Objects and Data

Among the downloaded files you should see a PowerShell script. To prepare the environment for the walkthrough, you'll run this script. The script performs these actions:

  • Installs the SQL Native Client and SQL command-line utilities, if not already installed. These utilities are required for bulk-loading the data to the database using bcp.

  • Creates a database and a table on the [!INCLUDEssNoVersion] instance, and bulk-inserts data into the table.

  • Creates multiple SQL functions and stored procedures.

Run the script

  1. Open a PowerShell command prompt as administrator and run the following command.

    .\RunSQL_SQL_Walkthrough.ps1  
    

    You will be prompted to input the following information:

    • The name or address of a [!INCLUDEssCurrent] instance where machine learning Services with Python has been installed
    • The user name and password for an account on the instance. The account you use must have the ability to create databases, create tables and stored procedures, and upload data to tables. If you do not provide the user name and password, your Windows identity is used to sign in to SQL Server.
    • The path and file name of the sample data file that you just downloaded. For example, C:\tempRSQL\nyctaxi1pct.csv

    [!NOTE] Make sure that your xmlrw.dll is in the same folder as your bcp.exe. If not, please copy it over.

  2. As part of this step, all the [!INCLUDEtsql] scripts are also modified to replace placeholders with the database name and user name that you provide as script inputs.

  3. Take a minute to review the stored procedures and functions created by the script.

    SQL script file name Function
    create-db-tb-upload-data.sql Creates a database and two tables:

    nyctaxi_sample: Contains the main NYC Taxi dataset. A clustered columnstore index is added to the table to improve storage and query performance. The 1% sample of the NYC Taxi dataset will be inserted into this table.

    nyc_taxi_models: Used to persist the trained advanced analytics model.
    fnCalculateDistance.sql Creates a scalar-valued function that calculates the direct distance between pickup and dropoff locations
    fnEngineerFeatures.sql Creates a table-valued function that creates new data features for model training
    TrainingTestingSplit.sql Split the data in the nyctaxi_sample table into two parts: nyctaxi_sample_training and nyctaxi_sample_testing.
    PredictTipSciKitPy.sql Creates a stored procedure that calls the trained model (scikit-learn) to create predictions using the model. The stored procedure accepts a query as its input parameter and returns a column of numeric values containing the scores for the input rows.
    PredictTipRxPy.sql Creates a stored procedure that calls the trained model (revoscalepy) to create predictions using the model. The stored procedure accepts a query as its input parameter and returns a column of numeric values containing the scores for the input rows.
    PredictTipSingleModeSciKitPy.sql Creates a stored procedure that calls the trained model (scikit-learn) to create predictions using the model. This stored procedure accepts a new observation as input, with individual feature values passed as in-line parameters, and returns a value that predicts the outcome for the new observation.
    PredictTipSingleModeRxPy.sql Creates a stored procedure that calls the trained model (revoscalepy) to create predictions using the model. This stored procedure accepts a new observation as input, with individual feature values passed as in-line parameters, and returns a value that predicts the outcome for the new observation.
  4. You'll create some additional stored procedures in the latter part of this walkthrough:

    SQL script file name Function
    SerializePlots.sql Creates a stored procedure for data exploration. This stored procedure creates a graphic using Python and then serialize the graph objects.
    TrainTipPredictionModelSciKitPy.sql Creates a stored procedure that trains a logistic regression model (scikit-learn). The model predicts the value of the tipped column, and is trained using a randomly selected 60% of the data. The output of the stored procedure is the trained model, which is saved in the table nyc_taxi_models.
    TrainTipPredictionModelRxPy.sql Creates a stored procedure that trains a logistic regression model (revoscalepy). The model predicts the value of the tipped column, and is trained using a randomly selected 60% of the data. The output of the stored procedure is the trained model, which is saved in the table nyc_taxi_models.
  5. Log in to the [!INCLUDEssNoVersion] instance using [!INCLUDEssManStudioFull] and the login you specified, to verify that you can see the database, tables, functions, and stored procedures that were created.

    browse tables in SSMS

    [!NOTE] If the database objects already exist, they cannot be created again. If the table already exists, the data will be appended, not overwritten. Therefore, be sure to drop any existing objects before running the script. Follow instructions here to enable external scrip services.

Next Step

Step 3: Explore and Visualize the Data

Previous Step

Step 1: Download the Sample Data

See Also

Machine Learning Services with Python