Skip to content

Latest commit

 

History

History
197 lines (154 loc) · 5.8 KB

File metadata and controls

197 lines (154 loc) · 5.8 KB
title Quickstart for working with data structures in Python
description In this quickstart for Python script in SQL Server, learn how use data structures with the sp_execute_external_script system stored procedure.
ms.prod sql
ms.technology machine-learning
ms.date 01/04/2019
ms.topic quickstart
author dphansen
ms.author davidph

Quickstart: Python data structures in SQL Server

[!INCLUDEappliesto-ss-xxxx-xxxx-xxx-md]

This quickstart shows how to use data structures when using Python in SQL Server Machine Learning Services.

SQL Server relies on the Python pandas package, which is great for working with tabular data. However, you cannot pass a scalar from Python to SQL Server and expect it to "just work". In this quickstart, we'll review some basic data type definitions, to prepare you for additional issues that you might run across when passing tabular data between Python and SQL Server.

  • A data frame is a table with multiple columns.
  • A single column of a DataFrame, is a list-like object called a Series.
  • A single value is a cell of a data frame and has to be called by index.

How would you expose the single result of a calculation as a data frame, if a data.frame requires a tabular structure? One answer is to represent the single scalar value as a series, which is easily converted to a data frame.

Prerequisites

A previous quickstart, Verify Python exists in SQL Server, provides information and links for setting up the Python environment required for this quickstart.

Scalar value as a series

This example does some simple math and converts a scalar into a series.

  1. A series requires an index, which you can assign manually, as shown here, or programmatically.

    execute sp_execute_external_script 
    @language = N'Python', 
    @script = N'
    a = 1
    b = 2
    c = a/b
    print(c)
    s = pandas.Series(c, index =["simple math example 1"])
    print(s)
    '
  2. Because the series hasn't been converted to a data.frame, the values are returned in the Messages window, but you can see that the results are in a more tabular format.

    Results

    STDOUT message(s) from external script: 
    0.5
    simple math example 1    0.5
    dtype: float64
    
  3. To increase the length of the series, you can add new values, using an array.

    execute sp_execute_external_script 
    @language = N'Python', 
    @script = N'
    a = 1
    b = 2
    c = a/b
    d = a*b
    s = pandas.Series([c,d])
    print(s)
    '

    If you do not specify an index, an index is generated that has values starting with 0 and ending with the length of the array.

    Results

    STDOUT message(s) from external script: 
    0    0.5
    1    2.0
    dtype: float64
    
  4. If you increase the number of index values, but don't add new data values, the data values are repeated to fill the series.

    execute sp_execute_external_script 
    @language = N'Python', 
    @script = N'
    a = 1
    b = 2
    c = a/b
    s = pandas.Series(c, index =["simple math example 1", "simple math example 2"])
    print(s)
    '

    Results

    STDOUT message(s) from external script: 
    0.5
    simple math example 1    0.5
    simple math example 2    0.5
    dtype: float64
    

Convert series to data frame

Having converted our scalar math results to a tabular structure, we still need to convert them to a format that SQL Server can handle.

  1. To convert a series to a data.frame, call the pandas DataFrame method.

    execute sp_execute_external_script 
    @language = N'Python', 
    @script = N'
    import pandas as pd
    a = 1
    b = 2
    c = a/b
    d = a*b
    s = pandas.Series([c,d])
    print(s)
    df = pd.DataFrame(s)
    OutputDataSet = df
    '
    WITH RESULT SETS (( ResultValue float ))
  2. The result is shown below. Even if you use the index to get specific values from the data.frame, the index values aren't part of the output.

    Results

    ResultValue
    0.5
    2

Output values into data.frame

Let's see how conversion to a data.frame works with our two series containing the results of simple math operations. The first has an index of sequential values generated by Python. The second uses an arbitrary index of string values.

  1. This example gets a value from the series that uses an integer index.

    EXECUTE sp_execute_external_script 
    @language = N'Python', 
    @script = N'
    import pandas as pd
    a = 1
    b = 2
    c = a/b
    d = a*b
    s = pandas.Series([c,d])
    print(s)
    df = pd.DataFrame(s, index=[1])
    OutputDataSet = df
    '
    WITH RESULT SETS (( ResultValue float ))

    Remember that the auto-generated index starts at 0. Try using an out of range index value and see what happens.

  2. Now let's get a single value from the other data frame that has a string index.

    EXECUTE sp_execute_external_script 
    @language = N'Python', 
    @script = N'
    import pandas as pd
    a = 1
    b = 2
    c = a/b
    s = pandas.Series(c, index =["simple math example 1", "simple math example 2"])
    print(s)
    df = pd.DataFrame(s, index=["simple math example 1"])
    OutputDataSet = df
    '
    WITH RESULT SETS (( ResultValue float ))

    Results

    ResultValue
    0.5

    If you try to use a numeric index to get a value from this series, you get an error.

Next steps

Next, you'll build a predictive model using Python in SQL Server.

[!div class="nextstepaction"] Create, train, and use a Python model with stored procedures in SQL Server