Skip to content

Latest commit

 

History

History
62 lines (39 loc) · 3.91 KB

File metadata and controls

62 lines (39 loc) · 3.91 KB
title Tutorial for in-database Python analytics for SQL developers
description Learn how to embed Python code in SQL Server stored procedures and T-SQL functions.
ms.prod sql
ms.technology machine-learning
ms.date 10/29/2018
ms.topic tutorial
author dphansen
ms.author davidph

Tutorial: Python data analytics for SQL developers

[!INCLUDEappliesto-ss-xxxx-xxxx-xxx-md]

In this tutorial for SQL programmers, learn about Python integration by building and deploying a Python-based machine learning solution using a NYCTaxi_sample database on SQL Server. You'll use T-SQL, SQL Server Management Studio, and a database engine instance with Machine Learning Services and Python language support.

This tutorial introduces you to Python functions used in a data modeling workflow. Steps include data exploration, building and training a binary classification model, and model deployment. You'll use sample data from the New York City Taxi and Limosine Commission, and the model you will build predicts whether a trip is likely to result in a tip based on the time of day, distance travelled, and pick-up location.

All of the Python code used in this tutorial is wrapped in stored procedures that you create and run in Management Studio.

Note

This tutorial is available in both R and Python. For the R version, see In-database analytics for R developers.

Overview

The process of building a machine learning solution is a complex one that can involve multiple tools, and the coordination of subject matter experts across several phases:

  • obtaining and cleaning data
  • exploring the data and building features useful for modeling
  • training and tuning the model
  • deployment to production

Development and testing of the actual code is best performed using a dedicated development environment. However, after the script is fully tested, you can easily deploy it to [!INCLUDEssNoVersion] using [!INCLUDEtsql] stored procedures in the familiar environment of [!INCLUDEssManStudio]. Wrapping external code in stored procedures is the primary mechanism for operationalizing code in SQL Server.

Whether you are a SQL programmer new to Python, or a Python developer new to SQL, this multi-part tutorial introduces a typical workflow for conducting in-database analytics with Python and SQL Server.

After the model has been saved to the database, you can call the model for predictions from [!INCLUDEtsql] by using stored procedures.

Prerequisites

All tasks can be done using [!INCLUDEtsql] stored procedures in [!INCLUDEssManStudio].

This tutorial assumes familiarity with basic database operations such as creating databases and tables, importing data, and writing SQL queries. It does not assume you know Python. As such, all Python code is provided.

Next steps

[!div class="nextstepaction"] Explore and visualize the data using Python