title: Tutorial for in-database analytics using R and SQL Server Machine Learning | Microsoft Docs
description: Learn how to embed R programming language code in SQL Server stored procedures and T-SQL functions.
ms.prod: sql
ms.technology: machine-learning
ms.date: 10/29/2018
ms.topic: tutorial
author: HeidiSteen
ms.author: heidist
manager: cgronlun

Tutorial: In-Database analytics for SQL developers using R

[!INCLUDEappliesto-ss-xxxx-xxxx-xxx-md-winonly]

In this tutorial for SQL programmers, learn about R integration by building and deploying an R-based machine learning solution using the NYCTaxi_sample database on SQL Server.

This tutorial introduces you to R functions used in a data modeling workflow. Steps include data exploration, building and training a binary classification model, and model deployment. You'll use sample data from the New York City Taxi and Limousine Commission, and the model you build predicts whether a trip is likely to result in a tip based on the time of day, distance travelled, and pick-up location. All of the R code used in this tutorial is wrapped in stored procedures that you create and run in Management Studio.

Note

This tutorial is available in both R and Python. For the Python version, see In-database analytics for Python developers.

Overview

Building a machine learning solution is a complex process that can involve multiple tools and the coordination of subject matter experts across several phases:

  • obtaining and cleaning data
  • exploring the data and building features useful for modeling
  • training and tuning the model
  • deploying the model to production

Development and testing of the actual code is best performed using a dedicated development environment. However, after the script is fully tested, you can easily deploy it to [!INCLUDEssNoVersion] using [!INCLUDEtsql] stored procedures in the familiar environment of [!INCLUDEssManStudio].
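
As a minimal sketch of what wrapping R in a stored procedure can look like, the example below calls the system procedure `sp_execute_external_script` with a one-line R script. The procedure name `dbo.usp_RowCountWithR` and the inline input query are hypothetical and are not part of this tutorial's code; the sketch also assumes that R Services (or Machine Learning Services with R) is installed and that external scripts are enabled on the instance.

```sql
-- Hypothetical procedure name, used for illustration only.
CREATE PROCEDURE dbo.usp_RowCountWithR
AS
BEGIN
    EXEC sp_execute_external_script
        @language = N'R',
        -- InputDataSet and OutputDataSet are the default names of the
        -- data frames that SQL Server passes to and reads from R.
        @script = N'OutputDataSet <- data.frame(n = nrow(InputDataSet))',
        -- Placeholder query; a real procedure would select from a table.
        @input_data_1 = N'SELECT 1 AS x UNION ALL SELECT 2 UNION ALL SELECT 3'
    WITH RESULT SETS ((n INT NOT NULL));
END;
GO

-- Run the procedure from Management Studio like any other stored procedure.
EXEC dbo.usp_RowCountWithR;
```

The R script stays a plain string inside the procedure, so a script that was tested in a separate R environment can be pasted in with few changes.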

Whether you are a SQL programmer new to R, or an R developer new to SQL, this multi-part tutorial introduces a typical workflow for conducting in-database analytics with R and SQL Server.

After the model has been saved to the database, call the model for predictions from [!INCLUDEtsql] by using stored procedures.
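
A rough sketch of that prediction pattern follows: the procedure loads a serialized model from a table, passes it to R as a parameter, and returns one score per input row. The table `dbo.models`, its `name` and `model` columns, the procedure name, and the `Score` output column are all hypothetical. The sketch also assumes the model was trained with a RevoScaleR function and saved with R's `serialize`, so that `unserialize` and `rxPredict` apply.

```sql
-- Hypothetical procedure, table, and column names, for illustration only.
CREATE PROCEDURE dbo.usp_PredictWithStoredModel
    @model_name NVARCHAR(100),   -- which stored model to use
    @inquery    NVARCHAR(MAX)    -- query returning the rows to score
AS
BEGIN
    -- Load the serialized model from the (assumed) models table.
    DECLARE @model_bin VARBINARY(MAX) =
        (SELECT TOP (1) model FROM dbo.models WHERE name = @model_name);

    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'
            # Rebuild the R model object from the binary parameter.
            mod <- unserialize(as.raw(model))
            # Score the input rows; the result is returned to SQL Server.
            OutputDataSet <- rxPredict(modelObject = mod,
                                       data = InputDataSet,
                                       outData = NULL,
                                       predVarNames = "Score",
                                       type = "response",
                                       writeModelVars = FALSE,
                                       overwrite = TRUE)
        ',
        @input_data_1 = @inquery,
        @params = N'@model VARBINARY(MAX)',
        @model = @model_bin
    WITH RESULT SETS ((Score FLOAT));
END;
```

Passing the model as a `VARBINARY(MAX)` parameter keeps it inside the database, so client applications only need permission to execute the procedure.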

Prerequisites

All tasks can be done using [!INCLUDEtsql] stored procedures in [!INCLUDEssManStudio].

This tutorial assumes familiarity with basic database operations such as creating databases and tables, importing data, and writing SQL queries. It does not assume you know R. As such, all R code is provided.

Next steps

[!div class="nextstepaction"] Set up the NYC Taxi database