title In-database R analytics for SQL developers (tutorial)| Microsoft Docs
ms.prod sql
ms.technology machine-learning
ms.date 06/07/2018
ms.topic tutorial
author HeidiSteen
ms.author heidist
manager cgronlun

In-database R analytics for SQL developers (tutorial)

Applies to: SQL Server (Windows only)

The goal of this tutorial is to provide SQL programmers with hands-on experience building a machine learning solution in SQL Server. In this tutorial, you'll learn how to incorporate R into an application or BI solution by wrapping R code in stored procedures.

Note

The same solution is available in Python (requires SQL Server 2017). See In-database analytics for Python developers.

Overview

Building an end-to-end solution typically consists of obtaining and cleaning data, exploring the data and engineering features, training and tuning the model, and finally deploying the model to production. Developing and testing the actual code is best done in a dedicated development environment. For R, that might mean RStudio or R Tools for Visual Studio (RTVS).

However, after the solution has been created, you can deploy it to SQL Server by using Transact-SQL stored procedures in the familiar environment of SQL Server Management Studio (SSMS).
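To show what "wrapping R code in a stored procedure" looks like, here is a minimal Python sketch that only builds the Transact-SQL text; it does not connect to a server. The system procedure `sp_execute_external_script` and its `@language`, `@script`, and `@input_data_1` parameters are real SQL Server names, as are the default R variable names `InputDataSet` and `OutputDataSet`. The helper functions, the procedure name `dbo.CountSampleRows`, and the table name `dbo.nyctaxi_sample` are hypothetical, chosen only for illustration.

```python
def nvarchar_literal(text: str) -> str:
    """Quote text as a T-SQL Unicode string literal (N'...')."""
    # T-SQL escapes an embedded single quote by doubling it.
    return "N'" + text.replace("'", "''") + "'"


def wrap_r_in_procedure(proc_name: str, r_script: str, input_query: str) -> str:
    """Build CREATE PROCEDURE text that runs r_script via sp_execute_external_script.

    Hypothetical helper. proc_name is inserted verbatim, so pass only trusted,
    already-valid names. Inside the R script, the query result arrives as the
    data frame InputDataSet, and whatever is assigned to OutputDataSet is
    returned to the caller as a result set.
    """
    return (
        f"CREATE PROCEDURE {proc_name}\n"
        "AS\n"
        "BEGIN\n"
        "    EXEC sp_execute_external_script\n"
        "        @language = N'R',\n"
        f"        @script = {nvarchar_literal(r_script)},\n"
        f"        @input_data_1 = {nvarchar_literal(input_query)};\n"
        "END;\n"
    )


# Hypothetical procedure and table names, used only to show the output shape.
ddl = wrap_r_in_procedure(
    "dbo.CountSampleRows",
    "OutputDataSet <- data.frame(n = nrow(InputDataSet))",
    "SELECT * FROM dbo.nyctaxi_sample",
)
print(ddl)
```

Running the generated text in SSMS (as the first statement of its batch) would create the procedure, and `EXEC dbo.CountSampleRows` would then run the R code inside SQL Server. Generating the DDL from a script is just one option; you can equally type the `CREATE PROCEDURE` statement directly in SSMS.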

In this tutorial, we assume that you have been given all the R code needed for the solution, and we focus on building and deploying the solution by using SQL Server.

Scenario

This tutorial uses a well-known public dataset based on trips in New York City taxis. To make the sample code run faster, we created a representative 1% sample of the data. You'll use this data to build a binary classification model that predicts whether a particular trip is likely to get a tip, based on columns such as the time of day, distance, and pick-up location.
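To make "binary classification" concrete, here is a minimal Python sketch of logistic regression, one common binary classifier, fitted by batch gradient descent. It assumes the label is defined as "tip amount greater than zero" and uses only two features (hour of day and trip distance). The `Trip` class, the feature scaling, and the toy rows are hypothetical and not taken from the dataset, and this is not the R model that the tutorial supplies.

```python
import math
from dataclasses import dataclass


@dataclass
class Trip:
    # Hypothetical, simplified trip record; the real table has more columns.
    hour_of_day: int      # 0-23
    trip_distance: float  # miles
    tip_amount: float     # dollars


def tipped(trip: Trip) -> int:
    """Binary label (assumed definition): 1 if any tip was paid, else 0."""
    return 1 if trip.tip_amount > 0 else 0


def features(hour_of_day: int, trip_distance: float) -> list:
    # Scale both inputs into [0, 1] so plain gradient descent stays stable.
    return [hour_of_day / 23.0, min(trip_distance, 20.0) / 20.0]


def predict_proba(w: list, b: float, hour_of_day: int, trip_distance: float) -> float:
    """Logistic model: estimated probability that a trip is tipped."""
    z = b + sum(wj * xj for wj, xj in zip(w, features(hour_of_day, trip_distance)))
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w: list, b: float, trips: list) -> float:
    """Average cross-entropy of the model on labeled trips."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for t in trips:
        p = predict_proba(w, b, t.hour_of_day, t.trip_distance)
        y = tipped(t)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(trips)


def train(trips: list, lr: float = 0.5, epochs: int = 200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w, b = [0.0, 0.0], 0.0
    n = len(trips)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for t in trips:
            # For sigmoid + cross-entropy, the gradient is (p - y) * x.
            err = predict_proba(w, b, t.hour_of_day, t.trip_distance) - tipped(t)
            for j, xj in enumerate(features(t.hour_of_day, t.trip_distance)):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Made-up rows for illustration only; they are not drawn from the taxi data.
toy_trips = [
    Trip(8, 1.2, 0.0), Trip(9, 5.5, 2.0), Trip(18, 7.0, 3.0),
    Trip(22, 0.8, 0.0), Trip(13, 3.1, 1.0), Trip(3, 0.5, 0.0),
]
w, b = train(toy_trips)
p = predict_proba(w, b, hour_of_day=17, trip_distance=6.0)
print(f"P(tipped) = {p:.2f}, predicted label = {int(p >= 0.5)}")
```

The point is the shape of the output: the model yields a probability between 0 and 1 for each trip, and a threshold (0.5 here, an arbitrary choice) turns that into a yes/no prediction.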

Requirements

This tutorial assumes familiarity with basic database operations such as creating databases and tables, importing data, and writing SQL queries. It does not assume you know R; all R code is provided. A skilled SQL programmer can use a supplied PowerShell script, sample data on GitHub, and Transact-SQL in SQL Server Management Studio to complete this example.

Before starting the tutorial:

Note

We recommend that you do not use SQL Server Management Studio to write or test R code. If the code that you embed in a stored procedure has problems, the information returned from the stored procedure is usually not enough to determine the cause of the error.

For debugging, we recommend a tool such as R Tools for Visual Studio (RTVS) or RStudio. The R scripts provided in this tutorial have already been developed and debugged using traditional R tools.

Next lesson

Lesson 1: Download the sample data