title In-Database R Analytics for SQL Developers | Microsoft Docs
ms.custom
ms.date 04/28/2017
ms.prod sql-server-2016
ms.reviewer
ms.suite
ms.technology
r-services
ms.tgt_pltfrm
ms.topic article
applies_to
SQL Server 2016
dev_langs
R
TSQL
ms.assetid c18cb249-2146-41b7-8821-3a20c5d7a690
caps.latest.revision 15
author jeannt
ms.author jeannt
manager jhubbard

In-Database R Analytics for SQL Developers

The goal of this walkthrough is to provide SQL programmers with hands-on experience building a machine learning solution in SQL Server. In this walkthrough, you'll learn how to incorporate R into an application or BI solution by wrapping R code in stored procedures.

Note

The same solution is also available in Python. The Python version requires SQL Server 2017. See LINK.

Overview

Building an end-to-end solution typically consists of obtaining and cleaning data, exploring the data and engineering features, training and tuning a model, and finally deploying the model to production. The actual code is best developed and tested in a dedicated development environment. For R, that might mean RStudio or [!INCLUDErtvs-short].

However, after the solution has been created, you can easily deploy it to [!INCLUDEssNoVersion] using [!INCLUDEtsql] stored procedures in the familiar environment of [!INCLUDEssManStudio].

In this walkthrough, we'll assume that you have been given all the R code needed for the solution, and you'll focus on building and deploying the solution using SQL Server:

  • Build and save the machine learning model, using stored procedures.
  • After the model has been saved to the database, call the model for prediction from [!INCLUDEtsql] by using stored procedures.
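
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code used later in the walkthrough: the table names (`dbo.taxi_sample`, `dbo.taxi_models`), the procedure names, the column names, and the choice of a plain `glm` logistic regression are all assumptions made for the example. The only system call used is `sp_execute_external_script`, which runs the R script and passes values in and out through `@params`.

```sql
-- Hypothetical table that holds serialized models.
CREATE TABLE dbo.taxi_models (
    model_name SYSNAME        NOT NULL PRIMARY KEY,
    model      VARBINARY(MAX) NOT NULL
);
GO

-- Step 1: train a model in R and save it to the database.
-- dbo.taxi_sample and its columns are assumed names.
CREATE PROCEDURE dbo.TrainTipModel
AS
BEGIN
    DECLARE @model VARBINARY(MAX);

    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'
            # InputDataSet is the data frame built from @input_data_1.
            fit <- glm(tipped ~ passenger_count + trip_distance,
                       data = InputDataSet, family = binomial())
            # Serialize the model so it can be returned as VARBINARY(MAX).
            trained_model <- as.raw(serialize(fit, connection = NULL))
        ',
        @input_data_1 = N'SELECT tipped, passenger_count, trip_distance
                          FROM dbo.taxi_sample',
        @params = N'@trained_model VARBINARY(MAX) OUTPUT',
        @trained_model = @model OUTPUT;

    INSERT INTO dbo.taxi_models (model_name, model)
    VALUES (N'tip_glm', @model);
END;
GO

-- Step 2: load the saved model and score a single trip.
CREATE PROCEDURE dbo.PredictTip
    @passenger_count INT,
    @trip_distance   FLOAT
AS
BEGIN
    DECLARE @model VARBINARY(MAX) =
        (SELECT model FROM dbo.taxi_models WHERE model_name = N'tip_glm');

    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'
            fit <- unserialize(as.raw(model))
            newdata <- data.frame(passenger_count = passenger_count,
                                  trip_distance = trip_distance)
            # OutputDataSet is returned to SQL Server as the result set.
            OutputDataSet <- data.frame(
                tip_probability = predict(fit, newdata, type = "response"))
        ',
        @params = N'@model VARBINARY(MAX), @passenger_count INT, @trip_distance FLOAT',
        @model = @model,
        @passenger_count = @passenger_count,
        @trip_distance = @trip_distance
    WITH RESULT SETS ((tip_probability FLOAT));
END;
GO
```

With these hypothetical procedures in place, an application would run `EXEC dbo.TrainTipModel;` once and then call `EXEC dbo.PredictTip @passenger_count = 1, @trip_distance = 2.5;` for each trip it wants to score, without containing any R code itself.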

Note

We recommend that you do not use [!INCLUDEssManStudioFull] to write or test R code. If the code that you embed in a stored procedure has any problems, the error information returned by the stored procedure is usually not enough to identify the cause.

For debugging, we recommend you use a tool such as RStudio or [!INCLUDErtvs-short]. The R scripts provided in this tutorial have already been developed and debugged using traditional R tools.

If you are interested in learning how to develop R scripts that can run in [!INCLUDEssCurrent], see this tutorial: Data Science End-to-End Walkthrough

Scenario

This walkthrough uses the well-known NYC Taxi data set. To make this walkthrough easy and quick, we created a representative 1% sampling of the data. You'll use this data to build a binary classification model that predicts whether a particular trip is likely to get a tip or not, based on columns such as the time of day, distance, and pick-up location.

Requirements

This walkthrough is intended for users who are already familiar with fundamental database operations, such as creating databases and tables, importing data into tables, and creating SQL queries. All R code is provided, so no R development environment is required. An experienced SQL programmer should be able to complete this walkthrough by using [!INCLUDEtsql] in [!INCLUDEssManStudioFull] or by running the provided PowerShell scripts.

However, before starting the walkthrough, you must complete these preparations:

  • Connect to an instance of SQL Server 2016 with R Services, or SQL Server 2017 with Machine Learning Services and R enabled.
  • The login that you use for this walkthrough must have permissions to create databases and other objects, upload data, select data, and run stored procedures.
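
One way to confirm these preparations is sketched below. It assumes you can connect to the instance and run ad hoc queries; the user name `walkthrough_user` is a placeholder, not a name used by this walkthrough. The first statement reports whether external script execution is enabled, the second runs a trivial R script as a round-trip test, and the last shows the database permission that a non-administrator typically needs in order to run R scripts.

```sql
-- Reports config_value and run_value for the option;
-- run_value should be 1 when R script execution is enabled.
EXEC sp_configure 'external scripts enabled';
GO

-- Round-trip smoke test: send one row to R and return it unchanged.
EXEC sp_execute_external_script
    @language = N'R',
    @script = N'OutputDataSet <- InputDataSet',
    @input_data_1 = N'SELECT 1 AS hello'
WITH RESULT SETS ((hello INT NOT NULL));
GO

-- Run by an administrator in the walkthrough database.
-- [walkthrough_user] is a hypothetical database user.
GRANT EXECUTE ANY EXTERNAL SCRIPT TO [walkthrough_user];
GO
```

If the smoke test returns a single row with the value 1, the instance can run R scripts and the stored procedures in the following steps should be able to execute.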

Next Step

Step 1: Download the Sample Data

See Also

SQL Server R Services Tutorials

SQL Server R Services

Set up SQL Server Machine Learning Services