Skip to content

Latest commit

 

History

History
66 lines (48 loc) · 5.69 KB

File metadata and controls

66 lines (48 loc) · 5.69 KB
title Data Science End-to-End Walkthrough | Microsoft Docs
ms.custom
ms.date 03/17/2017
ms.prod sql-server-2016
ms.reviewer
ms.suite
ms.technology
r-services
ms.tgt_pltfrm
ms.topic article
applies_to
SQL Server 2016
dev_langs
R
ms.assetid edd76ae9-4125-45a8-bf42-47a85b9d9a32
caps.latest.revision 16
author jeannt
ms.author jeannt
manager jhubbard

Data Science End-to-End Walkthrough

In this walkthrough, you'll develop an end-to-end solution for predictive modeling using [!INCLUDErsql_productname].

This walkthrough is based on a popular set of public data, the New York City taxi dataset. You will use a combination of R code, [!INCLUDEssNoVersion] data, and custom SQL functions to build a classification model that indicates the probability that the driver will get a tip on a particular taxi trip. You'll also deploy your R model to [!INCLUDEssNoVersion] and use server data to generate scores based on the model.

This example can easily be extended to all kinds of real-life problems, such as predicting customer responses to sales campaigns, or predicting spending by visitors to events. Because the model can be invoked from a stored procedure, you can also easily embed it in an application.

Intended Audience

This walkthrough is intended for R or SQL developers. It provides an introduction into how R can be integrated into enterprise workflows using [!INCLUDErsql_productname]. You should be familiar with database operations, such as creating databases and tables, importing data, and creating queries using [!INCLUDEtsql].

  • You will be given the SQL and R scripts to execute.
  • No R coding is required; all scripts are provided.

Prerequisites

  • You must have access to an instance of SQL Server 2016, or an evaluation version of SQL Server 2017.
  • At least one instance on the SQL Server computer must have [!INCLUDErsql_productname] installed.
  • To run R commands, you'll need a separate computer that has an R IDE and the Microsoft R Open libraries. It can be a laptop or other networked computer, but it must be able to connect to the [!INCLUDEssNoVersion] instance.

For details about how to set up these server and client environments, see Prerequisites for Data Science Walkthroughs (SQL Server R Services).

What the Walkthrough Covers

Note that the estimated times do not include setup.

Lesson list Estimated time
Lesson 1: Prepare the Data (Data Science End-to-End Walkthrough)

Obtain the data used for building a model. Download a public dataset and load it into a [!INCLUDEssNoVersion] database.
30 minutes
Lesson 2: View and Explore the Data (Data Science End-to-End Walkthrough)

Explore the data, prepare it for modeling, and create new features. You'll use both SQL and R to explore the data and generate summaries.
20 minutes
Lesson 3: Create Data Features (Data Science End-to-End Walkthrough)

Perform feature engineering using custom functions in R and [!INCLUDEtsql]. Compare the performance of R and T-SQL for featurization tasks.
10 minutes
Lesson 4: Build and Save the Model (Data Science End-to-End Walkthrough)

Train and tune a predictive model. Assess model performance. This walkthrough creates a classification model. Plot the model's accuracy using R.
15 minutes
Lesson 5: Deploy and Use the Model (Data Science End-to-End Walkthrough)

Deploy the model in production by saving the model to a [!INCLUDEssNoVersion] database. Call the model from a stored procedure to generate predictions.
10 minutes

Notes

  • The walkthrough is designed to introduce R developers to [!INCLUDErsql_productname], so R is used whever possible. This does not mean that R is necessarily the best tool for each task. In many cases, [!INCLUDEssNoVersion] might provide better performance, particularly for tasks such as data aggregation and feature engineering. Such tasks can particularly benefit from new features in [!INCLUDEssCurrent], such as memory optimized columnstore indexes. We'll point out possible optimizations along the way.
  • The walkthrough was originally developed for and tested on SQL Server 2016. However, screenshots and procedures have been updated to use the latest version of SQL Server Management Studio, which works with SQL Server 2017.

Next Step

Lesson 1: Prepare the Data (Data Science End-to-End Walkthrough)