title

Data Science Deep Dive: Using the RevoScaleR Packages | Microsoft Docs

ms.custom

ms.date

05/18/2017

ms.prod

sql-server-2016

ms.reviewer

ms.suite

ms.technology

r-services

ms.tgt_pltfrm

ms.topic

article

applies_to

SQL Server 2016

dev_langs

ms.assetid

c2efb3f2-cad5-4188-b889-15d68b742ef5

caps.latest.revision

author

jeannt

ms.author

jeannt

manager

jhubbard

Data Science Deep Dive: Using the RevoScaleR Packages

This tutorial demonstrates how to use the enhanced R packages provided in [!INCLUDErsql_productname] to work with SQL Server data and create scalable R solutions, by using the server as a compute context for high-performance big data analytics.

You will learn how to create a remote compute context, move data between local and remote compute contexts, and execute R code on a remote SQL Server. You will also learn how to analyze and plot data both locally and on the remote server, and how to create and deploy models.

Note

This tutorial works specifically with SQL Server data on Windows, and uses in-database compute contexts. If you want to use R in other contexts, such as Teradata, Linux, or Hadoop, see these Microsoft R Server tutorials:

Overview

To illustrate the flexibility and processing power of the ScaleR packages, in this tutorial you'll move data and swap compute contexts frequently.

Data is initially obtained from CSV files or XDF files. You'll import the data into [!INCLUDEssNoVersion] using the functions in the RevoScaleR package.
Model training and scoring will be performed in the [!INCLUDEssNoVersion] compute context. You'll create new [!INCLUDEssNoVersion] tables, using the rx functions, to save your scoring results.
You'll create plots both on the server and in the local compute context.
To train the model, you will use data already stored in a [!INCLUDEssNoVersion] database. All computations will be performed on the [!INCLUDEssNoVersion] instance.
You'll extract a subset of data and save it as an XDF file for re-use in analysis on your local workstation.
New data used during the scoring process is extracted from the [!INCLUDEssNoVersion] database using an ODBC connection. All computations are performed on the local workstation.
Finally, you'll perform a simulation based on a custom R function, using the server compute context.

Get Started Now

This tutorial takes about 75 minutes to complete, not including setup.

Target Audience

This tutorial is intended for data scientists or for people who are already somewhat familiar with R and data science tasks including exploration, statistical analysis, and model tuning. However, all the code is provided, so even if you are new to R, you can easily run the code and follow along, assuming you have the required server and client environments.

You should also be comfortable with [!INCLUDEtsql] syntax and know how to access a [!INCLUDEssNoVersion] database using [!INCLUDEssManStudioFull] or other database tools, such as Visual Studio.

Tip

Save your R workspace between lessons, so that you can easily pick up where you left off.

Prerequisites

SQL Server with support for R

Install SQL Server 2016 and enable R Services (in-Database). Or, install SQL Server 2017 and enable Machine Learning Services and choose the R language. The setup process is described in SQL Server 2016 Books Online.
Database permissions

To run the queries used to train the model, you must have db_datareader privileges on the database where the data is stored. To run R, your user must have the permission, EXECUTE ANY EXTERNAL SCRIPT.
Data science development computer

You must also install the RevoScaleR packages and related providers in your R development environment. The easiest way to do this is to install Microsoft R Client or Microsoft R Server (Standalone). For more information, see Set Up a Data Science Client

[!NOTE] Other versions of Revolution R Enterprise or Revolution R Open are not supported.

An open source distribution of R, such as R 3.2.2, will not work in this tutorial, because only the RevoScaleR functions can use remote compute contexts.
Additional R Packages

For this tutorial, you will need to install the following packages: dplyr, ggplot2, ggthemes, reshape2, and e1071. Instructions are provided as part of the tutorial.

All packages must be installed in two places: on the computer you use for R solution development, and on the SQL Server computer where R scripts will be run. If you do not have permission to install packages on the server computer, ask an administrator. Do not install the packages to a user library. It is important that the packages be installed in the R package library that is used by the SQL Server instance.

For more information, see Prerequisites for Data Science Walkthroughs.

Next Step

Work with SQL Server Data using R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Data Science Deep Dive: Using the RevoScaleR Packages

Overview

Get Started Now

Target Audience

Prerequisites

Next Step

FilesExpand file tree

deepdive-data-science-deep-dive-using-the-revoscaler-packages.md

Latest commit

History

deepdive-data-science-deep-dive-using-the-revoscaler-packages.md

File metadata and controls

Data Science Deep Dive: Using the RevoScaleR Packages

Overview

Get Started Now

Target Audience

Prerequisites

Next Step