| title | Lesson 5: Create a Simple Simulation (Data Science Deep Dive) | Microsoft Docs | |
|---|---|---|
| ms.custom | ||
| ms.date | 10/03/2016 | |
| ms.prod | sql-server-2016 | |
| ms.reviewer | ||
| ms.suite | ||
| ms.technology |
|
|
| ms.tgt_pltfrm | ||
| ms.topic | article | |
| applies_to |
|
|
| dev_langs |
|
|
| ms.assetid | f420b816-ddab-4a1a-89b9-c8285a2d33a3 | |
| caps.latest.revision | 16 | |
| author | jeannt | |
| ms.author | jeannt | |
| manager | jhubbard |
Until now you've been using R functions provided by SQL Server R Services that are designed specifically for moving data between [!INCLUDEssNoVersion] and a local compute context. However, suppose you write a custom R function of your own, and want to run it in the server context?
You can call an arbitrary function in the context of the [!INCLUDEssNoVersion] computer, by using the rxExec function. You can also use rxExec to explicitly distribute work across cores in a single server node.
In this lesson, you'll use the remote server to create a simple simulation. The simulation doesn't require any [!INCLUDEssNoVersion] data; the example only demonstrates how to design a custom function and then call it using the rxExec function.
For a more complex example of using rxExec, see this article: http://blog.revolutionanalytics.com/2015/04/coarse-grain-parallelism-with-foreach-and-rxexec.html
A common casino game consists of rolling a pair of dice, with these rules:
-
If you roll a 7 or 11 on your initial roll, you win.
-
If you roll 2, 3, or 12, you lose.
-
If you roll a 4, 5, 6, 8, 9, or 10, that number becomes your point and you continue rolling until you either roll your point again (in which case you win) or roll a 7, in which case you lose.
The game is easily simulated in R, by creating a custom function, and then running it many times.
-
Create the custom function using the following R code:
rollDice <- function() { result <- NULL point <- NULL count <- 1 while (is.null(result)) { roll <- sum(sample(6, 2, replace=TRUE)) if (is.null(point)) { point <- roll } if (count == 1 && (roll == 7 || roll == 11)) { result <- "Win" } else if (count == 1 && (roll == 2 || roll == 3 || roll == 12)) { result \<- "Loss" } else if (count > 1 && roll == 7 ) { result \<- "Loss" } else if (count > 1 && point == roll) { result <- "Win" } else { count <- count + 1 } } result }
-
To simulate a single game of dice, run the function.
rollDice()
Did you win or lose?
Now let's see how you can run the function multiple times, to create a simulation that helps determine the probability of a win.
To run an arbitrary function in the context of the [!INCLUDEssNoVersion] computer, you call the rxExec function. Although rxExec also supports distributed execution of a function in parallel across nodes or cores in a server context, here you'll use it just to run your custom function on the server.
-
Call the custom function as an argument to rxExec, along with some other parameters that modify the simulation.
sqlServerExec <- rxExec(rollDice, timesToRun=20, RNGseed="auto") length(sqlServerExec)
-
Use the timesToRun argument to indicate how many times the function should be executed. In this case, you roll the dice 20 times.
-
The arguments RNGseed and RNGkind can be used to control random number generation. When RNGseed is set to auto, a parallel random number stream is initialized on each worker.
-
-
The rxExec function creates a list with one element for each run; however, you won't see much happening until the list is complete. When all the iterations are complete, the line starting with
lengthwill return a value.You can then go to the next step to get a summary of your win-loss record.
-
Convert the returned list to a vector using R's unlist function, and summarize the results using the table function.
table(unlist(sqlServerExec))Your results should look something like this:
Loss Win
12 8
In this tutorial, you have become proficient with these tasks:
-
Getting [!INCLUDEssNoVersion] data to use in analyses
-
Creating and modifying data sources in R
-
Passing models, data, and plots between your workstation and the [!INCLUDEssNoVersion] server
Tip
If you would like to experiment with these techniques using a larger dataset of 10 million observations, the data files are available from http://packages.revolutionanalytics.com/datasets.
To re-use this walkthrough with the larger data files, just download the data and then modify the data sources as follows:
- Set the variables ccFraudCsv and ccScoreCsv to point to the new data files
- Change the name of the table referenced in sqlFraudTable to ccFraud10
- Change the name of the table referenced in sqlScoreTable to ccFraudScore10
Move Data between SQL Server and XDF File (Data Science Deep Dive)