| title | Move Data between SQL Server and XDF File| Microsoft Docs | |
|---|---|---|
| ms.custom | ||
| ms.date | 05/18/2017 | |
| ms.prod | sql-server-2016 | |
| ms.reviewer | ||
| ms.suite | ||
| ms.technology |
|
|
| ms.tgt_pltfrm | ||
| ms.topic | article | |
| applies_to |
|
|
| dev_langs |
|
|
| ms.assetid | 40887cb3-ffbb-4769-9f54-c006d7f4798c | |
| caps.latest.revision | 17 | |
| author | jeannt | |
| ms.author | jeannt | |
| manager | jhubbard |
When you are working in a local compute context, you have access to both local data files and the [!INCLUDEssNoVersion] database (defined as an RxSqlServerData data source).
In this section, you'll learn how to get data and store it in a file on the local computer so that you can perform transformations on the data. When you're done, you'll use the data in the file to create a new [!INCLUDEssNoVersion] table, by using rxDataStep.
The rxImport function lets you import data from any supported data source to a local XDF file. Using a local file can be convenient if you want to do many different analyses on data that is stored in a [!INCLUDEssNoVersion] database, and you want to avoid running the same query over and over.
For this exercise, you'll use the credit card fraud data again. In this scenario, you've been asked to do some extra analysis on users in the states of California, Oregon, and Washington. To be more efficient, you've decided to store data for just these states on your local computer and work with the variables gender, cardholder, state, and balance.
-
Re-use the stateAbb vector you created earlier to identify the levels to include, and then print to the console the new variable, statesToKeep.
statesToKeep <- sapply(c("CA", "OR", "WA"), grep, stateAbb) statesToKeep
Results
CA OR WA 5 38 48 -
Now you'll define the data you want to bring over from SQL Server, using a [!INCLUDEtsql] query. Later you'll use this variable as the inData argument for rxImport.
importQuery <- paste("SELECT gender,cardholder,balance,state FROM", sqlFraudTable, "WHERE (state = 5 OR state = 38 OR state = 48)")
Make sure there are no hidden characters such as line feeds or tabs in the query.
-
Next, you'll define the columns to use when working with the data in R. For example, in the smaller data set, you need only three factor levels, because the query will return data for only three states. You can re-use the statesToKeep variable to identify the correct levels to include.
importColInfo <- list( gender = list( type = "factor", levels = c("1", "2"), newLevels = c("Male", "Female")), cardholder = list( type = "factor", levels = c("1", "2"), newLevels = c("Principal", "Secondary")), state = list( type = "factor", levels = as.character(statesToKeep), newLevels = names(statesToKeep)) )
-
Set the compute context to local, because you want all the data available on your local computer.
rxSetComputeContext("local") -
Create the data source object by passing all the variables that you just defined as arguments to RxSqlServerData.
sqlServerImportDS <- RxSqlServerData( connectionString = sqlConnString, sqlQuery = importQuery, colInfo = importColInfo)
-
Then, call rxImport to write the data to a file named
ccFraudSub.xdf, in the current working directory.localDS <- rxImport(inData = sqlServerImportDS, outFile = "ccFraudSub.xdf", overwrite = TRUE)
The localDs object returned by the rxImport function is a light-weight RxXdfData data source object that represents the ccFraud.xdf data file stored locally on disk.
-
Call rxGetVarInfo on the XDF file to verify that the data schema is the same.
rxGetVarInfo(data = localDS)
Results
rxGetVarInfo(data = localDS)
Var 1: gender, Type: factor, no factor levels available
Var 2: cardholder, Type: factor, no factor levels available
Var 3: balance, Type: integer, Low/High: (0, 22463)
Var 4: state, Type: factor, no factor levels available
-
You can now call various R functions to analyze the localDs object, just as you would with the source data on [!INCLUDEssNoVersion]. For example:
rxSummary(~gender + cardholder + balance + state, data = localDS)
Now that you've mastered the use of compute contexts and working with various data sources, it's time to try something fun. In the next and final lesson, you'll create a simple simulation using a custom R function and run it on the remote server.