| title | Create New SQL Server Table using rxDataStep| Microsoft Docs | |
|---|---|---|
| ms.custom | ||
| ms.date | 05/18/2017 | |
| ms.prod | sql-server-2016 | |
| ms.reviewer | ||
| ms.suite | ||
| ms.technology |
|
|
| ms.tgt_pltfrm | ||
| ms.topic | article | |
| applies_to |
|
|
| dev_langs |
|
|
| ms.assetid | 98cead96-6de7-4edf-98b9-a1efb09297b9 | |
| caps.latest.revision | 19 | |
| author | jeannt | |
| ms.author | jeannt | |
| manager | jhubbard | |
| ms.workload | Inactive |
In this lesson, you'll learn how to move data between in-memory data frames, the [!INCLUDEssNoVersion] context, and local files.
Note
For this lesson, you'll use a different data set. The Airline Delays dataset is a public dataset that is widely used for machine learning experiments. If you're just getting started with R, this dataset is handy to keep around for testing, as it is used in various product samples for [!INCLUDErsql_productname] that were published with [!INCLUDEssCurrent]. The data files you'll need for this example are available in the same directory as other product samples.
In the first part of this tutorial, you used the RxTextData function to import data into R from a text file, and then used the RxDataStep function to move the data into [!INCLUDEssNoVersion].
In this lesson, you'll use a different approach, and get data from a file saved in the XDF format. The XDF format is an XML standard developed for high-dimensional data. It is a binary file format with an R interface that optimizes row and column processing and analysis. You can use it for moving data and to store subsets of data that are useful for analysis.
After doing some lightweight transformations on the data using the XDF file, you'll save the transformed data into a new [!INCLUDEssNoVersion] table.
Note
You'll need DDL permissions for this step.
-
Set the compute context to the local workstation.
rxSetComputeContext("local") -
Define a new data source object using the RxXdfData function. For an XDF data source, you just specify the path to the data file. You could specify the path to the file using a text variable, but in this case, there's a handy shortcut, because the sample data file (AirlineDemoSmall.xdf) is in the directory returned by the rxGetOption function.
xdfAirDemo <- RxXdfData(file.path(rxGetOption("sampleDataDir"), "AirlineDemoSmall.xdf"))
-
Call rxGetVarInfo on the in-memory data to view a summary of the dataset.
rxGetVarInfo(xdfAirDemo)
Results
Var 1: ArrDelay, Type: integer, Low/High: (-86, 1490)
Var 2: CRSDepTime, Type: numeric, Storage: float32, Low/High: (0.0167, 23.9833)
Var 3: DayOfWeek 7 factor levels: Monday Tuesday Wednesday Thursday Friday Saturday Sunday
Note
Did you notice that you did not need to call any other functions to load the data into the XDF file, and could call rxGetVarInfo on the data immediately? That's because XDF is the default interim storage method for RevoScaleR. For more information about XDF files, see Create an XDF.
-
Now, you'll put this data into a [!INCLUDEssNoVersion] table, storing DayOfWeek as an integer with values from 1 to 7.
To do this, first define a SQL Server data source.
sqlServerAirDemo <- RxSqlServerData(table = "AirDemoSmallTest", connectionString = sqlConnString)
-
Check whether a table with the same name already exists, and delete the table if it exists.
if (rxSqlServerTableExists("AirDemoSmallTest", connectionString = sqlConnString)) rxSqlServerDropTable("AirDemoSmallTest", connectionString = sqlConnString)
-
Create the table and load the data using rxDataStep. This function moves data between two already defined data sources and can transform the data en route.
rxDataStep(inData = xdfAirDemo, outFile = sqlServerAirDemo, transforms = list( DayOfWeek = as.integer(DayOfWeek), rowNum = .rxStartRow : (.rxStartRow + .rxNumRows - 1) ), overwrite = TRUE )
This is a fairly large table, so wait will you see the final status message: Rows Read: 200000, Total Rows Processed: 600000.
-
Set the compute context back to the [!INCLUDEssNoVersion] computer.
rxSetComputeContext(sqlCompute) -
Create a new SQL Server data source, using a simple SQL query on the new table. This definition adds factor levels for the DayOfWeek column, using the colInfo argumrnt to RxSqlServerData.
SqlServerAirDemo <- RxSqlServerData( sqlQuery = "SELECT * FROM AirDemoSmallTest", connectionString = sqlConnString, rowsPerRead = 50000, colInfo = list(DayOfWeek = list(type = "factor", levels = as.character(1:7))))
-
Call rxSummary once more to review a summary of the data in your query.
rxSummary(~., data = sqlServerAirDemo)