---
title: Transform data using RevoScaleR
description: "Learn about the RevoScaleR functions for transforming data at various stages of your analysis and how to transform data using the R language on SQL Server."
ms.service: sql
ms.subservice: machine-learning-services

ms.date: 11/27/2018  
ms.topic: tutorial
author: WilliamDAssafMSFT
ms.author: wiassaf
ms.custom: seo-lt-2019
monikerRange: ">=sql-server-2016||>=sql-server-linux-ver15"
---
# Transform data using R (SQL Server and RevoScaleR tutorial)
[!INCLUDE [SQL Server 2016 and later](../../includes/applies-to-version/sqlserver2016.md)]

This is tutorial 9 of the [RevoScaleR tutorial series](deepdive-data-science-deep-dive-using-the-revoscaler-packages.md) on how to use [RevoScaleR functions](/machine-learning-server/r-reference/revoscaler/revoscaler) with SQL Server.

In this tutorial, you'll learn about the **RevoScaleR** functions for transforming data at various stages of your analysis.

> [!div class="checklist"]
> * Use **rxDataStep** to create and transform a data subset
> * Use **rxImport** to transform in-transit data to or from an XDF file or an in-memory data frame during import

Although not specifically for data movement, the functions **rxSummary**, **rxCube**, **rxLinMod**, and **rxLogit** all support data transformations.

## Use rxDataStep to transform variables

The [rxDataStep](/machine-learning-server/r-reference/revoscaler/rxdatastep) function processes data one chunk at a time, reading from one data source and writing to another. You can specify the columns to transform, the transformations to load, and so forth.

To make this example interesting, let's use a function from another R package to transform the data. The **boot** package is one of the "recommended" packages, meaning that **boot** is included with every distribution of R, but is not loaded automatically on start-up. Therefore, the package should already be available on the [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] instance configured for R integration.

From the **boot** package, use the  function **inv.logit**, which computes the inverse of a logit. That is, the **inv.logit** function converts a logit back to a probability on the [0,1] scale.

> [!TIP] 
> Another way to  get predictions in this scale would be to set the *type* parameter to **response** in the original call to **rxPredict**.

1. Start by creating a data source to hold the data destined for the table, `ccScoreOutput`.
  
    ```R
    sqlOutScoreDS <- RxSqlServerData( table =  "ccScoreOutput",  connectionString = sqlConnString, rowsPerRead = sqlRowsPerRead )
    ```
  
2. Add another data source to hold the data for the table `ccScoreOutput2`.
  
    ```R
    sqlOutScoreDS2 <- RxSqlServerData( table =  "ccScoreOutput2",  connectionString = sqlConnString, rowsPerRead = sqlRowsPerRead )
    ```
  
    In the new table, store all the variables from the previous `ccScoreOutput` table, plus the newly created variable.
  
3. Set the compute context to the [!INCLUDE[ssNoVersion](../../includes/ssnoversion-md.md)] instance.
  
    ```R
    rxSetComputeContext(sqlCompute)
    ```
  
4. Use the function **rxSqlServerTableExists** to check whether the output table `ccScoreOutput2` already exists; and if so, use the function **rxSqlServerDropTable** to delete the table.
  
    ```R
    if (rxSqlServerTableExists("ccScoreOutput2"))     rxSqlServerDropTable("ccScoreOutput2")
    ```
  
5. Call the **rxDataStep** function, and specify the desired transforms in a list.
  
    ```R
    rxDataStep(inData = sqlOutScoreDS,
        outFile = sqlOutScoreDS2,
        transforms = list(ccFraudProb = inv.logit(ccFraudLogitScore)),
        transformPackages = "boot",
        overwrite = TRUE)
    ```

    When you define the transformations that are applied to each column, you can also specify any additional R packages that are needed to perform the transformations.  For more information about the types of transformations that you can perform, see [How to transform and subset data using RevoScaleR](/machine-learning-server/r/how-to-revoscaler-data-transform).
  
6. Call **rxGetVarInfo** to view a summary of the variables in the new data set.
  
```R
rxGetVarInfo(sqlOutScoreDS2)
```

**Results**

```R
Var 1: ccFraudLogitScore, Type: numeric
Var 2: state, Type: character
Var 3: gender, Type: character
Var 4: cardholder, Type: character
Var 5: balance, Type: integer
Var 6: numTrans, Type: integer
Var 7: numIntlTrans, Type: integer
Var 8: creditLine, Type: integer
Var 9: ccFraudProb, Type: numeric
```

The original logit scores are preserved, but a new column, *ccFraudProb*, has been added, in which the logit scores are represented as values between 0 and 1.

Notice that the factor variables have been written to the table `ccScoreOutput2` as character data. To use them as factors in subsequent analyses, use the parameter *colInfo* to specify the levels.

## Next steps

> [!div class="nextstepaction"]
> [Load data into memory using rxImport](../../machine-learning/tutorials/deepdive-load-data-into-memory-using-rximport.md)