| title | View and explore the data using SQL (walkthrough) \| Microsoft Docs |
|---|---|
| ms.prod | sql |
| ms.technology | machine-learning |
| ms.date | 04/15/2018 |
| ms.topic | tutorial |
| author | HeidiSteen |
| ms.author | heidist |
| manager | cgronlun |
[!INCLUDEappliesto-ss-xxxx-xxxx-xxx-md-winonly]
Data exploration is an important part of modeling data. It involves reviewing summaries of the data objects to be used in the analyses, as well as visualizing the data. In this lesson, you explore the data objects using both [!INCLUDEtsql] and R functions included in [!INCLUDErsql_productname].
You then generate plots to visualize the data, using functions provided by packages installed with [!INCLUDErsql_productname].
> [!TIP]
> Already an R maestro?
>
> Now that you've downloaded all the data and prepared the environment, you are welcome to run the complete R script in RStudio or any other environment, and explore the functionality on your own. Just open the file RSQL_Walkthrough.R and highlight and run individual lines, or run the entire script as a demo.
>
> To get additional explanations of the RevoScaleR functions, and tips for working with [!INCLUDEssNoVersion] data in R, continue with the tutorial. It uses exactly the same script.
First, take a minute to ascertain that your data was loaded correctly.
- Connect to your [!INCLUDEssNoVersion] instance using your favorite database management tool, such as SQL Server Management Studio, Server Explorer in Visual Studio, or Visual Studio Code.
- Select the database you created, and expand it to see the new tables and functions.
- To verify that the data loaded correctly, right-click the table and choose Select Top 1000 Rows. The menu option runs this query:

      SELECT TOP 1000 * FROM [dbo].[nyctaxi_sample]

  If you don't see any data in the table, refer to the Troubleshooting section in the previous topic.
- This data table has been optimized for set-based calculations by adding a columnstore index. Run this statement to generate a quick summary of the table:

      SELECT [passenger_count],
             ROUND(SUM([fare_amount]), 0) AS TotalFares,
             ROUND(AVG([fare_amount]), 0) AS AvgFares
      FROM [dbo].[nyctaxi_sample]
      GROUP BY [passenger_count]
      ORDER BY AvgFares DESC
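
As an optional extra check on the steps above, the following sketch counts the rows in the table and lists its indexes. It assumes the table is named `dbo.nyctaxi_sample`, as in the queries above, and that you have permission to read the `sys.indexes` catalog view. The aliases `RowsLoaded`, `IndexName`, and `IndexType` are illustrative and not part of the walkthrough script.

```sql
-- Count the rows in the sample table.
-- A result of 0 suggests that the data load did not complete.
SELECT COUNT(*) AS RowsLoaded
FROM [dbo].[nyctaxi_sample];

-- List the indexes defined on the table.
-- A columnstore index is reported with a type_desc of
-- CLUSTERED COLUMNSTORE or NONCLUSTERED COLUMNSTORE.
SELECT i.name      AS IndexName,
       i.type_desc AS IndexType
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.nyctaxi_sample');
```

If the second query returns no columnstore row, the summary query still runs, though it might take longer on a large table.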
In the next lesson, you'll generate some more complex summaries using R.