| title | Lesson 2: View and Explore the Data (Data Science End-to-End Walkthrough) | Microsoft Docs | |
|---|---|---|
| ms.custom | ||
| ms.date | 03/17/2017 | |
| ms.prod | sql-server-2016 | |
| ms.reviewer | ||
| ms.suite | ||
| ms.technology |
|
|
| ms.tgt_pltfrm | ||
| ms.topic | article | |
| applies_to |
|
|
| dev_langs |
|
|
| ms.assetid | d3835d6d-e68b-486d-81a0-81b717cc6134 | |
| caps.latest.revision | 32 | |
| author | jeannt | |
| ms.author | jeannt | |
| manager | jhubbard |
Data exploration is an important part of modeling data, and involves reviewing summaries of data objects to be used in the analyses, as well as data visualization. In this lesson, you'll explore the data objects and generate plots, using both [!INCLUDEtsql] and R functions included in [!INCLUDErsql_productname].
Then you will generate plots to visualize the data, using new functions provided by packages installed with [!INCLUDErsql_productname].
Tip
Already an R maestro?
Now that you've downloaded all the data and prepared the environment, you are welcome to run the complete R script in RStudio or any other environment, and explore the functionality on your own. Just open the file RSQL_Walkthrough.R and highlight and run individual lines, or run the entire script as a demo.
To get additional explanations of the RevoScaleR functions, and tips for working with [!INCLUDEssNoVersion] data in R, continue with the tutorial. It uses exactly the same script.
First, take a minute to ascertain that your data was loaded correctly.
-
Connect to your [!INCLUDEssNoVersion] instance using your favorite database management tool, such as [!INCLUDEssNoVersion], Server Explorer in Visual Studio, or VS Code.
-
Click on the database you created, and expand to see the new database, tables, and functions
-
To verify that the data was loaded correctly, right-click the table and select Select top 1000 rows to run this query:
SELECT TOP 1000 * FROM [dbo].[nyctaxi_sample]If you don't see any data in the table, refer to the Troubleshooting section in the previous topic.
-
This data table has been optimized for set-based calculations, by adding a columnstore index. For details about how that was done, review the script used to create the table,
create-db-tb-upload-data.sqlFor now, run this statement to generate a quick summary on the table.
SELECT DISTINCT [passenger_count] , ROUND (SUM ([fare_amount]),0) as TotalFares , ROUND (AVG ([fare_amount]),0) as AvgFares FROM [dbo].[nyctaxi_sample] GROUP BY [passenger_count] ORDER BY AvgFares DESCIn the next lesson, you'll generate some more complex summaries using R.
View and Summarize Data using R (Data Science End-to-End Walkthrough)
Create Graphs and Plots Using R (Data Science End-to-End Walkthrough)
Lesson 1: Prepare the Data (Data Science End-to-End Walkthrough)