Commit f0d85b3

Merge pull request #25364 from rwestMSFT/rw-1223-python
Fix Python script to address issue 8415
2 parents b277648 + 04ad2da commit f0d85b3

1 file changed

Lines changed: 51 additions & 33 deletions

docs/machine-learning/data-exploration/python-plot-histogram.md

@@ -4,88 +4,106 @@ titleSuffix: SQL machine learning
 description: Learn how to create a histogram to visualize data using Python.
 author: WilliamDAssafMSFT
 ms.author: wiassaf
-ms.date: 07/14/2020
-ms.topic: how-to
+ms.reviewer: randolphwest
+ms.date: 12/23/2022
 ms.service: sql
 ms.subservice: machine-learning
+ms.topic: how-to
 monikerRange: ">=sql-server-2017||>=sql-server-linux-ver15||=azuresqldb-mi-current||=azuresqldb-current"
 ---
+# Plot histograms in Python
 
-# Plot histograms in Python
 [!INCLUDE[SQL Server SQL DB SQL MI](../../includes/applies-to-version/sql-asdb-asdbmi.md)]
 
 This article describes how to plot data using the Python package [pandas'.hist()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.hist.html). A SQL database is the source used to visualize the histogram data intervals that have consecutive, non-overlapping values.
 
-## Prerequisites:
+## Prerequisites
 
 ::: moniker range=">=sql-server-2017||>=sql-server-linux-ver15"
-* [SQL Server for Windows](../../database-engine/install-windows/install-sql-server.md) or [for Linux](../../linux/sql-server-linux-overview.md)
+- [SQL Server for Windows](../../database-engine/install-windows/install-sql-server.md) or [for Linux](../../linux/sql-server-linux-overview.md)
 ::: moniker-end
 
 ::: moniker range="=azuresqldb-current"
-* [Azure SQL Database](/azure/sql-database/sql-database-get-started-portal)
+- [Azure SQL Database](/azure/sql-database/sql-database-get-started-portal)
 ::: moniker-end
 
 ::: moniker range="=azuresqldb-mi-current"
-* [Azure SQL Managed Instance](/azure/azure-sql/managed-instance/instance-create-quickstart)
+- [Azure SQL Managed Instance](/azure/azure-sql/managed-instance/instance-create-quickstart)
 
-* [SQL Server Management Studio](../../ssms/download-sql-server-management-studio-ssms.md) for restoring the sample database to Azure SQL Managed Instance.
+- [SQL Server Management Studio](../../ssms/download-sql-server-management-studio-ssms.md) for restoring the sample database to Azure SQL Managed Instance.
 ::: moniker-end
 
-* Azure Data Studio. To install, see [Azure Data Studio](../../azure-data-studio/what-is-azure-data-studio.md).
+- Azure Data Studio. To install, see [Azure Data Studio](../../azure-data-studio/what-is-azure-data-studio.md).
 
-* [Restore sample DW database](../../samples/adventureworks-install-configure.md) to get sample data used in this article.
+- [Restore sample DW database](../../samples/adventureworks-install-configure.md) to get sample data used in this article.
 
-## Verify restored Database
+## Verify restored database
 
 You can verify that the restored database exists by querying the **Person.CountryRegion** table:
+
 ```sql
 USE AdventureWorksDW;
 SELECT * FROM Person.CountryRegion;
 ```
-
+
 ## Install Python packages
 
 [Download and Install Azure Data Studio](../../azure-data-studio/download-azure-data-studio.md).
 
 Install the following Python packages:
-* pyodbc
-* pandas
+- `pyodbc`
+- `pandas`
+- `sqlalchemy`
+- `matplotlib`
 
-To install these packages:
+To install these packages:
 
-1. In your Azure Data Studio notebook, select **Manage Packages**.
-2. In the **Manage Packages** pane, select the **Add new** tab.
-3. For each of the following packages, enter the package name, click **Search**, then click **Install**.
+1. In your Azure Data Studio notebook, select **Manage Packages**.
+1. In the **Manage Packages** pane, select the **Add new** tab.
+1. For each of the following packages, enter the package name, select **Search**, then select **Install**.
 
 ## Plot histogram
 
-The distributed data displayed in the histogram is based on a SQL query from AdventureWorksDW. The histogram visualizes data and the frequency of data values.
-Edit the connection string variables: 'server', 'database', 'username', and 'password' to connect to SQL database.
+The distributed data displayed in the histogram is based on a SQL query from `AdventureWorksDW`. The histogram visualizes data and the frequency of data values.
+
+Edit the connection string variables: 'server', 'database', 'username', and 'password' to connect to SQL Server database.
 
 To create a new notebook:
 
 1. In Azure Data Studio, select **File**, select **New Notebook**.
-2. In the notebook, select kernel **Python3**, select the **+code**.
-3. Paste code in notebook, select **Run All**.
+1. In the notebook, select kernel **Python3**, select the **+code**.
+1. Paste code in notebook, select **Run All**.
 
 ```python
 import pyodbc
-import pandas as plt
+import pandas as pd
+import matplotlib
+import sqlalchemy
+
+from sqlalchemy import create_engine
+
+matplotlib.use('TkAgg', force=True)
+from matplotlib import pyplot as plt
+
 # Some other example server values are
 # server = 'localhost\sqlexpress' # for a named instance
 # server = 'myserver,port' # to specify an alternate port
-server = 'servername'
-database = 'AdventureWorksDW'
-username = 'yourusername'
-password = 'databasename'
-cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+ password)
-cursor = cnxn.cursor()
+server = 'servername'
+database = 'AdventureWorksDW2019'
+username = 'yourusername'
+password = 'databasename'
+
+url = 'mssql+pyodbc://{user}:{passwd}@{host}:{port}/{db}?driver=SQL+Server'.format(user=username, passwd=password, host=server, port=port, db=database)
+engine = create_engine(url)
+
 sql = "SELECT DATEDIFF(year, c.BirthDate, GETDATE()) AS Age FROM [dbo].[FactInternetSales] s INNER JOIN dbo.DimCustomer c ON s.CustomerKey = c.CustomerKey"
-df = plt.read_sql(sql, cnxn)
-df.hist(bins=10)
+
+df = pd.read_sql(sql, engine)
+df.hist(bins=50)
+
+plt.show()
 ```
 
-The display shows the age distribution of customers in the FactInternetSales table.
+The display shows the age distribution of customers in the `FactInternetSales` table.
 
-![Pandas Histogram](./media/python-histogram.png)
+:::image type="content" source="media/python-histogram.png" alt-text="Diagram showing the Pandas histogram distribution.":::
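
Note on the added snippet: in the lines shown in this diff, the new code passes `port=port` when formatting the connection URL, but no `port` variable is assigned, so running it verbatim would likely raise a `NameError`. The following is a minimal sketch of one way to build the URL and draw the histogram, not the commit's own code. The helper names `build_mssql_url` and `plot_age_histogram`, the default port of 1433 (the usual SQL Server TCP port), and the percent-encoding of the credentials are assumptions for illustration. The query and the `bins=50` setting are taken from the diff.

```python
from urllib.parse import quote, quote_plus


def build_mssql_url(user, passwd, host, db, port=1433, driver="SQL Server"):
    """Build an mssql+pyodbc connection URL for SQLAlchemy.

    Assumption: port defaults to 1433; the snippet in the diff uses
    a `port` variable that is never assigned.
    """
    return "mssql+pyodbc://{user}:{passwd}@{host}:{port}/{db}?driver={driver}".format(
        user=quote(user, safe=""),      # percent-encode characters such as '@' or '/'
        passwd=quote(passwd, safe=""),
        host=host,
        port=port,
        db=db,
        driver=quote_plus(driver),      # "SQL Server" -> "SQL+Server", as in the diff
    )


def plot_age_histogram(url, bins=50):
    """Run the age query from the diff and show a histogram (hypothetical helper)."""
    # Imported here so the URL helper works without these packages installed.
    import pandas as pd
    from matplotlib import pyplot as plt
    from sqlalchemy import create_engine

    sql = (
        "SELECT DATEDIFF(year, c.BirthDate, GETDATE()) AS Age "
        "FROM [dbo].[FactInternetSales] s "
        "INNER JOIN dbo.DimCustomer c ON s.CustomerKey = c.CustomerKey"
    )
    engine = create_engine(url)
    df = pd.read_sql(sql, engine)
    df.hist(bins=bins)
    plt.show()


# Placeholder values, as in the article; replace them before connecting.
url = build_mssql_url("yourusername", "yourpassword", "servername", "AdventureWorksDW2019")
print(url)
# mssql+pyodbc://yourusername:yourpassword@servername:1433/AdventureWorksDW2019?driver=SQL+Server
```

Calling `plot_age_histogram(url)` would then open the plot, provided the server is reachable and the ODBC driver named in the URL is installed.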
