Skip to content

Latest commit

 

History

History
125 lines (99 loc) · 4.79 KB

File metadata and controls

125 lines (99 loc) · 4.79 KB
title Insert data from a SQL table into a Python pandas dataframe
titleSuffix SQL machine learning
description Learn how to read data from a SQL table and insert into a pandas dataframe using Python.
author WilliamDAssafMSFT
ms.author wiassaf
ms.date 07/23/2020
ms.service sql
ms.subservice machine-learning
ms.topic how-to
monikerRange >=sql-server-2017||>=sql-server-linux-ver15||=azuresqldb-mi-current||=azuresqldb-current

Insert data from a SQL table into a Python pandas dataframe

[!INCLUDESQL Server SQL DB SQL MI]

This article describes how to insert SQL data into a pandas dataframe using the pyodbc package in Python. The rows and columns of data contained within the dataframe can be used for further data exploration.

Prerequisites

::: moniker range=">=sql-server-2017||>=sql-server-linux-ver15"

::: moniker range="=azuresqldb-current"

::: moniker range="=azuresqldb-mi-current"

Verify restored database

You can verify that the restored database exists by querying the Person.CountryRegion table:

USE AdventureWorks;
SELECT * FROM Person.CountryRegion;

Install Python packages

Download and Install Azure Data Studio.

Install the following Python packages:

  • pyodbc
  • pandas

To install these packages:

  1. In your Azure Data Studio notebook, select Manage Packages.
  2. In the Manage Packages pane, select the Add new tab.
  3. For each of the following packages, enter the package name, click Search, then click Install.

Insert data

Use the following script to select data from Person.CountryRegion table and insert into a dataframe. Edit the connection string variables: 'server', 'database', 'username', and 'password' to connect to SQL.

To create a new notebook:

  1. In Azure Data Studio, select File, select New Notebook.
  2. In the notebook, select kernel Python3, select the +code.
  3. Paste code in notebook, select Run All.
import pyodbc
import pandas as pd
# Some other example server values are
# server = 'localhost\sqlexpress' # for a named instance
# server = 'myserver,port' # to specify an alternate port
server = 'servername' 
database = 'AdventureWorks' 
username = 'yourusername' 
password = 'databasename'  
cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+ password)
cursor = cnxn.cursor()
# select 26 rows from SQL table to insert in dataframe.
query = "SELECT [CountryRegionCode], [Name] FROM Person.CountryRegion;"
df = pd.read_sql(query, cnxn)
print(df.head(26))

Output

The print command in the preceding script displays the rows of data from the pandas dataframe df.

CountryRegionCode                 Name
0                 AF          Afghanistan
1                 AL              Albania
2                 DZ              Algeria
3                 AS       American Samoa
4                 AD              Andorra
5                 AO               Angola
6                 AI             Anguilla
7                 AQ           Antarctica
8                 AG  Antigua and Barbuda
9                 AR            Argentina
10                AM              Armenia
11                AW                Aruba
12                AU            Australia
13                AT              Austria
14                AZ           Azerbaijan
15                BS         Bahamas, The
16                BH              Bahrain
17                BD           Bangladesh
18                BB             Barbados
19                BY              Belarus
20                BE              Belgium
21                BZ               Belize
22                BJ                Benin
23                BM              Bermuda
24                BT               Bhutan
25                BO              Bolivia

Next steps