Commit 62864c3

Merge pull request #32328 from MicrosoftDocs/main
OOB Publish for PASS Summit
2 parents 6bd5c91 + abb949b commit 62864c3

10 files changed

Lines changed: 778 additions & 17 deletions

azure-sql/database/ai-artificial-intelligence-intelligent-applications.md

Lines changed: 35 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ description: "Use AI options such as OpenAI and vectors to build intelligent app
44
author: damauri
55
ms.author: damauri
66
ms.reviewer: damauri, josephsack, randolphwest, mathoma
7-
ms.date: 08/01/2024
7+
ms.date: 10/15/2024
88
ms.service: azure-sql-database
99
ms.topic: conceptual
1010
ms.collection: ce-skilling-ai-copilot
@@ -75,7 +75,7 @@ In Azure OpenAI, input text provided to the API is turned into tokens (tokenized
7575

7676
### Vectors
7777

78-
Vectors are ordered arrays of numbers (typically floats) that can represent information about some data. For example, an image can be represented as a vector of pixel values, or a string of text can be represented as a vector or ASCII values. The process to turn data into a vector is called *vectorization*.
78+
Vectors are ordered arrays of numbers (typically floats) that can represent information about some data. For example, an image can be represented as a vector of pixel values, or a string of text can be represented as a vector of ASCII values. The process to turn data into a vector is called *vectorization*. For more information, see [Vectors](#vectors-1).
7979

8080
### Embeddings
8181

@@ -89,7 +89,7 @@ Vector search refers to the process of finding all vectors in a dataset that are
8989

9090
Consider a scenario where you run a query over millions of documents to find the most similar documents in your data. You can create embeddings for your data and query documents using Azure OpenAI. Then, you can perform a vector search to find the most similar documents from your dataset. Performing a vector search across a few examples is trivial, but performing the same search across thousands, or millions, of data points becomes challenging. There are also trade-offs between exhaustive search and approximate nearest neighbor (ANN) search methods, including latency, throughput, accuracy, and cost, all of which depend on the requirements of your application.
9191

92-
Since Azure SQL Database embeddings can be efficiently stored and queried using to columnstore index support, allowing exact nearest neighbor search with great performance, you don't have to decide between accuracy and speed: you can have both. Storing vector embeddings alongside the data in an integrated solution minimizes the need to manage data synchronization and accelerates your time-to-market for AI application development.
92+
Vectors in Azure SQL Database can be efficiently stored and queried, as described in the next sections, allowing exact nearest neighbor search with great performance. You don't have to decide between accuracy and speed: you can have both. Storing vector embeddings alongside the data in an integrated solution minimizes the need to manage data synchronization and accelerates your time-to-market for AI application development.
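To make "exact nearest neighbor search" concrete, the following is a minimal sketch in plain Python, run outside the database. It scans every stored vector, computes its cosine distance to the query, and keeps the `k` closest rows. The function names (`cosine_distance`, `exact_nearest`) and the toy two-dimensional rows are hypothetical and chosen only for illustration; this isn't how the database engine implements the search.

```python
import heapq
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length numeric sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_nearest(query, rows, k):
    """Exhaustive (exact) search: score every row, return the k closest.

    rows is an iterable of (row_id, vector) pairs.
    The result is a list of (distance, row_id) pairs, closest first.
    """
    scored = ((cosine_distance(query, vec), row_id) for row_id, vec in rows)
    return heapq.nsmallest(k, scored)


# Hypothetical toy data: (row_id, embedding)
rows = [
    (1, [1.0, 0.0]),
    (2, [0.7, 0.7]),
    (3, [0.0, 1.0]),
    (4, [-1.0, 0.0]),
]

top2 = exact_nearest([1.0, 0.1], rows, k=2)
print([row_id for _, row_id in top2])
```

Because every row is scored, the result is exact; the cost grows linearly with the number of rows, which is the trade-off that approximate nearest neighbor methods try to avoid.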
9393

9494
## Azure OpenAI
9595

@@ -123,7 +123,33 @@ For additional examples on using SQL Database and OpenAI, see the following arti
123123

124124
## Vectors
125125

126-
Although Azure SQL Database doesn't have a native **vector** type, a vector is nothing more than an ordered tuple, and relational databases are great at managing tuples. You can think of a tuple as the formal term for a row in a table.
126+
### Vector data type
127+
128+
In November 2024, the new **vector** data type was introduced in Azure SQL Database.
129+
130+
The dedicated **vector** type allows for efficient and optimized storing of vector data, and comes with a set of functions to help developers streamline vector and similarity search implementation. Calculating distance between two vectors can be done in one line of code using the new `VECTOR_DISTANCE` function. For more information on the [**vector** data type](/sql/t-sql/data-types/vector-data-type) and related functions, see [Overview of vectors in the SQL Database Engine](/sql/relational-databases/vectors/vectors-sql-server).
131+
132+
For example:
133+
134+
```sql
135+
CREATE TABLE [dbo].[wikipedia_articles_embeddings_titles_vector]
136+
(
137+
[article_id] [int] NOT NULL,
138+
[embedding] [vector](1536) NOT NULL,
139+
)
140+
GO
141+
142+
SELECT TOP(10)
143+
*
144+
FROM
145+
[dbo].[wikipedia_articles_embeddings_titles_vector]
146+
ORDER BY
147+
VECTOR_DISTANCE('cosine', @my_reference_vector, embedding)
148+
```
149+
150+
### Vectors in older versions of SQL Server
151+
152+
While older versions of the SQL Server engine, up to and including SQL Server 2022, don't have a native **vector** type, a vector is nothing more than an ordered tuple, and relational databases are great at managing tuples. You can think of a tuple as the formal term for a row in a table.
127153

128154
Azure SQL Database also supports columnstore indexes and [batch mode execution](/sql/relational-databases/query-processing-architecture-guide#batch-mode-execution). A vector-based approach is used for batch mode processing, which means that each column in a batch has its own memory location where it's stored as a vector. This allows for faster and more efficient processing of data in batches.
129155

@@ -169,16 +195,18 @@ For an end-to-end sample to build a AI-enabled application using sessions abstra
169195

170196
### LangChain integration
171197

172-
LangChain is a well-known framework for developing applications powered by language models.
198+
LangChain is a well-known framework for developing applications powered by language models. For examples that show how LangChain can be used to create a Chatbot on your own data, see:
173199

174-
For an example that shows how LangChain can be used to create a Chatbot on your own data, see [Building your own DB Copilot for Azure SQL with Azure OpenAI GPT-4](https://devblogs.microsoft.com/azure-sql/building-your-own-db-copilot-for-azure-sql-with-azure-openai-gpt-4/).
200+
- [Build a chatbot on your own data in 1 hour with Azure SQL, Langchain and Chainlit](https://devblogs.microsoft.com/azure-sql/build-a-chatbot-on-your-own-data-in-1-hour-with-azure-sql-langchain-and-chainlit/): Build a chatbot using the RAG pattern on your own data using Langchain for orchestrating LLM calls and Chainlit for the UI.
201+
- [Building your own DB Copilot for Azure SQL with Azure OpenAI GPT-4](https://devblogs.microsoft.com/azure-sql/building-your-own-db-copilot-for-azure-sql-with-azure-openai-gpt-4/): Build a copilot-like experience to query your databases using natural language.
175202

176203
### Semantic Kernel integration
177204

178205
[Semantic Kernel is an open-source SDK](/semantic-kernel/overview/) that lets you easily build agents that can call your existing code. As a highly extensible SDK, you can use Semantic Kernel with models from OpenAI, Azure OpenAI, Hugging Face, and more! By combining your existing C#, Python, and Java code with these models, you can build agents that answer questions and automate processes.
179206

180-
- [Semantic Kernel & Kernel Memory - SQL Connector](https://github.com/kbeaugrand/SemanticKernel.Connectors.Memory.SqlServer) - Provides a connection to a SQL database for the Semantic Kernel for the memories.
207+
- [The ultimate chatbot?](https://devblogs.microsoft.com/azure-sql/the-ultimate-chatbot/): Build a chatbot on your own data using both NL2SQL and RAG patterns for the ultimate user experience.
181208
- [OpenAI Embeddings Sample](https://github.com/marcominerva/OpenAIEmbeddingSample): An example that shows how to use Semantic Kernel and Kernel Memory to work with embeddings in a .NET application using SQL Server as Vector Database.
209+
- [Semantic Kernel & Kernel Memory - SQL Connector](https://github.com/kbeaugrand/SemanticKernel.Connectors.Memory.SqlServer) - Provides a connection to a SQL database for the Semantic Kernel for the memories.
182210

183211
## Microsoft Copilot skills in Azure SQL Database
184212

azure-sql/database/doc-changes-updates-release-notes-whats-new.md

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@ author: WilliamDAssafMSFT
66
ms.author: wiassaf
77
ms.reviewer: mathoma, randolphwest
88
ms.service: azure-sql-database
9-
ms.date: 10/16/2024
9+
ms.date: 10/22/2024
1010
ms.subservice: service-overview
1111
ms.topic: whats-new
1212
ms.custom:
@@ -59,6 +59,7 @@ The following table lists the features of Azure SQL Database that are currently
5959
| [Query editor in the Azure portal](query-editor.md) | The query editor in the portal allows you to run queries against your Azure SQL Database directly from the [Azure portal](https://portal.azure.com). |
6060
| [SQL Analytics](/azure/azure-monitor/insights/azure-sql) | Azure SQL Analytics is an advanced cloud monitoring solution for monitoring performance of all of your Azure SQL databases at scale and across multiple subscriptions in a single view. Azure SQL Analytics collects and visualizes key performance metrics with built-in intelligence for performance troubleshooting. |
6161
| [UNISTR (Transact-SQL)](/sql/t-sql/functions/unistr-transact-sql) | Azure SQL Database now supports the `UNISTR` T-SQL syntax for Unicode string literals. For more information, see [UNISTR (Transact-SQL)](/sql/t-sql/functions/unistr-transact-sql).|
62+
| [Vector data type (preview)](/sql/relational-databases/vectors/vectors-sql-server) | Working with vector data is now easier in Azure SQL Database with the introduction of a new [vector data type](/sql/t-sql/data-types/vector-data-type) and [functions](/sql/t-sql/functions/vector-functions-transact-sql). For more information, see [Intelligent applications with Azure SQL Database](ai-artificial-intelligence-intelligent-applications.md#vectors).|
6263
| [\|\|](/sql/t-sql/language-elements/string-concatenation-pipes-transact-sql) and [\|\|=](/sql/t-sql/language-elements/compound-assignment-pipes-transact-sql) syntax support | Azure SQL Database now supports [\|\| (String concatenation)](/sql/t-sql/language-elements/string-concatenation-pipes-transact-sql) and [\|\|= (Compound assignment)](/sql/t-sql/language-elements/compound-assignment-pipes-transact-sql) Transact-SQL syntax.|
6364

6465
## General availability (GA)
@@ -76,7 +77,7 @@ The following table lists features of Azure SQL Database that have been made gen
7677
| [Automatic backups on secondary replicas](automated-backups-overview.md#automatic-backups-on-secondary-replicas) | August 2024 | Mitigate the performance impact on your workload by taking automated backups from the non-readable secondary replica in the Business Critical service tier. |
7778
| [Database compatibility level 160 is now default](/sql/t-sql/statements/alter-database-transact-sql-compatibility-level?view=azuresqldb-current&preserve-view=true) | June 2024 | Database compatibility level 160 is now the default for new databases created in Azure SQL Database. For more information on this announcement, see [General availability: Database compatibility level 160 in Azure SQL Database](https://techcommunity.microsoft.com/t5/azure-sql-blog/general-availability-database-compatibility-level-160-in-azure/ba-p/4172039). |
7879
| [Hyperscale named replica zone redundant support](service-tier-hyperscale-replicas.md) | June 2024 | [Zone redundancy support for Hyperscale named replicas](https://aka.ms/ZRSupportForNRPreview) is now generally available. |
79-
| [License-free standby replica](standby-replica-how-to-configure.md) | May 2024 | Save on licensing costs by configuring your secondary database replica for disaster recovery standby. |
80+
| [License-free standby replica](standby-replica-how-to-configure.md) | May 2024 | Save on licensing costs by configuring your secondary database replica for disaster recovery standby. |
8081
| [Elastic jobs](elastic-jobs-overview.md) | April 2024 | [Elastic jobs, now generally available](https://techcommunity.microsoft.com/t5/azure-sql-blog/general-availability-elastic-jobs-in-azure-sql-database/ba-p/4087140), are the SQL Server Agent replacement for Azure SQL Database. Elastic jobs support Microsoft Entra ID authentication, private endpoints, management via REST APIs, Azure Alerts, and more new features since public preview began. |
8182
| [Maintenance window advance notifications](advance-notifications.md) | March 2024 | Advance notifications are now generally available for databases configured to use a nondefault [maintenance window](maintenance-window.md). |
8283
| [Azure SQL triggers for Azure Functions](/azure/azure-functions/functions-bindings-azure-sql-trigger) | March 2024 | Azure Functions supports function triggers for Azure SQL Database. |
@@ -93,6 +94,7 @@ Learn about significant changes to the Azure SQL Database documentation. For pre
9394

9495
| Changes | Details |
9596
| --- | --- |
97+
| **Vector data type (preview)** | Working with vector data is now easier in Azure SQL Database with the introduction of a new [vector data type](/sql/t-sql/data-types/vector-data-type) and [functions](/sql/t-sql/functions/vector-functions-transact-sql). For more information, see [Intelligent applications with Azure SQL Database](ai-artificial-intelligence-intelligent-applications.md#vectors).|
9698
| **Hyperscale single database increased maximum size** | The maximum single database size in Azure SQL Database Hyperscale has been increased from 100 TB to 128 TB. For more information, see [Blog: November 2024 Hyperscale enhancements](https://aka.ms/AAslnql).|
9799
| **Hyperscale increased log generation rate (preview)**| The transaction log generation rate in Azure SQL Database Hyperscale single databases is set to increase from 100 MB/s to 150 MB/s. The increased log generation rate is available as an opt-in preview feature. For more information and to opt-in to 150 MB/s, see [Blog: November 2024 Hyperscale enhancements](https://aka.ms/AAslnql).|
98100
| **Hyperscale continuous priming (preview)**| [Continuous priming](service-tier-hyperscale.md#buffer-pool-resilient-buffer-pool-extension-and-continuous-priming) is an innovative new feature designed to optimize Hyperscale performance during failovers by priming secondary compute replicas. Continuous priming is currently in a gated preview. For more information and to opt in to continuous priming, see [Blog: November 2024 Hyperscale enhancements](https://aka.ms/AAslnql).|
@@ -101,7 +103,7 @@ Learn about significant changes to the Azure SQL Database documentation. For pre
101103

102104
| Changes | Details |
103105
| --- | --- |
104-
| **Lower auto-pause delay for serverless** | Reduce costs by lowering the [auto-pause delay for serverless compute in Azure SQL Database](https://aka.ms/AAs7lpz). |
106+
| **Lower auto-pause delay for serverless** | Reduce costs by lowering the [auto-pause delay for serverless compute in Azure SQL Database](https://aka.ms/AAs7lpz). For more information, see [serverless compute tier](serverless-tier-overview.md).|
105107

106108
### September 2024
107109

Lines changed: 81 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,81 @@
1+
---
2+
title: Work with vectors in the SQL Database Engine
3+
description: How to create, manage, and search vectors in the SQL Database Engine.
4+
author: WilliamDAssafMSFT
5+
ms.author: wiassaf
6+
ms.reviewer: damauri, pookam, jovanpop, randolphwest
7+
ms.date: 10/22/2024
8+
ms.service: sql
9+
ms.topic: conceptual
10+
ms.custom:
11+
- intro-quickstart
12+
helpviewer_keywords:
13+
- "Vectors"
14+
- "Vectors, built-in support"
15+
monikerRange: "=azuresqldb-current"
16+
---
17+
# Overview of vectors in the SQL Database Engine
18+
19+
[!INCLUDE [Azure SQL Database](../../includes/applies-to-version/asdb.md)]
20+
21+
Vectors are ordered arrays of numbers (typically floats) that can represent information about some data. For example, an image can be represented as a vector of pixel values, or a string of text can be represented as a vector of ASCII values. The process to turn data into a vector is called vectorization.
22+
23+
## Embeddings
24+
25+
Embeddings are vectors that represent important features of data. Embeddings are often learned by using a deep learning model, and machine learning and AI models utilize them as features. Embeddings can also capture semantic similarity between similar concepts. For example, in generating an embedding for the words `person` and `human`, we would expect their embeddings (vector representation) to be similar in value since the words are also semantically similar.
26+
27+
Azure OpenAI features models to create embeddings from text data. The service breaks text out into tokens and generates embeddings using models pretrained by OpenAI. To learn more, see [Creating embeddings with Azure OpenAI](/azure/ai-services/openai/concepts/understand-embeddings).
28+
29+
Once embeddings are generated, they can be stored into a SQL Server database. This allows you to store the embeddings alongside the data they represent, and to perform vector search queries to find similar data points.
30+
31+
## Vector search
32+
33+
Vector search refers to the process of finding all vectors in a dataset that are similar to a specific query vector. Therefore, a query vector for the word `human` searches the entire dataset for similar vectors, and thus similar words: in this example it should find the word `person` as a close match. This closeness, or distance, is measured using a distance metric such as cosine distance. The closer vectors are, the more similar they are.
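As an illustration of the distance metric mentioned above, here is a minimal sketch of cosine distance in plain Python, run outside the database. The three-dimensional "embeddings" for `human`, `person`, and `banana` are made-up toy values, not output from any real model, and `cosine_distance` is a hypothetical helper name.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical toy embeddings (real embeddings have hundreds or thousands of dimensions)
human = [0.9, 0.1, 0.3]
person = [0.8, 0.2, 0.3]
banana = [0.1, 0.9, 0.0]

print(cosine_distance(human, person))  # small: the vectors point in similar directions
print(cosine_distance(human, banana))  # larger: the vectors point in different directions
```

A distance near 0 means the vectors point in almost the same direction, while orthogonal vectors have a distance of 1.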
34+
35+
SQL Server provides built-in support for vectors via the **vector** data type. Vectors are stored in an optimized binary format but exposed as JSON arrays for convenience. Each element of the vector is stored as a single-precision (4-byte) floating-point value. Along with the data type, there are dedicated functions to operate on vectors. For example, it's possible to find the distance between two vectors using the `VECTOR_DISTANCE` function. The function returns a scalar value with the distance between two vectors based on the distance metric you specify.
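To show what single-precision storage implies for a value that arrives as a JSON array, the following Python sketch packs the elements as 4-byte floats and reads them back. It only illustrates element size and precision: the little-endian layout and the helper names (`to_float32_bytes`, `from_float32_bytes`) are assumptions made for this example, and the engine's actual binary format isn't described here.

```python
import json
import struct


def to_float32_bytes(json_array):
    """Parse a JSON array of numbers and pack each element as a 4-byte float."""
    values = json.loads(json_array)
    # "<" = little-endian (an assumption for this sketch), "f" = 32-bit float
    return struct.pack(f"<{len(values)}f", *values)


def from_float32_bytes(payload):
    """Unpack a buffer of 4-byte floats back into a list of Python floats."""
    count = len(payload) // 4
    return list(struct.unpack(f"<{count}f", payload))


payload = to_float32_bytes("[1.0, -0.2, 30]")
print(len(payload))                 # 3 elements x 4 bytes each
print(from_float32_bytes(payload))  # -0.2 comes back slightly changed
```

Values such as `1.0` and `30` are exactly representable in single precision, while `-0.2` is rounded to the nearest representable value, so small differences after a round trip are expected.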
36+
37+
Since vectors are typically managed as arrays of floats, you can create a vector by simply casting a JSON array to the **vector** data type. For example, the following code creates a vector from a JSON array:
38+
39+
```sql
40+
SELECT CAST('[1.0, -0.2, 30]' AS VECTOR(3)) AS vector;
41+
```
42+
43+
Or, using implicit casting:
44+
45+
```sql
46+
DECLARE @v VECTOR(3) = '[1.0, -0.2, 30]';
47+
SELECT @v;
48+
```
49+
50+
The same goes for converting a vector into a JSON array:
51+
52+
```sql
53+
DECLARE @v VECTOR(3) = '[1.0, -0.2, 30]';
54+
SELECT CAST(@v AS NVARCHAR(MAX)) AS vector;
55+
```
56+
57+
## Limitations
58+
59+
In the current preview, casting to and from the JSON data type isn't supported yet. The workaround is to first convert from/to **NVARCHAR(MAX)** and then to/from JSON. For example, to convert a vector to a JSON type:
60+
61+
```sql
62+
DECLARE @v VECTOR(3) = '[1.0, -0.2, 30]';
63+
SELECT CAST(CAST(@v AS NVARCHAR(MAX)) AS JSON) AS j;
64+
```
65+
66+
And to convert from a JSON type to a vector:
67+
68+
```sql
69+
DECLARE @j JSON = JSON_ARRAY(1.0, -0.2, 30);
70+
SELECT CAST(CAST(@j AS NVARCHAR(MAX)) AS VECTOR(3)) AS v;
71+
```
72+
73+
More details on how to use vectors in SQL Server can be found in the following articles:
74+
75+
- [Vector Data Types](../../t-sql/data-types/vector-data-type.md)
76+
- [Vector Functions](../../t-sql/functions/vector-functions-transact-sql.md)
77+
- [Azure SQL DB Vector Search Samples](https://github.com/Azure-Samples/azure-sql-db-vector-search)
78+
79+
## Related content
80+
81+
- [Intelligent applications with Azure SQL Database](/azure/azure-sql/database/ai-artificial-intelligence-intelligent-applications)

docs/t-sql/data-types/data-types-transact-sql.md

Lines changed: 8 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -110,16 +110,17 @@ In [!INCLUDE [ssNoVersion](../../includes/ssnoversion-md.md)], based on their st
110110

111111
### Other data types
112112

113-
- [cursor](cursor-transact-sql.md)
113+
- [cursor](cursor-transact-sql.md)
114114
- [geography](../spatial-geography/spatial-types-geography.md) <sup>1</sup>
115-
- [geometry](../spatial-geometry/spatial-types-geometry-transact-sql.md) <sup>1</sup>
115+
- [geometry](../spatial-geometry/spatial-types-geometry-transact-sql.md) <sup>1</sup>
116116
- [hierarchyid](hierarchyid-data-type-method-reference.md)
117117
- [json](json-data-type.md)
118-
- [rowversion](rowversion-transact-sql.md)
119-
- [sql_variant](sql-variant-transact-sql.md)
120-
- [table](table-transact-sql.md)
121-
- [uniqueidentifier](uniqueidentifier-transact-sql.md)
122-
- [xml](../xml/xml-transact-sql.md)
118+
- [vector](vector-data-type.md)
119+
- [rowversion](rowversion-transact-sql.md)
120+
- [sql_variant](sql-variant-transact-sql.md)
121+
- [table](table-transact-sql.md)
122+
- [uniqueidentifier](uniqueidentifier-transact-sql.md)
123+
- [xml](../xml/xml-transact-sql.md)
123124

124125
<sup>1</sup> The **geography** and **geometry** data types are *spatial types*.
125126
