
Commit 45d352a

Merge pull request #35093 from MashaMSFT/fixes
Clarifying joining AG & SQL MI failover group updates
2 parents 19828d1 + f123269 commit 45d352a

5 files changed

Lines changed: 61 additions & 59 deletions


azure-sql/includes/failover-group-overview.md

Lines changed: 6 additions & 6 deletions
Original file line number | Diff line number | Diff line change
@@ -25,7 +25,7 @@ To achieve full business continuity, adding regional database redundancy is only
2525
Failover groups support two failover policies:
2626

2727
- **Customer managed (recommended)** - Customers can perform a failover of a group when they notice an unexpected outage impacting one or more databases in the failover group. When using command line tools such as PowerShell, the Azure CLI, or the Rest API, the failover policy value for customer managed is `manual`.
28-
- **Microsoft managed** - In the event of a widespread outage that impacts a primary region, Microsoft initiates failover of all impacted failover groups that have their failover policy configured to be Microsoft-managed. Microsoft managed failover won't be initiated for individual failover groups or a subset of failover groups in a region. When using command line tools such as PowerShell, the Azure CLI, or the Rest API, the failover policy value for Microsoft-managed is `automatic`.
28+
- **Microsoft managed** - In the event of a widespread outage that impacts a primary region, Microsoft initiates failover of *all* impacted failover groups that have their failover policy configured to be Microsoft-managed. Microsoft managed failover won't be initiated for individual failover groups or a subset of failover groups in a region. When using command line tools such as PowerShell, the Azure CLI, or the Rest API, the failover policy value for Microsoft-managed is `automatic`.
2929

3030
Each failover policy has a unique set of use cases and corresponding expectations on the failover scope and data loss, as the following table summarizes:
3131

@@ -36,7 +36,7 @@ Each failover policy has a unique set of use cases and corresponding expectation
3636

3737
### Customer managed
3838

39-
On rare occasions, the built-in [availability or high availability](../database/high-availability-sla-local-zone-redundancy.md) isn't enough to mitigate an outage, and your databases in a failover group might be unavailable for a duration that isn't acceptable to the service level agreement (SLA) of the applications using the databases. Databases can be unavailable due to a localized issue impacting just a few databases, or it could be at the datacenter, availability zone, or region level. In any of these cases, to restore business continuity, you can initiate a forced failover.
39+
On rare occasions, the built-in [availability or high availability](../database/high-availability-sla-local-zone-redundancy.md) isn't enough to mitigate an outage, and your databases in a failover group might be unavailable for a duration that isn't acceptable to the service level agreement (SLA) of the applications that use the databases. Databases can be unavailable due to a localized issue impacting just a few databases, or it could be at the datacenter, availability zone, or region level. In any of these cases, to restore business continuity, you can initiate a forced failover.
4040

4141
_Setting your failover policy to customer managed is highly recommended_, as it keeps you in control of when to initiate a failover and restore business continuity. You can initiate a failover when you notice an unexpected outage impacting one or more databases in the failover group.
4242

@@ -50,17 +50,17 @@ With a Microsoft managed failover policy, disaster recovery responsibility is de
5050
When these conditions are met, the Azure SQL service initiates forced failovers for all failover groups in the region that have the failover policy set to Microsoft managed.
5151

5252
> [!IMPORTANT]
53-
> Use customer managed failover policy to test and implement your disaster recovery plan. **Do not** rely on Microsoft managed failover, which might only be executed by Microsoft in extreme circumstances.
54-
> A Microsoft managed failover would be initiated for all failover groups in the region that have failover policy set to Microsoft managed. It can't be initiated for individual failover group. If you need the ability to selectively failover your failover group, use customer managed failover policy.
53+
> Use the customer managed failover policy to test and implement your disaster recovery plan. **Do not** rely on Microsoft managed failover, which might only be executed by Microsoft in extreme circumstances.
54+
> A Microsoft managed failover is initiated for all failover groups in the region that have their failover policy set to Microsoft managed. It can't be initiated for individual failover groups. If you need to selectively fail over your failover group, use customer managed failover policy.
5555
5656
Set the failover policy to Microsoft managed only when:
5757

5858
- You want to delegate disaster recovery responsibility to the Azure SQL service.
5959
- The application is tolerant to your database being unavailable for one hour or more.
60-
- It's acceptable to trigger forced failovers some time after the grace period expires as the actual time for the forced failover can vary significantly.
60+
- It's acceptable to trigger forced failovers some time after the grace period expires, as the actual time for the forced failover can vary significantly.
6161
- It's acceptable that all databases within the failover group fail over, regardless of their zone redundancy configuration or availability status. Although databases configured for zone redundancy are resilient to zonal failures and might not be impacted by an outage, they'll still be failed over if they're part of a failover group with a Microsoft managed failover policy.
6262
- It's acceptable to have forced failovers of databases in the failover group without taking into consideration the application's dependency on other Azure services or components used by the application, which can cause performance degradation or unavailability of the application.
6363
- It's acceptable to incur an unknown amount of data loss, as the exact time of forced failover can't be controlled, and ignores the synchronization status of the secondary databases.
64-
- All the primary and secondary database(s) in the failover group and any geo replication relationships have the same service tier, compute tier (provisioned or serverless) & compute size (DTUs or vCores). If the service level objective (SLO) of all the databases don't match, then the failover policy will be eventually updated from Microsoft Managed to Customer Managed by Azure SQL service.
64+
- The primary and secondary replicas in the failover group have the same service tier, compute tier, and compute size.
6565

6666
When a failover is triggered by Microsoft, an entry for the operation name **Failover Azure SQL failover group** is added to the [Azure Monitor activity log](/azure/azure-monitor/essentials/activity-log). The entry includes the name of the failover group under **Resource**, and **Event initiated by** displays a single hyphen (-) to indicate the failover was initiated by Microsoft. This information can also be found on the **Activity log** page of the new primary server or instance in the Azure portal.
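As a rough illustration of two details in this file, the sketch below maps the failover policy names to the values that the text says command-line tools and the REST API use (`manual` and `automatic`), and checks whether an activity log entry matches the description of a Microsoft-initiated failover. The dictionary keys `operation_name` and `initiated_by` and both function names are hypothetical, chosen for this example only. They are not the Azure Monitor schema or any Azure SDK API.

```python
from typing import Mapping

# Policy names as shown in the documentation, mapped to the values the text
# says PowerShell, the Azure CLI, and the REST API use.
POLICY_API_VALUES = {
    "customer managed": "manual",
    "microsoft managed": "automatic",
}

# Operation name the text says is logged when Microsoft triggers a failover.
FAILOVER_OPERATION = "Failover Azure SQL failover group"


def policy_api_value(display_name: str) -> str:
    """Return the tool/API value for a documented policy name."""
    # Normalize case and hyphenation ("Microsoft-managed" vs "Microsoft managed").
    key = display_name.strip().lower().replace("-", " ")
    try:
        return POLICY_API_VALUES[key]
    except KeyError:
        raise ValueError(f"unknown failover policy: {display_name!r}") from None


def is_microsoft_initiated(entry: Mapping[str, str]) -> bool:
    """Check a simplified activity log entry (hypothetical keys).

    Per the text, a Microsoft-initiated failover shows the failover operation
    name and a single hyphen under "Event initiated by".
    """
    return (
        entry.get("operation_name") == FAILOVER_OPERATION
        and entry.get("initiated_by", "").strip() == "-"
    )
```

For example, `policy_api_value("Microsoft-managed")` yields `automatic`, and an entry whose `initiated_by` value is a user name rather than a hyphen is not treated as Microsoft-initiated.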

azure-sql/managed-instance/doc-changes-updates-known-issues.md

Lines changed: 14 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -5,7 +5,7 @@ description: Learn about the currently known issues with Azure SQL Managed Insta
55
author: MashaMSFT
66
ms.author: mathoma
77
ms.reviewer: randolphwest, mathoma
8-
ms.date: 06/09/2025
8+
ms.date: 08/25/2025
99
ms.service: azure-sql-managed-instance
1010
ms.subservice: service-overview
1111
ms.topic: troubleshooting-known-issue
@@ -28,7 +28,7 @@ This article lists the currently known issues with [Azure SQL Managed Instance](
2828
| [Error 8992 when running DBCC CHECKDB on a SQL Server database that originated from SQL Managed Instance](#error-8992-when-running-dbcc-checkdb-on-a-sql-server-database-that-originated-from-sql-managed-instance) | March 2025 | Has workaround | |
2929
| [Differential backups aren't taken when an instance is linked to SQL Server](#differential-backups-arent-taken-when-an-instance-is-linked-to-sql-server) | Sept 2024 | By design | |
3030
| [List of long-term backups in Azure portal shows backup files for active and deleted databases with the same name](#list-of-long-term-backups-in-azure-portal-shows-backup-files-for-active-and-deleted-databases-with-the-same-name) | Mar 2024 | Has Workaround | |
31-
| [Temporary instance inaccessibility using the failover group listener during scaling operation](#temporary-instance-inaccessibility-using-the-failover-group-listener-during-scaling-operation) | Jan 2024 | No resolution | |
31+
| [Temporary instance inaccessibility using the failover group listener during scaling operation](#temporary-instance-inaccessibility-using-the-failover-group-listener-during-scaling-operation) | Jan 2024 | Resolved | April 2025 |
3232
| [The event_file target of the system_health event session is not accessible](#the-event_file-target-of-the-system_health-event-session-is-not-accessible) | Dec 2023 | Resolved | May 2025 |
3333
| [Procedure sp_send_dbmail might fail when @query parameter is used on Nov22FW enabled managed instances](#procedure-sp_send_dbmail-may-fail-when-query-parameter-is-used-on-nov22fw-enabled-managed-instances) | Dec 2023 | Has Workaround | |
3434
| [Increased number of system logins used for transactional replication](#increased-number-of-system-logins-used-for-transactional-replication) | Dec 2022 | No resolution | |
@@ -213,7 +213,11 @@ A DNS record of `<name>.database.windows.com` is created when you create a [logi
213213

214214
In some circumstances, there might be an issue with the Service Principal used to access Microsoft Entra ID ([formerly Azure Active Directory](/entra/fundamentals/new-name)) and Azure Key Vault (AKV) services. As a result, this issue impacts usage of Microsoft Entra authentication and transparent data encryption (TDE) with SQL Managed Instance. This might be experienced as an intermittent connectivity issue, or not being able to run statements such as `CREATE LOGIN/USER FROM EXTERNAL PROVIDER` or `EXECUTE AS LOGIN/USER`. Setting up TDE with customer-managed key on a new Azure SQL Managed Instance might also not work in some circumstances.
215215

216-
**Workaround**: To prevent this issue from occurring on your SQL Managed Instance, before executing any update commands, or in case you have already experienced this issue after update commands, go to the **Overview page** of your SQL managed instance in the Azure portal. Under **Settings**, select **Microsoft Entra ID** to access the SQL Managed Instance [Microsoft Entra ID admin page](../database/authentication-aad-configure.md#azure-sql-managed-instance). Verify if you can see the error message "Managed Instance needs a Service Principal to access Microsoft Entra ID. Click here to create a Service Principal". In case you've encountered this error message, select it, and follow the step-by-step instructions provided until this error has been resolved.
216+
**Workaround**: To prevent this issue from occurring on your SQL Managed Instance, before executing any update commands, or in case you have already experienced this issue after update commands, go to the **Overview page** of your SQL managed instance in the Azure portal. Under **Settings**, select **Microsoft Entra ID** to access the SQL Managed Instance [Microsoft Entra ID admin page](../database/authentication-aad-configure.md#azure-sql-managed-instance). Verify if you can see the error message:
217+
218+
`Managed Instance needs a Service Principal to access Microsoft Entra ID. Click here to create a Service Principal`.
219+
220+
In case you've encountered this error message, select it, and follow the step-by-step instructions provided until this error has been resolved.
217221

218222
### SQL Agent roles need explicit EXECUTE permissions for non-sysadmin logins
219223

@@ -362,8 +366,13 @@ The `tempdb` database is always split into 12 data files, and the file structure
362366

363367
Error logs that are available in SQL Managed Instance aren't persisted, and their size isn't included in the maximum storage limit. Error logs might be automatically erased if failover occurs. There might be gaps in the error log history because SQL Managed Instance was moved several times on several virtual machines.
364368

369+
370+
## Resolved
371+
365372
### Temporary instance inaccessibility using the failover group listener during scaling operation
366373

374+
**(Resolved in April 2025)**
375+
367376
Scaling a managed instance sometimes requires moving the instance to a different virtual cluster, along with the associated service-maintained DNS records. If the managed instance participates in a failover group, the DNS record corresponding to its associated failover group listener (the read-write listener if the instance is the current geo-primary, or the read-only listener if the instance is the current geo-secondary) is moved to the new virtual cluster.
368377

369378
In the current scaling operation design, the listener DNS records are removed from the originating virtual cluster before the managed instance itself is fully migrated to the new virtual cluster, which in some situations can lead to prolonged time during which the instance's IP address can't be resolved using the listener. During this time, a SQL client attempting to access the instance being scaled using the listener endpoint can expect login failures with the following error message:
@@ -374,7 +383,7 @@ Error 40532: Cannot open server "xxx.xxx.xxx.xxx" requested by the login. The lo
374383

375384
The issue will be addressed through scaling operation redesign.
376385

377-
## Resolved
386+
378387

379388
<a id="msdb-table-for-manual-backups-doesnt-preserve-the-username"></a>
380389

@@ -438,7 +447,7 @@ The `@query` parameter in the [sp_send_db_mail](/sql/relational-databases/system
438447

439448
The **Active Directory admin** page of Azure portal for Azure SQL Managed Instance might show the following error message, even though Service Principal already exists:
440449

441-
"Managed Instance needs a Service Principal to access Microsoft Entra ID ([formerly Azure Active Directory](/entra/fundamentals/new-name)). Click here to create a Service Principal"
450+
`Managed Instance needs a Service Principal to access Microsoft Entra ID ([formerly Azure Active Directory](/entra/fundamentals/new-name)). Click here to create a Service Principal`
442451

443452
You can ignore this error message if the Service Principal for the managed instance already exists, and/or Microsoft Entra authentication on the managed instance works.
444453
