Facing Performance issue at application end, after migrating the backups to Azure recovery service vault

 ISSUE DESCRIPTION

Facing Performance issue at application end, after migrating the backups to Azure recovery service vault (RSV) from Azure Blob storage.

 SUMMARY

Once backup process started, we could see severe IO latency. The increased IO activity could be due to all backups starting at the same time at 12:30 and reading from the drive. 

Earlier, when SQL agent-based BACKUP TO URL was running, it would be working with one database backup at a time which might not put high stress on disk, but Azure RSV is designed to take backup of all databases at the same time (For quick backup, to save time) which can invite considerable IO load on the disks. We checked internally and there are no options to change this behavior for Azure RSV.

In Azure blob storage when the backups are running, backup operation is performed serial manner that means server will take the single database backup at a time.

But when it comes to Azure RSV, the database backup operation is performed parallel manner that means it will trigger all the database backups at a time.a

So, when the multiple backups are running parallelly on the server. There will be huge impact on the IO resources due to that application team will face the slowness issue while running the queries or application jobs.

ACTION Needs to be taken from DBA end:

Redirect your backups to secondary replica (even if DB is not set for READ load on secondary, backup will work) which would reduce the IO latency on PRIMARY. If AG is in SYNC COMMIT mode, these intensive backups happening on secondary can add a little overhead to the DML operations on PRIMARY.

https://learn.microsoft.com/en-us/sql/database-engine/availability-groups/windows/configure-backup-on-availability-replicas-sql-server?view=sql-server-ver16

You can check whether your VM is configured as per best practices (especially the Storage and VM sizing) as per the document below

https://learn.microsoft.com/en-us/azure/azure-sql/virtual-machines/windows/performance-guidelines-best-practices-checklist?view=azuresql

You can switch back to BACKUP TO URL method as you were doing earlier to mitigate the situation.

Check the AG configuration best practices.

As you suggested we are looking to check whether the PRIMARY and SECONDARY machines of the AG follow the best practices shown in the document below.


https://learn.microsoft.com/en-us/azure/azure-sql/virtual-machines/windows/performance-guidelines-best-practices-checklist?view=azuresql#vm-size                                                                                                        


You can also to enable resource governor on the database server so that we could assign limited resources throughout to the backup process.


You can refer to the RESOURCE GOVERNOR configuration document in MS site to configure the resource governor related components (Resource pool, Workload group and classifier function)

Link - https://learn.microsoft.com/en-us/sql/relational-databases/resource-governor/resource-governor?view=sql-server-ver16

Link - https://learn.microsoft.com/en-us/sql/relational-databases/resource-governor/resource-governor-resource-pool?view=sql-server-ver16#min_iops_per_volume-and-max_iops_per_volume

 In this RESOURCE POOL configuration, you can set the MIN_IOPS_PER_VOLUME & MAX_IOPS_PER_VOLUME setting to restrict the IO. 

**************

CREATE RESOURCE POOL BACKUPPOOL1

WITH (MIN_IOPS_PER_VOLUME = 100, MAX_IOPS_PER_VOLUME = 250);  

GO

ALTER RESOURCE GOVERNOR RECONFIGURE;

GO

************** 

To know the right value for it, during off business hours:

Set it to a value say Min100-Max250, start the backup and check if it shows high amount of PAGEIOLATCH in the waits. 

If yes, reconfigure RESOURCE POOL to set only the MAX value… Mention the MIN value as 80 and MAX value as 100 and retry. Repeat the same operation until you do not see much PAGEIOLATCH waits.

If No, then you can use this setting for backups and increase the MAX little by little to reduce the time of backups until you start witnessing PAGEIOLATCH waits.

 


Comments

Popular posts from this blog

SQL Server DBA scripts for day-to-day tasks

SQL Server Dynamic management views (DMV's)

A brief comparison of Microsoft SQL Server versions from SQL Server 2012 to SQL Server 2022