It is important for mission-critical applications to run continuously, even through unplanned outages and errors. Microsoft Azure backs Service Bus Queues and Topics with a 99.9% availability SLA for sending and receiving messages when they are appropriately configured.
Errors are bound to happen, but due to the design of Azure systems, issues tend to be short-lived. Nevertheless, many enterprises are still concerned and want to ensure that the Service Bus handling their business-critical data is always up and running. If you are among them, then this article is for you.
This article is intended to explain why the Service Bus may become unavailable due to a component failure, a server failure, or a faulty data centre network switch. It does not cover disasters like floods or earthquakes, where the user may lose the data permanently.
To prepare for failures, you must first understand what can cause the Azure Service Bus to be unavailable. Below are the most common reasons:
- The Queue may go into the Disabled/Send Disabled/Receive Disabled state
- The Queue may accidentally get removed from the Service Bus namespace itself
- The Subscription that contains the Queue might expire
- Throttling from an external system on which the Service Bus depends
- The message quota on the Queue may get exceeded
Now, let’s dive into the individual challenges and analyze the workarounds.
The Queue may be in the Disabled/Send Disabled/Receive Disabled State
Whenever there is a temporary unavailability or an outage, for example due to a server error, the entity generally becomes unavailable to applications in one of the following states:
- ‘Send Disabled’ – sending messages to the Queue is not possible
- ‘Disabled’ – the Queue will not be available for message send or receive operations
- ‘Receive Disabled’ – receiving messages from the Queue, other than peeking lock, is not possible
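To make the three states concrete, here is a minimal sketch of how a client might map a queue status to the operations it can still perform. The `QueueStatus` enum and the helper names are illustrative assumptions, not part of any SDK, and the mapping simply follows the descriptions in the list above.

```python
from enum import Enum


class QueueStatus(Enum):
    """Hypothetical status values mirroring the states listed above."""
    ACTIVE = "Active"
    DISABLED = "Disabled"
    SEND_DISABLED = "SendDisabled"
    RECEIVE_DISABLED = "ReceiveDisabled"


# Statuses in which each operation is expected to succeed.
_SEND_OK = {QueueStatus.ACTIVE, QueueStatus.RECEIVE_DISABLED}
_RECEIVE_OK = {QueueStatus.ACTIVE, QueueStatus.SEND_DISABLED}


def can_send(status: QueueStatus) -> bool:
    """Return True if senders should be able to push messages."""
    return status in _SEND_OK


def can_receive(status: QueueStatus) -> bool:
    """Return True if receivers should be able to pull messages."""
    return status in _RECEIVE_OK
```

A sender could call `can_send(status)` before attempting to push a message and fail fast with a clear error when the Queue is not accepting messages.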
The Queue may Accidentally get Removed from the Service Bus Namespace
This scenario can happen in enterprises where a team member accidentally removes the Queue, or even the Service Bus namespace itself. It could affect the business if the support or operations team does not notice it promptly. The queue status will be “Unknown”, and the Queue will not be available for any client application's operations.
The Subscription that Contains the Queue might Expire
The expiration might happen due to a delay in renewing the Subscription, or the Subscription might be disabled by accident while it is still active, similar to the scenario above. Either situation can affect an active Queue in that Subscription. Eventually, the Queue's status will be reported as ‘Unknown’.
If you are looking for a solution that addresses the challenges mentioned earlier under one roof, we have got your back.
Serverless360 can monitor the Azure Service Bus Queue state and notify you when the expected state is not met. Configure the threshold monitor to get notified of the above three scenarios.
The notification forwarded due to the unavailability of the Queue will look similar to the above picture.
Moreover, if the outage has a temporary cause, the threshold monitor in Serverless360 can auto-correct the Queue's state back to active. This auto-correct reduces the manual intervention needed from support staff and helps fix the issue a lot faster.
Furthermore, you can configure the number of retry attempts for auto-correction in case the issue persists for a longer period.
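The idea of auto-correcting with a limited number of retries can be sketched roughly as below. This is not how Serverless360 implements its feature; it is only a generic illustration. The `get_status` and `set_active` callables are hypothetical hooks you would wire to your own management client, and the attempt count and delay are arbitrary assumptions.

```python
import time
from typing import Callable


def auto_correct(
    get_status: Callable[[], str],
    set_active: Callable[[], None],
    max_attempts: int = 3,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to bring the queue back to 'Active'; return True on success."""
    for attempt in range(1, max_attempts + 1):
        if get_status() == "Active":
            return True
        try:
            set_active()  # hypothetical call that resets the queue state
        except Exception:
            # The correction itself can fail during an outage; retry later.
            pass
        if attempt < max_attempts:
            sleep(delay_seconds)
    # Final check after the last correction attempt.
    return get_status() == "Active"
```

If the function returns `False`, the issue has outlasted the retry budget and is a good candidate for escalating to a person.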
Throttling from an External System on which the Service Bus Depends
Microsoft's documentation states that several thresholds affect the maximum throughput you can achieve before running into throttling, such as the number of messages per transaction, the message size, and the size of the Queue or Topic. It is important to ensure your entity does not get throttled.
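On the client side, a common way to cope with throttling is to retry with exponential backoff. The sketch below is a generic illustration, assuming a hypothetical `ThrottledError` that stands in for whatever throttling exception your SDK raises. The delays and attempt count are arbitrary, and many SDKs already ship a built-in retry policy that you may prefer.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for an SDK's throttling / server-busy error."""


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, doubling the wait after each throttled attempt."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted; surface the error
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

Wrapping a send call as `call_with_backoff(lambda: send(message))`, where `send` is your own function, spaces retries out instead of hammering an already throttled entity.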
The Quota on the Queue Might get Exceeded
When the messages already in the Queue occupy its total size, the Queue cannot accept any more messages. Any further attempt to send a message to the Queue will result in a User error.
Bingo, you can fix even the last two challenges under the same roof: Serverless360. To provide an out-of-the-box solution, we have come up with another monitor called Data Monitor, which helps you keep an eye on the Throttled Requests and User Errors metrics, and on many more properties.
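A simple way to stay ahead of this is to compare the Queue's current size with its configured maximum and alert before it fills up. The sketch below assumes you already have the current size in bytes and the maximum size in megabytes from your management tooling. The function names and the 80% threshold are illustrative assumptions.

```python
BYTES_PER_MEGABYTE = 1024 * 1024


def quota_usage(size_in_bytes: int, max_size_in_megabytes: int) -> float:
    """Return the fraction of the queue's size quota currently in use."""
    if max_size_in_megabytes <= 0:
        raise ValueError("max_size_in_megabytes must be positive")
    return size_in_bytes / (max_size_in_megabytes * BYTES_PER_MEGABYTE)


def is_near_quota(
    size_in_bytes: int,
    max_size_in_megabytes: int,
    threshold: float = 0.8,  # assumed alerting threshold
) -> bool:
    """Return True once usage reaches the alerting threshold."""
    return quota_usage(size_in_bytes, max_size_in_megabytes) >= threshold
```

For example, a Queue holding 512 MB against a 1024 MB maximum is at a usage of 0.5, which is below the assumed threshold.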
Real-world use case
If you wonder why one should be concerned about Service Bus availability given the Microsoft SLA, this real-world use case might help you understand its significance.
Consider a company, Northwind, with a simple web application that pushes a message onto a Service Bus Queue when a form gets filled.
The form takes 5 minutes of a user's time to fill out, and the company wants to ensure that the Service Bus is available when the user pushes the Submit button. Because they value the user's time and do not want to lose the business-critical message, they want the check done before the user starts filling in the form.
If they get notified that the Service Bus Queue is unavailable, they can redirect the user to an error page, save the user's time, and ask them to fill in the form later.
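The pre-check described in this scenario could look roughly like the sketch below. The `get_queue_status` callable is a hypothetical hook that returns the Queue's status as a string, and the page names and the set of statuses that accept sends are assumptions made for illustration.

```python
from typing import Callable

# Statuses in which the queue is assumed to accept new messages.
SENDABLE_STATUSES = {"Active", "ReceiveDisabled"}


def page_for_request(get_queue_status: Callable[[], str]) -> str:
    """Decide which page to show before the user starts the form."""
    try:
        status = get_queue_status()
    except Exception:
        # If the status cannot be read at all, treat the queue as unavailable.
        return "error"
    return "form" if status in SENDABLE_STATUSES else "error"
```

The web application would call `page_for_request` when the user opens the form, so an unavailable Queue is caught before the user invests five minutes in it.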
This is where Serverless360 comes in: it notifies the stakeholders of the unavailability of the Azure Service Bus through its extensive monitors. It also tries to bring the Queue back to the active state via its unique “AutoCorrect” feature.
I hope you now understand the key things you need to keep track of to ensure your Azure Service Bus availability. You can also use third-party tooling like Serverless360 to ensure that your business-critical Azure Serverless service (Service Bus) is up and running.