Batch completion with multiple receivers on Azure Service Bus

In the last post, we created multiple receivers with multiple factories to speed up our message processing. Unfortunately, this has some consequences for our background completion loop. Since the message processing logic is shared between multiple receivers, all the receivers will try to push lock tokens into the concurrent stack. So the following code can be executed by up to numberOfReceivers * concurrencyPerReceiver operations concurrently.

// same as before
await DoSomethingWithTheMessageAsync().ConfigureAwait(false);
lockTokensToComplete.Push(message.LockToken);
// same as before

So, for example, if we used 10 receivers, each with a concurrency setting of 32, we’d end up pushing lock tokens to the concurrent stack from up to 320 simultaneous operations. Not a big deal, you could say, since the ConcurrentStack implementation is lock-free. Unfortunately, lock-free doesn’t necessarily mean contention-free.

ConcurrentQueue<T> and ConcurrentStack<T> are completely lock-free in this way. They will never take a lock, but they may end up spinning and retrying an operation when faced with contention (when the CAS operations fail). Read more

The concurrent stack implementation internally uses a technique called spinning and retrying (for more information see the excellent Blocking vs. Spinning section from Joe Albahari). Under contention from a large number of concurrent operations, the underlying compare-and-swap (CAS) attempts can fail and be retried multiple times, so pushing a lock token might become less efficient than we assumed it to be.
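
To make the spin-and-retry idea a bit more tangible, here is a minimal sketch of a lock-free push built on a compare-and-swap. This is not the actual ConcurrentStack<T> source. TinyLockFreeStack, its Retries counter, CountItems and ContentionDemo are names I made up purely for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical, simplified lock-free stack. NOT the real ConcurrentStack<T> implementation.
public sealed class TinyLockFreeStack<T>
{
    private sealed class Node
    {
        public readonly T Value;
        public Node Next;
        public Node(T value) { Value = value; }
    }

    private Node head;
    private long retries;

    // Number of failed CAS attempts observed so far (illustration only).
    public long Retries => Interlocked.Read(ref retries);

    public void Push(T value)
    {
        var node = new Node(value);
        var spinner = new SpinWait();
        while (true)
        {
            var snapshot = Volatile.Read(ref head);
            node.Next = snapshot;
            // CAS: only swap in our node if nobody changed head in the meantime.
            if (Interlocked.CompareExchange(ref head, node, snapshot) == snapshot)
            {
                return;
            }
            // Somebody else won the race: count the failure, spin briefly, retry.
            Interlocked.Increment(ref retries);
            spinner.SpinOnce();
        }
    }

    // Walks the list; only meaningful once all pushes have finished.
    public int CountItems()
    {
        var count = 0;
        for (var node = Volatile.Read(ref head); node != null; node = node.Next)
        {
            count++;
        }
        return count;
    }
}

public static class ContentionDemo
{
    // Pushes from many parallel operations onto one shared stack.
    public static TinyLockFreeStack<Guid> PushConcurrently(int operations, int pushesPerOperation)
    {
        var stack = new TinyLockFreeStack<Guid>();
        Parallel.For(0, operations, _ =>
        {
            for (var i = 0; i < pushesPerOperation; i++)
            {
                stack.Push(Guid.NewGuid());
            }
        });
        return stack;
    }
}
```

Calling ContentionDemo.PushConcurrently(320, 100) roughly mirrors the 10 receivers with a concurrency of 32 from above. No lock is ever taken, yet Retries will usually be greater than zero on a multi-core machine. The exact number depends on the hardware and the scheduler, so treat it as an indicator and not as a benchmark.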

In the previous post I made the following statement:

When we’d received several hundred messages per seconds our randomly chosen “complete every one-hundredth messages” and then “sleep for five seconds” might turn out to be a suboptimal choice.

With multiple receivers pushing lock tokens into the concurrent stack, we might have made our problem even worse. It is entirely possible that our multiple concurrent receivers fill the concurrent stack with lock tokens faster than our completion loop manages to complete them. Our previously chosen five-second sleep duration might turn out to be way too long under heavy load.
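
A quick back-of-the-envelope calculation shows how fast this gets out of hand. A single loop that completes at most 100 lock tokens and then sleeps for five seconds can drain no more than 20 lock tokens per second. The sketch below puts that into code. The CompletionBacklog helper is hypothetical, and the 300 messages per second is an assumed load standing in for "several hundred", not a measurement.

```csharp
using System;

public static class CompletionBacklog
{
    // Upper bound on completions per second for one loop that completes at most
    // batchSize tokens and then sleeps for delay. It ignores the time spent inside
    // CompleteBatchAsync, so the real rate is even lower.
    public static double MaxDrainPerSecond(int batchSize, TimeSpan delay)
        => batchSize / delay.TotalSeconds;

    // Lock tokens left on the stack after elapsed, assuming a constant incoming rate.
    public static double BacklogAfter(
        double incomingPerSecond, int batchSize, TimeSpan delay, TimeSpan elapsed)
    {
        var growthPerSecond = incomingPerSecond - MaxDrainPerSecond(batchSize, delay);
        return Math.Max(0, growthPerSecond) * elapsed.TotalSeconds;
    }
}
```

With the assumed 300 messages per second, BacklogAfter(300, 100, TimeSpan.FromSeconds(5), TimeSpan.FromMinutes(1)) comes out at 16,800 lock tokens still waiting on the stack after just one minute.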

How about we just spin up multiple batch completion tasks like the following?

// one background completion loop per receiver
var completionTasks = new Task[numberOfReceivers];

for(int i = 0; i < numberOfReceivers; i++) {
   completionTasks[i] = Task.Run(() => BatchCompletionLoop());
}

static async Task BatchCompletionLoop() {
   while(!token.IsCancellationRequested) {
      // pop up to 100 lock tokens from the shared stack
      var lockTokens = new Guid[100];
      int numberOfItems = lockTokensToComplete.TryPopRange(lockTokens);
      if(numberOfItems > 0) {
         await receiveClient.CompleteBatchAsync(lockTokens).ConfigureAwait(false);
      }
      // every loop sleeps for the same five seconds
      await Task.Delay(TimeSpan.FromSeconds(5), token).ConfigureAwait(false);
   }
}

Now we’ve just made the contention problem on the concurrent stack even worse. Multiple background completion operations are competing on the concurrent stack as well. Because all those completion tasks have the same Task.Delay value, there is a high chance we would see a pattern where multiple background completion operations are dispatched onto worker threads at roughly the same time, but only a few of them actually pop lock tokens. The rest find the stack empty and go idle again, effectively wasting a lot of resources in production.
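
Here is a small sketch of that wake-up pattern. It simulates a single round in which every completion loop wakes up and tries to pop a batch. For simplicity the loops run one after another here instead of truly in parallel, and WakeUpDemo is a hypothetical helper that only exists for this illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

public static class WakeUpDemo
{
    // Simulates one wake-up round: every loop tries to pop up to batchSize lock tokens.
    // Returns how many loops actually got something to complete.
    public static int LoopsWithWork(int pendingTokens, int numberOfLoops, int batchSize)
    {
        var stack = new ConcurrentStack<Guid>(
            Enumerable.Range(0, pendingTokens).Select(_ => Guid.NewGuid()));

        var loopsWithWork = 0;
        for (var i = 0; i < numberOfLoops; i++)
        {
            var buffer = new Guid[batchSize];
            if (stack.TryPopRange(buffer) > 0)
            {
                loopsWithWork++;
            }
        }
        return loopsWithWork;
    }
}
```

With 250 pending lock tokens, 10 loops and a batch size of 100, LoopsWithWork(250, 10, 100) returns 3. The other seven loops were scheduled just to find an empty stack and go back to sleep.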

Leaving contention and resource waste aside, we’ve now introduced another major flaw in our completion logic. Can you spot it? If not, don’t worry. I’ll pick it up in the next post.