Azure Service Bus: Earn the redesign

TL;DR: Micro-optimizations are not a substitute for design work. They are how you earn the right to redesign. In the Azure Service Bus SDK, repeated work in the Body property first led to smaller allocation fixes. Once those fixes exposed the shape of the problem, a small internal redesign made the code faster, clearer, and easier to reason about.

“This code is bad. We should rewrite it.”

Most developers have heard that sentence. Many have said it. I have too. The problem is not that rewrites are always wrong. The problem is that a rewrite without measurements is often just the same misunderstanding with newer syntax.

Performance work gives you a better path. Profile the code. Improve one thing. Benchmark it. Profile again. Repeat until the shape of the problem stops being mysterious. Sometimes that loop ends with a tiny change. Sometimes it teaches you enough to make a redesign safe.

The earlier Event Hubs posts in this series looked at that loop from the micro-optimization side: first removing temporary allocations from partition-key encoding, then tightening the Jenkins lookup3 hash loop. This story looks at the point where the same loop starts pointing past another small tweak and toward a better internal design.

The harmless property that was not harmless

The Azure Service Bus client exposes message payloads through a Body property. That is the kind of API developers expect to be cheap. Properties feel like fields. A caller might read message.Body once to deserialize JSON, again for logging, and once more while debugging. Nothing about the shape of the API suggests that each access might rebuild the body.

But under the covers, the client also has to deal with Advanced Message Queuing Protocol (AMQP) message bodies. Those bodies can be represented as one data section, multiple data sections, or lower-level structures exposed through raw AMQP APIs. Turning that into BinaryData, the Azure SDK type used to represent binary payloads, is not just returning a field. It can involve combining memory segments and copying bytes.
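To make that cost concrete, here is a minimal sketch of a getter that flattens its sections on every access. The SegmentedMessage type and its AddSection method are hypothetical, and byte[] stands in for BinaryData to keep the sample dependency-free. It is not the SDK's implementation, only an illustration of how a property can look like a field read while allocating and copying each time.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a message whose body is stored as separate sections.
// Not an SDK type; byte[] replaces BinaryData so the sketch needs no packages.
public sealed class SegmentedMessage
{
    private readonly List<ReadOnlyMemory<byte>> _sections = new();

    public void AddSection(ReadOnlyMemory<byte> section) => _sections.Add(section);

    // Reads like a field, but every access allocates a new array and copies.
    public byte[] Body
    {
        get
        {
            int length = 0;
            foreach (ReadOnlyMemory<byte> section in _sections)
            {
                length += section.Length;
            }

            byte[] flattened = new byte[length];
            int offset = 0;
            foreach (ReadOnlyMemory<byte> section in _sections)
            {
                section.Span.CopyTo(flattened.AsSpan(offset));
                offset += section.Length;
            }

            return flattened;
        }
    }
}
```

Reading Body twice on the same instance returns two different arrays with the same contents, which is exactly the repeated work a caller would not expect from a property.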

The first pull request in this sequence tried the direct fix: cache the computed BinaryData so the same body was not rebuilt over and over. The review discussion quickly found the catch. AMQP is the wire protocol underneath Service Bus, with its own message representation that advanced callers can reach into directly, and that raw message is mutable. If a caller gets the raw message and changes the body, a cached Body value can become stale.
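The staleness problem is easy to reproduce in miniature. The sketch below uses two hypothetical types, RawMessage and CachingMessage, that only mimic the relationship between a mutable raw message and a wrapper with a naive cache. It makes no claim about how the SDK's first pull request was written.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical raw message: advanced callers can replace its sections at any time.
public sealed class RawMessage
{
    public List<byte[]> Sections { get; } = new();
}

// Hypothetical wrapper with a naive cache that never notices mutation.
public sealed class CachingMessage
{
    private byte[]? _cachedBody;

    public RawMessage Raw { get; } = new();

    // Flattens once, then keeps returning the first result.
    public byte[] Body => _cachedBody ??= Flatten(Raw.Sections);

    private static byte[] Flatten(List<byte[]> sections)
    {
        var bytes = new List<byte>();
        foreach (byte[] section in sections)
        {
            bytes.AddRange(section);
        }

        return bytes.ToArray();
    }
}
```

If a caller reads Body, replaces Raw.Sections[0], and reads Body again, the second read still returns the bytes from before the mutation. The cache is fast and wrong, which is the correctness boundary the review pointed out.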

That is where the performance problem became more interesting. The problem was not just “there is an allocation.” The problem was that the design did not clearly express ownership, mutation, or when bytes had to be copied.

The first fixes taught us where the boundaries were

The next version introduced an internal body wrapper so the SDK could avoid recomputing the same data in the common case. For most users, the raw AMQP message is never touched. They create a ServiceBusMessage, set the body, send it, receive it, and read the Body property. Optimizing that path matters because it is the path most applications use.

At the same time, the SDK still had to preserve the advanced path. If someone reaches into the raw AMQP message and mutates the body, the code cannot pretend that nothing happened. That case may be uncommon, but it is still part of the contract.

The PR discussion separated the cases into two different lifetimes. On the send path, the body often starts as caller-owned ReadOnlyMemory<byte>. If it is a single segment, the SDK does not need to copy it immediately. On the receive path, the SDK receives buffers from the AMQP library, and those buffers need to be copied before the underlying message can be released.
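A minimal sketch of the two lifetimes, under the assumption that the transport may reuse its buffer once the message is released. BodyLifetimes and its methods are hypothetical names for illustration, and clearing the array only simulates that reuse.

```csharp
using System;

// Hypothetical helpers illustrating the two lifetimes; not SDK APIs.
public static class BodyLifetimes
{
    // Send path: the caller owns the memory, so wrapping it is enough.
    public static ReadOnlyMemory<byte> WrapForSend(ReadOnlyMemory<byte> callerOwned)
        => callerOwned;

    // Receive path: the transport owns the buffer, so take a private copy
    // before the buffer is handed back.
    public static ReadOnlyMemory<byte> CopyForReceive(ReadOnlySpan<byte> transportOwned)
        => transportOwned.ToArray();

    // Simulates the transport reusing its buffer after the message is released.
    public static (byte View, byte Copy) ReadAfterRelease()
    {
        byte[] transportBuffer = { 1, 2, 3 };

        ReadOnlyMemory<byte> view = transportBuffer;                 // shares the buffer
        ReadOnlyMemory<byte> copy = CopyForReceive(transportBuffer); // private copy

        // Stand-in for the transport reclaiming and overwriting the buffer.
        Array.Clear(transportBuffer, 0, transportBuffer.Length);

        return (view.Span[0], copy.Span[0]);
    }
}
```

ReadAfterRelease returns 0 for the shared view and 1 for the copy. The non-copying wrapper is only safe when the owner of the memory keeps it alive and unchanged, which holds for the send path and not for the receive path.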

That distinction was the design insight. The code did not need one generic “body memory” trick. It needed names for the different body behaviors.

The redesign made the intent visible

The follow-up PR split the internal body handling into distinct implementations. The names matter here. They tell future readers why each path exists.

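internal abstract class MessageBody : IEnumerable<ReadOnlyMemory<byte>>
{
    // Single segment on the send path: wrap the caller's memory without copying.
    public static MessageBody FromReadOnlyMemorySegment(ReadOnlyMemory<byte> segment)
    {
        return new NonCopyingSingleSegmentMessageBody(segment);
    }

    // Multiple segments: reuse an existing wrapper, otherwise copy only when converted.
    public static MessageBody FromReadOnlyMemorySegments(IEnumerable<ReadOnlyMemory<byte>> segments)
    {
        return segments is MessageBody messageBody
            ? messageBody
            : new CopyingOnConversionMessageBody(segments);
    }

    // Receive path: copy the AMQP data sections immediately.
    public static MessageBody FromDataSegments(IEnumerable<Data> segments)
    {
        return new EagerCopyingMessageBody(segments);
    }

    // Enumeration members and the concrete body types are omitted here.
}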

This sample is simplified from the production code, but it captures the important shift. The implementation no longer hides three behaviors behind one vague helper. It says what the code is doing:

  • NonCopyingSingleSegmentMessageBody wraps the common send path without copying.
  • CopyingOnConversionMessageBody delays copying until a flattened body is needed.
  • EagerCopyingMessageBody copies receive buffers immediately because the source lifetime does not belong to the SDK.

The segments is MessageBody check is a small fast path. If the data is already one of the SDK’s internal body wrappers, the factory returns it instead of wrapping it again. That keeps repeated conversions from adding another layer of indirection.
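The three behaviors can be sketched in a few dozen lines. Everything below is an assumption made for illustration: the SketchBody base class, the Flatten method, the shortened type names, and the choice to cache the flattened copy are mine, and plain ReadOnlyMemory<byte> segments stand in for AMQP data sections. The production types are more involved.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;

// Hypothetical base type; Flatten() stands in for the conversion to one contiguous body.
public abstract class SketchBody : IEnumerable<ReadOnlyMemory<byte>>
{
    public abstract ReadOnlyMemory<byte> Flatten();
    public abstract IEnumerator<ReadOnlyMemory<byte>> GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // Copies every segment into one new array.
    protected static byte[] CopyAll(IEnumerable<ReadOnlyMemory<byte>> segments)
    {
        using var stream = new MemoryStream();
        foreach (ReadOnlyMemory<byte> segment in segments)
        {
            stream.Write(segment.Span);
        }

        return stream.ToArray();
    }
}

// Wraps caller-owned memory; flattening returns the same memory, no copy.
public sealed class NonCopyingSingleSegmentBody : SketchBody
{
    private readonly ReadOnlyMemory<byte> _segment;

    public NonCopyingSingleSegmentBody(ReadOnlyMemory<byte> segment) => _segment = segment;

    public override ReadOnlyMemory<byte> Flatten() => _segment;

    public override IEnumerator<ReadOnlyMemory<byte>> GetEnumerator()
    {
        yield return _segment;
    }
}

// Keeps segments separate and copies only when a flattened body is requested.
public sealed class CopyingOnConversionBody : SketchBody
{
    private readonly IEnumerable<ReadOnlyMemory<byte>> _segments;
    private byte[]? _flattened; // assumption: the sketch caches the first copy

    public CopyingOnConversionBody(IEnumerable<ReadOnlyMemory<byte>> segments) => _segments = segments;

    public override ReadOnlyMemory<byte> Flatten() => _flattened ??= CopyAll(_segments);

    public override IEnumerator<ReadOnlyMemory<byte>> GetEnumerator() => _segments.GetEnumerator();
}

// Copies in the constructor because the source buffers belong to someone else.
public sealed class EagerCopyingBody : SketchBody
{
    private readonly byte[] _copy;

    public EagerCopyingBody(IEnumerable<ReadOnlyMemory<byte>> transportSegments)
        => _copy = CopyAll(transportSegments);

    public override ReadOnlyMemory<byte> Flatten() => _copy;

    public override IEnumerator<ReadOnlyMemory<byte>> GetEnumerator()
    {
        yield return new ReadOnlyMemory<byte>(_copy);
    }
}
```

In this sketch, a change to the source array after construction is visible through NonCopyingSingleSegmentBody and invisible through EagerCopyingBody. That difference is the ownership rule each type name is advertising.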

That is not just faster code. It is more honest code. A reviewer can read the type name and understand the trade-off before opening the method body.

The awkward allocation was a design smell

The original symptom was allocation on property access. The deeper problem was that callers and internal code were crossing representation boundaries too often. A public Body property wants one continuous BinaryData value. The AMQP data body may be multiple sections. The send path and receive path have different ownership rules. The old implementation paid conversion costs because those differences were not modeled explicitly.

Once the code had separate internal body types, the rules moved into one place. A single segment could stay a single segment. Multiple segments could remain separate until flattened. Receive buffers could be copied at the moment where copying was required for safety.

The pull request review also improved the internal names. What started as BodyMemory became Body, and later MessageBody. That kind of rename looks small in a diff, but it matters. The name changed from describing a storage detail to describing the domain concept inside the SDK.

Performance work often starts with bytes and ends with language. Once you understand the code, you can name the concepts that were missing.

The next wall was inside the copy itself

A year later, another pass looked at the receive-side eager copy path again. The redesign had made the path clear enough to optimize further. This time the issue was not whether to copy. The receive path had to copy. The question was whether the copy code was doing extra work while building the destination buffer.

The later PR changed the copy helper to first walk the segments, calculate the total length, and create the destination buffer with enough capacity up front. That avoided repeated buffer growth while appending segment after segment.

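// First pass: remember each segment and total the bytes to copy.
int length = 0;
List<ReadOnlyMemory<byte>> segments = new();

foreach (Data segment in dataSegments)
{
    ReadOnlyMemory<byte> data = GetData(segment);
    length += data.Length;
    segments.Add(data);
}

// Pre-size the destination so it never has to grow during the copy.
ArrayBufferWriter<byte> writer = new(length);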

Again, the sample is simplified. The production code then makes a second pass and copies each segment into the pre-sized writer. The point is the pattern: when a copy is unavoidable, make it one intentional copy into a correctly sized destination. Avoid making the buffer discover its final size by growing repeatedly.
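For readers who want to try the pattern end to end, here is a self-contained sketch of both passes. SegmentCopier and CopyToSingleBuffer are hypothetical names, and plain ReadOnlyMemory<byte> segments replace the AMQP Data type. The guard for an empty body is my addition: the ArrayBufferWriter<T> constructor that takes an initial capacity rejects non-positive values, so the sketch returns early instead.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;

// Hypothetical helper showing the two-pass, pre-sized copy.
public static class SegmentCopier
{
    public static ReadOnlyMemory<byte> CopyToSingleBuffer(IEnumerable<ReadOnlyMemory<byte>> source)
    {
        int length = 0;
        var segments = new List<ReadOnlyMemory<byte>>();

        // Pass 1: remember each segment and add up the total size.
        foreach (ReadOnlyMemory<byte> segment in source)
        {
            length += segment.Length;
            segments.Add(segment);
        }

        // Assumption in this sketch: an empty body needs no buffer at all.
        if (length == 0)
        {
            return ReadOnlyMemory<byte>.Empty;
        }

        var writer = new ArrayBufferWriter<byte>(length);

        // Pass 2: copy each segment once into the pre-sized destination.
        foreach (ReadOnlyMemory<byte> segment in segments)
        {
            writer.Write(segment.Span);
        }

        return writer.WrittenMemory;
    }
}
```

Because the writer starts with the full capacity, none of the Write calls should need to grow the underlying array. The trade-off is the temporary list of segment references kept between the passes.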

Do not apply that pattern blindly to unbounded data. The first pass keeps segment references so the second pass can copy them, and the total length is accumulated in an int. That is reasonable for Service Bus message bodies, where message size is bounded by the service, but general-purpose code should still think about maximum size and overflow behavior.
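In general-purpose code, one way to make those limits explicit is to check the running total as it grows. The sketch below is an illustration under assumptions: SafeLength, TotalLength, and the maxBytes parameter are hypothetical, and the exception types are my choice, not something taken from the SDK.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: totals segment lengths with explicit bounds.
public static class SafeLength
{
    public static int TotalLength(IEnumerable<ReadOnlyMemory<byte>> segments, int maxBytes)
    {
        int total = 0;

        foreach (ReadOnlyMemory<byte> segment in segments)
        {
            // checked makes int overflow throw OverflowException instead of wrapping.
            total = checked(total + segment.Length);

            // Fail fast before buffering more than the caller allows.
            if (total > maxBytes)
            {
                throw new InvalidOperationException(
                    $"Body exceeds the configured limit of {maxBytes} bytes.");
            }
        }

        return total;
    }
}
```

The checked expression only matters when maxBytes is close to int.MaxValue. For smaller limits the size check fires first, which is usually the more useful error.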

| Step | What changed | What the team learned |
| --- | --- | --- |
| Cache attempt | Avoid rebuilding Body on repeated property access | The raw AMQP message can be mutable, so caching has correctness boundaries |
| Internal body abstraction | Represent send, receive, and conversion paths explicitly | The performance problem was tied to ownership and lifetime |
| Split implementations | Use non-copying, copy-on-conversion, and eager-copying paths | Clearer design made the common path cheaper and the uncommon path safer |
| Copy helper optimization | Pre-size the buffer before copying segments | Once the design was clear, smaller optimizations became easier to target |

Why not redesign first?

Because the early attempts were not wasted. They produced the knowledge needed for the redesign. The first PR showed that repeated property access was expensive. The review showed that mutability prevented a naive cache. The next PRs separated the send and receive assumptions. Later benchmarks showed that the eager-copy implementation still had room to improve.

If someone had started with “let’s rewrite body handling,” the discussion would have been abstract. Instead, each small change exposed a constraint. By the time the internal design changed, the team had concrete examples, profiler snapshots, tests, and review comments to guide it.

That is the part I wish more teams wrote down in pull requests and architecture decision records. Not just the final design, but the path that made the design obvious. Future maintainers need to know why the code has three body implementations. Without that context, they may collapse it back into one helper and reintroduce the same cost.

What I took away

Do the boring performance loop before reaching for a rewrite. Profile, improve, benchmark, and repeat. Use stable machines where you can. Look at production telemetry after shipping, because benchmarks can prove that a change helps a scenario, but production tells you whether that scenario mattered.

Micro-optimizations are not the opposite of design. They are a way to learn the design pressure in real code. In the Event Hubs partition-key resolver, that meant making a small hot path do less work while keeping the same design. In the Service Bus body path, the same loop showed that the design needed sharper boundaries: non-copying when ownership is safe, copying on conversion when flattening is requested, and eager copying when the receive path requires it.

That is the kind of redesign I trust: measured, incremental, and informed by the code that came before it.

Common questions

This section answers the questions I would ask before applying the same idea to application code.

Should I redesign code as soon as profiling finds allocations?

No. Try to understand the allocation first. A small change may be enough, and even when it is not, the small change teaches you which constraints the redesign must respect.

Is a property allowed to do expensive work?

Sometimes it has to, but callers usually assume properties are cheap. If a property sits on a hot path, repeated allocation or conversion inside the getter deserves extra scrutiny.
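A quick way to check a suspicious getter is to count the bytes it allocates. The sketch below uses GC.GetAllocatedBytesForCurrentThread for a rough reading. AllocationProbe and Payload are hypothetical types for illustration, and a probe like this is a sanity check, not a replacement for a proper benchmark.

```csharp
using System;

// Hypothetical probe: bytes allocated on the current thread while the action runs.
public static class AllocationProbe
{
    public static long Measure(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}

// Hypothetical type with one copying getter and one that returns the stored array.
public sealed class Payload
{
    private readonly byte[] _data = new byte[1024];

    // Allocates a new 1 KB array on every access.
    public byte[] CopyingBody => (byte[])_data.Clone();

    // Returns the existing array; no allocation.
    public byte[] SharedBody => _data;
}
```

Reading CopyingBody 100 times in a loop inside Measure should report at least 100 KB, while the same loop over SharedBody should report far less. If a hot-path property behaves like the first one, that is the signal to look closer.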

What is the main lesson?

Optimize first to learn. Redesign only when the measurements and constraints show that the old shape is the problem.

About the author

Daniel Marbach
