Invoke different versions of your service using APIM

I’m working on a project where I need to deploy multiple versions of my software side by side and validate whether a change is an improvement or a regression. The solution relies heavily on deployed language models, so those are the first thing we want to evaluate.
The solution looks fairly similar to the setup in my Trial & Error GitHub repository: a .NET frontend service and a Python backend service, both invoking language models deployed in Microsoft Foundry.

The baseline

I’ve deployed this using the following services:

  • Azure API Management
    This is the public entry point for my services
  • Azure Container Apps Environment
    Hosting two Container Apps for both .NET and Python
  • Microsoft Foundry + some models
    For my agents & MAF to work with

Of course, there are some other resources deployed too, but those aren’t important for now.

The way this works is:

  1. A client makes an HTTP request
  2. APIM processes the request and routes it to the default backend, the .NET Container App
  3. The .NET Container App takes the request and processes it
  4. The request gets forwarded to the Python Container App, which processes it
  5. The processed request is passed along to a language model in Microsoft Foundry
  6. The response gets processed and sent back to the client through the .NET Container App and APIM

  sequenceDiagram
    participant Client
    participant APIM as Azure API Management
    participant NET as .NET Container App
    participant Python as Python Container App
    participant Foundry as Microsoft Foundry (LLM)

    Client->>APIM: HTTP Request
    APIM->>NET: Route to default backend
    NET->>Python: Forward request
    Python->>Foundry: Send to language model
    Foundry-->>Python: Model response
    Python-->>NET: Processed response
    NET-->>APIM: Return response
    APIM-->>Client: HTTP Response

I think you’ll find this to be a common design. Obviously, the .NET and Python services also provide some additional value on top of forwarding requests, but that’s not relevant to this post.

My regular APIM inbound policy looks pretty much like this:

<inbound>
    <base />
    <set-backend-service backend-id="container-apps-backend" />
</inbound>

This sets the backend to container-apps-backend, which is created earlier in the deployment:

// apim.bicep
module containerAppsBackend 'modules/apim-backend.bicep' = {
  name: '${applicationName}-apim-backend-deployment'
  params: {
    backendName: 'container-apps-backend'
    apimServiceName: apiManagementName
    backendUrl: apiBackendUrl
    backendDescription: 'Container Apps backend'
  }
}

// apim-backend.bicep
resource backend 'Microsoft.ApiManagement/service/backends@2023-09-01-preview' = {
  name: backendName
  parent: apimService
  properties: {
    url: backendUrl
    protocol: 'http'
    description: backendDescription
    // omitted for brevity
  }
}

The setup

As mentioned, I want the Container Apps to be deployed with different versions or configuration values.
As a concrete example, I want to validate the responses using different language models. This way, I can determine if improvements can be made in terms of performance, costs, or accuracy of the language model responses. However, you should also be able to deploy different versions of the containers to do similar tests.

Deploy the backends & subscriptions

To accomplish this, I have deployed the Container Apps multiple times, each with a different language model configured. It’s also possible to change other properties or the image versions if you want to, but changing the language models sufficed for me.
Now I have 8 pairs of .NET & Python Container Apps running.
The next thing to do is route the requests to the correct backend services. My approach is to use one APIM backend and one subscription per pair. I’ve created a small array of the candidates I want to verify, and the backends and subscriptions are deployed based on that collection. The sample below lists only two candidates to keep it short.

// Sample collection with the application pair/candidates that need testing
var enabledCandidateMappings = [
  {
    candidate_index: 1
    candidate_name: 'gpt-4o-candidate'
    backend_name: 'evaluation-backend-1'
    subscription_name: 'evaluation-subscription-1'
  }
  {
    candidate_index: 2
    candidate_name: 'gpt-4.1-candidate'
    backend_name: 'evaluation-backend-2'
    subscription_name: 'evaluation-subscription-2'
  }
]

// Deploying the application pair.
module candidateAppPairs 'modules/container-app-pair.bicep' = [for candidate in enabledCandidateMappings: {
  name: '${environment}CandidateAppPair${int(candidate.candidate_index)}'
  params: {
    candidateName: string(candidate.candidate_name)
    // other params omitted for brevity
  }
}]

// Deploying the backends with the `apiUrl` retrieved from the `container-app-pair` module.
module evaluationApimBackends 'modules/apim-backend.bicep' = [for (candidate, i) in enabledCandidateMappings: {
  name: '${environment}ApimBackend${int(candidate.candidate_index)}'
  scope: resourceGroup(apimResourceGroupName)
  params: {
    backendName: string(candidate.backend_name)
    apimServiceName: apimServiceName
    backendUrl: candidateAppPairs[i].outputs.apiUrl
    backendDescription: 'Evaluation backend for ${string(candidate.candidate_name)}'
  }
}]

// The subscriptions for each application pair.
module evaluationApimSubscriptions 'modules/apim-subscription.bicep' = [for candidate in enabledCandidateMappings: {
  name: '${environment}ApimSubscription${int(candidate.candidate_index)}'
  scope: resourceGroup(apimResourceGroupName)
  params: {
    apimServiceName: apimServiceName
    subscriptionName: string(candidate.subscription_name)
    displayName: string(candidate.subscription_name)
    apiName: apimApiName
    allowTracing: false
  }
}]

// apim-subscription.bicep
resource subscription 'Microsoft.ApiManagement/service/subscriptions@2023-09-01-preview' = {
  name: subscriptionName
  parent: apimService
  properties: {
    displayName: displayName
    scope: '${apimService.id}/apis/${apiName}'
    state: 'active'
    allowTracing: allowTracing
  }
}

After deploying this, you’ll end up with one additional backend and one additional subscription per candidate in your Azure API Management resource, so 8 of each in my case.

Create a routing policy

With the 8 different Container App pairs, the backends, and the subscriptions in place, we can add a new inbound policy in APIM. Based on the subscription whose key is sent with a request, the request gets routed to the matching backend.
As my backends can differ per run, I can’t have a static policy, so I have to generate it dynamically too.
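
The routing decision itself is small. As a rough illustration, here is a Python model of what the generated policy decides per request, assuming the two sample candidates from the collection above. The names ROUTES, DEFAULT_BACKEND and select_backend are made up for this sketch and are not part of the deployment.

```python
from typing import Optional

# Hypothetical mirror of the sample candidate collection:
# subscription name -> backend id.
ROUTES = {
    "evaluation-subscription-1": "evaluation-backend-1",
    "evaluation-subscription-2": "evaluation-backend-2",
}
DEFAULT_BACKEND = "container-apps-backend"


def select_backend(subscription_name: Optional[str]) -> str:
    """Return the backend id the policy would pick for a subscription name.

    None stands in for a request without a resolved subscription; it falls
    through to the default backend, like the <otherwise> branch.
    """
    if subscription_name is None:
        return DEFAULT_BACKEND
    return ROUTES.get(subscription_name, DEFAULT_BACKEND)


if __name__ == "__main__":
    for name in ("evaluation-subscription-1", "some-other-subscription", None):
        print(name, "->", select_backend(name))
```

Any subscription that is not an evaluation candidate ends up on the regular backend, which keeps existing clients unaffected.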

To start, I’ve defined two policy files in my codebase.

First, the evaluation-subscription-routing-policy.xml file is responsible for routing requests to the correct backend using a choose/when/otherwise pattern. This file only contains a small fragment of an inbound policy.

<!--evaluation-subscription-routing-policy.xml-->
<choose>
__SUBSCRIPTION_ROUTING_RULES__       <!-- filled with one <when> per candidate-->
    <otherwise>
        <set-backend-service backend-id="container-apps-backend" />
    </otherwise>
</choose>

Second, the main api-policy.xml file. This one contains all the policies in place for the API.

<inbound>
    <base />
    <rate-limit-by-key ... />
    <cors>...</cors>
    <!--
        Evaluation routing is injected by infrastructure templates.
        In default deployments this resolves to:
        <set-backend-service backend-id="container-apps-backend" />
    -->
__EVALUATION_SUBSCRIPTION_ROUTING_BLOCK__    <!-- filled with the assembled <choose> block-->
</inbound>

The placeholders are filled in by my Bicep files, which load both templates as raw strings and then apply the required replacements.

// apim-policy.bicep

// Load both XML templates as raw strings at compile time
var apiPolicyTemplate = loadTextContent('../apim/api-policy.xml')
var evaluationApiRoutingTemplate   = loadTextContent('../apim/evaluation-subscription-routing-policy.xml')

// Step 1: Build one <when> element per enabled candidate
var evaluationApiPolicyRouteRulesArray = [for candidate in enabledCandidateMappings:
  format(
    '            <when condition=\'@(context.Subscription?.Name == "{0}")\'><set-backend-service backend-id="{1}" /></when>',
    string(candidate.subscription_name),
    string(candidate.backend_name)
  )
]
var evaluationApiPolicyRoutingRules = join(evaluationApiPolicyRouteRulesArray, '\n')

// Step 2: Inject the <when> rules into the routing block template
//         (replaces __SUBSCRIPTION_ROUTING_RULES__ in evaluation-subscription-routing-policy.xml)
var evaluationApiRoutingBlock = replace(
  evaluationApiRoutingTemplate,
  '__SUBSCRIPTION_ROUTING_RULES__',
  evaluationApiPolicyRoutingRules
)

// Step 3: Inject the complete routing block into the main API policy
//         (replaces __EVALUATION_SUBSCRIPTION_ROUTING_BLOCK__ in api-policy.xml)
var evaluationApiPolicy = replace(
  apiPolicyTemplate,
  '__EVALUATION_SUBSCRIPTION_ROUTING_BLOCK__',
  evaluationApiRoutingBlock
)

// Step 4: Deploy the generated policies
module evaluationApimPolicy 'modules/apim-api-policy.bicep' = {
  name: '${environment}EvaluationApimPolicy'
  scope: resourceGroup(apimResourceGroupName)
  params: {
    apimServiceName: apimServiceName
    apiName:         apimApiName
    policyXml:       evaluationApiPolicy   // ← the composed XML string
  }
  dependsOn: [evaluationApimBackends, evaluationApimSubscriptions]
}

// apim-api-policy.bicep
resource apiPolicy 'Microsoft.ApiManagement/service/apis/policies@2023-09-01-preview' = {
  name: 'policy'
  parent: apimApi
  properties: {
    format: 'rawxml'
    value: policyXml
  }
}

When deployed, the APIM inbound policy will look similar to this:

<policies>
    <inbound>
        <base />
        <!-- Omitted for brevity -->
        <choose>
            <when condition="@(context.Subscription?.Name == "evaluation-subscription-1")">
                <set-backend-service backend-id="evaluation-backend-1" />
            </when>
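            <when condition="@(context.Subscription?.Name == "evaluation-subscription-2")">
                <set-backend-service backend-id="evaluation-backend-2" />
            </when>
            <otherwise>
                <set-backend-service backend-id="container-apps-backend" />
            </otherwise>
        </choose>
    </inbound>
    <!-- Other policy sections omitted for brevity -->
</policies>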

This is all there is to it.
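
The composition steps from apim-policy.bicep can also be mirrored locally, which may be handy for previewing the generated XML before a deployment. The sketch below is a Python approximation and not part of the actual pipeline: the templates are inlined and trimmed copies, and the names compose_policy, WHEN_RULE and CANDIDATES are assumptions for illustration.

```python
# Hypothetical inline copies of the two templates; in the Bicep file these
# come from loadTextContent().
ROUTING_TEMPLATE = """<choose>
__SUBSCRIPTION_ROUTING_RULES__
    <otherwise>
        <set-backend-service backend-id="container-apps-backend" />
    </otherwise>
</choose>"""

API_POLICY_TEMPLATE = """<inbound>
    <base />
__EVALUATION_SUBSCRIPTION_ROUTING_BLOCK__
</inbound>"""

# Two sample candidates, matching the sample collection in the post.
CANDIDATES = [
    {"subscription_name": "evaluation-subscription-1", "backend_name": "evaluation-backend-1"},
    {"subscription_name": "evaluation-subscription-2", "backend_name": "evaluation-backend-2"},
]

# Same shape as the format() string used in the Bicep variable.
WHEN_RULE = (
    "    <when condition='@(context.Subscription?.Name == \"{0}\")'>"
    "<set-backend-service backend-id=\"{1}\" /></when>"
)


def compose_policy(candidates):
    """Build the <when> rules, inject them into the routing block,
    then inject the routing block into the API policy."""
    rules = "\n".join(
        WHEN_RULE.format(c["subscription_name"], c["backend_name"]) for c in candidates
    )
    routing_block = ROUTING_TEMPLATE.replace("__SUBSCRIPTION_ROUTING_RULES__", rules)
    return API_POLICY_TEMPLATE.replace(
        "__EVALUATION_SUBSCRIPTION_ROUTING_BLOCK__", routing_block
    )


if __name__ == "__main__":
    print(compose_policy(CANDIDATES))
```

Note that an empty candidate list produces a `<choose>` without any `<when>` element, which the APIM choose policy does not accept. That fits the comment in api-policy.xml about default deployments resolving to a plain set-backend-service.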

Validating

You can now make an HTTP request to the service using one of the subscription keys. Each subscription key routes to a different backend, so the same request can produce different responses.

  sequenceDiagram
    participant C1 as Client (Subscription 1)
    participant CN as Client (Subscription N)
    box LightSteelBlue Azure API Management
    participant APIM as Azure API Management
    end
    box LightGreen Backend 1 pair
    participant B1 as .NET Container App (Backend 1)
    participant P1 as Python Container App (Backend 1)
    end
    box LightCoral Backend N pair
    participant BN as .NET Container App (Backend N)
    participant PN as Python Container App (Backend N)
    end
    box LightYellow Microsoft Foundry
    participant Foundry as Microsoft Foundry (LLM)
    end

    C1->>APIM: HTTP Request + Subscription Key 1
    APIM->>B1: Route to evaluation-backend-1
    B1->>P1: Forward request
    P1->>Foundry: Send to language model
    Foundry-->>P1: Model response
    P1-->>B1: Processed response
    B1-->>APIM: Return response
    APIM-->>C1: HTTP Response

    CN->>APIM: HTTP Request + Subscription Key N
    APIM->>BN: Route to evaluation-backend-N
    BN->>PN: Forward request
    PN->>Foundry: Send to language model
    Foundry-->>PN: Model response
    PN-->>BN: Processed response
    BN-->>APIM: Return response
    APIM-->>CN: HTTP Response
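
To compare candidates, the same request can be sent once per subscription key. Below is a minimal Python sketch using only the standard library. The gateway URL, the route, the payload shape and the key values are placeholders I made up for illustration. The header name Ocp-Apim-Subscription-Key is the APIM default and may be different if your API overrides it.

```python
import json
import urllib.request

# Assumptions for illustration only: URL, route and keys are placeholders,
# not values from the actual deployment.
GATEWAY_URL = "https://example-apim.azure-api.net/chat"
SUBSCRIPTION_KEYS = {
    "evaluation-subscription-1": "<key-1>",
    "evaluation-subscription-2": "<key-2>",
}


def build_request(url: str, subscription_key: str, payload: dict) -> urllib.request.Request:
    """Build a POST request that carries the key in APIM's default header."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": subscription_key,
        },
        method="POST",
    )


def compare_candidates(payload: dict) -> dict:
    """Send the same payload once per subscription and collect the response bodies."""
    results = {}
    for name, key in SUBSCRIPTION_KEYS.items():
        request = build_request(GATEWAY_URL, key, payload)
        with urllib.request.urlopen(request, timeout=60) as response:
            results[name] = response.read().decode("utf-8")
    return results
```

Calling compare_candidates with a fixed payload returns one response body per subscription name, which can then be scored or diffed per candidate.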

Having a setup like this offers quite a few possibilities. You can use it for a different kind of A/B testing, route customers to higher- or lower-performing backends, validate configurations, route customers to alpha, beta, or stable tiers of the application, and much more.

So far, the above setup works like a charm for what I’m doing with it. Note that I’m only using it for development purposes right now, where the load isn’t very high. Performance might be impacted if your list of backends grows quite large, because the when conditions are evaluated in order until one matches, but some overhead is always the case when adding policies to APIM.