Invoke different versions of your service using APIM
I’m working on something where I need to deploy multiple versions of my software and validate whether there’s an improvement or regression. The solution heavily relies on deployed language models, so that’s something we want to evaluate first.
The solution I’m working on looks fairly similar to the setup in my Trial & Error GitHub repository, so I have a .NET frontend service and a Python backend service, both invoking language models deployed in Microsoft Foundry.
The baseline
I’ve deployed this using the following services:
- Azure API Management: the public entry point for my services
- Azure Container Apps Environment: hosts two Container Apps, one for .NET and one for Python
- Microsoft Foundry + some models: for my agents & MAF to work with
Of course, there are some other resources deployed too, but those aren’t important for now.
The way this works is:
- A client makes an HTTP request
- APIM processes the request and routes it to the default backend, the .NET Container App
- The .NET Container App takes the request and processes it
- The request gets forwarded to the Python Container App, which processes it
- The processed request is passed along to a language model in Microsoft Foundry
- The response gets processed and sent back to the client through the .NET Container App and APIM
sequenceDiagram
participant Client
participant APIM as Azure API Management
participant NET as .NET Container App
participant Python as Python Container App
participant Foundry as Microsoft Foundry (LLM)
Client->>APIM: HTTP Request
APIM->>NET: Route to default backend
NET->>Python: Forward request
Python->>Foundry: Send to language model
Foundry-->>Python: Model response
Python-->>NET: Processed response
NET-->>APIM: Return response
APIM-->>Client: HTTP Response
I think you’ll find this to be a common design. Obviously, the .NET and Python services also provide some additional value on top of forwarding requests, but that’s not relevant to this post.
My regular APIM inbound policy looks pretty much like this:
<inbound>
<base />
<set-backend-service backend-id="container-apps-backend" />
</inbound>
This sets the backend to container-apps-backend, which is created earlier in the deployment:
// apim.bicep
module containerAppsBackend 'modules/apim-backend.bicep' = {
name: '${applicationName}-apim-backend-deployment'
params: {
backendName: 'container-apps-backend'
apimServiceName: apiManagementName
backendUrl: apiBackendUrl
backendDescription: 'Container Apps backend'
}
}
// apim-backend.bicep
resource backend 'Microsoft.ApiManagement/service/backends@2023-09-01-preview' = {
name: backendName
parent: apimService
properties: {
url: backendUrl
protocol: 'http'
description: backendDescription
// omitted for brevity
}
}
The setup
As mentioned, I want the Container Apps to be deployed with different versions or configuration values.
As a concrete example, I want to validate the responses using different language models. This way, I can determine if improvements can be made in terms of performance, costs, or accuracy of the language model responses. However, you should also be able to deploy different versions of the containers to do similar tests.
Deploy the backends & subscriptions
To accomplish this, I have deployed the Container Apps multiple times, each with different language models configured. It’s also possible to change other properties or versions of the images if you want to, but changing the language models sufficed for me.
Now I have 8 combinations of .NET & Python container apps running.
The next thing to do is route requests to the correct backend services. My approach is to use one APIM backend and one subscription per candidate. To do this, I’ve created a small array of the candidates I want to verify, and I deploy the backends and subscriptions based on that collection.
// Sample collection with the application pair/candidates that need testing
var enabledCandidateMappings = [
{
candidate_index: 1
candidate_name: 'gpt-4o-candidate'
backend_name: 'evaluation-backend-1'
subscription_name: 'evaluation-subscription-1'
}
{
candidate_index: 2
candidate_name: 'gpt-4.1-candidate'
backend_name: 'evaluation-backend-2'
subscription_name: 'evaluation-subscription-2'
}
]
// Deploying the application pair.
module candidateAppPairs 'modules/container-app-pair.bicep' = [for candidate in enabledCandidateMappings: {
name: '${environment}CandidateAppPair${int(candidate.candidate_index)}'
params: {
candidateName: string(candidate.candidate_name)
// other params omitted for brevity
}
}]
// Deploying the backends with the `apiUrl` retrieved from the `container-app-pair` module.
module evaluationApimBackends 'modules/apim-backend.bicep' = [for (candidate, i) in enabledCandidateMappings: {
name: '${environment}ApimBackend${int(candidate.candidate_index)}'
scope: resourceGroup(apimResourceGroupName)
params: {
backendName: string(candidate.backend_name)
apimServiceName: apimServiceName
backendUrl: candidateAppPairs[i].outputs.apiUrl
backendDescription: 'Evaluation backend for ${string(candidate.candidate_name)}'
}
}]
// The subscriptions for each application pair.
module evaluationApimSubscriptions 'modules/apim-subscription.bicep' = [for candidate in enabledCandidateMappings: {
name: '${environment}ApimSubscription${int(candidate.candidate_index)}'
scope: resourceGroup(apimResourceGroupName)
params: {
apimServiceName: apimServiceName
subscriptionName: string(candidate.subscription_name)
displayName: string(candidate.subscription_name)
apiName: apimApiName
allowTracing: false
}
}]
// apim-subscription.bicep
resource subscription 'Microsoft.ApiManagement/service/subscriptions@2023-09-01-preview' = {
name: subscriptionName
parent: apimService
properties: {
displayName: displayName
scope: '${apimService.id}/apis/${apiName}'
state: 'active'
allowTracing: allowTracing
}
}
After deploying this, you’ll end up with 8 additional backends and subscriptions in your Azure API Management resource.
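Because the backends, subscriptions, and (later) the routing policy are all derived from this one array, it can pay off to generate and check it instead of typing it by hand. The sketch below is a hypothetical Python helper (`build_candidate_mappings` and `validate_mappings` are illustrative names, not part of the Bicep setup) that produces entries with the same keys and naming pattern as the sample array and fails fast if two candidates would share a backend or subscription. How the result is handed to Bicep, for example as a JSON parameter value, is an assumption.

```python
from __future__ import annotations

import json


def build_candidate_mappings(model_names: list[str]) -> list[dict]:
    """Derive one candidate entry per model, following the sample array's naming scheme."""
    mappings = []
    for index, model in enumerate(model_names, start=1):
        mappings.append(
            {
                "candidate_index": index,
                "candidate_name": f"{model}-candidate",
                "backend_name": f"evaluation-backend-{index}",
                "subscription_name": f"evaluation-subscription-{index}",
            }
        )
    return mappings


def validate_mappings(mappings: list[dict]) -> None:
    """Raise if two candidates would share a backend or a subscription name."""
    for key in ("backend_name", "subscription_name"):
        names = [m[key] for m in mappings]
        duplicates = {n for n in names if names.count(n) > 1}
        if duplicates:
            raise ValueError(f"duplicate {key}: {sorted(duplicates)}")


if __name__ == "__main__":
    candidates = build_candidate_mappings(["gpt-4o", "gpt-4.1"])
    validate_mappings(candidates)
    # Hypothetical hand-off: paste or pipe this JSON into a Bicep parameter.
    print(json.dumps(candidates, indent=2))
```

Generating the index, backend name, and subscription name from a single counter keeps the three in lockstep, which is what the routing policy relies on.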
Create a routing policy
With the 8 Container App pairs, backends, and subscriptions in place, we can add a new inbound policy in APIM that routes each request to a different backend based on the subscription key it was sent with.
As my backends can differ per run, I can’t have a static policy, so I have to generate this dynamically too.
To start, I’ve defined two policy files in my codebase.
First, the evaluation-subscription-routing-policy.xml file is responsible for routing requests to the correct backend using a choose/otherwise pattern. This file only contains a small portion of an inbound policy.
<!--evaluation-subscription-routing-policy.xml-->
<choose>
__SUBSCRIPTION_ROUTING_RULES__ <!-- filled with one <when> per candidate-->
<otherwise>
<set-backend-service backend-id="container-apps-backend" />
</otherwise>
</choose>
Second, the main api-policy.xml file, which contains all the policies in place for the API.
<policies>
<inbound>
<base />
<rate-limit-by-key ... />
<cors>...</cors>
<!--
Evaluation routing is injected by infrastructure templates.
In default deployments this resolves to:
<set-backend-service backend-id="container-apps-backend" />
-->
__EVALUATION_SUBSCRIPTION_ROUTING_BLOCK__ <!-- filled with the assembled <choose> block-->
</inbound>
<!-- backend, outbound, and on-error sections omitted for brevity -->
</policies>
The placeholders are replaced in Bicep: both XML files are loaded as strings with loadTextContent, and replace() swaps in the generated content.
// apim-policy.bicep
// Load both XML templates as raw strings at compile time
var apiPolicyTemplate = loadTextContent('../apim/api-policy.xml')
var evaluationApiRoutingTemplate = loadTextContent('../apim/evaluation-subscription-routing-policy.xml')
// Step 1: Build one <when> element per enabled candidate
var evaluationApiPolicyRouteRulesArray = [for candidate in enabledCandidateMappings:
format(
' <when condition=\'@(context.Subscription?.Name == "{0}")\'><set-backend-service backend-id="{1}" /></when>',
string(candidate.subscription_name),
string(candidate.backend_name)
)
]
var evaluationApiPolicyRoutingRules = join(evaluationApiPolicyRouteRulesArray, '\n')
// Step 2: Inject the <when> rules into the routing block template
// (replaces __SUBSCRIPTION_ROUTING_RULES__ in evaluation-subscription-routing-policy.xml)
var evaluationApiRoutingBlock = replace(
evaluationApiRoutingTemplate,
'__SUBSCRIPTION_ROUTING_RULES__',
evaluationApiPolicyRoutingRules
)
// Step 3: Inject the complete routing block into the main API policy
// (replaces __EVALUATION_SUBSCRIPTION_ROUTING_BLOCK__ in api-policy.xml)
var evaluationApiPolicy = replace(
apiPolicyTemplate,
'__EVALUATION_SUBSCRIPTION_ROUTING_BLOCK__',
evaluationApiRoutingBlock
)
// Step 4: Deploy the generated policies
module evaluationApimPolicy 'modules/apim-api-policy.bicep' = {
name: '${environment}EvaluationApimPolicy'
scope: resourceGroup(apimResourceGroupName)
params: {
apimServiceName: apimServiceName
apiName: apimApiName
policyXml: evaluationApiPolicy // ← the composed XML string
}
dependsOn: [evaluationApimBackends, evaluationApimSubscriptions]
}
// apim-api-policy.bicep
resource apiPolicy 'Microsoft.ApiManagement/service/apis/policies@2023-09-01-preview' = {
name: 'policy'
parent: apimApi
properties: {
format: 'rawxml'
value: policyXml
}
}
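Since the policy is assembled from strings, a broken template only shows up when APIM rejects the deployment. To catch that earlier, the same three steps can be mirrored locally. This is a minimal Python sketch, not part of the Bicep deployment: the two templates are trimmed, hypothetical stand-ins for the real files (rate limiting and CORS left out), `compose_policy` is an illustrative name, and the final check only confirms that the result is well-formed XML, not that APIM accepts every policy in it.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-ins for evaluation-subscription-routing-policy.xml and api-policy.xml.
ROUTING_TEMPLATE = """<choose>
__SUBSCRIPTION_ROUTING_RULES__
<otherwise>
<set-backend-service backend-id="container-apps-backend" />
</otherwise>
</choose>"""

API_POLICY_TEMPLATE = """<policies>
<inbound>
<base />
__EVALUATION_SUBSCRIPTION_ROUTING_BLOCK__
</inbound>
</policies>"""

PLACEHOLDERS = ("__SUBSCRIPTION_ROUTING_RULES__", "__EVALUATION_SUBSCRIPTION_ROUTING_BLOCK__")


def build_when_rules(mappings: list) -> str:
    """Step 1: one <when> per candidate, same shape as the Bicep format() call."""
    rules = [
        f"<when condition='@(context.Subscription?.Name == \"{m['subscription_name']}\")'>"
        f"<set-backend-service backend-id=\"{m['backend_name']}\" /></when>"
        for m in mappings
    ]
    return "\n".join(rules)


def compose_policy(
    mappings: list,
    api_template: str = API_POLICY_TEMPLATE,
    routing_template: str = ROUTING_TEMPLATE,
) -> str:
    # Step 2: inject the <when> rules into the routing block.
    routing_block = routing_template.replace(PLACEHOLDERS[0], build_when_rules(mappings))
    # Step 3: inject the routing block into the main API policy.
    policy = api_template.replace(PLACEHOLDERS[1], routing_block)
    # Sanity checks: nothing left unreplaced, and the result parses as XML.
    leftover = [p for p in PLACEHOLDERS if p in policy]
    if leftover:
        raise ValueError(f"unreplaced placeholders: {leftover}")
    ET.fromstring(policy)  # raises xml.etree.ElementTree.ParseError if malformed
    return policy
```

The parse step is the useful part: a subscription name containing a single quote, for instance, would end the `condition` attribute early, and `ET.fromstring` reports that immediately instead of during deployment.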
When deployed, the APIM inbound policy will look similar to this:
<policies>
<inbound>
<base />
<!-- Omitted for brevity -->
<choose>
<when condition="@(context.Subscription?.Name == "evaluation-subscription-1")">
<set-backend-service backend-id="evaluation-backend-1" />
</when>
<when condition="@(context.Subscription?.Name == "evaluation-subscription-2")">
<set-backend-service backend-id="evaluation-backend-2" />
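</when>
<otherwise>
<set-backend-service backend-id="container-apps-backend" />
</otherwise>
</choose>
</inbound>
<!-- Other sections omitted for brevity -->
</policies>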
This is all there is to it.
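APIM evaluates the `<when>` elements of a `<choose>` block in order and applies the first one whose condition is true, falling back to `<otherwise>` when none match. The Python sketch below models that for the generated routing block so you can check which backend a given subscription should land on. It is a local approximation, not APIM's policy engine: the hypothetical `resolve_backend` only understands the exact condition shape generated above, and the sample policy uses single-quoted attributes (as the Bicep template emits them) so that it parses as plain XML.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Matches only the condition shape produced by the Bicep format() call.
CONDITION = re.compile(r'^@\(context\.Subscription\?\.Name == "([^"]+)"\)$')

SAMPLE_POLICY = """<policies>
<inbound>
<base />
<choose>
<when condition='@(context.Subscription?.Name == "evaluation-subscription-1")'><set-backend-service backend-id="evaluation-backend-1" /></when>
<when condition='@(context.Subscription?.Name == "evaluation-subscription-2")'><set-backend-service backend-id="evaluation-backend-2" /></when>
<otherwise>
<set-backend-service backend-id="container-apps-backend" />
</otherwise>
</choose>
</inbound>
</policies>"""


def resolve_backend(policy_xml: str, subscription_name: Optional[str]) -> str:
    """Return the backend-id the <choose> block would select for this subscription."""
    choose = ET.fromstring(policy_xml).find("./inbound/choose")
    if choose is None:
        raise ValueError("policy has no <choose> block in <inbound>")
    # First matching <when> wins, mirroring top-to-bottom evaluation.
    for when in choose.findall("when"):
        match = CONDITION.match(when.get("condition", ""))
        if match is None:
            raise ValueError(f"unsupported condition: {when.get('condition')}")
        # A request without a subscription makes context.Subscription?.Name null, so no <when> matches.
        if subscription_name is not None and match.group(1) == subscription_name:
            return when.find("set-backend-service").get("backend-id")
    return choose.find("./otherwise/set-backend-service").get("backend-id")
```

Any subscription that isn't in the candidate list, such as a regular consumer subscription, falls through to `container-apps-backend`, so existing clients keep hitting the default backend.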
Validating
You can now make an HTTP request to the service using one of the subscription keys. Each subscription key routes to a different backend, so the responses can differ per key.
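By default, APIM reads the subscription key from the `Ocp-Apim-Subscription-Key` header (or the `subscription-key` query parameter). The sketch below builds such a request with Python's standard library. The endpoint URL and JSON payload are hypothetical placeholders; use whatever path and body your API actually exposes.

```python
import json
import urllib.request


def build_request(api_url: str, subscription_key: str, payload: dict) -> urllib.request.Request:
    """Build a POST request that APIM will associate with the given subscription."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": subscription_key,
        },
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Perform the HTTP call and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example (hypothetical endpoint, key, and payload):
# req = build_request("https://my-apim.azure-api.net/my-api/chat", "<subscription-key-1>", {"message": "Hello"})
# print(send(req))
```

Swapping only the key between otherwise identical requests is what makes the comparison fair: APIM picks the backend, and the client code stays the same.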
sequenceDiagram
participant C1 as Client (Subscription 1)
participant CN as Client (Subscription N)
box LightSteelBlue Azure API Management
participant APIM as Azure API Management
end
box LightGreen Backend 1 pair
participant B1 as .NET Container App (Backend 1)
participant P1 as Python Container App (Backend 1)
end
box LightCoral Backend N pair
participant BN as .NET Container App (Backend N)
participant PN as Python Container App (Backend N)
end
box LightYellow Microsoft Foundry
participant Foundry as Microsoft Foundry (LLM)
end
C1->>APIM: HTTP Request + Subscription Key 1
APIM->>B1: Route to evaluation-backend-1
B1->>P1: Forward request
P1->>Foundry: Send to language model
Foundry-->>P1: Model response
P1-->>B1: Processed response
B1-->>APIM: Return response
APIM-->>C1: HTTP Response
CN->>APIM: HTTP Request + Subscription Key N
APIM->>BN: Route to evaluation-backend-N
BN->>PN: Forward request
PN->>Foundry: Send to language model
Foundry-->>PN: Model response
PN-->>BN: Processed response
BN-->>APIM: Return response
APIM-->>CN: HTTP Response
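To turn this into an actual improvement-or-regression check, the same prompts can be sent with every subscription key and the results collected per candidate. A minimal sketch, assuming a `send(subscription_key, prompt)` function that performs the HTTP call; `run_comparison` and `latency_regressions` are hypothetical names, and the 10% tolerance is an arbitrary example value. It only measures end-to-end latency as the client sees it (APIM, both Container Apps, and the model) and keeps the raw responses; judging accuracy is left to whatever evaluation you run on those. A regular subscription that falls through to `<otherwise>` makes a natural baseline.

```python
import statistics
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CandidateResult:
    subscription: str
    latencies_ms: List[float] = field(default_factory=list)
    responses: List[str] = field(default_factory=list)

    @property
    def median_latency_ms(self) -> float:
        return statistics.median(self.latencies_ms)


def run_comparison(
    subscription_keys: Dict[str, str],
    prompts: List[str],
    send: Callable[[str, str], str],
) -> Dict[str, CandidateResult]:
    """Send every prompt with every key, recording response text and wall-clock latency."""
    results: Dict[str, CandidateResult] = {}
    for name, key in subscription_keys.items():
        result = CandidateResult(subscription=name)
        for prompt in prompts:
            start = time.perf_counter()
            result.responses.append(send(key, prompt))
            result.latencies_ms.append((time.perf_counter() - start) * 1000)
        results[name] = result
    return results


def latency_regressions(
    results: Dict[str, CandidateResult], baseline: str, tolerance: float = 0.10
) -> List[str]:
    """Names of candidates whose median latency exceeds the baseline's by more than `tolerance`."""
    limit = results[baseline].median_latency_ms * (1 + tolerance)
    return sorted(
        name for name, r in results.items() if name != baseline and r.median_latency_ms > limit
    )
```

The median is used instead of the mean so that a single slow cold start on one Container App doesn't flag that candidate on its own.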
Having a setup like this offers quite a few possibilities. You can use it for a form of A/B testing, route customers to higher- or lower-performing backends, validate configurations, route customers to alpha, beta, or stable tiers of the application, and much more.
So far, this setup works like a charm for what I’m doing with it. Note that I’m only using it for development purposes right now, where the load isn’t very high. Performance might be affected as your list of backends grows, because APIM evaluates the <when> conditions in order until one matches; then again, every policy you add to APIM comes with some overhead.
