Azure Virtual WAN migration – tales from the frontline

We recently conducted an Azure Virtual WAN migration for a customer to help resolve a number of issues in their legacy Azure networking environment. Some statistics about their environment included:

  • 60 connected virtual networks
  • 20 virtual network gateways deployed
  • 6 firewall instances
  • 2 SDWAN fabrics
  • 300+ route tables
  • multiple hub and spoke deployments

The goals for the program were set out as follows:

  • Simplified network design
  • Simplified network management
  • Centralised security controls for all traffic
  • Support for SDWAN platforms
  • Right sizing of resources such as ExpressRoute
  • Ability to support additional regions and office locations

I came up with the following high-level design, which was accepted:

The Azure Virtual WAN migration approach was run along the following lines:

  • Virtual WAN to be deployed into a parallel environment including new ExpressRoute circuits
  • ExpressRoute routers to be linked to provide access between legacy environment and the new virtual WAN environment
  • Staged migration to allow for dev/test environments to be migrated first to complete testing and prove capability

This meant during migration, we had an environment running similar to this:

Azure Virtual WAN migration

What follows is an outline of some of the issues we faced during the migration, ranging from the “that seems obvious” to the “not so obvious”.

The “That seems obvious” issues we faced

Conflicting Network Address Ranges

This one seems the most obvious and hindsight is always 20-20. Some of the networks migrated were connected in strange and undocumented ways:

In this configuration, the network range was not automatically routed and could only be seen by its immediate neighbour. The migration process, however, broke and reconnected all peerings to meet the centralised traffic management requirement, so when the network was migrated to the Virtual WAN everything could see it, including a remote office site that used the same subnet for its local servers.
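A simple pre-migration check that would have caught this is to dump the address space of every virtual network in scope and look for overlaps (and compare them against on-premises ranges) before any peerings are rebuilt. A rough sketch with Az PowerShell, run per subscription:

# List every address prefix in use so overlapping ranges can be spotted early
Get-AzVirtualNetwork | ForEach-Object {
    foreach ($prefix in $_.AddressSpace.AddressPrefixes) {
        [pscustomobject]@{
            VNet          = $_.Name
            ResourceGroup = $_.ResourceGroupName
            AddressPrefix = $prefix
        }
    }
} | Sort-Object AddressPrefix | Format-Table -AutoSize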

Network Policy for Private Endpoints required if using “Routing Intent and Routing Policies”

This one is also obvious in hindsight. It was missed initially due to inconsistent deployment configurations on subnets. Not all subnets were subject to route tables and user-defined routes, so some subnets with private endpoints had been deployed without network policies configured:

When “Routing Intent and Routing Policies” is enabled, this is effectively the same as applying a route table to the subnet, and therefore a private endpoint network policy is required.
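If you hit the same problem, the remediation is to enable the policy on each affected subnet. A minimal Az PowerShell sketch, assuming a virtual network named “spoke-vnet” in resource group “network-rg” with a subnet “privateendpoints” (the parameter name is current at the time of writing and may differ between Az.Network versions):

# Enable private endpoint network policies on an existing subnet
$vnet   = Get-AzVirtualNetwork -Name "spoke-vnet" -ResourceGroupName "network-rg"
$subnet = Get-AzVirtualNetworkSubnetConfig -Name "privateendpoints" -VirtualNetwork $vnet

Set-AzVirtualNetworkSubnetConfig -Name "privateendpoints" -VirtualNetwork $vnet `
    -AddressPrefix $subnet.AddressPrefix `
    -PrivateEndpointNetworkPoliciesFlag "Enabled" | Out-Null

# Commit the change back to the virtual network
$vnet | Set-AzVirtualNetwork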

Propagate Gateway Routes

Some of the virtual networks contained firewalls that provided zone isolation between subnets. The default route table for a network with Routing Intent enabled sends Virtual Network traffic to VNETLocal. To shift the functionality to the new centralised firewalls, a local Virtual Network route via the managed appliance was needed.

Without “Propagate gateway routes” enabled, the effective route table at the device contained just the new Virtual Network route plus the default set of routes that Microsoft applies to all network interfaces, including a default route to the internet; none of the routes learned via the hub gateways were present.
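To illustrate (the address ranges and names here are assumptions, not the customer's), the route table needs the intra-VNet route via the appliance while gateway route propagation stays enabled, which is the default when -DisableBgpRoutePropagation is not specified:

# Force intra-VNet traffic via the local firewall appliance
$localRoute = New-AzRouteConfig -Name "vnet-via-firewall" -AddressPrefix "10.10.0.0/16" `
    -NextHopType "VirtualAppliance" -NextHopIpAddress "10.10.1.4"

# Omitting -DisableBgpRoutePropagation keeps "Propagate gateway routes" enabled,
# so routes learned via the hub gateways still reach the network interfaces
New-AzRouteTable -Name "rt-spoke-workload" -ResourceGroupName "network-rg" `
    -Location "australiaeast" -Route $localRoute

# Confirm what a NIC actually receives
Get-AzEffectiveRouteTable -NetworkInterfaceName "vm1-nic" -ResourceGroupName "app-rg" |
    Format-Table AddressPrefix, NextHopType, NextHopIpAddress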

The “Not So Obvious” issues we faced

Enabling “Routing Intent and Routing Policies” for Private traffic only

When deciding how internet egress would be handled, the initial directive from the cyber team was to ingress and egress through the same point. As there was a default route being advertised up the ExpressRoute from the on-premises connection, I turned on “Routing Intent and Routing Policies” for Private traffic only:

The unexpected behaviour when only private traffic is managed is that, in all connected Virtual Networks, the applied route table sends the RFC1918 address ranges via the managed application but then applies the remaining default routes you would normally see on any standard virtual network. None of the routes advertised via the ExpressRoute gateways are propagated to the end devices, so the on-premises default route never reached them. In the end, we needed to apply the “Internet traffic” routing policy as well, so that internet egress also flowed through our centrally managed applications.
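For reference, recent Az.Network versions let you configure routing intent for both traffic types in one hit. The sketch below uses Azure Firewall as the next hop for simplicity; in our case the next hop was the managed NVA's resource ID in the hub, and all names are placeholders:

# Next hop for both policies is the hub firewall
$fw = Get-AzFirewall -Name "hub1-firewall" -ResourceGroupName "vwan-rg"

$privatePolicy  = New-AzRoutingPolicy -Name "PrivateTraffic" -Destination @("PrivateTraffic") -NextHop $fw.Id
$internetPolicy = New-AzRoutingPolicy -Name "PublicTraffic" -Destination @("Internet") -NextHop $fw.Id

# Apply both the private and internet routing policies to the hub
New-AzRoutingIntent -ResourceGroupName "vwan-rg" -VirtualHubName "hub1" `
    -Name "hubRoutingIntent" -RoutingPolicy @($privatePolicy, $internetPolicy)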

Asymmetric Routing

Asymmetric routing, the bane of Azure network administrators everywhere. With ExpressRoute connections to the on-premises network in two locations, all Virtual WAN networks are made available via two paths into the environment.

Hub-to-hub communication adds additional AS-path information to the route path, which should play into route determination, but in our case the on-premises router connections added even more. Traffic originating in Hub2 would therefore route onto the on-premises network via its local ExpressRoute, but the return path via Hub1 appeared equal or shorter and was preferred. With firewalls in play, the return traffic was dropped due to an unknown session caused by the asymmetric route.

There were two ways to handle this. The new (preview) Route Map feature for Virtual Hubs is designed to assist with this issue by using policies to introduce additional AS-path information to the route path. The problem is that, at the time of writing, the feature is in preview and we are in production.

The alternative was to use BGP community tags and allow the ExpressRoute routers to apply policy-based routing information to the AS path.

BGP Community tags

On the surface, this looked to be a simple solution. Apply a BGP community tag to each VNET based on the hub it is peered with. By default, the BGP community region tag is also automatically applied and this information is shared with the ExpressRoute routers.

Except that Virtual WAN does not fully support BGP community tagging. Community tags are shared with the local ExpressRoute service, but are stripped in inter-hub communication. Applying region-specific policies to the routing path is not possible if both regions’ community tags are not shared.

Next-gen firewalls

The next-gen managed applications that were deployed presented a number of issues in our migration configuration as well. Some of the issues we faced are vendor-agnostic and not specific to our deployment, while others are specific to the vendor.

I will cover these in another post.


This article is also available here: Azure Virtual WAN migration – tales from the frontline – Arinco

Securing service traffic in Azure – Part 2

In “Securing service traffic in Azure – Part 1” I talked about using private endpoints to secure traffic between your Azure services. In this part, I explain another way to secure your traffic between Azure services using service endpoints.

So why not just use private endpoints?

As I demonstrated in Part 1, by configuring a private endpoint, I successfully secured traffic between my function app and my data source, so why would we use something else? The two primary reasons are:

  • Purpose – You should only ever deploy the minimum required services to support your environment. Private endpoints are designed to provide “private” routable access to a service running in Azure. This means traffic from anywhere within your network environment can access this endpoint. If your only requirement is to secure traffic between services in Azure, then the routing capability is not required.
  • Cost – Each private endpoint incurs a charge of approximately USD$9 per month to run. As your usage of endpoints increases, so too do your costs, and this can add up quickly over time.

If our only purpose is to secure traffic between services in Azure, then this is where service endpoints provide an alternative.

The Test Environment

Like Part 1 of this series, for simplicity, I will continue to use a test environment that consists of:

  • A storage account hosting a CSV file of data.
  • A PowerShell function app based on an HTTP trigger that will retrieve the data from the CSV file.

Securing the storage account with a service endpoint

This time, the first step to securing the storage account is to use the firewall to disable public access and restrict access to selected virtual networks:

In this case I am connecting to the subnet used in Part 1 of this series. Similar to disabling public network access completely, by restricting access to specific virtual networks I have now disabled access from the internet:
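For anyone scripting the change, a rough Az PowerShell equivalent looks like the following. I am assuming the delegated “functionapps” subnet from Part 1 is the one being allowed, along with illustrative resource names:

# Enable the Microsoft.Storage service endpoint on the integration subnet,
# preserving its existing delegation
$vnet   = Get-AzVirtualNetwork -Name "demo-vnet" -ResourceGroupName "demo-rg"
$subnet = Get-AzVirtualNetworkSubnetConfig -Name "functionapps" -VirtualNetwork $vnet
Set-AzVirtualNetworkSubnetConfig -Name "functionapps" -VirtualNetwork $vnet `
    -AddressPrefix $subnet.AddressPrefix -Delegation $subnet.Delegations `
    -ServiceEndpoint "Microsoft.Storage" | Out-Null
$vnet = $vnet | Set-AzVirtualNetwork

# Allow that subnet through the storage firewall and deny everything else
$subnetId = ($vnet.Subnets | Where-Object Name -eq "functionapps").Id
Add-AzStorageAccountNetworkRule -ResourceGroupName "demo-rg" -Name "apptestrmcn0" -VirtualNetworkResourceId $subnetId
Update-AzStorageAccountNetworkRuleSet -ResourceGroupName "demo-rg" -Name "apptestrmcn0" -DefaultAction Deny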

and my function app no longer works:

Securing the service traffic in Azure by linking the function app to the storage account

Just like with private endpoints, I need to link the function app to the service endpoint that was created when I restricted access to specific virtual network subnets. This is done in exactly the same way as for private endpoints, by enabling “virtual network integration” and linking it to the same subnet allowed on the storage account.

Now our environment will look like this:

  • The function app is integrated with the virtual network
  • The storage account firewall allows the same subnet used for the virtual network integration
  • The storage account is now only accessible from within the Azure fabric, not across the open internet
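A quick way to confirm the storage side of this setup is to read the network rule set back; the default action should be Deny, with a single virtual network rule for the integration subnet (resource names below are illustrative):

# Inspect the storage account firewall configuration
$ruleSet = Get-AzStorageAccountNetworkRuleSet -ResourceGroupName "demo-rg" -Name "apptestrmcn0"
$ruleSet.DefaultAction                                              # expected: Deny
$ruleSet.VirtualNetworkRules | Format-Table VirtualNetworkResourceId, State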

And now, my function app is again returning data from the file on the storage account:

In part 3, I will demonstrate how to present a protected address with traffic secured between each layer.

Securing service traffic in Azure – Part 1

A question I come across a lot is how to secure traffic between services running in Azure. In this multi-part series, I will look at using two different approaches for securing service traffic in Azure using a Function App and a storage account as my example services.

The test environment

For the purposes of this post, I am going to use a simple function app that communicates with a storage account and retrieves the content of a file. Granted, I could get the content directly from the storage account, but this would not demonstrate the required functionality. Therefore the demo environment will consist of:

  • A storage account hosting a CSV file full of data.
  • A PowerShell function app based on an HTTP trigger that will retrieve the data from the CSV file.
  • And an account with contributor rights to the resource group in which everything is being created.
Function App accessing storage file

When called, our function app will write the contents of the CSV to the browser:

Function app returned data

Concerns about the described environment

Now, while our communications are over HTTPS and using access keys (for the sake of this demonstration), concerns are sometimes raised that the communication, encrypted or not, travels over the open internet. In addition, for this communication to occur over the open internet, our data source (the storage account) requires a public endpoint. All it takes is a simple misstep and our data could be exposed to the internet.

So how do we restrict public access to our source and ensure communications occur over a more secure channel?

Microsoft provide two mechanisms to secure the connection to the data source: private endpoints and service endpoints.

In this post, I am going to focus on private endpoints.

Securing the storage account with a private endpoint

Our first step is to secure the storage account from public access by disabling “Public Network Access” in the Firewall and Virtual Networks tab:

securing the storage account
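The same change can be scripted; recent versions of the Az.Storage module expose a -PublicNetworkAccess parameter for this (a minimal sketch, with an illustrative resource group name):

# Disable all public network access to the storage account
Set-AzStorageAccount -ResourceGroupName "demo-rg" -Name "apptestrmcn0" -PublicNetworkAccess "Disabled"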

With this committed, you can see that our function app no longer returns the content of the data file:

Securing Azure storage account prevents services from accessing it

To allow secured access to our data store, I am now going to need to create a private endpoint for the storage account. For this demo, I am going to call the instance “storage-endpoint” and attach it to the blob resource:

and now I need a virtual network to attach the endpoint to. For this demo, I have created a virtual network and added a subnet “privateips” to place private endpoints in:

I am accepting the defaults for DNS:

and then creating the private endpoint. As I accepted the default DNS settings, this will also create a privatelink DNS zone to host the A record for the private endpoint:
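The same steps can be scripted with Az PowerShell; a rough sketch (the storage account name is from this demo, the remaining names are assumptions):

$sa     = Get-AzStorageAccount -ResourceGroupName "demo-rg" -Name "apptestrmcn0"
$vnet   = Get-AzVirtualNetwork -Name "demo-vnet" -ResourceGroupName "demo-rg"
$subnet = Get-AzVirtualNetworkSubnetConfig -Name "privateips" -VirtualNetwork $vnet

# Private link connection targeting the blob sub-resource of the storage account
$plsConnection = New-AzPrivateLinkServiceConnection -Name "storage-endpoint" `
    -PrivateLinkServiceId $sa.Id -GroupId "blob"

# Create the private endpoint in the "privateips" subnet
New-AzPrivateEndpoint -ResourceGroupName "demo-rg" -Name "storage-endpoint" `
    -Location "australiaeast" -Subnet $subnet -PrivateLinkServiceConnection $plsConnection

Note that when scripted this way, the privatelink DNS zone and its zone group are not created for you; they need to be set up separately (for example with New-AzPrivateDnsZone, New-AzPrivateDnsVirtualNetworkLink and New-AzPrivateDnsZoneGroup), whereas the portal defaults I accepted handle this automatically.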

I have now successfully secured the blob storage from the internet:

However, as demonstrated earlier, our function app also has no access. I will now show you how to remedy that.

Connecting the function app to the private endpoint

Before we go any further, it is worth noting that the SSL certificate for the storage account is issued against “apptestrmcn0.blob.core.windows.net” and I have just created a private endpoint with the name “apptestrmcn0.privatelink.blob.core.windows.net”. Any service trying to connect using the privatelink name directly is going to fail due to a certificate mismatch. Not to worry: if I resolve the original name of the storage account from a host inside the same virtual network, you will see that the FQDN now maps to the new private endpoint:

As the storage account has now been isolated to private endpoint traffic only, I need to connect the function app to the virtual network too. This is achieved via “VNet integration”, which relies on service delegations. Service delegations require a dedicated subnet per delegation type, so I cannot use the same subnet as our private endpoint. For the sake of simplicity, I am going to use a separate subnet called “functionapps” within the same virtual network to demonstrate the functionality.
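A sketch of creating that delegated subnet with Az PowerShell (the names and address prefix are illustrative); the integration itself is then linked through the portal as shown next:

$vnet = Get-AzVirtualNetwork -Name "demo-vnet" -ResourceGroupName "demo-rg"

# Dedicated subnet for the function app's VNet integration
Add-AzVirtualNetworkSubnetConfig -Name "functionapps" -VirtualNetwork $vnet -AddressPrefix "10.0.2.0/24" | Out-Null
$vnet = $vnet | Set-AzVirtualNetwork

# Delegate the new subnet to App Service / Functions
$subnet = Get-AzVirtualNetworkSubnetConfig -Name "functionapps" -VirtualNetwork $vnet
Add-AzDelegation -Name "webapp-delegation" -ServiceName "Microsoft.Web/serverFarms" -Subnet $subnet | Out-Null
$vnet | Set-AzVirtualNetwork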

VNET integration is configured via the Networking tab for the function app:

When I select VNet integration, I then select our function apps subnet:

When I click Connect, the service delegation happens in the background (if the subnet is not already delegated) and the function app gets a connection to the subnet. In the next window, as our function app’s sole purpose is to fetch data from the blob, I disable internet traffic for the function app and apply the change:

This is the last of the configuration. It should look like this:

And as you can see, our function app is again able to retrieve data from the storage account:

In “Securing service traffic in Azure – Part 2” I will look at the alternative to private endpoints.

New enhanced connection troubleshoot for Azure Networking

On 1 March 2023, Microsoft announced that the “New enhanced connection troubleshoot” capability for Azure Network Watcher had gone GA. Previously, Azure Network Watcher provided specialised standalone tools for network troubleshooting; these have now been consolidated into one place, with additional tests and actionable insights to assist with troubleshooting.

Complex network paths
Network Troubleshooting can be difficult and time consuming.

With customers migrating advanced, high-performance workloads to Azure, it’s essential to have better oversight and management of the intricate networks that support these workloads. A lack of visibility can make it challenging to diagnose issues, leaving customers with limited control and feeling trapped in a “black box.” To enhance your network troubleshooting experience, Azure Network Watcher combines these tools with the following features:

  • Unified solution for troubleshooting NSGs, user-defined routes, and blocked ports
  • Actionable insights with step-by-step guidance to resolve issues
  • Identification of configuration issues impacting connectivity, such as:
      • NSG rules that are blocking traffic
      • Inability to open a socket at the specified source port
      • No servers listening on designated destination ports
      • Misconfigured or missing routes

These new features are not available via the portal at the moment:

connection troubleshoot via portal does not display enhanced connection troubleshoot results
Connection Troubleshooting via the portal

The portal will display that there are connectivity issues, but will not provide the enhanced information. That is accessible via PowerShell, the Azure CLI and the REST API. Using PowerShell, I will now show the real reason this connection is failing.

Accessing “enhanced connection troubleshoot” output via PowerShell

I am using the following PowerShell to test the connection between the two machines:

# Network Watcher instance for the region, plus the source and destination VMs
$nw = Get-AzNetworkWatcher -Location australiaeast
$svm = Get-AzVM -Name Machine1
$dvm = Get-AzVM -Name Machine2

# Test connectivity from Machine1 to Machine2 on TCP port 445
Test-AzNetworkWatcherConnectivity -NetworkWatcher $nw -SourceId $svm.Id -DestinationId $dvm.Id -DestinationPort 445

This returns the following output:

ConnectionStatus : Unreachable
AvgLatencyInMs   :
MinLatencyInMs   :
MaxLatencyInMs   :
ProbesSent       : 30
ProbesFailed     : 30
Hops             : [
                     {
                       "Type": "Source",
                       "Id": "a49b4961-b82f-49da-ae2c-8470a9f4c8a6",
                       "Address": "10.0.0.4",
                       "ResourceId": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/CONNECTIVITYTEST/providers/Microsoft.Compute/virtualMachines/Machine1",
                       "NextHopIds": [
                         "6c6f06de-ea3c-45e3-8a1d-372624475ced"
                       ],
                       "Issues": [
                         {
                           "Origin": "Local",
                           "Severity": "Error",
                           "Type": "GuestFirewall",
                           "Context": []
                         }
                       ]
                     },
                     {
                       "Type": "VirtualMachine",
                       "Id": "6c6f06de-ea3c-45e3-8a1d-372624475ced",
                       "Address": "172.16.0.4",
                       "ResourceId": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/CONNECTIVITYTEST/providers/Microsoft.Compute/virtualMachines/Machine2",
                       "NextHopIds": [],
                       "Issues": []
                     }
                   ]

As you can see, the issues discovered are explained in more detail; in this case, the guest firewall on the source machine is affecting the communication. If we check the local Defender firewall, we can see there is a specific rule blocking this traffic:

Blocked outbound protocols

If we remove the local firewall rule, connectivity is restored:

ConnectionStatus : Reachable
AvgLatencyInMs   : 1
MinLatencyInMs   : 1
MaxLatencyInMs   : 2
ProbesSent       : 66
ProbesFailed     : 0
Hops             : [
                     {
                       "Type": "Source",
                       "Id": "f1b763a1-f7cc-48b6-aec7-f132d3fdadf8",
                       "Address": "10.0.0.4",
                       "ResourceId": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/CONNECTIVITYTEST/providers/Microsoft.Compute/virtualMachines/Machine1",
                       "NextHopIds": [
                         "7c9c103c-44ab-4fd8-9444-22354e5f9672"
                       ],
                       "Issues": []
                     },
                     {
                       "Type": "VirtualMachine",
                       "Id": "7c9c103c-44ab-4fd8-9444-22354e5f9672",
                       "Address": "172.16.0.4",
                       "ResourceId": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/CONNECTIVITYTEST/providers/Microsoft.Compute/virtualMachines/Machine2",
                       "NextHopIds": [],
                       "Issues": []
                     }
                   ]

The enhanced connection troubleshoot can detect 6 fault types:

  • Source high CPU utilisation
  • Source high memory utilisation
  • Source Guest firewall
  • DNS resolution
  • Network security rule configuration
  • User defined route configuration

The first four faults are returned by the Network Watcher Agent extension for Windows, as demonstrated above. The remaining two faults come from the Azure fabric. As you can see below, when a network security group is misconfigured on the source or destination, our issue returns, but the output clearly displays where the fault lies and which network security group rule is responsible:

ConnectionStatus : Unreachable
AvgLatencyInMs   :
MinLatencyInMs   :
MaxLatencyInMs   :
ProbesSent       : 30
ProbesFailed     : 30
Hops             : [
                     {
                       "Type": "Source",
                       "Id": "3cbcbdbe-a6ec-454f-ad2e-946d6731278a",
                       "Address": "10.0.0.4",
                       "ResourceId": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/CONNECTIVITYTEST/providers/Microsoft.Compute/virtualMachines/Machine1",
                       "NextHopIds": [
                         "29e33dac-45ae-4ea3-8a9d-83dccddcc0eb"
                       ],
                       "Issues": []
                     },
                     {
                       "Type": "VirtualMachine",
                       "Id": "29e33dac-45ae-4ea3-8a9d-83dccddcc0eb",
                       "Address": "172.16.0.4",
                       "ResourceId": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/CONNECTIVITYTEST/providers/Microsoft.Compute/virtualMachines/Machine2",
                       "NextHopIds": [],
                       "Issues": [
                         {
                           "Origin": "Inbound",
                           "Severity": "Error",
                           "Type": "NetworkSecurityRule",
                           "Context": [
                             {
                               "key": "RuleName",
                               "value": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/ConnectivityTest/providers/Microsoft.Network/networkSecurityGroups/Ma
                   chine2-nsg/SecurityRules/DenyAnyCustom445Inbound"
                             }
                           ]
                         }
                       ]
                     }
                   ]
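As noted above, the first four fault types depend on the Network Watcher Agent extension being present on the source machine. A quick way to confirm it is installed on a Windows VM, and to add it if it is missing (using the VM and resource group from this test):

# Check whether the Network Watcher Agent extension is installed on the source VM
Get-AzVMExtension -ResourceGroupName "CONNECTIVITYTEST" -VMName "Machine1" |
    Where-Object { $_.Publisher -eq "Microsoft.Azure.NetworkWatcher" } |
    Format-Table Name, Publisher, ProvisioningState

# Install the extension if it is missing
Set-AzVMExtension -ResourceGroupName "CONNECTIVITYTEST" -VMName "Machine1" `
    -Name "NetworkWatcherAgentWindows" -Publisher "Microsoft.Azure.NetworkWatcher" `
    -ExtensionType "NetworkWatcherAgentWindows" -TypeHandlerVersion "1.4" -Location "australiaeast"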

In addition to the fault detection, IP Flow is also a part of the enhanced connection troubleshoot, providing a list of hops to a service. An excerpt of a trace to a public storage account is below:

PS C:\temp> Test-AzNetworkWatcherConnectivity -NetworkWatcher $nw -SourceId $svm.Id -DestinationAddress https://announcementtest.blob.core.windows.net/test1 -DestinationPort 443

ConnectionStatus : Reachable
AvgLatencyInMs   : 1
MinLatencyInMs   : 1
MaxLatencyInMs   : 1
ProbesSent       : 66
ProbesFailed     : 0
Hops             : [
                     {
                       "Type": "Source",
                       "Id": "23eb09fd-b5fa-4be1-83f2-caf09d18ada0",
                       "Address": "10.0.0.4",
                       "ResourceId": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/CONNECTIVITYTEST/providers/Microsoft.Compute/virtualMachines/Machine1",
                       "NextHopIds": [
                         "78f3961c-9937-4679-97a7-4a19f4d1232a"
                       ],
                       "Issues": []
                     },
                     {
                       "Type": "PublicLoadBalancer",
                       "Id": "78f3961c-9937-4679-97a7-4a19f4d1232a",
                       "Address": "20.157.155.128",
                       "NextHopIds": [
                         "574ad521-7ab7-470c-b5aa-f1b4e6088888",
                         "e717c4bd-7916-45bd-b3d1-f8eecc7ed1e3",
                         "cbe6f6a6-4281-402c-a81d-c4e3d30d2247",
                         "84769cde-3f92-4134-8d48-82141f2d9bfd",
                         "aa7c2b73-0892-4d15-96c6-45b9b033829c",
                         "1c3e3043-98f2-4510-b37f-307d3a98a55b",
                         "b97778cb-9ece-4e87-bf6d-71b90fac3847",
                         "cb92d16d-d4fe-4233-b958-a4d3dbe78303",
                         "ec9a2753-3a60-4fce-9d92-7dbbc0d0219d",
                         "df2b1a3e-6555-424c-8e48-5cc0feba3623"
                       ],
                       "Issues": []
                     },
                     {
                       "Type": "VirtualNetwork",
                       "Id": "574ad521-7ab7-470c-b5aa-f1b4e6088888",
                       "Address": "10.124.144.2",
                       "NextHopIds": [],
                       "Issues": []
                     },
                     {
                       "Type": "VirtualNetwork",
                       "Id": "e717c4bd-7916-45bd-b3d1-f8eecc7ed1e3",
                       "Address": "10.124.146.2",
                       "NextHopIds": [],
                       "Issues": []
                     },

Centralising the troubleshooting tools under one command is obviously a great enhancement, but the added visibility into configuration and system performance makes this a great update for your troubleshooting toolbox.