On 1 March 2023, Microsoft announced that the “New enhanced connection troubleshoot” for Azure Network Watcher has gone GA (generally available). Previously, Azure Network Watcher provided specialised standalone tools for network troubleshooting. These have now been consolidated into one place, with additional tests and actionable insights to assist with troubleshooting.
With customers migrating advanced, high-performance workloads to Azure, it’s essential to have better oversight and management of the intricate networks that support these workloads. A lack of visibility can make it challenging to diagnose issues, leaving customers with limited control and feeling trapped in a “black box.” To enhance your network troubleshooting experience, Azure Network Watcher combines these tools with the following features:
- Unified solution for troubleshooting NSGs, user-defined routes, and blocked ports
- Actionable insights with a step-by-step guide to resolve issues
- Identifying configuration issues impacting connectivity:
  - NSG rules that are blocking traffic
  - Inability to open a socket at the specified source port
  - No servers listening on designated destination ports
  - Misconfigured or missing routes
These new features are not available via the portal at the moment. The portal shows that there is a connectivity issue but does not provide the enhanced information, which is accessible via PowerShell, the Azure CLI and the REST API. Below, I use PowerShell to find the real reason the connection is failing.
Accessing “enhanced connection troubleshoot” output via PowerShell
I am using the following PowerShell to test the connection between the two machines:
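# Network Watcher instance for the region that hosts the source VM
$nw = Get-AzNetworkWatcher -Location australiaeast
# Source and destination virtual machines
$svm = Get-AzVM -Name Machine1
$dvm = Get-AzVM -Name Machine2
# Test connectivity from Machine1 to Machine2 on TCP port 445 (SMB)
Test-AzNetworkWatcherConnectivity -NetworkWatcher $nw -SourceId $svm.Id -DestinationId $dvm.Id -DestinationPort 445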
This returns the following output, with the hops listed as JSON:
ConnectionStatus : Unreachable
AvgLatencyInMs   :
MinLatencyInMs   :
MaxLatencyInMs   :
ProbesSent       : 30
ProbesFailed     : 30
Hops             : [
  {
    "Type": "Source",
    "Id": "a49b4961-b82f-49da-ae2c-8470a9f4c8a6",
    "Address": "10.0.0.4",
    "ResourceId": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/CONNECTIVITYTEST/providers/Microsoft.Compute/virtualMachines/Machine1",
    "NextHopIds": [
      "6c6f06de-ea3c-45e3-8a1d-372624475ced"
    ],
    "Issues": [
      {
        "Origin": "Local",
        "Severity": "Error",
        "Type": "GuestFirewall",
        "Context": []
      }
    ]
  },
  {
    "Type": "VirtualMachine",
    "Id": "6c6f06de-ea3c-45e3-8a1d-372624475ced",
    "Address": "172.16.0.4",
    "ResourceId": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/CONNECTIVITYTEST/providers/Microsoft.Compute/virtualMachines/Machine2",
    "NextHopIds": [],
    "Issues": []
  }
]
As you can see, the issues discovered are explained in more detail. In this case, the source hop reports an issue of type GuestFirewall, meaning the local firewall on the source machine is blocking the communication. Checking the local Defender firewall confirms that there is a specific rule blocking this traffic.
If we remove the local firewall rule, connectivity is restored:
ConnectionStatus : Reachable
AvgLatencyInMs   : 1
MinLatencyInMs   : 1
MaxLatencyInMs   : 2
ProbesSent       : 66
ProbesFailed     : 0
Hops             : [
  {
    "Type": "Source",
    "Id": "f1b763a1-f7cc-48b6-aec7-f132d3fdadf8",
    "Address": "10.0.0.4",
    "ResourceId": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/CONNECTIVITYTEST/providers/Microsoft.Compute/virtualMachines/Machine1",
    "NextHopIds": [
      "7c9c103c-44ab-4fd8-9444-22354e5f9672"
    ],
    "Issues": []
  },
  {
    "Type": "VirtualMachine",
    "Id": "7c9c103c-44ab-4fd8-9444-22354e5f9672",
    "Address": "172.16.0.4",
    "ResourceId": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/CONNECTIVITYTEST/providers/Microsoft.Compute/virtualMachines/Machine2",
    "NextHopIds": [],
    "Issues": []
  }
]
The enhanced connection troubleshoot can detect six fault types:
- Source high CPU utilisation
- Source high memory utilisation
- Source guest firewall
- DNS resolution
- Network security rule configuration
- User-defined route configuration
The first four faults are reported by the Network Watcher Agent extension for Windows, as demonstrated above. The remaining two come from the Azure fabric. As you can see below, when a network security group (NSG) is misconfigured on the source or destination, the connection fails again, but the output clearly shows where the fault lies and which NSG rule is responsible:
ConnectionStatus : Unreachable
AvgLatencyInMs   :
MinLatencyInMs   :
MaxLatencyInMs   :
ProbesSent       : 30
ProbesFailed     : 30
Hops             : [
  {
    "Type": "Source",
    "Id": "3cbcbdbe-a6ec-454f-ad2e-946d6731278a",
    "Address": "10.0.0.4",
    "ResourceId": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/CONNECTIVITYTEST/providers/Microsoft.Compute/virtualMachines/Machine1",
    "NextHopIds": [
      "29e33dac-45ae-4ea3-8a9d-83dccddcc0eb"
    ],
    "Issues": []
  },
  {
    "Type": "VirtualMachine",
    "Id": "29e33dac-45ae-4ea3-8a9d-83dccddcc0eb",
    "Address": "172.16.0.4",
    "ResourceId": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/CONNECTIVITYTEST/providers/Microsoft.Compute/virtualMachines/Machine2",
    "NextHopIds": [],
    "Issues": [
      {
        "Origin": "Inbound",
        "Severity": "Error",
        "Type": "NetworkSecurityRule",
        "Context": [
          {
            "key": "RuleName",
            "value": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/ConnectivityTest/providers/Microsoft.Network/networkSecurityGroups/Machine2-nsg/SecurityRules/DenyAnyCustom445Inbound"
          }
        ]
      }
    ]
  }
]
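With more than a couple of hops, it can help to flatten the Issues arrays into a short list of findings. The following is a minimal sketch rather than a definitive implementation. It parses sample hops copied (and trimmed to the relevant fields) from the NSG failure output above, so it runs without an Azure subscription. The variable names $hopsJson and $findings are my own, and it is an assumption that the object returned by Test-AzNetworkWatcherConnectivity exposes the same property names as the JSON shown.

```powershell
# Sample hops trimmed from the NSG failure output above (subscription ID redacted as in the post).
$hopsJson = @'
[
  { "Type": "Source", "Address": "10.0.0.4", "Issues": [] },
  { "Type": "VirtualMachine", "Address": "172.16.0.4", "Issues": [
      { "Origin": "Inbound", "Severity": "Error", "Type": "NetworkSecurityRule",
        "Context": [ { "key": "RuleName", "value": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/ConnectivityTest/providers/Microsoft.Network/networkSecurityGroups/Machine2-nsg/SecurityRules/DenyAnyCustom445Inbound" } ] }
  ] }
]
'@

$hops = ConvertFrom-Json -InputObject $hopsJson

# Emit one object per issue, carrying the hop it was reported on.
$findings = foreach ($hop in $hops) {
    foreach ($issue in $hop.Issues) {
        [pscustomobject]@{
            HopType  = $hop.Type
            Address  = $hop.Address
            Origin   = $issue.Origin
            Severity = $issue.Severity
            Issue    = $issue.Type
            # Context is a list of key/value pairs; join them into one string.
            Context  = ($issue.Context | ForEach-Object { "$($_.key)=$($_.value)" }) -join '; '
        }
    }
}

$findings | Format-List
```

Against a live result, the same loops could presumably be pointed at the Hops property of the returned object instead of the parsed sample, subject to the assumption stated above.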
In addition to fault detection, IP Flow is also part of the enhanced connection troubleshoot, providing a list of hops to a service. An excerpt of a trace to a public storage account is shown below:
PS C:\temp> Test-AzNetworkWatcherConnectivity -NetworkWatcher $nw -SourceId $svm.Id -DestinationAddress https://announcementtest.blob.core.windows.net/test1 -DestinationPort 443

ConnectionStatus : Reachable
AvgLatencyInMs   : 1
MinLatencyInMs   : 1
MaxLatencyInMs   : 1
ProbesSent       : 66
ProbesFailed     : 0
Hops             : [
  {
    "Type": "Source",
    "Id": "23eb09fd-b5fa-4be1-83f2-caf09d18ada0",
    "Address": "10.0.0.4",
    "ResourceId": "/subscriptions/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/resourceGroups/CONNECTIVITYTEST/providers/Microsoft.Compute/virtualMachines/Machine1",
    "NextHopIds": [
      "78f3961c-9937-4679-97a7-4a19f4d1232a"
    ],
    "Issues": []
  },
  {
    "Type": "PublicLoadBalancer",
    "Id": "78f3961c-9937-4679-97a7-4a19f4d1232a",
    "Address": "20.157.155.128",
    "NextHopIds": [
      "574ad521-7ab7-470c-b5aa-f1b4e6088888",
      "e717c4bd-7916-45bd-b3d1-f8eecc7ed1e3",
      "cbe6f6a6-4281-402c-a81d-c4e3d30d2247",
      "84769cde-3f92-4134-8d48-82141f2d9bfd",
      "aa7c2b73-0892-4d15-96c6-45b9b033829c",
      "1c3e3043-98f2-4510-b37f-307d3a98a55b",
      "b97778cb-9ece-4e87-bf6d-71b90fac3847",
      "cb92d16d-d4fe-4233-b958-a4d3dbe78303",
      "ec9a2753-3a60-4fce-9d92-7dbbc0d0219d",
      "df2b1a3e-6555-424c-8e48-5cc0feba3623"
    ],
    "Issues": []
  },
  {
    "Type": "VirtualNetwork",
    "Id": "574ad521-7ab7-470c-b5aa-f1b4e6088888",
    "Address": "10.124.144.2",
    "NextHopIds": [],
    "Issues": []
  },
  {
    "Type": "VirtualNetwork",
    "Id": "e717c4bd-7916-45bd-b3d1-f8eecc7ed1e3",
    "Address": "10.124.146.2",
    "NextHopIds": [],
    "Issues": []
  },
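The NextHopIds values link each hop to the ones that follow it, so the path can be printed as a list of edges. This is a minimal sketch under stated assumptions: the sample data contains only the hops visible in the excerpt above, with the load balancer's NextHopIds list trimmed to the two hops shown, and the names $hopsJson, $byId and $edges are my own.

```powershell
# Hops taken from the excerpt above; fields not needed for the path are omitted.
$hopsJson = @'
[
  { "Type": "Source", "Id": "23eb09fd-b5fa-4be1-83f2-caf09d18ada0", "Address": "10.0.0.4",
    "NextHopIds": [ "78f3961c-9937-4679-97a7-4a19f4d1232a" ] },
  { "Type": "PublicLoadBalancer", "Id": "78f3961c-9937-4679-97a7-4a19f4d1232a", "Address": "20.157.155.128",
    "NextHopIds": [ "574ad521-7ab7-470c-b5aa-f1b4e6088888", "e717c4bd-7916-45bd-b3d1-f8eecc7ed1e3" ] },
  { "Type": "VirtualNetwork", "Id": "574ad521-7ab7-470c-b5aa-f1b4e6088888", "Address": "10.124.144.2",
    "NextHopIds": [] },
  { "Type": "VirtualNetwork", "Id": "e717c4bd-7916-45bd-b3d1-f8eecc7ed1e3", "Address": "10.124.146.2",
    "NextHopIds": [] }
]
'@

$hops = ConvertFrom-Json -InputObject $hopsJson

# Index the hops by Id so each NextHopIds entry can be resolved to a hop.
$byId = @{}
foreach ($hop in $hops) { $byId[$hop.Id] = $hop }

# One line per edge; an Id that is missing from the data is reported rather than guessed.
$edges = foreach ($hop in $hops) {
    foreach ($nextId in $hop.NextHopIds) {
        if ($byId.ContainsKey($nextId)) {
            $next = $byId[$nextId]
            '{0} ({1}) -> {2} ({3})' -f $hop.Type, $hop.Address, $next.Type, $next.Address
        } else {
            '{0} ({1}) -> {2} (not in data)' -f $hop.Type, $hop.Address, $nextId
        }
    }
}

$edges
```

Resolving Ids through a lookup table rather than relying on array order keeps the sketch usable when a hop fans out to several next hops, as the load balancer does here.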
Centralising the troubleshooting tools under one command is a great enhancement in itself, and the increased visibility into configuration and system performance makes this a valuable update for your troubleshooting toolbox.