This week I’ve been working with a customer who is experiencing intermittent issues connecting to Windows Virtual Desktop (WVD) from the RD Client installed on their corporate Windows 10 devices, and I was asked to put together a list of troubleshooting steps that their IT team could follow to help find and resolve the root cause.
I reached out to Jim Moyle, our aligned WVD Global Black Belt, for his thoughts. In this post I want to share those initial troubleshooting steps, along with the rationale for each, and put them out to the wider WVD community for feedback. Over the coming weeks I’ll update this article with the identified root cause and the steps taken to resolve the issue.
For clarity, this customer is using the Fall 2019 release of Windows Virtual Desktop.
So, what is the problem?
As introduced, the customer reports that certain users (not all) are intermittently unable to connect to their WVD resource using the RD Client installed on their corporate Windows 10 devices. Furthermore, they never receive any errors (such as resources not being available), nor have they indicated that the connection attempt times out (I’ll double-check this and update if required); it simply doesn’t connect.
The screenshot below, provided by an affected user, shows the RD Client attempting to connect to a selected remote desktop.
Before delving any deeper and defining the tests to be undertaken, let’s quickly recap how a user connects to WVD. That way, when we do start defining a particular test, we will better understand the rationale behind it, that is, what we are trying to prove or disprove.
The diagram below shows the WVD connection flow.
Step 1 > The user launches the RD Client, which connects to Azure AD; the user signs in and Azure AD returns a token.
Step 2 > The RD Client uses that token to authenticate to Web Access, and the Broker then queries the database to determine the resources (Remote Apps and Desktops) that the user is assigned to.
Step 3 > The user selects a resource (Remote App or Desktop) and the RD Client connects to the Gateway.
Step 4 > Finally, the Broker orchestrates the connection from the WVD instance (the Azure VM) to the Gateway (aka Reverse Connect).
Note that on start-up the RD Client always refreshes your feed, as shown below. This is the RD Client running through steps 1 and 2 of the connection flow above.
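As a rough triage aid, the Python sketch below maps what a user observed to the connection-flow steps most worth investigating. The `Observation` fields and the `steps_to_investigate` helper are hypothetical and not part of any WVD tooling. The rule is deliberately coarse. If the start-up feed refresh fails, look at steps 1 and 2. If the feed loads but the session never opens (the symptom described above), look at steps 3 and 4.

```python
from dataclasses import dataclass

# Connection-flow steps as described above (step number -> what happens).
FLOW_STEPS = {
    1: "RD Client signs in to Azure AD and receives a token",
    2: "RD Client authenticates to Web Access; Broker looks up assigned resources",
    3: "RD Client connects to the Gateway for the selected resource",
    4: "Broker orchestrates reverse connect from the session host VM to the Gateway",
}


@dataclass
class Observation:
    """Hypothetical record of what a user saw during one connection attempt."""
    feed_refreshed: bool    # did the start-up feed refresh complete (steps 1-2)?
    session_launched: bool  # did the selected desktop/app actually open (steps 3-4)?


def steps_to_investigate(obs: Observation) -> list[int]:
    """Return the connection-flow steps most worth investigating (coarse rule)."""
    if not obs.feed_refreshed:
        # The feed never loaded, so the failure is in sign-in or feed retrieval.
        return [1, 2]
    if not obs.session_launched:
        # The feed loaded, so steps 1-2 worked; look at the Gateway and reverse connect.
        return [3, 4]
    return []  # Nothing failed on this attempt.


if __name__ == "__main__":
    # The customer's symptom: the feed loads but the desktop never connects.
    for step in steps_to_investigate(Observation(feed_refreshed=True, session_launched=False)):
        print(f"Step {step}: {FLOW_STEPS[step]}")
```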
Now, let’s quickly cover some basic assumptions before we get into the testing. Again, I’ll update these as I speak with the customer’s IT team and understand more of the nuances of the issue.
Assumption 1 > WVD is a global service and, for resilience, it operates many instances of the WVD control plane (the back-end services, such as Web Access, Broker and Gateway, shown in the connection flow diagram) in multiple regions. However, depending on availability at the time, the control plane managing your users’ connections into WVD may not be running in the same region as the WVD VMs themselves. Azure uses its Front Door (and Traffic Manager) services to provide a resilient and optimised connection to the control plane, so let’s assume a control plane is always up and available.
However, if we want to confirm that the control plane you’re using is healthy, we could use the PowerShell commands below.
# Import Fall 2019 WVD module
Import-Module -Name Microsoft.RDInfra.RDPowerShell
# Connect to WVD
Add-RdsAccount -DeploymentUrl "https://rdbroker.wvd.microsoft.com"
# Get control plane info
Invoke-RestMethod -Uri "https://rdweb.wvd.microsoft.com/api/health"
The results of the script are shown below. Note that the service is reporting as healthy and, more importantly, that the Region URL shows the actual control plane you’re using.
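If the RDInfra PowerShell module isn’t installed on an affected device, you can query the same health endpoint with standard-library Python. `Invoke-RestMethod` in the script above doesn’t use the session created by `Add-RdsAccount`, so this sketch assumes an unauthenticated GET is enough. This post doesn’t document the exact field names in the response. Rather than hard-coding a key, `region_fields` (a hypothetical helper) picks out any field whose name mentions "region".

```python
import json
import urllib.request

# Same endpoint the PowerShell script above calls.
HEALTH_URL = "https://rdweb.wvd.microsoft.com/api/health"


def fetch_health(url: str = HEALTH_URL, timeout: float = 10.0) -> dict:
    """GET the control-plane health endpoint and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def region_fields(payload: dict) -> dict:
    """Return the fields whose name contains 'region' (case-insensitive).

    Assumption: the response schema is not documented here, so this matches on
    the field name instead of relying on one specific key.
    """
    return {key: value for key, value in payload.items() if "region" in key.lower()}
```

For example, running `print(region_fields(fetch_health()))` on the affected device should show the same Region URL as the PowerShell output.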
Assumption 2 > This is only affecting certain users: other users can successfully connect to the same WVD resource at the same time that others cannot, which indicates that the WVD VMs themselves are healthy.
Again, if we want to verify the health of the WVD session hosts within a given host pool, we could use the PowerShell commands below.
# Import Fall 2019 WVD module
Import-Module -Name Microsoft.RDInfra.RDPowerShell
# Connect to WVD
Add-RdsAccount -DeploymentUrl "https://rdbroker.wvd.microsoft.com"
# Get session host status (replace WVD-Tenant-Name and WVD-Host-Pool-Name with your own values)
Get-RdsSessionHost -TenantName WVD-Tenant-Name -HostPoolName WVD-Host-Pool-Name | Select-Object SessionHostName, Status
The results will be shown as below.
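When a host pool has many session hosts, it can help to list only the ones that are not reporting as `Available`. The Python sketch below assumes the output above has been exported to JSON with the status as text, for example `Get-RdsSessionHost -TenantName WVD-Tenant-Name -HostPoolName WVD-Host-Pool-Name | Select-Object SessionHostName, @{Name='Status'; Expression={ "$($_.Status)" }} | ConvertTo-Json`. Without that calculated property, `ConvertTo-Json` can write enumeration values as numbers. The sketch also assumes that every status other than `Available` counts as unhealthy.

```python
import json

HEALTHY_STATUS = "Available"  # assumption: any other status is treated as unhealthy


def load_hosts(json_text: str) -> list[dict]:
    """Parse JSON exported from Get-RdsSessionHost.

    ConvertTo-Json writes a single object (not an array) when only one host is
    piped in, so wrap that case in a list.
    """
    data = json.loads(json_text)
    return data if isinstance(data, list) else [data]


def unhealthy_hosts(hosts: list[dict]) -> list[str]:
    """Return the names of session hosts whose Status is not 'Available'."""
    return [
        host["SessionHostName"]
        for host in hosts
        if str(host.get("Status")) != HEALTHY_STATUS
    ]
```

Pass the exported JSON text to `load_hosts`, then pass the result to `unhealthy_hosts`. An empty list is consistent with Assumption 2.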
Assumption 3 > This is only affecting users on their corporate devices. I’ll double-check this with the customer’s IT team and update if needed.
Assumption 4 > The customer has all of the WVD back-end services opened and available through the corporate firewall and web proxies. I know they use Cisco Umbrella for web proxy services, so that is something to be mindful of.
Assumption 5 > Users see the same issue whether they are on the corporate LAN, on the VPN or connected to the open internet from their home broadband (the majority of their staff are working from home).
So, what tests are we going to run when a user reports they cannot connect and why?
Test 1 > Can the user access the same WVD resource from the HTML5 interface at https://aka.ms/wvdweb?
Why? We need to narrow down whether the issue occurs only with connections initiated from the RD Client, and testing from the WVD web interface should help prove this. If the user can authenticate to the web interface, see all of their assigned WVD resources and then successfully log on to a desktop, that proves the issue is not with the control plane, their assignment or the WVD session host.
Test 2 > Can the user resolve the WVD Global URL in DNS?
Why? I’m almost certain that, whatever the root cause is, it will be environmental, that is, something occurring on that device at that exact time that is hindering the connection attempt. This is a very simple test to ensure that the device can resolve the WVD global URL, which is used to forward to the regional control plane instance and initiate the connections.
C:\Users\DeanLawrence>nslookup rdweb.wvd.microsoft.com
Server:  SkyRouter.Home
Address:  192.168.0.1
Non-authoritative answer:
Name:    waws-prod-db3-3dec2181.cloudapp.net
Address:  18.104.22.168
Aliases:  rdweb.wvd.microsoft.com
          rdweb-prod-geo.trafficmanager.net
          mrs-neur1c100-rdweb-prod.wvd-ase-neur1c100-prod.p.azurewebsites.net
          waws-prod-db3-3dec2181.sip.p.azurewebsites.windows.net
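`nslookup` queries the DNS server directly and bypasses the local hosts file. Applications such as the RD Client go through the operating system’s resolver instead. The standard-library Python sketch below resolves names the way an application would, so it can catch a hosts-file entry or resolver problem that `nslookup` would miss. `resolve` and `can_resolve` are hypothetical helper names. `rdweb.wvd.microsoft.com` is simply the global URL used above.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the unique IP addresses a hostname resolves to via the OS resolver.

    Raises socket.gaierror if the name cannot be resolved, which is exactly the
    failure this test is looking for.
    """
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def can_resolve(hostname: str) -> bool:
    """True if the hostname resolves to at least one address."""
    try:
        return bool(resolve(hostname))
    except socket.gaierror:
        return False
```

For example, `can_resolve("rdweb.wvd.microsoft.com")` should return `True` on a healthy device, and `resolve("rdweb.wvd.microsoft.com")` lists the addresses the device would actually use.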
We could build on that test slightly by using the Test-NetConnection cmdlet to test connectivity to the Region URL (returned by the health check under Assumption 1) over HTTP.
PS C:\Users\DeanLawrence> Test-NetConnection rdweb-neu-r1.wvd.microsoft.com -CommonTCPPort HTTP -InformationLevel Detailed
ComputerName            : rdweb-neu-r1.wvd.microsoft.com
RemoteAddress           : 22.214.171.124
RemotePort              : 80
NameResolutionResults   : 126.96.36.199
MatchingIPsecRules      :
NetworkIsolationContext : Internet
IsAdmin                 : False
InterfaceAlias          : WiFi
SourceAddress           : 192.168.0.124
NetRoute (NextHop)      : 192.168.0.1
TcpTestSucceeded        : True
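`-CommonTCPPort HTTP` checks TCP port 80 only. The RD Client talks to the WVD service over HTTPS on TCP 443, so it is worth testing 443 as well. In PowerShell, that is `Test-NetConnection rdweb-neu-r1.wvd.microsoft.com -Port 443`. The Python sketch below is a rough equivalent of the `TcpTestSucceeded` check for several ports at once. `tcp_port_open` and `check_ports` are hypothetical helper names. A successful TCP connection only shows that the port is reachable. It doesn’t rule out a proxy such as Cisco Umbrella interfering with the TLS traffic afterwards.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Rough equivalent of Test-NetConnection's TcpTestSucceeded for one port."""
    try:
        # create_connection resolves the name and tries each address until one connects.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_ports(host: str, ports=(80, 443), timeout: float = 5.0) -> dict:
    """Test each port in turn and return {port: reachable}."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

For example, `check_ports("rdweb-neu-r1.wvd.microsoft.com")` should report `True` for both 80 and 443 on a device that can reach that regional control plane.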
Test 3 > Clear RD Client Subscriptions and Re-Subscribe
Why? As mentioned earlier, on start-up the RD Client performs a refresh of the feed and caches the subscription details in the registry. This test looks at whether there could be an issue with either the cached settings or the authentication token. You can unsubscribe from within the RD Client itself by clicking the three-dot menu next to the tenant name and selecting Unsubscribe, or by running the command below from a command prompt.
msrdcw.exe /reset /f
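The IT team may want to script this step, for example as part of a support runbook. A thin Python wrapper around the same command is sketched below. `reset_command` and `run_reset` are hypothetical helpers. The sketch assumes `msrdcw.exe` is on the `PATH` (otherwise pass its full path) and that `/f` forces the reset without a confirmation prompt.

```python
import shutil
import subprocess
import sys


def reset_command(exe: str = "msrdcw.exe", force: bool = True) -> list[str]:
    """Build the argument list for 'msrdcw.exe /reset [/f]'."""
    args = [exe, "/reset"]
    if force:
        args.append("/f")  # assumed to skip the confirmation prompt
    return args


def run_reset(exe: str = "msrdcw.exe", force: bool = True) -> int:
    """Run the RD Client reset on Windows and return the process exit code."""
    if sys.platform != "win32":
        raise RuntimeError("The RD Client reset only applies to Windows devices")
    resolved = shutil.which(exe)
    if resolved is None:
        raise FileNotFoundError(f"{exe} was not found; pass its full path instead")
    completed = subprocess.run(reset_command(resolved, force))
    return completed.returncode
```

Because the reset clears the cached subscriptions, the user will need to subscribe to their feed again afterwards.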
Well, that’s it for now (16/05/20). If you have any suggestions, please send them my way on Twitter. I’ll update this post as soon as I’ve had a chance to work with the customer’s IT team and investigate these issues more closely.
Part 2 in this blog series is now up.