Dynatrace Performance Troubleshooting Example

They do say the exception proves the rule. In this example, I do deviate from how I normally troubleshoot a problem. But, it was a bit of an odd problem!

One of the account managers I work with was getting worried. They are using data from Dynatrace to calculate an SLA metric for end user response times. In the recent months the response time for a key transaction had jumped significantly. pushing the value into the red. The SLA reports the 95th percentile. So, I was asked to have a look to see what the problem could be.

Looking at the response time for the transaction the median is very good at 350ms so I switched to the slowest 10% (90th percentile) and that was just over one minute. So, the next step was to look at the waterfall diagram for one of the slow pages. The waterfall is shown in the graphic below:

The waterfall is showing the delay is for the OnLoad processing. Typically the OnLoad processing is the execution of JavaScript after the page is loaded.

Normally, I would then try to recreate the problem myself and try to profile the JavaScript with the browser developer tools but that takes time and I still had some more digging to do in Dynatrace. Next, I wanted to see how this looked for a users session and I noticed something odd. Here, is the session for a user.

There are few things of interest:

(1) The user is just using the problem page/transaction

(2) The page is called every minute. (This suggests that this is automatically called rather than initiated by the user)

(3) Response time is good and then goes bad, and is a pretty consistent value.

I was surprised that a page this slow didn’t already have users complaining about it. So, I decided to see if I knew any of the users and then I could just check this with them. As it happens I recognized one of the users and I gave them a call and the mystery was explained.

The config.aspx page is a status page used by certain users, It transpires that this is accessed from a desktop device within the data center which means users have to remote desktop onto the desktop from their laptops (don’t ask why). We, did some live tests and discovered it occurred when the browser was left open and the remote desktop went into sleep mode! Therefore, when we get slow pages there isn’t a real user at the end waiting for a response.

Quick chat with the account manager and we agreed that page should be excluded from the SLA calculation. Problem sorted!

Author: loadtestguy

Just a performance tester that wants to record the things I should remember about LoadTesting

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: