This was featured as the Capacitas Thought of the Week and, as I was the author, I thought I would capture it on my own blog! See the original here: https://www.capacitas.co.uk/insights/top-10-tips-on-getting-cloud-observability-right
- After buying a tool, create an ecosystem of people and integrations to get observability that provides value.
- Make sure the tool fits your technology stack (OK, this may seem obvious, but be careful: the tool needs to work with your legacy tech as well as the new and shiny tech).
- Spend time configuring the tool to make it readable and relevant to its users, e.g., rename IIS application pool AGKP_04520756 to UKAGKPPortal.
- Remove noise from the tool, e.g., if volume /dev/u001/fred is always full, then configure the tool not to show it in red!
- Try to avoid underlying infrastructure alerting; it doesn’t matter what your hardware is doing, it is your users’ experience that matters. Set alerts at the UX level.
- Configure alerts that align with the business. For example, set alerts for critical user transactions such as time to generate a quote or complete payment rather than a generic alert across all user transactions.
- Decide who the consumers of the tool are and work with them to make sure they are trained; the tool should be configured for their needs. I am not talking about just handing them a dashboard (giving them a fish); training them to use the tool properly is teaching them how to fish!
- These tools love data sources. The more you give them, the more likely you are to correlate the source of problems, e.g., poor performance could be due to waiting for a VM to be scheduled on the hypervisor, and you may never know this unless you are monitoring the hypervisor. Of course, the more you monitor, the more you pay!
- Don’t be afraid to have a dedicated monitoring team, but ensure they are skilled in using the tool and not just configuring it, i.e., when production goes down, they are in the thick of it trying to resolve the issue.
- Keep an eye on the costs!
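To make the noise-removal tip concrete, here is a minimal Python sketch of the idea. The volume name comes from the example above; the threshold, function name, and the notion of a suppression list are my own illustrative assumptions; in a real observability tool this would be done through its alert-suppression configuration rather than code.

```python
# Hypothetical noise filter: suppress alerts for volumes that are
# known to always run full, so dashboards only show genuine problems.
KNOWN_FULL_VOLUMES = {"/dev/u001/fred"}  # always full by design, not an incident

def should_alert(volume: str, used_pct: float, threshold: float = 90.0) -> bool:
    """Return True only when a volume breach is worth alerting on."""
    if volume in KNOWN_FULL_VOLUMES:
        return False  # expected behaviour; don't show it in red
    return used_pct >= threshold

# Usage: the known-full volume stays quiet even at 99% usage,
# while an unexpected volume at 95% raises an alert.
print(should_alert("/dev/u001/fred", 99.0))
print(should_alert("/dev/u002/data", 95.0))
```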
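The business-aligned alerting tip can also be sketched in a few lines of Python. The transaction names echo the quote and payment examples above; the specific thresholds, the fallback value, and the function name are hypothetical, and real tools would express this as per-transaction alert rules rather than code.

```python
# Hypothetical per-transaction thresholds: critical business transactions
# get their own response-time SLA instead of one generic alert for all.
SLA_SECONDS = {
    "generate_quote": 2.0,     # assumed SLA for quoting
    "complete_payment": 3.0,   # assumed SLA for payment
}
DEFAULT_SLA_SECONDS = 10.0     # generic fallback for everything else

def breaches_sla(transaction: str, response_time_s: float) -> bool:
    """True if this transaction's response time breaches its SLA."""
    return response_time_s > SLA_SECONDS.get(transaction, DEFAULT_SLA_SECONDS)

# Usage: a 2.5s quote breaches its tight SLA, but a 5s response on a
# non-critical transaction stays within the generic fallback.
print(breaches_sla("generate_quote", 2.5))
print(breaches_sla("browse_catalogue", 5.0))
```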