It has been a few years since we last talked. Again, I apologize for being three and crying hysterically on your lap at the mall. I know you were probably gearing up for your trip around the globe, and little me did not make that easier. Please remember the good times we had, like this one:
|Purple was in that year|
Logically Monitors Applications
I am a firm believer that transaction-based monitoring from the end-user perspective is the only way to truly understand where and when a problem occurs in today's applications. Would it not be great if there were a solution that could tell me EXACTLY what services are invoked in a delivery chain? I drew up something to help visualize this communication path:
The tool should be able to inject its own tag to follow a transaction across multiple tiers. I hate solutions that require someone to define how transactions flow through an environment. We can put robots on Mars, so we should be able to figure out how services communicate within an application without having to manually define each interaction.
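To make the tagging idea concrete, here is a minimal sketch of what I have in mind, not how any particular product does it. The header name `X-Trace-Id`, the three tier names, and the in-memory `observed` list are all made up for illustration. Each tier reuses the tag it receives (or creates one at the entry point), and the path is rebuilt afterwards from what was recorded, with nobody defining the flow by hand.

```python
import uuid

TRACE_HEADER = "X-Trace-Id"  # hypothetical header name, not from any specific tool
observed = []  # (trace_id, service) records, as a monitoring agent might collect them


def instrument(service_name):
    """Wrap a tier's handler so it tags and records every request it sees."""
    def decorator(handler):
        def wrapper(headers):
            # Reuse the incoming tag, or create one at the entry point.
            headers = dict(headers)
            headers.setdefault(TRACE_HEADER, uuid.uuid4().hex)
            observed.append((headers[TRACE_HEADER], service_name))
            return handler(headers)
        return wrapper
    return decorator


@instrument("database")
def database(headers):
    return "rows"


@instrument("app-server")
def app_server(headers):
    # The tag travels with the call to the next tier.
    return database(headers)


@instrument("web-frontend")
def web_frontend(headers):
    return app_server(headers)


def path_for(trace_id):
    """Rebuild the service chain for one transaction from the recorded tags."""
    return [service for tid, service in observed if tid == trace_id]


web_frontend({})  # the end user's request arrives with no tag
first_trace = observed[0][0]
print(path_for(first_trace))  # ['web-frontend', 'app-server', 'database']
```

The point of the design is that the only thing each tier has to know is "pass the tag along"; the communication path falls out of the data.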
Eliminates the Need for a War Room
I have a problem with war rooms. Fundamentally, when a war room is called together, the individuals are not looking for the root cause; they are looking for data to prove they are not at fault. If I am on a call at 8 PM on a Friday, I am looking to get my butt off that call ASAP so I can go out... I mean go do good Christian things like volunteer at a soup kitchen. In most cases a war room only causes more confusion and frustration. War rooms are an illogical approach to resolving an issue, because when an application problem happens, it is most likely a single event that cascades into other incidents. The tool should be able to determine immediately which service/process/host/link is causing the end-user-impacting issue. On top of that, how many times do multiple alarms go off because of a single problem? The tool should be able to use causation to get to the real root cause, as opposed to guessing that a user action response time anomaly is caused by a single query degradation:
Powerful, Simple, and Trustworthy Analysis
If a tool is blaming my component for a problem, it better have some damn (err DARN) good proof that I am at fault. I want to challenge it with questions like: "So, you're telling me this method has high execution time. What is that execution time breakdown?" I want to know that the timeout exception seen at the front-end tier is directly correlated to a downstream call:
Data-driven decisions are key. Without the proper data, I end up making doubt-driven decisions, which ultimately do not resolve the open issue.
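For the execution time breakdown I keep asking about, something along these lines would already count as proof. This is only a rough sketch under my own assumptions: the `span` helper, the span names, and the sleeps standing in for real work are all hypothetical, and a real tool would collect this without me touching the code.

```python
import time
from contextlib import contextmanager

timings = {}  # span name -> accumulated elapsed seconds


@contextmanager
def span(name):
    """Time the enclosed block and add the elapsed time to `timings`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start


def handle_request():
    with span("handle_request"):
        with span("parse_input"):
            time.sleep(0.01)  # stand-in for local work
        with span("downstream_call"):
            time.sleep(0.03)  # stand-in for a call to another tier


handle_request()
total = timings["handle_request"]
for name, elapsed in timings.items():
    # Show each span in milliseconds and as a share of the whole request.
    print(f"{name}: {elapsed * 1000:.1f} ms ({elapsed / total:.0%} of total)")
```

With a breakdown like this, "your method is slow" turns into "most of your method's time is spent waiting on the downstream call", which is a claim I can actually verify.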
Deployable to any Environment
Today's world is all about the cloud. Tomorrow's world will probably be micro-service, container-based deployments. The solution should address the needs of today, with a roadmap to properly prepare my initiatives for future endeavors. I do not want to work with a tool that is going to rely on the same architecture forever. The tool should already start answering my concerns around the Internet of Things and Docker-based applications:
Let us not forget about today as well. I still work with mainframes and large ERP-like systems, which are not going away any time soon. These applications, although stable most of the time, have their own performance issues. Please make sure we have coverage for these types of systems as well. Something like this should help:
Easy to Set Up AND Maintain
Today's applications are not homogeneous. However, the way to get visibility into a .NET process on Windows should be practically the same as monitoring a Docker JVM deployed on Linux. On top of that, I should not have to bring on a certified expert every time I want to change my dashboards or upgrade the monitoring tool. I am going to swing for the fences and say the administration of the tool should be handled by less than one full-time resource. I cannot really portray this in a screenshot, so here is a picture I drew for you:
PS: Give Mrs. Claus my love
PPS: I also want a drone