Government Technology Insider
IT Monitoring & the Five Stages of Grief

by Joel Dolisy
April 20, 2016
in Operations

If you’ve worked in IT for more than 10 minutes, you know that things go wrong. In fact, there’s no end of job security in IT precisely because things go wrong. But with the advent of IT monitoring and automation (building systems that automatically mind the shop, raise a flag when things start to go south, and tell you what happened and when it happened so you can fix it), the future seems a little brighter.

After more than a decade of implementing monitoring systems at organizations large and small, I’ve become all too familiar with what might be called monitoring grief. This is what often occurs when you are tasked with monitoring something, anything, and the people making the request ask you to do things you know are going to cause problems. It involves a series of behaviors I’ve grouped into five stages. Get it? The five stages of (IT monitoring) grief.

While agencies often go through these stages when rolling out monitoring for the first time, they can also occur when a group or department starts to seriously implement an existing solution, when new capabilities are added to a current monitoring suite, or simply when it’s Tuesday.

Spoiler alert: If you’re at all familiar with the standard Kübler-Ross model of the five stages of grief, be warned that acceptance is not on this list.

Stage One: Monitor Everything

This is the initial monitoring non-decision, a response to the simple and innocent question, “What do I need to monitor?” The favorite choice of managers and teams who won’t actually get the ticket is to simply open the fire hose wide and ask you to monitor “everything.” This choice is also frequently made by admins with a “hair-on-fire” problem in progress. It assumes that all information is good information, and that monitoring can be “tuned up” later. I guess everyone is in denial that there’s about to be an alert storm.

Stage Two: The Prozac Moment

This stage follows closely on the heels of the first, when the recipient of 734 monitoring alert emails in five minutes comes to you and exclaims, “All these things can’t possibly be going wrong!” While this may be correct in principle, it ignores the fact that a computer defines “going wrong” only as specifically as the humans who requested the monitors did in the first place. So you ratchet things down to reasonable levels, but “too much” is still showing red and the reaction remains the same.

Worse, because the team is overloaded, they get angry and feel that monitoring must be wrong again. Except this time it isn’t wrong. It’s catching all the stuff that’s been going up and down for weeks, months, or years, but which nobody noticed. Either the failures self-corrected quickly enough, or users never complained, or someone somewhere was jumping in and fixing things before anybody knew about them.
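One way to surface this kind of long-running up/down churn is to count state changes per system over a window of status samples. The following is a minimal sketch in Python, assuming status history is available as a list of up/down samples per system. The function names, the threshold of four changes, and the sample data are all hypothetical and are not taken from any particular monitoring product.

```python
def count_flaps(history):
    """Count up/down transitions in a sequence of boolean status samples."""
    return sum(1 for prev, cur in zip(history, history[1:]) if prev != cur)


def flapping_systems(samples, min_flaps=4):
    """Return names of systems whose status changed at least min_flaps times."""
    return sorted(
        name for name, history in samples.items()
        if count_flaps(history) >= min_flaps
    )


# Hypothetical poll results: True = up, False = down.
samples = {
    "mail-gateway": [True, False, True, False, True, True],
    "file-server": [True, True, True, True, True, True],
    "legacy-app": [True, True, False, False, False, False],
}

print(flapping_systems(samples))
```

In this made-up data only the mail gateway crosses the threshold. A system like it looks fine at any single glance, which is exactly why nobody noticed it before monitoring was switched on.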

It’s at this moment you wish you could give the system owner Prozac so they will chill out and realize that knowing about outages is the first step to avoiding them in the future.

Stage Three: Painting the Roses Green

The next stage occurs when too many things are still showing as “down” and no amount of tweaking is making them show “up” because, ahem, they are down.

In a fit of stubborn pride, the system owner often offers something like: “They’re not down-down, they’re just, you know, a little down-ish right now.” And so they demand that you do whatever it takes to show the systems as up/good/green. This behavior characterizes the bargaining stage.

And I mean they’ll ask you to do anything, from changing alert thresholds to impossible levels (“Only alert if it’s been down for 30 hours. No, make that a full week.”) to disabling alerts entirely. I can understand the pressure to adjust reporting to senior management, but let’s not defeat the purpose of monitoring, especially on critical systems.

What makes this stage even more embarrassing for all concerned is that the work involved is often greater than the work to actually fix the issue.

Stage Four: An Inconvenient Truth

If issues are suppressed, sometimes for weeks or months, they will eventually reach a point where a critical error can’t be glossed over. At that point, you and the system owner find yourselves on a service restoration team phone call with about a dozen other engineers and a few IT directors, where everything is analyzed, checked and restarted in real time.

This is about the time someone asks to see the performance data for the system — the one that’s been down for a month and a half, but hasn’t shown up on reports. For a system owner who has been avoiding dealing with the real issues, there is nowhere left to run or hide.

Stage Five: Finding the Right Balance

Assuming the system owner has made it through stage four with his or her job intact, stage five involves trying to get it right. Agencies need to make the investment to get their alerting thresholds set correctly and vary them based on the criticality of the systems. There’s also a lot that smart tools can do to correlate alerts and reduce the number of alerts the IT team has to manage. And you’ll just have to migrate some of your unreliable systems and fix the issues that are causing network or systems management problems, as time and budget allow.
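To make the idea concrete, here is a rough sketch in Python of thresholds that vary by criticality, plus a very simple form of correlation that groups outages by a shared upstream device. Everything in it is an assumption made for illustration: the criticality tiers, the minute values, the Outage fields, and the system names. Real monitoring tools handle this in their own ways.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical policy: minutes a system may stay down before anyone is alerted.
DOWN_MINUTES_BEFORE_ALERT = {"critical": 1, "important": 10, "low": 60}


@dataclass(frozen=True)
class Outage:
    system: str
    criticality: str  # key into DOWN_MINUTES_BEFORE_ALERT
    upstream: str     # device this system depends on (assumed to be known)
    minutes_down: int


def should_alert(outage):
    """Alert only once the outage exceeds the limit for its criticality tier."""
    return outage.minutes_down >= DOWN_MINUTES_BEFORE_ALERT[outage.criticality]


def correlate(outages):
    """Group alert-worthy outages by shared upstream dependency."""
    groups = defaultdict(list)
    for outage in outages:
        if should_alert(outage):
            groups[outage.upstream].append(outage.system)
    return {upstream: sorted(systems) for upstream, systems in groups.items()}


# Made-up sample data.
outages = [
    Outage("payroll-db", "critical", "core-switch-1", 2),
    Outage("hr-portal", "important", "core-switch-1", 12),
    Outage("test-wiki", "low", "core-switch-1", 12),
    Outage("print-server", "low", "branch-router-3", 90),
]

for upstream, systems in correlate(outages).items():
    print(f"{upstream}: {len(systems)} affected -> {', '.join(systems)}")
```

With this sample data, four outages collapse into two notifications, and the low-priority wiki stays quiet because it has not yet been down long enough to matter. Note that the thresholds are still short enough to be honest: nothing here waits a week.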

And what of the system owners who started off by demanding that you “monitor everything”? Don’t worry, they’ll be back after the next system outage, ready to give you more grief.

Looking for good resources to thwart the stages of IT monitoring grief? The SolarWinds Lab team offers a 101 video on the topic.

Tags: IT Alerts, IT Monitoring, IT Troubleshooting
