CrowdStrike Outage Punctuates Vendor Fatigue

Outsourcing Gaps Show IT Weaknesses


Shawn Stewart

Mr. Stewart has 27 years of experience with hundreds of international, commercial, military, and government IT projects. He holds certifications with ISC2, Cisco, Microsoft, CompTIA, ITIL, Novell, and others. He has a Masters in Cybersecurity, a Bachelors in IT, a Minor in Professional Writing, and is a published author.

Even before CrowdStrike disabled Windows computers worldwide, companies of all sizes were feeling vendor fatigue. Many companies that were wooed to Cloud-based, off-site support for IT services were already exploring either in-house or local alternatives. Besides delays and communication issues, customers found far too many gaps in remote services. The CrowdStrike outage exposed one such major gap: the lack of physical in-house technology expertise.

RECAP

What Happened

According to CrowdStrike, an update was pushed to all of its Windows-based software agents without properly following the Software Development Life Cycle (SDLC) (Link). In other words, they failed to adequately test their update. The faulty update triggered a null pointer exception (Link) that caused Windows to fail, indicated by the Blue Screen of Death (BSOD) (Link).

View of airport screens in Windows Recovery mode - picture from euronews.com

Who Was Affected

The affected CrowdStrike agent software runs on Windows laptops, desktops, and servers. According to Microsoft, 8.5 million devices were affected by the CrowdStrike error. That is still less than 1% of the computers running Windows worldwide. However, CrowdStrike is the favored security software of Fortune 500 companies. The more sensitive the data, the more likely CrowdStrike ran on the system.

How Did CrowdStrike Crash Windows?

The CrowdStrike agent is Endpoint Detection and Response (EDR) software. Because of its security and scanning requirements, the software loads into protected memory between Windows and the processor. This protects data in the event that Windows system files are compromised before startup. The EDR software can intercept and prevent malicious actions that appear to come from Windows. This is why a failure in the EDR software was fatal to Windows. Read about secure networks here (Link).

How Was It Resolved?

Because the failed software lived as a system file inside the Windows core, someone had to physically remove the offending file from the device’s hard drive. Even for companies with local IT staff, this was a daunting task. Many affected computers running displays, such as those in airport terminals, hanging from ceilings, or mounted in walls, required direct access. For those without local IT staff, the task often fell to knowledge workers and managers, and many waited until the following week for strapped resources to deploy to their site.

Microsoft quickly acknowledged both the issue and the resolution, primarily to confirm the issue wasn’t their fault. They provided step-by-step instructions and a downloadable script to auto-correct the issue, provided your computer would accept an external USB drive. Unfortunately, most security-conscious companies disable USB device recognition to prevent physical hacks and breaches.
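To show how narrow the manual fix was, here is a minimal sketch of locating and removing the offending file. It assumes the file pattern from CrowdStrike's published workaround (files matching C-00000291*.sys in the CrowdStrike drivers folder). The function name and the dry_run flag are hypothetical, and this is not the script Microsoft distributed.

```python
from pathlib import Path

# Pattern taken from CrowdStrike's published workaround. Treat it as an
# assumption and confirm against current vendor guidance before relying on it.
FAULTY_PATTERN = "C-00000291*.sys"


def remove_faulty_channel_files(driver_dir, dry_run=True):
    """Return the matching files in driver_dir, deleting them unless dry_run."""
    matches = sorted(Path(driver_dir).glob(FAULTY_PATTERN))
    if not dry_run:
        for path in matches:
            path.unlink()  # delete only the files that matched the pattern
    return matches
```

On an affected machine the folder would typically be C:\Windows\System32\drivers\CrowdStrike, reached from Safe Mode or the Windows Recovery Environment. In practice the deletion was done by hand or with a recovery script, which is exactly why it required someone standing at the device.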

How Could You Have Protected Yourself?

You couldn’t. Everyone trusted CrowdStrike to follow the SDLC. They didn’t. They created the largest single global outage in Internet history (though I would argue WannaCry was far worse). It comes down to trust, and Wall Street made it clear they do not trust CrowdStrike at the moment.

Vendor Gaps Abound

While CrowdStrike may have accentuated one of the primary gaps in remote IT coverage, physical device support is often the first gap to appear. The daunting task companies face today is managing the myriad of vendors that seemingly multiply every year. With each new vendor, a gap appears, created from the detailed legal language of what vendors will and will not cover.

Many organizations perceive gaps such as this through the lens of Risk Appetite. Deciding not to opt for local support, for example, is weighed against the expected need of such a resource. Vendors say once all systems are online and functional initially, no onsite resource is necessary. Many IT vendors ship new computers preloaded with all required software. Remote access allows support from anywhere, as long as the Internet is functioning. If a device fails, it is replaced, either by a knowledge worker or a dispatched IT technician.

No Plans For The Unexpected

The vendor has sold the company on the unnecessary cost of staffing resources locally, mainly because physical resources cost the most. Onsite resources then become contractors who must be scheduled. The belief that onsite resources are not required is pressed upon customers. That is, of course, until something unprecedented occurs. Think of not having a car. You would need to schedule a ride whenever you needed one. Sure, you could save money. But what if something unexpected happens? Risk Appetite and Avoidance use statistics provided by the vendor to weigh costs. The question becomes, “What are the odds I will ever need to touch every computer physically?”
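That question can be put in rough numbers. The sketch below is illustrative only: every figure in the usage note (the yearly probability, device count, downtime hours, hourly cost, and onsite staffing cost) is hypothetical and not taken from any vendor or from this article.

```python
def worst_case_loss(devices, hours_down, cost_per_hour):
    """Loss in the year the event actually happens (all devices offline)."""
    return devices * hours_down * cost_per_hour


def expected_annual_loss(probability, devices, hours_down, cost_per_hour):
    """Average yearly loss, weighting the worst case by its yearly probability."""
    return probability * worst_case_loss(devices, hours_down, cost_per_hour)
```

With hypothetical inputs of a 5% yearly chance, 200 devices, 8 hours down, and $75 per hour, the expected loss is roughly $6,000 a year, which looks small next to a hypothetical $60,000 onsite hire. The worst case, however, is $120,000 in the single year the event occurs. The average is what the vendor shows you; the worst case is what you actually pay.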

The real fatigue of vendors is the unexpected burden placed at the feet of the customer. Suppose your company doesn’t use CrowdStrike. Your outsourced departments, such as IT, Human Resources, Accounts Payable, and Logistics, still faltered because their vendors were affected. How do you recover from an entire day of lost production that occurred through no fault of your own?

Who Do You Trust Now?

For those customers who were affected firsthand, the CrowdStrike failure initially appeared to be a Microsoft failure from a faulty Windows update. Across the support gap, the finger pointed squarely at the victim of the failure and not the culprit. In the initial wave, it was Microsoft, not CrowdStrike, that discovered the issue and brought it to light. This is because Microsoft received the first support calls to fix Windows.

Companies with multiple vendors find themselves in the finger-pointing game every time a failure occurs. For example, when a local phone, computer, or printer is not functioning properly, the company must follow the chain of support and engage the expected responsible party. It becomes that party’s duty to resolve the issue or provide a reasonable explanation. The copier technician declares, “The copier is fine. The network is not reaching a Cloud-based shared file system.” And the finger-pointing begins.

Who’s On First?

The Internet Service Provider (ISP) is contacted but immediately responds, “the Internet connection is fine. You should verify your local connections and your Cloud connection.” Another finger points. Learn how the Internet works here (Link).

The Cloud provider is contacted but also responds, “all Cloud services are online. Verify your Internet connection and your local connections.” More finger pointing.

The outsourced IT support, who bills hourly, is reluctantly contacted. They spend hours troubleshooting. They declare, “the systems appear to be working but there may be an issue with the network.” Another pointed finger.

The outsourced Network support team, who also bills hourly, is contacted. They reiterate what every other vendor says, then, as a default response, say, “Please have your wiring vendor confirm the cabling.”

Wiring vendor? You don’t have a wiring vendor. Once you find one, they come out, completely replace the wiring, verify and test to confirm, and send a bill. The problem remains and the wiring vendor says, “speak with your network vendor. The wiring is fine.” Now you’re going in circles.

Comfort Zones and Job Descriptions

An office with only knowledge workers, meaning they have no local IT support, must follow the directions from their vendors. No one has the experience or desire to question their recommendations. When the CrowdStrike error occurred, it wasn’t an IT employee sitting on the phone with Microsoft; it was a receptionist or a billing clerk. Imagine the horror when the Microsoft representative told them to boot into Safe Mode and delete a system file. “What? That’s not my job. How do I even do that?”

The time and money companies waste bouncing from vendor to vendor whenever an issue occurs create more than a simple inconvenience for knowledge workers. Many will say, “I am a bookkeeper, not a tech,” when told to troubleshoot for the remote vendor. A common perk for those moving from small and midsize businesses (SMBs) to large companies is an IT staff. This allows knowledge workers to focus on their job roles and not be the hands and eyes of an outside vendor.

The Wrong Vendor

Companies are sold a streamlined, cost-effective solution that eliminates full-time employees. Accountants, bookkeepers, IT, Human Resources, attorneys, and other professional jobs are replaced by outsourced vendors in nearly every company, and not at a significant cost savings. In the best cases, companies receive more expertise than they could otherwise afford, with the minor inconvenience of remote collaboration. In reality, companies receive a minimal safety net of hourly, absent, and often rude contractors.

The problem? Companies fall for the marketing that outsourcing is cost effective without any negative impact. Rarely is this the case without spreading undue stress to the remaining employees. Additional costs and loss of production far outweigh any projected savings. Employee morale suffers. Productivity screeches to a halt. Savings are lost at the first error or outage. Customers and employees are mistreated or disregarded.

Finding The Right Vendor

The right solution starts with the contract. A strong Service Level Agreement (SLA) ensures the vendor reacts and provides service in a timely manner. If the vendor is unable to meet the SLA, they are typically required to refund or credit a portion of their service cost. For IT, do not expect to see an SLA based on problem resolution. IT issues are sometimes complex and often caused by, you guessed it, another vendor. The most common SLA involves response time to acknowledge the problem and assign a resource. With SLAs, you get what you pay for.
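A response-time SLA with a service credit can be checked mechanically. The following is a minimal sketch under stated assumptions: the 95% compliance threshold and 10% credit rate are hypothetical defaults, and the function names are illustrative. Real contracts define their own targets and credit schedules.

```python
from datetime import datetime, timedelta


def response_sla_met(opened, acknowledged, target):
    """True if the vendor acknowledged the ticket within the target window."""
    return (acknowledged - opened) <= target


def service_credit(monthly_fee, missed, total, credit_rate=0.10, threshold=0.95):
    """Credit owed when on-time acknowledgements fall below the threshold.

    credit_rate and threshold are hypothetical; use the contract's values.
    """
    if total == 0:
        return 0.0  # no tickets, nothing to measure
    compliance = (total - missed) / total
    return monthly_fee * credit_rate if compliance < threshold else 0.0
```

For example, a ticket opened at 8:00 and acknowledged at 8:20 meets a 30-minute target, while one acknowledged at 8:45 does not. Note that neither function says anything about whether the problem was fixed, which is the limitation of response-based SLAs described above.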

Specifically with IT outsourcing, ensure your vendor is required to manage all other vendors involved with IT services. A solid vendor will manage all communications with the Internet Service Provider, Cloud providers, security vendors, local staff, and management. If Letters of Authorization (LOAs) are required, company management and ownership must provide these to give the vendor the ability to manage accounts on their behalf, thereby resolving issues without involving knowledge workers. This is often called turn-key or white glove service and, again, you get what you pay for.

Keeping Vendors Honest

Constant reporting and transparency ensure the vendor is meeting their SLAs and completing tasks in an efficient manner. Every IT vendor provides a Help Desk with a ticketing solution for both customer interaction and reporting. Weekly and root cause analysis reports give customers an understanding of why issues occurred, how they were resolved, and whether additional steps are required, such as replacing equipment or an inadequate vendor. Weekly touch point calls allow customers to voice concerns and prevent vendors from hiding behind email.

The hardest part for most companies is coping with the expected cost of quality services. Statistics show companies spend on average between $7,500 and $10,000 per user annually for IT and cybersecurity. A company with ten (10) employees should expect to spend between $75,000 and $100,000 annually for IT. That includes hardware, software, subscriptions, and vendors. If you’re not seeing the level of support you expect, you may need to adjust your budget.
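The budget arithmetic is simple enough to sketch. The per-user range comes from the figures quoted above; the function name is illustrative.

```python
def annual_it_budget(users, low_per_user=7_500, high_per_user=10_000):
    """Return the (low, high) expected annual IT spend for a headcount."""
    return users * low_per_user, users * high_per_user


low, high = annual_it_budget(10)
print(f"${low:,} to ${high:,}")  # prints "$75,000 to $100,000"
```

Running the same function against your own headcount gives a quick sanity check on whether your current IT spend is anywhere near the quoted range.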

What We’ve Learned

Outsourcing rarely lives up to the hype and effectiveness presented by marketing or salespeople. When implemented successfully by a qualified vendor who is willing to become a partner of the company, it can be an extremely beneficial arrangement. Understanding, however, that someone needs to accept the risk of gaps will help companies banish the myth that more vendors mean fewer headaches. While finding the “one vendor to rule them all” may be difficult, it is the key to thriving in your vendor-filled business.

Need Help?

Reach out to us! We’re all in this together. Visit our contact page to submit an inquiry. Also, please follow us on social media for the latest updates.

Check Out Our Podcast!

The Hillbilly Hacker Podcast is the hottest new show on the Internet to learn about today’s latest technology in simple words. You can find the Hillbilly Hacker on Spotify, Apple, Amazon, or wherever you find your podcasts. (Link)
