Major IT outage affecting banks, airlines, media outlets across the world

rxxrc@lemmy.ml · edit-2 2 months ago

Mikina@programming.dev · 2 months ago

I see a lot of hate ITT for kernel-level EDRs, which I wouldn’t say they deserve. Sure, for your own use, an AV is sufficient and you don’t need an EDR, but they make a world of difference. I work in cybersecurity doing red teaming, so my job is mostly about bypassing such solutions and building malware and actions within the network that avoid detection as much as possible, and ever since EDRs started getting popular, my job has gotten several leagues harder.

The advantage of EDRs in comparison to AVs is that they can catch 0-days. An AV will just look for signatures: known pieces or snippets of malware code. An EDR, on the other hand, looks for sequences of actions a process performs, by scanning memory and logs and hooking syscalls. So if, for example, you wrote an entirely custom program that allocates memory as Read-Write-Execute, loads a crypto DLL, decrypts something into that memory, and then calls a thread-spawn syscall to start a thread in another process that runs it, an EDR would correlate those actions and get suspicious, while to a regular AV the code would probably look OK. Some EDRs even watch network packets and can catch suspicious communication, such as port scanning, large data exfiltration, or C2 communication.
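
To make the difference concrete, here is a rough Python sketch of the two approaches. It is a toy model only: the event names, the hash database, and the `BehaviorMonitor` class are all made up for illustration, and a real EDR gets its events from kernel callbacks and hooks rather than from a list of strings.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical signature database: hashes of payloads that are already known.
KNOWN_BAD_HASHES = {hashlib.sha256(b"known-malware-sample").hexdigest()}

# Hypothetical event names standing in for the steps described above.
SUSPICIOUS_SEQUENCE = [
    "alloc_rwx",        # memory allocated as Read-Write-Execute
    "load_crypto_dll",  # crypto library loaded
    "write_decrypted",  # payload decrypted into that memory
    "remote_thread",    # thread spawned in another process
]


def av_scan(file_bytes: bytes) -> bool:
    """Signature check: flags only payloads whose hash is already known."""
    return hashlib.sha256(file_bytes).hexdigest() in KNOWN_BAD_HASHES


@dataclass
class BehaviorMonitor:
    """Tracks, per process, how far along the suspicious sequence it is."""

    progress: Dict[int, int] = field(default_factory=dict)

    def observe(self, pid: int, event: str) -> bool:
        """Record one event; return True when the full sequence has been seen."""
        step = self.progress.get(pid, 0)
        if event == SUSPICIOUS_SEQUENCE[step]:
            step += 1  # unrelated events in between are simply ignored
        if step == len(SUSPICIOUS_SEQUENCE):
            self.progress[pid] = 0  # reset after raising the alert
            return True
        self.progress[pid] = step
        return False
```

A never-before-seen payload passes `av_scan` because its hash is not in the database, but feeding its four actions to `BehaviorMonitor.observe` in order triggers an alert on the last one.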

Sure, in an ideal world, you would have users that never run malware and a network that is impenetrable. But on average you still get a few percent of people running random binaries that came from phishing attempts, or around 50% of people in your company falling for vishing attacks. Having an EDR increases your chances of stopping such an attack almost exponentially, and I would say the advantage EDRs get from being kernel-level is well worth it.

I’m not defending CrowdStrike. They messed up to the point where I bet the damage they caused worldwide far exceeds the total damage that all the cyberattacks they prevented would have caused. But hating on kernel-level EDRs in general isn’t warranted here.

Kernel-level anti-cheat, on the other hand, can go burn in hell, and I hope that something similar will eventually happen with one of them. Fuck kernel level anti-cheats.

YTG123@sopuli.xyz · 2 months ago

>Make a kernel-level antivirus
>Make it proprietary
>Don’t test updates… for some reason??

CircuitSpells@lemmy.world · 2 months ago

I mean I know it’s easy to be critical but this was my exact thought, how the hell didn’t they catch this in testing?

Voroxpete@sh.itjust.works · 2 months ago

Completely justified reaction. A lot of the time tech companies and IT staff get shit for stuff that, in practice, can be really hard to detect before it happens. There are all kinds of issues that can arise in production that you just can’t test for.

But this… This has no justification. An issue this immediate, this widespread, would have instantly been caught with even the most basic of testing. The fact that it wasn’t raises massive questions about the safety and security of CrowdStrike’s internal processes.

Mikina@programming.dev · 2 months ago

From what I’ve heard, and to play devil’s advocate, it coincided with Microsoft pushing out a security update at basically the same time, and that caused the issue. So it’s possible that they didn’t have a way to test it properly, because they didn’t have the update at hand before it rolled out. So the fault wasn’t only a bug in the CS driver, but the driver’s interaction with the new Windows update, which they didn’t have.

CircuitSpells@lemmy.world · 2 months ago

How sure are you about that? Microsoft very dependably releases updates on the second Tuesday of the month, and their release notes show if updates are pushed out of schedule. Their last update was on schedule, July 9th.
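
For anyone who wants to check the schedule claim, the "second Tuesday" rule is easy to compute. This is just a sketch: `patch_tuesday` is a made-up helper name, and the year 2024 is my assumption for when this thread took place.

```python
import calendar
from datetime import date


def patch_tuesday(year: int, month: int) -> date:
    """Return the second Tuesday of the given month."""
    first_weekday = date(year, month, 1).weekday()  # Monday == 0
    days_until_tuesday = (calendar.TUESDAY - first_weekday) % 7
    # First Tuesday is day 1 + days_until_tuesday; the second is a week later.
    return date(year, month, 1 + days_until_tuesday + 7)


print(patch_tuesday(2024, 7))  # 2024-07-09
```

That matches the July 9th date above.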

Mikina@programming.dev · 2 months ago

I’m not. I vaguely remember seeing it in some posts and comments, and it would explain it pretty well, so I kind of took it as a likely outcome. In hindsight, you are right, I shouldn’t have been spreading hearsay. Thanks for the wakeup call, honestly!

areyouevenreal@lemm.ee · 2 months ago

Lots of security systems are kernel-level (at least partially). This includes SELinux and AppArmor, by the way. It’s a necessity for these things to actually be effective.

bdonvr@thelemmy.club · 2 months ago

The number of servers running Windows out there is depressing to me

Rinox@feddit.it · 2 months ago

I dunno, but doesn’t like a quarter of the internet kinda run on Azure?

jedibob5@lemmy.world · 2 months ago

Reading into the updates some more… I’m starting to think this might just destroy CrowdStrike as a company altogether. Between the mountain of lawsuits almost certainly incoming and the total destruction of any public trust in the company, I don’t see how they survive this. Just absolutely catastrophic on all fronts.

Wooki@lemmy.world · edit-2 2 months ago

Testing in production will do that

NaibofTabr@infosec.pub · 2 months ago

If all the computers stuck in boot loop can’t be recovered… yeah, that’s a lot of cost for a lot of businesses. Add to that all the immediate impact of missed flights and who knows what happening at the hospitals. Nightmare scenario if you’re responsible for it.

This sort of thing is exactly why you push updates to groups in stages, not to everything all at once.
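
As a minimal sketch of what staged rollout means in practice (Python, with hypothetical `deploy` and `is_healthy` callbacks standing in for whatever the real tooling does): push to a small wave, check health, and only continue if every host in the wave is fine.

```python
from typing import Callable, List, Sequence, Tuple


def staged_rollout(
    hosts: Sequence[str],
    wave_sizes: Sequence[int],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> Tuple[List[str], bool]:
    """Deploy wave by wave; stop at the first wave with an unhealthy host.

    Returns (hosts that received the update, whether the rollout completed).
    """
    # Split hosts into waves; whatever is left over becomes the final wave.
    waves, start = [], 0
    for size in wave_sizes:
        waves.append(hosts[start:start + size])
        start += size
    waves.append(hosts[start:])

    updated: List[str] = []
    for wave in waves:
        for host in wave:
            deploy(host)
            updated.append(host)
        if not all(is_healthy(h) for h in wave):
            return updated, False  # halt: later waves never get the update
    return updated, True
```

With `wave_sizes=[1, 3]` and ten hosts, an update that breaks every machine it touches stops after a single host instead of taking down all ten.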

rxxrc@lemmy.ml · 2 months ago

Looks like the laptops are able to be recovered with a bit of finagling, so fortunately they haven’t bricked everything.

And yeah staged updates or even just… some testing? Not sure how this one slipped through.

dactylotheca@suppo.fi · 2 months ago

> Not sure how this one slipped through.

I’d bet my ass this was caused by terrible practices brought on by suits demanding more “efficient” releases.

“Why do we do so much testing before releases? Have we ever had any problems before? We’re wasting so much time that I might not even be able to buy another yacht this year”

rozodru@lemmy.ca · edit-2 1 month ago

deleted by creator

candybrie@lemmy.world · 2 months ago

Why is it bad to do on a Friday? Based on your last paragraph, I would have thought Friday is probably the best week day to do it.

Lightor@lemmy.world · edit-2 2 months ago

Most companies, mine included, try to roll out updates during the middle or start of a week. That way if there are issues the full team is available to address them.

Bell@lemmy.world · 2 months ago

Don’t we blame MS at least as much? How does MS let an update like this push through their Windows Update system? How does an application update make the whole OS unable to boot? Blue screens on Windows have been around for decades, why don’t we have a better recovery system?

sandalbucket@lemmy.world · 2 months ago

CrowdStrike runs at ring 0, effectively as part of the kernel. Like a device driver. There are no safeguards at that level. Extreme testing and diligence are required, because these are the consequences of getting it wrong. This is entirely on CrowdStrike.

wizardbeard@lemmy.dbzer0.com · edit-2 2 months ago

This didn’t go through Windows Update. It went through the CrowdStrike software directly.

NaibofTabr@infosec.pub · edit-2 2 months ago

Wow, I didn’t realize CrowdStrike was widespread enough to be a single point of failure for so much infrastructure. Lot of airports and hospitals offline.

The Federal Aviation Administration (FAA) imposed the global ground stop for airlines including United, Delta, American, and Frontier.

Flights grounded in the US.

The System is Down

Monument@lemmy.sdf.org · edit-2 2 months ago

Honestly kind of excited for the company blogs to start spitting out their ~~disaster recovery~~ crisis management stories.

I mean - this is just a giant test of ~~disaster recovery~~ crisis management plans. And while there are absolutely real-world consequences to this, the fix almost seems scriptable.

If a company uses out-of-band management such as IPMI (or Intel’s AMT, sold under the vPro brand), and their network is intact/the devices are on their network, they ought to be able to remotely address this.
But that’s obviously predicated on them having already deployed/configured the tools.

Encrypt-Keeper@lemmy.world · 2 months ago

Yeah, my plans of going to sleep last night were thoroughly dashed as every single Windows server across every datacenter I manage between two countries cried out at the same time lmao

szczuroarturo@programming.dev · 2 months ago

I always wondered who even used Windows Server given how marginal its market share is. Now I know from the news.

Pringles@lemm.ee · 2 months ago

Marginal? You must be joking. A vast number of servers run on Windows Server. Where I work alone we have several hundred, and many companies have a similar setup. Statista put the Windows Server OS market share at over 70% in 2019. While I find it hard to believe it would be that high, it does clearly indicate it’s most certainly not a marginal percentage.

jj4211@lemmy.world · 2 months ago

I’m not getting an account on Statista, and I agree that its marketshare isn’t “marginal” in practice, but something is up with those figures, since overwhelmingly internet hosted services are on top of Linux. Internal servers may be a bit different, but “servers” I’d expect to count internet servers…

catloaf@lemm.ee · 2 months ago

Most servers aren’t Internet-facing.

richtellyard@lemmy.world · 2 months ago

This is going to be a Big Deal for a whole lot of people. I don’t know all the companies and industries that use Crowdstrike but I might guess it will result in airline delays, banking outages, and hospital computer systems failing. Hopefully nobody gets hurt because of it.

RegalPotoo@lemmy.world · 2 months ago

Big chunk of New Zealand’s banks apparently run it, cos 3 of the big ones can’t do credit card transactions right now

index@sh.itjust.works · 2 months ago

> cos 3 of the big ones can’t do credit card transactions right now

Bitcoin is still up and running, perhaps people can use that

sasquash@sopuli.xyz · 2 months ago

never do updates on a Friday.

rozodru@lemmy.ca · edit-2 1 month ago

deleted by creator

Gemini24601@lemmy.world · 2 months ago

Why do people run Windows servers when Linux exists? It’s literally a no-brainer.

shirro@aussie.zone · edit-2 2 months ago

They run Windows and all this third-party software because they would rather pay subscriptions and give up control of their business than retain skilled staff. It has nothing to do with Linux vs Windows. Linux won’t stop doors falling off Boeing planes. It is the myopia of modern business culture.

Swarfega@lemm.ee · 2 months ago

Because all software runs on Linux, right…

secret300@lemmy.sdf.org · edit-2 2 months ago

It could if more people just used Linux

thearch@sh.itjust.works · 2 months ago

Irrelevant but I keep reading “crowd strike” as “counter strike” and it’s really messing with me

ChapulinColorado@lemmy.world · 2 months ago

Think of it as ClownStrike, they will be known as a bunch of clowns after this.

ari_verse@lemm.ee · 2 months ago

A few years ago, when my org got the ask to deploy the CS agent on Linux production servers and I also saw it getting deployed on thousands of Windows and Mac desktops all across, the first thought that came to mind was “massive single point of failure and security threat”, as we were putting all the trust in a single relatively small company that will (has?) become the favorite target of all the bad actors across the planet. How long before it gets into trouble, either because of its own doing or due to others?

I guess that we now know

SupraMario@lemmy.world · 2 months ago

No bad actors did this, and security goes in fads. CrowdStrike is king right now, just as McAfee/Trellix was in the past. If you want to run around without EDR/XDR software, be my guest.

Saik0@lemmy.saik0.com · 2 months ago

> If you want to run around without edr/xdr software be my guest.

I don’t think anyone is saying that… But picking programs that your company has visibility into is a good idea. We use Wazuh. I get to control when updates are rolled out. It’s not a massive shit show when the vendor rolls out the update globally without sufficient internal testing. I can stagger the rollout as I see fit.

misk@sopuli.xyz · 2 months ago

My work PC is affected. Nice!

wreckedcarzz@lemmy.world · 2 months ago

Plot twist: you’re head of IT

scripthook@lemmy.world · 2 months ago

CrowdStrike sent a corrupt file with a software update for Windows servers. This caused a blue screen of death on all the Windows servers globally for CrowdStrike clients, even for people in my company. Luckily I shut off my computer at the end of the day and missed the update. It’s not an OTA fix. They have to go into every data center and manually fix all the computer servers. Some of these servers have encryption. I see a very big lawsuit coming…

dan@upvote.au · edit-2 2 months ago

> they have to go into every data center and manually fix all the computer servers

Do they not have IPMI/BMC for the servers? Usually you can access KVM over IP and remotely power-off/power-on/reboot servers without having to physically be there. KVM over IP shows the video output of the system so you can use it to enter the UEFI, boot in safe/recovery mode, etc.

I’ve got IPMI on my home server and I’m just some random guy on the internet, so I’d be surprised if a data center didn’t.
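
For what it’s worth, the power-control part is scriptable with `ipmitool`. Here is a rough Python sketch; the helper names, the BMC address, and the username are placeholders I made up, and it assumes `ipmitool` is installed and the BMC is reachable over the `lanplus` interface. The KVM-over-IP console itself is vendor-specific and not covered here.

```python
import subprocess
from typing import List


def ipmi_command(bmc_host: str, user: str, *action: str) -> List[str]:
    """Build an ipmitool invocation for a remote BMC.

    -E makes ipmitool read the password from the IPMI_PASSWORD
    environment variable, so it never appears on the command line.
    """
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E", *action]


def power_status(bmc_host: str, user: str) -> str:
    """Ask the BMC whether the chassis is powered on."""
    result = subprocess.run(
        ipmi_command(bmc_host, user, "chassis", "power", "status"),
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def power_cycle(bmc_host: str, user: str) -> None:
    """Hard power-cycle the server through its BMC."""
    subprocess.run(
        ipmi_command(bmc_host, user, "chassis", "power", "cycle"),
        check=True,
    )
```

Something like `power_cycle("10.0.0.5", "admin")` (placeholder address and user) would reboot one box; looping that over an inventory is the easy part, the actual repair still has to happen in recovery mode through the console.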

lud@lemm.ee · edit-2 2 months ago

deleted by creator

index@sh.itjust.works · 2 months ago

play stupid games win stupid prizes

'Completely unprecedented' outage causes havoc with IT systems across globe — as it happened