All our servers and company laptops went down at pretty much the same time. Laptops have been bootlooping to blue screen of death. It’s all very exciting, personally, as someone not responsible for fixing it.

Apparently caused by a bad CrowdStrike update.

Edit: now being told we (who almost all generally work from home) need to come into the office Monday as they can only apply the fix in-person. We’ll see if that changes over the weekend…

  • Mikina@programming.dev
    link
    fedilink
    English
    arrow-up
    1
    ·
    4 months ago

    I see a lot of hate ITT on kernel-level EDRs, which I wouldn’t say they deserve. Sure, for your own use, an AV is sufficient and you don’t need an EDR, but they make a world of difference. I work in cybersecurity doing Red Teamings, so my job is mostly about bypassing such solutions and making malware/actions within the network that avoids being detected by it as much as possible, and ever since EDRs started getting popular, my job got several leagues harder.

    The advantage of EDRs in comparison to AVs is that they can catch 0-days. AV will just look for signatures, a known pieces or snippets of malware code. EDR, on the other hand, looks for sequences of actions a process does, by scanning memory, logs and hooking syscalls. So, if for example you would make an entirely custom program that allocates memory as Read-Write-Execute, then load a crypto dll, unencrypt something into such memory, and then call a thread spawn syscall to spawn a thread on another process that runs it, and EDR would correlate such actions and get suspicious, while for regular AV, the code would probably look ok. Some EDRs even watch network packets and can catch suspicious communication, such as port scanning, large data extraction, or C2 communication.

    Sure, in an ideal world, you would have users that never run malware, and network that is impenetrable. But you still get at avarage few % of people running random binaries that came from phishing attempts, or around 50% people that fall for vishing attacks in your company. Having an EDR increases your chances to avoid such attack almost exponentionally, and I would say that the advantage it gives to EDRs that they are kernel-level is well worth it.

    I’m not defending CrowdStrike, they did mess up to the point where I bet that the amount of damages they caused worldwide is nowhere near the amount damages all cyberattacks they prevented would cause in total. But hating on kernel-level EDRs in general isn’t warranted here.

    Kernel-level anti-cheat, on the other hand, can go burn in hell, and I hope that something similar will eventually happen with one of them. Fuck kernel level anti-cheats.

  • YTG123@sopuli.xyz
    link
    fedilink
    English
    arrow-up
    1
    ·
    4 months ago

    >Make a kernel-level antivirus
    >Make it proprietary
    >Don’t test updates… for some reason??

    • CircuitSpells@lemmy.world
      link
      fedilink
      English
      arrow-up
      0
      ·
      4 months ago

      I mean I know it’s easy to be critical but this was my exact thought, how the hell didn’t they catch this in testing?

      • Voroxpete@sh.itjust.works
        link
        fedilink
        English
        arrow-up
        1
        ·
        4 months ago

        Completely justified reaction. A lot of the time tech companies and IT staff get shit for stuff that, in practice, can be really hard to detect before it happens. There are all kinds of issues that can arise in production that you just can’t test for.

        But this… This has no justification. A issue this immediate, this widespread, would have instantly been caught with even the most basic of testing. The fact that it wasn’t raises massive questions about the safety and security of Crowdstrike’s internal processes.

      • Mikina@programming.dev
        link
        fedilink
        English
        arrow-up
        0
        ·
        4 months ago

        From what I’ve heard and to play a devil’s advocate, it coincidented with Microsoft pushing out a security update at basically the same time, that caused the issue. So it’s possible that they didn’t have a way how to test it properly, because they didn’t have the update at hand before it rolled out. So, the fault wasn’t only in a bug in the CS driver, but in the driver interaction with the new win update - which they didn’t have.

        • CircuitSpells@lemmy.world
          link
          fedilink
          English
          arrow-up
          1
          ·
          4 months ago

          How sure are you about that? Microsoft very dependably releases updates on the second Tuesday of the month, and their release notes show if updates are pushed out of schedule. Their last update was on schedule, July 9th.

          • Mikina@programming.dev
            link
            fedilink
            English
            arrow-up
            2
            ·
            4 months ago

            I’m not. I vaguely remember seeing it in some posts and comments, and it would explain it pretty well, so I kind of took it as a likely outcome. In hindsight, You are right, I shouldnt have been spreading hearsay. Thanks for the wakeup call, honestly!

    • areyouevenreal@lemm.ee
      link
      fedilink
      English
      arrow-up
      0
      arrow-down
      1
      ·
      4 months ago

      Lots of security systems are kernel level (at least partially) this includes SELinux and AppArmor by the way. It’s a necessity for these things to actually be effective.

    • Rinox@feddit.it
      link
      fedilink
      English
      arrow-up
      1
      ·
      4 months ago

      I dunno, but doesn’t like a quarter of the internet kinda run on Azure?

  • jedibob5@lemmy.world
    link
    fedilink
    English
    arrow-up
    1
    ·
    4 months ago

    Reading into the updates some more… I’m starting to think this might just destroy CloudStrike as a company altogether. Between the mountain of lawsuits almost certainly incoming and the total destruction of any public trust in the company, I don’t see how they survive this. Just absolutely catastrophic on all fronts.

    • NaibofTabr@infosec.pub
      link
      fedilink
      English
      arrow-up
      1
      ·
      4 months ago

      If all the computers stuck in boot loop can’t be recovered… yeah, that’s a lot of cost for a lot of businesses. Add to that all the immediate impact of missed flights and who knows what happening at the hospitals. Nightmare scenario if you’re responsible for it.

      This sort of thing is exactly why you push updates to groups in stages, not to everything all at once.

      • rxxrc@lemmy.mlOP
        link
        fedilink
        English
        arrow-up
        1
        ·
        4 months ago

        Looks like the laptops are able to be recovered with a bit of finagling, so fortunately they haven’t bricked everything.

        And yeah staged updates or even just… some testing? Not sure how this one slipped through.

        • dactylotheca@suppo.fi
          link
          fedilink
          English
          arrow-up
          1
          ·
          4 months ago

          Not sure how this one slipped through.

          I’d bet my ass this was caused by terrible practices brought on by suits demanding more “efficient” releases.

          “Why do we do so much testing before releases? Have we ever had any problems before? We’re wasting so much time that I might not even be able to buy another yacht this year”

      • candybrie@lemmy.world
        link
        fedilink
        English
        arrow-up
        0
        ·
        4 months ago

        Why is it bad to do on a Friday? Based on your last paragraph, I would have thought Friday is probably the best week day to do it.

        • Lightor@lemmy.world
          link
          fedilink
          English
          arrow-up
          1
          ·
          edit-2
          4 months ago

          Most companies, mine included, try to roll out updates during the middle or start of a week. That way if there are issues the full team is available to address them.

    • Bell@lemmy.world
      link
      fedilink
      English
      arrow-up
      0
      arrow-down
      1
      ·
      4 months ago

      Don’t we blame MS at least as much? How does MS let an update like this push through their Windows Update system? How does an application update make the whole OS unable to boot? Blue screens on Windows have been around for decades, why don’t we have a better recovery system?

      • sandalbucket@lemmy.world
        link
        fedilink
        English
        arrow-up
        2
        ·
        4 months ago

        Crowdstrike runs at ring 0, effectively as part of the kernel. Like a device driver. There are no safeguards at that level. Extreme testing and diligence is required, because these are the consequences for getting it wrong. This is entirely on crowdstrike.

  • NaibofTabr@infosec.pub
    link
    fedilink
    English
    arrow-up
    1
    ·
    edit-2
    4 months ago

    Wow, I didn’t realize CrowdStrike was widespread enough to be a single point of failure for so much infrastructure. Lot of airports and hospitals offline.

    The Federal Aviation Administration (FAA) imposed the global ground stop for airlines including United, Delta, American, and Frontier.

    Flights grounded in the US.

    The System is Down

  • Monument@lemmy.sdf.org
    link
    fedilink
    English
    arrow-up
    1
    ·
    edit-2
    4 months ago

    Honestly kind of excited for the company blogs to start spitting out their disaster recovery crisis management stories.

    I mean - this is just a giant test of disaster recovery crisis management plans. And while there are absolutely real-world consequences to this, the fix almost seems scriptable.

    If a company uses IPMI (Called Branded AMT and sometimes vPro by Intel), and their network is intact/the devices are on their network, they ought to be able to remotely address this.
    But that’s obviously predicated on them having already deployed/configured the tools.

  • Encrypt-Keeper@lemmy.world
    link
    fedilink
    English
    arrow-up
    0
    ·
    4 months ago

    Yeah my plans of going to sleep last night were thoroughly dashed as every single windows server across every datacenter I manage between two countries all cried out at the same time lmao

      • Pringles@lemm.ee
        link
        fedilink
        English
        arrow-up
        0
        ·
        4 months ago

        Marginal? You must be joking. A vast amount of servers run on Windows Server. Where I work alone we have several hundred and many companies have a similar setup. Statista put the Windows Server OS market share over 70% in 2019. While I find it hard to believe it would be that high, it does clearly indicate it’s most certainly not a marginal percentage.

        • jj4211@lemmy.world
          link
          fedilink
          English
          arrow-up
          0
          ·
          4 months ago

          I’m not getting an account on Statista, and I agree that its marketshare isn’t “marginal” in practice, but something is up with those figures, since overwhelmingly internet hosted services are on top of Linux. Internal servers may be a bit different, but “servers” I’d expect to count internet servers…

  • richtellyard@lemmy.world
    link
    fedilink
    English
    arrow-up
    0
    ·
    4 months ago

    This is going to be a Big Deal for a whole lot of people. I don’t know all the companies and industries that use Crowdstrike but I might guess it will result in airline delays, banking outages, and hospital computer systems failing. Hopefully nobody gets hurt because of it.

    • RegalPotoo@lemmy.world
      link
      fedilink
      English
      arrow-up
      0
      ·
      4 months ago

      Big chunk of New Zealands banks apparently run it, cos 3 of the big ones can’t do credit card transactions right now

      • index@sh.itjust.works
        link
        fedilink
        English
        arrow-up
        0
        arrow-down
        1
        ·
        4 months ago

        cos 3 of the big ones can’t do credit card transactions right now

        Bitcoin still up and running perhaps people can use that

    • shirro@aussie.zone
      link
      fedilink
      English
      arrow-up
      1
      ·
      edit-2
      4 months ago

      They run Windows and all this third party software because they would rather pay subscriptions and give up control of their business than retain skilled staff. It has nothing todo with Linux vs Windows. Linux won’t stop doors falling off Boeing planes. It is the myopia of modern business culture.

  • thearch@sh.itjust.works
    link
    fedilink
    English
    arrow-up
    0
    ·
    4 months ago

    Irrelevant but I keep reading “crowd strike” as “counter strike” and it’s really messing with me

  • ari_verse@lemm.ee
    link
    fedilink
    English
    arrow-up
    0
    ·
    4 months ago

    A few years ago when my org got the ask to deploy the CS agent in linux production servers and I also saw it getting deployed in thousands of windows and mac desktops all across, the first thought that came to mind was “massive single point of failure and security threat”, as we were putting all the trust in a single relatively small company that will (has?) become the favorite target of all the bad actors across the planet. How long before it gets into trouble, either because if it’s own doing or due to others?

    I guess that we now know

    • SupraMario@lemmy.world
      link
      fedilink
      English
      arrow-up
      0
      ·
      4 months ago

      No bad actors did this, and security goes in fads. Crowdstrike is king right now, just as McAfee/Trellix was in the past. If you want to run around without edr/xdr software be my guest.

      • Saik0@lemmy.saik0.com
        link
        fedilink
        English
        arrow-up
        1
        ·
        4 months ago

        If you want to run around without edr/xdr software be my guest.

        I don’t think anyone is saying that… But picking programs that your company has visibility into is a good idea. We use Wazuh. I get to control when updates are rolled out. It’s not a massive shit show when the vendor rolls out the update globally without sufficient internal testing. I can stagger the rollout as I see fit.

  • scripthook@lemmy.world
    link
    fedilink
    English
    arrow-up
    0
    ·
    4 months ago

    crowdstrike sent a corrupt file with a software update for windows servers. this caused a blue screen of death on all the windows servers globally for crowdstrike clients causing that blue screen of death. even people in my company. luckily i shut off my computer at the end of the day and missed the update. It’s not an OTA fix. they have to go into every data center and manually fix all the computer servers. some of these severs have encryption. I see a very big lawsuit coming…

    • dan@upvote.au
      link
      fedilink
      English
      arrow-up
      0
      ·
      edit-2
      4 months ago

      . they have to go into every data center and manually fix all the computer servers

      Do they not have IPMI/BMC for the servers? Usually you can access KVM over IP and remotely power-off/power-on/reboot servers without having to physically be there. KVM over IP shows the video output of the system so you can use it to enter the UEFI, boot in safe/recovery mode, etc.

      I’ve got IPMI on my home server and I’m just some random guy on the internet, so I’d be surprised if a data center didn’t.