Turn off the computer, boot from the previous day’s image, wipe the current day’s image, continue using the computer.
That’s all well and good, but many of these Windows machines were headless or used by extremely non-technical people - think tills at your supermarket or airport check-in desks. Worse, some of these installations were running in the cloud, so console access would have been tricky.
Funny you should mention people at the airport. I work at the airport, but not for Fronteer. My sister was flying on Thursday, and nobody could get a boarding pass printed. When I came down, thinking my sister was throwing a tantrum over nothing, I saw a line longer than a football field. When I tried to ask a Fronteer employee what happened, he just threw his hands in the air and said “I DON’T FUCKING KNOW, OK??? NOBODY KNOWS WHAT THE FUCK IS GOING ON!!! YOU SEE THIS??? YOU SEE THIS SHIT??? YOU THINK I’M JUST DENYING PEOPLE FOR FUN??? WHY DON’T I GO GRAB MY TRIDENT, AND I CAN STAB ALL OF YOU OVER AN OPEN FLAME!!! BECAUSE I’M THE DEVIL, RIGHT??? RIGHT??? THAT’S WHAT YOU’RE SAYING!!!”
And all I said was “Hey, my sister is flying today and…”
You think THAT guy is going to sit there and reformat a PC, or restore PC snapshots to a previous update? He’s the kind of guy who SHOULD BE smoking weed at work. This platform is very tech-savvy, but people here often forget that only a very, very small percentage of people share their PC knowledge. Now what would happen if I threw a tech-savvy person into an auto garage and told them to replace the gaskets of an engine? Would they know how? Would they enjoy a room full of mechanics laughing at them?
I’m not saying you specifically. I’m agreeing with you. I’m just adding to your point for an audience that I think sometimes misses the forest for the trees.
The cloud systems would have been a problem. Any local system, a non-technical user could easily have fixed, because their IT department could simply tell them: turn on your computer, and when it gets to this screen with these words, press the down arrow key one time and press Enter, and your computer will boot normally.
You wildly overestimate the average person’s willingness to do that.
Their willingness to do it would primarily come from the fact that they have a job to do, and if their co-workers are doing their jobs because they followed the instruction and they are not, then the boss is going to have a nice look at them.
This relies on the assumption that everyone else, or at least a significant portion, in the office managed to do it.
I’m not talking about whether or not they’re actually physically capable of it, of course they are. I’m talking about how people immediately shut down and pretend they can’t follow simple directions the second something relates to a computer.
…until the CrowdStrike agent updates, and you wind up dead in the water again.
The whole point of CrowdStrike is to be able to detect and prevent security vulnerabilities, including zero-days. As such, they can release updates multiple times per day. Rebooting in a known-safe state is great, but unless you follow that up with preventing the agent from redownloading the sensor configuration update, you’re just going to wind up in a BSOD loop.
A better architectural solution would have been to have Windows drivers run in Ring 1, giving the kernel the ability to isolate those that are misbehaving. But that risks a small decrease in performance, and Microsoft didn’t want that, so we’re stuck with a Ring 0/Ring 3 only architecture in Windows that can cause issues like this.
That assumes the file is not stored on a writable section of the filesystem and treated as application data, and thus wouldn’t survive a rollback. Which it likely would.
Would still need to be on site.
True
I’m familiar enough with Linux but have never used an immutable distro. I recognize the technical difference between what you describe and “go delete a specific file in safe mode”. But how about the more generic statement? Is this much different from “boot in a special way and go fix the problem”? Is it any easier or more difficult than what people had to do on Windows?
Primarily it’s different because you would not have had to boot into any safe mode. You would have just booted from the last good image from like a day ago and deleted the current image and kept using the computer.
What’s the user experience like there? Are you prompted to do it if the system fails to boot “happily”?
I don’t think any of the major distros do it currently (some are working towards it tho), but there are ways (the primary/only one I know of is with systemd-boot). It invokes one of the boot binaries (usually “Unified Kernel Images”) that are marked as “good” or one that still has “tries left” (whichever is newer). A binary that has “tries left” gets that count decremented each time a boot is attempted; when it reaches 0 without a successful boot it is marked as “bad”, and if it boots successfully it gets marked as “good”.
So this system basically just requires restarting the machine after an unsuccessful boot, if that isn’t already done automatically.
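To make the “tries left” idea concrete, here is a rough Python sketch of the bookkeeping described above. It is only a toy model, not how systemd-boot is actually implemented: the names BootEntry, pick_entry, boot_once and simulate, the image names, and the starting count of 3 tries are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BootEntry:
    name: str
    version: int                 # higher means newer (hypothetical field)
    tries_left: Optional[int]    # None means the entry is already marked "good"

    @property
    def state(self) -> str:
        if self.tries_left is None:
            return "good"
        return "bad" if self.tries_left == 0 else "indeterminate"


def pick_entry(entries):
    """Return the newest entry that is not 'bad', or None if all are bad."""
    candidates = [e for e in entries if e.state != "bad"]
    return max(candidates, key=lambda e: e.version, default=None)


def boot_once(entries, boots_ok):
    """Simulate one boot attempt; boots_ok(entry) says whether it comes up."""
    entry = pick_entry(entries)
    if entry is None:
        return None
    if entry.tries_left is not None:
        entry.tries_left -= 1      # spend one try before handing over control
    if boots_ok(entry):
        entry.tries_left = None    # successful boot: mark the entry "good"
        return entry
    return None                    # failed boot: the machine restarts


def simulate():
    """Yesterday's image is known good; today's image never boots."""
    entries = [
        BootEntry("image-yesterday", version=1, tries_left=None),
        BootEntry("image-today", version=2, tries_left=3),
    ]
    log = []
    while True:
        chosen = pick_entry(entries)
        booted = boot_once(entries, lambda e: e.name != "image-today")
        log.append((chosen.name, booted is not None))
        if booted is not None:
            return log, entries
```

In this model, calling simulate() tries the broken image until its tries run out and then falls back to the older good image without anyone touching the machine, other than it restarting after each failed boot.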