Wednesday, 3 January 2018

'Kernel memory leaking' Intel processor design flaw forces Linux, Windows redesign

quote [ A fundamental design flaw in Intel's processor chips has forced a significant redesign of the Linux and Windows kernels to defang the chip-level security bug. Other OSes will need an update, performance hits loom. ]


Affects pretty much all Intel CPUs going back a decade (Coffee Lake appears to be the only exception). MacOS will need a patch too, it's a hardware problem so I'm not sure why they called out Windows and Linux. This looks to be worse than the Pentium floating-point bug - there's no hardware fix, and the software fix will impose a 5-30% performance hit. And I just bought a new computer with a 10-core Xeon CPU. Think I'm going to keep it off the internet and avoid the patch, I need maximum horsepower.

[SFW] [science & technology] [+10 WTF]
[by hellboy@8:50amGMT]


steele said @ 11:26pm GMT on 3rd Jan [Score:2]
rhesusmonkey said @ 5:00am GMT on 4th Jan [Score:2]
i spent a while reading on this earlier today. Basically with any kind of out-of-order processor (which includes the Cortex A7n, Apple's Cyclone / Hurricane CPUs in their A* processors, and Qualcomm's Kryo and Krait cores as well as all x86 "Core" family and Zen) you have cases where code branches and there is a path taken or a path not taken. Speculative execution basically decodes both paths in parallel then waits for the hazards to resolve to determine the branch (basically "if A then B else C", where both B and C are executed in parallel with determining if the Condition of A is true or false). from what i've read, on certain designs (Intel), one of those paths can execute in a Kernel mode while the other is in User mode - the different modes are meant to keep the userland process from mucking about with critical things like the memory management unit / MMU (which does the virtual to physical address translation). The MMU has a component called the Translation Lookaside Buffer (TLB), which is basically a cache of the page tables for V2P mapping for a given process. "Modern" OSes have set the MMU to include the kernel address space in the page tables of the userland processes so that you can execute privileged code without having to flush the TLB on every context switch, the assumption being that your execution privilege state protects you from reading the range of the MMU that corresponds to the kernel pages (because the system-level or hypervisor-level control manages this). So for Intel vs AMD, the function is basically the same, but Intel's cores do not force the speculative execution path to validate that it is operating in the correct privilege state, while AMD's cores do. or so i assume from what i've read so far.
likely Intel doesn't bother to do this because the results from the speculative path are not supposed to be observable and should be discarded once the correct path is taken (or, presumably once it is determined that the Kernel privilege path should be taken, then they check for correct access privilege in the MMU).
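A toy sketch of why "discarded" speculative results can still leak, going by the description above. Nothing here touches real hardware: `ToyCPU`, `speculative_read`, and the `check_privilege_early` flag are made-up names, and the set of "cached lines" stands in for the cache-timing measurement a real attack would need. The assumption is that the value from the speculative load gets thrown away, but the cache line it touched stays warm.

```python
class ToyCPU:
    """Toy model, not real hardware: tracks which probe lines are 'cached'."""

    def __init__(self, kernel_memory: bytes):
        self.kernel_memory = kernel_memory  # bytes user mode must not read
        self.cached_lines = set()           # probe-array lines currently cached

    def speculative_read(self, addr: int, check_privilege_early: bool) -> None:
        """User-mode load of a kernel address on a path that later gets squashed."""
        if check_privilege_early:
            # Assumed behaviour of a core that validates privilege before
            # the speculative load runs: nothing is touched.
            return
        secret = self.kernel_memory[addr]   # value is read speculatively
        self.cached_lines.add(secret)       # dependent load warms probe line [secret]
        # The fault is raised here and the register result is discarded,
        # but cached_lines is not rolled back.

    def probe(self):
        """Stand-in for timing all 256 probe lines: returns the fast (cached) ones."""
        return sorted(self.cached_lines)


def leak_byte(cpu: ToyCPU, addr: int, check_privilege_early: bool):
    """Flush, trigger the speculative read, then see which probe line is warm."""
    cpu.cached_lines.clear()                # flush the probe array first
    cpu.speculative_read(addr, check_privilege_early)
    hits = cpu.probe()
    return hits[0] if hits else None


cpu = ToyCPU(kernel_memory=b"\x2a")
print(leak_byte(cpu, 0, check_privilege_early=False))
print(leak_byte(cpu, 0, check_privilege_early=True))
```

In this model the first call recovers the byte value 42 and the second returns None, which is the Intel-vs-AMD difference as described above, not a claim about how either vendor's silicon is actually built.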
ARM cores (v7 and later) have four privilege states as well, each with a set of shadow registers and ability to modify different parts of the core registers. So on their OOO machines there is at least the possibility that this could happen as well, but it depends on the design of each machine and would need to be checked. Qualcomm's S835 and (unreleased) 845 are using Cortex A* processors that have some tweaks to make data sharing with the Hexagon DSPs and GPUs "easier" but something as fundamental as this would be unchanged. their older Krait and Kryo100 (sd820) cores were built in-house, along with their Falkor Server CPU, so it is anyone's guess if this issue affects them, same as for Apple's home-grown CPUs.

the "fix" is to completely separate the kernel process memory space from user processes, which means no longer having kernel pages in the TLB, which means higher context switch overhead by flushing the TLB. Oddly enough, this is how Microkernel systems like QNX are designed to run in order to ensure that the kernel is protected, so at least BlackBerry OS and OS10 devices are immune :P
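A rough sketch of where the extra overhead comes from, assuming the simplest possible model: an unlimited-size TLB, a full flush on every switch between user and kernel page tables, and no tagging tricks (PCID/ASID) to avoid the flush. `count_tlb_misses` and the page names are hypothetical, just for illustration.

```python
def count_tlb_misses(num_syscalls, user_pages, kernel_pages, isolate_kernel):
    """Count TLB misses for a loop of 'run user code, then make one syscall'.

    isolate_kernel=False: kernel pages live in the user page tables (old layout).
    isolate_kernel=True:  separate tables, modelled as a full TLB flush on
                          every user->kernel and kernel->user transition.
    """
    tlb = set()   # translations currently cached
    misses = 0

    def touch(pages):
        nonlocal misses
        for page in pages:
            if page not in tlb:
                misses += 1       # page-table walk needed
                tlb.add(page)

    for _ in range(num_syscalls):
        touch(user_pages)         # user code runs
        if isolate_kernel:
            tlb.clear()           # entering the kernel: switch tables, flush
        touch(kernel_pages)       # syscall handler runs
        if isolate_kernel:
            tlb.clear()           # returning to user mode: flush again
    return misses


user = ["u0", "u1"]
kernel = ["k0", "k1"]
print(count_tlb_misses(100, user, kernel, isolate_kernel=False))
print(count_tlb_misses(100, user, kernel, isolate_kernel=True))
```

With these made-up numbers the shared layout misses 4 times in total and the isolated one 400 times, so the cost scales with how often a workload crosses into the kernel. Real hardware and real patches are more subtle than this, so treat the ratio as illustrative only.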

tl;dr clusterfuck. the article i read advised investing in popcorn.
Ankylosaur said @ 10:39am GMT on 3rd Jan [Score:1 Insightful]
On the brighter side, if you have computers you don't need to put online, you're about to get some cheap cpu upgrades.
backSLIDER said @ 4:51pm GMT on 3rd Jan [Score:1 WTF]
Scuttlebutt is that Linux will force the same level of slowdown on AMD chips as well. We may see forks in the kernels for AMD and Intel at some point. But we'll see.
LurkerAtTheGate said @ 3:12am GMT on 5th Jan
At the moment, Linus is granting a specific exemption to the slowdown for AMD cpus.
hellboy said @ 8:52am GMT on 3rd Jan
Hey, what happened to the edit button?
steele said @ 12:11pm GMT on 3rd Jan
Edit button is only supposed to be for 15 minutes but if you're wondering so soon, I think the time change may have fucked it up. I'll test it out later.
hellboy said @ 7:42pm GMT on 3rd Jan
Yeah, I went to edit (just to move my Xeon comment off the front page, NBD) as soon as I'd posted and there was no button.
rndmnmbr said @ 9:37am GMT on 3rd Jan
Fuck me, if I'm reading this right then the exploit is in speculative execution itself, which means it goes back clear to the Netburst architecture. Wowee, what a fuckup.

I was already halfway set on building a Ryzen box the next time I build a computer, looks like my mind has just been made up for me.
hellboy said @ 7:44pm GMT on 3rd Jan
Unfortunately for me Ryzen Hackintosh isn't an option yet.
midden said @ 8:20pm GMT on 3rd Jan
As our IT security guy just explained it to me, yes, it's in the hardwired level of speculative execution. To bypass it, speculative execution will have to be moved up a couple of levels of abstraction into software, which, while still faster than no S.E. (see what I did there?), will be significantly slower.

My home machine is an 8 core tower from 2008 and still going strong. Anyone see a list of affected CPUs yet? I was thinking of upgrading, but now I will just wait for the next gen of CPUs where the issue has been properly fixed.
rndmnmbr said @ 11:11pm GMT on 3rd Jan [Score:1 Informative]
IIRC, wasn't the base of the Core architecture basically the old P6 architecture with Netburst speculative execution and branch prediction bolted on, plus significant die shrinks to add speed? I imagine tweaked all to hell and not even closely resembling P6 at all by this point. But if I'm right, the flaw affects everything Intel has done since 2001.
midden said @ 5:38am GMT on 4th Jan
Ugh. Oh well. I guess it'll still be the most price/performance cost effective to buy hardware that was high-end three years or so ago. There will probably be a premium tacked onto any new CPUs that come out over the next year or two which have this particular problem fixed.
LurkerAtTheGate said @ 2:59am GMT on 5th Jan [Score:1 Informative]
List of Intel CPUs affected here.
midden said @ 3:35am GMT on 5th Jan
Thanks for the link. Looks like my dual first generation 4 Core CPUs just squeaked under the wire. Good thing, since I think this machine is too old for the 10.13.2 update. Technically, 10.11.x is the last officially supported OS, but with a little mac-foo, you can run 10.12.x with minimal issues. (On my particular hardware config, 10.12 breaks the built in wifi. Easily fixed with a USB wifi stick, though.)
Kama-Kiri said @ 10:20am GMT on 3rd Jan
Story hasn't been picked up by arstechnica or even semiaccurate? That's quite the scoop by The Register, but I'd wait for a bit to see if it is corroborated elsewhere.
midden said @ 9:44pm GMT on 3rd Jan [Score:3]
spaceloaf said @ 10:16pm GMT on 3rd Jan
Thanks, that was a nice high-level overview.
Paracetamol said @ 3:56pm GMT on 3rd Jan
Slashdot has some discussion on it.
steele said @ 12:31pm GMT on 3rd Jan
ubie said @ 3:05pm GMT on 3rd Jan
I have friends and family spread across several fields of IT that are pretty much incommunicado this morning. They *kinda* knew this was coming, at least some of them, but they are now Very Busy People.
Pandafaust said @ 3:34pm GMT on 3rd Jan
I wonder how many other systems use intel-based architecture? I am suddenly very glad that Qualcomm dominates the phone market...
steele said @ 4:51pm GMT on 3rd Jan
This brings up some old but interesting thoughts on a company's liability for faulty products, too big to fail, and nationalising companies that reach a point where society requires their existence.
mechanical contrivance said @ 5:42pm GMT on 3rd Jan
That's the whole reason AMD exists.
steele said @ 7:09pm GMT on 3rd Jan
Lol, that'll be the excuse for not nationalizing, but I imagine we're still going to see this be used as an excuse for attacking class action lawsuits, with a special bonus of just such legislation being sponsored by congressmen that previously sponsored bills calling for hacking to be punishable by death.
King Of The Hill said @ 2:16pm GMT on 4th Jan
AMD has features similar to Intel's vPro, which is where part of this issue lives, so it would be wise not to assume they don't also have some sort of issue.
midden said @ 7:35pm GMT on 4th Jan
I'm no expert in liability law, but as I understand it, as long as they provide an acceptable fix once the problem becomes known (and hadn't been intentionally hidden by the manufacturer), they are pretty much off the hook. Think of it like an automotive recall; if it wasn't actually negligent, they're not liable for any problems the defect may have caused.
steele said @ 7:45pm GMT on 4th Jan
Well that's where defining acceptability becomes an issue. Like VR is dependent on the latest and greatest speeds, a potential 30% drop in processing power could make or break presence. Rendering is mostly covered by GPUs, but tracking? Multiplayer functionality? People bought these processors, which are marketed by potential speeds, with the impression that they would be receiving certain performance. If ten years' worth of products are now hindered from reaching their advertised performance or face the potential of a security risk that's quite the kick to the nuts for the customer.
midden said @ 12:40am GMT on 5th Jan
Imagine that an unforeseen design flaw in all the engines manufactured by BMW could potentially lead to raging fires under the hood. This flaw has been undetected for the last decade. Nobody noticed the flaw, including the world's best mechanics and engineers. Oh, shit!

Well, the best BMW can do is issue a recall and change the way the fuel injector works to eliminate the fire risk. Unfortunately it also means a 15-30% drop in horsepower, depending on various other road conditions. A lot of BMW owners will be understandably pissed, but BMW hasn't been intentionally negligent. It would be tough to argue that BMW is legally liable to any Paris-Dakar race teams who happen to drive BMWs.

steele said[1] @ 1:31am GMT on 5th Jan
Computers aren't cars, though cars are quickly becoming computers, which is all the more reason to be concerned. Nobody noticing the flaw in a closed architecture isn't something that should be surprising. This is an argument that goes back to the birth of the open source movement. Back when the nerds thought the open nature of the internet was going to open up the world to gay space communism and lead us to a post scarcity star trek-esque utopia. Instead, I think it's safe to say, the pendulum has swung entirely the other way.

The key issue is here:

but BMW hasn't been intentionally negligent.

You can't prove that. You have no knowledge of their quality control process, their design process, their manufacturing process, or the profit based factors that went into those processes.

IP based companies are already claiming a level of corporate personhood that far outstrips anything they deserve under the fantastic premise that we only license the right to use IP, not own it. IP which you're saying we're owed no guarantee of quality, despite having no assurances of the quality of the process with which it's created. And to top it all off, car manufacturers have been trying to gain those same benefits! And really, it's just a matter of time before they get them at this rate.

I'm not saying you're wrong and they're not going to be held liable, I'm just saying the system is fucked and it's largely due to the trust and control we give to for profit organizations who are so ubiquitous and irresponsible with the power they wield that they basically own our asses and the dystopian future they're forcing us into.
midden said @ 3:17am GMT on 5th Jan
"The key issue is here:

but BMW hasn't been intentionally negligent.

You can't prove that. You have no knowledge of their quality control process, their design process, their manufacturing process, or the profit based factors that went into those processes."

Exactly. You can't prove it. If you could prove it, then yes, they can be held liable in a civil court. That's what I meant by, "...and hadn't been intentionally hidden by the manufacturer."

"...you're saying we're owed no guarantee of quality, despite having no assurances of the quality of the process with which it's created."

Yup, you got it; No guarantee of quality, for most basic goods at least. No assurances of the quality of the process with which it's created. That's pretty much how the free market works. If you make a crappy frammis, and someone else makes a better frammis, your sales will tank. Intel's frammis has a nasty hole in it that nobody noticed until recently. It's a little different when you are talking medical devices and such, but something like a general purpose computer is just a commodity device, a tool, like a bucket or a milling machine. If you build one with a serious defect but sold it that way in good faith, all that society can demand from you is that you make a reasonable fix available if reasonably possible. Even then, you, the manufacturer, have no legal requirement to do a damned thing about the defect, as long as you don't mind all your customers jumping ship.

I'm not really seeing why you think Intel making a CPU with a defect discovered years later is fundamentally different than any other manufactured good, from a ceramic mug to a Ferrari. I'm not saying the Free Market should be the single driving force of society. Far from it! But the basic economics of selling stuff is that good stuff sells, bad stuff doesn't. (Of course monopolies are a whole other matter. You could certainly argue that Intel has been allowed to grow way too big and poses a risk/threat to the greater good of society. I'd be inclined to agree.)
steele said @ 3:57am GMT on 5th Jan
(Of course monopolies are a whole other matter. You could certainly argue that Intel has been allowed to grow way too big and poses a risk/threat to the greater good of society. I'd be inclined to agree.)

Yes, this is the core of my argument, go back to that initial comment. ;) I'm also saying that computers broke the mold and Intellectual Property is a different beast that we've been hammering into that broken mold and it's us the consumers, not the corporations that are going to pay for it. You're making a will of the free market argument here and that shit just doesn't pan out in the real world. Key word being will. Basic concepts of supply and demand break down when your supplier can buy the methods by which demands are manufactured, even more so when we throw in the fact that IP can be supplied for minimal cost infinitely once the cost of initial creation is covered.

I'm not really seeing why you think Intel making a CPU with a defect discovered years later is fundamentally different than any other manufactured good, from a ceramic mug to a Ferrari.

I'm really not, in reality. I guess I'm pointing out how bullshit I think that argument is for IP based products and questioning why you're so quick to trust the circumstances around other manufactured goods. I mean, in the face of how many recent emissions cheating scandals there have been just in the last year, to try and sell me on the good intentions of an auto manufacturer seems ludicrous! :D These companies aren't our friends. The only way 'good stuff sells, bad stuff doesn't' could ever even hope to work is if, as we should with politicians, we hold them to the highest standards possible and quit making excuses for them based on these manufactured images they've sold us on. Our lives, our society, are more important than their profits and we're basically Brand Loyalty-ing ourselves into extinction. If beyond benefit of a doubt is the measure we're supposed to be using in measuring someone's innocence to ensure that no innocent person is wrongly caught up in the process to protect that person's life, then the exact opposite approach should be taken towards non-person entities which exist primarily to build profit. They deserve no benefit of the doubt and our utmost skepticism. If the rest of society has to be shifted around to provide a safety net for people so that can be accomplished then that's really the duty we have to future generations. Anything else is just...

Slavery With Extra Steps
steele said @ 7:12pm GMT on 3rd Jan
Ah, this also explains why the latest Microsoft Surface Tablets are going to be using the Qualcomm Snapdragon.
hellboy said[1] @ 8:06pm GMT on 3rd Jan
Yeah, I'm sure Apple is already looking at putting their ARM chips in Macs. It's too bad they don't have an AMD kernel.
midden said @ 8:23pm GMT on 3rd Jan [Score:1 Interesting]
I read somewhere that it's been patched since 10.13.2, but haven't seen benchmarks on how it affects performance under Mac OS.
hellboy said @ 8:46pm GMT on 3rd Jan

It'll be interesting to see how much of a performance hit there is, if any (I would think that reducing performance in 10.13.2 without telling anyone would be a much bigger deal than the bullshit battery flap, as it would affect everyone including people who just bought brand new iMac Pros). If I understand the problem correctly it's hard for me to understand how there *wouldn't* be a hit, but if somehow it's not as bad as Windows or Linux that's potentially a win for Apple.

Looks like ARM chips may be affected as well, somehow (Intel is claiming it's an industry-wide problem, AMD says hell no it's not).
midden said @ 7:30pm GMT on 4th Jan
With all the benchmarking folks in the graphics industry do on their machines, I'm surprised there wasn't an outcry when 10.13.2 came out. Perhaps it's not as much of an issue for certain kinds of benchmark calculations? It's been almost a month since that update, and I haven't seen anyone complaining.
mechanical contrivance said @ 8:10pm GMT on 3rd Jan
midden said[1] @ 9:47pm GMT on 3rd Jan
From the way Apple has hedged its bets in the past, I wouldn't be surprised if they have an internal fork of the OS running on AMD and keep it fairly up to date.

And I just noticed this:
conception said @ 10:15pm GMT on 3rd Jan
As soon as they can make a speedy enough "rosetta" emulation stack, ARM is in. 100%.
King Of The Hill said @ 2:18pm GMT on 4th Jan
This has been known for some time already. The right way to fix it is via firmware... The easiest way to fix it is via OS. That is the rub.

We (IBM) have already released content (via BigFix) to at least detect and inventory every system that is impacted by this issue.
steele said @ 6:26pm GMT on 4th Jan
Is it firmware fixable? I was under the impression this was a hardwired (so to speak) issue.
King Of The Hill said @ 10:55pm GMT on 4th Jan [Score:1 Interesting]
MS Released to us last night the patch content they published today.

Recommended to chase with firmware updates.

Firmware updates take longer to produce and test. More importantly, firmware updates are a royal pain in the ass to deliver remotely. Additionally, the sheer number of firmware updates by model is going to be fucking horrible to deal with - hence the OS level patches.
steele said @ 3:25am GMT on 5th Jan
I get all that, I'm just wondering is this something that's firmware fixable with the feature intact or is this something that the firmware will "fix" Cask of Amontillado style. #AndWeWillNeverSpeakOfThisAgain ;)
midden said @ 7:21pm GMT on 4th Jan
As I understand it, yes, it's hardwired, but you can basically disable the function in hardware and instead do it in software. The downside is that implementing it in software is waaaay slower. The Arstechnica story I linked to earlier has a good explanation.
steele said @ 7:28pm GMT on 4th Jan
Ah, to me a fix would actually be fixing the hole so the feature still works. Disabling is like killing the horse and calling the broken leg fixed. :D
midden said @ 7:31pm GMT on 4th Jan [Score:1 Informative]
The feature still works, just a lot slower. But even the much slower version running in software is a lot faster than not having the feature at all.
steele said[1] @ 7:39pm GMT on 4th Jan
Does it? I thought the fix was just providing the security feature of separating ring 0 and ring 3 memory spaces, but you're basically losing L1,L3 cache functionality? Blah. I gotta go reading all this again. Too much happening right now and I am so far out of the loop on how this architecture shit works nowadays. #NoClue ;)
dolemite said @ 6:26pm GMT on 4th Jan
You work at IBM? cool.

I always thought you sold propane and propane accessories.
Bruceski said @ 10:37pm GMT on 4th Jan
I've got some friends in computer security related fields. The one in actual computer security has gone completely radio silent (may be for unrelated reasons, but given his "I can't talk about this" in the past I wasn't surprised), and the ones in more support-level jobs have been posting a lot of drinking gifs.
machpi said @ 5:31pm GMT on 5th Jan
My (perhaps paranoid) assumption is that elements of the government have known about this for years. When I read the first reports of this, my initial fear wasn't cyber-thuggery, but governmental surveillance.
