For a brief period, the kernel tried to deal with gamma rays corrupting the processor cache

At one point, the following code was added to the part of the kernel that brings the system out of a low-power state:

        ; Invalidate the processor cache so that any stray gamma
        ; rays (I'm serious) that may have flipped cache bits
        ; while in S1 will be ignored.
        ; Honestly.  The processor manufacturer asked for this.
        ; I'm serious.

I'm not sure what the thinking here is. I mean, if the cache might have been zapped by a stray gamma ray, then couldn't RAM have been zapped by a stray gamma ray, too? Or is processor cache more susceptible to gamma rays than RAM? The person who wrote the comment seems to share my incredulity.

Less than three weeks later, the INVD instruction was commented out. But the comment block remains.

In case we decide to resume trying to deal with gamma rays corrupting the processor cache, I guess.

Bonus chatter: One of my colleagues wasn't part of this specific change, but recalled that these sorts of strange-sounding requests were not uncommon, especially for early processor steppings. The workaround was removed once the problem was fixed in microcode or in a later processor stepping.

Comments (30)
  1. Brian_EE says:

    There is a market for “Rad-Hard” components. They tend to be super expensive parts. I find it hard to imagine a space vehicle using an embedded computer that is running Windows.

    1. I have no trouble whatever imagining that. But the key here is “early processor stepping”. Cosmic rays had nothing to do with it.

      1. Brian_EE says:

        In my world – space vehicle == satellite (like GEOS-R, or communications). Those use customized hardware running customized OS’s. I guess I could imagine some UI type computer on the ISS, but that is a special case.

      2. Actually they probably did. I have an idea who the manufacturer was, and what prompted it. Hint: Los Alamos is at high altitude and gets far more cosmic ray strikes than any other location. This was a real issue for awhile.

    2. fweep says:

      Windows is used on laptops on the ISS. The military is also interested in radiation-hardened systems. I’ve worked on both military and space systems, and this is a serious concern. You won’t see Windows running critical flight systems, but certainly in ancillary systems, particularly with orbital research.

      1. csptrun says:

        @fweep, according to this media report, Linux is used on laptops on the ISS. Not sure if the astronauts have other laptops, but I doubt it.

        1. If you read to the end, you’ll see that Windows is also used. I remember watching NASA-TV and listening to them talking about uploading a new Outlook address book.

  2. Kai Schätzl says:

    Only if that protection is on ;-)

  3. IanBoyd says:

    c. 2008: “Invalidate the gamma rays (I’m serious) that may have flipped cache bits while in S1 will be ignored. Honestly. [processor manufacturer] asked for this. I’m serious”

  4. C B P B C says:

    >> Or is processor cache more susceptible to gamma rays than RAM?

    Certainly a possibility, as there are vendors of rad hard/tolerant SRAMs, and standard ECC likely mitigates some issues. Some applications have to deal with unusual environmental concerns, and have corresponding rules & regulations requiring protection against things like, for example, bits flipped due to gamma rays. Medical devices & particle accelerators could certainly be running Windows.

    If you were building Bruce Banner’s Hulkimatic 2000, you can either spend $$$ on a processor with ECC cache, or simply clear the cache after each firing of the particle accelerator. Fair warning: cost overruns make Hulk angry.

  5. Jyjec says:

    You guys know that there is a thing called NDT (Non-Destructive Testing)? They do digital radiography, and they use Windows computers to run the software required to extract the information from the sensors. This involves using nuclear sources to take the shots, which can entail some radiation affecting the computer that is being used.

    1. Jyjec says:

      The companies that use these computers do not spend the $$$ to buy the parts that are nuclear shielded.

      1. Erik F says:

        I doubt that the computers are in sleep mode while they’re doing the analysis though. :-)

  6. _Nicholas says:

    This is exactly why I wrap my computer in three layers of heavy aluminum foil (Reynold’s brand only, that store-brand stuff is a fasco-communist trap).

    In a way this reminds me of the DRAM and HDD error rates where an unrecoverable read error occurs (gamma rays may or may not apply). I guess it’s just flipped turtles all the way down.

    (Also, was kinda sad to see we have to log in now to comment)

    1. BlueRaja says:

      You need several inches of lead to stop gamma particles. A few layers of aluminum foil won’t do anything.

      1. ZakMO says:

        Depends what else is in between you and the radiation source. The atmosphere/ionosphere absorbs a decent amount.

      2. Depends on their intensity and energy. If you’re running the computer anywhere where humans are also able to be present then a few mm of lead are probably enough (the half value layer for 200keV gammas is less than a mm), if you need inches of lead then the operators will be dead long before anything else crops up.

    2. Well, it does at least stop the page getting confused when it tries to scroll you to your new comment and it’s not there yet.

  7. RKPatrick says:

    I would have just pasted the manufacturer’s email instead of spending any time trying to rationalize the request.

  8. Performance parity may’ve been expected between radiation-hardened and conventionally-deployed components to reduce QA requirements. Government regulatory agencies including the FDA and FAA mandate full testing of software across all architecturally different hardware systems on which it runs.

  9. cheong00 says:

    When my classmates and I saw something like this in computing magazines and discussed it, we assumed the “gamma ray corrupting processor cache” thing was much like “my program hangs because of Windows/DOS”. It’s a way of shifting blame.

  10. I’ve worked on code to recover from bit flips in L1 cache. In this particular processor/system, RAM and L2 cache used ECC. This means a single bit error was invisibly corrected. L1 cache, however, used per-byte parity bits. You could recover from a bit flip if the cache was clean by re-reading the data. If a corrupted cache line had been modified (had the dirty bit set) there wasn’t anything you could do.

    1. That was fairly similar to the situation that I think is the one Raymond is describing above. They addressed the problem in software via scrubbing before it was fixed long-term by hardware changes. Modern server CPUs, particularly from Intel and IBM (and I assume AMD as well, I’m just less familiar with their stuff), are practically rad-hard parts with all the SEU (or more generally fault) countermeasures they employ.

    2. Oh, and a comment on parity in L1 vs. ECC in L2: this sounds counterintuitive, but you couldn’t do ECC at L1 performance levels, so they could only do parity checks.

  11. Stuart Sands says:

    Just flying in a commercial airliner is enough to increase exposure to gamma rays. And, yes, those gamma rays can change memory. Why only cache? No idea.

  12. James Curran says:

    I wonder if “gamma rays” is just the hardware guys’ jargon for “thing causing unexplained behavior”, and the software guy misunderstood….

  13. I was at DEC in the 1970’s working with the new KL10 CPU that had MOS memory. I heard they had problems with random bit errors in the MOS memory and so had to go to a double-error-detection, single-error-correction memory scheme. Later on, they found that the random non-reproducible errors were due to the radiation from trace amounts of uranium in the epoxy of the memory chips. When they purified the epoxy, the errors went away.

  14. Hey Raymond, is there any way to contact you in private about this issue? There’s some additional comments I’d like to make about this, but not in a public forum…

  15. ColinKW says:

    Is this how kernel code is developed? Comments are written first by somebody else and you fill in the code?

  16. CherryDT says:

    Congrats, you made it into an Austrian newspaper with this post.
    And I think it’s slightly misinterpreted…
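
Comment 10 describes an L1 cache protected by per-byte parity, where a flipped bit can be repaired by re-reading the line if it is clean, but not if it is dirty. Here is a minimal sketch of that policy as a toy software model in Python. It is not how any particular processor or kernel implements it, and the names `CacheLine`, `parity_bit`, and `recover` are made up for illustration. Note that a single parity bit only notices an odd number of flipped bits in a byte, and it cannot say which bit flipped, which is why the only repair is to fetch a good copy from somewhere else.

```python
from dataclasses import dataclass
from typing import List


def parity_bit(byte: int) -> int:
    """Return 1 if the byte has an odd number of set bits, else 0."""
    return bin(byte & 0xFF).count("1") & 1


@dataclass
class CacheLine:
    """Hypothetical model of a cache line with one parity bit per byte."""
    data: bytearray        # cached copy of the bytes
    parity: List[int]      # parity recorded when each byte was filled or written
    dirty: bool = False    # True if the line holds writes not yet in backing memory

    @classmethod
    def fill(cls, backing: bytes) -> "CacheLine":
        """Load a clean line from backing memory and record its parity."""
        return cls(bytearray(backing), [parity_bit(b) for b in backing])

    def write(self, index: int, value: int) -> None:
        """Store a byte, update its parity, and mark the line dirty."""
        self.data[index] = value & 0xFF
        self.parity[index] = parity_bit(value)
        self.dirty = True

    def corrupted_bytes(self) -> List[int]:
        """Indexes of bytes whose current parity disagrees with the record."""
        return [i for i, b in enumerate(self.data)
                if parity_bit(b) != self.parity[i]]


def recover(line: CacheLine, backing: bytes) -> str:
    """Apply the clean/dirty recovery policy described in comment 10."""
    if not line.corrupted_bytes():
        return "ok"
    if line.dirty:
        # The only up-to-date copy lived in the cache, so it is gone.
        return "unrecoverable"
    # Clean line: backing memory still has a good copy, so re-read it.
    line.data[:] = backing
    line.parity = [parity_bit(b) for b in backing]
    return "refetched"
```

For example, flipping one bit in a freshly filled line and calling `recover` re-reads the line from backing memory, whereas doing the same after a `write` reports the line as unrecoverable, because the modified byte never made it out of the cache.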

Comments are closed.
