On Mon, Mar 11, 2019 at 5:08 PM Linus Torvalds
<torvalds(a)linux-foundation.org> wrote:
>
> On Mon, Mar 11, 2019 at 8:37 AM Dan Williams <dan.j.williams(a)intel.com> wrote:
> >
> > Another feature the userspace tooling can support for the PMEM as RAM
> > case is the ability to complete an Address Range Scrub of the range
> > before it is added to the core-mm. I.e at least ensure that previously
> > encountered poison is eliminated.
>
> Ok, so this at least makes sense as an argument to me.
>
> In the "PMEM as filesystem" part, the errors have long-term history,
> while in "PMEM as RAM" the memory may be physically the same thing,
> but it doesn't have the history and as such may not be prone to
> long-term errors the same way.
>
> So that validly argues that yes, when used as RAM, the likelihood for
> errors is much lower because they don't accumulate the same way.
>
> > The driver can also publish an
> > attribute to indicate when rep; mov is recoverable, and gate the
> > hotplug policy on the result. In my opinion a positive indicator of
> > the cpu's ability to recover rep; mov exceptions is a gap that needs
> > addressing.
>
> Is there some way to say "don't raise MC for this region"? Or at least
> limit it to a nonfatal one?

I wish, but no. Poison consumption always raises the MC; what varies
is whether MCI_STATUS_PCC (processor context corrupt) is set, which is
how the cpu indicates whether it is safe to proceed. There's no way to
indicate "never set MCI_STATUS_PCC", or to silence the exception.
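
To make the PCC point concrete, here is a minimal userspace sketch of
the decision described above. It is an illustration, not the kernel's
actual severity grading, which consults more status bits than this.
The bit positions follow the architectural IA32_MCi_STATUS layout and
the kernel's MCI_STATUS_* names; the enum and the helper
classify_mci_status() are hypothetical names invented for this example.

```c
#include <stdint.h>

/* Bit positions in IA32_MCi_STATUS, named as in the kernel's mce.h. */
#define MCI_STATUS_VAL (1ULL << 63) /* register holds a valid error */
#define MCI_STATUS_PCC (1ULL << 57) /* processor context corrupt */

/* Hypothetical verdicts for this sketch only. */
enum mce_verdict {
    MCE_IGNORE,            /* no valid error logged in this bank */
    MCE_MAYBE_RECOVERABLE, /* PCC clear: the cpu says it may be safe to proceed */
    MCE_FATAL              /* PCC set: context is corrupt, cannot continue */
};

/*
 * Hypothetical helper: the exception has already been raised by the
 * time this runs. Software cannot suppress it; it can only inspect
 * PCC to learn whether continuing is possible at all.
 */
static enum mce_verdict classify_mci_status(uint64_t status)
{
    if (!(status & MCI_STATUS_VAL))
        return MCE_IGNORE;
    if (status & MCI_STATUS_PCC)
        return MCE_FATAL;
    return MCE_MAYBE_RECOVERABLE;
}
```

A status word with VAL and PCC both set classifies as MCE_FATAL here,
and nothing on the software side can change that outcome; that is the
gap a positive recoverability indicator would have to fill.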