Fix client *Dma address translation (KeyExport / NvmAdd / NvmRead) and KeyCacheDma use-after-free#403
Fix client *Dma address translation (KeyExport / NvmAdd / NvmRead) and KeyCacheDma use-after-free#403Frauschi wants to merge 4 commits into
Conversation
364edf6 to
b611887
Compare
b611887 to
d126890
Compare
| c->dma.asyncCtx.buf.xformedAddr = keyAddrPtr; | ||
| c->dma.asyncCtx.buf.clientAddr = (uintptr_t)keyAddr; | ||
| c->dma.asyncCtx.buf.sz = keySz; | ||
| c->dma.asyncCtx.buf.postOper = WH_DMA_OPER_CLIENT_READ_POST; |
There was a problem hiding this comment.
This seems really clunky - should we not have a generic pre- wrapper like you added for the post operation that primes the asyncCtx under the hood based on the input args?
There was a problem hiding this comment.
Fixed by adding the new wh_Client_DmaAsyncPre() helper analogous to the post helper.
| } | ||
|
|
||
| ret = wh_Client_RecvResponse(c, &group, &action, &size, (uint8_t*)resp); | ||
| if (ret == WH_ERROR_NOTREADY) { |
There was a problem hiding this comment.
it would be helpful to have a comment about why NOTREADY gets an early return but other errors do not (releasing the DMA mapping)
There was a problem hiding this comment.
Added an explanatory comment (on all locations)
| buf->postOper, (whDmaFlags){0}); | ||
| /* Clear the slot even on failure so a later Response cannot re-run the | ||
| * POST; the failure is returned to the caller. */ | ||
| buf->sz = 0; |
There was a problem hiding this comment.
part of me thinks we should just clear the whole struct since it is no longer in use? Makes it seem more explicit.
| * unset, and the Response must not POST a stale (shared-union) size. */ | ||
| c->dma.asyncCtx.nvmAdd.meta.sz = 0; | ||
| c->dma.asyncCtx.nvmAdd.data.sz = 0; | ||
|
|
There was a problem hiding this comment.
same comment as above re- clearing the whole struct. I wonder if this should have a helper just to make it explicit (1wh_Client_DmaAsyncCtxReset()`) or "clear" or something like that?
There was a problem hiding this comment.
Cleared now completely within the post helper.
There was a problem hiding this comment.
@Frauschi I'm approving in light of my comments earlier. If you want it in ASAP you can go ahead and click merge since both Marco and I approved, though I'd like the comments addressed at some point. If you would rather do it in a follow-up PR, go ahead and merge. If you want to follow up with another PR on this branch, then you are welcome to do that too. Up to you, both fine with me.
… buffers
KeyExportDma, KeyExportPublicDma, NvmAddObjectDma and NvmReadDma put the raw
client pointer in the wire message instead of running the DMA translation
callback -- invisible on flat-memory ports (POSIX SHM) but rejected by a
split-address-space server. Each now runs the PRE translation before sending
and the matching POST on the Response.
KeyCacheDmaRequest also ran its POST inside the Request, before the server read
the buffer (a use-after-free on ports whose callback allocates); it now defers
the POST to KeyCacheDmaResponse.
The per-call-site async structs are consolidated into whClientDmaAsyncBuf with a
shared wh_Client_DmaAsyncPost() helper that returns the POST status (so a failed
unmap/copy-back is surfaced). Each Request clears its slot(s) before the PRE, so
an unpopulated slot (e.g. metadata-only NvmAddObjectDma) cannot POST stale union
memory, and forwards data_hostaddr = 0 when there is no data buffer. Requests
fail fast with WH_ERROR_REQUEST_PENDING on a busy transport, and
wh_{Client,Server}_DmaRegisterAllowList() now accept NULL to unregister.
The existing POSIX tests never registered a non-identity DMA callback, so a *Dma API that skipped translation looked identical to a correct one -- which is why the keystore/NVM translation bugs survived CI. Add a shared "bounce-pool" harness (test/wh_test_dma.c) modeling a split-address-space port: the client callback bounces each buffer through a pool, the server callback rejects any untranslated address, freed slots are poisoned so a premature POST (use-after-free) corrupts the data, and a POST matching no live slot is recorded as a stray/double free. - wh_test_clientserver.c drives the keystore/NVM *Dma APIs (including metadata-only and injected PRE-failure cases) in the single-thread pump harness, where ordering makes the use-after-free deterministic. - wh_test_crypto.c registers the same callback on the threaded harness so the whole crypto/cert *Dma suite runs through translation. - _testDma now unregisters its stack-local allow list before returning (a pre-existing dangling pointer ASAN flags once a later test uses DMA). Runs under `make DMA=1`.
d126890 to
089eee2
Compare
|
@bigbrett I addressed your comments right away. The new pre and post helper are currently only applied to the *DMA methods touched in this PR. I'll do a follow-up PR that migrates all the other *Dma methods (mostly the many crypto operations) to these as well. |
Background
A few client *Dma APIs wrote the raw client pointer straight into the wire message instead of routing it through the DMA address-translation callback like the rest of the family.
This is invisible on flat-memory ports (the POSIX SHM reference, where client and server share an address space), so it survived CI. It breaks on a real split-address-space port — found during DMA bring-up on one of the new hardware platforms for wolfHSM, where the server runs on a separate security CPU with no view of host RAM, so it bounds-checks the untranslated pointer and rejects the request (
WH_ERROR_BADARGS).What this fixes
KeyExportDma,KeyExportPublicDma,NvmAddObjectDma,NvmReadDma. Each now runs*_PREtranslation before sending and the matching*_POSTon the response, mirroringRngGenerateDma.KeyCacheDmaRequest, which ran itsPOSTinside the request — releasing/recycling the scratch before the server read the buffer (benign on POSIX, a real UAF on any port whose callback allocates). It now defers thePOSTtoKeyCacheDmaResponse.Notable implementation details
whClientDmaAsyncBuf(carrying thePOSTdirection) plus a two-buffernvmAddvariant, with a sharedwh_Client_DmaAsyncPost()helper that returns thePOSTstatus (a failed unmap/copy-back is surfaced, not swallowed).PRE, so a path that leaves a slot unpopulated (e.g. metadata-onlyNvmAddObjectDma) can't make the responsePOSTstale union memory;data_hostaddris 0 when there is no data buffer, so a raw client pointer is never forwarded.WH_ERROR_REQUEST_PENDINGon a busy transport (no mapping acquired then leaked).wh_{Client,Server}_DmaRegisterAllowList()now acceptNULLto unregister, symmetric withDmaRegisterCb(NULL).Tests
The reason these bugs survived CI is that no POSIX test ever registered a non-identity DMA callback — a missing translation looked identical to a correct call. This PR adds a shared bounce-pool harness (
test/wh_test_dma.c) that models a split-address-space port: the client callback bounces each buffer through a dedicated pool and hands the server a pool address; the server callback rejects any untranslated address. Freed slots are poisoned (so a prematurePOSTcorrupts the data) and aPOSTmatching no live slot is recorded as a stray/double free.wh_test_clientserver.cdrives the keystore/NVM*DmaAPIs end-to-end — including metadata-only and injectedPRE-failure cases — in the single-thread pump harness, where ordering makes the use-after-free deterministic.wh_test_crypto.cregisters the same callback on the threaded harness, so the entire crypto/cert*Dmasuite (AES/SHA/CMAC/RNG/Ed25519/ML-DSA/ML-KEM/cert, plus KeyExportPublicDma) runs through translation automatically._testDmaleft a stack-local allow list registered after returning (a dangling pointer ASAN flags once a later test uses server DMA).Runs under
make DMA=1(already set by every CI job).Validation
make DMA=1 ASAN=1: passes, 0 ASAN findings.