[PR #2819] [MERGED] video_core: Implement DMA. #3006

Closed

opened 2026-02-27 22:02:05 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/shadps4-emu/shadPS4/pull/2819
Author: @LNDF
Created: 4/20/2025
Status: ✅ Merged
Merged: 5/22/2025
Merged by: @georgemoralis

Base: main ← Head: hybrid

📝 Commits (10+)

52253b4 Import memory
d5e45fb 64K pages and fix memory mapping
83255ee Queue coverage
94a0782 Buffer syncing, faulted readback adn BDA in Buffer
c077fb9 Base DMA implementation
9356779 Preparations for implementing SPV DMA access
68a33cd Base impl (pending 16K pages and getbuffersize)
31df795 16K pages and stack overflow fix
d89d937 clang-format
20aacec clang-format but for real this time

📊 Changes

40 files changed (+1641 additions, -311 deletions)

📝 CMakeLists.txt (+3 -0)
📝 externals/sirit (+1 -1)
➕ src/common/recursive_lock.cpp (+37 -0)
➕ src/common/recursive_lock.h (+67 -0)
📝 src/common/slot_vector.h (+109 -1)
📝 src/shader_recompiler/backend/spirv/emit_spirv.cpp (+9 -2)
📝 src/shader_recompiler/backend/spirv/emit_spirv_atomic.cpp (+3 -3)
📝 src/shader_recompiler/backend/spirv/emit_spirv_context_get_set.cpp (+40 -41)
📝 src/shader_recompiler/backend/spirv/emit_spirv_instructions.h (+1 -1)
📝 src/shader_recompiler/backend/spirv/spirv_emit_context.cpp (+185 -13)
📝 src/shader_recompiler/backend/spirv/spirv_emit_context.h (+106 -11)
📝 src/shader_recompiler/frontend/translate/scalar_memory.cpp (+7 -6)
📝 src/shader_recompiler/info.h (+12 -2)
➕ src/shader_recompiler/ir/abstract_syntax_list.cpp (+44 -0)
📝 src/shader_recompiler/ir/abstract_syntax_list.h (+5 -0)
📝 src/shader_recompiler/ir/passes/shader_info_collection_pass.cpp (+27 -4)
📝 src/shader_recompiler/ir/program.cpp (+30 -4)
📝 src/shader_recompiler/ir/program.h (+1 -1)
📝 src/shader_recompiler/recompiler.cpp (+2 -0)
📝 src/video_core/amdgpu/liverpool.cpp (+1 -0)

...and 20 more files

📄 Description

This implements arbitrary memory access from the GPU to CPU memory.

When memory is mapped, the implementation tries to import the host memory with VK_KHR_external_memory. When a buffer is created in the same address range as an imported region, device-local memory is used.

Buffer Device Addresses are used to get pointers to the data in shaders. A BDA pagetable buffer is created on the GPU that holds the BDA for every 16K page.
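As a rough illustration of the page table idea, here is a minimal CPU-side sketch of a table that stores one device address per 16K guest page and translates a guest address into a device address. The names `BdaPageTable`, `Map` and `Translate`, and the use of 0 as an "unmapped" marker, are assumptions made for this example; they are not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical constants: 16 KiB pages, as stated in the description.
constexpr std::uint64_t kPageBits = 14;
constexpr std::uint64_t kPageSize = 1ULL << kPageBits;
constexpr std::uint64_t kPageMask = kPageSize - 1;

// Hypothetical model of the BDA page table: one device address per guest page.
struct BdaPageTable {
    std::vector<std::uint64_t> entries; // 0 means "no address known" (assumption)

    explicit BdaPageTable(std::uint64_t guest_size)
        : entries((guest_size + kPageMask) >> kPageBits, 0) {}

    // Record the device address of every page in [guest_addr, guest_addr + size).
    void Map(std::uint64_t guest_addr, std::uint64_t size, std::uint64_t device_base) {
        assert((guest_addr & kPageMask) == 0 && (device_base & kPageMask) == 0);
        const std::uint64_t first = guest_addr >> kPageBits;
        const std::uint64_t count = (size + kPageMask) >> kPageBits;
        assert(first + count <= entries.size());
        for (std::uint64_t i = 0; i < count; ++i) {
            entries[first + i] = device_base + i * kPageSize;
        }
    }

    // Translate a guest address; returns 0 if the page has no entry.
    std::uint64_t Translate(std::uint64_t guest_addr) const {
        const std::uint64_t page = guest_addr >> kPageBits;
        if (page >= entries.size() || entries[page] == 0) {
            return 0;
        }
        return entries[page] + (guest_addr & kPageMask);
    }
};
```

In a shader the same lookup would be a load from the page table buffer followed by an add of the in-page offset; the sketch only models the address arithmetic.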

Additionally, a "fault" buffer is created that holds a bitfield telling the CPU whether a non-GPU-local address has been accessed from a shader. This is done by embedding in the LSB of each BDA whether the page is cached by the buffer cache. If a page has been accessed by the GPU and is not cached by the buffer cache, it will be cached before the next frame.
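The tagging scheme can be sketched on the CPU as follows. This is only a model under stated assumptions: the polarity (LSB set means "cached"), one fault bit per page packed into 32-bit words, and the names `FaultBitfield`, `ResolveEntry` and `Drain` are all hypothetical and not confirmed by the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed polarity: LSB set means the page is cached by the buffer cache.
constexpr std::uint64_t kCachedBit = 1;

// Hypothetical model of the "fault" buffer: one bit per page.
struct FaultBitfield {
    std::vector<std::uint32_t> words;

    explicit FaultBitfield(std::uint64_t num_pages) : words((num_pages + 31) / 32, 0) {}

    void Mark(std::uint64_t page) {
        assert(page / 32 < words.size());
        words[page / 32] |= 1u << (page % 32);
    }

    bool Test(std::uint64_t page) const {
        return ((words[page / 32] >> (page % 32)) & 1u) != 0;
    }

    // CPU side: collect every faulted page in ascending order and clear the bitfield.
    std::vector<std::uint64_t> Drain() {
        std::vector<std::uint64_t> pages;
        for (std::size_t w = 0; w < words.size(); ++w) {
            for (std::uint32_t bit = 0; bit < 32; ++bit) {
                if (words[w] & (1u << bit)) {
                    pages.push_back(w * 32 + bit);
                }
            }
            words[w] = 0;
        }
        return pages;
    }
};

// Models one shader access: flag the page if it is not cached, then strip the tag bit.
std::uint64_t ResolveEntry(std::uint64_t entry, std::uint64_t page, FaultBitfield& faults) {
    if ((entry & kCachedBit) == 0) {
        faults.Mark(page);
    }
    return entry & ~kCachedBit;
}
```

On a real GPU the mark would need to be an atomic OR, since many invocations may fault on pages that share a word; the pages returned by `Drain` are the ones that would be cached before the next frame.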

The host memory import may fail (this sometimes happens on AMD GPUs on Linux). In those cases, a fallback value is used and the page bit is set in the "fault" buffer.

Problems:

* I am not able to make the "fault" buffer work. Either the shader is not writing to it or the CPU side is not reading it.
* It seems wrong to call scheduler.Finish(); in BufferCache::CreateFaultBuffers every time I download the "fault" buffer. If there is a better way to sync, I can implement it.
* @raphaelthegreat, you told me in the other PR that I need 64K pages to do the BDA LSB optimization. I can't use 64K pages because we could accidentally try to track non-GPU memory. I hope 16K is fine.
* This can't be merged until https://github.com/shadps4-emu/sirit/pull/11 is merged. Currently Sirit is pointing to my fork.
* I am not able to do a lot of testing right now, so this needs testing.
* There is probably something I forgot to mention.

This is currently used for ReadConst and works like this:

1. First, we try to access the memory if it exists in device-local memory.
2. If that fails (which should only happen on the first frame), we try to access host memory.
3. If that fails, we fall back to the old flatbuf method.
4. If that fails (because the offset is dynamic), we return zero.
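The fallback order above can be sketched as a chain of optional reads. This is a CPU-side illustration only; the type `ReadConstSources` and its three reader callbacks are hypothetical stand-ins for the device-local, host and flatbuf paths, and the real logic is emitted as SPIR-V rather than written like this.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>

// Each reader returns a value on success, or std::nullopt if that path cannot serve the read.
using Reader = std::function<std::optional<std::uint32_t>()>;

// Hypothetical bundle of the three sources named in the description.
struct ReadConstSources {
    Reader device_local; // page already cached in device-local memory
    Reader host;         // imported host memory
    Reader flatbuf;      // old flatbuf method; fails when the offset is dynamic
};

// Try each source in order and return zero if all of them fail.
std::uint32_t ReadConst(const ReadConstSources& sources) {
    assert(sources.device_local && sources.host && sources.flatbuf);
    if (auto value = sources.device_local()) {
        return *value;
    }
    if (auto value = sources.host()) {
        return *value;
    }
    if (auto value = sources.flatbuf()) {
        return *value;
    }
    return 0;
}
```

Keeping zero as the last resort matches the final step in the list: the read always produces a value, even when no source can supply the real one.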

This can also potentially be used in GetBufferSize.

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
