[PR #118] [MERGED] core: Rewrite thread local storage implementation #1307

Closed
opened 2026-02-27 21:12:01 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/shadps4-emu/shadPS4/pull/118
Author: @raphaelthegreat
Created: 4/30/2024
Status: Merged
Merged: 5/1/2024
Merged by: @raphaelthegreat

Base: main ← Head: main


📝 Commits (1)

  • 495c002 core: Rewrite thread local storage implementation

📊 Changes

11 files changed (+175 additions, -188 deletions)


📝 .gitmodules (+3 -0)
📝 CMakeLists.txt (+34 -32)
📝 externals/CMakeLists.txt (+4 -1)
➕ externals/xbyak (+1 -0)
📝 src/common/logging/backend.cpp (+11 -21)
📝 src/core/linker.cpp (+19 -9)
📝 src/core/tls.cpp (+80 -112)
📝 src/core/tls.h (+8 -4)
📝 src/main.cpp (+0 -1)
📝 src/video_core/texture_cache/texture_cache.cpp (+9 -6)
📝 src/video_core/texture_cache/texture_cache.h (+6 -2)

📄 Description

It's not uncommon for PS4 guest applications to launch and use many threads, which also means thread local storage has to be handled properly. On x86-64, thread-local accesses are performed by loading the thread pointer through the fs segment register. ~~This is a problem as Windows doesn't allow you to change the value of this register to what the guest expects.~~ Not quite true, see first reply.

On master this is handled with a simple exception handler that patches the destination register with a pointer to a thread_local buffer. This works, but it will become a problem later on. Since every access goes through the handler, the performance impact is large. In addition, the new texture cache, which does fault tracking, also needs a custom exception handler, so the two end up conflicting. Finally, guest apps can use negative offsets when accessing the buffer, so the current implementation would trigger UB in these cases.
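To illustrate the negative-offset issue: on x86-64 the static TLS block sits below the thread pointer, so guest code reads its thread-local variables at negative offsets from the address it loads via fs:[0]. The sketch below shows one way to lay out a buffer so those reads stay in bounds. `GuestTls`, `MakeGuestTls`, and the TCB size are hypothetical names and values chosen for illustration; this is not the code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical container: [ static TLS image | TCB ], with the thread
// pointer placed at the boundary between the two regions.
struct GuestTls {
    std::vector<std::uint8_t> storage;
    std::size_t image_size;

    // Address the guest expects to load from fs:[0].
    std::uint8_t* ThreadPointer() { return storage.data() + image_size; }
};

// Builds a TLS block whose image lives *below* the thread pointer, so
// negative offsets from the thread pointer land inside the allocation.
GuestTls MakeGuestTls(const std::uint8_t* image, std::size_t image_size,
                      std::size_t tcb_size) {
    assert(tcb_size >= sizeof(void*));
    GuestTls tls{std::vector<std::uint8_t>(image_size + tcb_size), image_size};
    if (image_size != 0) {
        std::memcpy(tls.storage.data(), image, image_size);
    }
    // On x86-64 the first TCB word holds the thread pointer itself.
    void* self = tls.ThreadPointer();
    std::memcpy(tls.ThreadPointer(), &self, sizeof(self));
    return tls;
}
```

With a 4-byte image, `ThreadPointer()[-1]` reads the last image byte and `ThreadPointer()[-4]` the first. A buffer that starts at the thread pointer would make both reads out of bounds.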

This PR attempts to fix all of the above by using assembly trampolines instead of the exception handler. To store the TLS image pointer, a new TLS slot is allocated from the parent process, and the logic from Wine's TlsGetValue is used to retrieve the value. This means we also don't have to rely on undefined/unused space in the TEB structure to store our data. Each mov instruction that reads from the FS segment is patched with a jump to a trampoline that loads the actual pointer.
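The patching step can be sketched roughly as follows. This is a simplified illustration, not the PR's implementation: it only recognizes the 9-byte encoding of `mov rax, qword ptr fs:[0]` (other destination registers encode differently), works on a plain byte buffer, and leaves trampoline generation out entirely. `FindTlsLoads`, `WriteJump`, and the addresses passed to them are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Encoding of `mov rax, qword ptr fs:[0]`:
// FS prefix, REX.W, MOV r64 <- r/m64, ModRM, SIB, disp32 = 0.
constexpr std::array<std::uint8_t, 9> kTlsLoad = {
    0x64, 0x48, 0x8B, 0x04, 0x25, 0x00, 0x00, 0x00, 0x00};

// Returns the offsets of every occurrence of the pattern in `code`.
std::vector<std::size_t> FindTlsLoads(const std::vector<std::uint8_t>& code) {
    std::vector<std::size_t> sites;
    auto it = code.begin();
    while (true) {
        it = std::search(it, code.end(), kTlsLoad.begin(), kTlsLoad.end());
        if (it == code.end()) {
            break;
        }
        sites.push_back(static_cast<std::size_t>(it - code.begin()));
        it += kTlsLoad.size();
    }
    return sites;
}

// Overwrites the 9-byte load at `offset` with `jmp rel32` (5 bytes) followed
// by four NOPs. `code_base` is the address the buffer is assumed to be
// mapped at, and `trampoline` the address of that site's trampoline.
void WriteJump(std::vector<std::uint8_t>& code, std::size_t offset,
               std::uint64_t code_base, std::uint64_t trampoline) {
    assert(offset + kTlsLoad.size() <= code.size());
    // rel32 is relative to the address of the instruction after the jmp.
    const std::uint64_t next_ip = code_base + offset + 5;
    const std::int64_t rel = static_cast<std::int64_t>(trampoline - next_ip);
    assert(rel >= INT32_MIN && rel <= INT32_MAX);
    const std::int32_t rel32 = static_cast<std::int32_t>(rel);
    code[offset] = 0xE9; // jmp rel32
    std::memcpy(code.data() + offset + 1, &rel32, sizeof(rel32));
    std::fill(code.begin() + offset + 5, code.begin() + offset + 9, 0x90);
}
```

Because the patch is a jump rather than a call, each site would need its own trampoline that loads the TLS pointer into the destination register and then jumps back to the instruction after the patched bytes.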

While at it, this also fixes a problem with fault tracking that caused a crash in the pngdec demo. Tracking was being performed at the texture cache page size, when it should be done on 4 KB boundaries like the host/guest. The cache page size was also bumped to vastly reduce the number of page table accesses.
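As a rough sketch of the tracking fix, the range to protect can be computed at host page granularity, independently of whatever page size the cache uses for its own lookups. The names `kHostPageSize`, `PageRange`, and `TrackedRange` are hypothetical and not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>

// 4 KB pages, matching the host/guest page size mentioned above.
constexpr std::uint64_t kHostPageSize = 4096;

// Half-open address range [begin, end).
struct PageRange {
    std::uint64_t begin;
    std::uint64_t end;
};

// Rounds `value` down to a multiple of `alignment` (a power of two).
constexpr std::uint64_t AlignDown(std::uint64_t value, std::uint64_t alignment) {
    return value & ~(alignment - 1);
}

// Rounds `value` up to a multiple of `alignment` (a power of two).
constexpr std::uint64_t AlignUp(std::uint64_t value, std::uint64_t alignment) {
    return AlignDown(value + alignment - 1, alignment);
}

// Smallest run of whole 4 KB pages covering [addr, addr + size).
inline PageRange TrackedRange(std::uint64_t addr, std::uint64_t size) {
    assert(size > 0);
    return {AlignDown(addr, kHostPageSize), AlignUp(addr + size, kHostPageSize)};
}
```

For example, a 0x20-byte region starting at 0x1FF0 straddles two pages, so `TrackedRange(0x1FF0, 0x20)` covers 0x1000 up to 0x3000.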


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
