[PR #164] [MERGED] core: Properly implement TLS #1353

Closed
opened 2026-02-27 21:12:13 +03:00 by kerem · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/shadps4-emu/shadPS4/pull/164
Author: @raphaelthegreat
Created: 6/5/2024
Status: Merged
Merged: 6/5/2024
Merged by: @raphaelthegreat

Base: main ← Head: tls


📝 Commits (4)

  • 40b68ac core: Split module code from linker
  • f86ef64 linker: Properly implement thread local storage
  • dcaedc8 kernel: Fix a few memory functions
  • 5d230cb kernel: Implement module loading

📊 Changes

26 files changed (+970 additions, -746 deletions)

View changed files

📝 CMakeLists.txt (+2 -1)
📝 src/common/alignment.h (+3 -3)
📝 src/core/address_space.cpp (+3 -3)
📝 src/core/address_space.h (+1 -1)
📝 src/core/libraries/kernel/libkernel.cpp (+43 -0)
📝 src/core/libraries/kernel/memory_management.cpp (+19 -7)
📝 src/core/libraries/kernel/memory_management.h (+3 -0)
📝 src/core/libraries/kernel/physical_memory.cpp (+1 -1)
📝 src/core/libraries/kernel/thread_management.cpp (+15 -0)
📝 src/core/linker.cpp (+183 -568)
📝 src/core/linker.h (+30 -117)
📝 src/core/loader/elf.cpp (+1 -1)
📝 src/core/loader/elf.h (+5 -2)
📝 src/core/loader/symbols_resolver.cpp (+6 -17)
📝 src/core/loader/symbols_resolver.h (+13 -4)
📝 src/core/memory.cpp (+4 -4)
📝 src/core/memory.h (+2 -2)
➕ src/core/module.cpp (+419 -0)
➕ src/core/module.h (+183 -0)
📝 src/core/tls.cpp (+12 -5)

...and 6 more files

📄 Description

It turns out the thread local storage (TLS) implementation on main was completely wrong, and a lot of games are starting to get stuck on __tls_get_addr, so it's time to implement it correctly. This is based on libkernel reversing. The Linker class was the natural place for this code, but it was getting quite large, so before adding TLS I split the module loading code into a separate Module class for convenience.

How does it work

This article (https://chao-tic.github.io/blog/2018/12/25/tls) has been a pretty big help in understanding what is happening, alongside the glibc code. The dynamic linker allocates space that contains a TLS block for each module, with the size of each block determined by the PT_TLS segment (https://github.com/bminor/glibc/blob/92c270d32caf3f8d5a02b8e46c7ec5d9d0315158/elf/dl-load.c#L1174). The segment also provides an init image for us to copy over, which is often smaller than the segment's size in memory.
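The block setup described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `TlsSegmentInfo` and `MakeTlsBlock` are hypothetical names, and the fields are assumed to mirror the PT_TLS program header (`p_filesz` for the init image, `p_memsz` for the full block).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical summary of a module's PT_TLS program header.
struct TlsSegmentInfo {
    const std::uint8_t* init_image; // initialized data (.tdata)
    std::size_t image_size;         // assumed to come from p_filesz
    std::size_t block_size;         // assumed to come from p_memsz
};

// Builds one thread's TLS block for a module: the init image is copied to the
// start of the block and the remainder (.tbss) stays zero-filled.
std::vector<std::uint8_t> MakeTlsBlock(const TlsSegmentInfo& seg) {
    assert(seg.image_size <= seg.block_size);
    std::vector<std::uint8_t> block(seg.block_size, 0);
    if (seg.image_size > 0) {
        std::memcpy(block.data(), seg.init_image, seg.image_size);
    }
    return block;
}
```

Every thread gets its own copy of the block, which is why the image is copied rather than mapped in place.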

The main module's block lives right next to the TCB, which is what %FS points to, so the main application can access it directly using negative offsets from FS. For other modules there is an additional table, the DTV, which is indexed by each module's TLS index and stores a pointer to that module's TLS block.
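To make the "negative offsets from FS" part concrete, here is a small sketch of that layout in ordinary heap memory. It assumes the usual x86-64 arrangement where the main module's block ends where the TCB begins; `Tcb`, `ThreadArea`, `MakeThreadArea` and `StaticTlsAddr` are hypothetical names for illustration, and nothing here actually sets the FS register.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <new>

// Hypothetical minimal TCB. On x86-64 the first word conventionally points
// back at the TCB itself.
struct Tcb {
    Tcb* self;
};

struct ThreadArea {
    std::unique_ptr<std::uint8_t[]> storage; // [ main module TLS block | TCB ]
    std::size_t block_size;                  // padded size of the TLS block
    Tcb* tcb;                                // the address %FS would hold
};

ThreadArea MakeThreadArea(const std::uint8_t* image, std::size_t image_size,
                          std::size_t tls_size) {
    // Round the block size up so the TCB that follows it stays aligned.
    const std::size_t align = alignof(std::max_align_t);
    const std::size_t padded = (tls_size + align - 1) / align * align;

    ThreadArea area;
    area.storage = std::make_unique<std::uint8_t[]>(padded + sizeof(Tcb)); // zero-filled
    area.block_size = padded;
    std::uint8_t* tcb_addr = area.storage.get() + padded;
    if (image_size > 0) {
        std::memcpy(tcb_addr - padded, image, image_size);
    }
    area.tcb = new (tcb_addr) Tcb{};
    area.tcb->self = area.tcb;
    return area;
}

// Address of a main-module TLS variable located `offset_below_tcb` bytes
// below the TCB, which is what a %fs-relative negative offset resolves to.
std::uint8_t* StaticTlsAddr(Tcb* tcb, std::size_t offset_below_tcb) {
    return reinterpret_cast<std::uint8_t*>(tcb) - offset_below_tcb;
}
```

Because the offsets are fixed at link time, the main module never needs to go through __tls_get_addr for its own variables.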

However, libkernel doesn't quite seem to match what glibc does. In glibc, each DTV entry is a 16-byte structure (https://github.com/bminor/glibc/blob/92c270d32caf3f8d5a02b8e46c7ec5d9d0315158/sysdeps/generic/dl-dtv.h#L22) and dtv[0] contains the generation counter. In libkernel, each DTV entry seems to be only 8 bytes. dtv[0] also holds the generation counter, but dtv[1] holds the number of allocated DTVs, which means module entries start at index 2.
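The libkernel-style table described above could be modelled like this. It is a sketch under assumptions: `DtvEntry`, `TlsIndex`, `MakeDtv` and `TlsGetAddr` are illustrative names, and mapping 1-based module index n to slot n + 1 is my reading of "modules start at index 2", not something stated in the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One 8-byte DTV slot: a counter in the two header slots, a pointer elsewhere.
union DtvEntry {
    std::uint64_t counter;
    std::uint8_t* pointer;
};
static_assert(sizeof(DtvEntry) == 8, "sketch assumes 8-byte DTV entries");

// Argument shape of __tls_get_addr: which module, and the offset in its block.
struct TlsIndex {
    std::uint64_t module; // assumed 1-based
    std::uint64_t offset;
};

constexpr std::size_t kHeaderSlots = 2; // dtv[0] = generation, dtv[1] = count

std::vector<DtvEntry> MakeDtv(std::uint64_t generation,
                              const std::vector<std::uint8_t*>& blocks) {
    std::vector<DtvEntry> dtv(kHeaderSlots + blocks.size());
    dtv[0].counter = generation;
    dtv[1].counter = blocks.size();
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        dtv[kHeaderSlots + i].pointer = blocks[i];
    }
    return dtv;
}

// Resolves a TLS variable address, or nullptr if the module has no slot.
std::uint8_t* TlsGetAddr(const std::vector<DtvEntry>& dtv, TlsIndex ti) {
    if (ti.module == 0 || ti.module > dtv[1].counter) {
        return nullptr;
    }
    return dtv[ti.module + 1].pointer + ti.offset;
}
```

A real implementation would also compare dtv[0] against the linker's current generation and grow the table when new modules are loaded; that part is left out here.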

In addition, the Orbis dynamic linker has a few different paths for TLS block allocation. For newer SDK versions (> 1.7), primary thread allocation is done with a flexible memory mapping, while secondary threads attempt to use the libc heap function API (probably because libc has already been loaded by then). Older SDKs use plain malloc. Here I've implemented only the newer behaviour for both types of threads.
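The decision between those allocation paths can be summarised in a few lines. This is only a sketch of the branching as described: `SdkVersion`, `TlsAllocPath` and `ChooseTlsAllocPath` are hypothetical names, and the major/minor encoding of the SDK version is an assumption.

```cpp
#include <cassert>

enum class TlsAllocPath {
    FlexibleMemory, // newer SDK, primary thread
    LibcHeap,       // newer SDK, secondary threads
    Malloc,         // older SDK
};

// Hypothetical version encoding, used only to express the "> 1.7" check.
struct SdkVersion {
    int major_ver;
    int minor_ver;
};

TlsAllocPath ChooseTlsAllocPath(SdkVersion sdk, bool is_primary_thread) {
    const bool newer_sdk =
        sdk.major_ver > 1 || (sdk.major_ver == 1 && sdk.minor_ver > 7);
    if (!newer_sdk) {
        return TlsAllocPath::Malloc;
    }
    return is_primary_thread ? TlsAllocPath::FlexibleMemory
                             : TlsAllocPath::LibcHeap;
}
```

Since only the newer behaviour is implemented here, the Malloc branch corresponds to the path that was left out.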


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
