[GH-ISSUE #4049] [GAME BUG]: The Gravity Rush 2 renders a downsized copy of the scene in the top‑left quarter of the screen, creating a recursive feedback effect.

kerem commented

2026-02-27 21:10:29 +03:00

Owner

Originally created by @wuguo13842 on GitHub (Feb 18, 2026).
Original GitHub issue: https://github.com/shadps4-emu/shadPS4/issues/4049

I'm sorry, my English is not very good, the text is generated by "ai", I'm sorry.!
I'm not very familiar with the process on GitHub, sorry.
Do you want me to grab any more information?

Checklist (we expect you to perform these steps before opening the issue)

I have searched for a similar issue in this repository and did not find one.
I am using an official build obtained from releases or updated one of those builds using its in-app updater.
I have re-dumped the game and performed a clean install without mods and the issue is still present.
I have disabled all patches and cheats and the issue is still present.
I have all the required system modules installed.

Describe the Bug

I'm experiencing a "picture-in-picture" rendering issue exclusively on my NVIDIA RTX 2070 Ti (Windows) while testing Gravity Rush 2 with this PR (gr2fix branch). The problem does not occur on AMD GPUs or Linux systems according to user reports.

Description
The game renders a downsized copy of the scene in the top‑left quarter of the screen, creating a recursive feedback effect. The rest of the screen renders correctly.

Investigation so far

The fragment shader that blends the depth‑of‑field layer uses correct full‑screen UV coordinates.

The compute shader (hash 0x8503bcb7) writes a full‑screen texture but only processes pixels whose coordinates are less than two values read from ssbo_1 at indices 4 and 5.

The original values at those indices are 1920.0 and 1088.0 (as floats). Even after forcing them to 4096.0 (to cover the whole screen), the problem persists.

Logs confirm that the forced values are successfully written.

This suggests the issue may lie elsewhere – possibly a missing memory barrier, incorrect image layout transition, or resource aliasing that only manifests on NVIDIA drivers.

Steps to reproduce

Build this PR on Windows with an NVIDIA GPU.

Launch Gravity Rush 2 and observe the top‑left corner after the first few frames.

I've attached relevant logs showing the forced buffer writes. Any insight or guidance on further debugging would be greatly appreciated. Thanks!

is RenderDoc file here
https://1drv.ms/u/c/76f2fe0b033370d0/IQDQBFqC2Yu5SafVTR9YW2c7AUZ2Do3FzaDERknHSLaFSPQ?e=d653BN
https://1drv.ms/u/c/76f2fe0b033370d0/IQDA57nZJEUIRoMFCp6v_DyjAeJFSS2lT-xzRF9ZFdE4rl8?e=6maewH

CUSA04934.log
Do you want me to grab any more information?

Reproduction Steps

After setting up the compilation environment on Windows and successfully compiling, I attempted to modify the code to fix some errors in the repair log. While some fixes succeeded, unfortunately, they were unrelated to the issue at hand.
I began analyzing the image cache and RenderDoc data, but due to limited capabilities, my attempts at modification consistently failed.

Specify OS Version

Windows 11 26100.4202

CPU

4790k

GPU

NVIDIA RTX 2070 Ti

Amount of RAM in GB

32G

Amount of VRAM in GB

8G

Log File

CUSA04934.log
Do you want me to grab any more information?

Originally created by @wuguo13842 on GitHub (Feb 18, 2026). Original GitHub issue: https://github.com/shadps4-emu/shadPS4/issues/4049 **I'm sorry, my English is not very good, the text is generated by "ai", I'm sorry.!** **I'm not very familiar with the process on GitHub, sorry.** **Do you want me to grab any more information?** ### Checklist (we expect you to perform these steps before opening the issue) - [x] I have searched for a similar issue in this repository and did not find one. - [x] I am using an official build obtained from [releases](https://github.com/shadps4-emu/shadPS4/releases) or updated one of those builds using its in-app updater. - [x] I have re-dumped the game and performed a clean install without mods and the issue is still present. - [x] I have disabled all patches and cheats and the issue is still present. - [x] I have all the required [system modules](https://github.com/shadps4-emu/shadPS4/wiki/I.-Quick-start-%5BUsers%5D#4-dumping-firmware-modules) installed. ### Describe the Bug <img width="1280" height="720" alt="000" src="https://github.com/user-attachments/assets/fb5874a2-5578-48f3-a4d5-2b76666b8e67" /> I'm experiencing a "picture-in-picture" rendering issue exclusively on my NVIDIA RTX 2070 Ti (Windows) while testing Gravity Rush 2 with this PR (gr2fix branch). The problem does not occur on AMD GPUs or Linux systems according to user reports. Description The game renders a downsized copy of the scene in the top‑left quarter of the screen, creating a recursive feedback effect. The rest of the screen renders correctly. Investigation so far The fragment shader that blends the depth‑of‑field layer uses correct full‑screen UV coordinates. The compute shader (hash 0x8503bcb7) writes a full‑screen texture but only processes pixels whose coordinates are less than two values read from ssbo_1 at indices 4 and 5. The original values at those indices are 1920.0 and 1088.0 (as floats). Even after forcing them to 4096.0 (to cover the whole screen), the problem persists. Logs confirm that the forced values are successfully written. This suggests the issue may lie elsewhere – possibly a missing memory barrier, incorrect image layout transition, or resource aliasing that only manifests on NVIDIA drivers. Steps to reproduce Build this PR on Windows with an NVIDIA GPU. Launch Gravity Rush 2 and observe the top‑left corner after the first few frames. I've attached relevant logs showing the forced buffer writes. Any insight or guidance on further debugging would be greatly appreciated. Thanks! [is RenderDoc file here](https://1drv.ms/u/c/76f2fe0b033370d0/IQDQBFqC2Yu5SafVTR9YW2c7AUZ2Do3FzaDERknHSLaFSPQ?e=d653BN) https://1drv.ms/u/c/76f2fe0b033370d0/IQDQBFqC2Yu5SafVTR9YW2c7AUZ2Do3FzaDERknHSLaFSPQ?e=d653BN https://1drv.ms/u/c/76f2fe0b033370d0/IQDA57nZJEUIRoMFCp6v_DyjAeJFSS2lT-xzRF9ZFdE4rl8?e=6maewH [CUSA04934.log](https://github.com/user-attachments/files/25394843/CUSA04934.log) Do you want me to grab any more information? ### Reproduction Steps After setting up the compilation environment on Windows and successfully compiling, I attempted to modify the code to fix some errors in the repair log. While some fixes succeeded, unfortunately, they were unrelated to the issue at hand. I began analyzing the image cache and RenderDoc data, but due to limited capabilities, my attempts at modification consistently failed. ### Specify OS Version Windows 11 26100.4202 ### CPU 4790k ### GPU NVIDIA RTX 2070 Ti ### Amount of RAM in GB 32G ### Amount of VRAM in GB 8G ### Log File [CUSA04934.log](https://github.com/user-attachments/files/25394843/CUSA04934.log) Do you want me to grab any more information?

kerem commented

2026-02-27 21:10:30 +03:00

Author

Owner

@wuguo13842 commented on GitHub (Feb 18, 2026):

I observed that displays appear normal for Linux users.
Some users also reported that AMD graphics cards function properly on Windows systems.
However, the second report is unverified and may be inaccurate—it could refer to AMD graphics cards on Linux instead.

@wuguo13842 commented on GitHub (Feb 18, 2026): 1. I observed that displays appear normal for Linux users. 2. Some users also reported that AMD graphics cards function properly on Windows systems. However, the second report is unverified and may be inaccurate—it could refer to AMD graphics cards on Linux instead.

kerem commented

2026-02-27 21:10:30 +03:00

Author

Owner

@wuguo13842 commented on GitHub (Feb 19, 2026):

Oh, and one more thing—I saw a fix for Nvidia's picture-in-picture issue in the commit history. Compiling it myself and forcing it to run on “Gravity Rush 2” didn't solve the problem.

@wuguo13842 commented on GitHub (Feb 19, 2026): Oh, and one more thing—I saw a fix for Nvidia's picture-in-picture issue in the commit history. Compiling it myself and forcing it to run on “Gravity Rush 2” didn't solve the problem.

kerem commented

2026-02-27 21:10:30 +03:00

Author

Owner

@Randomuser8219 commented on GitHub (Feb 19, 2026):

The whole issue is because IMAGE_MIP_STORE is not implemented on NVIDIA.

@Randomuser8219 commented on GitHub (Feb 19, 2026): The whole issue is because IMAGE_MIP_STORE is not implemented on NVIDIA.

kerem commented

2026-02-27 21:10:30 +03:00

Author

Owner

@wuguo13842 commented on GitHub (Feb 19, 2026):

The whole issue is because IMAGE_MIP_STORE is not implemented on NVIDIA.

I figured it out—the emulator leverages AMD's proprietary “VK_AMD_shader_image_load_store_lod” API for hardware acceleration.
The emulator doesn't include VK API support code for Nvidia cards.

NVIDIA 驱动不支持 VK_AMD_shader_image_load_store_lod ，我们确实不应该再纠结于修改着色器代码去“模拟”显式 LOD 写入，而是应该利用 NVIDIA 硬件原生支持的标准 Vulkan 功能来实现同样的目标：为纹理生成完整的 mip 链。

以下是三种完全基于 NVIDIA 硬件支持的替代方案，您可以评估哪个更适合集成到模拟器中

VK_AMD_shader_image_load_store_lod extension. Since your current code tries to use this AMD-specific extension, it fails on NVIDIA hardware, leading to the "画中画" (picture-in-picture) effect. The solution is to stop relying on this unsupported extension and instead use standard Vulkan features that are fully supported on NVIDIA.

Here is a comparison of three alternative solutions that use NVIDIA's native capabilities. Please review them and tell me which one you would like to implement.

@wuguo13842 commented on GitHub (Feb 19, 2026): > The whole issue is because IMAGE_MIP_STORE is not implemented on NVIDIA. I figured it out—the emulator leverages AMD's proprietary “VK_AMD_shader_image_load_store_lod” API for hardware acceleration. The emulator doesn't include VK API support code for Nvidia cards. NVIDIA 驱动不支持 VK_AMD_shader_image_load_store_lod ，我们确实不应该再纠结于修改着色器代码去“模拟”显式 LOD 写入，而是应该利用 NVIDIA 硬件原生支持的标准 Vulkan 功能来实现同样的目标：为纹理生成完整的 mip 链。以下是三种完全基于 NVIDIA 硬件支持的替代方案，您可以评估哪个更适合集成到模拟器中 <img width="1395" height="403" alt="Image" src="https://github.com/user-attachments/assets/e2081d48-5a81-4cab-96ae-292d8ecb7656" /> VK_AMD_shader_image_load_store_lod extension. Since your current code tries to use this AMD-specific extension, it fails on NVIDIA hardware, leading to the "画中画" (picture-in-picture) effect. The solution is to stop relying on this unsupported extension and instead use standard Vulkan features that are fully supported on NVIDIA. Here is a comparison of three alternative solutions that use NVIDIA's native capabilities. Please review them and tell me which one you would like to implement. <img width="1408" height="545" alt="Image" src="https://github.com/user-attachments/assets/9da78875-ed40-43c8-b537-7ede9f315efe" />

kerem commented

2026-02-27 21:10:31 +03:00

Author

Owner

@wuguo13842 commented on GitHub (Feb 19, 2026):

### Insufficient capacity to solve problems，It cannot be solved

image.h
bool generated_mip_chain = false;

// SPDX-FileCopyrightText: Copyright 2024 shadPS4 Emulator Project
// SPDX-License-Identifier: GPL-2.0-or-later

#pragma once

#include "common/recursive_lock.h"
#include "common/shared_first_mutex.h"
#include "video_core/buffer_cache/buffer_cache.h"
#include "video_core/page_manager.h"
#include "video_core/renderer_vulkan/vk_pipeline_cache.h"
#include "video_core/texture_cache/texture_cache.h"

namespace AmdGpu {
struct Liverpool;
}

namespace Core {
class MemoryManager;
}

namespace Vulkan {

class Scheduler;
class RenderState;
class GraphicsPipeline;

class Rasterizer {
public:
    explicit Rasterizer(const Instance& instance, Scheduler& scheduler,
                        AmdGpu::Liverpool* liverpool);
    ~Rasterizer();

    [[nodiscard]] Scheduler& GetScheduler() noexcept {
        return scheduler;
    }

    [[nodiscard]] VideoCore::BufferCache& GetBufferCache() noexcept {
        return buffer_cache;
    }

    [[nodiscard]] VideoCore::TextureCache& GetTextureCache() noexcept {
        return texture_cache;
    }

    void Draw(bool is_indexed, u32 index_offset = 0);
    void DrawIndirect(bool is_indexed, VAddr arg_address, u32 offset, u32 stride, u32 max_count,
                      VAddr count_address);

    void DispatchDirect();
    void DispatchIndirect(VAddr address, u32 offset, u32 size);

    void ScopeMarkerBegin(const std::string_view& str, bool from_guest = false);
    void ScopeMarkerEnd(bool from_guest = false);
    void ScopedMarkerInsert(const std::string_view& str, bool from_guest = false);
    void ScopedMarkerInsertColor(const std::string_view& str, const u32 color,
                                 bool from_guest = false);

    void FillBuffer(VAddr address, u32 num_bytes, u32 value, bool is_gds);
    void CopyBuffer(VAddr dst, VAddr src, u32 num_bytes, bool dst_gds, bool src_gds);
    u32 ReadDataFromGds(u32 gsd_offset);
    bool InvalidateMemory(VAddr addr, u64 size);
    bool ReadMemory(VAddr addr, u64 size);
    bool IsMapped(VAddr addr, u64 size);
    void MapMemory(VAddr addr, u64 size);
    void UnmapMemory(VAddr addr, u64 size);

    void CpSync();
    u64 Flush();
    void Finish();
    void OnSubmit();

    PipelineCache& GetPipelineCache() {
        return pipeline_cache;
    }

    template <typename Func>
    void ForEachMappedRangeInRange(VAddr addr, u64 size, Func&& func) {
        const auto range = decltype(mapped_ranges)::interval_type::right_open(addr, addr + size);
        Common::RecursiveSharedLock lock{mapped_ranges_mutex};
        for (const auto& mapped_range : (mapped_ranges & range)) {
            func(mapped_range);
        }
    }

private:
    void PrepareRenderState(const GraphicsPipeline* pipeline);
    RenderState BeginRendering(const GraphicsPipeline* pipeline);
    void Resolve();
    void DepthStencilCopy(bool is_depth, bool is_stencil);
    void EliminateFastClear();

    void UpdateDynamicState(const GraphicsPipeline* pipeline, bool is_indexed) const;
    void UpdateViewportScissorState() const;
    void UpdateDepthStencilState() const;
    void UpdatePrimitiveState(bool is_indexed) const;
    void UpdateRasterizationState() const;
    void UpdateColorBlendingState(const GraphicsPipeline* pipeline) const;

    bool FilterDraw();

    void BindBuffers(const Shader::Info& stage, Shader::Backend::Bindings& binding,
                     Shader::PushData& push_data);
    void BindTextures(const Shader::Info& stage, Shader::Backend::Bindings& binding);
    bool BindResources(const Pipeline* pipeline);

    void ResetBindings() {
        for (auto& image_id : bound_images) {
            texture_cache.GetImage(image_id).binding = {};
        }
        bound_images.clear();
    }

    bool IsComputeMetaClear(const Pipeline* pipeline);
    bool IsComputeImageCopy(const Pipeline* pipeline);
    bool IsComputeImageClear(const Pipeline* pipeline);

    // --- 新增：为特定计算着色器生成 mip 链 ---
    void GenerateMipChainForWrittenImages(const Shader::Info& cs_info);
    void GenerateMipChainForImage(VideoCore::Image& image);

private:
    friend class VideoCore::BufferCache;

    const Instance& instance;
    Scheduler& scheduler;
    VideoCore::PageManager page_manager;
    VideoCore::BufferCache buffer_cache;
    VideoCore::TextureCache texture_cache;
    AmdGpu::Liverpool* liverpool;
    Core::MemoryManager* memory;
    boost::icl::interval_set<VAddr> mapped_ranges;
    Common::SharedFirstMutex mapped_ranges_mutex;
    PipelineCache pipeline_cache;

    using RenderTargetInfo = std::pair<VideoCore::ImageId, VideoCore::TextureCache::ImageDesc>;
    std::array<RenderTargetInfo, AmdGpu::NUM_COLOR_BUFFERS> cb_descs;
    std::pair<VideoCore::ImageId, VideoCore::TextureCache::ImageDesc> db_desc;
    boost::container::static_vector<vk::DescriptorImageInfo, Shader::NUM_IMAGES> image_infos;
    boost::container::static_vector<vk::DescriptorBufferInfo, Shader::NUM_BUFFERS> buffer_infos;
    boost::container::static_vector<VideoCore::ImageId, Shader::NUM_IMAGES> bound_images;

    Pipeline::DescriptorWrites set_writes;
    Pipeline::BufferBarriers buffer_barriers;
    Shader::PushData push_data;

    using BufferBindingInfo = std::tuple<VideoCore::BufferId, AmdGpu::Buffer, u64>;
    boost::container::static_vector<BufferBindingInfo, Shader::NUM_BUFFERS> buffer_bindings;
    using ImageBindingInfo = std::pair<VideoCore::ImageId, VideoCore::TextureCache::ImageDesc>;
    boost::container::static_vector<ImageBindingInfo, Shader::NUM_IMAGES> image_bindings;
    bool fault_process_pending{};
    bool attachment_feedback_loop{};
};

} // namespace Vulkan

// SPDX-FileCopyrightText: Copyright 2024 shadPS4 Emulator Project
// SPDX-License-Identifier: GPL-2.0-or-later

#include "common/config.h"
#include "common/debug.h"
#include "core/memory.h"
#include "shader_recompiler/runtime_info.h"
#include "video_core/amdgpu/liverpool.h"
#include "video_core/renderer_vulkan/liverpool_to_vk.h"
#include "video_core/renderer_vulkan/vk_instance.h"
#include "video_core/renderer_vulkan/vk_rasterizer.h"
#include "video_core/renderer_vulkan/vk_scheduler.h"
#include "video_core/renderer_vulkan/vk_shader_hle.h"
#include "video_core/texture_cache/image_view.h"
#include "video_core/texture_cache/texture_cache.h"

#ifdef MemoryBarrier
#undef MemoryBarrier
#endif

namespace Vulkan {

static Shader::PushData MakeUserData(const AmdGpu::Regs& regs) {
    // TODO(roamic): Add support for multiple viewports and geometry shaders when ViewportIndex
    // is encountered and implemented in the recompiler.
    Shader::PushData push_data{};
    push_data.xoffset = regs.viewport_control.xoffset_enable ? regs.viewports[0].xoffset : 0.f;
    push_data.xscale = regs.viewport_control.xscale_enable ? regs.viewports[0].xscale : 1.f;
    push_data.yoffset = regs.viewport_control.yoffset_enable ? regs.viewports[0].yoffset : 0.f;
    push_data.yscale = regs.viewport_control.yscale_enable ? regs.viewports[0].yscale : 1.f;
    return push_data;
}

Rasterizer::Rasterizer(const Instance& instance_, Scheduler& scheduler_,
                       AmdGpu::Liverpool* liverpool_)
    : instance{instance_}, scheduler{scheduler_}, page_manager{this},
      buffer_cache{instance, scheduler, liverpool_, texture_cache, page_manager},
      texture_cache{instance, scheduler, liverpool_, buffer_cache, page_manager},
      liverpool{liverpool_}, memory{Core::Memory::Instance()},
      pipeline_cache{instance, scheduler, liverpool} {
    if (!Config::nullGpu()) {
        liverpool->BindRasterizer(this);
    }
    memory->SetRasterizer(this);
}

Rasterizer::~Rasterizer() = default;

void Rasterizer::CpSync() {
    scheduler.EndRendering();
    auto cmdbuf = scheduler.CommandBuffer();

    const vk::MemoryBarrier ib_barrier{
        .srcAccessMask = vk::AccessFlagBits::eShaderWrite,
        .dstAccessMask = vk::AccessFlagBits::eIndirectCommandRead,
    };
    cmdbuf.pipelineBarrier(vk::PipelineStageFlagBits::eComputeShader,
                           vk::PipelineStageFlagBits::eDrawIndirect,
                           vk::DependencyFlagBits::eByRegion, ib_barrier, {}, {});
}

bool Rasterizer::FilterDraw() {
    const auto& regs = liverpool->regs;
    if (regs.color_control.mode == AmdGpu::ColorControl::OperationMode::EliminateFastClear) {
        // Clears the render target if FCE is launched before any draws
        EliminateFastClear();
        return false;
    }
    if (regs.color_control.mode == AmdGpu::ColorControl::OperationMode::FmaskDecompress) {
        // TODO: check for a valid MRT1 to promote the draw to the resolve pass.
        LOG_TRACE(Render_Vulkan, "FMask decompression pass skipped");
        ScopedMarkerInsert("FmaskDecompress");
        return false;
    }
    if (regs.color_control.mode == AmdGpu::ColorControl::OperationMode::Resolve) {
        LOG_TRACE(Render_Vulkan, "Resolve pass");
        Resolve();
        return false;
    }
    if (regs.primitive_type == AmdGpu::PrimitiveType::None) {
        LOG_TRACE(Render_Vulkan, "Primitive type 'None' skipped");
        ScopedMarkerInsert("PrimitiveTypeNone");
        return false;
    }

    const bool cb_disabled =
        regs.color_control.mode == AmdGpu::ColorControl::OperationMode::Disable;
    const auto depth_copy =
        regs.depth_render_override.force_z_dirty && regs.depth_render_override.force_z_valid &&
        regs.depth_buffer.DepthValid() && regs.depth_buffer.DepthWriteValid() &&
        regs.depth_buffer.DepthAddress() != regs.depth_buffer.DepthWriteAddress();
    const auto stencil_copy =
        regs.depth_render_override.force_stencil_dirty &&
        regs.depth_render_override.force_stencil_valid && regs.depth_buffer.StencilValid() &&
        regs.depth_buffer.StencilWriteValid() &&
        regs.depth_buffer.StencilAddress() != regs.depth_buffer.StencilWriteAddress();
    if (cb_disabled && (depth_copy || stencil_copy)) {
        // Games may disable color buffer and enable force depth/stencil dirty and valid to
        // do a copy from one depth-stencil surface to another, without a pixel shader.
        // We need to detect this case and perform the copy, otherwise it will have no effect.
        LOG_TRACE(Render_Vulkan, "Performing depth-stencil override copy");
        DepthStencilCopy(depth_copy, stencil_copy);
        return false;
    }

    return true;
}

void Rasterizer::PrepareRenderState(const GraphicsPipeline* pipeline) {
    // Prefetch render targets to handle overlaps with bound textures (e.g. mipgen)
    const auto& key = pipeline->GetGraphicsKey();
    const auto& regs = liverpool->regs;
    if (regs.color_control.degamma_enable) {
        LOG_WARNING(Render_Vulkan, "Color buffers require gamma correction");
    }

    const bool skip_cb_binding =
        regs.color_control.mode == AmdGpu::ColorControl::OperationMode::Disable;
    for (s32 cb = 0; cb < std::bit_width(key.mrt_mask); ++cb) {
        auto& [image_id, desc] = cb_descs[cb];
        const auto& col_buf = regs.color_buffers[cb];
        const u32 target_mask = regs.color_target_mask.GetMask(cb);
        if (skip_cb_binding || !col_buf || !target_mask || (key.mrt_mask & (1 << cb)) == 0) {
            image_id = {};
            continue;
        }
        const auto& hint = liverpool->last_cb_extent[cb];
        std::construct_at(&desc, col_buf, hint);
        image_id = bound_images.emplace_back(texture_cache.FindImage(desc));
        auto& image = texture_cache.GetImage(image_id);
        image.binding.is_target = 1u;
    }

    if ((regs.depth_control.depth_enable && regs.depth_buffer.DepthValid()) ||
        (regs.depth_control.stencil_enable && regs.depth_buffer.StencilValid())) {
        const auto htile_address = regs.depth_htile_data_base.GetAddress();
        const auto& hint = liverpool->last_db_extent;
        auto& [image_id, desc] = db_desc;
        std::construct_at(&desc, regs.depth_buffer, regs.depth_view, regs.depth_control,
                          htile_address, hint);
        image_id = bound_images.emplace_back(texture_cache.FindImage(desc));
        auto& image = texture_cache.GetImage(image_id);
        image.binding.is_target = 1u;
    } else {
        db_desc.first = {};
    }
}

static std::pair<u32, u32> GetDrawOffsets(
    const AmdGpu::Regs& regs, const Shader::Info& info,
    const std::optional<Shader::Gcn::FetchShaderData>& fetch_shader) {
    u32 vertex_offset = regs.index_offset;
    u32 instance_offset = 0;
    if (fetch_shader) {
        if (vertex_offset == 0 && fetch_shader->vertex_offset_sgpr != -1) {
            vertex_offset = info.user_data[fetch_shader->vertex_offset_sgpr];
        }
        if (fetch_shader->instance_offset_sgpr != -1) {
            instance_offset = info.user_data[fetch_shader->instance_offset_sgpr];
        }
    }
    return {vertex_offset, instance_offset};
}

void Rasterizer::EliminateFastClear() {
    auto& col_buf = liverpool->regs.color_buffers[0];
    if (!col_buf || !col_buf.info.fast_clear) {
        return;
    }
    VideoCore::TextureCache::ImageDesc desc(col_buf, liverpool->last_cb_extent[0]);
    const auto image_id = texture_cache.FindImage(desc);
    const auto& image_view = texture_cache.FindRenderTarget(image_id, desc);
    if (!texture_cache.IsMetaCleared(col_buf.CmaskAddress(), col_buf.view.slice_start)) {
        return;
    }
    for (u32 slice = col_buf.view.slice_start; slice <= col_buf.view.slice_max; ++slice) {
        texture_cache.TouchMeta(col_buf.CmaskAddress(), slice, false);
    }
    auto& image = texture_cache.GetImage(image_id);
    const auto clear_value = LiverpoolToVK::ColorBufferClearValue(col_buf);

    ScopeMarkerBegin(fmt::format("EliminateFastClear:MRT={:#x}:M={:#x}", col_buf.Address(),
                                 col_buf.CmaskAddress()));
    image.Clear(clear_value, desc.view_info.range);
    ScopeMarkerEnd();
}

void Rasterizer::Draw(bool is_indexed, u32 index_offset) {
    RENDERER_TRACE;

    scheduler.PopPendingOperations();

    if (!FilterDraw()) {
        return;
    }

    const auto& regs = liverpool->regs;
    const GraphicsPipeline* pipeline = pipeline_cache.GetGraphicsPipeline();
    if (!pipeline) {
        return;
    }

    PrepareRenderState(pipeline);
    if (!BindResources(pipeline)) {
        return;
    }
    const auto state = BeginRendering(pipeline);

    buffer_cache.BindVertexBuffers(*pipeline);
    if (is_indexed) {
        buffer_cache.BindIndexBuffer(index_offset);
    }

    pipeline->BindResources(set_writes, buffer_barriers, push_data);
    UpdateDynamicState(pipeline, is_indexed);
    scheduler.BeginRendering(state);

    const auto& vs_info = pipeline->GetStage(Shader::LogicalStage::Vertex);
    const auto& fetch_shader = pipeline->GetFetchShader();
    const auto [vertex_offset, instance_offset] = GetDrawOffsets(regs, vs_info, fetch_shader);

    const auto cmdbuf = scheduler.CommandBuffer();
    cmdbuf.bindPipeline(vk::PipelineBindPoint::eGraphics, pipeline->Handle());

    if (is_indexed) {
        cmdbuf.drawIndexed(regs.num_indices, regs.num_instances.NumInstances(), 0,
                           s32(vertex_offset), instance_offset);
    } else {
        cmdbuf.draw(regs.num_indices, regs.num_instances.NumInstances(), vertex_offset,
                    instance_offset);
    }

    ResetBindings();
}

void Rasterizer::DrawIndirect(bool is_indexed, VAddr arg_address, u32 offset, u32 stride,
                              u32 max_count, VAddr count_address) {
    RENDERER_TRACE;

    scheduler.PopPendingOperations();

    if (!FilterDraw()) {
        return;
    }

    const GraphicsPipeline* pipeline = pipeline_cache.GetGraphicsPipeline();
    if (!pipeline) {
        return;
    }

    PrepareRenderState(pipeline);
    if (!BindResources(pipeline)) {
        return;
    }
    const auto state = BeginRendering(pipeline);

    buffer_cache.BindVertexBuffers(*pipeline);
    if (is_indexed) {
        buffer_cache.BindIndexBuffer(0);
    }

    const auto& [buffer, base] =
        buffer_cache.ObtainBuffer(arg_address + offset, stride * max_count, false);

    VideoCore::Buffer* count_buffer{};
    u32 count_base{};
    if (count_address != 0) {
        std::tie(count_buffer, count_base) = buffer_cache.ObtainBuffer(count_address, 4, false);
    }

    pipeline->BindResources(set_writes, buffer_barriers, push_data);
    UpdateDynamicState(pipeline, is_indexed);
    scheduler.BeginRendering(state);

    // We can safely ignore both SGPR UD indices and results of fetch shader parsing, as vertex and
    // instance offsets will be automatically applied by Vulkan from indirect args buffer.

    const auto cmdbuf = scheduler.CommandBuffer();
    cmdbuf.bindPipeline(vk::PipelineBindPoint::eGraphics, pipeline->Handle());

    if (is_indexed) {
        ASSERT(sizeof(VkDrawIndexedIndirectCommand) == stride);

        if (count_address != 0) {
            cmdbuf.drawIndexedIndirectCount(buffer->Handle(), base, count_buffer->Handle(),
                                            count_base, max_count, stride);
        } else {
            cmdbuf.drawIndexedIndirect(buffer->Handle(), base, max_count, stride);
        }
    } else {
        ASSERT(sizeof(VkDrawIndirectCommand) == stride);

        if (count_address != 0) {
            cmdbuf.drawIndirectCount(buffer->Handle(), base, count_buffer->Handle(), count_base,
                                     max_count, stride);
        } else {
            cmdbuf.drawIndirect(buffer->Handle(), base, max_count, stride);
        }
    }

    ResetBindings();
}

void Rasterizer::DispatchDirect() {
    RENDERER_TRACE;

    scheduler.PopPendingOperations();

    const auto& cs_program = liverpool->GetCsRegs();
    const ComputePipeline* pipeline = pipeline_cache.GetComputePipeline();
    if (!pipeline) {
        return;
    }

    const auto& cs = pipeline->GetStage(Shader::LogicalStage::Compute);
    if (ExecuteShaderHLE(cs, liverpool->regs, cs_program, *this)) {
        return;
    }

    if (!BindResources(pipeline)) {
        return;
    }

    scheduler.EndRendering();
    pipeline->BindResources(set_writes, buffer_barriers, push_data);

    const auto cmdbuf = scheduler.CommandBuffer();
    cmdbuf.bindPipeline(vk::PipelineBindPoint::eCompute, pipeline->Handle());
    cmdbuf.dispatch(cs_program.dim_x, cs_program.dim_y, cs_program.dim_z);

    // --- 新增：为特定计算着色器生成 mip 链（仅当不支持 LOD 写入时）---
    const auto& cs_info = pipeline->GetStage(Shader::LogicalStage::Compute);
    if (!pipeline_cache.GetProfile().supports_image_load_store_lod &&
        cs_info.pgm_hash == 0x8503bcb7) {
        GenerateMipChainForWrittenImages(cs_info);
    }

    ResetBindings();
}

void Rasterizer::DispatchIndirect(VAddr address, u32 offset, u32 size) {
    RENDERER_TRACE;

    scheduler.PopPendingOperations();

    const auto& cs_program = liverpool->GetCsRegs();
    const ComputePipeline* pipeline = pipeline_cache.GetComputePipeline();
    if (!pipeline) {
        return;
    }

    if (!BindResources(pipeline)) {
        return;
    }

    const auto [buffer, base] = buffer_cache.ObtainBuffer(address + offset, size, false);

    scheduler.EndRendering();
    pipeline->BindResources(set_writes, buffer_barriers, push_data);

    const auto cmdbuf = scheduler.CommandBuffer();
    cmdbuf.bindPipeline(vk::PipelineBindPoint::eCompute, pipeline->Handle());
    cmdbuf.dispatchIndirect(buffer->Handle(), base);

    ResetBindings();
}

u64 Rasterizer::Flush() {
    const u64 current_tick = scheduler.CurrentTick();
    SubmitInfo info{};
    scheduler.Flush(info);
    return current_tick;
}

void Rasterizer::Finish() {
    scheduler.Finish();
}

void Rasterizer::OnSubmit() {
    if (fault_process_pending) {
        fault_process_pending = false;
        buffer_cache.ProcessFaultBuffer();
    }
    texture_cache.ProcessDownloadImages();
    texture_cache.RunGarbageCollector();
    buffer_cache.RunGarbageCollector();
}

bool Rasterizer::BindResources(const Pipeline* pipeline) {
    if (IsComputeImageCopy(pipeline) || IsComputeMetaClear(pipeline) ||
        IsComputeImageClear(pipeline)) {
        return false;
    }

    set_writes.clear();
    buffer_barriers.clear();
    buffer_infos.clear();
    image_infos.clear();

    bool uses_dma = false;

    // Bind resource buffers and textures.
    Shader::Backend::Bindings binding{};
    push_data = MakeUserData(liverpool->regs);
    for (const auto* stage : pipeline->GetStages()) {
        if (!stage) {
            continue;
        }
        stage->PushUd(binding, push_data);
        BindBuffers(*stage, binding, push_data);
        BindTextures(*stage, binding);
        uses_dma |= stage->uses_dma;
    }

    if (uses_dma) {
        // We only use fault buffer for DMA right now.
        Common::RecursiveSharedLock lock{mapped_ranges_mutex};
        for (auto& range : mapped_ranges) {
            buffer_cache.SynchronizeBuffersInRange(range.lower(), range.upper() - range.lower());
        }
        fault_process_pending = true;
    }

    return true;
}

bool Rasterizer::IsComputeMetaClear(const Pipeline* pipeline) {
    if (!pipeline->IsCompute()) {
        return false;
    }

    // Most of the time when a metadata is updated with a shader it gets cleared. It means
    // we can skip the whole dispatch and update the tracked state instead. Also, it is not
    // intended to be consumed and in such rare cases (e.g. HTile introspection, CRAA) we
    // will need its full emulation anyways.
    const auto& info = pipeline->GetStage(Shader::LogicalStage::Compute);

    // Assume if a shader reads metadata, it is a copy shader.
    for (const auto& desc : info.buffers) {
        const VAddr address = desc.GetSharp(info).base_address;
        if (!desc.IsSpecial() && !desc.is_written && texture_cache.IsMeta(address)) {
            return false;
        }
    }

    // Metadata surfaces are tiled and thus need address calculation to be written properly.
    // If a shader wants to encode HTILE, for example, from a depth image it will have to compute
    // proper tile address from dispatch invocation id. This address calculation contains an xor
    // operation so use it as a heuristic for metadata writes that are probably not clears.
    if (!info.has_bitwise_xor) {
        // Assume if a shader writes metadata without address calculation, it is a clear shader.
        for (const auto& desc : info.buffers) {
            const VAddr address = desc.GetSharp(info).base_address;
            if (!desc.IsSpecial() && desc.is_written && texture_cache.ClearMeta(address)) {
                // Assume all slices were updates
                LOG_TRACE(Render_Vulkan, "Metadata update skipped");
                return true;
            }
        }
    }
    return false;
}

bool Rasterizer::IsComputeImageCopy(const Pipeline* pipeline) {
    if (!pipeline->IsCompute()) {
        return false;
    }

    // Ensure shader only has 2 bound buffers
    const auto& cs_pgm = liverpool->GetCsRegs();
    const auto& info = pipeline->GetStage(Shader::LogicalStage::Compute);
    if (cs_pgm.num_thread_x.full != 64 || info.buffers.size() != 2 || !info.images.empty()) {
        return false;
    }

    // Those 2 buffers must both be formatted. One must be source and another destination.
    const auto& desc0 = info.buffers[0];
    const auto& desc1 = info.buffers[1];
    if (!desc0.is_formatted || !desc1.is_formatted || desc0.is_written == desc1.is_written) {
        return false;
    }

    // Buffers must have the same size and each thread of the dispatch must copy 1 dword of data
    const AmdGpu::Buffer buf0 = desc0.GetSharp(info);
    const AmdGpu::Buffer buf1 = desc1.GetSharp(info);
    if (buf0.GetSize() != buf1.GetSize() || cs_pgm.dim_x != (buf0.GetSize() / 256)) {
        return false;
    }

    // Find images the buffer alias
    const auto image0_id = texture_cache.FindImageFromRange(buf0.base_address, buf0.GetSize());
    if (!image0_id) {
        return false;
    }
    const auto image1_id =
        texture_cache.FindImageFromRange(buf1.base_address, buf1.GetSize(), false);
    if (!image1_id) {
        return false;
    }

    // Image copy must be valid
    VideoCore::Image& image0 = texture_cache.GetImage(image0_id);
    VideoCore::Image& image1 = texture_cache.GetImage(image1_id);
    if (image0.info.guest_size != image1.info.guest_size ||
        image0.info.pitch != image1.info.pitch || image0.info.guest_size != buf0.GetSize() ||
        image0.info.num_bits != image1.info.num_bits) {
        return false;
    }

    // Perform image copy
    VideoCore::Image& src_image = desc0.is_written ? image1 : image0;
    VideoCore::Image& dst_image = desc0.is_written ? image0 : image1;
    if (instance.IsMaintenance8Supported() ||
        src_image.info.props.is_depth == dst_image.info.props.is_depth) {
        dst_image.CopyImage(src_image);
    } else {
        const auto& copy_buffer =
            buffer_cache.GetUtilityBuffer(VideoCore::MemoryUsage::DeviceLocal);
        dst_image.CopyImageWithBuffer(src_image, copy_buffer.Handle(), 0);
    }
    dst_image.flags |= VideoCore::ImageFlagBits::GpuModified;
    dst_image.flags &= ~VideoCore::ImageFlagBits::Dirty;
    return true;
}

bool Rasterizer::IsComputeImageClear(const Pipeline* pipeline) {
    if (!pipeline->IsCompute()) {
        return false;
    }

    // Ensure shader only has 2 bound buffers
    const auto& cs_pgm = liverpool->GetCsRegs();
    const auto& info = pipeline->GetStage(Shader::LogicalStage::Compute);
    if (cs_pgm.num_thread_x.full != 64 || info.buffers.size() != 2 || !info.images.empty()) {
        return false;
    }

    // From those 2 buffers, first must hold the clear vector and second the image being cleared
    const auto& desc0 = info.buffers[0];
    const auto& desc1 = info.buffers[1];
    if (desc0.is_formatted || !desc1.is_formatted || desc0.is_written || !desc1.is_written) {
        return false;
    }

    // First buffer must have size of vec4 and second the size of a single layer
    const AmdGpu::Buffer buf0 = desc0.GetSharp(info);
    const AmdGpu::Buffer buf1 = desc1.GetSharp(info);
    const u32 buf1_bpp = AmdGpu::NumBitsPerBlock(buf1.GetDataFmt());
    if (buf0.GetSize() != 16 || (cs_pgm.dim_x * 128ULL * (buf1_bpp / 8)) != buf1.GetSize()) {
        return false;
    }

    // Find image the buffer alias
    const auto image1_id =
        texture_cache.FindImageFromRange(buf1.base_address, buf1.GetSize(), false);
    if (!image1_id) {
        return false;
    }

    // Image clear must be valid
    VideoCore::Image& image1 = texture_cache.GetImage(image1_id);
    if (image1.info.guest_size != buf1.GetSize() || image1.info.num_bits != buf1_bpp ||
        image1.info.props.is_depth) {
        return false;
    }

    // Perform image clear
    const float* values = reinterpret_cast<float*>(buf0.base_address);
    const vk::ClearValue clear = {
        .color = {.float32 = std::array<float, 4>{values[0], values[1], values[2], values[3]}},
    };
    const VideoCore::SubresourceRange range = {
        .base =
            {
                .level = 0,
                .layer = 0,
            },
        .extent = image1.info.resources,
    };
    image1.Clear(clear, range);
    image1.flags |= VideoCore::ImageFlagBits::GpuModified;
    image1.flags &= ~VideoCore::ImageFlagBits::Dirty;
    return true;
}

void Rasterizer::BindBuffers(const Shader::Info& stage, Shader::Backend::Bindings& binding,
                             Shader::PushData& push_data) {
    buffer_bindings.clear();

    for (const auto& desc : stage.buffers) {
        const auto vsharp = desc.GetSharp(stage);
        if (!desc.IsSpecial() && vsharp.base_address != 0 && vsharp.GetSize() > 0) {
            const u64 size = memory->ClampRangeSize(vsharp.base_address, vsharp.GetSize());
            const auto buffer_id = buffer_cache.FindBuffer(vsharp.base_address, size);
            buffer_bindings.emplace_back(buffer_id, vsharp, size);
        } else {
            buffer_bindings.emplace_back(VideoCore::BufferId{}, vsharp, 0);
        }
    }

    // Second pass to re-bind buffers that were updated after binding
    for (u32 i = 0; i < buffer_bindings.size(); i++) {
        const auto& [buffer_id, vsharp, size] = buffer_bindings[i];
        const auto& desc = stage.buffers[i];
        const bool is_storage = desc.IsStorage(vsharp);
        const u32 alignment =
            is_storage ? instance.StorageMinAlignment() : instance.UniformMinAlignment();
        // Buffer is not from the cache, either a special buffer or unbound.
        if (!buffer_id) {
            if (desc.buffer_type == Shader::BufferType::GdsBuffer) {
                const auto* gds_buf = buffer_cache.GetGdsBuffer();
                buffer_infos.emplace_back(gds_buf->Handle(), 0, gds_buf->SizeBytes());
            } else if (desc.buffer_type == Shader::BufferType::Flatbuf) {
                auto& vk_buffer = buffer_cache.GetUtilityBuffer(VideoCore::MemoryUsage::Stream);
                const u32 ubo_size = stage.flattened_ud_buf.size() * sizeof(u32);
                const u64 offset =
                    vk_buffer.Copy(stage.flattened_ud_buf.data(), ubo_size, alignment);
                buffer_infos.emplace_back(vk_buffer.Handle(), offset, ubo_size);
            } else if (desc.buffer_type == Shader::BufferType::BdaPagetable) {
                const auto* bda_buffer = buffer_cache.GetBdaPageTableBuffer();
                buffer_infos.emplace_back(bda_buffer->Handle(), 0, bda_buffer->SizeBytes());
            } else if (desc.buffer_type == Shader::BufferType::FaultBuffer) {
                const auto* fault_buffer = buffer_cache.GetFaultBuffer();
                buffer_infos.emplace_back(fault_buffer->Handle(), 0, fault_buffer->SizeBytes());
            } else if (desc.buffer_type == Shader::BufferType::SharedMemory) {
                auto& lds_buffer = buffer_cache.GetUtilityBuffer(VideoCore::MemoryUsage::Stream);
                const auto& cs_program = liverpool->GetCsRegs();
                const auto lds_size = cs_program.SharedMemSize() * cs_program.NumWorkgroups();
                const auto [data, offset] = lds_buffer.Map(lds_size, alignment);
                std::memset(data, 0, lds_size);
                buffer_infos.emplace_back(lds_buffer.Handle(), offset, lds_size);
            } else if (instance.IsNullDescriptorSupported()) {
                buffer_infos.emplace_back(VK_NULL_HANDLE, 0, VK_WHOLE_SIZE);
            } else {
                auto& null_buffer = buffer_cache.GetBuffer(VideoCore::NULL_BUFFER_ID);
                buffer_infos.emplace_back(null_buffer.Handle(), 0, VK_WHOLE_SIZE);
            }
        } else {
            const auto [vk_buffer, offset] = buffer_cache.ObtainBuffer(
                vsharp.base_address, size, desc.is_written, desc.is_formatted, buffer_id);
            const u32 offset_aligned = Common::AlignDown(offset, alignment);
            const u32 adjust = offset - offset_aligned;
            ASSERT(adjust % 4 == 0);
            push_data.AddOffset(binding.buffer, adjust);
            buffer_infos.emplace_back(vk_buffer->Handle(), offset_aligned, size + adjust);
            if (auto barrier =
                    vk_buffer->GetBarrier(desc.is_written ? vk::AccessFlagBits2::eShaderWrite
                                                          : vk::AccessFlagBits2::eShaderRead,
                                          vk::PipelineStageFlagBits2::eAllCommands)) {
                buffer_barriers.emplace_back(*barrier);
            }
            if (desc.is_written && desc.is_formatted) {
                texture_cache.InvalidateMemoryFromGPU(vsharp.base_address, size);
            }
        }

        set_writes.push_back({
            .dstSet = VK_NULL_HANDLE,
            .dstBinding = binding.unified++,
            .dstArrayElement = 0,
            .descriptorCount = 1,
            .descriptorType = is_storage ? vk::DescriptorType::eStorageBuffer
                                         : vk::DescriptorType::eUniformBuffer,
            .pBufferInfo = &buffer_infos.back(),
        });
        ++binding.buffer;
    }
}

void Rasterizer::BindTextures(const Shader::Info& stage, Shader::Backend::Bindings& binding) {
    image_bindings.clear();

    for (const auto& image_desc : stage.images) {
        const auto tsharp = image_desc.GetSharp(stage);
        if (texture_cache.IsMeta(tsharp.Address())) {
            LOG_WARNING(Render_Vulkan, "Unexpected metadata read by a shader (texture)");
        }

        if (tsharp.GetDataFmt() == AmdGpu::DataFormat::FormatInvalid) {
            image_bindings.emplace_back(std::piecewise_construct, std::tuple{}, std::tuple{});
            continue;
        }

        auto& [image_id, desc] = image_bindings.emplace_back(std::piecewise_construct, std::tuple{},
                                                             std::tuple{tsharp, image_desc});
        image_id = texture_cache.FindImage(desc);
        auto* image = &texture_cache.GetImage(image_id);
        if (image->depth_id) {
            // If this image has an associated depth image, it's a stencil attachment.
            // Redirect the access to the actual depth-stencil buffer.
            image_id = image->depth_id;
            image = &texture_cache.GetImage(image_id);
        }
        if (image->binding.is_bound) {
            // The image is already bound. In case if it is about to be used as storage we need
            // to force general layout on it.
            image->binding.force_general |= image_desc.is_written;
        }
        image->binding.is_bound = 1u;
    }

    // Second pass to re-bind images that were updated after binding
    for (auto& [image_id, desc] : image_bindings) {
        bool is_storage = desc.type == VideoCore::TextureCache::BindingType::Storage;
        if (!image_id) {
            if (instance.IsNullDescriptorSupported()) {
                image_infos.emplace_back(VK_NULL_HANDLE, VK_NULL_HANDLE, vk::ImageLayout::eGeneral);
            } else {
                auto& null_image_view = texture_cache.FindTexture(VideoCore::NULL_IMAGE_ID, desc);
                image_infos.emplace_back(VK_NULL_HANDLE, *null_image_view.image_view,
                                         vk::ImageLayout::eGeneral);
            }
        } else {
            if (auto& old_image = texture_cache.GetImage(image_id);
                old_image.binding.needs_rebind) {
                old_image.binding = {};
                image_id = texture_cache.FindImage(desc);
            }

            bound_images.emplace_back(image_id);

            auto& image = texture_cache.GetImage(image_id);
            auto& image_view = texture_cache.FindTexture(image_id, desc);

            // The image is either bound as storage in a separate descriptor or bound as render
            // target in feedback loop. Depth images are excluded because they can't be bound as
            // storage and feedback loop doesn't make sense for them
            if ((image.binding.force_general || image.binding.is_target) &&
                !image.info.props.is_depth) {
                image.Transit(instance.IsAttachmentFeedbackLoopLayoutSupported() &&
                                      image.binding.is_target
                                  ? vk::ImageLayout::eAttachmentFeedbackLoopOptimalEXT
                                  : vk::ImageLayout::eGeneral,
                              vk::AccessFlagBits2::eShaderRead |
                                  (image.info.props.is_depth
                                       ? vk::AccessFlagBits2::eDepthStencilAttachmentWrite
                                       : vk::AccessFlagBits2::eColorAttachmentWrite),
                              {});
            } else {
                if (is_storage) {
                    image.Transit(vk::ImageLayout::eGeneral,
                                  vk::AccessFlagBits2::eShaderRead |
                                      vk::AccessFlagBits2::eShaderWrite,
                                  desc.view_info.range);
                } else {
                    const auto new_layout = image.info.props.is_depth
                                                ? vk::ImageLayout::eDepthStencilReadOnlyOptimal
                                                : vk::ImageLayout::eShaderReadOnlyOptimal;
                    image.Transit(new_layout, vk::AccessFlagBits2::eShaderRead,
                                  desc.view_info.range);
                }
            }
            image.usage.storage |= is_storage;
            image.usage.texture |= !is_storage;

            image_infos.emplace_back(VK_NULL_HANDLE, *image_view.image_view,
                                     image.backing->state.layout);
        }

        set_writes.push_back({
            .dstSet = VK_NULL_HANDLE,
            .dstBinding = binding.unified++,
            .dstArrayElement = 0,
            .descriptorCount = 1,
            .descriptorType =
                is_storage ? vk::DescriptorType::eStorageImage : vk::DescriptorType::eSampledImage,
            .pImageInfo = &image_infos.back(),
        });
    }

    for (const auto& sampler : stage.samplers) {
        auto ssharp = sampler.GetSharp(stage);
        if (sampler.disable_aniso) {
            const auto& tsharp = stage.images[sampler.associated_image].GetSharp(stage);
            if (tsharp.base_level == 0 && tsharp.last_level == 0) {
                ssharp.max_aniso.Assign(AmdGpu::AnisoRatio::One);
            }
        }
        const auto vk_sampler = texture_cache.GetSampler(ssharp, liverpool->regs.ta_bc_base);
        image_infos.emplace_back(vk_sampler, VK_NULL_HANDLE, vk::ImageLayout::eGeneral);
        set_writes.push_back({
            .dstSet = VK_NULL_HANDLE,
            .dstBinding = binding.unified++,
            .dstArrayElement = 0,
            .descriptorCount = 1,
            .descriptorType = vk::DescriptorType::eSampler,
            .pImageInfo = &image_infos.back(),
        });
    }
}

RenderState Rasterizer::BeginRendering(const GraphicsPipeline* pipeline) {
    attachment_feedback_loop = false;
    const auto& regs = liverpool->regs;
    const auto& key = pipeline->GetGraphicsKey();
    RenderState state;
    state.width = instance.GetMaxFramebufferWidth();
    state.height = instance.GetMaxFramebufferHeight();
    state.num_layers = std::numeric_limits<u32>::max();
    state.num_color_attachments = std::bit_width(key.mrt_mask);
    for (auto cb = 0u; cb < state.num_color_attachments; ++cb) {
        auto& [image_id, desc] = cb_descs[cb];
        if (!image_id) {
            continue;
        }
        auto* image = &texture_cache.GetImage(image_id);
        if (image->binding.needs_rebind) {
            image_id = bound_images.emplace_back(texture_cache.FindImage(desc));
            image = &texture_cache.GetImage(image_id);
        }
        texture_cache.UpdateImage(image_id);
        image->SetBackingSamples(key.color_samples[cb]);
        const auto& image_view = texture_cache.FindRenderTarget(image_id, desc);
        const auto slice = image_view.info.range.base.layer;
        const auto mip = image_view.info.range.base.level;

        const auto& col_buf = regs.color_buffers[cb];
        const bool is_clear = texture_cache.IsMetaCleared(col_buf.CmaskAddress(), slice);
        texture_cache.TouchMeta(col_buf.CmaskAddress(), slice, false);

        if (image->binding.is_bound) {
            ASSERT_MSG(!image->binding.force_general,
                       "Having image both as storage and render target is unsupported");
            image->Transit(instance.IsAttachmentFeedbackLoopLayoutSupported()
                               ? vk::ImageLayout::eAttachmentFeedbackLoopOptimalEXT
                               : vk::ImageLayout::eGeneral,
                           vk::AccessFlagBits2::eColorAttachmentWrite, {});
            attachment_feedback_loop = true;
        } else {
            image->Transit(vk::ImageLayout::eColorAttachmentOptimal,
                           vk::AccessFlagBits2::eColorAttachmentWrite |
                               vk::AccessFlagBits2::eColorAttachmentRead,
                           desc.view_info.range);
        }

        state.width = std::min<u32>(state.width, std::max(image->info.size.width >> mip, 1u));
        state.height = std::min<u32>(state.height, std::max(image->info.size.height >> mip, 1u));
        state.num_layers = std::min<u32>(state.num_layers, image_view.info.range.extent.layers);
        state.color_attachments[cb] = {
            .imageView = *image_view.image_view,
            .imageLayout = image->backing->state.layout,
            .loadOp = is_clear ? vk::AttachmentLoadOp::eClear : vk::AttachmentLoadOp::eLoad,
            .storeOp = vk::AttachmentStoreOp::eStore,
            .clearValue =
                is_clear ? LiverpoolToVK::ColorBufferClearValue(col_buf) : vk::ClearValue{},
        };
        image->usage.render_target = 1u;
    }

    if (auto image_id = db_desc.first; image_id) {
        auto& desc = db_desc.second;
        const auto htile_address = regs.depth_htile_data_base.GetAddress();
        const auto& image_view = texture_cache.FindDepthTarget(image_id, desc);
        auto& image = texture_cache.GetImage(image_id);

        const auto slice = image_view.info.range.base.layer;
        const bool is_depth_clear = regs.depth_render_control.depth_clear_enable ||
                                    texture_cache.IsMetaCleared(htile_address, slice);
        const bool is_stencil_clear = regs.depth_render_control.stencil_clear_enable;
        texture_cache.TouchMeta(htile_address, slice, false);
        ASSERT(desc.view_info.range.extent.levels == 1 && !image.binding.needs_rebind);

        const bool has_stencil = image.info.props.has_stencil;
        const auto new_layout = desc.view_info.is_storage
                                    ? has_stencil ? vk::ImageLayout::eDepthStencilAttachmentOptimal
                                                  : vk::ImageLayout::eDepthAttachmentOptimal
                                : has_stencil ? vk::ImageLayout::eDepthStencilReadOnlyOptimal
                                              : vk::ImageLayout::eDepthReadOnlyOptimal;
        image.Transit(new_layout,
                      vk::AccessFlagBits2::eDepthStencilAttachmentWrite |
                          vk::AccessFlagBits2::eDepthStencilAttachmentRead,
                      desc.view_info.range);

        state.width = std::min<u32>(state.width, image.info.size.width);
        state.height = std::min<u32>(state.height, image.info.size.height);
        state.has_depth = regs.depth_buffer.DepthValid();
        state.has_stencil = regs.depth_buffer.StencilValid();
        state.num_layers = std::min<u32>(state.num_layers, image_view.info.range.extent.layers);
        if (state.has_depth) {
            state.depth_attachment = {
                .imageView = *image_view.image_view,
                .imageLayout = image.backing->state.layout,
                .loadOp =
                    is_depth_clear ? vk::AttachmentLoadOp::eClear : vk::AttachmentLoadOp::eLoad,
                .storeOp = vk::AttachmentStoreOp::eStore,
                .clearValue = vk::ClearValue{.depthStencil = {.depth = regs.depth_clear}},
            };
        }
        if (state.has_stencil) {
            state.stencil_attachment = {
                .imageView = *image_view.image_view,
                .imageLayout = image.backing->state.layout,
                .loadOp =
                    is_stencil_clear ? vk::AttachmentLoadOp::eClear : vk::AttachmentLoadOp::eLoad,
                .storeOp = vk::AttachmentStoreOp::eStore,
                .clearValue = vk::ClearValue{.depthStencil = {.stencil = regs.stencil_clear}},
            };
        }

        image.usage.depth_target = true;
    }

    if (state.num_layers == std::numeric_limits<u32>::max()) {
        state.num_layers = 1;
    }

    return state;
}

void Rasterizer::Resolve() {
    const auto& mrt0_hint = liverpool->last_cb_extent[0];
    const auto& mrt1_hint = liverpool->last_cb_extent[1];
    VideoCore::TextureCache::ImageDesc mrt0_desc{liverpool->regs.color_buffers[0], mrt0_hint};
    VideoCore::TextureCache::ImageDesc mrt1_desc{liverpool->regs.color_buffers[1], mrt1_hint};
    auto& mrt0_image = texture_cache.GetImage(texture_cache.FindImage(mrt0_desc, true));
    auto& mrt1_image = texture_cache.GetImage(texture_cache.FindImage(mrt1_desc, true));

    ScopeMarkerBegin(fmt::format("Resolve:MRT0={:#x}:MRT1={:#x}",
                                 liverpool->regs.color_buffers[0].Address(),
                                 liverpool->regs.color_buffers[1].Address()));
    mrt1_image.Resolve(mrt0_image, mrt0_desc.view_info.range, mrt1_desc.view_info.range);
    ScopeMarkerEnd();
}

void Rasterizer::DepthStencilCopy(bool is_depth, bool is_stencil) {
    auto& regs = liverpool->regs;

    auto read_desc = VideoCore::TextureCache::ImageDesc(
        regs.depth_buffer, regs.depth_view, regs.depth_control,
        regs.depth_htile_data_base.GetAddress(), liverpool->last_db_extent, false);
    auto write_desc = VideoCore::TextureCache::ImageDesc(
        regs.depth_buffer, regs.depth_view, regs.depth_control,
        regs.depth_htile_data_base.GetAddress(), liverpool->last_db_extent, true);

    auto& read_image = texture_cache.GetImage(texture_cache.FindImage(read_desc));
    auto& write_image = texture_cache.GetImage(texture_cache.FindImage(write_desc));

    VideoCore::SubresourceRange sub_range;
    sub_range.base.layer = liverpool->regs.depth_view.slice_start;
    sub_range.extent.layers = liverpool->regs.depth_view.NumSlices() - sub_range.base.layer;

    ScopeMarkerBegin(fmt::format(
        "DepthStencilCopy:DR={:#x}:SR={:#x}:DW={:#x}:SW={:#x}", regs.depth_buffer.DepthAddress(),
        regs.depth_buffer.StencilAddress(), regs.depth_buffer.DepthWriteAddress(),
        regs.depth_buffer.StencilWriteAddress()));

    read_image.Transit(vk::ImageLayout::eTransferSrcOptimal, vk::AccessFlagBits2::eTransferRead,
                       sub_range);
    write_image.Transit(vk::ImageLayout::eTransferDstOptimal, vk::AccessFlagBits2::eTransferWrite,
                        sub_range);

    auto aspect_mask = vk::ImageAspectFlags(0);
    if (is_depth) {
        aspect_mask |= vk::ImageAspectFlagBits::eDepth;
    }
    if (is_stencil) {
        aspect_mask |= vk::ImageAspectFlagBits::eStencil;
    }

    vk::ImageCopy region = {
        .srcSubresource =
            {
                .aspectMask = aspect_mask,
                .mipLevel = 0,
                .baseArrayLayer = sub_range.base.layer,
                .layerCount = sub_range.extent.layers,
            },
        .srcOffset = {0, 0, 0},
        .dstSubresource =
            {
                .aspectMask = aspect_mask,
                .mipLevel = 0,
                .baseArrayLayer = sub_range.base.layer,
                .layerCount = sub_range.extent.layers,
            },
        .dstOffset = {0, 0, 0},
        .extent = {write_image.info.size.width, write_image.info.size.height, 1},
    };
    scheduler.CommandBuffer().copyImage(read_image.GetImage(), vk::ImageLayout::eTransferSrcOptimal,
                                        write_image.GetImage(),
                                        vk::ImageLayout::eTransferDstOptimal, region);

    ScopeMarkerEnd();
}

void Rasterizer::FillBuffer(VAddr address, u32 num_bytes, u32 value, bool is_gds) {
    buffer_cache.FillBuffer(address, num_bytes, value, is_gds);
}

void Rasterizer::CopyBuffer(VAddr dst, VAddr src, u32 num_bytes, bool dst_gds, bool src_gds) {
    buffer_cache.CopyBuffer(dst, src, num_bytes, dst_gds, src_gds);
}

u32 Rasterizer::ReadDataFromGds(u32 gds_offset) {
    auto* gds_buf = buffer_cache.GetGdsBuffer();
    u32 value;
    std::memcpy(&value, gds_buf->mapped_data.data() + gds_offset, sizeof(u32));
    return value;
}

bool Rasterizer::InvalidateMemory(VAddr addr, u64 size) {
    if (!IsMapped(addr, size)) {
        // Not GPU mapped memory, can skip invalidation logic entirely.
        return false;
    }
    buffer_cache.InvalidateMemory(addr, size);
    texture_cache.InvalidateMemory(addr, size);
    return true;
}

bool Rasterizer::ReadMemory(VAddr addr, u64 size) {
    if (!IsMapped(addr, size)) {
        // Not GPU mapped memory, can skip invalidation logic entirely.
        return false;
    }
    buffer_cache.ReadMemory(addr, size);
    return true;
}

bool Rasterizer::IsMapped(VAddr addr, u64 size) {
    if (size == 0) {
        // There is no memory, so not mapped.
        return false;
    }
    if (static_cast<u64>(addr) > std::numeric_limits<u64>::max() - size) {
        // Memory range wrapped the address space, cannot be mapped.
        return false;
    }
    const auto range = decltype(mapped_ranges)::interval_type::right_open(addr, addr + size);

    Common::RecursiveSharedLock lock{mapped_ranges_mutex};
    return boost::icl::contains(mapped_ranges, range);
}

void Rasterizer::MapMemory(VAddr addr, u64 size) {
    {
        std::scoped_lock lock{mapped_ranges_mutex};
        mapped_ranges += decltype(mapped_ranges)::interval_type::right_open(addr, addr + size);
    }
    page_manager.OnGpuMap(addr, size);
}

void Rasterizer::UnmapMemory(VAddr addr, u64 size) {
    buffer_cache.InvalidateMemory(addr, size);
    texture_cache.UnmapMemory(addr, size);
    page_manager.OnGpuUnmap(addr, size);
    {
        std::scoped_lock lock{mapped_ranges_mutex};
        mapped_ranges -= decltype(mapped_ranges)::interval_type::right_open(addr, addr + size);
    }
}

void Rasterizer::UpdateDynamicState(const GraphicsPipeline* pipeline, const bool is_indexed) const {
    UpdateViewportScissorState();
    UpdateDepthStencilState();
    UpdatePrimitiveState(is_indexed);
    UpdateRasterizationState();
    UpdateColorBlendingState(pipeline);

    auto& dynamic_state = scheduler.GetDynamicState();
    dynamic_state.Commit(instance, scheduler.CommandBuffer());
}

void Rasterizer::UpdateViewportScissorState() const {
    const auto& regs = liverpool->regs;

    const auto combined_scissor_value_tl = [](s16 scr, s16 win, s16 gen, s16 win_offset) {
        return std::max({scr, s16(win + win_offset), s16(gen + win_offset)});
    };
    const auto combined_scissor_value_br = [](s16 scr, s16 win, s16 gen, s16 win_offset) {
        return std::min({scr, s16(win + win_offset), s16(gen + win_offset)});
    };
    const bool enable_offset = !regs.window_scissor.window_offset_disable;

    AmdGpu::Scissor scsr{};
    scsr.top_left_x = combined_scissor_value_tl(
        regs.screen_scissor.top_left_x, s16(regs.window_scissor.top_left_x),
        s16(regs.generic_scissor.top_left_x),
        enable_offset ? regs.window_offset.window_x_offset : 0);
    scsr.top_left_y = combined_scissor_value_tl(
        regs.screen_scissor.top_left_y, s16(regs.window_scissor.top_left_y),
        s16(regs.generic_scissor.top_left_y),
        enable_offset ? regs.window_offset.window_y_offset : 0);
    scsr.bottom_right_x = combined_scissor_value_br(
        regs.screen_scissor.bottom_right_x, regs.window_scissor.bottom_right_x,
        regs.generic_scissor.bottom_right_x,
        enable_offset ? regs.window_offset.window_x_offset : 0);
    scsr.bottom_right_y = combined_scissor_value_br(
        regs.screen_scissor.bottom_right_y, regs.window_scissor.bottom_right_y,
        regs.generic_scissor.bottom_right_y,
        enable_offset ? regs.window_offset.window_y_offset : 0);

    boost::container::static_vector<vk::Viewport, AmdGpu::NUM_VIEWPORTS> viewports;
    boost::container::static_vector<vk::Rect2D, AmdGpu::NUM_VIEWPORTS> scissors;

    if (regs.polygon_control.enable_window_offset &&
        (regs.window_offset.window_x_offset != 0 || regs.window_offset.window_y_offset != 0)) {
        LOG_ERROR(Render_Vulkan,
                  "PA_SU_SC_MODE_CNTL.VTX_WINDOW_OFFSET_ENABLE support is not yet implemented.");
    }

    const auto& vp_ctl = regs.viewport_control;
    for (u32 i = 0; i < AmdGpu::NUM_VIEWPORTS; i++) {
        const auto& vp = regs.viewports[i];
        const auto& vp_d = regs.viewport_depths[i];
        if (vp.xscale == 0) {
            continue;
        }

        const auto zoffset = vp_ctl.zoffset_enable ? vp.zoffset : 0.f;
        const auto zscale = vp_ctl.zscale_enable ? vp.zscale : 1.f;

        vk::Viewport viewport{};

        // https://gitlab.freedesktop.org/mesa/mesa/-/blob/209a0ed/src/amd/vulkan/radv_pipeline_graphics.c#L688-689
        // https://gitlab.freedesktop.org/mesa/mesa/-/blob/209a0ed/src/amd/vulkan/radv_cmd_buffer.c#L3103-3109
        // When the clip space is ranged [-1...1], the zoffset is centered.
        // By reversing the above viewport calculations, we get the following:
        if (regs.clipper_control.clip_space == AmdGpu::ClipSpace::MinusWToW) {
            viewport.minDepth = zoffset - zscale;
            viewport.maxDepth = zoffset + zscale;
        } else {
            viewport.minDepth = zoffset;
            viewport.maxDepth = zoffset + zscale;
        }

        if (!instance.IsDepthRangeUnrestrictedSupported()) {
            // Unrestricted depth range not supported by device. Restrict to valid range.
            viewport.minDepth = std::max(viewport.minDepth, 0.f);
            viewport.maxDepth = std::min(viewport.maxDepth, 1.f);
        }

        if (regs.IsClipDisabled()) {
            // In case if clipping is disabled we patch the shader to convert vertex position
            // from screen space coordinates to NDC by defining a render space as full hardware
            // window range [0..16383, 0..16383] and setting the viewport to its size.
            viewport.x = 0.f;
            viewport.y = 0.f;
            viewport.width = float(std::min<u32>(instance.GetMaxViewportWidth(), 16_KB));
            viewport.height = float(std::min<u32>(instance.GetMaxViewportHeight(), 16_KB));
        } else {
            const auto xoffset = vp_ctl.xoffset_enable ? vp.xoffset : 0.f;
            const auto xscale = vp_ctl.xscale_enable ? vp.xscale : 1.f;
            const auto yoffset = vp_ctl.yoffset_enable ? vp.yoffset : 0.f;
            const auto yscale = vp_ctl.yscale_enable ? vp.yscale : 1.f;

            viewport.x = xoffset - xscale;
            viewport.y = yoffset - yscale;
            viewport.width = xscale * 2.0f;
            viewport.height = yscale * 2.0f;
        }

        viewports.push_back(viewport);

        auto vp_scsr = scsr;
        if (regs.mode_control.vport_scissor_enable) {
            vp_scsr.top_left_x =
                std::max(vp_scsr.top_left_x, s16(regs.viewport_scissors[i].top_left_x));
            vp_scsr.top_left_y =
                std::max(vp_scsr.top_left_y, s16(regs.viewport_scissors[i].top_left_y));
            vp_scsr.bottom_right_x = std::min(AmdGpu::Scissor::Clamp(vp_scsr.bottom_right_x),
                                              regs.viewport_scissors[i].bottom_right_x);
            vp_scsr.bottom_right_y = std::min(AmdGpu::Scissor::Clamp(vp_scsr.bottom_right_y),
                                              regs.viewport_scissors[i].bottom_right_y);
        }
        scissors.push_back({
            .offset = {vp_scsr.top_left_x, vp_scsr.top_left_y},
            .extent = {vp_scsr.GetWidth(), vp_scsr.GetHeight()},
        });
    }

    if (viewports.empty()) {
        // Vulkan requires providing at least one viewport.
        constexpr vk::Viewport empty_viewport = {
            .x = -1.0f,
            .y = -1.0f,
            .width = 1.0f,
            .height = 1.0f,
            .minDepth = 0.0f,
            .maxDepth = 1.0f,
        };
        constexpr vk::Rect2D empty_scissor = {
            .offset = {0, 0},
            .extent = {1, 1},
        };
        viewports.push_back(empty_viewport);
        scissors.push_back(empty_scissor);
    }

    auto& dynamic_state = scheduler.GetDynamicState();
    dynamic_state.SetViewports(viewports);
    dynamic_state.SetScissors(scissors);
}

void Rasterizer::UpdateDepthStencilState() const {
    const auto& regs = liverpool->regs;
    auto& dynamic_state = scheduler.GetDynamicState();

    const auto depth_test_enabled =
        regs.depth_control.depth_enable && regs.depth_buffer.DepthValid();
    dynamic_state.SetDepthTestEnabled(depth_test_enabled);
    if (depth_test_enabled) {
        dynamic_state.SetDepthWriteEnabled(regs.depth_control.depth_write_enable &&
                                           !regs.depth_render_control.depth_clear_enable);
        dynamic_state.SetDepthCompareOp(LiverpoolToVK::CompareOp(regs.depth_control.depth_func));
    }

    const auto depth_bounds_test_enabled = regs.depth_control.depth_bounds_enable;
    dynamic_state.SetDepthBoundsTestEnabled(depth_bounds_test_enabled);
    if (depth_bounds_test_enabled) {
        dynamic_state.SetDepthBounds(regs.depth_bounds_min, regs.depth_bounds_max);
    }

    const auto depth_bias_enabled = regs.polygon_control.NeedsBias();
    dynamic_state.SetDepthBiasEnabled(depth_bias_enabled);
    if (depth_bias_enabled) {
        const bool front = regs.polygon_control.enable_polygon_offset_front;
        dynamic_state.SetDepthBias(
            front ? regs.poly_offset.front_offset : regs.poly_offset.back_offset,
            regs.poly_offset.depth_bias,
            (front ? regs.poly_offset.front_scale : regs.poly_offset.back_scale) / 16.f);
    }

    const auto stencil_test_enabled =
        regs.depth_control.stencil_enable && regs.depth_buffer.StencilValid();
    dynamic_state.SetStencilTestEnabled(stencil_test_enabled);
    if (stencil_test_enabled) {
        const StencilOps front_ops{
            .fail_op = LiverpoolToVK::StencilOp(regs.stencil_control.stencil_fail_front),
            .pass_op = LiverpoolToVK::StencilOp(regs.stencil_control.stencil_zpass_front),
            .depth_fail_op = LiverpoolToVK::StencilOp(regs.stencil_control.stencil_zfail_front),
            .compare_op = LiverpoolToVK::CompareOp(regs.depth_control.stencil_ref_func),
        };
        const StencilOps back_ops = regs.depth_control.backface_enable ? StencilOps{
            .fail_op = LiverpoolToVK::StencilOp(regs.stencil_control.stencil_fail_back),
            .pass_op = LiverpoolToVK::StencilOp(regs.stencil_control.stencil_zpass_back),
            .depth_fail_op = LiverpoolToVK::StencilOp(regs.stencil_control.stencil_zfail_back),
            .compare_op = LiverpoolToVK::CompareOp(regs.depth_control.stencil_bf_func),
        } : front_ops;
        dynamic_state.SetStencilOps(front_ops, back_ops);

        const bool stencil_clear = regs.depth_render_control.stencil_clear_enable;
        const auto front = regs.stencil_ref_front;
        const auto back =
            regs.depth_control.backface_enable ? regs.stencil_ref_back : regs.stencil_ref_front;
        dynamic_state.SetStencilReferences(front.stencil_test_val, back.stencil_test_val);
        dynamic_state.SetStencilWriteMasks(!stencil_clear ? front.stencil_write_mask : 0U,
                                           !stencil_clear ? back.stencil_write_mask : 0U);
        dynamic_state.SetStencilCompareMasks(front.stencil_mask, back.stencil_mask);
    }
}

void Rasterizer::UpdatePrimitiveState(const bool is_indexed) const {
    const auto& regs = liverpool->regs;
    auto& dynamic_state = scheduler.GetDynamicState();

    const auto prim_restart = (regs.enable_primitive_restart & 1) != 0;
    ASSERT_MSG(!is_indexed || !prim_restart || regs.primitive_restart_index == 0xFFFF ||
                   regs.primitive_restart_index == 0xFFFFFFFF,
               "Primitive restart index other than -1 is not supported yet");

    const auto cull_mode = LiverpoolToVK::IsPrimitiveCulled(regs.primitive_type)
                               ? LiverpoolToVK::CullMode(regs.polygon_control.CullingMode())
                               : vk::CullModeFlagBits::eNone;
    const auto front_face = LiverpoolToVK::FrontFace(regs.polygon_control.front_face);

    dynamic_state.SetPrimitiveRestartEnabled(prim_restart);
    dynamic_state.SetRasterizerDiscardEnabled(regs.clipper_control.dx_rasterization_kill);
    dynamic_state.SetCullMode(cull_mode);
    dynamic_state.SetFrontFace(front_face);
}

void Rasterizer::UpdateRasterizationState() const {
    const auto& regs = liverpool->regs;
    auto& dynamic_state = scheduler.GetDynamicState();
    dynamic_state.SetLineWidth(regs.line_control.Width());
}

void Rasterizer::UpdateColorBlendingState(const GraphicsPipeline* pipeline) const {
    const auto& regs = liverpool->regs;
    auto& dynamic_state = scheduler.GetDynamicState();
    dynamic_state.SetBlendConstants(regs.blend_constants);
    dynamic_state.SetColorWriteMasks(pipeline->GetGraphicsKey().write_masks);
    dynamic_state.SetAttachmentFeedbackLoopEnabled(attachment_feedback_loop);
}

void Rasterizer::ScopeMarkerBegin(const std::string_view& str, bool from_guest) {
    if ((from_guest && !Config::getVkGuestMarkersEnabled()) ||
        (!from_guest && !Config::getVkHostMarkersEnabled())) {
        return;
    }
    const auto cmdbuf = scheduler.CommandBuffer();
    cmdbuf.beginDebugUtilsLabelEXT(vk::DebugUtilsLabelEXT{
        .pLabelName = str.data(),
    });
}

void Rasterizer::ScopeMarkerEnd(bool from_guest) {
    if ((from_guest && !Config::getVkGuestMarkersEnabled()) ||
        (!from_guest && !Config::getVkHostMarkersEnabled())) {
        return;
    }
    const auto cmdbuf = scheduler.CommandBuffer();
    cmdbuf.endDebugUtilsLabelEXT();
}

void Rasterizer::ScopedMarkerInsert(const std::string_view& str, bool from_guest) {
    if ((from_guest && !Config::getVkGuestMarkersEnabled()) ||
        (!from_guest && !Config::getVkHostMarkersEnabled())) {
        return;
    }
    const auto cmdbuf = scheduler.CommandBuffer();
    cmdbuf.insertDebugUtilsLabelEXT(vk::DebugUtilsLabelEXT{
        .pLabelName = str.data(),
    });
}

void Rasterizer::ScopedMarkerInsertColor(const std::string_view& str, const u32 color,
                                         bool from_guest) {
    if ((from_guest && !Config::getVkGuestMarkersEnabled()) ||
        (!from_guest && !Config::getVkHostMarkersEnabled())) {
        return;
    }
    const auto cmdbuf = scheduler.CommandBuffer();
    cmdbuf.insertDebugUtilsLabelEXT(vk::DebugUtilsLabelEXT{
        .pLabelName = str.data(),
        .color = std::array<f32, 4>(
            {(f32)((color >> 16) & 0xff) / 255.0f, (f32)((color >> 8) & 0xff) / 255.0f,
             (f32)(color & 0xff) / 255.0f, (f32)((color >> 24) & 0xff) / 255.0f})});
}

// --- 新增：为特定计算着色器生成 mip 链 ---
void Rasterizer::GenerateMipChainForWrittenImages(const Shader::Info& cs_info) {
    for (const auto& img_desc : cs_info.images) {
        if (!img_desc.is_written) continue; // 仅处理被写入的图像

        auto tsharp = img_desc.GetSharp(cs_info);
        if (tsharp.GetDataFmt() == AmdGpu::DataFormat::FormatInvalid) continue;

        // 使用 tsharp 和 img_desc 构造 ImageDesc（与 BindTextures 中一致）
        VideoCore::TextureCache::ImageDesc desc(tsharp, img_desc);
        auto image_id = texture_cache.FindImage(desc);
        if (!image_id) continue;

        auto& image = texture_cache.GetImage(image_id);
        if (image.info.resources.levels <= 1) continue; // 无需 mip 链

        // 如果已经生成过，跳过（避免重复生成）
        if (image.generated_mip_chain) continue;

        GenerateMipChainForImage(image);
        image.generated_mip_chain = true;
    }
}

void Rasterizer::GenerateMipChainForImage(VideoCore::Image& image) {
    auto cmdbuf = scheduler.CommandBuffer();
    const auto& resources = image.info.resources;
    const vk::ImageAspectFlags aspect = image.info.props.is_depth
                                            ? vk::ImageAspectFlagBits::eDepth
                                            : vk::ImageAspectFlagBits::eColor;
    const vk::Filter filter = (aspect & vk::ImageAspectFlagBits::eDepth)
                                  ? vk::Filter::eNearest
                                  : vk::Filter::eLinear;

    // 屏障1：从计算着色器写入 -> 传输源/目标布局
    vk::ImageMemoryBarrier2 pre_barrier = {
        .srcStageMask = vk::PipelineStageFlagBits2::eComputeShader,
        .srcAccessMask = vk::AccessFlagBits2::eShaderWrite,
        .dstStageMask = vk::PipelineStageFlagBits2::eTransfer,
        .dstAccessMask = vk::AccessFlagBits2::eTransferRead | vk::AccessFlagBits2::eTransferWrite,
        .oldLayout = image.backing->state.layout,
        .newLayout = vk::ImageLayout::eTransferSrcOptimal,
        .image = image.GetImage(),
        .subresourceRange = {
            .aspectMask = aspect,
            .baseMipLevel = 0,
            .levelCount = resources.levels,
            .baseArrayLayer = 0,
            .layerCount = resources.layers,
        },
    };
    cmdbuf.pipelineBarrier2(vk::DependencyInfo{
        .imageMemoryBarrierCount = 1,
        .pImageMemoryBarriers = &pre_barrier,
    });

    // 生成 mip 链：从 level 0 到 level-1
    for (u32 level = 0; level < resources.levels - 1; ++level) {
        std::array src_offsets = {
            vk::Offset3D{0, 0, 0},
            vk::Offset3D{
                static_cast<int32_t>(std::max(1u, image.info.size.width >> level)),
                static_cast<int32_t>(std::max(1u, image.info.size.height >> level)),
                1
            }
        };
        std::array dst_offsets = {
            vk::Offset3D{0, 0, 0},
            vk::Offset3D{
                static_cast<int32_t>(std::max(1u, image.info.size.width >> (level + 1))),
                static_cast<int32_t>(std::max(1u, image.info.size.height >> (level + 1))),
                1
            }
        };

        vk::ImageBlit blitRegion{
            .srcSubresource = {
                .aspectMask = aspect,
                .mipLevel = level,
                .baseArrayLayer = 0,
                .layerCount = resources.layers,
            },
            .srcOffsets = src_offsets,
            .dstSubresource = {
                .aspectMask = aspect,
                .mipLevel = level + 1,
                .baseArrayLayer = 0,
                .layerCount = resources.layers,
            },
            .dstOffsets = dst_offsets,
        };
        cmdbuf.blitImage(image.GetImage(), vk::ImageLayout::eTransferSrcOptimal,
                         image.GetImage(), vk::ImageLayout::eTransferSrcOptimal,
                         blitRegion, filter);
    }

    // 屏障2：从传输写 -> 后续着色器读（转换为只读布局）
    vk::ImageMemoryBarrier2 post_barrier = {
        .srcStageMask = vk::PipelineStageFlagBits2::eTransfer,
        .srcAccessMask = vk::AccessFlagBits2::eTransferWrite,
        .dstStageMask = vk::PipelineStageFlagBits2::eFragmentShader | vk::PipelineStageFlagBits2::eComputeShader,
        .dstAccessMask = vk::AccessFlagBits2::eShaderRead,
        .oldLayout = vk::ImageLayout::eTransferSrcOptimal,
        .newLayout = vk::ImageLayout::eShaderReadOnlyOptimal,
        .image = image.GetImage(),
        .subresourceRange = {
            .aspectMask = aspect,
            .baseMipLevel = 0,
            .levelCount = resources.levels,
            .baseArrayLayer = 0,
            .layerCount = resources.layers,
        },
    };
    cmdbuf.pipelineBarrier2(vk::DependencyInfo{
        .imageMemoryBarrierCount = 1,
        .pImageMemoryBarriers = &post_barrier,
    });

    // 更新图像缓存的布局状态
    image.backing->state.layout = vk::ImageLayout::eShaderReadOnlyOptimal;
}

} // namespace Vulkan

@wuguo13842 commented on GitHub (Feb 19, 2026): **### Insufficient capacity to solve problems，It cannot be solved** image.h ` bool generated_mip_chain = false;` ``` // SPDX-FileCopyrightText: Copyright 2024 shadPS4 Emulator Project // SPDX-License-Identifier: GPL-2.0-or-later #pragma once #include "common/recursive_lock.h" #include "common/shared_first_mutex.h" #include "video_core/buffer_cache/buffer_cache.h" #include "video_core/page_manager.h" #include "video_core/renderer_vulkan/vk_pipeline_cache.h" #include "video_core/texture_cache/texture_cache.h" namespace AmdGpu { struct Liverpool; } namespace Core { class MemoryManager; } namespace Vulkan { class Scheduler; class RenderState; class GraphicsPipeline; class Rasterizer { public: explicit Rasterizer(const Instance& instance, Scheduler& scheduler, AmdGpu::Liverpool* liverpool); ~Rasterizer(); [[nodiscard]] Scheduler& GetScheduler() noexcept { return scheduler; } [[nodiscard]] VideoCore::BufferCache& GetBufferCache() noexcept { return buffer_cache; } [[nodiscard]] VideoCore::TextureCache& GetTextureCache() noexcept { return texture_cache; } void Draw(bool is_indexed, u32 index_offset = 0); void DrawIndirect(bool is_indexed, VAddr arg_address, u32 offset, u32 stride, u32 max_count, VAddr count_address); void DispatchDirect(); void DispatchIndirect(VAddr address, u32 offset, u32 size); void ScopeMarkerBegin(const std::string_view& str, bool from_guest = false); void ScopeMarkerEnd(bool from_guest = false); void ScopedMarkerInsert(const std::string_view& str, bool from_guest = false); void ScopedMarkerInsertColor(const std::string_view& str, const u32 color, bool from_guest = false); void FillBuffer(VAddr address, u32 num_bytes, u32 value, bool is_gds); void CopyBuffer(VAddr dst, VAddr src, u32 num_bytes, bool dst_gds, bool src_gds); u32 ReadDataFromGds(u32 gsd_offset); bool InvalidateMemory(VAddr addr, u64 size); bool ReadMemory(VAddr addr, u64 size); bool IsMapped(VAddr addr, u64 size); void MapMemory(VAddr addr, u64 size); void UnmapMemory(VAddr addr, u64 size); void CpSync(); u64 Flush(); void Finish(); void OnSubmit(); PipelineCache& GetPipelineCache() { return pipeline_cache; } template <typename Func> void ForEachMappedRangeInRange(VAddr addr, u64 size, Func&& func) { const auto range = decltype(mapped_ranges)::interval_type::right_open(addr, addr + size); Common::RecursiveSharedLock lock{mapped_ranges_mutex}; for (const auto& mapped_range : (mapped_ranges & range)) { func(mapped_range); } } private: void PrepareRenderState(const GraphicsPipeline* pipeline); RenderState BeginRendering(const GraphicsPipeline* pipeline); void Resolve(); void DepthStencilCopy(bool is_depth, bool is_stencil); void EliminateFastClear(); void UpdateDynamicState(const GraphicsPipeline* pipeline, bool is_indexed) const; void UpdateViewportScissorState() const; void UpdateDepthStencilState() const; void UpdatePrimitiveState(bool is_indexed) const; void UpdateRasterizationState() const; void UpdateColorBlendingState(const GraphicsPipeline* pipeline) const; bool FilterDraw(); void BindBuffers(const Shader::Info& stage, Shader::Backend::Bindings& binding, Shader::PushData& push_data); void BindTextures(const Shader::Info& stage, Shader::Backend::Bindings& binding); bool BindResources(const Pipeline* pipeline); void ResetBindings() { for (auto& image_id : bound_images) { texture_cache.GetImage(image_id).binding = {}; } bound_images.clear(); } bool IsComputeMetaClear(const Pipeline* pipeline); bool IsComputeImageCopy(const Pipeline* pipeline); bool IsComputeImageClear(const Pipeline* pipeline); // --- 新增：为特定计算着色器生成 mip 链 --- void GenerateMipChainForWrittenImages(const Shader::Info& cs_info); void GenerateMipChainForImage(VideoCore::Image& image); private: friend class VideoCore::BufferCache; const Instance& instance; Scheduler& scheduler; VideoCore::PageManager page_manager; VideoCore::BufferCache buffer_cache; VideoCore::TextureCache texture_cache; AmdGpu::Liverpool* liverpool; Core::MemoryManager* memory; boost::icl::interval_set<VAddr> mapped_ranges; Common::SharedFirstMutex mapped_ranges_mutex; PipelineCache pipeline_cache; using RenderTargetInfo = std::pair<VideoCore::ImageId, VideoCore::TextureCache::ImageDesc>; std::array<RenderTargetInfo, AmdGpu::NUM_COLOR_BUFFERS> cb_descs; std::pair<VideoCore::ImageId, VideoCore::TextureCache::ImageDesc> db_desc; boost::container::static_vector<vk::DescriptorImageInfo, Shader::NUM_IMAGES> image_infos; boost::container::static_vector<vk::DescriptorBufferInfo, Shader::NUM_BUFFERS> buffer_infos; boost::container::static_vector<VideoCore::ImageId, Shader::NUM_IMAGES> bound_images; Pipeline::DescriptorWrites set_writes; Pipeline::BufferBarriers buffer_barriers; Shader::PushData push_data; using BufferBindingInfo = std::tuple<VideoCore::BufferId, AmdGpu::Buffer, u64>; boost::container::static_vector<BufferBindingInfo, Shader::NUM_BUFFERS> buffer_bindings; using ImageBindingInfo = std::pair<VideoCore::ImageId, VideoCore::TextureCache::ImageDesc>; boost::container::static_vector<ImageBindingInfo, Shader::NUM_IMAGES> image_bindings; bool fault_process_pending{}; bool attachment_feedback_loop{}; }; } // namespace Vulkan ``` ``` // SPDX-FileCopyrightText: Copyright 2024 shadPS4 Emulator Project // SPDX-License-Identifier: GPL-2.0-or-later #include "common/config.h" #include "common/debug.h" #include "core/memory.h" #include "shader_recompiler/runtime_info.h" #include "video_core/amdgpu/liverpool.h" #include "video_core/renderer_vulkan/liverpool_to_vk.h" #include "video_core/renderer_vulkan/vk_instance.h" #include "video_core/renderer_vulkan/vk_rasterizer.h" #include "video_core/renderer_vulkan/vk_scheduler.h" #include "video_core/renderer_vulkan/vk_shader_hle.h" #include "video_core/texture_cache/image_view.h" #include "video_core/texture_cache/texture_cache.h" #ifdef MemoryBarrier #undef MemoryBarrier #endif namespace Vulkan { static Shader::PushData MakeUserData(const AmdGpu::Regs& regs) { // TODO(roamic): Add support for multiple viewports and geometry shaders when ViewportIndex // is encountered and implemented in the recompiler. Shader::PushData push_data{}; push_data.xoffset = regs.viewport_control.xoffset_enable ? regs.viewports[0].xoffset : 0.f; push_data.xscale = regs.viewport_control.xscale_enable ? regs.viewports[0].xscale : 1.f; push_data.yoffset = regs.viewport_control.yoffset_enable ? regs.viewports[0].yoffset : 0.f; push_data.yscale = regs.viewport_control.yscale_enable ? regs.viewports[0].yscale : 1.f; return push_data; } Rasterizer::Rasterizer(const Instance& instance_, Scheduler& scheduler_, AmdGpu::Liverpool* liverpool_) : instance{instance_}, scheduler{scheduler_}, page_manager{this}, buffer_cache{instance, scheduler, liverpool_, texture_cache, page_manager}, texture_cache{instance, scheduler, liverpool_, buffer_cache, page_manager}, liverpool{liverpool_}, memory{Core::Memory::Instance()}, pipeline_cache{instance, scheduler, liverpool} { if (!Config::nullGpu()) { liverpool->BindRasterizer(this); } memory->SetRasterizer(this); } Rasterizer::~Rasterizer() = default; void Rasterizer::CpSync() { scheduler.EndRendering(); auto cmdbuf = scheduler.CommandBuffer(); const vk::MemoryBarrier ib_barrier{ .srcAccessMask = vk::AccessFlagBits::eShaderWrite, .dstAccessMask = vk::AccessFlagBits::eIndirectCommandRead, }; cmdbuf.pipelineBarrier(vk::PipelineStageFlagBits::eComputeShader, vk::PipelineStageFlagBits::eDrawIndirect, vk::DependencyFlagBits::eByRegion, ib_barrier, {}, {}); } bool Rasterizer::FilterDraw() { const auto& regs = liverpool->regs; if (regs.color_control.mode == AmdGpu::ColorControl::OperationMode::EliminateFastClear) { // Clears the render target if FCE is launched before any draws EliminateFastClear(); return false; } if (regs.color_control.mode == AmdGpu::ColorControl::OperationMode::FmaskDecompress) { // TODO: check for a valid MRT1 to promote the draw to the resolve pass. LOG_TRACE(Render_Vulkan, "FMask decompression pass skipped"); ScopedMarkerInsert("FmaskDecompress"); return false; } if (regs.color_control.mode == AmdGpu::ColorControl::OperationMode::Resolve) { LOG_TRACE(Render_Vulkan, "Resolve pass"); Resolve(); return false; } if (regs.primitive_type == AmdGpu::PrimitiveType::None) { LOG_TRACE(Render_Vulkan, "Primitive type 'None' skipped"); ScopedMarkerInsert("PrimitiveTypeNone"); return false; } const bool cb_disabled = regs.color_control.mode == AmdGpu::ColorControl::OperationMode::Disable; const auto depth_copy = regs.depth_render_override.force_z_dirty && regs.depth_render_override.force_z_valid && regs.depth_buffer.DepthValid() && regs.depth_buffer.DepthWriteValid() && regs.depth_buffer.DepthAddress() != regs.depth_buffer.DepthWriteAddress(); const auto stencil_copy = regs.depth_render_override.force_stencil_dirty && regs.depth_render_override.force_stencil_valid && regs.depth_buffer.StencilValid() && regs.depth_buffer.StencilWriteValid() && regs.depth_buffer.StencilAddress() != regs.depth_buffer.StencilWriteAddress(); if (cb_disabled && (depth_copy || stencil_copy)) { // Games may disable color buffer and enable force depth/stencil dirty and valid to // do a copy from one depth-stencil surface to another, without a pixel shader. // We need to detect this case and perform the copy, otherwise it will have no effect. LOG_TRACE(Render_Vulkan, "Performing depth-stencil override copy"); DepthStencilCopy(depth_copy, stencil_copy); return false; } return true; } void Rasterizer::PrepareRenderState(const GraphicsPipeline* pipeline) { // Prefetch render targets to handle overlaps with bound textures (e.g. mipgen) const auto& key = pipeline->GetGraphicsKey(); const auto& regs = liverpool->regs; if (regs.color_control.degamma_enable) { LOG_WARNING(Render_Vulkan, "Color buffers require gamma correction"); } const bool skip_cb_binding = regs.color_control.mode == AmdGpu::ColorControl::OperationMode::Disable; for (s32 cb = 0; cb < std::bit_width(key.mrt_mask); ++cb) { auto& [image_id, desc] = cb_descs[cb]; const auto& col_buf = regs.color_buffers[cb]; const u32 target_mask = regs.color_target_mask.GetMask(cb); if (skip_cb_binding || !col_buf || !target_mask || (key.mrt_mask & (1 << cb)) == 0) { image_id = {}; continue; } const auto& hint = liverpool->last_cb_extent[cb]; std::construct_at(&desc, col_buf, hint); image_id = bound_images.emplace_back(texture_cache.FindImage(desc)); auto& image = texture_cache.GetImage(image_id); image.binding.is_target = 1u; } if ((regs.depth_control.depth_enable && regs.depth_buffer.DepthValid()) || (regs.depth_control.stencil_enable && regs.depth_buffer.StencilValid())) { const auto htile_address = regs.depth_htile_data_base.GetAddress(); const auto& hint = liverpool->last_db_extent; auto& [image_id, desc] = db_desc; std::construct_at(&desc, regs.depth_buffer, regs.depth_view, regs.depth_control, htile_address, hint); image_id = bound_images.emplace_back(texture_cache.FindImage(desc)); auto& image = texture_cache.GetImage(image_id); image.binding.is_target = 1u; } else { db_desc.first = {}; } } static std::pair<u32, u32> GetDrawOffsets( const AmdGpu::Regs& regs, const Shader::Info& info, const std::optional<Shader::Gcn::FetchShaderData>& fetch_shader) { u32 vertex_offset = regs.index_offset; u32 instance_offset = 0; if (fetch_shader) { if (vertex_offset == 0 && fetch_shader->vertex_offset_sgpr != -1) { vertex_offset = info.user_data[fetch_shader->vertex_offset_sgpr]; } if (fetch_shader->instance_offset_sgpr != -1) { instance_offset = info.user_data[fetch_shader->instance_offset_sgpr]; } } return {vertex_offset, instance_offset}; } void Rasterizer::EliminateFastClear() { auto& col_buf = liverpool->regs.color_buffers[0]; if (!col_buf || !col_buf.info.fast_clear) { return; } VideoCore::TextureCache::ImageDesc desc(col_buf, liverpool->last_cb_extent[0]); const auto image_id = texture_cache.FindImage(desc); const auto& image_view = texture_cache.FindRenderTarget(image_id, desc); if (!texture_cache.IsMetaCleared(col_buf.CmaskAddress(), col_buf.view.slice_start)) { return; } for (u32 slice = col_buf.view.slice_start; slice <= col_buf.view.slice_max; ++slice) { texture_cache.TouchMeta(col_buf.CmaskAddress(), slice, false); } auto& image = texture_cache.GetImage(image_id); const auto clear_value = LiverpoolToVK::ColorBufferClearValue(col_buf); ScopeMarkerBegin(fmt::format("EliminateFastClear:MRT={:#x}:M={:#x}", col_buf.Address(), col_buf.CmaskAddress())); image.Clear(clear_value, desc.view_info.range); ScopeMarkerEnd(); } void Rasterizer::Draw(bool is_indexed, u32 index_offset) { RENDERER_TRACE; scheduler.PopPendingOperations(); if (!FilterDraw()) { return; } const auto& regs = liverpool->regs; const GraphicsPipeline* pipeline = pipeline_cache.GetGraphicsPipeline(); if (!pipeline) { return; } PrepareRenderState(pipeline); if (!BindResources(pipeline)) { return; } const auto state = BeginRendering(pipeline); buffer_cache.BindVertexBuffers(*pipeline); if (is_indexed) { buffer_cache.BindIndexBuffer(index_offset); } pipeline->BindResources(set_writes, buffer_barriers, push_data); UpdateDynamicState(pipeline, is_indexed); scheduler.BeginRendering(state); const auto& vs_info = pipeline->GetStage(Shader::LogicalStage::Vertex); const auto& fetch_shader = pipeline->GetFetchShader(); const auto [vertex_offset, instance_offset] = GetDrawOffsets(regs, vs_info, fetch_shader); const auto cmdbuf = scheduler.CommandBuffer(); cmdbuf.bindPipeline(vk::PipelineBindPoint::eGraphics, pipeline->Handle()); if (is_indexed) { cmdbuf.drawIndexed(regs.num_indices, regs.num_instances.NumInstances(), 0, s32(vertex_offset), instance_offset); } else { cmdbuf.draw(regs.num_indices, regs.num_instances.NumInstances(), vertex_offset, instance_offset); } ResetBindings(); } void Rasterizer::DrawIndirect(bool is_indexed, VAddr arg_address, u32 offset, u32 stride, u32 max_count, VAddr count_address) { RENDERER_TRACE; scheduler.PopPendingOperations(); if (!FilterDraw()) { return; } const GraphicsPipeline* pipeline = pipeline_cache.GetGraphicsPipeline(); if (!pipeline) { return; } PrepareRenderState(pipeline); if (!BindResources(pipeline)) { return; } const auto state = BeginRendering(pipeline); buffer_cache.BindVertexBuffers(*pipeline); if (is_indexed) { buffer_cache.BindIndexBuffer(0); } const auto& [buffer, base] = buffer_cache.ObtainBuffer(arg_address + offset, stride * max_count, false); VideoCore::Buffer* count_buffer{}; u32 count_base{}; if (count_address != 0) { std::tie(count_buffer, count_base) = buffer_cache.ObtainBuffer(count_address, 4, false); } pipeline->BindResources(set_writes, buffer_barriers, push_data); UpdateDynamicState(pipeline, is_indexed); scheduler.BeginRendering(state); // We can safely ignore both SGPR UD indices and results of fetch shader parsing, as vertex and // instance offsets will be automatically applied by Vulkan from indirect args buffer. const auto cmdbuf = scheduler.CommandBuffer(); cmdbuf.bindPipeline(vk::PipelineBindPoint::eGraphics, pipeline->Handle()); if (is_indexed) { ASSERT(sizeof(VkDrawIndexedIndirectCommand) == stride); if (count_address != 0) { cmdbuf.drawIndexedIndirectCount(buffer->Handle(), base, count_buffer->Handle(), count_base, max_count, stride); } else { cmdbuf.drawIndexedIndirect(buffer->Handle(), base, max_count, stride); } } else { ASSERT(sizeof(VkDrawIndirectCommand) == stride); if (count_address != 0) { cmdbuf.drawIndirectCount(buffer->Handle(), base, count_buffer->Handle(), count_base, max_count, stride); } else { cmdbuf.drawIndirect(buffer->Handle(), base, max_count, stride); } } ResetBindings(); } void Rasterizer::DispatchDirect() { RENDERER_TRACE; scheduler.PopPendingOperations(); const auto& cs_program = liverpool->GetCsRegs(); const ComputePipeline* pipeline = pipeline_cache.GetComputePipeline(); if (!pipeline) { return; } const auto& cs = pipeline->GetStage(Shader::LogicalStage::Compute); if (ExecuteShaderHLE(cs, liverpool->regs, cs_program, *this)) { return; } if (!BindResources(pipeline)) { return; } scheduler.EndRendering(); pipeline->BindResources(set_writes, buffer_barriers, push_data); const auto cmdbuf = scheduler.CommandBuffer(); cmdbuf.bindPipeline(vk::PipelineBindPoint::eCompute, pipeline->Handle()); cmdbuf.dispatch(cs_program.dim_x, cs_program.dim_y, cs_program.dim_z); // --- 新增：为特定计算着色器生成 mip 链（仅当不支持 LOD 写入时）--- const auto& cs_info = pipeline->GetStage(Shader::LogicalStage::Compute); if (!pipeline_cache.GetProfile().supports_image_load_store_lod && cs_info.pgm_hash == 0x8503bcb7) { GenerateMipChainForWrittenImages(cs_info); } ResetBindings(); } void Rasterizer::DispatchIndirect(VAddr address, u32 offset, u32 size) { RENDERER_TRACE; scheduler.PopPendingOperations(); const auto& cs_program = liverpool->GetCsRegs(); const ComputePipeline* pipeline = pipeline_cache.GetComputePipeline(); if (!pipeline) { return; } if (!BindResources(pipeline)) { return; } const auto [buffer, base] = buffer_cache.ObtainBuffer(address + offset, size, false); scheduler.EndRendering(); pipeline->BindResources(set_writes, buffer_barriers, push_data); const auto cmdbuf = scheduler.CommandBuffer(); cmdbuf.bindPipeline(vk::PipelineBindPoint::eCompute, pipeline->Handle()); cmdbuf.dispatchIndirect(buffer->Handle(), base); ResetBindings(); } u64 Rasterizer::Flush() { const u64 current_tick = scheduler.CurrentTick(); SubmitInfo info{}; scheduler.Flush(info); return current_tick; } void Rasterizer::Finish() { scheduler.Finish(); } void Rasterizer::OnSubmit() { if (fault_process_pending) { fault_process_pending = false; buffer_cache.ProcessFaultBuffer(); } texture_cache.ProcessDownloadImages(); texture_cache.RunGarbageCollector(); buffer_cache.RunGarbageCollector(); } bool Rasterizer::BindResources(const Pipeline* pipeline) { if (IsComputeImageCopy(pipeline) || IsComputeMetaClear(pipeline) || IsComputeImageClear(pipeline)) { return false; } set_writes.clear(); buffer_barriers.clear(); buffer_infos.clear(); image_infos.clear(); bool uses_dma = false; // Bind resource buffers and textures. Shader::Backend::Bindings binding{}; push_data = MakeUserData(liverpool->regs); for (const auto* stage : pipeline->GetStages()) { if (!stage) { continue; } stage->PushUd(binding, push_data); BindBuffers(*stage, binding, push_data); BindTextures(*stage, binding); uses_dma |= stage->uses_dma; } if (uses_dma) { // We only use fault buffer for DMA right now. Common::RecursiveSharedLock lock{mapped_ranges_mutex}; for (auto& range : mapped_ranges) { buffer_cache.SynchronizeBuffersInRange(range.lower(), range.upper() - range.lower()); } fault_process_pending = true; } return true; } bool Rasterizer::IsComputeMetaClear(const Pipeline* pipeline) { if (!pipeline->IsCompute()) { return false; } // Most of the time when a metadata is updated with a shader it gets cleared. It means // we can skip the whole dispatch and update the tracked state instead. Also, it is not // intended to be consumed and in such rare cases (e.g. HTile introspection, CRAA) we // will need its full emulation anyways. const auto& info = pipeline->GetStage(Shader::LogicalStage::Compute); // Assume if a shader reads metadata, it is a copy shader. for (const auto& desc : info.buffers) { const VAddr address = desc.GetSharp(info).base_address; if (!desc.IsSpecial() && !desc.is_written && texture_cache.IsMeta(address)) { return false; } } // Metadata surfaces are tiled and thus need address calculation to be written properly. // If a shader wants to encode HTILE, for example, from a depth image it will have to compute // proper tile address from dispatch invocation id. This address calculation contains an xor // operation so use it as a heuristic for metadata writes that are probably not clears. if (!info.has_bitwise_xor) { // Assume if a shader writes metadata without address calculation, it is a clear shader. for (const auto& desc : info.buffers) { const VAddr address = desc.GetSharp(info).base_address; if (!desc.IsSpecial() && desc.is_written && texture_cache.ClearMeta(address)) { // Assume all slices were updates LOG_TRACE(Render_Vulkan, "Metadata update skipped"); return true; } } } return false; } bool Rasterizer::IsComputeImageCopy(const Pipeline* pipeline) { if (!pipeline->IsCompute()) { return false; } // Ensure shader only has 2 bound buffers const auto& cs_pgm = liverpool->GetCsRegs(); const auto& info = pipeline->GetStage(Shader::LogicalStage::Compute); if (cs_pgm.num_thread_x.full != 64 || info.buffers.size() != 2 || !info.images.empty()) { return false; } // Those 2 buffers must both be formatted. One must be source and another destination. const auto& desc0 = info.buffers[0]; const auto& desc1 = info.buffers[1]; if (!desc0.is_formatted || !desc1.is_formatted || desc0.is_written == desc1.is_written) { return false; } // Buffers must have the same size and each thread of the dispatch must copy 1 dword of data const AmdGpu::Buffer buf0 = desc0.GetSharp(info); const AmdGpu::Buffer buf1 = desc1.GetSharp(info); if (buf0.GetSize() != buf1.GetSize() || cs_pgm.dim_x != (buf0.GetSize() / 256)) { return false; } // Find images the buffer alias const auto image0_id = texture_cache.FindImageFromRange(buf0.base_address, buf0.GetSize()); if (!image0_id) { return false; } const auto image1_id = texture_cache.FindImageFromRange(buf1.base_address, buf1.GetSize(), false); if (!image1_id) { return false; } // Image copy must be valid VideoCore::Image& image0 = texture_cache.GetImage(image0_id); VideoCore::Image& image1 = texture_cache.GetImage(image1_id); if (image0.info.guest_size != image1.info.guest_size || image0.info.pitch != image1.info.pitch || image0.info.guest_size != buf0.GetSize() || image0.info.num_bits != image1.info.num_bits) { return false; } // Perform image copy VideoCore::Image& src_image = desc0.is_written ? image1 : image0; VideoCore::Image& dst_image = desc0.is_written ? image0 : image1; if (instance.IsMaintenance8Supported() || src_image.info.props.is_depth == dst_image.info.props.is_depth) { dst_image.CopyImage(src_image); } else { const auto& copy_buffer = buffer_cache.GetUtilityBuffer(VideoCore::MemoryUsage::DeviceLocal); dst_image.CopyImageWithBuffer(src_image, copy_buffer.Handle(), 0); } dst_image.flags |= VideoCore::ImageFlagBits::GpuModified; dst_image.flags &= ~VideoCore::ImageFlagBits::Dirty; return true; } bool Rasterizer::IsComputeImageClear(const Pipeline* pipeline) { if (!pipeline->IsCompute()) { return false; } // Ensure shader only has 2 bound buffers const auto& cs_pgm = liverpool->GetCsRegs(); const auto& info = pipeline->GetStage(Shader::LogicalStage::Compute); if (cs_pgm.num_thread_x.full != 64 || info.buffers.size() != 2 || !info.images.empty()) { return false; } // From those 2 buffers, first must hold the clear vector and second the image being cleared const auto& desc0 = info.buffers[0]; const auto& desc1 = info.buffers[1]; if (desc0.is_formatted || !desc1.is_formatted || desc0.is_written || !desc1.is_written) { return false; } // First buffer must have size of vec4 and second the size of a single layer const AmdGpu::Buffer buf0 = desc0.GetSharp(info); const AmdGpu::Buffer buf1 = desc1.GetSharp(info); const u32 buf1_bpp = AmdGpu::NumBitsPerBlock(buf1.GetDataFmt()); if (buf0.GetSize() != 16 || (cs_pgm.dim_x * 128ULL * (buf1_bpp / 8)) != buf1.GetSize()) { return false; } // Find image the buffer alias const auto image1_id = texture_cache.FindImageFromRange(buf1.base_address, buf1.GetSize(), false); if (!image1_id) { return false; } // Image clear must be valid VideoCore::Image& image1 = texture_cache.GetImage(image1_id); if (image1.info.guest_size != buf1.GetSize() || image1.info.num_bits != buf1_bpp || image1.info.props.is_depth) { return false; } // Perform image clear const float* values = reinterpret_cast<float*>(buf0.base_address); const vk::ClearValue clear = { .color = {.float32 = std::array<float, 4>{values[0], values[1], values[2], values[3]}}, }; const VideoCore::SubresourceRange range = { .base = { .level = 0, .layer = 0, }, .extent = image1.info.resources, }; image1.Clear(clear, range); image1.flags |= VideoCore::ImageFlagBits::GpuModified; image1.flags &= ~VideoCore::ImageFlagBits::Dirty; return true; } void Rasterizer::BindBuffers(const Shader::Info& stage, Shader::Backend::Bindings& binding, Shader::PushData& push_data) { buffer_bindings.clear(); for (const auto& desc : stage.buffers) { const auto vsharp = desc.GetSharp(stage); if (!desc.IsSpecial() && vsharp.base_address != 0 && vsharp.GetSize() > 0) { const u64 size = memory->ClampRangeSize(vsharp.base_address, vsharp.GetSize()); const auto buffer_id = buffer_cache.FindBuffer(vsharp.base_address, size); buffer_bindings.emplace_back(buffer_id, vsharp, size); } else { buffer_bindings.emplace_back(VideoCore::BufferId{}, vsharp, 0); } } // Second pass to re-bind buffers that were updated after binding for (u32 i = 0; i < buffer_bindings.size(); i++) { const auto& [buffer_id, vsharp, size] = buffer_bindings[i]; const auto& desc = stage.buffers[i]; const bool is_storage = desc.IsStorage(vsharp); const u32 alignment = is_storage ? instance.StorageMinAlignment() : instance.UniformMinAlignment(); // Buffer is not from the cache, either a special buffer or unbound. if (!buffer_id) { if (desc.buffer_type == Shader::BufferType::GdsBuffer) { const auto* gds_buf = buffer_cache.GetGdsBuffer(); buffer_infos.emplace_back(gds_buf->Handle(), 0, gds_buf->SizeBytes()); } else if (desc.buffer_type == Shader::BufferType::Flatbuf) { auto& vk_buffer = buffer_cache.GetUtilityBuffer(VideoCore::MemoryUsage::Stream); const u32 ubo_size = stage.flattened_ud_buf.size() * sizeof(u32); const u64 offset = vk_buffer.Copy(stage.flattened_ud_buf.data(), ubo_size, alignment); buffer_infos.emplace_back(vk_buffer.Handle(), offset, ubo_size); } else if (desc.buffer_type == Shader::BufferType::BdaPagetable) { const auto* bda_buffer = buffer_cache.GetBdaPageTableBuffer(); buffer_infos.emplace_back(bda_buffer->Handle(), 0, bda_buffer->SizeBytes()); } else if (desc.buffer_type == Shader::BufferType::FaultBuffer) { const auto* fault_buffer = buffer_cache.GetFaultBuffer(); buffer_infos.emplace_back(fault_buffer->Handle(), 0, fault_buffer->SizeBytes()); } else if (desc.buffer_type == Shader::BufferType::SharedMemory) { auto& lds_buffer = buffer_cache.GetUtilityBuffer(VideoCore::MemoryUsage::Stream); const auto& cs_program = liverpool->GetCsRegs(); const auto lds_size = cs_program.SharedMemSize() * cs_program.NumWorkgroups(); const auto [data, offset] = lds_buffer.Map(lds_size, alignment); std::memset(data, 0, lds_size); buffer_infos.emplace_back(lds_buffer.Handle(), offset, lds_size); } else if (instance.IsNullDescriptorSupported()) { buffer_infos.emplace_back(VK_NULL_HANDLE, 0, VK_WHOLE_SIZE); } else { auto& null_buffer = buffer_cache.GetBuffer(VideoCore::NULL_BUFFER_ID); buffer_infos.emplace_back(null_buffer.Handle(), 0, VK_WHOLE_SIZE); } } else { const auto [vk_buffer, offset] = buffer_cache.ObtainBuffer( vsharp.base_address, size, desc.is_written, desc.is_formatted, buffer_id); const u32 offset_aligned = Common::AlignDown(offset, alignment); const u32 adjust = offset - offset_aligned; ASSERT(adjust % 4 == 0); push_data.AddOffset(binding.buffer, adjust); buffer_infos.emplace_back(vk_buffer->Handle(), offset_aligned, size + adjust); if (auto barrier = vk_buffer->GetBarrier(desc.is_written ? vk::AccessFlagBits2::eShaderWrite : vk::AccessFlagBits2::eShaderRead, vk::PipelineStageFlagBits2::eAllCommands)) { buffer_barriers.emplace_back(*barrier); } if (desc.is_written && desc.is_formatted) { texture_cache.InvalidateMemoryFromGPU(vsharp.base_address, size); } } set_writes.push_back({ .dstSet = VK_NULL_HANDLE, .dstBinding = binding.unified++, .dstArrayElement = 0, .descriptorCount = 1, .descriptorType = is_storage ? vk::DescriptorType::eStorageBuffer : vk::DescriptorType::eUniformBuffer, .pBufferInfo = &buffer_infos.back(), }); ++binding.buffer; } } void Rasterizer::BindTextures(const Shader::Info& stage, Shader::Backend::Bindings& binding) { image_bindings.clear(); for (const auto& image_desc : stage.images) { const auto tsharp = image_desc.GetSharp(stage); if (texture_cache.IsMeta(tsharp.Address())) { LOG_WARNING(Render_Vulkan, "Unexpected metadata read by a shader (texture)"); } if (tsharp.GetDataFmt() == AmdGpu::DataFormat::FormatInvalid) { image_bindings.emplace_back(std::piecewise_construct, std::tuple{}, std::tuple{}); continue; } auto& [image_id, desc] = image_bindings.emplace_back(std::piecewise_construct, std::tuple{}, std::tuple{tsharp, image_desc}); image_id = texture_cache.FindImage(desc); auto* image = &texture_cache.GetImage(image_id); if (image->depth_id) { // If this image has an associated depth image, it's a stencil attachment. // Redirect the access to the actual depth-stencil buffer. image_id = image->depth_id; image = &texture_cache.GetImage(image_id); } if (image->binding.is_bound) { // The image is already bound. In case if it is about to be used as storage we need // to force general layout on it. image->binding.force_general |= image_desc.is_written; } image->binding.is_bound = 1u; } // Second pass to re-bind images that were updated after binding for (auto& [image_id, desc] : image_bindings) { bool is_storage = desc.type == VideoCore::TextureCache::BindingType::Storage; if (!image_id) { if (instance.IsNullDescriptorSupported()) { image_infos.emplace_back(VK_NULL_HANDLE, VK_NULL_HANDLE, vk::ImageLayout::eGeneral); } else { auto& null_image_view = texture_cache.FindTexture(VideoCore::NULL_IMAGE_ID, desc); image_infos.emplace_back(VK_NULL_HANDLE, *null_image_view.image_view, vk::ImageLayout::eGeneral); } } else { if (auto& old_image = texture_cache.GetImage(image_id); old_image.binding.needs_rebind) { old_image.binding = {}; image_id = texture_cache.FindImage(desc); } bound_images.emplace_back(image_id); auto& image = texture_cache.GetImage(image_id); auto& image_view = texture_cache.FindTexture(image_id, desc); // The image is either bound as storage in a separate descriptor or bound as render // target in feedback loop. Depth images are excluded because they can't be bound as // storage and feedback loop doesn't make sense for them if ((image.binding.force_general || image.binding.is_target) && !image.info.props.is_depth) { image.Transit(instance.IsAttachmentFeedbackLoopLayoutSupported() && image.binding.is_target ? vk::ImageLayout::eAttachmentFeedbackLoopOptimalEXT : vk::ImageLayout::eGeneral, vk::AccessFlagBits2::eShaderRead | (image.info.props.is_depth ? vk::AccessFlagBits2::eDepthStencilAttachmentWrite : vk::AccessFlagBits2::eColorAttachmentWrite), {}); } else { if (is_storage) { image.Transit(vk::ImageLayout::eGeneral, vk::AccessFlagBits2::eShaderRead | vk::AccessFlagBits2::eShaderWrite, desc.view_info.range); } else { const auto new_layout = image.info.props.is_depth ? vk::ImageLayout::eDepthStencilReadOnlyOptimal : vk::ImageLayout::eShaderReadOnlyOptimal; image.Transit(new_layout, vk::AccessFlagBits2::eShaderRead, desc.view_info.range); } } image.usage.storage |= is_storage; image.usage.texture |= !is_storage; image_infos.emplace_back(VK_NULL_HANDLE, *image_view.image_view, image.backing->state.layout); } set_writes.push_back({ .dstSet = VK_NULL_HANDLE, .dstBinding = binding.unified++, .dstArrayElement = 0, .descriptorCount = 1, .descriptorType = is_storage ? vk::DescriptorType::eStorageImage : vk::DescriptorType::eSampledImage, .pImageInfo = &image_infos.back(), }); } for (const auto& sampler : stage.samplers) { auto ssharp = sampler.GetSharp(stage); if (sampler.disable_aniso) { const auto& tsharp = stage.images[sampler.associated_image].GetSharp(stage); if (tsharp.base_level == 0 && tsharp.last_level == 0) { ssharp.max_aniso.Assign(AmdGpu::AnisoRatio::One); } } const auto vk_sampler = texture_cache.GetSampler(ssharp, liverpool->regs.ta_bc_base); image_infos.emplace_back(vk_sampler, VK_NULL_HANDLE, vk::ImageLayout::eGeneral); set_writes.push_back({ .dstSet = VK_NULL_HANDLE, .dstBinding = binding.unified++, .dstArrayElement = 0, .descriptorCount = 1, .descriptorType = vk::DescriptorType::eSampler, .pImageInfo = &image_infos.back(), }); } } RenderState Rasterizer::BeginRendering(const GraphicsPipeline* pipeline) { attachment_feedback_loop = false; const auto& regs = liverpool->regs; const auto& key = pipeline->GetGraphicsKey(); RenderState state; state.width = instance.GetMaxFramebufferWidth(); state.height = instance.GetMaxFramebufferHeight(); state.num_layers = std::numeric_limits<u32>::max(); state.num_color_attachments = std::bit_width(key.mrt_mask); for (auto cb = 0u; cb < state.num_color_attachments; ++cb) { auto& [image_id, desc] = cb_descs[cb]; if (!image_id) { continue; } auto* image = &texture_cache.GetImage(image_id); if (image->binding.needs_rebind) { image_id = bound_images.emplace_back(texture_cache.FindImage(desc)); image = &texture_cache.GetImage(image_id); } texture_cache.UpdateImage(image_id); image->SetBackingSamples(key.color_samples[cb]); const auto& image_view = texture_cache.FindRenderTarget(image_id, desc); const auto slice = image_view.info.range.base.layer; const auto mip = image_view.info.range.base.level; const auto& col_buf = regs.color_buffers[cb]; const bool is_clear = texture_cache.IsMetaCleared(col_buf.CmaskAddress(), slice); texture_cache.TouchMeta(col_buf.CmaskAddress(), slice, false); if (image->binding.is_bound) { ASSERT_MSG(!image->binding.force_general, "Having image both as storage and render target is unsupported"); image->Transit(instance.IsAttachmentFeedbackLoopLayoutSupported() ? vk::ImageLayout::eAttachmentFeedbackLoopOptimalEXT : vk::ImageLayout::eGeneral, vk::AccessFlagBits2::eColorAttachmentWrite, {}); attachment_feedback_loop = true; } else { image->Transit(vk::ImageLayout::eColorAttachmentOptimal, vk::AccessFlagBits2::eColorAttachmentWrite | vk::AccessFlagBits2::eColorAttachmentRead, desc.view_info.range); } state.width = std::min<u32>(state.width, std::max(image->info.size.width >> mip, 1u)); state.height = std::min<u32>(state.height, std::max(image->info.size.height >> mip, 1u)); state.num_layers = std::min<u32>(state.num_layers, image_view.info.range.extent.layers); state.color_attachments[cb] = { .imageView = *image_view.image_view, .imageLayout = image->backing->state.layout, .loadOp = is_clear ? vk::AttachmentLoadOp::eClear : vk::AttachmentLoadOp::eLoad, .storeOp = vk::AttachmentStoreOp::eStore, .clearValue = is_clear ? LiverpoolToVK::ColorBufferClearValue(col_buf) : vk::ClearValue{}, }; image->usage.render_target = 1u; } if (auto image_id = db_desc.first; image_id) { auto& desc = db_desc.second; const auto htile_address = regs.depth_htile_data_base.GetAddress(); const auto& image_view = texture_cache.FindDepthTarget(image_id, desc); auto& image = texture_cache.GetImage(image_id); const auto slice = image_view.info.range.base.layer; const bool is_depth_clear = regs.depth_render_control.depth_clear_enable || texture_cache.IsMetaCleared(htile_address, slice); const bool is_stencil_clear = regs.depth_render_control.stencil_clear_enable; texture_cache.TouchMeta(htile_address, slice, false); ASSERT(desc.view_info.range.extent.levels == 1 && !image.binding.needs_rebind); const bool has_stencil = image.info.props.has_stencil; const auto new_layout = desc.view_info.is_storage ? has_stencil ? vk::ImageLayout::eDepthStencilAttachmentOptimal : vk::ImageLayout::eDepthAttachmentOptimal : has_stencil ? vk::ImageLayout::eDepthStencilReadOnlyOptimal : vk::ImageLayout::eDepthReadOnlyOptimal; image.Transit(new_layout, vk::AccessFlagBits2::eDepthStencilAttachmentWrite | vk::AccessFlagBits2::eDepthStencilAttachmentRead, desc.view_info.range); state.width = std::min<u32>(state.width, image.info.size.width); state.height = std::min<u32>(state.height, image.info.size.height); state.has_depth = regs.depth_buffer.DepthValid(); state.has_stencil = regs.depth_buffer.StencilValid(); state.num_layers = std::min<u32>(state.num_layers, image_view.info.range.extent.layers); if (state.has_depth) { state.depth_attachment = { .imageView = *image_view.image_view, .imageLayout = image.backing->state.layout, .loadOp = is_depth_clear ? vk::AttachmentLoadOp::eClear : vk::AttachmentLoadOp::eLoad, .storeOp = vk::AttachmentStoreOp::eStore, .clearValue = vk::ClearValue{.depthStencil = {.depth = regs.depth_clear}}, }; } if (state.has_stencil) { state.stencil_attachment = { .imageView = *image_view.image_view, .imageLayout = image.backing->state.layout, .loadOp = is_stencil_clear ? vk::AttachmentLoadOp::eClear : vk::AttachmentLoadOp::eLoad, .storeOp = vk::AttachmentStoreOp::eStore, .clearValue = vk::ClearValue{.depthStencil = {.stencil = regs.stencil_clear}}, }; } image.usage.depth_target = true; } if (state.num_layers == std::numeric_limits<u32>::max()) { state.num_layers = 1; } return state; } void Rasterizer::Resolve() { const auto& mrt0_hint = liverpool->last_cb_extent[0]; const auto& mrt1_hint = liverpool->last_cb_extent[1]; VideoCore::TextureCache::ImageDesc mrt0_desc{liverpool->regs.color_buffers[0], mrt0_hint}; VideoCore::TextureCache::ImageDesc mrt1_desc{liverpool->regs.color_buffers[1], mrt1_hint}; auto& mrt0_image = texture_cache.GetImage(texture_cache.FindImage(mrt0_desc, true)); auto& mrt1_image = texture_cache.GetImage(texture_cache.FindImage(mrt1_desc, true)); ScopeMarkerBegin(fmt::format("Resolve:MRT0={:#x}:MRT1={:#x}", liverpool->regs.color_buffers[0].Address(), liverpool->regs.color_buffers[1].Address())); mrt1_image.Resolve(mrt0_image, mrt0_desc.view_info.range, mrt1_desc.view_info.range); ScopeMarkerEnd(); } void Rasterizer::DepthStencilCopy(bool is_depth, bool is_stencil) { auto& regs = liverpool->regs; auto read_desc = VideoCore::TextureCache::ImageDesc( regs.depth_buffer, regs.depth_view, regs.depth_control, regs.depth_htile_data_base.GetAddress(), liverpool->last_db_extent, false); auto write_desc = VideoCore::TextureCache::ImageDesc( regs.depth_buffer, regs.depth_view, regs.depth_control, regs.depth_htile_data_base.GetAddress(), liverpool->last_db_extent, true); auto& read_image = texture_cache.GetImage(texture_cache.FindImage(read_desc)); auto& write_image = texture_cache.GetImage(texture_cache.FindImage(write_desc)); VideoCore::SubresourceRange sub_range; sub_range.base.layer = liverpool->regs.depth_view.slice_start; sub_range.extent.layers = liverpool->regs.depth_view.NumSlices() - sub_range.base.layer; ScopeMarkerBegin(fmt::format( "DepthStencilCopy:DR={:#x}:SR={:#x}:DW={:#x}:SW={:#x}", regs.depth_buffer.DepthAddress(), regs.depth_buffer.StencilAddress(), regs.depth_buffer.DepthWriteAddress(), regs.depth_buffer.StencilWriteAddress())); read_image.Transit(vk::ImageLayout::eTransferSrcOptimal, vk::AccessFlagBits2::eTransferRead, sub_range); write_image.Transit(vk::ImageLayout::eTransferDstOptimal, vk::AccessFlagBits2::eTransferWrite, sub_range); auto aspect_mask = vk::ImageAspectFlags(0); if (is_depth) { aspect_mask |= vk::ImageAspectFlagBits::eDepth; } if (is_stencil) { aspect_mask |= vk::ImageAspectFlagBits::eStencil; } vk::ImageCopy region = { .srcSubresource = { .aspectMask = aspect_mask, .mipLevel = 0, .baseArrayLayer = sub_range.base.layer, .layerCount = sub_range.extent.layers, }, .srcOffset = {0, 0, 0}, .dstSubresource = { .aspectMask = aspect_mask, .mipLevel = 0, .baseArrayLayer = sub_range.base.layer, .layerCount = sub_range.extent.layers, }, .dstOffset = {0, 0, 0}, .extent = {write_image.info.size.width, write_image.info.size.height, 1}, }; scheduler.CommandBuffer().copyImage(read_image.GetImage(), vk::ImageLayout::eTransferSrcOptimal, write_image.GetImage(), vk::ImageLayout::eTransferDstOptimal, region); ScopeMarkerEnd(); } void Rasterizer::FillBuffer(VAddr address, u32 num_bytes, u32 value, bool is_gds) { buffer_cache.FillBuffer(address, num_bytes, value, is_gds); } void Rasterizer::CopyBuffer(VAddr dst, VAddr src, u32 num_bytes, bool dst_gds, bool src_gds) { buffer_cache.CopyBuffer(dst, src, num_bytes, dst_gds, src_gds); } u32 Rasterizer::ReadDataFromGds(u32 gds_offset) { auto* gds_buf = buffer_cache.GetGdsBuffer(); u32 value; std::memcpy(&value, gds_buf->mapped_data.data() + gds_offset, sizeof(u32)); return value; } bool Rasterizer::InvalidateMemory(VAddr addr, u64 size) { if (!IsMapped(addr, size)) { // Not GPU mapped memory, can skip invalidation logic entirely. return false; } buffer_cache.InvalidateMemory(addr, size); texture_cache.InvalidateMemory(addr, size); return true; } bool Rasterizer::ReadMemory(VAddr addr, u64 size) { if (!IsMapped(addr, size)) { // Not GPU mapped memory, can skip invalidation logic entirely. return false; } buffer_cache.ReadMemory(addr, size); return true; } bool Rasterizer::IsMapped(VAddr addr, u64 size) { if (size == 0) { // There is no memory, so not mapped. return false; } if (static_cast<u64>(addr) > std::numeric_limits<u64>::max() - size) { // Memory range wrapped the address space, cannot be mapped. return false; } const auto range = decltype(mapped_ranges)::interval_type::right_open(addr, addr + size); Common::RecursiveSharedLock lock{mapped_ranges_mutex}; return boost::icl::contains(mapped_ranges, range); } void Rasterizer::MapMemory(VAddr addr, u64 size) { { std::scoped_lock lock{mapped_ranges_mutex}; mapped_ranges += decltype(mapped_ranges)::interval_type::right_open(addr, addr + size); } page_manager.OnGpuMap(addr, size); } void Rasterizer::UnmapMemory(VAddr addr, u64 size) { buffer_cache.InvalidateMemory(addr, size); texture_cache.UnmapMemory(addr, size); page_manager.OnGpuUnmap(addr, size); { std::scoped_lock lock{mapped_ranges_mutex}; mapped_ranges -= decltype(mapped_ranges)::interval_type::right_open(addr, addr + size); } } void Rasterizer::UpdateDynamicState(const GraphicsPipeline* pipeline, const bool is_indexed) const { UpdateViewportScissorState(); UpdateDepthStencilState(); UpdatePrimitiveState(is_indexed); UpdateRasterizationState(); UpdateColorBlendingState(pipeline); auto& dynamic_state = scheduler.GetDynamicState(); dynamic_state.Commit(instance, scheduler.CommandBuffer()); } void Rasterizer::UpdateViewportScissorState() const { const auto& regs = liverpool->regs; const auto combined_scissor_value_tl = [](s16 scr, s16 win, s16 gen, s16 win_offset) { return std::max({scr, s16(win + win_offset), s16(gen + win_offset)}); }; const auto combined_scissor_value_br = [](s16 scr, s16 win, s16 gen, s16 win_offset) { return std::min({scr, s16(win + win_offset), s16(gen + win_offset)}); }; const bool enable_offset = !regs.window_scissor.window_offset_disable; AmdGpu::Scissor scsr{}; scsr.top_left_x = combined_scissor_value_tl( regs.screen_scissor.top_left_x, s16(regs.window_scissor.top_left_x), s16(regs.generic_scissor.top_left_x), enable_offset ? regs.window_offset.window_x_offset : 0); scsr.top_left_y = combined_scissor_value_tl( regs.screen_scissor.top_left_y, s16(regs.window_scissor.top_left_y), s16(regs.generic_scissor.top_left_y), enable_offset ? regs.window_offset.window_y_offset : 0); scsr.bottom_right_x = combined_scissor_value_br( regs.screen_scissor.bottom_right_x, regs.window_scissor.bottom_right_x, regs.generic_scissor.bottom_right_x, enable_offset ? regs.window_offset.window_x_offset : 0); scsr.bottom_right_y = combined_scissor_value_br( regs.screen_scissor.bottom_right_y, regs.window_scissor.bottom_right_y, regs.generic_scissor.bottom_right_y, enable_offset ? regs.window_offset.window_y_offset : 0); boost::container::static_vector<vk::Viewport, AmdGpu::NUM_VIEWPORTS> viewports; boost::container::static_vector<vk::Rect2D, AmdGpu::NUM_VIEWPORTS> scissors; if (regs.polygon_control.enable_window_offset && (regs.window_offset.window_x_offset != 0 || regs.window_offset.window_y_offset != 0)) { LOG_ERROR(Render_Vulkan, "PA_SU_SC_MODE_CNTL.VTX_WINDOW_OFFSET_ENABLE support is not yet implemented."); } const auto& vp_ctl = regs.viewport_control; for (u32 i = 0; i < AmdGpu::NUM_VIEWPORTS; i++) { const auto& vp = regs.viewports[i]; const auto& vp_d = regs.viewport_depths[i]; if (vp.xscale == 0) { continue; } const auto zoffset = vp_ctl.zoffset_enable ? vp.zoffset : 0.f; const auto zscale = vp_ctl.zscale_enable ? vp.zscale : 1.f; vk::Viewport viewport{}; // https://gitlab.freedesktop.org/mesa/mesa/-/blob/209a0ed/src/amd/vulkan/radv_pipeline_graphics.c#L688-689 // https://gitlab.freedesktop.org/mesa/mesa/-/blob/209a0ed/src/amd/vulkan/radv_cmd_buffer.c#L3103-3109 // When the clip space is ranged [-1...1], the zoffset is centered. // By reversing the above viewport calculations, we get the following: if (regs.clipper_control.clip_space == AmdGpu::ClipSpace::MinusWToW) { viewport.minDepth = zoffset - zscale; viewport.maxDepth = zoffset + zscale; } else { viewport.minDepth = zoffset; viewport.maxDepth = zoffset + zscale; } if (!instance.IsDepthRangeUnrestrictedSupported()) { // Unrestricted depth range not supported by device. Restrict to valid range. viewport.minDepth = std::max(viewport.minDepth, 0.f); viewport.maxDepth = std::min(viewport.maxDepth, 1.f); } if (regs.IsClipDisabled()) { // In case if clipping is disabled we patch the shader to convert vertex position // from screen space coordinates to NDC by defining a render space as full hardware // window range [0..16383, 0..16383] and setting the viewport to its size. viewport.x = 0.f; viewport.y = 0.f; viewport.width = float(std::min<u32>(instance.GetMaxViewportWidth(), 16_KB)); viewport.height = float(std::min<u32>(instance.GetMaxViewportHeight(), 16_KB)); } else { const auto xoffset = vp_ctl.xoffset_enable ? vp.xoffset : 0.f; const auto xscale = vp_ctl.xscale_enable ? vp.xscale : 1.f; const auto yoffset = vp_ctl.yoffset_enable ? vp.yoffset : 0.f; const auto yscale = vp_ctl.yscale_enable ? vp.yscale : 1.f; viewport.x = xoffset - xscale; viewport.y = yoffset - yscale; viewport.width = xscale * 2.0f; viewport.height = yscale * 2.0f; } viewports.push_back(viewport); auto vp_scsr = scsr; if (regs.mode_control.vport_scissor_enable) { vp_scsr.top_left_x = std::max(vp_scsr.top_left_x, s16(regs.viewport_scissors[i].top_left_x)); vp_scsr.top_left_y = std::max(vp_scsr.top_left_y, s16(regs.viewport_scissors[i].top_left_y)); vp_scsr.bottom_right_x = std::min(AmdGpu::Scissor::Clamp(vp_scsr.bottom_right_x), regs.viewport_scissors[i].bottom_right_x); vp_scsr.bottom_right_y = std::min(AmdGpu::Scissor::Clamp(vp_scsr.bottom_right_y), regs.viewport_scissors[i].bottom_right_y); } scissors.push_back({ .offset = {vp_scsr.top_left_x, vp_scsr.top_left_y}, .extent = {vp_scsr.GetWidth(), vp_scsr.GetHeight()}, }); } if (viewports.empty()) { // Vulkan requires providing at least one viewport. constexpr vk::Viewport empty_viewport = { .x = -1.0f, .y = -1.0f, .width = 1.0f, .height = 1.0f, .minDepth = 0.0f, .maxDepth = 1.0f, }; constexpr vk::Rect2D empty_scissor = { .offset = {0, 0}, .extent = {1, 1}, }; viewports.push_back(empty_viewport); scissors.push_back(empty_scissor); } auto& dynamic_state = scheduler.GetDynamicState(); dynamic_state.SetViewports(viewports); dynamic_state.SetScissors(scissors); } void Rasterizer::UpdateDepthStencilState() const { const auto& regs = liverpool->regs; auto& dynamic_state = scheduler.GetDynamicState(); const auto depth_test_enabled = regs.depth_control.depth_enable && regs.depth_buffer.DepthValid(); dynamic_state.SetDepthTestEnabled(depth_test_enabled); if (depth_test_enabled) { dynamic_state.SetDepthWriteEnabled(regs.depth_control.depth_write_enable && !regs.depth_render_control.depth_clear_enable); dynamic_state.SetDepthCompareOp(LiverpoolToVK::CompareOp(regs.depth_control.depth_func)); } const auto depth_bounds_test_enabled = regs.depth_control.depth_bounds_enable; dynamic_state.SetDepthBoundsTestEnabled(depth_bounds_test_enabled); if (depth_bounds_test_enabled) { dynamic_state.SetDepthBounds(regs.depth_bounds_min, regs.depth_bounds_max); } const auto depth_bias_enabled = regs.polygon_control.NeedsBias(); dynamic_state.SetDepthBiasEnabled(depth_bias_enabled); if (depth_bias_enabled) { const bool front = regs.polygon_control.enable_polygon_offset_front; dynamic_state.SetDepthBias( front ? regs.poly_offset.front_offset : regs.poly_offset.back_offset, regs.poly_offset.depth_bias, (front ? regs.poly_offset.front_scale : regs.poly_offset.back_scale) / 16.f); } const auto stencil_test_enabled = regs.depth_control.stencil_enable && regs.depth_buffer.StencilValid(); dynamic_state.SetStencilTestEnabled(stencil_test_enabled); if (stencil_test_enabled) { const StencilOps front_ops{ .fail_op = LiverpoolToVK::StencilOp(regs.stencil_control.stencil_fail_front), .pass_op = LiverpoolToVK::StencilOp(regs.stencil_control.stencil_zpass_front), .depth_fail_op = LiverpoolToVK::StencilOp(regs.stencil_control.stencil_zfail_front), .compare_op = LiverpoolToVK::CompareOp(regs.depth_control.stencil_ref_func), }; const StencilOps back_ops = regs.depth_control.backface_enable ? StencilOps{ .fail_op = LiverpoolToVK::StencilOp(regs.stencil_control.stencil_fail_back), .pass_op = LiverpoolToVK::StencilOp(regs.stencil_control.stencil_zpass_back), .depth_fail_op = LiverpoolToVK::StencilOp(regs.stencil_control.stencil_zfail_back), .compare_op = LiverpoolToVK::CompareOp(regs.depth_control.stencil_bf_func), } : front_ops; dynamic_state.SetStencilOps(front_ops, back_ops); const bool stencil_clear = regs.depth_render_control.stencil_clear_enable; const auto front = regs.stencil_ref_front; const auto back = regs.depth_control.backface_enable ? regs.stencil_ref_back : regs.stencil_ref_front; dynamic_state.SetStencilReferences(front.stencil_test_val, back.stencil_test_val); dynamic_state.SetStencilWriteMasks(!stencil_clear ? front.stencil_write_mask : 0U, !stencil_clear ? back.stencil_write_mask : 0U); dynamic_state.SetStencilCompareMasks(front.stencil_mask, back.stencil_mask); } } void Rasterizer::UpdatePrimitiveState(const bool is_indexed) const { const auto& regs = liverpool->regs; auto& dynamic_state = scheduler.GetDynamicState(); const auto prim_restart = (regs.enable_primitive_restart & 1) != 0; ASSERT_MSG(!is_indexed || !prim_restart || regs.primitive_restart_index == 0xFFFF || regs.primitive_restart_index == 0xFFFFFFFF, "Primitive restart index other than -1 is not supported yet"); const auto cull_mode = LiverpoolToVK::IsPrimitiveCulled(regs.primitive_type) ? LiverpoolToVK::CullMode(regs.polygon_control.CullingMode()) : vk::CullModeFlagBits::eNone; const auto front_face = LiverpoolToVK::FrontFace(regs.polygon_control.front_face); dynamic_state.SetPrimitiveRestartEnabled(prim_restart); dynamic_state.SetRasterizerDiscardEnabled(regs.clipper_control.dx_rasterization_kill); dynamic_state.SetCullMode(cull_mode); dynamic_state.SetFrontFace(front_face); } void Rasterizer::UpdateRasterizationState() const { const auto& regs = liverpool->regs; auto& dynamic_state = scheduler.GetDynamicState(); dynamic_state.SetLineWidth(regs.line_control.Width()); } void Rasterizer::UpdateColorBlendingState(const GraphicsPipeline* pipeline) const { const auto& regs = liverpool->regs; auto& dynamic_state = scheduler.GetDynamicState(); dynamic_state.SetBlendConstants(regs.blend_constants); dynamic_state.SetColorWriteMasks(pipeline->GetGraphicsKey().write_masks); dynamic_state.SetAttachmentFeedbackLoopEnabled(attachment_feedback_loop); } void Rasterizer::ScopeMarkerBegin(const std::string_view& str, bool from_guest) { if ((from_guest && !Config::getVkGuestMarkersEnabled()) || (!from_guest && !Config::getVkHostMarkersEnabled())) { return; } const auto cmdbuf = scheduler.CommandBuffer(); cmdbuf.beginDebugUtilsLabelEXT(vk::DebugUtilsLabelEXT{ .pLabelName = str.data(), }); } void Rasterizer::ScopeMarkerEnd(bool from_guest) { if ((from_guest && !Config::getVkGuestMarkersEnabled()) || (!from_guest && !Config::getVkHostMarkersEnabled())) { return; } const auto cmdbuf = scheduler.CommandBuffer(); cmdbuf.endDebugUtilsLabelEXT(); } void Rasterizer::ScopedMarkerInsert(const std::string_view& str, bool from_guest) { if ((from_guest && !Config::getVkGuestMarkersEnabled()) || (!from_guest && !Config::getVkHostMarkersEnabled())) { return; } const auto cmdbuf = scheduler.CommandBuffer(); cmdbuf.insertDebugUtilsLabelEXT(vk::DebugUtilsLabelEXT{ .pLabelName = str.data(), }); } void Rasterizer::ScopedMarkerInsertColor(const std::string_view& str, const u32 color, bool from_guest) { if ((from_guest && !Config::getVkGuestMarkersEnabled()) || (!from_guest && !Config::getVkHostMarkersEnabled())) { return; } const auto cmdbuf = scheduler.CommandBuffer(); cmdbuf.insertDebugUtilsLabelEXT(vk::DebugUtilsLabelEXT{ .pLabelName = str.data(), .color = std::array<f32, 4>( {(f32)((color >> 16) & 0xff) / 255.0f, (f32)((color >> 8) & 0xff) / 255.0f, (f32)(color & 0xff) / 255.0f, (f32)((color >> 24) & 0xff) / 255.0f})}); } // --- 新增：为特定计算着色器生成 mip 链 --- void Rasterizer::GenerateMipChainForWrittenImages(const Shader::Info& cs_info) { for (const auto& img_desc : cs_info.images) { if (!img_desc.is_written) continue; // 仅处理被写入的图像 auto tsharp = img_desc.GetSharp(cs_info); if (tsharp.GetDataFmt() == AmdGpu::DataFormat::FormatInvalid) continue; // 使用 tsharp 和 img_desc 构造 ImageDesc（与 BindTextures 中一致） VideoCore::TextureCache::ImageDesc desc(tsharp, img_desc); auto image_id = texture_cache.FindImage(desc); if (!image_id) continue; auto& image = texture_cache.GetImage(image_id); if (image.info.resources.levels <= 1) continue; // 无需 mip 链 // 如果已经生成过，跳过（避免重复生成） if (image.generated_mip_chain) continue; GenerateMipChainForImage(image); image.generated_mip_chain = true; } } void Rasterizer::GenerateMipChainForImage(VideoCore::Image& image) { auto cmdbuf = scheduler.CommandBuffer(); const auto& resources = image.info.resources; const vk::ImageAspectFlags aspect = image.info.props.is_depth ? vk::ImageAspectFlagBits::eDepth : vk::ImageAspectFlagBits::eColor; const vk::Filter filter = (aspect & vk::ImageAspectFlagBits::eDepth) ? vk::Filter::eNearest : vk::Filter::eLinear; // 屏障1：从计算着色器写入 -> 传输源/目标布局 vk::ImageMemoryBarrier2 pre_barrier = { .srcStageMask = vk::PipelineStageFlagBits2::eComputeShader, .srcAccessMask = vk::AccessFlagBits2::eShaderWrite, .dstStageMask = vk::PipelineStageFlagBits2::eTransfer, .dstAccessMask = vk::AccessFlagBits2::eTransferRead | vk::AccessFlagBits2::eTransferWrite, .oldLayout = image.backing->state.layout, .newLayout = vk::ImageLayout::eTransferSrcOptimal, .image = image.GetImage(), .subresourceRange = { .aspectMask = aspect, .baseMipLevel = 0, .levelCount = resources.levels, .baseArrayLayer = 0, .layerCount = resources.layers, }, }; cmdbuf.pipelineBarrier2(vk::DependencyInfo{ .imageMemoryBarrierCount = 1, .pImageMemoryBarriers = &pre_barrier, }); // 生成 mip 链：从 level 0 到 level-1 for (u32 level = 0; level < resources.levels - 1; ++level) { std::array src_offsets = { vk::Offset3D{0, 0, 0}, vk::Offset3D{ static_cast<int32_t>(std::max(1u, image.info.size.width >> level)), static_cast<int32_t>(std::max(1u, image.info.size.height >> level)), 1 } }; std::array dst_offsets = { vk::Offset3D{0, 0, 0}, vk::Offset3D{ static_cast<int32_t>(std::max(1u, image.info.size.width >> (level + 1))), static_cast<int32_t>(std::max(1u, image.info.size.height >> (level + 1))), 1 } }; vk::ImageBlit blitRegion{ .srcSubresource = { .aspectMask = aspect, .mipLevel = level, .baseArrayLayer = 0, .layerCount = resources.layers, }, .srcOffsets = src_offsets, .dstSubresource = { .aspectMask = aspect, .mipLevel = level + 1, .baseArrayLayer = 0, .layerCount = resources.layers, }, .dstOffsets = dst_offsets, }; cmdbuf.blitImage(image.GetImage(), vk::ImageLayout::eTransferSrcOptimal, image.GetImage(), vk::ImageLayout::eTransferSrcOptimal, blitRegion, filter); } // 屏障2：从传输写 -> 后续着色器读（转换为只读布局） vk::ImageMemoryBarrier2 post_barrier = { .srcStageMask = vk::PipelineStageFlagBits2::eTransfer, .srcAccessMask = vk::AccessFlagBits2::eTransferWrite, .dstStageMask = vk::PipelineStageFlagBits2::eFragmentShader | vk::PipelineStageFlagBits2::eComputeShader, .dstAccessMask = vk::AccessFlagBits2::eShaderRead, .oldLayout = vk::ImageLayout::eTransferSrcOptimal, .newLayout = vk::ImageLayout::eShaderReadOnlyOptimal, .image = image.GetImage(), .subresourceRange = { .aspectMask = aspect, .baseMipLevel = 0, .levelCount = resources.levels, .baseArrayLayer = 0, .layerCount = resources.layers, }, }; cmdbuf.pipelineBarrier2(vk::DependencyInfo{ .imageMemoryBarrierCount = 1, .pImageMemoryBarriers = &post_barrier, }); // 更新图像缓存的布局状态 image.backing->state.layout = vk::ImageLayout::eShaderReadOnlyOptimal; } } // namespace Vulkan ```

kerem commented

2026-02-27 21:10:31 +03:00

Author

Owner

@wuguo13842 commented on GitHub (Feb 20, 2026):

Evaluation Report: Integrating Mip Chain Generation into the Standard Pipeline

Technical Feasibility ✅
Core idea: Automatically detect all images that require mip chain generation before the scheduler submits a command buffer, and insert vkCmdBlitImage commands accordingly.

Foundation:

The texture cache already tracks image states (layout, access masks) and can be extended to record a "needs mip generation" flag.

The scheduler can be extended with a pre-submit hook, allowing the texture cache to inject extra commands just before finalizing the command buffer.

Existing barrier management functions like Image::Transit can safely perform layout transitions.

Key challenge: Ensuring the mip generation happens at the correct time (after writes, before sampling) without disrupting concurrent rendering.

Estimated Workload 📊
Module Changes Complexity
Scheduler Add pre-submit callback list; call before Flush or Submit Low
Image class Add needs_mip_gen and mip_chain_valid flags; implement GenerateMipChain() Medium
Texture Cache Set flags when images are written (e.g., Upload, CopyImage, compute shader writes) Medium-High
Synchronization Ensure correct layout transitions before and after mip generation Medium
Thread safety Protect flags with locks or atomics Low
Total: Approximately 300–500 lines of code across multiple files, but with clear logic and modular implementation.
Potential Benefits 🌟
Permanent fix: Solves the current picture-in-picture issue and prevents similar problems in future games.

Eliminates image copying: The new design shares physical resources for images at the same address, removing the need for CopyImage and thus eliminating device loss risks.

Code robustness: Integrates ad‑hoc patches into a system‑level feature, improving maintainability.

Cross‑vendor compatibility: Automatically generates mip chains for hardware that lacks LOD write support (e.g., NVIDIA) while preserving the native path for AMD.

Risks and Mitigations 🛡️
Performance overhead: Scanning all images and checking flags each frame → can be optimized with lazy marking and generating only once after the first write.

Synchronization errors: The image may be in an incompatible layout during mip generation → use Transit to enforce correct layout, adhering to Vulkan rules.

Data consistency: Multiple views sharing the same physical image may be affected → ensure all writes to any view are completed before generation (using barriers).

Device loss: Incorrect generation commands could crash the driver → add thorough validation and test incrementally.

Implementation Recommendations 🚀
Phased approach:

Phase 1: Add flags and methods to Image, and implement the pre‑submit hook in the scheduler (test without actual command insertion).

Phase 2: Set needs_mip_gen in the texture cache when an image is written (e.g., Upload, CopyImage, compute shader writes).

Phase 3: In the pre‑submit hook, call Image::GenerateMipChain() and insert the necessary barriers and blits.

Phase 4: Remove the old per‑shader patch code and fully enable the new mechanism.

Testing strategy:

Run the affected game on NVIDIA hardware to verify the picture‑in‑picture disappears and no crashes occur.

Regression test on AMD hardware to ensure performance is unaffected and functionality remains correct.

Use RenderDoc to capture frames and verify that mip chains are generated correctly.

Code reuse: The existing GenerateMipChainForImage function can be moved directly into the Image class with minor adjustments.

Conclusion ✅
Strongly recommended to implement this refactoring. Although it requires a moderate amount of work, it is the most fundamental solution to the current problem and will significantly improve the emulator’s stability and cross‑platform compatibility. Compared to continuously applying ad‑hoc patches, this is a worthwhile technical investment.

@wuguo13842 commented on GitHub (Feb 20, 2026): Evaluation Report: Integrating Mip Chain Generation into the Standard Pipeline 1. Technical Feasibility ✅ Core idea: Automatically detect all images that require mip chain generation before the scheduler submits a command buffer, and insert vkCmdBlitImage commands accordingly. Foundation: The texture cache already tracks image states (layout, access masks) and can be extended to record a "needs mip generation" flag. The scheduler can be extended with a pre-submit hook, allowing the texture cache to inject extra commands just before finalizing the command buffer. Existing barrier management functions like Image::Transit can safely perform layout transitions. Key challenge: Ensuring the mip generation happens at the correct time (after writes, before sampling) without disrupting concurrent rendering. 2. Estimated Workload 📊 Module Changes Complexity Scheduler Add pre-submit callback list; call before Flush or Submit Low Image class Add needs_mip_gen and mip_chain_valid flags; implement GenerateMipChain() Medium Texture Cache Set flags when images are written (e.g., Upload, CopyImage, compute shader writes) Medium-High Synchronization Ensure correct layout transitions before and after mip generation Medium Thread safety Protect flags with locks or atomics Low Total: Approximately 300–500 lines of code across multiple files, but with clear logic and modular implementation. 3. Potential Benefits 🌟 Permanent fix: Solves the current picture-in-picture issue and prevents similar problems in future games. Eliminates image copying: The new design shares physical resources for images at the same address, removing the need for CopyImage and thus eliminating device loss risks. Code robustness: Integrates ad‑hoc patches into a system‑level feature, improving maintainability. Cross‑vendor compatibility: Automatically generates mip chains for hardware that lacks LOD write support (e.g., NVIDIA) while preserving the native path for AMD. 4. Risks and Mitigations 🛡️ Performance overhead: Scanning all images and checking flags each frame → can be optimized with lazy marking and generating only once after the first write. Synchronization errors: The image may be in an incompatible layout during mip generation → use Transit to enforce correct layout, adhering to Vulkan rules. Data consistency: Multiple views sharing the same physical image may be affected → ensure all writes to any view are completed before generation (using barriers). Device loss: Incorrect generation commands could crash the driver → add thorough validation and test incrementally. 5. Implementation Recommendations 🚀 Phased approach: Phase 1: Add flags and methods to Image, and implement the pre‑submit hook in the scheduler (test without actual command insertion). Phase 2: Set needs_mip_gen in the texture cache when an image is written (e.g., Upload, CopyImage, compute shader writes). Phase 3: In the pre‑submit hook, call Image::GenerateMipChain() and insert the necessary barriers and blits. Phase 4: Remove the old per‑shader patch code and fully enable the new mechanism. Testing strategy: Run the affected game on NVIDIA hardware to verify the picture‑in‑picture disappears and no crashes occur. Regression test on AMD hardware to ensure performance is unaffected and functionality remains correct. Use RenderDoc to capture frames and verify that mip chains are generated correctly. Code reuse: The existing GenerateMipChainForImage function can be moved directly into the Image class with minor adjustments. 6. Conclusion ✅ Strongly recommended to implement this refactoring. Although it requires a moderate amount of work, it is the most fundamental solution to the current problem and will significantly improve the emulator’s stability and cross‑platform compatibility. Compared to continuously applying ad‑hoc patches, this is a worthwhile technical investment.

kerem commented

2026-02-27 21:10:31 +03:00

Author

Owner

@wuguo13842 commented on GitHub (Feb 20, 2026):

🔥 Fundamental Solution: Refactoring the Texture Cache from the Ground Up
After multiple attempts, we have identified the root cause: different GPUs handle image aliasing differently. AMD drivers may automatically handle certain incompatible aliasing cases, while NVIDIA strictly follows the Vulkan specification, causing our copy operations to fail. To solve this permanently, we need to redesign the texture cache architecture so that it no longer relies on copying, but instead leverages Vulkan's image view mechanism and smarter metadata management.

🎯 Core Design Principles
Avoid Copies: Whenever possible, let different image requests share the same vkImage object, providing different views via vkImageView.

On-Demand Mip Chain Generation: When hardware does not support LOD writes, automatically generate the full mip chain the first time the image is sampled.

Precise Usage Tracking: Record each image's use cases (sampling, render target, storage) to decide whether mip generation is needed.

Lazy Binding: The actual Vulkan resources can be created only at first use, avoiding premature allocation.

Metadata-Driven: Decouple image properties (format, size, mip levels, etc.) from the Vulkan resource, allowing one resource to have multiple views.

@wuguo13842 commented on GitHub (Feb 20, 2026): 🔥 Fundamental Solution: Refactoring the Texture Cache from the Ground Up After multiple attempts, we have identified the root cause: different GPUs handle image aliasing differently. AMD drivers may automatically handle certain incompatible aliasing cases, while NVIDIA strictly follows the Vulkan specification, causing our copy operations to fail. To solve this permanently, we need to redesign the texture cache architecture so that it no longer relies on copying, but instead leverages Vulkan's image view mechanism and smarter metadata management. 🎯 Core Design Principles Avoid Copies: Whenever possible, let different image requests share the same vkImage object, providing different views via vkImageView. On-Demand Mip Chain Generation: When hardware does not support LOD writes, automatically generate the full mip chain the first time the image is sampled. Precise Usage Tracking: Record each image's use cases (sampling, render target, storage) to decide whether mip generation is needed. Lazy Binding: The actual Vulkan resources can be created only at first use, avoiding premature allocation. Metadata-Driven: Decouple image properties (format, size, mip levels, etc.) from the Vulkan resource, allowing one resource to have multiple views.

kerem commented

2026-02-27 21:10:31 +03:00

Author

Owner

@LorenzoFerri commented on GitHub (Feb 25, 2026):

This is marked as completed, is it fixed? I'm experiencing the same on an RTX 4080, latest emulator build

@LorenzoFerri commented on GitHub (Feb 25, 2026): This is marked as completed, is it fixed? I'm experiencing the same on an RTX 4080, latest emulator build

kerem commented

2026-02-27 21:10:31 +03:00

Author

Owner

@StevenMiller123 commented on GitHub (Feb 25, 2026):

Will be fixed soon, but isn't fixed yet.
Pretty sure this got closed because we blocked the issue author, though we don't generally do that.

@StevenMiller123 commented on GitHub (Feb 25, 2026): Will be fixed soon, but isn't fixed yet. Pretty sure this got closed because we blocked the issue author, though we don't generally do that.

kerem commented

2026-02-27 21:10:31 +03:00

Author

Owner

@CorpseSlayer commented on GitHub (Feb 26, 2026):

Suddenly Gravity Rush 2 stop loading saves when pressing continue this is in the latest nightly release.

Error:
[Lib.AvPlayer] (BGFiberWorkerSys) avplayer_source.cpp:271 Stop: Could not stop playback: already stopped.

@CorpseSlayer commented on GitHub (Feb 26, 2026): Suddenly Gravity Rush 2 stop loading saves when pressing continue this is in the latest nightly release. <img width="1278" height="774" alt="Image" src="https://github.com/user-attachments/assets/80c9b6a2-f519-4207-9705-b51a61947400" /> <img width="1269" height="774" alt="Image" src="https://github.com/user-attachments/assets/04e1c1be-dec3-48d4-a0a1-690c75c92371" /> Error: [Lib.AvPlayer] <Warning> (BGFiberWorkerSys) avplayer_source.cpp:271 Stop: Could not stop playback: already stopped.

kerem commented

2026-02-27 21:10:31 +03:00

Author

Owner

@a857313401 commented on GitHub (Feb 27, 2026):

https://github.com/shadps4-emu/shadPS4/actions/runs/22388110449
你可以用这个分支核心，解决了1/4左上角重影的问题，但是EP3通过第一个试炼后依旧会卡死

@a857313401 commented on GitHub (Feb 27, 2026): https://github.com/shadps4-emu/shadPS4/actions/runs/22388110449 你可以用这个分支核心，解决了1/4左上角重影的问题，但是EP3通过第一个试炼后依旧会卡死

Rows
Columns

[GH-ISSUE #4049] [GAME BUG]: The Gravity Rush 2 renders a downsized copy of the scene in the top‑left quarter of the screen, creating a recursive feedback effect. #1208

Checklist (we expect you to perform these steps before opening the issue)

Describe the Bug

Reproduction Steps

Specify OS Version

CPU

GPU

Amount of RAM in GB

Amount of VRAM in GB

Log File