[GH-ISSUE #26] Keep closing my issues, simple mode generates without lyrics or tags, please help #22

Open
opened 2026-02-26 21:30:50 +03:00 by kerem · 12 comments
Owner

Originally created by @iChristGit on GitHub (Feb 5, 2026).
Original GitHub issue: https://github.com/fspecii/ace-step-ui/issues/26

Image: https://github.com/user-attachments/assets/83130fd8-f9ce-4106-b26b-e2a0625118d1
Running SQLite database migrations...
Migrations completed successfully!
ACE-Step UI Server running on http://localhost:3001
Environment: development
ACE-Step API: http://localhost:8001
LAN access: http://100.106.209.57:3001
LAN access: http://192.168.0.169:3001
LAN access: http://172.19.32.1:3001
Initializing local storage provider
Job job_1770329667331_2ygoa8e: Queued at position 1
[ACE-Step] API available at http://localhost:8001: true
Job job_1770329667331_2ygoa8e: Using ACE-Step REST API {
  prompt: 'A happy r&b pop song about living in Los Santos (G',
  duration: undefined
}
Job job_1770329667331_2ygoa8e: Submitted to API as task a1b2a08b-0b70-4f45-b915-e9f5829bf363
Job job_1770329667331_2ygoa8e: Completed via API with 1 audio files
Job job_1770329730371_agxk76v: Queued at position 1
[ACE-Step] API available at http://localhost:8001: true
Job job_1770329730371_agxk76v: Using ACE-Step REST API {
  prompt: 'A happy r&b pop song about living in Los Santos ',
  duration: undefined
}
Job job_1770329730371_agxk76v: Submitted to API as task a092ddd4-bd19-43ca-80fa-34125626e659
Job job_1770329730371_agxk76v: Completed via API with 1 audio files
Job job_1770329748232_ilj01em: Queued at position 1
[ACE-Step] API available at http://localhost:8001: true
Job job_1770329748232_ilj01em: Using ACE-Step REST API {
  prompt: 'A happy r&b pop song about living in Los Santos ',
  duration: undefined
}
Job job_1770329748232_ilj01em: Submitted to API as task 57efb9e0-c93e-426c-aad4-c6b8332545c7
Job job_1770329748232_ilj01em: Completed via API with 1 audio files
Job job_1770329816989_9jxd8l4: Queued at position 1
[ACE-Step] API available at http://localhost:8001: true
Job job_1770329816989_9jxd8l4: Using ACE-Step REST API { prompt: 'a happy rap song about burgers', duration: undefined }
Job job_1770329816989_9jxd8l4: Submitted to API as task 4e5641bf-79fa-46ec-a539-313f85842ed8
Job job_1770329816989_9jxd8l4: Completed via API with 1 audio files
[Format] Running: ..\ACE-Step-1.5\.venv\Scripts\python.exe C:\Users\admin\ace-step-ui\server\scripts\format_sample.py --caption rap --json --lyrics burger is lovd --temperature 0.8 --top-p 0.92 --lm-model acestep-5Hz-lm-1.7B --lm-backend pt
[Format] CWD: ..\ACE-Step-1.5
[Format] Spawn error: spawn ..\ACE-Step-1.5\.venv\Scripts\python.exe ENOENT
[Format] Python error: spawn ..\ACE-Step-1.5\.venv\Scripts\python.exe ENOENT
[Format] Process exited with code -4058
Job job_1770329956021_5jfupbe: Queued at position 1
[ACE-Step] API available at http://localhost:8001: true
Job job_1770329956021_5jfupbe: Using ACE-Step REST API { prompt: 'a happy song about love', duration: undefined }
Job job_1770329956021_5jfupbe: Submitted to API as task c60c9920-4999-4c62-8587-72c4d008f50d
Job job_1770329956021_5jfupbe: Completed via API with 1 audio files
[Format] Running: ..\ACE-Step-1.5\.venv\Scripts\python.exe C:\Users\admin\ace-step-ui\server\scripts\format_sample.py --caption rap --json --lyrics burger is lovd --temperature 0.8 --top-p 0.92 --lm-model acestep-5Hz-lm-1.7B --lm-backend pt
[Format] CWD: ..\ACE-Step-1.5
[Format] Spawn error: spawn ..\ACE-Step-1.5\.venv\Scripts\python.exe ENOENT
[Format] Python error: spawn ..\ACE-Step-1.5\.venv\Scripts\python.exe ENOENT
[Format] Process exited with code -4058

No matter what I do, in Simple mode the song is generated without lyrics or tags.

This is a fresh install of both ACE-Step and ACE-Step UI.
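
The `[Format] Spawn error ... ENOENT` lines in the log mean that Node could not find something it needed to start the process, most likely the executable at `..\ACE-Step-1.5\.venv\Scripts\python.exe` or the working directory `..\ACE-Step-1.5`. One way to narrow this down is to resolve the relative path to an absolute one and check whether it exists. The sketch below is only an illustration: `resolveInterpreter` and its `baseDir` parameter are hypothetical names, not code from ace-step-ui, and it assumes the relative path is meant to be resolved against the directory the server was started from.

```typescript
import { existsSync } from "node:fs";
import * as path from "node:path";

// Hypothetical helper (not part of ace-step-ui): turn a possibly relative
// interpreter path into an absolute one and report whether it exists.
function resolveInterpreter(
  baseDir: string,
  pythonPath: string,
): { absolute: string; exists: boolean } {
  // Keep absolute paths as they are; resolve relative ones against baseDir.
  const absolute = path.isAbsolute(pythonPath)
    ? pythonPath
    : path.resolve(baseDir, pythonPath);
  return { absolute, exists: existsSync(absolute) };
}
```

On the affected machine this could be called as `resolveInterpreter(process.cwd(), "..\\ACE-Step-1.5\\.venv\\Scripts\\python.exe")` from the server directory. If `exists` comes back `false`, the printed `absolute` path shows where the server is actually looking.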

Author
Owner

@iChristGit commented on GitHub (Feb 5, 2026):


> ace-step-ui-server@1.0.0 dev
> tsx watch src/index.ts

Running SQLite database migrations...
Migrations completed successfully!
ACE-Step UI Server running on http://localhost:3001
Environment: development
ACE-Step API: http://localhost:8001
LAN access: http://100.106.209.57:3001
LAN access: http://192.168.0.169:3001
LAN access: http://172.19.32.1:3001
Initializing local storage provider
Job job_1770330447716_4njy6vk: Queued at position 1
[ACE-Step] API available at http://localhost:8001: true
Job job_1770330447716_4njy6vk: Using ACE-Step REST API {
  prompt: 'A Happy song about living in new york',
  duration: undefined
}
Job job_1770330447716_4njy6vk: Submitted to API as task e7cc7425-f042-4af9-a33a-cb8ba6cd784f
Job job_1770330447716_4njy6vk: Completed via API with 1 audio files
Job job_1770330457573_iundqnv: Queued at position 1
[ACE-Step] API available at http://localhost:8001: true
Job job_1770330457573_iundqnv: Using ACE-Step REST API {
  prompt: 'A Happy song about living in new york',
  duration: undefined
}
Job job_1770330457573_iundqnv: Submitted to API as task 9b44a46b-e0d1-42b3-a0d2-8049806ce91d
Job job_1770330457573_iundqnv: Completed via API with 1 audio files
Job job_1770330476354_ir11xpc: Queued at position 1
[ACE-Step] API available at http://localhost:8001: true
Job job_1770330476354_ir11xpc: Using ACE-Step REST API {
  prompt: 'A Happy song about living in new york',
  duration: undefined
}
Job job_1770330476354_ir11xpc: Submitted to API as task 8d4e64a7-82bf-448f-9d1d-52cd97aabd7c
Job job_1770330476354_ir11xpc: Completed via API with 1 audio files
Job job_1770330492353_73lugs5: Queued at position 1
[ACE-Step] API available at http://localhost:8001: true
Job job_1770330492353_73lugs5: Using ACE-Step REST API {
  prompt: 'A Happy song about living in new york',
  duration: undefined
}
Job job_1770330492353_73lugs5: Submitted to API as task 63414233-e262-4937-8cfd-dd14cdc42206
Job job_1770330492353_73lugs5: Completed via API with 1 audio files
[Format] Running: ..\ACE-Step-1.5\.venv\Scripts\python.exe C:\Users\admin\ace-step-ui\server\scripts\format_sample.py --caption Electronic, Hip Hop --json --temperature 0.8 --top-p 0.92 --lm-model acestep-5Hz-lm-0.6B --lm-backend pt
[Format] CWD: ..\ACE-Step-1.5
[Format] Spawn error: spawn ..\ACE-Step-1.5\.venv\Scripts\python.exe ENOENT
[Format] Python error: spawn ..\ACE-Step-1.5\.venv\Scripts\python.exe ENOENT
[Format] Process exited with code -4058
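
The exit code `-4058` in the `[Format]` lines is not a Python exit status. It is the negative error number that Node (via libuv) reports on Windows for `ENOENT`, which matches the spawn error printed just above it. The following sketch shows how such a line could be decoded. The lookup table is hand-written and deliberately tiny, and `describeExit` is a hypothetical helper, not part of ace-step-ui.

```typescript
// Hand-written, incomplete table of libuv error numbers as seen on Windows.
const WINDOWS_UV_ERRNO = new Map<number, string>([
  [-4058, "ENOENT"], // no such file or directory
  [-4048, "EPERM"], // operation not permitted
]);

// Hypothetical helper: extract the exit code from a "[Format]" log line
// and translate it to an error name when the code is in the table.
function describeExit(line: string): string | undefined {
  const match = /Process exited with code (-?\d+)/.exec(line);
  if (!match) return undefined; // not an exit line
  const code = Number(match[1]);
  return WINDOWS_UV_ERRNO.get(code) ?? `unknown (${code})`;
}
```

Applied to the line above, `describeExit("[Format] Process exited with code -4058")` yields `"ENOENT"`, so the Python script never started and the format step produced no output.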

Uninstalled 1 package in 8ms
Installed 1 package in 41ms
Skipping import of cpp extensions due to incompatible torch version 2.7.1+cu128 for torchao version 0.15.0             Please see https://github.com/pytorch/ao/issues/2919 for more info
W0206 00:25:18.778000 31080 Lib\site-packages\torch\distributed\elastic\multiprocessing\redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.
INFO:     Started server process [31080]
INFO:     Waiting for application startup.
[API Server] Initializing models at startup...

============================================================
[API Server] GPU Configuration Detected:
============================================================
  GPU Memory: 23.99 GB
  Configuration Tier: tier6
  Max Duration (with LM): 480s
  Max Duration (without LM): 480s
  Max Batch Size (with LM): 4
  Max Batch Size (without LM): 8
  Default LM Init: True
  Available LM Models: ['acestep-5Hz-lm-0.6B', 'acestep-5Hz-lm-1.7B', 'acestep-5Hz-lm-4B']
============================================================

[API Server] CPU offload disabled by default (GPU >= 16GB)
[Model Download] Model acestep-v15-turbo already exists at C:\Users\admin\ACE-Step-1.5\checkpoints\acestep-v15-turbo
[Model Download] Model vae already exists at C:\Users\admin\ACE-Step-1.5\checkpoints\vae
[API Server] Loading primary DiT model: acestep-v15-turbo
2026-02-06 00:25:22.068 | INFO     | acestep.handler:initialize_service:399 - [initialize_service] Attempting to load model with attention implementation: flash_attention_2
[API Server] Primary model loaded: acestep-v15-turbo
[API Server] GPU auto-detection: init_llm=True (VRAM: 24.0GB, tier: tier6)
[API Server] ACESTEP_INIT_LLM=auto, using GPU auto-detection result
[API Server] Loading LLM model...
[API Server] Auto-selected LM model: acestep-5Hz-lm-4B based on GPU tier
[Model Download] Model acestep-5Hz-lm-4B already exists at C:\Users\admin\ACE-Step-1.5\checkpoints\acestep-5Hz-lm-4B
2026-02-06 00:25:27.341 | INFO     | acestep.llm_inference:initialize:361 - loading 5Hz LM tokenizer... it may take 80~90s
2026-02-06 00:25:47.566 | INFO     | acestep.llm_inference:initialize:365 - 5Hz LM tokenizer loaded successfully in 20.22 seconds
2026-02-06 00:25:47.566 | INFO     | acestep.llm_inference:initialize:370 - Initializing constrained decoding processor...
2026-02-06 00:25:47.567 | INFO     | acestep.llm_inference:initialize:376 - Setting constrained decoding max_duration to 480s based on GPU config (tier: tier6)
2026-02-06 00:25:49.401 | WARNING  | acestep.constrained_logits_processor:_precompute_audio_code_tokens:556 - Found 1535 audio code tokens with values outside valid range [0, 63999]
2026-02-06 00:25:53.428 | INFO     | acestep.llm_inference:initialize:384 - Constrained processor initialized in 5.86 seconds
2026-02-06 00:25:54.282 | INFO     | acestep.llm_inference:get_gpu_memory_utilization:102 - Adaptive LM memory allocation: model=C:\Users\admin\ACE-Step-1.5\checkpoints\acestep-5Hz-lm-4B, target=12.0GB, ratio=0.500, total_gpu=24.0GB
2026-02-06 00:25:54.282 | INFO     | acestep.llm_inference:_initialize_5hz_lm_vllm:444 - Initializing 5Hz LM with model: C:\Users\admin\ACE-Step-1.5\checkpoints\acestep-5Hz-lm-4B, enforce_eager: False, tensor_parallel_size: 1, max_model_len: 4096, gpu_memory_utilization: 0.500
[debug]dist_port: 2333
[W206 00:25:54.000000000 socket.cpp:755] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:2333 (system error: 10049 - The requested address is not valid in its context.).
`torch_dtype` is deprecated! Use `dtype` instead!
[nanovllm] KV cache allocated: 125 blocks × 256 tokens = 32000 tokens capacity, 4.39 GB (free: 8.86 GB, used: 13.77 GB, target: 12.00 GB, block: 36.00 MB)
2026-02-06 00:26:23.723 | INFO     | acestep.llm_inference:_initialize_5hz_lm_vllm:454 - 5Hz LM initialized successfully in 29.44 seconds
2026-02-06 00:26:23.724 | INFO     | acestep.llm_inference:initialize:390 - 5Hz LM status message: ✅ 5Hz LM initialized successfully
Model: C:\Users\admin\ACE-Step-1.5\checkpoints\acestep-5Hz-lm-4B
Device: NVIDIA GeForce RTX 3090 Ti
GPU Memory Utilization: 0.500
Low GPU Memory Mode: False
[API Server] LLM model loaded: acestep-5Hz-lm-4B
[API Server] All models initialized successfully!
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8001 (Press CTRL+C to quit)
INFO:     127.0.0.1:53673 - "GET /health HTTP/1.1" 200 OK
INFO:     127.0.0.1:53675 - "POST /release_task HTTP/1.1" 200 OK
2026-02-06 00:27:27.760 | INFO     | acestep.inference:generate_music:387 - [generate_music] LLM usage decision: thinking=False, use_cot_caption=False, use_cot_language=False, use_cot_metas=True, need_lm_for_cot=True, llm_initialized=True, use_lm=True
2026-02-06 00:27:27.760 | INFO     | acestep.inference:generate_music:445 - LM chunk 1/1 (infer_type=dit) (size: 1, seeds: [3658063042])
2026-02-06 00:27:27.760 | INFO     | acestep.llm_inference:generate_with_stop_condition:964 - Phase 1: Generating CoT metadata...
2026-02-06 00:27:27.781 | INFO     | acestep.llm_inference:generate_with_stop_condition:970 - generate_with_stop_condition: formatted_prompt=<|im_start|>system
# Instruction
Generate audio semantic tokens based on the given conditions:

<|im_end|>
<|im_start|>user
# Caption
A Happy song about living in new york

# Lyric

<|im_end|>
<|im_start|>assistant

INFO:     127.0.0.1:53673 - "POST /query_result HTTP/1.1" 200 OK
Generating: 100%|███████████████████████████████████| 1/1 [00:01<00:00,  1.32s/steps, Prefill=110tok/s, Decode=40tok/s]
2026-02-06 00:27:29.103 | DEBUG    | acestep.llm_inference:parse_lm_output:2257 - Debug output text: <think>
bpm: 58
duration: 24
keyscale: C major
timesignature: 2
<|im_end|>
2026-02-06 00:27:29.103 | INFO     | acestep.llm_inference:generate_with_stop_condition:1012 - Phase 1 completed in 1.34s. Generated metadata: ['bpm', 'duration', 'keyscale', 'timesignature']
2026-02-06 00:27:29.103 | INFO     | acestep.handler:generate_music:2782 - [generate_music] Starting generation...
2026-02-06 00:27:29.105 | INFO     | acestep.handler:generate_music:2785 - [generate_music] Preparing inputs...
2026-02-06 00:27:29.142 | INFO     | acestep.handler:_prepare_batch:1888 -
======================================================================
2026-02-06 00:27:29.144 | INFO     | acestep.handler:_prepare_batch:1889 - 🔍 [DEBUG] DiT TEXT ENCODER INPUT (Inference)
2026-02-06 00:27:29.144 | INFO     | acestep.handler:_prepare_batch:1890 - ======================================================================
2026-02-06 00:27:29.144 | INFO     | acestep.handler:_prepare_batch:1891 - text_prompt:
# Instruction
Fill the audio semantic mask based on the given conditions:

# Caption
A Happy song about living in new york

# Metas
- bpm: 58
- timesignature: 2
- keyscale: C major
- duration: 24 seconds
<|endoftext|>

2026-02-06 00:27:29.144 | INFO     | acestep.handler:_prepare_batch:1892 - ======================================================================

2026-02-06 00:27:29.158 | INFO     | acestep.handler:preprocess_batch:2099 - [preprocess_batch] Inferring prompt embeddings...
2026-02-06 00:27:29.340 | INFO     | acestep.handler:preprocess_batch:2102 - [preprocess_batch] Inferring lyric embeddings...
2026-02-06 00:27:29.341 | INFO     | acestep.handler:service_generate:2335 - [service_generate] Generating audio...
INFO:     127.0.0.1:53673 - "POST /query_result HTTP/1.1" 200 OK
INFO:     127.0.0.1:53673 - "POST /query_result HTTP/1.1" 200 OK
2026-02-06 00:27:30.469 | INFO     | acestep.handler:generate_music:2893 - [generate_music] Model generation completed. Decoding latents...
2026-02-06 00:27:30.482 | DEBUG    | acestep.handler:generate_music:2897 - [generate_music] pred_latents: torch.Size([1, 600, 64]), dtype=torch.bfloat16 pred_latents.min()=tensor(-5.5938, device='cuda:0', dtype=torch.bfloat16), pred_latents.max()=tensor(3.8594, device='cuda:0', dtype=torch.bfloat16), pred_latents.mean()=tensor(-0.1455, device='cuda:0', dtype=torch.bfloat16) pred_latents.std()=tensor(1.0391, device='cuda:0', dtype=torch.bfloat16)
2026-02-06 00:27:30.484 | DEBUG    | acestep.handler:generate_music:2898 - [generate_music] time_costs: {'encoder_time_cost': 0.0553889274597168, 'diffusion_time_cost': 0.9490585327148438, 'diffusion_per_step_time_cost': 0.11863231658935547, 'total_time_cost': 1.0044474601745605, 'offload_time_cost': 0.0}
2026-02-06 00:27:30.484 | INFO     | acestep.handler:generate_music:2901 - [generate_music] Decoding latents with VAE...
2026-02-06 00:27:30.488 | DEBUG    | acestep.handler:generate_music:2919 - [generate_music] Before VAE decode: allocated=18.19GB, max=18.23GB
2026-02-06 00:27:30.489 | INFO     | acestep.handler:generate_music:2922 - [generate_music] Using tiled VAE decode to reduce VRAM usage...
Decoding audio chunks: 100%|██████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  8.53steps/s]
2026-02-06 00:27:30.794 | DEBUG    | acestep.handler:generate_music:2929 - [generate_music] After VAE decode: allocated=18.35GB, max=19.48GB
2026-02-06 00:27:30.805 | INFO     | acestep.handler:generate_music:2946 - [generate_music] VAE decode completed. Preparing audio tensors...
2026-02-06 00:27:30.805 | INFO     | acestep.handler:generate_music:2961 - [generate_music] Done! Generated 1 audio tensors.
2026-02-06 00:27:31.077 | DEBUG    | acestep.audio_utils:save_audio:125 - [AudioSaver] Fallback soundfile Saved audio to C:\Users\admin\ACE-Step-1.5\.cache\acestep\tmp\api_audio\97c51633-ee43-cb55-c551-7f118c9e8d03.mp3 (mp3, 48000Hz)
INFO:     127.0.0.1:53673 - "POST /query_result HTTP/1.1" 200 OK
INFO:     127.0.0.1:53673 - "POST /query_result HTTP/1.1" 200 OK
INFO:     127.0.0.1:53687 - "GET /v1/audio?path=C%3A%5CUsers%5Cadmin%5CACE-Step-1.5%5C.cache%5Cacestep%5Ctmp%5Capi_audio%5C97c51633-ee43-cb55-c551-7f118c9e8d03.mp3 HTTP/1.1" 200 OK
INFO:     127.0.0.1:53695 - "GET /health HTTP/1.1" 200 OK
INFO:     127.0.0.1:53697 - "POST /release_task HTTP/1.1" 200 OK
2026-02-06 00:27:37.583 | INFO     | acestep.inference:generate_music:387 - [generate_music] LLM usage decision: thinking=False, use_cot_caption=False, use_cot_language=False, use_cot_metas=True, need_lm_for_cot=True, llm_initialized=True, use_lm=True
2026-02-06 00:27:37.583 | INFO     | acestep.inference:generate_music:445 - LM chunk 1/1 (infer_type=dit) (size: 1, seeds: [468283798])
2026-02-06 00:27:37.583 | INFO     | acestep.llm_inference:generate_with_stop_condition:964 - Phase 1: Generating CoT metadata...
2026-02-06 00:27:37.586 | INFO     | acestep.llm_inference:generate_with_stop_condition:970 - generate_with_stop_condition: formatted_prompt=<|im_start|>system
# Instruction
Generate audio semantic tokens based on the given conditions:

<|im_end|>
<|im_start|>user
# Caption
A Happy song about living in new york

# Lyric

<|im_end|>
<|im_start|>assistant

Generating:   0%|                                                                             | 0/1 [00:00<?, ?steps/s]INFO:     127.0.0.1:53695 - "POST /query_result HTTP/1.1" 200 OK
Generating: 100%|██████████████████████████████████| 1/1 [00:00<00:00,  1.55steps/s, Prefill=1368tok/s, Decode=47tok/s]
2026-02-06 00:27:38.232 | DEBUG    | acestep.llm_inference:parse_lm_output:2257 - Debug output text: <think>
bpm: 58
duration: 32
keyscale: B♭ major
timesignature: 2
<|im_end|>
2026-02-06 00:27:38.232 | INFO     | acestep.llm_inference:generate_with_stop_condition:1012 - Phase 1 completed in 0.65s. Generated metadata: ['bpm', 'duration', 'keyscale', 'timesignature']
2026-02-06 00:27:38.232 | INFO     | acestep.handler:generate_music:2782 - [generate_music] Starting generation...
2026-02-06 00:27:38.232 | INFO     | acestep.handler:generate_music:2785 - [generate_music] Preparing inputs...
2026-02-06 00:27:38.243 | INFO     | acestep.handler:_prepare_batch:1888 -
======================================================================
2026-02-06 00:27:38.243 | INFO     | acestep.handler:_prepare_batch:1889 - 🔍 [DEBUG] DiT TEXT ENCODER INPUT (Inference)
2026-02-06 00:27:38.243 | INFO     | acestep.handler:_prepare_batch:1890 - ======================================================================
2026-02-06 00:27:38.243 | INFO     | acestep.handler:_prepare_batch:1891 - text_prompt:
# Instruction
Fill the audio semantic mask based on the given conditions:

# Caption
A Happy song about living in new york

# Metas
- bpm: 58
- timesignature: 2
- keyscale: B♭ major
- duration: 32 seconds
<|endoftext|>

2026-02-06 00:27:38.243 | INFO     | acestep.handler:_prepare_batch:1892 - ======================================================================

2026-02-06 00:27:38.251 | INFO     | acestep.handler:preprocess_batch:2099 - [preprocess_batch] Inferring prompt embeddings...
2026-02-06 00:27:38.285 | INFO     | acestep.handler:preprocess_batch:2102 - [preprocess_batch] Inferring lyric embeddings...
2026-02-06 00:27:38.285 | INFO     | acestep.handler:service_generate:2335 - [service_generate] Generating audio...
2026-02-06 00:27:38.960 | INFO     | acestep.handler:generate_music:2893 - [generate_music] Model generation completed. Decoding latents...
2026-02-06 00:27:38.964 | DEBUG    | acestep.handler:generate_music:2897 - [generate_music] pred_latents: torch.Size([1, 800, 64]), dtype=torch.bfloat16 pred_latents.min()=tensor(-6.7812, device='cuda:0', dtype=torch.bfloat16), pred_latents.max()=tensor(4.2500, device='cuda:0', dtype=torch.bfloat16), pred_latents.mean()=tensor(-0.1040, device='cuda:0', dtype=torch.bfloat16) pred_latents.std()=tensor(1.0625, device='cuda:0', dtype=torch.bfloat16)
2026-02-06 00:27:38.964 | DEBUG    | acestep.handler:generate_music:2898 - [generate_music] time_costs: {'encoder_time_cost': 0.039624691009521484, 'diffusion_time_cost': 0.6022460460662842, 'diffusion_per_step_time_cost': 0.07528075575828552, 'total_time_cost': 0.6418707370758057, 'offload_time_cost': 0.0}
2026-02-06 00:27:38.964 | INFO     | acestep.handler:generate_music:2901 - [generate_music] Decoding latents with VAE...
2026-02-06 00:27:38.968 | DEBUG    | acestep.handler:generate_music:2919 - [generate_music] Before VAE decode: allocated=18.35GB, max=19.48GB
2026-02-06 00:27:38.968 | INFO     | acestep.handler:generate_music:2922 - [generate_music] Using tiled VAE decode to reduce VRAM usage...
Decoding audio chunks: 100%|██████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  9.24steps/s]
2026-02-06 00:27:39.340 | DEBUG    | acestep.handler:generate_music:2929 - [generate_music] After VAE decode: allocated=18.35GB, max=19.56GB
2026-02-06 00:27:39.354 | INFO     | acestep.handler:generate_music:2946 - [generate_music] VAE decode completed. Preparing audio tensors...
2026-02-06 00:27:39.354 | INFO     | acestep.handler:generate_music:2961 - [generate_music] Done! Generated 1 audio tensors.
2026-02-06 00:27:39.573 | DEBUG    | acestep.audio_utils:save_audio:125 - [AudioSaver] Fallback soundfile Saved audio to C:\Users\admin\ACE-Step-1.5\.cache\acestep\tmp\api_audio\5f8d6115-efa9-f5d5-2300-920dea7b33f0.mp3 (mp3, 48000Hz)
INFO:     127.0.0.1:53695 - "POST /query_result HTTP/1.1" 200 OK
INFO:     127.0.0.1:53697 - "GET /v1/audio?path=C%3A%5CUsers%5Cadmin%5CACE-Step-1.5%5C.cache%5Cacestep%5Ctmp%5Capi_audio%5C5f8d6115-efa9-f5d5-2300-920dea7b33f0.mp3 HTTP/1.1" 200 OK
INFO:     127.0.0.1:53695 - "POST /query_result HTTP/1.1" 200 OK
INFO:     127.0.0.1:53713 - "GET /health HTTP/1.1" 200 OK
INFO:     127.0.0.1:53715 - "POST /release_task HTTP/1.1" 200 OK
2026-02-06 00:27:56.361 | INFO     | acestep.inference:generate_music:387 - [generate_music] LLM usage decision: thinking=True, use_cot_caption=True, use_cot_language=True, use_cot_metas=True, need_lm_for_cot=True, llm_initialized=True, use_lm=True
2026-02-06 00:27:56.361 | INFO     | acestep.inference:generate_music:445 - LM chunk 1/1 (infer_type=llm_dit) (size: 1, seeds: [332563508])
2026-02-06 00:27:56.361 | INFO     | acestep.llm_inference:generate_with_stop_condition:964 - Phase 1: Generating CoT metadata...
2026-02-06 00:27:56.362 | INFO     | acestep.llm_inference:generate_with_stop_condition:970 - generate_with_stop_condition: formatted_prompt=<|im_start|>system
# Instruction
Generate audio semantic tokens based on the given conditions:

<|im_end|>
<|im_start|>user
# Caption
A Happy song about living in new york

# Lyric

<|im_end|>
<|im_start|>assistant

INFO:     127.0.0.1:53713 - "POST /query_result HTTP/1.1" 200 OK
Generating:   0%|                                           | 0/1 [00:02<?, ?steps/s, Prefill=645tok/s, Decode=54tok/s]INFO:     127.0.0.1:53713 - "POST /query_result HTTP/1.1" 200 OK
Generating:   0%|                                           | 0/1 [00:02<?, ?steps/s, Prefill=645tok/s, Decode=45tok/s]INFO:     127.0.0.1:53713 - "POST /query_result HTTP/1.1" 200 OK
Generating: 100%|███████████████████████████████████| 1/1 [00:02<00:00,  2.78s/steps, Prefill=645tok/s, Decode=40tok/s]
2026-02-06 00:27:59.142 | DEBUG    | acestep.llm_inference:parse_lm_output:2257 - Debug output text: <think>
bpm: 58
caption: A cheerful and lighthearted ukulele-led instrumental. The ukulele plays
  a bright, catchy melody over a simple, steady chord progression. It's supported
  by a clean, straightforward drum machine beat and a subtle bass line that follows
  the root notes. The overall production is clean and simple, creating an upbeat,
  happy, and carefree mood perfect for children's content or positive background music.
duration: 35
keyscale: D major
language: unknown
timesignature: 2
<|im_end|>
2026-02-06 00:27:59.142 | INFO     | acestep.llm_inference:generate_with_stop_condition:1012 - Phase 1 completed in 2.78s. Generated metadata: ['bpm', 'caption', 'duration', 'keyscale', 'language', 'timesignature']
2026-02-06 00:27:59.142 | INFO     | acestep.llm_inference:generate_with_stop_condition:1055 - Phase 2: Generating audio codes...
2026-02-06 00:27:59.144 | INFO     | acestep.llm_inference:generate_with_stop_condition:1063 - generate_with_stop_condition: formatted_prompt_with_cot=<|im_start|>system
# Instruction
Generate audio semantic tokens based on the given conditions:

<|im_end|>
<|im_start|>user
# Caption
A Happy song about living in new york

# Lyric

<|im_end|>
<|im_start|>assistant
<think>
bpm: 58
caption: A cheerful and lighthearted ukulele-led instrumental. The ukulele plays a
  bright, catchy melody over a simple, steady chord progression. It's supported by
  a clean, straightforward drum machine beat and a subtle bass line that follows the
  root notes. The overall production is clean and simple, creating an upbeat, happy,
  and carefree mood perfect for children's content or positive background music.
duration: 35
keyscale: D major
language: unknown
timesignature: 2
</think>

<|im_end|>

Generating:   0%|                                          | 0/1 [00:01<?, ?steps/s, Prefill=3903tok/s, Decode=56tok/s]INFO:     127.0.0.1:53713 - "POST /query_result HTTP/1.1" 200 OK
Generating:   0%|                                          | 0/1 [00:01<?, ?steps/s, Prefill=3903tok/s, Decode=47tok/s]INFO:     127.0.0.1:53713 - "POST /query_result HTTP/1.1" 200 OK
Generating:   0%|                                          | 0/1 [00:03<?, ?steps/s, Prefill=3903tok/s, Decode=44tok/s]INFO:     127.0.0.1:53713 - "POST /query_result HTTP/1.1" 200 OK
Generating:   0%|                                          | 0/1 [00:03<?, ?steps/s, Prefill=3903tok/s, Decode=45tok/s]INFO:     127.0.0.1:53713 - "POST /query_result HTTP/1.1" 200 OK
Generating: 100%|██████████████████████████████████| 1/1 [00:03<00:00,  3.54s/steps, Prefill=3903tok/s, Decode=50tok/s]
2026-02-06 00:28:02.690 | DEBUG    | acestep.llm_inference:parse_lm_output:2257 - Debug output text: <|audio_code_43316|><|audio_code_10783|><|audio_code_38708|><|audio_code_58280|><|audio_code_45529|><|audio_code_10053|><|audio_code_18575|><|audio_code_16075|><|audio_code_58518|><|audio_code_39086|><|audio_code_37378|><|audio_code_37426|><|audio_code_61195|><|audio_code_53653|><|audio_code_39219|><|audio_code_40277|><|audio_code_29451|><|audio_code_55805|><|audio_code_12237|><|audio_code_33730|><|audio_code_28875|><|audio_code_3104|><|audio_code_38424|><|audio_code_28668|><|audio_code_2391|><|audio_code_11748|><|audio_code_4871|><|audio_code_31375|><|audio_code_28992|><|audio_code_37896|><|audio_code_35580|><|audio_code_23663|><|audio_code_58374|><|audio_code_26772|><|audio_code_51501|><|audio_code_53237|><|audio_code_10173|><|audio_code_58851|><|audio_code_55637|><|audio_code_17348|><|audio_code_8700|><|audio_code_821|><|audio_code_48799|><|audio_code_21380|><|audio_code_63945|><|audio_code_30040|><|audio_code_45921|><|audio_code_12664|><|audio_code_46040|><|audio_code_8117|><|audio_code_49311|><|audio_code_4868|><|audio_code_56124|><|audio_code_40828|><|audio_code_52173|><|audio_code_27507|><|audio_code_4759|><|audio_code_13871|><|audio_code_60934|><|audio_code_26373|><|audio_code_30021|><|audio_code_19965|><|audio_code_59673|><|audio_code_40464|><|audio_code_32520|><|audio_code_1815|><|audio_code_23419|><|audio_code_33272|><|audio_code_7992|><|audio_code_61274|><|audio_code_22675|><|audio_code_49354|><|audio_code_2303|><|audio_code_18567|><|audio_code_17302|><|audio_code_14221|><|audio_code_31717|><|audio_code_50973|><|audio_code_51181|><|audio_code_40956|><|audio_code_51133|><|audio_code_49098|><|audio_code_53606|><|audio_code_45118|><|audio_code_33503|><|audio_code_47365|><|audio_code_59256|><|audio_code_11762|><|audio_code_22791|><|audio_code_44775|><|audio_code_189|><|audio_code_24167|><|audio_code_22789|><|audio_code_34|><|audio_code_2603|><|audio_code_2326
|><|audio_code_20459|><|audio_code_19907|><|audio_code_13619|><|audio_code_10555|><|audio_code_63917|><|audio_code_27588|><|audio_code_62953|><|audio_code_10613|><|audio_code_5168|><|audio_code_49480|><|audio_code_50138|><|audio_code_15846|><|audio_code_39232|><|audio_code_63783|><|audio_code_28999|><|audio_code_25528|><|audio_code_11832|><|audio_code_53383|><|audio_code_14382|><|audio_code_46084|><|audio_code_36852|><|audio_code_61188|><|audio_code_12719|><|audio_code_12196|><|audio_code_42908|><|audio_code_39188|><|audio_code_33664|><|audio_code_5620|><|audio_code_38254|><|audio_code_9407|><|audio_code_40866|><|audio_code_45456|><|audio_code_26376|><|audio_code_44520|><|audio_code_28408|><|audio_code_35819|><|audio_code_5486|><|audio_code_41623|><|audio_code_37091|><|audio_code_21275|><|audio_code_18387|><|audio_code_14325|><|audio_code_18842|><|audio_code_45759|><|audio_code_13351|><|audio_code_58950|><|audio_code_40916|><|audio_code_24042|><|audio_code_430|><|audio_code_11833|><|audio_code_5688|><|audio_code_17856|><|audio_code_52878|><|audio_code_7666|><|audio_code_18534|><|audio_code_45575|><|audio_code_38676|><|audio_code_21140|><|audio_code_21196|><|audio_code_50495|><|audio_code_4799|><|audio_code_16125|><|audio_code_43516|><|audio_code_35855|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_48839|><|im_end|>
2026-02-06 00:28:02.690 | INFO     | acestep.llm_inference:generate_with_stop_condition:1203 - Phase 2 completed in 3.55s. Generated 171 audio codes
2026-02-06 00:28:02.691 | INFO     | acestep.handler:generate_music:2782 - [generate_music] Starting generation...
2026-02-06 00:28:02.691 | INFO     | acestep.handler:generate_music:2785 - [generate_music] Preparing inputs...
2026-02-06 00:28:02.700 | INFO     | acestep.handler:_prepare_batch:1659 - [generate_music] Decoding audio codes for item 0...
2026-02-06 00:28:02.813 | INFO     | acestep.handler:_prepare_batch:1831 - [generate_music] Decoding audio codes for LM hints for item 0...
2026-02-06 00:28:02.817 | INFO     | acestep.handler:_prepare_batch:1888 -
======================================================================
2026-02-06 00:28:02.817 | INFO     | acestep.handler:_prepare_batch:1889 - 🔍 [DEBUG] DiT TEXT ENCODER INPUT (Inference)
2026-02-06 00:28:02.817 | INFO     | acestep.handler:_prepare_batch:1890 - ======================================================================
2026-02-06 00:28:02.817 | INFO     | acestep.handler:_prepare_batch:1891 - text_prompt:
# Instruction
Generate audio semantic tokens based on the given conditions:

# Caption
A cheerful and lighthearted ukulele-led instrumental. The ukulele plays a bright, catchy melody over a simple, steady chord progression. It's supported by a clean, straightforward drum machine beat and a subtle bass line that follows the root notes. The overall production is clean and simple, creating an upbeat, happy, and carefree mood perfect for children's content or positive background music.

# Metas
- bpm: 58
- timesignature: 2
- keyscale: D major
- duration: 35 seconds
<|endoftext|>

2026-02-06 00:28:02.817 | INFO     | acestep.handler:_prepare_batch:1892 - ======================================================================

2026-02-06 00:28:02.824 | INFO     | acestep.handler:preprocess_batch:2099 - [preprocess_batch] Inferring prompt embeddings...
2026-02-06 00:28:02.861 | INFO     | acestep.handler:preprocess_batch:2102 - [preprocess_batch] Inferring lyric embeddings...
2026-02-06 00:28:02.861 | INFO     | acestep.handler:service_generate:2335 - [service_generate] Generating audio...
Using precomputed LM hints
Using precomputed LM hints
2026-02-06 00:28:03.500 | INFO     | acestep.handler:generate_music:2893 - [generate_music] Model generation completed. Decoding latents...
2026-02-06 00:28:03.505 | DEBUG    | acestep.handler:generate_music:2897 - [generate_music] pred_latents: torch.Size([1, 855, 64]), dtype=torch.bfloat16 pred_latents.min()=tensor(-7.1250, device='cuda:0', dtype=torch.bfloat16), pred_latents.max()=tensor(4.8750, device='cuda:0', dtype=torch.bfloat16), pred_latents.mean()=tensor(-0.1079, device='cuda:0', dtype=torch.bfloat16) pred_latents.std()=tensor(1.1016, device='cuda:0', dtype=torch.bfloat16)
2026-02-06 00:28:03.505 | DEBUG    | acestep.handler:generate_music:2898 - [generate_music] time_costs: {'encoder_time_cost': 0.026651620864868164, 'diffusion_time_cost': 0.5836784839630127, 'diffusion_per_step_time_cost': 0.07295981049537659, 'total_time_cost': 0.6103301048278809, 'offload_time_cost': 0.0}
2026-02-06 00:28:03.505 | INFO     | acestep.handler:generate_music:2901 - [generate_music] Decoding latents with VAE...
2026-02-06 00:28:03.507 | DEBUG    | acestep.handler:generate_music:2919 - [generate_music] Before VAE decode: allocated=18.35GB, max=19.56GB
2026-02-06 00:28:03.507 | INFO     | acestep.handler:generate_music:2922 - [generate_music] Using tiled VAE decode to reduce VRAM usage...
Decoding audio chunks: 100%|██████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  8.41steps/s]
2026-02-06 00:28:03.902 | DEBUG    | acestep.handler:generate_music:2929 - [generate_music] After VAE decode: allocated=18.35GB, max=19.64GB
2026-02-06 00:28:03.914 | INFO     | acestep.handler:generate_music:2946 - [generate_music] VAE decode completed. Preparing audio tensors...
2026-02-06 00:28:03.914 | INFO     | acestep.handler:generate_music:2961 - [generate_music] Done! Generated 1 audio tensors.
2026-02-06 00:28:04.188 | DEBUG    | acestep.audio_utils:save_audio:125 - [AudioSaver] Fallback soundfile Saved audio to C:\Users\admin\ACE-Step-1.5\.cache\acestep\tmp\api_audio\33a9ab72-1593-e2ee-ee2c-998fe63107a4.mp3 (mp3, 48000Hz)
INFO:     127.0.0.1:53713 - "POST /query_result HTTP/1.1" 200 OK
INFO:     127.0.0.1:53729 - "GET /v1/audio?path=C%3A%5CUsers%5Cadmin%5CACE-Step-1.5%5C.cache%5Cacestep%5Ctmp%5Capi_audio%5C33a9ab72-1593-e2ee-ee2c-998fe63107a4.mp3 HTTP/1.1" 200 OK
INFO:     127.0.0.1:53742 - "GET /health HTTP/1.1" 200 OK
INFO:     127.0.0.1:53744 - "POST /release_task HTTP/1.1" 200 OK
2026-02-06 00:28:12.360 | INFO     | acestep.inference:generate_music:387 - [generate_music] LLM usage decision: thinking=True, use_cot_caption=True, use_cot_language=True, use_cot_metas=True, need_lm_for_cot=True, llm_initialized=True, use_lm=True
2026-02-06 00:28:12.360 | INFO     | acestep.inference:generate_music:445 - LM chunk 1/1 (infer_type=llm_dit) (size: 1, seeds: [3883759793])
2026-02-06 00:28:12.360 | INFO     | acestep.llm_inference:generate_with_stop_condition:964 - Phase 1: Generating CoT metadata...
2026-02-06 00:28:12.362 | INFO     | acestep.llm_inference:generate_with_stop_condition:970 - generate_with_stop_condition: formatted_prompt=<|im_start|>system
# Instruction
Generate audio semantic tokens based on the given conditions:

<|im_end|>
<|im_start|>user
# Caption
A Happy song about living in new york

# Lyric

<|im_end|>
<|im_start|>assistant

INFO:     127.0.0.1:53742 - "POST /query_result HTTP/1.1" 200 OK
Generating:   0%|                                          | 0/1 [00:01<?, ?steps/s, Prefill=1024tok/s, Decode=53tok/s]INFO:     127.0.0.1:53742 - "POST /query_result HTTP/1.1" 200 OK
Generating:   0%|                                          | 0/1 [00:02<?, ?steps/s, Prefill=1024tok/s, Decode=44tok/s]INFO:     127.0.0.1:53742 - "POST /query_result HTTP/1.1" 200 OK
Generating: 100%|██████████████████████████████████| 1/1 [00:02<00:00,  2.04s/steps, Prefill=1024tok/s, Decode=48tok/s]
2026-02-06 00:28:14.408 | DEBUG    | acestep.llm_inference:parse_lm_output:2257 - Debug output text: <think>
bpm: 53
caption: A brief, upbeat jazz trio performance featuring a bright acoustic piano
  playing a lively, swinging melodic phrase. It's supported by a walking upright bass
  line and the subtle, rhythmic shuffle of brushed drums, creating a classic, sophisticated
  lounge jazz feel.
duration: 76
keyscale: G major
language: unknown
timesignature: 2
<|im_end|>
2026-02-06 00:28:14.408 | INFO     | acestep.llm_inference:generate_with_stop_condition:1012 - Phase 1 completed in 2.05s. Generated metadata: ['bpm', 'caption', 'duration', 'keyscale', 'language', 'timesignature']
2026-02-06 00:28:14.409 | INFO     | acestep.llm_inference:generate_with_stop_condition:1055 - Phase 2: Generating audio codes...
2026-02-06 00:28:14.410 | INFO     | acestep.llm_inference:generate_with_stop_condition:1063 - generate_with_stop_condition: formatted_prompt_with_cot=<|im_start|>system
# Instruction
Generate audio semantic tokens based on the given conditions:

<|im_end|>
<|im_start|>user
# Caption
A Happy song about living in new york

# Lyric

<|im_end|>
<|im_start|>assistant
<think>
bpm: 53
caption: A brief, upbeat jazz trio performance featuring a bright acoustic piano playing
  a lively, swinging melodic phrase. It's supported by a walking upright bass line
  and the subtle, rhythmic shuffle of brushed drums, creating a classic, sophisticated
  lounge jazz feel.
duration: 76
keyscale: G major
language: unknown
timesignature: 2
</think>

<|im_end|>

Generating:   0%|                                          | 0/1 [00:01<?, ?steps/s, Prefill=4618tok/s, Decode=56tok/s]INFO:     127.0.0.1:53742 - "POST /query_result HTTP/1.1" 200 OK
Generating:   0%|                                          | 0/1 [00:01<?, ?steps/s, Prefill=4618tok/s, Decode=48tok/s]INFO:     127.0.0.1:53742 - "POST /query_result HTTP/1.1" 200 OK
Generating:   0%|                                          | 0/1 [00:03<?, ?steps/s, Prefill=4618tok/s, Decode=53tok/s]INFO:     127.0.0.1:53742 - "POST /query_result HTTP/1.1" 200 OK
Generating:   0%|                                          | 0/1 [00:03<?, ?steps/s, Prefill=4618tok/s, Decode=55tok/s]INFO:     127.0.0.1:53742 - "POST /query_result HTTP/1.1" 200 OK
Generating:   0%|                                          | 0/1 [00:05<?, ?steps/s, Prefill=4618tok/s, Decode=51tok/s]INFO:     127.0.0.1:53742 - "POST /query_result HTTP/1.1" 200 OK
Generating:   0%|                                          | 0/1 [00:06<?, ?steps/s, Prefill=4618tok/s, Decode=51tok/s]INFO:     127.0.0.1:53742 - "POST /query_result HTTP/1.1" 200 OK
Generating: 100%|██████████████████████████████████| 1/1 [00:07<00:00,  7.32s/steps, Prefill=4618tok/s, Decode=50tok/s]
2026-02-06 00:28:21.735 | DEBUG    | acestep.llm_inference:parse_lm_output:2257 - Debug output text: <|audio_code_43316|><|audio_code_12614|><|audio_code_16007|><|audio_code_57876|><|audio_code_744|><|audio_code_28664|><|audio_code_38975|><|audio_code_19458|><|audio_code_6679|><|audio_code_11980|><|audio_code_37098|><|audio_code_1655|><|audio_code_38250|><|audio_code_41018|><|audio_code_11154|><|audio_code_23904|><|audio_code_33333|><|audio_code_1233|><|audio_code_6649|><|audio_code_63340|><|audio_code_26243|><|audio_code_26426|><|audio_code_12779|><|audio_code_7701|><|audio_code_40704|><|audio_code_40746|><|audio_code_50047|><|audio_code_16487|><|audio_code_6153|><|audio_code_6906|><|audio_code_7427|><|audio_code_7482|><|audio_code_38395|><|audio_code_49949|><|audio_code_12718|><|audio_code_53522|><|audio_code_25028|><|audio_code_35584|><|audio_code_34323|><|audio_code_38779|><|audio_code_4160|><|audio_code_2821|><|audio_code_55358|><|audio_code_31251|><|audio_code_10101|><|audio_code_25532|><|audio_code_21696|><|audio_code_54064|><|audio_code_31762|><|audio_code_14900|><|audio_code_47124|><|audio_code_15419|><|audio_code_33722|><|audio_code_14292|><|audio_code_25536|><|audio_code_14788|><|audio_code_55859|><|audio_code_1167|><|audio_code_53292|><|audio_code_15110|><|audio_code_10439|><|audio_code_54087|><|audio_code_54127|><|audio_code_17884|><|audio_code_7335|><|audio_code_36883|><|audio_code_4797|><|audio_code_19047|><|audio_code_9597|><|audio_code_31864|><|audio_code_18990|><|audio_code_46287|><|audio_code_2823|><|audio_code_5359|><|audio_code_333|><|audio_code_38796|><|audio_code_53637|><|audio_code_13250|><|audio_code_53725|><|audio_code_2758|><|audio_code_17480|><|audio_code_48374|><|audio_code_40618|><|audio_code_48256|><|audio_code_35520|><|audio_code_20160|><|audio_code_17664|><|audio_code_19047|><|audio_code_20613|><|audio_code_27359|><|audio_code_53233|><|audio_code_51156|><|audio_code_13351|><|audio_code_20158|><|audio_code_63539|><|audio_code_31250|><
|audio_code_4723|><|audio_code_58132|><|audio_code_11140|><|audio_code_18581|><|audio_code_8736|><|audio_code_32612|><|audio_code_23183|><|audio_code_6656|><|audio_code_20165|><|audio_code_11282|><|audio_code_18984|><|audio_code_19176|><|audio_code_6304|><|audio_code_52072|><|audio_code_3102|><|audio_code_17215|><|audio_code_13079|><|audio_code_12264|><|audio_code_2846|><|audio_code_32319|><|audio_code_59085|><|audio_code_47115|><|audio_code_12859|><|audio_code_60444|><|audio_code_16002|><|audio_code_28132|><|audio_code_53502|><|audio_code_7168|><|audio_code_61204|><|audio_code_38975|><|audio_code_58368|><|audio_code_2710|><|audio_code_16896|><|audio_code_19456|><|audio_code_6400|><|audio_code_4497|><|audio_code_39481|><|audio_code_45639|><|audio_code_23829|><|audio_code_50013|><|audio_code_28682|><|audio_code_51952|><|audio_code_52766|><|audio_code_62240|><|audio_code_15370|><|audio_code_25011|><|audio_code_33144|><|audio_code_44562|><|audio_code_40785|><|audio_code_63516|><|audio_code_41538|><|audio_code_2293|><|audio_code_17472|><|audio_code_22535|><|audio_code_18470|><|audio_code_58237|><|audio_code_19952|><|audio_code_16432|><|audio_code_26671|><|audio_code_45575|><|audio_code_10981|><|audio_code_17342|><|audio_code_61827|><|audio_code_7000|><|audio_code_27712|><|audio_code_7427|><|audio_code_54838|><|audio_code_19456|><|audio_code_22741|><|audio_code_59590|><|audio_code_51744|><|audio_code_244|><|audio_code_19976|><|audio_code_20096|><|audio_code_4473|><|audio_code_21022|><|audio_code_37222|><|audio_code_3112|><|audio_code_23963|><|audio_code_36772|><|audio_code_15418|><|audio_code_30416|><|audio_code_2944|><|audio_code_63238|><|audio_code_12802|><|audio_code_3195|><|audio_code_12251|><|audio_code_53558|><|audio_code_2063|><|audio_code_47123|><|audio_code_4798|><|audio_code_19055|><|audio_code_18117|><|audio_code_47160|><|audio_code_16896|><|audio_code_2756|><|audio_code_15826|><|audio_code_24554|><|audio_code_11226|><|audio_code_46968|><|audio_code_51887|><|a
udio_code_55484|><|audio_code_18463|><|audio_code_42300|><|audio_code_26608|><|audio_code_30770|><|audio_code_63531|><|audio_code_62555|><|audio_code_39498|><|audio_code_25922|><|audio_code_550|><|audio_code_40164|><|audio_code_63996|><|audio_code_28690|><|audio_code_19704|><|audio_code_5115|><|audio_code_40773|><|audio_code_27719|><|audio_code_58368|><|audio_code_6748|><|audio_code_49346|><|audio_code_19990|><|audio_code_1165|><|audio_code_23095|><|audio_code_17430|><|audio_code_25463|><|audio_code_22574|><|audio_code_50751|><|audio_code_52894|><|audio_code_25088|><|audio_code_5561|><|audio_code_60452|><|audio_code_15428|><|audio_code_16576|><|audio_code_13224|><|audio_code_15418|><|audio_code_60451|><|audio_code_28746|><|audio_code_63667|><|audio_code_18891|><|audio_code_1419|><|audio_code_50162|><|audio_code_7653|><|audio_code_59923|><|audio_code_27799|><|audio_code_44580|><|audio_code_2287|><|audio_code_14583|><|audio_code_37311|><|audio_code_32695|><|audio_code_55789|><|audio_code_57885|><|audio_code_13314|><|audio_code_16021|><|audio_code_32319|><|audio_code_11644|><|audio_code_42089|><|audio_code_45119|><|audio_code_51383|><|audio_code_26871|><|audio_code_14647|><|audio_code_27638|><|audio_code_22461|><|audio_code_27573|><|audio_code_11236|><|audio_code_2957|><|audio_code_5837|><|audio_code_21624|><|audio_code_56062|><|audio_code_14929|><|audio_code_20032|><|audio_code_7424|><|audio_code_6336|><|audio_code_6456|><|audio_code_45119|><|audio_code_25463|><|audio_code_14039|><|audio_code_1976|><|audio_code_5406|><|audio_code_29759|><|audio_code_59086|><|audio_code_47123|><|audio_code_12859|><|audio_code_60508|><|audio_code_15427|><|audio_code_23019|><|audio_code_63269|><|audio_code_3104|><|audio_code_63773|><|audio_code_38975|><|audio_code_18944|><|audio_code_2702|><|audio_code_3616|><|audio_code_3616|><|audio_code_52384|><|audio_code_62240|><|audio_code_36328|><|audio_code_13871|><|audio_code_7503|><|audio_code_12695|><|audio_code_63992|><|audio_code_26791|><|au
dio_code_58375|><|audio_code_63196|><|audio_code_15362|><|audio_code_63020|><|audio_code_31755|><|audio_code_2220|><|audio_code_61325|><|audio_code_2813|><|audio_code_19456|><|audio_code_5214|><|audio_code_4286|><|audio_code_21560|><|audio_code_18645|><|audio_code_21544|><|audio_code_19048|><|audio_code_24360|><|audio_code_63548|><|audio_code_55998|><|audio_code_18637|><|audio_code_6297|><|audio_code_58382|><|audio_code_25006|><|audio_code_60935|><|audio_code_614|><|audio_code_45374|><|audio_code_62356|><|audio_code_14547|><|audio_code_43066|><|audio_code_19975|><|audio_code_23268|><|audio_code_18709|><|audio_code_7232|><|audio_code_42047|><|audio_code_32318|><|audio_code_61461|><|audio_code_1255|><|audio_code_20412|><|audio_code_23531|><|audio_code_63035|><|audio_code_18453|><|audio_code_18566|><|audio_code_50470|><|audio_code_3104|><|audio_code_543|><|audio_code_15143|><|audio_code_61446|><|audio_code_14575|><|audio_code_45829|><|audio_code_8540|><|audio_code_15378|><|audio_code_36980|><|audio_code_10782|><|audio_code_23757|><|audio_code_24403|><|audio_code_32314|><|audio_code_61063|><|audio_code_23590|><|audio_code_17790|><|audio_code_35855|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_62151|><|im_end|>
2026-02-06 00:28:21.735 | INFO     | acestep.llm_inference:generate_with_stop_condition:1203 - Phase 2 completed in 7.33s. Generated 374 audio codes
2026-02-06 00:28:21.735 | INFO     | acestep.handler:generate_music:2782 - [generate_music] Starting generation...
2026-02-06 00:28:21.735 | INFO     | acestep.handler:generate_music:2785 - [generate_music] Preparing inputs...
2026-02-06 00:28:21.752 | INFO     | acestep.handler:_prepare_batch:1659 - [generate_music] Decoding audio codes for item 0...
2026-02-06 00:28:21.779 | INFO     | acestep.handler:_prepare_batch:1831 - [generate_music] Decoding audio codes for LM hints for item 0...
2026-02-06 00:28:21.783 | INFO     | acestep.handler:_prepare_batch:1888 -
======================================================================
2026-02-06 00:28:21.783 | INFO     | acestep.handler:_prepare_batch:1889 - 🔍 [DEBUG] DiT TEXT ENCODER INPUT (Inference)
2026-02-06 00:28:21.783 | INFO     | acestep.handler:_prepare_batch:1890 - ======================================================================
2026-02-06 00:28:21.784 | INFO     | acestep.handler:_prepare_batch:1891 - text_prompt:
# Instruction
Generate audio semantic tokens based on the given conditions:

# Caption
A brief, upbeat jazz trio performance featuring a bright acoustic piano playing a lively, swinging melodic phrase. It's supported by a walking upright bass line and the subtle, rhythmic shuffle of brushed drums, creating a classic, sophisticated lounge jazz feel.

# Metas
- bpm: 53
- timesignature: 2
- keyscale: G major
- duration: 76 seconds
<|endoftext|>

2026-02-06 00:28:21.784 | INFO     | acestep.handler:_prepare_batch:1892 - ======================================================================

2026-02-06 00:28:21.798 | INFO     | acestep.handler:preprocess_batch:2099 - [preprocess_batch] Inferring prompt embeddings...
2026-02-06 00:28:21.832 | INFO     | acestep.handler:preprocess_batch:2102 - [preprocess_batch] Inferring lyric embeddings...
2026-02-06 00:28:21.832 | INFO     | acestep.handler:service_generate:2335 - [service_generate] Generating audio...
Using precomputed LM hints
Using precomputed LM hints
INFO:     127.0.0.1:53742 - "POST /query_result HTTP/1.1" 200 OK
INFO:     127.0.0.1:53742 - "POST /query_result HTTP/1.1" 200 OK
2026-02-06 00:28:22.920 | INFO     | acestep.handler:generate_music:2893 - [generate_music] Model generation completed. Decoding latents...
2026-02-06 00:28:22.928 | DEBUG    | acestep.handler:generate_music:2897 - [generate_music] pred_latents: torch.Size([1, 1870, 64]), dtype=torch.bfloat16 pred_latents.min()=tensor(-6.8438, device='cuda:0', dtype=torch.bfloat16), pred_latents.max()=tensor(4.7500, device='cuda:0', dtype=torch.bfloat16), pred_latents.mean()=tensor(-0.0330, device='cuda:0', dtype=torch.bfloat16) pred_latents.std()=tensor(1.0938, device='cuda:0', dtype=torch.bfloat16)
2026-02-06 00:28:22.929 | DEBUG    | acestep.handler:generate_music:2898 - [generate_music] time_costs: {'encoder_time_cost': 0.024573087692260742, 'diffusion_time_cost': 1.036590814590454, 'diffusion_per_step_time_cost': 0.12957385182380676, 'total_time_cost': 1.0611639022827148, 'offload_time_cost': 0.0}
2026-02-06 00:28:22.929 | INFO     | acestep.handler:generate_music:2901 - [generate_music] Decoding latents with VAE...
2026-02-06 00:28:22.931 | DEBUG    | acestep.handler:generate_music:2919 - [generate_music] Before VAE decode: allocated=18.35GB, max=19.64GB
2026-02-06 00:28:22.931 | INFO     | acestep.handler:generate_music:2922 - [generate_music] Using tiled VAE decode to reduce VRAM usage...
Decoding audio chunks: 100%|██████████████████████████████████████████████████████████| 4/4 [00:00<00:00,  7.49steps/s]
2026-02-06 00:28:23.612 | DEBUG    | acestep.handler:generate_music:2929 - [generate_music] After VAE decode: allocated=18.35GB, max=19.64GB
2026-02-06 00:28:23.626 | INFO     | acestep.handler:generate_music:2946 - [generate_music] VAE decode completed. Preparing audio tensors...
2026-02-06 00:28:23.626 | INFO     | acestep.handler:generate_music:2961 - [generate_music] Done! Generated 1 audio tensors.
2026-02-06 00:28:24.237 | DEBUG    | acestep.audio_utils:save_audio:125 - [AudioSaver] Fallback soundfile Saved audio to C:\Users\admin\ACE-Step-1.5\.cache\acestep\tmp\api_audio\021365d2-b6a9-ae59-abfe-37968224c7be.mp3 (mp3, 48000Hz)
INFO:     127.0.0.1:53742 - "POST /query_result HTTP/1.1" 200 OK
INFO:     127.0.0.1:53758 - "GET /v1/audio?path=C%3A%5CUsers%5Cadmin%5CACE-Step-1.5%5C.cache%5Cacestep%5Ctmp%5Capi_audio%5C021365d2-b6a9-ae59-abfe-37968224c7be.mp3 HTTP/1.1" 200 O


<!-- gh-comment-id:3856692539 --> @iChristGit commented on GitHub (Feb 5, 2026):

```
> ace-step-ui-server@1.0.0 dev
> tsx watch src/index.ts
Running SQLite database migrations...
Migrations completed successfully!
ACE-Step UI Server running on http://localhost:3001
Environment: development
ACE-Step API: http://localhost:8001
LAN access: http://100.106.209.57:3001
LAN access: http://192.168.0.169:3001
LAN access: http://172.19.32.1:3001
Initializing local storage provider
Job job_1770330447716_4njy6vk: Queued at position 1
[ACE-Step] API available at http://localhost:8001: true
Job job_1770330447716_4njy6vk: Using ACE-Step REST API { prompt: 'A Happy song about living in new york', duration: undefined }
Job job_1770330447716_4njy6vk: Submitted to API as task e7cc7425-f042-4af9-a33a-cb8ba6cd784f
Job job_1770330447716_4njy6vk: Completed via API with 1 audio files
Job job_1770330457573_iundqnv: Queued at position 1
[ACE-Step] API available at http://localhost:8001: true
Job job_1770330457573_iundqnv: Using ACE-Step REST API { prompt: 'A Happy song about living in new york', duration: undefined }
Job job_1770330457573_iundqnv: Submitted to API as task 9b44a46b-e0d1-42b3-a0d2-8049806ce91d
Job job_1770330457573_iundqnv: Completed via API with 1 audio files
Job job_1770330476354_ir11xpc: Queued at position 1
[ACE-Step] API available at http://localhost:8001: true
Job job_1770330476354_ir11xpc: Using ACE-Step REST API { prompt: 'A Happy song about living in new york', duration: undefined }
Job job_1770330476354_ir11xpc: Submitted to API as task 8d4e64a7-82bf-448f-9d1d-52cd97aabd7c
Job job_1770330476354_ir11xpc: Completed via API with 1 audio files
Job job_1770330492353_73lugs5: Queued at position 1
[ACE-Step] API available at http://localhost:8001: true
Job job_1770330492353_73lugs5: Using ACE-Step REST API { prompt: 'A Happy song about living in new york', duration: undefined }
Job job_1770330492353_73lugs5: Submitted to API as task 63414233-e262-4937-8cfd-dd14cdc42206
Job job_1770330492353_73lugs5: Completed via API with 1 audio files
[Format] Running: ..\ACE-Step-1.5\.venv\Scripts\python.exe C:\Users\admin\ace-step-ui\server\scripts\format_sample.py --caption Electronic, Hip Hop --json --temperature 0.8 --top-p 0.92 --lm-model acestep-5Hz-lm-0.6B --lm-backend pt
[Format] CWD: ..\ACE-Step-1.5
[Format] Spawn error: spawn ..\ACE-Step-1.5\.venv\Scripts\python.exe ENOENT
[Format] Python error: spawn ..\ACE-Step-1.5\.venv\Scripts\python.exe ENOENT
[Format] Process exited with code -4058
Uninstalled 1 package in 8ms
Installed 1 package in 41ms
Skipping import of cpp extensions due to incompatible torch version 2.7.1+cu128 for torchao version 0.15.0 Please see https://github.com/pytorch/ao/issues/2919 for more info
W0206 00:25:18.778000 31080 Lib\site-packages\torch\distributed\elastic\multiprocessing\redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.
INFO: Started server process [31080]
INFO: Waiting for application startup.
[API Server] Initializing models at startup...
============================================================
[API Server] GPU Configuration Detected:
============================================================
GPU Memory: 23.99 GB
Configuration Tier: tier6
Max Duration (with LM): 480s
Max Duration (without LM): 480s
Max Batch Size (with LM): 4
Max Batch Size (without LM): 8
Default LM Init: True
Available LM Models: ['acestep-5Hz-lm-0.6B', 'acestep-5Hz-lm-1.7B', 'acestep-5Hz-lm-4B']
============================================================
[API Server] CPU offload disabled by default (GPU >= 16GB)
[Model Download] Model acestep-v15-turbo already exists at C:\Users\admin\ACE-Step-1.5\checkpoints\acestep-v15-turbo
[Model Download] Model vae already exists at C:\Users\admin\ACE-Step-1.5\checkpoints\vae
[API Server] Loading primary DiT model: acestep-v15-turbo
2026-02-06 00:25:22.068 | INFO | acestep.handler:initialize_service:399 - [initialize_service] Attempting to load model with attention implementation: flash_attention_2
[API Server] Primary model loaded: acestep-v15-turbo
[API Server] GPU auto-detection: init_llm=True (VRAM: 24.0GB, tier: tier6)
[API Server] ACESTEP_INIT_LLM=auto, using GPU auto-detection result
[API Server] Loading LLM model...
[API Server] Auto-selected LM model: acestep-5Hz-lm-4B based on GPU tier
[Model Download] Model acestep-5Hz-lm-4B already exists at C:\Users\admin\ACE-Step-1.5\checkpoints\acestep-5Hz-lm-4B
2026-02-06 00:25:27.341 | INFO | acestep.llm_inference:initialize:361 - loading 5Hz LM tokenizer... it may take 80~90s
2026-02-06 00:25:47.566 | INFO | acestep.llm_inference:initialize:365 - 5Hz LM tokenizer loaded successfully in 20.22 seconds
2026-02-06 00:25:47.566 | INFO | acestep.llm_inference:initialize:370 - Initializing constrained decoding processor...
2026-02-06 00:25:47.567 | INFO | acestep.llm_inference:initialize:376 - Setting constrained decoding max_duration to 480s based on GPU config (tier: tier6)
2026-02-06 00:25:49.401 | WARNING | acestep.constrained_logits_processor:_precompute_audio_code_tokens:556 - Found 1535 audio code tokens with values outside valid range [0, 63999]
2026-02-06 00:25:53.428 | INFO | acestep.llm_inference:initialize:384 - Constrained processor initialized in 5.86 seconds
2026-02-06 00:25:54.282 | INFO | acestep.llm_inference:get_gpu_memory_utilization:102 - Adaptive LM memory allocation: model=C:\Users\admin\ACE-Step-1.5\checkpoints\acestep-5Hz-lm-4B, target=12.0GB, ratio=0.500, total_gpu=24.0GB
2026-02-06 00:25:54.282 | INFO | acestep.llm_inference:_initialize_5hz_lm_vllm:444 - Initializing 5Hz LM with model: C:\Users\admin\ACE-Step-1.5\checkpoints\acestep-5Hz-lm-4B, enforce_eager: False, tensor_parallel_size: 1, max_model_len: 4096, gpu_memory_utilization: 0.500
[debug]dist_port: 2333
[W206 00:25:54.000000000 socket.cpp:755] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:2333 (system error: 10049 - The requested address is not valid in its context.).
`torch_dtype` is deprecated! Use `dtype` instead!
[nanovllm] KV cache allocated: 125 blocks × 256 tokens = 32000 tokens capacity, 4.39 GB (free: 8.86 GB, used: 13.77 GB, target: 12.00 GB, block: 36.00 MB)
2026-02-06 00:26:23.723 | INFO | acestep.llm_inference:_initialize_5hz_lm_vllm:454 - 5Hz LM initialized successfully in 29.44 seconds
2026-02-06 00:26:23.724 | INFO | acestep.llm_inference:initialize:390 - 5Hz LM status message: ✅ 5Hz LM initialized successfully
Model: C:\Users\admin\ACE-Step-1.5\checkpoints\acestep-5Hz-lm-4B
Device: NVIDIA GeForce RTX 3090 Ti
GPU Memory Utilization: 0.500
Low GPU Memory Mode: False
[API Server] LLM model loaded: acestep-5Hz-lm-4B
[API Server] All models initialized successfully!
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8001 (Press CTRL+C to quit)
INFO: 127.0.0.1:53673 - "GET /health HTTP/1.1" 200 OK
INFO: 127.0.0.1:53675 - "POST /release_task HTTP/1.1" 200 OK
2026-02-06 00:27:27.760 | INFO | acestep.inference:generate_music:387 - [generate_music] LLM usage decision: thinking=False, use_cot_caption=False, use_cot_language=False, use_cot_metas=True, need_lm_for_cot=True, llm_initialized=True, use_lm=True
2026-02-06 00:27:27.760 | INFO | acestep.inference:generate_music:445 - LM chunk 1/1 (infer_type=dit) (size: 1, seeds: [3658063042])
2026-02-06 00:27:27.760 | INFO | acestep.llm_inference:generate_with_stop_condition:964 - Phase 1: Generating CoT metadata...
2026-02-06 00:27:27.781 | INFO | acestep.llm_inference:generate_with_stop_condition:970 - generate_with_stop_condition: formatted_prompt=<|im_start|>system
# Instruction
Generate audio semantic tokens based on the given conditions:

<|im_end|>
<|im_start|>user
# Caption
A Happy song about living in new york

# Lyric

<|im_end|>
<|im_start|>assistant

INFO: 127.0.0.1:53673 - "POST /query_result HTTP/1.1" 200 OK
Generating: 100%|███████████████████████████████████| 1/1 [00:01<00:00, 1.32s/steps, Prefill=110tok/s, Decode=40tok/s]
2026-02-06 00:27:29.103 | DEBUG | acestep.llm_inference:parse_lm_output:2257 - Debug output text: <think>
bpm: 58
duration: 24
keyscale: C major
timesignature: 2
<|im_end|>
2026-02-06 00:27:29.103 | INFO | acestep.llm_inference:generate_with_stop_condition:1012 - Phase 1 completed in 1.34s. Generated metadata: ['bpm', 'duration', 'keyscale', 'timesignature']
2026-02-06 00:27:29.103 | INFO | acestep.handler:generate_music:2782 - [generate_music] Starting generation...
2026-02-06 00:27:29.105 | INFO | acestep.handler:generate_music:2785 - [generate_music] Preparing inputs...
2026-02-06 00:27:29.142 | INFO | acestep.handler:_prepare_batch:1888 -
======================================================================
2026-02-06 00:27:29.144 | INFO | acestep.handler:_prepare_batch:1889 - 🔍 [DEBUG] DiT TEXT ENCODER INPUT (Inference)
2026-02-06 00:27:29.144 | INFO | acestep.handler:_prepare_batch:1890 - ======================================================================
2026-02-06 00:27:29.144 | INFO | acestep.handler:_prepare_batch:1891 - text_prompt:
# Instruction
Fill the audio semantic mask based on the given conditions:

# Caption
A Happy song about living in new york

# Metas
- bpm: 58
- timesignature: 2
- keyscale: C major
- duration: 24 seconds
<|endoftext|>

2026-02-06 00:27:29.144 | INFO | acestep.handler:_prepare_batch:1892 - ======================================================================

2026-02-06 00:27:29.158 | INFO | acestep.handler:preprocess_batch:2099 - [preprocess_batch] Inferring prompt embeddings...
2026-02-06 00:27:29.340 | INFO | acestep.handler:preprocess_batch:2102 - [preprocess_batch] Inferring lyric embeddings...
2026-02-06 00:27:29.341 | INFO | acestep.handler:service_generate:2335 - [service_generate] Generating audio...
INFO: 127.0.0.1:53673 - "POST /query_result HTTP/1.1" 200 OK
INFO: 127.0.0.1:53673 - "POST /query_result HTTP/1.1" 200 OK
2026-02-06 00:27:30.469 | INFO | acestep.handler:generate_music:2893 - [generate_music] Model generation completed. Decoding latents...
2026-02-06 00:27:30.482 | DEBUG | acestep.handler:generate_music:2897 - [generate_music] pred_latents: torch.Size([1, 600, 64]), dtype=torch.bfloat16 pred_latents.min()=tensor(-5.5938, device='cuda:0', dtype=torch.bfloat16), pred_latents.max()=tensor(3.8594, device='cuda:0', dtype=torch.bfloat16), pred_latents.mean()=tensor(-0.1455, device='cuda:0', dtype=torch.bfloat16) pred_latents.std()=tensor(1.0391, device='cuda:0', dtype=torch.bfloat16)
2026-02-06 00:27:30.484 | DEBUG | acestep.handler:generate_music:2898 - [generate_music] time_costs: {'encoder_time_cost': 0.0553889274597168, 'diffusion_time_cost': 0.9490585327148438, 'diffusion_per_step_time_cost': 0.11863231658935547, 'total_time_cost': 1.0044474601745605, 'offload_time_cost': 0.0}
2026-02-06 00:27:30.484 | INFO | acestep.handler:generate_music:2901 - [generate_music] Decoding latents with VAE...
2026-02-06 00:27:30.488 | DEBUG | acestep.handler:generate_music:2919 - [generate_music] Before VAE decode: allocated=18.19GB, max=18.23GB
2026-02-06 00:27:30.489 | INFO | acestep.handler:generate_music:2922 - [generate_music] Using tiled VAE decode to reduce VRAM usage...
Decoding audio chunks: 100%|██████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 8.53steps/s]
2026-02-06 00:27:30.794 | DEBUG | acestep.handler:generate_music:2929 - [generate_music] After VAE decode: allocated=18.35GB, max=19.48GB
2026-02-06 00:27:30.805 | INFO | acestep.handler:generate_music:2946 - [generate_music] VAE decode completed. Preparing audio tensors...
2026-02-06 00:27:30.805 | INFO | acestep.handler:generate_music:2961 - [generate_music] Done! Generated 1 audio tensors.
2026-02-06 00:27:31.077 | DEBUG | acestep.audio_utils:save_audio:125 - [AudioSaver] Fallback soundfile Saved audio to C:\Users\admin\ACE-Step-1.5\.cache\acestep\tmp\api_audio\97c51633-ee43-cb55-c551-7f118c9e8d03.mp3 (mp3, 48000Hz) INFO: 127.0.0.1:53673 - "POST /query_result HTTP/1.1" 200 OK INFO: 127.0.0.1:53673 - "POST /query_result HTTP/1.1" 200 OK INFO: 127.0.0.1:53687 - "GET /v1/audio?path=C%3A%5CUsers%5Cadmin%5CACE-Step-1.5%5C.cache%5Cacestep%5Ctmp%5Capi_audio%5C97c51633-ee43-cb55-c551-7f118c9e8d03.mp3 HTTP/1.1" 200 OK INFO: 127.0.0.1:53695 - "GET /health HTTP/1.1" 200 OK INFO: 127.0.0.1:53697 - "POST /release_task HTTP/1.1" 200 OK 2026-02-06 00:27:37.583 | INFO | acestep.inference:generate_music:387 - [generate_music] LLM usage decision: thinking=False, use_cot_caption=False, use_cot_language=False, use_cot_metas=True, need_lm_for_cot=True, llm_initialized=True, use_lm=True 2026-02-06 00:27:37.583 | INFO | acestep.inference:generate_music:445 - LM chunk 1/1 (infer_type=dit) (size: 1, seeds: [468283798]) 2026-02-06 00:27:37.583 | INFO | acestep.llm_inference:generate_with_stop_condition:964 - Phase 1: Generating CoT metadata... 
2026-02-06 00:27:37.586 | INFO | acestep.llm_inference:generate_with_stop_condition:970 - generate_with_stop_condition: formatted_prompt=<|im_start|>system # Instruction Generate audio semantic tokens based on the given conditions: <|im_end|> <|im_start|>user # Caption A Happy song about living in new york # Lyric <|im_end|> <|im_start|>assistant Generating: 0%| | 0/1 [00:00<?, ?steps/s]INFO: 127.0.0.1:53695 - "POST /query_result HTTP/1.1" 200 OK Generating: 100%|██████████████████████████████████| 1/1 [00:00<00:00, 1.55steps/s, Prefill=1368tok/s, Decode=47tok/s] 2026-02-06 00:27:38.232 | DEBUG | acestep.llm_inference:parse_lm_output:2257 - Debug output text: <think> bpm: 58 duration: 32 keyscale: B♭ major timesignature: 2 <|im_end|> 2026-02-06 00:27:38.232 | INFO | acestep.llm_inference:generate_with_stop_condition:1012 - Phase 1 completed in 0.65s. Generated metadata: ['bpm', 'duration', 'keyscale', 'timesignature'] 2026-02-06 00:27:38.232 | INFO | acestep.handler:generate_music:2782 - [generate_music] Starting generation... 2026-02-06 00:27:38.232 | INFO | acestep.handler:generate_music:2785 - [generate_music] Preparing inputs... 
2026-02-06 00:27:38.243 | INFO | acestep.handler:_prepare_batch:1888 - ====================================================================== 2026-02-06 00:27:38.243 | INFO | acestep.handler:_prepare_batch:1889 - 🔍 [DEBUG] DiT TEXT ENCODER INPUT (Inference) 2026-02-06 00:27:38.243 | INFO | acestep.handler:_prepare_batch:1890 - ====================================================================== 2026-02-06 00:27:38.243 | INFO | acestep.handler:_prepare_batch:1891 - text_prompt: # Instruction Fill the audio semantic mask based on the given conditions: # Caption A Happy song about living in new york # Metas - bpm: 58 - timesignature: 2 - keyscale: B♭ major - duration: 32 seconds <|endoftext|> 2026-02-06 00:27:38.243 | INFO | acestep.handler:_prepare_batch:1892 - ====================================================================== 2026-02-06 00:27:38.251 | INFO | acestep.handler:preprocess_batch:2099 - [preprocess_batch] Inferring prompt embeddings... 2026-02-06 00:27:38.285 | INFO | acestep.handler:preprocess_batch:2102 - [preprocess_batch] Inferring lyric embeddings... 2026-02-06 00:27:38.285 | INFO | acestep.handler:service_generate:2335 - [service_generate] Generating audio... 2026-02-06 00:27:38.960 | INFO | acestep.handler:generate_music:2893 - [generate_music] Model generation completed. Decoding latents... 
2026-02-06 00:27:38.964 | DEBUG | acestep.handler:generate_music:2897 - [generate_music] pred_latents: torch.Size([1, 800, 64]), dtype=torch.bfloat16 pred_latents.min()=tensor(-6.7812, device='cuda:0', dtype=torch.bfloat16), pred_latents.max()=tensor(4.2500, device='cuda:0', dtype=torch.bfloat16), pred_latents.mean()=tensor(-0.1040, device='cuda:0', dtype=torch.bfloat16) pred_latents.std()=tensor(1.0625, device='cuda:0', dtype=torch.bfloat16) 2026-02-06 00:27:38.964 | DEBUG | acestep.handler:generate_music:2898 - [generate_music] time_costs: {'encoder_time_cost': 0.039624691009521484, 'diffusion_time_cost': 0.6022460460662842, 'diffusion_per_step_time_cost': 0.07528075575828552, 'total_time_cost': 0.6418707370758057, 'offload_time_cost': 0.0} 2026-02-06 00:27:38.964 | INFO | acestep.handler:generate_music:2901 - [generate_music] Decoding latents with VAE... 2026-02-06 00:27:38.968 | DEBUG | acestep.handler:generate_music:2919 - [generate_music] Before VAE decode: allocated=18.35GB, max=19.48GB 2026-02-06 00:27:38.968 | INFO | acestep.handler:generate_music:2922 - [generate_music] Using tiled VAE decode to reduce VRAM usage... Decoding audio chunks: 100%|██████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 9.24steps/s] 2026-02-06 00:27:39.340 | DEBUG | acestep.handler:generate_music:2929 - [generate_music] After VAE decode: allocated=18.35GB, max=19.56GB 2026-02-06 00:27:39.354 | INFO | acestep.handler:generate_music:2946 - [generate_music] VAE decode completed. Preparing audio tensors... 2026-02-06 00:27:39.354 | INFO | acestep.handler:generate_music:2961 - [generate_music] Done! Generated 1 audio tensors. 
2026-02-06 00:27:39.573 | DEBUG | acestep.audio_utils:save_audio:125 - [AudioSaver] Fallback soundfile Saved audio to C:\Users\admin\ACE-Step-1.5\.cache\acestep\tmp\api_audio\5f8d6115-efa9-f5d5-2300-920dea7b33f0.mp3 (mp3, 48000Hz) INFO: 127.0.0.1:53695 - "POST /query_result HTTP/1.1" 200 OK INFO: 127.0.0.1:53697 - "GET /v1/audio?path=C%3A%5CUsers%5Cadmin%5CACE-Step-1.5%5C.cache%5Cacestep%5Ctmp%5Capi_audio%5C5f8d6115-efa9-f5d5-2300-920dea7b33f0.mp3 HTTP/1.1" 200 OK INFO: 127.0.0.1:53695 - "POST /query_result HTTP/1.1" 200 OK INFO: 127.0.0.1:53713 - "GET /health HTTP/1.1" 200 OK INFO: 127.0.0.1:53715 - "POST /release_task HTTP/1.1" 200 OK 2026-02-06 00:27:56.361 | INFO | acestep.inference:generate_music:387 - [generate_music] LLM usage decision: thinking=True, use_cot_caption=True, use_cot_language=True, use_cot_metas=True, need_lm_for_cot=True, llm_initialized=True, use_lm=True 2026-02-06 00:27:56.361 | INFO | acestep.inference:generate_music:445 - LM chunk 1/1 (infer_type=llm_dit) (size: 1, seeds: [332563508]) 2026-02-06 00:27:56.361 | INFO | acestep.llm_inference:generate_with_stop_condition:964 - Phase 1: Generating CoT metadata... 
2026-02-06 00:27:56.362 | INFO | acestep.llm_inference:generate_with_stop_condition:970 - generate_with_stop_condition: formatted_prompt=<|im_start|>system # Instruction Generate audio semantic tokens based on the given conditions: <|im_end|> <|im_start|>user # Caption A Happy song about living in new york # Lyric <|im_end|> <|im_start|>assistant INFO: 127.0.0.1:53713 - "POST /query_result HTTP/1.1" 200 OK Generating: 0%| | 0/1 [00:02<?, ?steps/s, Prefill=645tok/s, Decode=54tok/s]INFO: 127.0.0.1:53713 - "POST /query_result HTTP/1.1" 200 OK Generating: 0%| | 0/1 [00:02<?, ?steps/s, Prefill=645tok/s, Decode=45tok/s]INFO: 127.0.0.1:53713 - "POST /query_result HTTP/1.1" 200 OK Generating: 100%|███████████████████████████████████| 1/1 [00:02<00:00, 2.78s/steps, Prefill=645tok/s, Decode=40tok/s] 2026-02-06 00:27:59.142 | DEBUG | acestep.llm_inference:parse_lm_output:2257 - Debug output text: <think> bpm: 58 caption: A cheerful and lighthearted ukulele-led instrumental. The ukulele plays a bright, catchy melody over a simple, steady chord progression. It's supported by a clean, straightforward drum machine beat and a subtle bass line that follows the root notes. The overall production is clean and simple, creating an upbeat, happy, and carefree mood perfect for children's content or positive background music. duration: 35 keyscale: D major language: unknown timesignature: 2 <|im_end|> 2026-02-06 00:27:59.142 | INFO | acestep.llm_inference:generate_with_stop_condition:1012 - Phase 1 completed in 2.78s. Generated metadata: ['bpm', 'caption', 'duration', 'keyscale', 'language', 'timesignature'] 2026-02-06 00:27:59.142 | INFO | acestep.llm_inference:generate_with_stop_condition:1055 - Phase 2: Generating audio codes... 
2026-02-06 00:27:59.144 | INFO | acestep.llm_inference:generate_with_stop_condition:1063 - generate_with_stop_condition: formatted_prompt_with_cot=<|im_start|>system # Instruction Generate audio semantic tokens based on the given conditions: <|im_end|> <|im_start|>user # Caption A Happy song about living in new york # Lyric <|im_end|> <|im_start|>assistant <think> bpm: 58 caption: A cheerful and lighthearted ukulele-led instrumental. The ukulele plays a bright, catchy melody over a simple, steady chord progression. It's supported by a clean, straightforward drum machine beat and a subtle bass line that follows the root notes. The overall production is clean and simple, creating an upbeat, happy, and carefree mood perfect for children's content or positive background music. duration: 35 keyscale: D major language: unknown timesignature: 2 </think> <|im_end|> Generating: 0%| | 0/1 [00:01<?, ?steps/s, Prefill=3903tok/s, Decode=56tok/s]INFO: 127.0.0.1:53713 - "POST /query_result HTTP/1.1" 200 OK Generating: 0%| | 0/1 [00:01<?, ?steps/s, Prefill=3903tok/s, Decode=47tok/s]INFO: 127.0.0.1:53713 - "POST /query_result HTTP/1.1" 200 OK Generating: 0%| | 0/1 [00:03<?, ?steps/s, Prefill=3903tok/s, Decode=44tok/s]INFO: 127.0.0.1:53713 - "POST /query_result HTTP/1.1" 200 OK Generating: 0%| | 0/1 [00:03<?, ?steps/s, Prefill=3903tok/s, Decode=45tok/s]INFO: 127.0.0.1:53713 - "POST /query_result HTTP/1.1" 200 OK Generating: 100%|██████████████████████████████████| 1/1 [00:03<00:00, 3.54s/steps, Prefill=3903tok/s, Decode=50tok/s] 2026-02-06 00:28:02.690 | DEBUG | acestep.llm_inference:parse_lm_output:2257 - Debug output text: 
<|audio_code_43316|><|audio_code_10783|><|audio_code_38708|><|audio_code_58280|><|audio_code_45529|><|audio_code_10053|><|audio_code_18575|><|audio_code_16075|><|audio_code_58518|><|audio_code_39086|><|audio_code_37378|><|audio_code_37426|><|audio_code_61195|><|audio_code_53653|><|audio_code_39219|><|audio_code_40277|><|audio_code_29451|><|audio_code_55805|><|audio_code_12237|><|audio_code_33730|><|audio_code_28875|><|audio_code_3104|><|audio_code_38424|><|audio_code_28668|><|audio_code_2391|><|audio_code_11748|><|audio_code_4871|><|audio_code_31375|><|audio_code_28992|><|audio_code_37896|><|audio_code_35580|><|audio_code_23663|><|audio_code_58374|><|audio_code_26772|><|audio_code_51501|><|audio_code_53237|><|audio_code_10173|><|audio_code_58851|><|audio_code_55637|><|audio_code_17348|><|audio_code_8700|><|audio_code_821|><|audio_code_48799|><|audio_code_21380|><|audio_code_63945|><|audio_code_30040|><|audio_code_45921|><|audio_code_12664|><|audio_code_46040|><|audio_code_8117|><|audio_code_49311|><|audio_code_4868|><|audio_code_56124|><|audio_code_40828|><|audio_code_52173|><|audio_code_27507|><|audio_code_4759|><|audio_code_13871|><|audio_code_60934|><|audio_code_26373|><|audio_code_30021|><|audio_code_19965|><|audio_code_59673|><|audio_code_40464|><|audio_code_32520|><|audio_code_1815|><|audio_code_23419|><|audio_code_33272|><|audio_code_7992|><|audio_code_61274|><|audio_code_22675|><|audio_code_49354|><|audio_code_2303|><|audio_code_18567|><|audio_code_17302|><|audio_code_14221|><|audio_code_31717|><|audio_code_50973|><|audio_code_51181|><|audio_code_40956|><|audio_code_51133|><|audio_code_49098|><|audio_code_53606|><|audio_code_45118|><|audio_code_33503|><|audio_code_47365|><|audio_code_59256|><|audio_code_11762|><|audio_code_22791|><|audio_code_44775|><|audio_code_189|><|audio_code_24167|><|audio_code_22789|><|audio_code_34|><|audio_code_2603|><|audio_code_2326|><|audio_code_20459|><|audio_code_19907|><|audio_code_13619|><|audio_code_10555|><|audio_code_63917|
><|audio_code_27588|><|audio_code_62953|><|audio_code_10613|><|audio_code_5168|><|audio_code_49480|><|audio_code_50138|><|audio_code_15846|><|audio_code_39232|><|audio_code_63783|><|audio_code_28999|><|audio_code_25528|><|audio_code_11832|><|audio_code_53383|><|audio_code_14382|><|audio_code_46084|><|audio_code_36852|><|audio_code_61188|><|audio_code_12719|><|audio_code_12196|><|audio_code_42908|><|audio_code_39188|><|audio_code_33664|><|audio_code_5620|><|audio_code_38254|><|audio_code_9407|><|audio_code_40866|><|audio_code_45456|><|audio_code_26376|><|audio_code_44520|><|audio_code_28408|><|audio_code_35819|><|audio_code_5486|><|audio_code_41623|><|audio_code_37091|><|audio_code_21275|><|audio_code_18387|><|audio_code_14325|><|audio_code_18842|><|audio_code_45759|><|audio_code_13351|><|audio_code_58950|><|audio_code_40916|><|audio_code_24042|><|audio_code_430|><|audio_code_11833|><|audio_code_5688|><|audio_code_17856|><|audio_code_52878|><|audio_code_7666|><|audio_code_18534|><|audio_code_45575|><|audio_code_38676|><|audio_code_21140|><|audio_code_21196|><|audio_code_50495|><|audio_code_4799|><|audio_code_16125|><|audio_code_43516|><|audio_code_35855|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_48839|><|im_end|> 2026-02-06 00:28:02.690 | INFO | acestep.llm_inference:generate_with_stop_condition:1203 - Phase 2 completed in 3.55s. Generated 171 audio codes 2026-02-06 00:28:02.691 | INFO | acestep.handler:generate_music:2782 - [generate_music] Starting generation... 2026-02-06 00:28:02.691 | INFO | acestep.handler:generate_music:2785 - [generate_music] Preparing inputs... 2026-02-06 00:28:02.700 | INFO | acestep.handler:_prepare_batch:1659 - [generate_music] Decoding audio codes for item 0... 
2026-02-06 00:28:02.813 | INFO | acestep.handler:_prepare_batch:1831 - [generate_music] Decoding audio codes for LM hints for item 0... 2026-02-06 00:28:02.817 | INFO | acestep.handler:_prepare_batch:1888 - ====================================================================== 2026-02-06 00:28:02.817 | INFO | acestep.handler:_prepare_batch:1889 - 🔍 [DEBUG] DiT TEXT ENCODER INPUT (Inference) 2026-02-06 00:28:02.817 | INFO | acestep.handler:_prepare_batch:1890 - ====================================================================== 2026-02-06 00:28:02.817 | INFO | acestep.handler:_prepare_batch:1891 - text_prompt: # Instruction Generate audio semantic tokens based on the given conditions: # Caption A cheerful and lighthearted ukulele-led instrumental. The ukulele plays a bright, catchy melody over a simple, steady chord progression. It's supported by a clean, straightforward drum machine beat and a subtle bass line that follows the root notes. The overall production is clean and simple, creating an upbeat, happy, and carefree mood perfect for children's content or positive background music. # Metas - bpm: 58 - timesignature: 2 - keyscale: D major - duration: 35 seconds <|endoftext|> 2026-02-06 00:28:02.817 | INFO | acestep.handler:_prepare_batch:1892 - ====================================================================== 2026-02-06 00:28:02.824 | INFO | acestep.handler:preprocess_batch:2099 - [preprocess_batch] Inferring prompt embeddings... 2026-02-06 00:28:02.861 | INFO | acestep.handler:preprocess_batch:2102 - [preprocess_batch] Inferring lyric embeddings... 2026-02-06 00:28:02.861 | INFO | acestep.handler:service_generate:2335 - [service_generate] Generating audio... Using precomputed LM hints Using precomputed LM hints 2026-02-06 00:28:03.500 | INFO | acestep.handler:generate_music:2893 - [generate_music] Model generation completed. Decoding latents... 
2026-02-06 00:28:03.505 | DEBUG | acestep.handler:generate_music:2897 - [generate_music] pred_latents: torch.Size([1, 855, 64]), dtype=torch.bfloat16 pred_latents.min()=tensor(-7.1250, device='cuda:0', dtype=torch.bfloat16), pred_latents.max()=tensor(4.8750, device='cuda:0', dtype=torch.bfloat16), pred_latents.mean()=tensor(-0.1079, device='cuda:0', dtype=torch.bfloat16) pred_latents.std()=tensor(1.1016, device='cuda:0', dtype=torch.bfloat16) 2026-02-06 00:28:03.505 | DEBUG | acestep.handler:generate_music:2898 - [generate_music] time_costs: {'encoder_time_cost': 0.026651620864868164, 'diffusion_time_cost': 0.5836784839630127, 'diffusion_per_step_time_cost': 0.07295981049537659, 'total_time_cost': 0.6103301048278809, 'offload_time_cost': 0.0} 2026-02-06 00:28:03.505 | INFO | acestep.handler:generate_music:2901 - [generate_music] Decoding latents with VAE... 2026-02-06 00:28:03.507 | DEBUG | acestep.handler:generate_music:2919 - [generate_music] Before VAE decode: allocated=18.35GB, max=19.56GB 2026-02-06 00:28:03.507 | INFO | acestep.handler:generate_music:2922 - [generate_music] Using tiled VAE decode to reduce VRAM usage... Decoding audio chunks: 100%|██████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 8.41steps/s] 2026-02-06 00:28:03.902 | DEBUG | acestep.handler:generate_music:2929 - [generate_music] After VAE decode: allocated=18.35GB, max=19.64GB 2026-02-06 00:28:03.914 | INFO | acestep.handler:generate_music:2946 - [generate_music] VAE decode completed. Preparing audio tensors... 2026-02-06 00:28:03.914 | INFO | acestep.handler:generate_music:2961 - [generate_music] Done! Generated 1 audio tensors. 
2026-02-06 00:28:04.188 | DEBUG | acestep.audio_utils:save_audio:125 - [AudioSaver] Fallback soundfile Saved audio to C:\Users\admin\ACE-Step-1.5\.cache\acestep\tmp\api_audio\33a9ab72-1593-e2ee-ee2c-998fe63107a4.mp3 (mp3, 48000Hz) INFO: 127.0.0.1:53713 - "POST /query_result HTTP/1.1" 200 OK INFO: 127.0.0.1:53729 - "GET /v1/audio?path=C%3A%5CUsers%5Cadmin%5CACE-Step-1.5%5C.cache%5Cacestep%5Ctmp%5Capi_audio%5C33a9ab72-1593-e2ee-ee2c-998fe63107a4.mp3 HTTP/1.1" 200 OK INFO: 127.0.0.1:53742 - "GET /health HTTP/1.1" 200 OK INFO: 127.0.0.1:53744 - "POST /release_task HTTP/1.1" 200 OK 2026-02-06 00:28:12.360 | INFO | acestep.inference:generate_music:387 - [generate_music] LLM usage decision: thinking=True, use_cot_caption=True, use_cot_language=True, use_cot_metas=True, need_lm_for_cot=True, llm_initialized=True, use_lm=True 2026-02-06 00:28:12.360 | INFO | acestep.inference:generate_music:445 - LM chunk 1/1 (infer_type=llm_dit) (size: 1, seeds: [3883759793]) 2026-02-06 00:28:12.360 | INFO | acestep.llm_inference:generate_with_stop_condition:964 - Phase 1: Generating CoT metadata... 
2026-02-06 00:28:12.362 | INFO | acestep.llm_inference:generate_with_stop_condition:970 - generate_with_stop_condition: formatted_prompt=<|im_start|>system # Instruction Generate audio semantic tokens based on the given conditions: <|im_end|> <|im_start|>user # Caption A Happy song about living in new york # Lyric <|im_end|> <|im_start|>assistant INFO: 127.0.0.1:53742 - "POST /query_result HTTP/1.1" 200 OK Generating: 0%| | 0/1 [00:01<?, ?steps/s, Prefill=1024tok/s, Decode=53tok/s]INFO: 127.0.0.1:53742 - "POST /query_result HTTP/1.1" 200 OK Generating: 0%| | 0/1 [00:02<?, ?steps/s, Prefill=1024tok/s, Decode=44tok/s]INFO: 127.0.0.1:53742 - "POST /query_result HTTP/1.1" 200 OK Generating: 100%|██████████████████████████████████| 1/1 [00:02<00:00, 2.04s/steps, Prefill=1024tok/s, Decode=48tok/s] 2026-02-06 00:28:14.408 | DEBUG | acestep.llm_inference:parse_lm_output:2257 - Debug output text: <think> bpm: 53 caption: A brief, upbeat jazz trio performance featuring a bright acoustic piano playing a lively, swinging melodic phrase. It's supported by a walking upright bass line and the subtle, rhythmic shuffle of brushed drums, creating a classic, sophisticated lounge jazz feel. duration: 76 keyscale: G major language: unknown timesignature: 2 <|im_end|> 2026-02-06 00:28:14.408 | INFO | acestep.llm_inference:generate_with_stop_condition:1012 - Phase 1 completed in 2.05s. Generated metadata: ['bpm', 'caption', 'duration', 'keyscale', 'language', 'timesignature'] 2026-02-06 00:28:14.409 | INFO | acestep.llm_inference:generate_with_stop_condition:1055 - Phase 2: Generating audio codes... 
2026-02-06 00:28:14.410 | INFO | acestep.llm_inference:generate_with_stop_condition:1063 - generate_with_stop_condition: formatted_prompt_with_cot=<|im_start|>system # Instruction Generate audio semantic tokens based on the given conditions: <|im_end|> <|im_start|>user # Caption A Happy song about living in new york # Lyric <|im_end|> <|im_start|>assistant <think> bpm: 53 caption: A brief, upbeat jazz trio performance featuring a bright acoustic piano playing a lively, swinging melodic phrase. It's supported by a walking upright bass line and the subtle, rhythmic shuffle of brushed drums, creating a classic, sophisticated lounge jazz feel. duration: 76 keyscale: G major language: unknown timesignature: 2 </think> <|im_end|> Generating: 0%| | 0/1 [00:01<?, ?steps/s, Prefill=4618tok/s, Decode=56tok/s]INFO: 127.0.0.1:53742 - "POST /query_result HTTP/1.1" 200 OK Generating: 0%| | 0/1 [00:01<?, ?steps/s, Prefill=4618tok/s, Decode=48tok/s]INFO: 127.0.0.1:53742 - "POST /query_result HTTP/1.1" 200 OK Generating: 0%| | 0/1 [00:03<?, ?steps/s, Prefill=4618tok/s, Decode=53tok/s]INFO: 127.0.0.1:53742 - "POST /query_result HTTP/1.1" 200 OK Generating: 0%| | 0/1 [00:03<?, ?steps/s, Prefill=4618tok/s, Decode=55tok/s]INFO: 127.0.0.1:53742 - "POST /query_result HTTP/1.1" 200 OK Generating: 0%| | 0/1 [00:05<?, ?steps/s, Prefill=4618tok/s, Decode=51tok/s]INFO: 127.0.0.1:53742 - "POST /query_result HTTP/1.1" 200 OK Generating: 0%| | 0/1 [00:06<?, ?steps/s, Prefill=4618tok/s, Decode=51tok/s]INFO: 127.0.0.1:53742 - "POST /query_result HTTP/1.1" 200 OK Generating: 100%|██████████████████████████████████| 1/1 [00:07<00:00, 7.32s/steps, Prefill=4618tok/s, Decode=50tok/s] 2026-02-06 00:28:21.735 | DEBUG | acestep.llm_inference:parse_lm_output:2257 - Debug output text: 
<|audio_code_43316|><|audio_code_12614|><|audio_code_16007|><|audio_code_57876|><|audio_code_744|><|audio_code_28664|><|audio_code_38975|><|audio_code_19458|><|audio_code_6679|><|audio_code_11980|><|audio_code_37098|><|audio_code_1655|><|audio_code_38250|><|audio_code_41018|><|audio_code_11154|><|audio_code_23904|><|audio_code_33333|><|audio_code_1233|><|audio_code_6649|><|audio_code_63340|><|audio_code_26243|><|audio_code_26426|><|audio_code_12779|><|audio_code_7701|><|audio_code_40704|><|audio_code_40746|><|audio_code_50047|><|audio_code_16487|><|audio_code_6153|><|audio_code_6906|><|audio_code_7427|><|audio_code_7482|><|audio_code_38395|><|audio_code_49949|><|audio_code_12718|><|audio_code_53522|><|audio_code_25028|><|audio_code_35584|><|audio_code_34323|><|audio_code_38779|><|audio_code_4160|><|audio_code_2821|><|audio_code_55358|><|audio_code_31251|><|audio_code_10101|><|audio_code_25532|><|audio_code_21696|><|audio_code_54064|><|audio_code_31762|><|audio_code_14900|><|audio_code_47124|><|audio_code_15419|><|audio_code_33722|><|audio_code_14292|><|audio_code_25536|><|audio_code_14788|><|audio_code_55859|><|audio_code_1167|><|audio_code_53292|><|audio_code_15110|><|audio_code_10439|><|audio_code_54087|><|audio_code_54127|><|audio_code_17884|><|audio_code_7335|><|audio_code_36883|><|audio_code_4797|><|audio_code_19047|><|audio_code_9597|><|audio_code_31864|><|audio_code_18990|><|audio_code_46287|><|audio_code_2823|><|audio_code_5359|><|audio_code_333|><|audio_code_38796|><|audio_code_53637|><|audio_code_13250|><|audio_code_53725|><|audio_code_2758|><|audio_code_17480|><|audio_code_48374|><|audio_code_40618|><|audio_code_48256|><|audio_code_35520|><|audio_code_20160|><|audio_code_17664|><|audio_code_19047|><|audio_code_20613|><|audio_code_27359|><|audio_code_53233|><|audio_code_51156|><|audio_code_13351|><|audio_code_20158|><|audio_code_63539|><|audio_code_31250|><|audio_code_4723|><|audio_code_58132|><|audio_code_11140|><|audio_code_18581|><|audio_code_8736|><|au
dio_code_32612|><|audio_code_23183|><|audio_code_6656|><|audio_code_20165|><|audio_code_11282|><|audio_code_18984|><|audio_code_19176|><|audio_code_6304|><|audio_code_52072|><|audio_code_3102|><|audio_code_17215|><|audio_code_13079|><|audio_code_12264|><|audio_code_2846|><|audio_code_32319|><|audio_code_59085|><|audio_code_47115|><|audio_code_12859|><|audio_code_60444|><|audio_code_16002|><|audio_code_28132|><|audio_code_53502|><|audio_code_7168|><|audio_code_61204|><|audio_code_38975|><|audio_code_58368|><|audio_code_2710|><|audio_code_16896|><|audio_code_19456|><|audio_code_6400|><|audio_code_4497|><|audio_code_39481|><|audio_code_45639|><|audio_code_23829|><|audio_code_50013|><|audio_code_28682|><|audio_code_51952|><|audio_code_52766|><|audio_code_62240|><|audio_code_15370|><|audio_code_25011|><|audio_code_33144|><|audio_code_44562|><|audio_code_40785|><|audio_code_63516|><|audio_code_41538|><|audio_code_2293|><|audio_code_17472|><|audio_code_22535|><|audio_code_18470|><|audio_code_58237|><|audio_code_19952|><|audio_code_16432|><|audio_code_26671|><|audio_code_45575|><|audio_code_10981|><|audio_code_17342|><|audio_code_61827|><|audio_code_7000|><|audio_code_27712|><|audio_code_7427|><|audio_code_54838|><|audio_code_19456|><|audio_code_22741|><|audio_code_59590|><|audio_code_51744|><|audio_code_244|><|audio_code_19976|><|audio_code_20096|><|audio_code_4473|><|audio_code_21022|><|audio_code_37222|><|audio_code_3112|><|audio_code_23963|><|audio_code_36772|><|audio_code_15418|><|audio_code_30416|><|audio_code_2944|><|audio_code_63238|><|audio_code_12802|><|audio_code_3195|><|audio_code_12251|><|audio_code_53558|><|audio_code_2063|><|audio_code_47123|><|audio_code_4798|><|audio_code_19055|><|audio_code_18117|><|audio_code_47160|><|audio_code_16896|><|audio_code_2756|><|audio_code_15826|><|audio_code_24554|><|audio_code_11226|><|audio_code_46968|><|audio_code_51887|><|audio_code_55484|><|audio_code_18463|><|audio_code_42300|><|audio_code_26608|><|audio_code_30770|><|au
dio_code_63531|><|audio_code_62555|><|audio_code_39498|><|audio_code_25922|><|audio_code_550|><|audio_code_40164|><|audio_code_63996|><|audio_code_28690|><|audio_code_19704|><|audio_code_5115|><|audio_code_40773|><|audio_code_27719|><|audio_code_58368|><|audio_code_6748|><|audio_code_49346|><|audio_code_19990|><|audio_code_1165|><|audio_code_23095|><|audio_code_17430|><|audio_code_25463|><|audio_code_22574|><|audio_code_50751|><|audio_code_52894|><|audio_code_25088|><|audio_code_5561|><|audio_code_60452|><|audio_code_15428|><|audio_code_16576|><|audio_code_13224|><|audio_code_15418|><|audio_code_60451|><|audio_code_28746|><|audio_code_63667|><|audio_code_18891|><|audio_code_1419|><|audio_code_50162|><|audio_code_7653|><|audio_code_59923|><|audio_code_27799|><|audio_code_44580|><|audio_code_2287|><|audio_code_14583|><|audio_code_37311|><|audio_code_32695|><|audio_code_55789|><|audio_code_57885|><|audio_code_13314|><|audio_code_16021|><|audio_code_32319|><|audio_code_11644|><|audio_code_42089|><|audio_code_45119|><|audio_code_51383|><|audio_code_26871|><|audio_code_14647|><|audio_code_27638|><|audio_code_22461|><|audio_code_27573|><|audio_code_11236|><|audio_code_2957|><|audio_code_5837|><|audio_code_21624|><|audio_code_56062|><|audio_code_14929|><|audio_code_20032|><|audio_code_7424|><|audio_code_6336|><|audio_code_6456|><|audio_code_45119|><|audio_code_25463|><|audio_code_14039|><|audio_code_1976|><|audio_code_5406|><|audio_code_29759|><|audio_code_59086|><|audio_code_47123|><|audio_code_12859|><|audio_code_60508|><|audio_code_15427|><|audio_code_23019|><|audio_code_63269|><|audio_code_3104|><|audio_code_63773|><|audio_code_38975|><|audio_code_18944|><|audio_code_2702|><|audio_code_3616|><|audio_code_3616|><|audio_code_52384|><|audio_code_62240|><|audio_code_36328|><|audio_code_13871|><|audio_code_7503|><|audio_code_12695|><|audio_code_63992|><|audio_code_26791|><|audio_code_58375|><|audio_code_63196|><|audio_code_15362|><|audio_code_63020|><|audio_code_31755|><|aud
io_code_2220|><|audio_code_61325|><|audio_code_2813|><|audio_code_19456|><|audio_code_5214|><|audio_code_4286|><|audio_code_21560|><|audio_code_18645|><|audio_code_21544|><|audio_code_19048|><|audio_code_24360|><|audio_code_63548|><|audio_code_55998|><|audio_code_18637|><|audio_code_6297|><|audio_code_58382|><|audio_code_25006|><|audio_code_60935|><|audio_code_614|><|audio_code_45374|><|audio_code_62356|><|audio_code_14547|><|audio_code_43066|><|audio_code_19975|><|audio_code_23268|><|audio_code_18709|><|audio_code_7232|><|audio_code_42047|><|audio_code_32318|><|audio_code_61461|><|audio_code_1255|><|audio_code_20412|><|audio_code_23531|><|audio_code_63035|><|audio_code_18453|><|audio_code_18566|><|audio_code_50470|><|audio_code_3104|><|audio_code_543|><|audio_code_15143|><|audio_code_61446|><|audio_code_14575|><|audio_code_45829|><|audio_code_8540|><|audio_code_15378|><|audio_code_36980|><|audio_code_10782|><|audio_code_23757|><|audio_code_24403|><|audio_code_32314|><|audio_code_61063|><|audio_code_23590|><|audio_code_17790|><|audio_code_35855|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_35847|><|audio_code_62151|><|im_end|> 2026-02-06 00:28:21.735 | INFO | acestep.llm_inference:generate_with_stop_condition:1203 - Phase 2 completed in 7.33s. Generated 374 audio codes 2026-02-06 00:28:21.735 | INFO | acestep.handler:generate_music:2782 - [generate_music] Starting generation... 2026-02-06 00:28:21.735 | INFO | acestep.handler:generate_music:2785 - [generate_music] Preparing inputs... 2026-02-06 00:28:21.752 | INFO | acestep.handler:_prepare_batch:1659 - [generate_music] Decoding audio codes for item 0... 
2026-02-06 00:28:21.779 | INFO | acestep.handler:_prepare_batch:1831 - [generate_music] Decoding audio codes for LM hints for item 0... 2026-02-06 00:28:21.783 | INFO | acestep.handler:_prepare_batch:1888 - ====================================================================== 2026-02-06 00:28:21.783 | INFO | acestep.handler:_prepare_batch:1889 - 🔍 [DEBUG] DiT TEXT ENCODER INPUT (Inference) 2026-02-06 00:28:21.783 | INFO | acestep.handler:_prepare_batch:1890 - ====================================================================== 2026-02-06 00:28:21.784 | INFO | acestep.handler:_prepare_batch:1891 - text_prompt: # Instruction Generate audio semantic tokens based on the given conditions: # Caption A brief, upbeat jazz trio performance featuring a bright acoustic piano playing a lively, swinging melodic phrase. It's supported by a walking upright bass line and the subtle, rhythmic shuffle of brushed drums, creating a classic, sophisticated lounge jazz feel. # Metas - bpm: 53 - timesignature: 2 - keyscale: G major - duration: 76 seconds <|endoftext|> 2026-02-06 00:28:21.784 | INFO | acestep.handler:_prepare_batch:1892 - ====================================================================== 2026-02-06 00:28:21.798 | INFO | acestep.handler:preprocess_batch:2099 - [preprocess_batch] Inferring prompt embeddings... 2026-02-06 00:28:21.832 | INFO | acestep.handler:preprocess_batch:2102 - [preprocess_batch] Inferring lyric embeddings... 2026-02-06 00:28:21.832 | INFO | acestep.handler:service_generate:2335 - [service_generate] Generating audio... Using precomputed LM hints Using precomputed LM hints INFO: 127.0.0.1:53742 - "POST /query_result HTTP/1.1" 200 OK INFO: 127.0.0.1:53742 - "POST /query_result HTTP/1.1" 200 OK 2026-02-06 00:28:22.920 | INFO | acestep.handler:generate_music:2893 - [generate_music] Model generation completed. Decoding latents... 
2026-02-06 00:28:22.928 | DEBUG | acestep.handler:generate_music:2897 - [generate_music] pred_latents: torch.Size([1, 1870, 64]), dtype=torch.bfloat16 pred_latents.min()=tensor(-6.8438, device='cuda:0', dtype=torch.bfloat16), pred_latents.max()=tensor(4.7500, device='cuda:0', dtype=torch.bfloat16), pred_latents.mean()=tensor(-0.0330, device='cuda:0', dtype=torch.bfloat16) pred_latents.std()=tensor(1.0938, device='cuda:0', dtype=torch.bfloat16) 2026-02-06 00:28:22.929 | DEBUG | acestep.handler:generate_music:2898 - [generate_music] time_costs: {'encoder_time_cost': 0.024573087692260742, 'diffusion_time_cost': 1.036590814590454, 'diffusion_per_step_time_cost': 0.12957385182380676, 'total_time_cost': 1.0611639022827148, 'offload_time_cost': 0.0} 2026-02-06 00:28:22.929 | INFO | acestep.handler:generate_music:2901 - [generate_music] Decoding latents with VAE... 2026-02-06 00:28:22.931 | DEBUG | acestep.handler:generate_music:2919 - [generate_music] Before VAE decode: allocated=18.35GB, max=19.64GB 2026-02-06 00:28:22.931 | INFO | acestep.handler:generate_music:2922 - [generate_music] Using tiled VAE decode to reduce VRAM usage... Decoding audio chunks: 100%|██████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 7.49steps/s] 2026-02-06 00:28:23.612 | DEBUG | acestep.handler:generate_music:2929 - [generate_music] After VAE decode: allocated=18.35GB, max=19.64GB 2026-02-06 00:28:23.626 | INFO | acestep.handler:generate_music:2946 - [generate_music] VAE decode completed. Preparing audio tensors... 2026-02-06 00:28:23.626 | INFO | acestep.handler:generate_music:2961 - [generate_music] Done! Generated 1 audio tensors. 
2026-02-06 00:28:24.237 | DEBUG | acestep.audio_utils:save_audio:125 - [AudioSaver] Fallback soundfile Saved audio to C:\Users\admin\ACE-Step-1.5\.cache\acestep\tmp\api_audio\021365d2-b6a9-ae59-abfe-37968224c7be.mp3 (mp3, 48000Hz) INFO: 127.0.0.1:53742 - "POST /query_result HTTP/1.1" 200 OK INFO: 127.0.0.1:53758 - "GET /v1/audio?path=C%3A%5CUsers%5Cadmin%5CACE-Step-1.5%5C.cache%5Cacestep%5Ctmp%5Capi_audio%5C021365d2-b6a9-ae59-abfe-37968224c7be.mp3 HTTP/1.1" 200 O ```
Author
Owner

@iChristGit commented on GitHub (Feb 5, 2026):

Looking at your logs, I can see the issue clearly. The API is receiving empty lyrics and tags every time:

    # Caption
    A Happy song about living in new york

    # Lyric

The `# Lyric` section is completely empty, and no tags are being passed either.
This is happening in your UI server code: it is not sending the lyrics or tags to the ACE-Step API. Here's what you need to check:

1. Check Your UI Form Submission

Look at your frontend code where users input:

- Song prompt
- Lyrics (optional field)
- Tags/genres (optional field)

Make sure these values are being captured and included in the API request.

2. Check Your Server's API Request Builder

In your ace-step-ui-server, find where it constructs the request to the ACE-Step API. It should look something like:

    {
      prompt: 'A Happy song about living in new york',
      duration: undefined,
      lyrics: userInputLyrics, // ← This is missing!
      tags: userInputTags      // ← This is missing!
    }

3. Expected Format

The ACE-Step API expects:

- prompt: The song description
- lyrics: Optional lyrics text
- tags: Optional array of genre/style tags like ["pop", "upbeat", "electronic"]

Quick Fix

Find your job submission code (likely in src/index.ts or a routes file) and ensure it's passing through all user inputs:

    const requestBody = {
      prompt: userPrompt,
      lyrics: userLyrics || undefined,
      tags: userTags || undefined,
      duration: userDuration || undefined
    };

Would you like me to help you locate the specific file that needs modification? I'd need to see your UI server's source code structure.
Author
Owner

@iChristGit commented on GitHub (Feb 5, 2026):

Please do not close this issue again. This is the third time I have opened the same issue, and you just close it.
Author
Owner

@fspecii commented on GitHub (Feb 5, 2026):

Hi @iChristGit,

Thanks for the detailed logs - they help a lot!

## Format/Enhancement ENOENT

The error you're seeing:

```
[Format] Spawn error: spawn ..\ACE-Step-1.5\.venv\Scripts\python.exe ENOENT
```

This means the server can't find the Python executable at `..\ACE-Step-1.5\.venv\Scripts\python.exe`. This happens when running manually (outside Pinokio) because the server doesn't know where your Python/venv is located.

**Fix:** Set environment variables before starting the server so it knows where ACE-Step and Python are:

**PowerShell:**

```powershell
$env:ACESTEP_PATH = "C:\Users\admin\ACE-Step-1.5"
$env:PYTHON_PATH = "C:\Users\admin\ACE-Step-1.5\.venv\Scripts\python.exe"
npm run dev
```

**CMD:**

```cmd
set ACESTEP_PATH=C:\Users\admin\ACE-Step-1.5
set PYTHON_PATH=C:\Users\admin\ACE-Step-1.5\.venv\Scripts\python.exe
npm run dev
```

Adjust the paths to match your actual ACE-Step installation. If you're using conda instead of a venv, point `PYTHON_PATH` to your conda Python, e.g.:

```powershell
$env:PYTHON_PATH = "C:\Users\admin\miniconda3\envs\acestep\python.exe"
```

Or if you installed Python globally and ACE-Step packages globally:

```powershell
$env:PYTHON_PATH = "python"
```

## Simple Mode Without Lyrics

In Simple mode, you provide only a prompt/caption. The ACE-Step model then generates the music based on that description. If you want lyrics in the output, you need to:

1. **Use the Enhancement button** (the ✨ sparkle icon next to the caption/lyrics fields). This uses the LLM to auto-generate lyrics, BPM, key, etc. from your prompt, but it requires the Python path fix above to work.
2. **Switch to Custom mode** and manually enter lyrics in the Lyrics field.

Once you fix the `PYTHON_PATH` issue, the Enhancement feature should generate lyrics automatically from your prompt.

## Recommended: Use Pinokio

The easiest way to run this is through [Pinokio](https://pinokio.computer), which handles all the Python paths, venv setup, and environment configuration automatically. No manual env vars needed.

Let us know if this helps!
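
For anyone debugging the ENOENT locally, here is a minimal sketch of the lookup described above: prefer `PYTHON_PATH`, fall back to the venv under `ACESTEP_PATH`, and check that the file exists before spawning. The helper name `resolvePythonPath`, its return shape, and the fallback order are assumptions made for illustration, not the actual ace-step-ui server code. Only the two variable names and the `.venv\Scripts\python.exe` layout come from this thread.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical result type (not part of ace-step-ui).
interface PythonLookup {
  pythonPath: string;
  source: "PYTHON_PATH" | "ACESTEP_PATH" | "fallback";
  // true/false when a file check was possible, null when the value is a
  // bare command name that spawn would resolve through PATH.
  exists: boolean | null;
}

// Hypothetical helper: decide which Python executable to spawn.
function resolvePythonPath(
  env: Record<string, string | undefined>,
  fileExists: (p: string) => boolean = fs.existsSync,
): PythonLookup {
  const explicit = env.PYTHON_PATH?.trim();
  if (explicit) {
    const isBareCommand = !explicit.includes("/") && !explicit.includes("\\");
    return {
      pythonPath: explicit,
      source: "PYTHON_PATH",
      exists: isBareCommand ? null : fileExists(explicit),
    };
  }

  const root = env.ACESTEP_PATH?.trim();
  if (root) {
    // Windows venv layout, as seen in the paths quoted in this thread.
    const candidate = path.win32.join(root, ".venv", "Scripts", "python.exe");
    return {
      pythonPath: candidate,
      source: "ACESTEP_PATH",
      exists: fileExists(candidate),
    };
  }

  // Nothing configured: assume (for this sketch) that "python" is on PATH.
  return { pythonPath: "python", source: "fallback", exists: null };
}
```

Calling `resolvePythonPath(process.env)` at startup and logging a warning when `exists` is `false` would surface a wrong path before the first Format/Enhancement request, instead of as a spawn error afterwards.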
Author
Owner

@iChristGit commented on GitHub (Feb 5, 2026):

I'll just try the Pinokio version, but Simple mode should generate both the tags and the lyrics for the song (e.g. "rap song about love" should produce tags related to rap and lyrics related to love).
Author
Owner

@iChristGit commented on GitHub (Feb 5, 2026):

Same exact issue in Pinokio. I just can't get it to output a single word, no matter what.
The song tag is just "Simple".

Microsoft Windows [Version 10.0.26200.7705]
(c) Microsoft Corporation. All rights reserved.

C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5>conda_hook & conda deactivate & conda deactivate & conda deactivate & conda activate base & C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\env\Scripts\activate C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\env && acestep-api --port 42003

Skipping import of cpp extensions due to incompatible torch version 2.7.1+cu128 for torchao version 0.15.0             Please see https://github.com/pytorch/ao/issues/2919 for more info
W0206 01:00:20.423000 8516 Lib\site-packages\torch\distributed\elastic\multiprocessing\redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.
INFO:     Started server process [8516]
INFO:     Waiting for application startup.
[API Server] Initializing models at startup...

============================================================
[API Server] GPU Configuration Detected:
============================================================
  GPU Memory: 23.99 GB
  Configuration Tier: tier6
  Max Duration (with LM): 480s
  Max Duration (without LM): 480s
  Max Batch Size (with LM): 4
  Max Batch Size (without LM): 8
  Default LM Init: True
  Available LM Models: ['acestep-5Hz-lm-0.6B', 'acestep-5Hz-lm-1.7B', 'acestep-5Hz-lm-4B']
============================================================

[API Server] CPU offload disabled by default (GPU >= 16GB)
[Model Download] Model acestep-v15-turbo not found, checking network...
[Model Download] Auto-detected: HuggingFace Hub
[Model Download] Using HuggingFace Hub...
[Model Download] Downloading unified repo ACE-Step/Ace-Step1.5 to C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\checkpoints...
config.json: 1.36kB [00:00, ?B/s]                  | 0/28 [00:00<?, ?steps/s]
added_tokens.json: 100%|████████████████████████████| 707/707 [00:00<?, ?B/s]
chat_template.jinja: 4.12kB [00:00, ?B/s]          | 0.00/707 [00:00<?, ?B/s]
chat_template.jinja: 0.00B [00:Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`   
special_tokens_map.json: 100%|██████████████████████| 613/613 [00:00<?, ?B/s]
merges.txt: 1.67MB [00:00, 73.8MB/s]
.gitattributes: 1.74kB [00:00, ?B/s]               | 0.00/613 [00:00<?, ?B/s] 
tokenizer_config.json: 5.40kB [00:00, 5.40MB/s]28 [00:00<00:09,  2.83steps/s]
README.md: 5.50kB [00:00, 5.50MB/s], ?B/s]
vocab.json: 2.78MB [00:00, 79.2MB/s]
chat_template.jinja: 4.17kB [00:00, ?B/s]
added_tokens.json: 2.22MB [00:00, 193MB/s]
config.json: 1.39kB [00:00, 84.8kB/s]/s]
tokenizer.json:   0%|          Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`/s]
merges.txt: 1.67MB [00:00, 34.8MB/s]
                                            Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`/s]
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`   
vocab.json: 2.78MB [00:00, 40.0MB/s]
config.json: 1.97kB [00:00, 1.97MB/s]
special_tokens_map.json: 1.82MB [00:00, 8.22MB/s].19G [00:00<00:19, 59.4MB/s] 
                                                                             Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`/s]
tokenizer.json: 100%|███████████████████| 11.4M/11.4M [00:00<00:00, 22.8MB/s] 
configuration_acestep_v15.py: 13.1kB [00:00, 13.1MB/s]
modeling_acestep_v15_turbo.py: 96.0kB [00:00, 366kB/s]
config.json: 1.97kB [00:00, ?B/s].0kB [00:00, 366kB/s]
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`s]
config.json: 100%|███████████████████████████| 425/425 [00:00<00:00, 283kB/s]
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`   
silence_latent.pt: 100%|████████████████| 3.84M/3.84M [00:00<00:00, 16.6MB/s]
tokenizer_config.json: 100%|████████████| 14.1M/14.1M [00:01<00:00, 12.9MB/s] 
tokenizer.json: 100%|███████████████████| 24.3M/24.3M [00:02<00:00, 9.93MB/s]
diffusion_pytorch_model.safetensors: 100%|█| 337M/337M [00:10<00:00, 31.8MB/s
model.safetensors: 100%|████████████████| 1.19G/1.19G [00:37<00:00, 32.0MB/s]
model.safetensors: 100%|████████████████| 3.71G/3.71G [01:25<00:00, 43.4MB/s]
model.safetensors: 100%|████████████████| 4.79G/4.79G [01:35<00:00, 50.1MB/s]
Fetching 28 files: 100%|██████████████████| 28/28 [01:36<00:00,  3.45s/steps]
[Model Download] Model vae already exists at C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\checkpoints\vae██| 3.71G/3.71G [01:25<00:00, 54.2MB/s]
[API Server] Loading primary DiT model: acestep-v15-turbo:36<01:37, 37.0MB/s]
2026-02-06 01:02:06.492 | INFO     | acestep.handler:initialize_service:399 - [initialize_service] Attempting to load model with attention implementation: flash_attention_2
[API Server] Primary model loaded: acestep-v15-turbo
[API Server] GPU auto-detection: init_llm=True (VRAM: 24.0GB, tier: tier6)
[API Server] ACESTEP_INIT_LLM=auto, using GPU auto-detection result
[API Server] Loading LLM model...
[API Server] Auto-selected LM model: acestep-5Hz-lm-4B based on GPU tier      
[Model Download] Model acestep-5Hz-lm-4B not found, checking network...       
[Model Download] Auto-detected: HuggingFace Hub
[Model Download] Using HuggingFace Hub...
[Model Download] Downloading acestep-5Hz-lm-4B from ACE-Step/acestep-5Hz-lm-4B to C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\checkpoints\acestep-5Hz-lm-4B...
Fetching 13 files:   0%|                           | 0/13 [00:00<?, ?steps/s]Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`    
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`   
merges.txt: 1.67MB [00:00, 86.6MB/s]
.gitattributes: 1.63kB [00:00, ?B/s]
config.json: 1.56kB [00:00, ?B/s]s]
README.md: 5.50kB [00:00, 1.10MB/s]        | 1/13 [00:00<00:02,  4.41steps/s] 
chat_template.jinja: 4.17kB [00:00, ?B/s]
added_tokens.json: 2.22MB [00:00, 211MB/s]
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`   
model.safetensors.index.json: 32.8kB [00:00, 32.7MB/s]
special_tokens_map.json: 1.82MB [00:00, 404MB/s]
model.safetensors.index.json: 0.00B [00:00, ?B/s]                            Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`    
vocab.json: 2.78MB [00:00, 40.5MB/s]
tokenizer_config.json: 100%|████████████| 14.1M/14.1M [00:00<00:00, 20.3MB/s]
tokenizer.json: 100%|███████████████████| 24.3M/24.3M [00:01<00:00, 20.4MB/s]
model-00002-of-00002.safetensors: 100%|█| 3.38G/3.38G [01:33<00:00, 36.0MB/s]
model-00001-of-00002.safetensors: 100%|█| 5.00G/5.00G [01:40<00:00, 49.9MB/s]
Fetching 13 files: 100%|██████████████████| 13/13 [01:40<00:00,  7.74s/steps]
2026-02-06 01:03:50.991 | INFO     | acestep.llm_inference:initialize:361 - loading 5Hz LM tokenizer... it may take 80~90s
2026-02-06 01:04:01.703 | INFO     | acestep.llm_inference:initialize:365 - 5Hz LM tokenizer loaded successfully in 10.71 seconds
2026-02-06 01:04:01.703 | INFO     | acestep.llm_inference:initialize:370 - Initializing constrained decoding processor...
2026-02-06 01:04:01.704 | INFO     | acestep.llm_inference:initialize:376 - Setting constrained decoding max_duration to 480s based on GPU config (tier: tier6)
2026-02-06 01:04:02.401 | WARNING  | acestep.constrained_logits_processor:_precompute_audio_code_tokens:556 - Found 1535 audio code tokens with values outside valid range [0, 63999]
2026-02-06 01:04:04.488 | INFO     | acestep.llm_inference:initialize:384 - Constrained processor initialized in 2.78 seconds
2026-02-06 01:04:05.532 | INFO     | acestep.llm_inference:get_gpu_memory_utilization:102 - Adaptive LM memory allocation: model=C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\checkpoints\acestep-5Hz-lm-4B, target=12.0GB, ratio=0.500, total_gpu=24.0GB
2026-02-06 01:04:05.532 | INFO     | acestep.llm_inference:_initialize_5hz_lm_vllm:444 - Initializing 5Hz LM with model: C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\checkpoints\acestep-5Hz-lm-4B, enforce_eager: False, tensor_parallel_size: 1, max_model_len: 4096, gpu_memory_utilization: 0.500       
Microsoft Windows [Version 10.0.26200.7705]
(c) Microsoft Corporation. All rights reserved.

C:\pinokio\api\ace-step-ui.pinokio.git\app>conda_hook & conda deactivate & conda deactivate & conda deactivate & conda activate base && npm run dev -- --host 127.0.0.1 --port 42004 --strictPort

> ace-step-ui@1.0.0 dev
> vite --host 127.0.0.1 --port 42004 --strictPort




Microsoft Windows [Version 10.0.26200.7705]
(c) Microsoft Corporation. All rights reserved.

C:\pinokio\api\ace-step-ui.pinokio.git\app\server>conda_hook & conda deactivate & conda deactivate & conda deactivate & conda activate base && npm run dev

> ace-step-ui-server@1.0.0 dev
> tsx watch src/index.ts

Running SQLite database migrations...
Migrations completed successfully!


██ Detached from Shell c97de72d-f515-4df5-8583-38b5854f7411

ACE-Step UI Server running on http://localhost:3001
Environment: development
ACE-Step API: http://127.0.0.1:42003
LAN access: http://100.106.209.57:3001
LAN access: http://192.168.0.169:3001
LAN access: http://172.19.32.1:3001


===================================================
# input.event
[
  "ACE-Step UI Server running"
]
===================================================

Initializing local storage provider
Job job_1770332720424_0lxh2tg: Queued at position 1
INFO:     127.0.0.1:53167 - "GET /health HTTP/1.1" 200 OK
[ACE-Step] API available at http://127.0.0.1:42003: true
INFO:     127.0.0.1:53168 - "POST /release_task HTTP/1.1" 200 OK
2026-02-06 01:05:20.456 | INFO     | acestep.inference:generate_music:387 - [generate_music] LLM usage decision: thinking=False, use_cot_caption=False, use_cot_language=False, use_cot_metas=True, need_lm_for_cot=True, llm_initialized=True, use_lm=True
2026-02-06 01:05:20.456 | INFO     | acestep.inference:generate_music:445 - LM chunk 1/1 (infer_type=dit) (size: 1, seeds: [1356128464])
2026-02-06 01:05:20.456 | INFO     | acestep.llm_inference:generate_with_stop_Job job_1770332720424_0lxh2tg: Using ACE-Step REST API { prompt: 'a happy pop song about love', duration: undefined }
Job job_1770332720424_0lxh2tg: Submitted to API as task d32e6d73-1a1e-436d-8b63-ce0788a5f8a0

2026-02-06 01:05:20.471 | INFO     | acestep.llm_inference:generate_with_stop_condition:970 - generate_with_stop_condition: formatted_prompt=<|im_start|>system
# Instruction
Generate audio semantic tokens based on the given conditions:

<|im_end|>
<|im_start|>user
# Caption
a happy pop song about love

# Lyric

<|im_end|>
<|im_start|>assistant

Generating:   0%|                                   | 0/1 [00:00<?, ?steps/s]INFO:     127.0.0.1:53167 - "POST /query_result HTTP/1.1" 200 OK
INFO:     127.0.0.1:53167 - "POST /query_result HTTP/1.1" 200 OK
INFO:     127.0.0.1:53167 - "POST /query_result HTTP/1.1" 200 OK
Generating: 100%|█| 1/1 [00:03<00:00,  3.13s/steps, Prefill=29tok/s, Decode=4
Microsoft Windows [Version 10.0.26200.7705]
(c) Microsoft Corporation. All rights reserved.

C:\pinokio\api\ace-step-ui.pinokio.git\app\server>conda_hook & conda deactivate & conda deactivate & conda deactivate & conda activate base && npm run dev

> ace-step-ui-server@1.0.0 dev
> tsx watch src/index.ts

Running SQLite database migrations...
Migrations completed successfully!
ACE-Step UI Server running on http://localhost:3001
Environment: development
ACE-Step API: http://127.0.0.1:42003
LAN access: http://100.106.209.57:3001
LAN access: http://192.168.0.169:3001
LAN access: http://172.19.32.1:3001
Initializing local storage provider
Job job_1770332720424_0lxh2tg: Queued at position 1
[ACE-Step] API available at http://127.0.0.1:42003: true
Job job_1770332720424_0lxh2tg: Using ACE-Step REST API { prompt: 'a happy pop song about love', duration: undefined }
Job job_1770332720424_0lxh2tg: Submitted to API as task d32e6d73-1a1e-436d-8b63-ce0788a5f8a0
Job job_1770332720424_0lxh2tg: Completed via API with 1 audio files
[Format] Running: C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\.venv\Scripts\python.exe C:\pinokio\api\ace-step-ui.pinokio.git\app\server\scripts\format_sample.py --caption fa --json --lyrics love, is --temperature 0.8 --top-p 0.92 --lm-model acestep-5Hz-lm-0.6B --lm-backend pt
[Format] CWD: C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5
[Format] Spawn error: spawn C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\.venv\Scripts\python.exe ENOENT
[Format] Python error: spawn C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\.venv\Scripts\python.exe ENOENT
[Format] Process exited with code -4058
[Format] Running: C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\.venv\Scripts\python.exe C:\pinokio\api\ace-step-ui.pinokio.git\app\server\scripts\format_sample.py --caption fa --json --lyrics love, is --temperature 0.8 --top-p 0.92 --lm-model acestep-5Hz-lm-0.6B --lm-backend pt
[Format] CWD: C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5
[Format] Spawn error: spawn C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\.venv\Scripts\python.exe ENOENT
[Format] Python error: spawn C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\.venv\Scripts\python.exe ENOENT
[Format] Process exited with code -4058




<img width="2554" height="1409" alt="Image" src="https://github.com/user-attachments/assets/1b2bfa61-3fde-4a7f-ad1b-02a4ce6bcd7c" />













<!-- gh-comment-id:3856905545 --> @iChristGit commented on GitHub (Feb 5, 2026): Same exact issue in Pinokio.. just cant get it output a single word no matter what The song tag is just "Simple" ``` Microsoft Windows [Version 10.0.26200.7705] (c) Microsoft Corporation. All rights reserved. C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5>conda_hook & conda deactivate & conda deactivate & conda deactivate & conda activate base & C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\env\Scripts\activate C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\env && acestep-api --port 42003 Skipping import of cpp extensions due to incompatible torch version 2.7.1+cu128 for torchao version 0.15.0 Please see https://github.com/pytorch/ao/issues/2919 for more info W0206 01:00:20.423000 8516 Lib\site-packages\torch\distributed\elastic\multiprocessing\redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs. INFO: Started server process [8516] INFO: Waiting for application startup. [API Server] Initializing models at startup... ============================================================ [API Server] GPU Configuration Detected: ============================================================ GPU Memory: 23.99 GB Configuration Tier: tier6 Max Duration (with LM): 480s Max Duration (without LM): 480s Max Batch Size (with LM): 4 Max Batch Size (without LM): 8 Default LM Init: True Available LM Models: ['acestep-5Hz-lm-0.6B', 'acestep-5Hz-lm-1.7B', 'acestep-5Hz-lm-4B'] ============================================================ [API Server] CPU offload disabled by default (GPU >= 16GB) [Model Download] Model acestep-v15-turbo not found, checking network... [Model Download] Auto-detected: HuggingFace Hub [Model Download] Using HuggingFace Hub... [Model Download] Downloading unified repo ACE-Step/Ace-Step1.5 to C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\checkpoints... 
config.json: 1.36kB [00:00, ?B/s] | 0/28 [00:00<?, ?steps/s] added_tokens.json: 100%|████████████████████████████| 707/707 [00:00<?, ?B/s] chat_template.jinja: 4.12kB [00:00, ?B/s] | 0.00/707 [00:00<?, ?B/s] chat_template.jinja: 0.00B [00:Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet` Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet` special_tokens_map.json: 100%|██████████████████████| 613/613 [00:00<?, ?B/s] merges.txt: 1.67MB [00:00, 73.8MB/s] .gitattributes: 1.74kB [00:00, ?B/s] | 0.00/613 [00:00<?, ?B/s] tokenizer_config.json: 5.40kB [00:00, 5.40MB/s]28 [00:00<00:09, 2.83steps/s] README.md: 5.50kB [00:00, 5.50MB/s], ?B/s] vocab.json: 2.78MB [00:00, 79.2MB/s] chat_template.jinja: 4.17kB [00:00, ?B/s] added_tokens.json: 2.22MB [00:00, 193MB/s] config.json: 1.39kB [00:00, 84.8kB/s]/s] tokenizer.json: 0%| Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`/s] merges.txt: 1.67MB [00:00, 34.8MB/s] Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`/s] Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. 
For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet` vocab.json: 2.78MB [00:00, 40.0MB/s] config.json: 1.97kB [00:00, 1.97MB/s] special_tokens_map.json: 1.82MB [00:00, 8.22MB/s].19G [00:00<00:19, 59.4MB/s] Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`/s] tokenizer.json: 100%|███████████████████| 11.4M/11.4M [00:00<00:00, 22.8MB/s] configuration_acestep_v15.py: 13.1kB [00:00, 13.1MB/s] modeling_acestep_v15_turbo.py: 96.0kB [00:00, 366kB/s] config.json: 1.97kB [00:00, ?B/s].0kB [00:00, 366kB/s] Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`s] config.json: 100%|███████████████████████████| 425/425 [00:00<00:00, 283kB/s] Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. 
For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
silence_latent.pt: 100%|████████████████| 3.84M/3.84M [00:00<00:00, 16.6MB/s]
tokenizer_config.json: 100%|████████████| 14.1M/14.1M [00:01<00:00, 12.9MB/s]
tokenizer.json: 100%|███████████████████| 24.3M/24.3M [00:02<00:00, 9.93MB/s]
diffusion_pytorch_model.safetensors: 100%|█| 337M/337M [00:10<00:00, 31.8MB/s
model.safetensors: 100%|████████████████| 1.19G/1.19G [00:37<00:00, 32.0MB/s]
model.safetensors: 100%|████████████████| 3.71G/3.71G [01:25<00:00, 43.4MB/s]
model.safetensors: 100%|████████████████| 4.79G/4.79G [01:35<00:00, 50.1MB/s]
Fetching 28 files: 100%|██████████████████| 28/28 [01:36<00:00, 3.45s/steps]
[Model Download] Model vae already exists at C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\checkpoints\vae██| 3.71G/3.71G [01:25<00:00, 54.2MB/s]
[API Server] Loading primary DiT model: acestep-v15-turbo:36<01:37, 37.0MB/s]
2026-02-06 01:02:06.492 | INFO | acestep.handler:initialize_service:399 - [initialize_service] Attempting to load model with attention implementation: flash_attention_2
[API Server] Primary model loaded: acestep-v15-turbo
[API Server] GPU auto-detection: init_llm=True (VRAM: 24.0GB, tier: tier6)
[API Server] ACESTEP_INIT_LLM=auto, using GPU auto-detection result
[API Server] Loading LLM model...
[API Server] Auto-selected LM model: acestep-5Hz-lm-4B based on GPU tier
[Model Download] Model acestep-5Hz-lm-4B not found, checking network...
[Model Download] Auto-detected: HuggingFace Hub
[Model Download] Using HuggingFace Hub...
[Model Download] Downloading acestep-5Hz-lm-4B from ACE-Step/acestep-5Hz-lm-4B to C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\checkpoints\acestep-5Hz-lm-4B...
Fetching 13 files: 0%| | 0/13 [00:00<?, ?steps/s]Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download.
For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download.
For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
merges.txt: 1.67MB [00:00, 86.6MB/s]
.gitattributes: 1.63kB [00:00, ?B/s]
config.json: 1.56kB [00:00, ?B/s]s]
README.md: 5.50kB [00:00, 1.10MB/s] | 1/13 [00:00<00:02, 4.41steps/s]
chat_template.jinja: 4.17kB [00:00, ?B/s]
added_tokens.json: 2.22MB [00:00, 211MB/s]
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download.
For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
model.safetensors.index.json: 32.8kB [00:00, 32.7MB/s]
special_tokens_map.json: 1.82MB [00:00, 404MB/s]
model.safetensors.index.json: 0.00B [00:00, ?B/s]
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download.
For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
vocab.json: 2.78MB [00:00, 40.5MB/s]
tokenizer_config.json: 100%|████████████| 14.1M/14.1M [00:00<00:00, 20.3MB/s]
tokenizer.json: 100%|███████████████████| 24.3M/24.3M [00:01<00:00, 20.4MB/s]
model-00002-of-00002.safetensors: 100%|█| 3.38G/3.38G [01:33<00:00, 36.0MB/s]
model-00001-of-00002.safetensors: 100%|█| 5.00G/5.00G [01:40<00:00, 49.9MB/s]
Fetching 13 files: 100%|██████████████████| 13/13 [01:40<00:00, 7.74s/steps]
2026-02-06 01:03:50.991 | INFO | acestep.llm_inference:initialize:361 - loading 5Hz LM tokenizer... it may take 80~90s
2026-02-06 01:04:01.703 | INFO | acestep.llm_inference:initialize:365 - 5Hz LM tokenizer loaded successfully in 10.71 seconds
2026-02-06 01:04:01.703 | INFO | acestep.llm_inference:initialize:370 - Initializing constrained decoding processor...
2026-02-06 01:04:01.704 | INFO | acestep.llm_inference:initialize:376 - Setting constrained decoding max_duration to 480s based on GPU config (tier: tier6)
2026-02-06 01:04:02.401 | WARNING | acestep.constrained_logits_processor:_precompute_audio_code_tokens:556 - Found 1535 audio code tokens with values outside valid range [0, 63999]
2026-02-06 01:04:04.488 | INFO | acestep.llm_inference:initialize:384 - Constrained processor initialized in 2.78 seconds
2026-02-06 01:04:05.532 | INFO | acestep.llm_inference:get_gpu_memory_utilization:102 - Adaptive LM memory allocation: model=C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\checkpoints\acestep-5Hz-lm-4B, target=12.0GB, ratio=0.500, total_gpu=24.0GB
2026-02-06 01:04:05.532 | INFO | acestep.llm_inference:_initialize_5hz_lm_vllm:444 - Initializing 5Hz LM with model: C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\checkpoints\acestep-5Hz-lm-4B, enforce_eager: False, tensor_parallel_size: 1, max_model_len: 4096, gpu_memory_utilization: 0.500
Microsoft Windows [Version 10.0.26200.7705]
(c) Microsoft Corporation. All rights reserved.
C:\pinokio\api\ace-step-ui.pinokio.git\app>conda_hook & conda deactivate & conda deactivate & conda deactivate & conda activate base && npm run dev -- --host 127.0.0.1 --port 42004 --strictPort
> ace-step-ui@1.0.0 dev
> vite --host 127.0.0.1 --port 42004 --strictPort
Microsoft Windows [Version 10.0.26200.7705]
(c) Microsoft Corporation. All rights reserved.
C:\pinokio\api\ace-step-ui.pinokio.git\app\server>conda_hook & conda deactivate & conda deactivate & conda deactivate & conda activate base && npm run dev
> ace-step-ui-server@1.0.0 dev
> tsx watch src/index.ts
Running SQLite database migrations...
Migrations completed successfully!
██ Detached from Shell c97de72d-f515-4df5-8583-38b5854f7411
ACE-Step UI Server running on http://localhost:3001
Environment: development
ACE-Step API: http://127.0.0.1:42003
LAN access: http://100.106.209.57:3001
LAN access: http://192.168.0.169:3001
LAN access: http://172.19.32.1:3001
===================================================
# input.event
[ "ACE-Step UI Server running" ]
===================================================
Initializing local storage provider
Job job_1770332720424_0lxh2tg: Queued at position 1
INFO: 127.0.0.1:53167 - "GET /health HTTP/1.1" 200 OK
[ACE-Step] API available at http://127.0.0.1:42003: true
INFO: 127.0.0.1:53168 - "POST /release_task HTTP/1.1" 200 OK
2026-02-06 01:05:20.456 | INFO | acestep.inference:generate_music:387 - [generate_music] LLM usage decision: thinking=False, use_cot_caption=False, use_cot_language=False, use_cot_metas=True, need_lm_for_cot=True, llm_initialized=True, use_lm=True
2026-02-06 01:05:20.456 | INFO | acestep.inference:generate_music:445 - LM chunk 1/1 (infer_type=dit) (size: 1, seeds: [1356128464])
2026-02-06 01:05:20.456 | INFO | acestep.llm_inference:generate_with_stop_Job job_1770332720424_0lxh2tg: Using ACE-Step REST API { prompt: 'a happy pop song about love', duration: undefined }
Job job_1770332720424_0lxh2tg: Submitted to API as task d32e6d73-1a1e-436d-8b63-ce0788a5f8a0
2026-02-06 01:05:20.471 | INFO | acestep.llm_inference:generate_with_stop_condition:970 - generate_with_stop_condition: formatted_prompt=<|im_start|>system
# Instruction
Generate audio semantic tokens based on the given conditions:
<|im_end|>
<|im_start|>user
# Caption
a happy pop song about love
# Lyric
<|im_end|>
<|im_start|>assistant
Generating: 0%| | 0/1 [00:00<?, ?steps/s]INFO: 127.0.0.1:53167 - "POST /query_result HTTP/1.1" 200 OK
INFO: 127.0.0.1:53167 - "POST /query_result HTTP/1.1" 200 OK
INFO: 127.0.0.1:53167 - "POST /query_result HTTP/1.1" 200 OK
Generating: 100%|█| 1/1 [00:03<00:00, 3.13s/steps, Prefill=29tok/s, Decode=4
Microsoft Windows [Version 10.0.26200.7705]
(c) Microsoft Corporation. All rights reserved.
C:\pinokio\api\ace-step-ui.pinokio.git\app\server>conda_hook & conda deactivate & conda deactivate & conda deactivate & conda activate base && npm run dev
> ace-step-ui-server@1.0.0 dev
> tsx watch src/index.ts
Running SQLite database migrations...
Migrations completed successfully!
ACE-Step UI Server running on http://localhost:3001
Environment: development
ACE-Step API: http://127.0.0.1:42003
LAN access: http://100.106.209.57:3001
LAN access: http://192.168.0.169:3001
LAN access: http://172.19.32.1:3001
Initializing local storage provider
Job job_1770332720424_0lxh2tg: Queued at position 1
[ACE-Step] API available at http://127.0.0.1:42003: true
Job job_1770332720424_0lxh2tg: Using ACE-Step REST API { prompt: 'a happy pop song about love', duration: undefined }
Job job_1770332720424_0lxh2tg: Submitted to API as task d32e6d73-1a1e-436d-8b63-ce0788a5f8a0
Job job_1770332720424_0lxh2tg: Completed via API with 1 audio files
[Format] Running: C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\.venv\Scripts\python.exe C:\pinokio\api\ace-step-ui.pinokio.git\app\server\scripts\format_sample.py --caption fa --json --lyrics love, is --temperature 0.8 --top-p 0.92 --lm-model acestep-5Hz-lm-0.6B --lm-backend pt
[Format] CWD: C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5
[Format] Spawn error: spawn C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\.venv\Scripts\python.exe ENOENT
[Format] Python error: spawn C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\.venv\Scripts\python.exe ENOENT
[Format] Process exited with code -4058
[Format] Running: C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\.venv\Scripts\python.exe C:\pinokio\api\ace-step-ui.pinokio.git\app\server\scripts\format_sample.py --caption fa --json --lyrics love, is --temperature 0.8 --top-p 0.92 --lm-model acestep-5Hz-lm-0.6B --lm-backend pt
[Format] CWD: C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5
[Format] Spawn error: spawn C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\.venv\Scripts\python.exe ENOENT
[Format] Python error: spawn C:\pinokio\api\ace-step-ui.pinokio.git\app\ACE-Step-1.5\.venv\Scripts\python.exe ENOENT
[Format] Process exited with code -4058
<img width="2554" height="1409" alt="Image" src="https://github.com/user-attachments/assets/1b2bfa61-3fde-4a7f-ad1b-02a4ce6bcd7c" />
```

@iChristGit commented on GitHub (Feb 5, 2026):

It's all an automated install by Pinokio, not a Python mistake on my part... Regardless of that, the ACE-Step demo does work with lyrics and all, so I don't understand why you think it's unable to find Python. It can in the actual demo.


@iChristGit commented on GitHub (Feb 5, 2026):

You need to work on the code itself man, dig deeper with the LLM


@snfblw commented on GitHub (Feb 6, 2026):

No lyrics for me either, you aren't alone.


@jetzwow commented on GitHub (Feb 6, 2026):

I got it working by editing start-all.bat.
Find

set ACESTEP_PATH=..\ACE-Step-1.5

Change it to the absolute path

set ACESTEP_PATH=T:\ACE-Step-1.5

Now it uses the embedded Python from ACE-Step-1.5 and works.

Edit: Works for formatting only, not for generating lyrics.
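
The `[Format] Spawn error ... ENOENT` lines in the log suggest the server built an interpreter path that does not exist on disk, which fits the relative `ACESTEP_PATH` fix described here. Below is a minimal Python sketch of the idea: resolve the path against a known base directory and fail early with a readable message. The function name and the candidate list are hypothetical; only the `.venv\Scripts\python.exe` layout comes from the log, and the real UI server may do this differently.

```python
from pathlib import Path
from typing import Sequence

# Candidate interpreter locations relative to the ACE-Step directory.
# The first matches the layout in the log; the second is an ASSUMED
# POSIX equivalent added only for illustration.
DEFAULT_CANDIDATES: Sequence[Path] = (
    Path(".venv") / "Scripts" / "python.exe",
    Path(".venv") / "bin" / "python",
)


def resolve_acestep_python(
    acestep_path: str,
    base_dir: str,
    candidates: Sequence[Path] = DEFAULT_CANDIDATES,
) -> Path:
    """Return an absolute path to an existing interpreter, or raise.

    A relative acestep_path is interpreted relative to base_dir rather than
    the process's current working directory.
    """
    root = Path(acestep_path)
    if not root.is_absolute():
        root = Path(base_dir) / root
    root = root.resolve()

    for rel in candidates:
        candidate = root / rel
        if candidate.is_file():
            return candidate

    tried = ", ".join(str(root / rel) for rel in candidates)
    raise FileNotFoundError(
        f"No Python interpreter found under {root}; tried: {tried}"
    )
```

Calling something like this before spawning the formatter would report the exact paths that were tried, which is easier to act on than a bare exit code of -4058.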


@snfblw commented on GitHub (Feb 6, 2026):

No luck for me, I tried it with my own path obviously. Still the same issue.


@jetzwow commented on GitHub (Feb 6, 2026):

Yeah, got ahead of myself.
I read the API documentation.
Generating lyrics uses something called `sample_mode`.

Image

Don't think it's been implemented.
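
For anyone who wants to try `sample_mode` against the API directly, here is a rough Python sketch that only builds the request and does not send it. The `/release_task` endpoint and the `http://127.0.0.1:42003` address come from the log above. Everything else is an assumption: that `sample_mode` is a boolean, and that the text description goes in a field named `sample_query`. Check the API documentation before relying on either.

```python
import json
import urllib.request


def build_sample_mode_request(
    base_url: str,
    description: str,
    description_field: str = "sample_query",  # ASSUMPTION: field name not confirmed in this thread
) -> urllib.request.Request:
    """Build (but do not send) a POST /release_task request with sample_mode enabled."""
    payload = {
        "sample_mode": True,  # ASSUMPTION: boolean flag; only the name comes from the thread
        description_field: description,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/release_task",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_sample_mode_request(
        "http://127.0.0.1:42003", "a happy pop song about love"
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually submit it, pass req to urllib.request.urlopen()
    # while the ACE-Step API server is running.
```

If a request like this returns lyrics while the UI's simple mode does not, that would point at the UI not setting the flag rather than at the model install.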
