mirror of
https://github.com/librespot-org/librespot.git
synced 2026-04-27 08:15:50 +03:00
[GH-ISSUE #343] Significantly higher CPU usage with JACK audio backend vs. ALSA backend on ARM #226
Originally created by @markubiak on GitHub (Jul 1, 2019).
Original GitHub issue: https://github.com/librespot-org/librespot/issues/343
I'm trying to track down a bug causing high CPU usage when using the JACK backend. When running librespot with identical options and only changing the backend from ALSA to JACK, the process goes from ~20% CPU usage to ~50% CPU while playing music on both my Pi 2 and Pi 3 running Raspbian. This increase is seen on the exact same binary (just changing the backend with the command line). For a non-Pi reference, my SoCFPGA development board with a dual-core Cortex A9 (Cyclone 5) runs librespot at under 10% CPU usage with the JACK backend. A9s are faster, but not that much. As JACK runs with all samples as single-precision floats by default, I have the theory that the difference is due to an improper usage of the Cortex A53's floating point hardware, as the main jackd process and a custom IIR filterbank engine for REW are both running with very low CPU usage despite performing a not insignificant number of FLOPS (20 biquads per channel + mixing), resulting in under 20% CPU usage combined according to top. It doesn't make any sense that the JACK backend of librespot would use significantly more than that.
I have mostly been toying with compiler flags. I make sure to compile the JACK code with:
-mcpu=cortex-a53 -mfloat-abi=hard -mfpu=neon-fp-armv8 -mneon-for-64bits
for both C and C++. I am much less familiar with Rust, but I'm very confident I'm correctly passing floating point optimizations for the armv7-unknown-linux-gnueabihf target:
RUSTFLAGS="-C target-cpu=cortex-a53 -C target-feature=+v8,+vfp4,+neon,-d16"
(the -d16 is to resolve an issue with LLVM register allocation with NEON enabled; this reassures me that NEON support is actually in the binaries).
I'm very curious to hear if anyone has been troubleshooting or has resolved a similar issue. If you want to test, feel free to clone my raspotify fork and build my docker image. On my machine, it's just a docker build . and then a docker run -it <image>. Just mark the build.sh as executable, run it, and get a coffee. The binary is built against the version of jack2 in the Raspbian repos, so just make sure you have jackd2 installed if you're also testing on a Pi.
Sincerely appreciate any help!
EDIT: Also wanted to add that I've tried a native build (on the Pi 3 itself) and saw no changes
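One way to double-check the assumption that the RUSTFLAGS actually reached the compiler is to have the binary report which target features were enabled at build time. The following is only a minimal sketch: the function name `enabled_arm_features` is hypothetical and not part of librespot, and on some toolchains a feature that is not stabilized for the target may report as not enabled even when the flag was passed.

```rust
/// Report whether a couple of ARM floating-point/SIMD target features
/// were enabled at compile time for this build.
/// Hypothetical helper for illustration; not part of librespot.
fn enabled_arm_features() -> Vec<(&'static str, bool)> {
    vec![
        // cfg! is evaluated by the compiler, so these values reflect the
        // -C target-cpu / -C target-feature settings used for the build.
        ("neon", cfg!(target_feature = "neon")),
        ("vfp4", cfg!(target_feature = "vfp4")),
    ]
}

/// Format the feature report as one line per feature.
fn feature_report() -> String {
    enabled_arm_features()
        .iter()
        .map(|(name, on)| {
            format!("{}: {}", name, if *on { "enabled" } else { "not enabled" })
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Printing `feature_report()` from a small test binary built with the same RUSTFLAGS would show whether the features were picked up. It does not say whether the hot loops actually use them.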
@willstott101 commented on GitHub (Jul 12, 2019):
Hm, librespot's code for JACK is much more complicated than for ALSA, thanks to its callback mechanism and librespot's architecture.
Briefly glancing at the code, it looks like the JACK backend communicates with two 32-bit float ports. It would be interesting to find out if it's possible to use a single interleaved stereo 16-bit int port through to the JACK server, since that is the format of decoded Spotify audio. I'm not familiar enough with JACK to know how possible that is.
The ALSA backend is able to specify its data format quite specifically and need not do any conversion. I would be surprised if that wasn't a significant portion of the CPU difference you're seeing.
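To make the extra work concrete, here is a rough sketch of the kind of per-sample conversion described above: splitting interleaved 16-bit stereo into two 32-bit float channel buffers. This is not librespot's actual code. The function name `deinterleave_to_f32` and the 1/32768 scale factor are assumptions chosen for illustration.

```rust
/// Split interleaved 16-bit stereo samples (L, R, L, R, ...) into two
/// per-channel f32 buffers scaled to the range [-1.0, 1.0).
/// Hypothetical helper for illustration; not librespot's implementation.
fn deinterleave_to_f32(interleaved: &[i16]) -> (Vec<f32>, Vec<f32>) {
    let frames = interleaved.len() / 2;
    let mut left = Vec::with_capacity(frames);
    let mut right = Vec::with_capacity(frames);
    // chunks_exact(2) yields one stereo frame at a time and ignores a
    // trailing unpaired sample, if any.
    for frame in interleaved.chunks_exact(2) {
        // Assumed scaling: divide by 32768 so i16::MIN maps to -1.0.
        left.push(f32::from(frame[0]) / 32768.0);
        right.push(f32::from(frame[1]) / 32768.0);
    }
    (left, right)
}
```

Every sample goes through an integer-to-float conversion and a divide (or multiply) on each callback, which the ALSA path avoids by passing the 16-bit data through unchanged.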
@roderickvd commented on GitHub (Mar 12, 2021):
Please give my work at #660 a go. In the current dev branch there are two conversions going on, assuming you're using the lewton decoder. First lewton converts from f32 to i16, then jackaudio back from i16 to f32 again. My branch keeps everything in f32 without the back-and-forth conversions.
@roderickvd commented on GitHub (Jun 14, 2021):
Closing after no replies.