[GH-ISSUE #343] Significantly higher CPU usage with JACK audio backend vs. ALSA backend on ARM #226

Closed
opened 2026-02-27 19:29:31 +03:00 by kerem · 3 comments

Originally created by @markubiak on GitHub (Jul 1, 2019).
Original GitHub issue: https://github.com/librespot-org/librespot/issues/343

I'm trying to track down a bug causing high CPU usage with the JACK backend. Running librespot with identical options and only switching the backend from ALSA to JACK takes the process from ~20% CPU to ~50% CPU while playing music, on both my Pi 2 and Pi 3 running Raspbian. The increase occurs with the exact same binary (the backend is selected on the command line). For a non-Pi reference, my SoCFPGA development board with a dual-core Cortex-A9 (Cyclone V) runs librespot at under 10% CPU with the JACK backend. A9s are faster, but not that much faster. Since JACK handles all samples as single-precision floats by default, my theory is that the difference comes from improper use of the Cortex-A53's floating-point hardware. The main jackd process and a custom IIR filterbank engine for REW both run with very low CPU usage despite performing a non-trivial number of FLOPS (20 biquads per channel plus mixing): under 20% combined, according to `top`. It doesn't make sense that librespot's JACK backend would use significantly more than that.

I have mostly been experimenting with compiler flags. I make sure to compile the JACK code with:
`-mcpu=cortex-a53 -mfloat-abi=hard -mfpu=neon-fp-armv8 -mneon-for-64bits`
for both C and C++. I am much less familiar with Rust, but I'm fairly confident I'm correctly passing floating-point optimizations for the armv7-unknown-linux-gnueabihf target:
`RUSTFLAGS="-C target-cpu=cortex-a53 -C target-feature=+v8,+vfp4,+neon,-d16"`
(the `-d16` works around an LLVM register-allocation issue with NEON enabled, which also reassures me that NEON support actually lands in the binaries).
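For what it's worth, the same flags can be pinned per-target in `.cargo/config.toml` instead of exporting `RUSTFLAGS` on every invocation. This is just a sketch; the target triple and flags are simply the ones quoted above:

```toml
# .cargo/config.toml (sketch): persist the cross-compile codegen flags
[target.armv7-unknown-linux-gnueabihf]
rustflags = [
    "-C", "target-cpu=cortex-a53",
    "-C", "target-feature=+v8,+vfp4,+neon,-d16",
]
```

With this in place, a plain `cargo build --target armv7-unknown-linux-gnueabihf` picks up the flags automatically.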

I'm very curious to hear whether anyone has been troubleshooting or has resolved a similar issue. If you want to test, feel free to [clone my raspotify fork](https://github.com/markubiak/raspotify) and build my Docker image. On my machine it's just a `docker build .` and then a `docker run -it <image>`. Just mark `build.sh` as executable, run it, and get a coffee. The binary is built against the version of jack2 in the Raspbian repos, so make sure you have jackd2 installed if you're also testing on a Pi.

Sincerely appreciate any help!

EDIT: I've also tried a native build (on the Pi 3 itself) and saw no change.

kerem closed this issue and added the `audio` label (2026-02-27 19:29:31 +03:00).

@willstott101 commented on GitHub (Jul 12, 2019):

Hm, librespot's code for JACK is much more complicated than the ALSA code, thanks to JACK's callback mechanism and librespot's architecture.

Briefly glancing at the code, it looks like the JACK back-end communicates through two 32-bit float ports. It would be interesting to find out whether it's possible to push a single interleaved stereo 16-bit int port through to the JACK server (the format of decoded Spotify audio). I'm not familiar enough with JACK to know how feasible that is.

The ALSA back-end is able to specify its data format quite precisely and need not do any conversion. I would be surprised if that weren't a significant portion of the CPU difference you're seeing.
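To illustrate the point, here is a minimal hypothetical sketch (not librespot's actual code) of the per-callback work the JACK path implies: the decoder yields interleaved stereo `i16`, but JACK ports carry planar `f32`, so every frame has to be split into left/right and rescaled.

```rust
// Hypothetical sketch: split interleaved stereo i16 into two planar f32
// buffers (one per JACK port), rescaling samples into [-1.0, 1.0).
// This per-sample loop runs on every JACK process() callback.
fn deinterleave_to_f32(interleaved: &[i16], left: &mut [f32], right: &mut [f32]) {
    for (i, frame) in interleaved.chunks_exact(2).enumerate() {
        left[i] = frame[0] as f32 / 32768.0;
        right[i] = frame[1] as f32 / 32768.0;
    }
}

fn main() {
    // Two frames of interleaved stereo: (L, R), (L, R).
    let samples: [i16; 4] = [16384, -16384, -32768, 32767];
    let mut left = [0.0f32; 2];
    let mut right = [0.0f32; 2];
    deinterleave_to_f32(&samples, &mut left, &mut right);
    println!("left = {:?}, right = {:?}", left, right);
}
```

The ALSA path, by contrast, can hand the interleaved `i16` buffer straight to the device.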


@roderickvd commented on GitHub (Mar 12, 2021):

Please give my work at #660 a go. In the current `dev` branch there are two conversions going on, assuming you're using the `lewton` decoder. First `lewton` converts from `f32` to `i16`, then `jackaudio` converts back from `i16` to `f32` again. My branch keeps everything in `f32` without the back-and-forth conversions.
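A minimal sketch of the double conversion described above (helper names are invented for illustration, not the actual librespot code): the `f32` decoder output is quantized to `i16`, then expanded back to `f32` for JACK. That is two extra per-sample passes, plus a small quantization error that a pure-`f32` pipeline avoids entirely.

```rust
// Hypothetical sketch of the f32 -> i16 -> f32 round trip in the old
// dev branch. Keeping samples in f32 end-to-end skips both loops.
fn f32_to_i16(s: f32) -> i16 {
    // Quantize to 16-bit with saturation, as a decoder output stage might.
    (s * 32768.0).clamp(-32768.0, 32767.0) as i16
}

fn i16_to_f32(s: i16) -> f32 {
    // Expand back to float for the JACK port buffer.
    s as f32 / 32768.0
}

fn main() {
    let original: f32 = 0.123456;
    let roundtripped = i16_to_f32(f32_to_i16(original));
    // Close, but not identical: the i16 hop quantizes to 1/32768 steps.
    println!("{original} -> {roundtripped}");
}
```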


@roderickvd commented on GitHub (Jun 14, 2021):

Closing after no replies.
