[PR #459] [CLOSED] Refactor SpotifyId #937

Closed
opened 2026-02-27 20:00:31 +03:00 by kerem · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/librespot-org/librespot/pull/459
Author: @alcore
Created: 4/10/2020
Status: Closed

Base: devHead: refactor-id


📝 Commits (1)

📊 Changes

1 file changed (+362 additions, -58 deletions)

View changed files

📝 core/src/spotify_id.rs (+362 -58)

📄 Description

This PR includes a refactoring of SpotifyId.

In my particular use case I am dealing with a relatively large amount of track IDs (GBs) as playback history - those need to be base62 encoded onto the wire in batches of up to 100 and the current encoding algo was at my QPS prohibitively expensive as it performs 128-bit division + modulo, which compiles down to an unoptimized runtime call.

It now instead uses 64-bit arithmetic (which could relatively trivially be optimized for 32-bit arches behind a build condition) with an algo in use in the cryptocurrency ecosystem for their base58 encoding, further optimized for our case.

On rustc 1.44.0-nightly (b543afca9 2020-04-05), Windows 10, i7 4770k @ 4.4GHz (windows/x86-64) the result is as follows:

benchmark          old ns/op     new ns/op     speedup
to_base62()           3343.2         164.6        ~20x

I then went ahead to pick some further low hanging fruit in the other methods, without resorting to LUTs nor SIMD intrinsics. I.e. the change to the base62 encoding algo is the only non-trivial piece of code contained.

benchmark          old ns/op     new ns/op     speedup
from_base62()          362.7          31.2        ~11x
to_base16()            183.4          75.2         ~2x
from_base16()          144.3          27.8         ~5x
from_uri()             759.9          38.3        ~20x
FileId::to_base16()   1905.4          72.0        ~26x

from_uri() had an overhead of ~400ns/op on top of its from_base62() call, which is now reduced to 7ns, with some trivial inline parsing based on our knowledge of the inputs. FileId gain was primarily due to it needlessly allocating 20 Strings - one for each byte.

I included simple tests and rudimentary docs, which felt less welcome than the code - given there are none of either in the entire codebase ;-)


Rationale

I realize librespot itself in a player scenario barely uses those codepaths and that the PR introduces a bit of additional complexity. I would still like to be able to use this as a library and reuse the types in my project, given they do what I want -- just not efficiently enough.

Additional

  • This conflicts with the change proposed in https://github.com/librespot-org/librespot/pull/440 as it includes a to_uri() method. However, it moves part of the logic to the SpotifyAudioType enum (and is 100ns faster, albeit less idiomatic) where I feel it should belong;
  • If breaking API changes are perfectly fine pre-1.0, I'd like to - in a subsequent PR - as suggested in https://github.com/librespot-org/librespot/pull/381#issuecomment-539900059, add a SpotifyObject struct to encapsulate the ID and object type instead.
    This would involve:
    • swapping the SpotifyAudioType for an SpotifyObjectType enum and filling it with other known types (including non-playable ones), keeping it unitary (with a placeholder Unknown tag for unrecognized types);
    • moving to/from_uri() from SpotifyId to SpotifyObject. For consistency I'd love to rename ID to SpotifyObjectId in that case as well;
    • adding a SpotifyContext to encapsulate a SpotifyObjectType in a given context (including the URI);
    • From and Into implementations could be given and be clear. In this PR I refrained as it is currently not clear whether a SpotifyId should convert to/from an URI or the base62 encoding. That is, if bumping MSRV is a go, as those need to be TryInto/TryFrom;

A 'go' for this would be welcome as I'm eager to get coding.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/librespot-org/librespot/pull/459 **Author:** [@alcore](https://github.com/alcore) **Created:** 4/10/2020 **Status:** ❌ Closed **Base:** `dev` ← **Head:** `refactor-id` --- ### 📝 Commits (1) - [`8d46305`](https://github.com/librespot-org/librespot/commit/8d4630529be744fa4588f2cdf42ef0590571cfe9) Refactor SpotifyId ### 📊 Changes **1 file changed** (+362 additions, -58 deletions) <details> <summary>View changed files</summary> 📝 `core/src/spotify_id.rs` (+362 -58) </details> ### 📄 Description This PR includes a refactoring of `SpotifyId`. In my particular use case I am dealing with a relatively large amount of track IDs (GBs) as playback history - those need to be base62 encoded onto the wire in batches of up to 100 and the current encoding algo was at my QPS prohibitively expensive as it performs 128-bit division + modulo, which compiles down to an unoptimized runtime call. It now instead uses 64-bit arithmetic (which could relatively trivially be optimized for 32-bit arches behind a build condition) with an algo in use in the cryptocurrency ecosystem for their base58 encoding, further optimized for our case. On `rustc 1.44.0-nightly (b543afca9 2020-04-05), Windows 10, i7 4770k @ 4.4GHz (windows/x86-64)` the result is as follows: ``` benchmark old ns/op new ns/op speedup to_base62() 3343.2 164.6 ~20x ``` I then went ahead to pick some further low hanging fruit in the other methods, without resorting to LUTs nor SIMD intrinsics. I.e. the change to the base62 encoding algo is the only non-trivial piece of code contained. ``` benchmark old ns/op new ns/op speedup from_base62() 362.7 31.2 ~11x to_base16() 183.4 75.2 ~2x from_base16() 144.3 27.8 ~5x from_uri() 759.9 38.3 ~20x FileId::to_base16() 1905.4 72.0 ~26x ``` `from_uri()` had an overhead of ~400ns/op on top of its `from_base62()` call, which is now reduced to 7ns, with some trivial inline parsing based on our knowledge of the inputs. FileId gain was primarily due to it needlessly allocating 20 Strings - one for each byte. I included simple tests and rudimentary docs, which felt less welcome than the code - given there are none of either in the entire codebase ;-) ---- ### Rationale I realize librespot itself in a player scenario barely uses those codepaths and that the PR introduces a bit of additional complexity. I would still like to be able to use this as a library and reuse the types in my project, given they do what I want -- just not efficiently enough. ### Additional - This conflicts with the change proposed in https://github.com/librespot-org/librespot/pull/440 as it includes a `to_uri()` method. However, it moves part of the logic to the `SpotifyAudioType` enum (and is 100ns faster, albeit less idiomatic) where I feel it should belong; - If breaking API changes are perfectly fine pre-1.0, I'd like to - in a subsequent PR - as suggested in https://github.com/librespot-org/librespot/pull/381#issuecomment-539900059, add a `SpotifyObject` struct to encapsulate the ID and object type instead. This would involve: * swapping the `SpotifyAudioType` for an `SpotifyObjectType` enum and filling it with other known types (including non-playable ones), keeping it unitary (with a placeholder `Unknown` tag for unrecognized types); * moving to/from_uri() from `SpotifyId` to `SpotifyObject`. For consistency I'd love to rename ID to `SpotifyObjectId` in that case as well; * adding a `SpotifyContext` to encapsulate a `SpotifyObjectType` in a given context (including the URI); * From and Into implementations could be given and be clear. In this PR I refrained as it is currently not clear whether a `SpotifyId` should convert to/from an URI or the base62 encoding. That is, if bumping MSRV is a go, as those need to be TryInto/TryFrom; A 'go' for this would be welcome as I'm eager to get coding. --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
kerem 2026-02-27 20:00:31 +03:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/librespot#937
No description provided.