[GH-ISSUE #112] Cleaning up the blocking module #34

Closed
opened 2026-02-27 20:22:43 +03:00 by kerem · 7 comments
Owner

Originally created by @marioortizmanero on GitHub (Aug 15, 2020).
Original GitHub issue: https://github.com/ramsayleung/rspotify/issues/112

Originally assigned to: @marioortizmanero on GitHub.

This is a continuation of #102, which discussed how to properly export the blocking module without copy-pasting the async functions and removing async and await. The current solution makes it painful to maintain rspotify, and compilation times when using the blocking client are huge because you're basically compiling rspotify twice, which kinda defeats the point of #108.

I first thought that using an attribute macro could be a good idea, because all that needs to change in the async implementations is removing the async and await keywords. See this comment for a demonstration: https://github.com/ramsayleung/rspotify/pull/102#issuecomment-660632560. But this still doesn't fix the "compiling the codebase twice" issue, because the code would still be duplicated, only generated by the macro instead of written by hand. The macro in question would make compilation times even longer, and could be messy to implement. The only way to truly "compile the codebase once" would be to have the blocking feature export only the blocking interface when enabled, but I'm not sure we can assume that users won't need both the async and blocking interfaces in the same program.

The second possible solution is calling the async implementations in the blocking functions using block_on. This might sound less efficient at first, but here's a comparison I made to prove my point:

use std::time::Instant;

async fn original() -> Result<String, reqwest::Error> {
    reqwest::get("https://www.rust-lang.org")
        .await?
        .text()
        .await
}

fn with_copypaste() -> Result<String, reqwest::Error> {
    reqwest::blocking::get("https://www.rust-lang.org")?
        .text()
}

fn with_block_on() -> Result<String, reqwest::Error> {
    let mut rt = tokio::runtime::Builder::new()
        .basic_scheduler()
        .enable_all()
        .build()
        .unwrap();

    rt.block_on(async move {
        original().await
    })
}

fn main() {
    let now = Instant::now();
    print!("benchmarking with copypaste ...");
    for _ in 1..100 {
        with_copypaste().unwrap();
    }
    println!("done in {}ms", now.elapsed().as_millis());

    let now = Instant::now();
    print!("benchmarking with block_on ...");
    for _ in 1..100 {
        with_block_on().unwrap();
    }
    println!("done in {}ms", now.elapsed().as_millis());
}
❯ cargo run --release
    Finished release [optimized] target(s) in 0.04s
     Running `/home/mario/.cache/cargo/release/macros`
benchmarking with copypaste ...done in 31988ms
benchmarking with block_on ...done in 30250ms

What? block_on was just as fast as the copypaste? Turns out that this is more or less what reqwest does in its own blocking module (https://github.com/seanmonstar/reqwest/blob/master/src/blocking/client.rs#L723). So the copypaste and the runtime solutions are basically doing the same thing. This way we don't even need the blocking feature in reqwest, and rspotify will only be "compiled once".

This solution could use a smaller macro that automatically generates the boilerplate block_on functions to avoid some repetition.

The best part is that the example isn't even properly optimized: it creates a runtime every time the function is called. If the runtime were global, it would only have to be initialized once. Here's a more complex version with lazy_static, which should still be improvable (for some reason it doesn't actually seem to be faster):

use std::time::Instant;
use std::sync::{Arc, Mutex};

use lazy_static::lazy_static;
use tokio::runtime;

async fn original() -> Result<String, reqwest::Error> {
    reqwest::get("https://www.rust-lang.org")
        .await?
        .text()
        .await
}

fn with_copypaste() -> Result<String, reqwest::Error> {
    reqwest::blocking::get("https://www.rust-lang.org")?
        .text()
}

lazy_static! {
    // Mutex to have mutable access and Arc so that it's thread-safe.
    static ref RT: Arc<Mutex<runtime::Runtime>> = Arc::new(Mutex::new(runtime::Builder::new()
        .basic_scheduler()
        .enable_all()
        .build()
        .unwrap()));
}

fn with_block_on() -> Result<String, reqwest::Error> {
    RT.lock().unwrap().block_on(async move {
        original().await
    })
}

fn main() {
    let now = Instant::now();
    print!("benchmarking with copypaste ...");
    for _ in 1..100 {
        with_copypaste().unwrap();
    }
    println!("done in {}ms", now.elapsed().as_millis());

    let now = Instant::now();
    print!("benchmarking with block_on ...");
    for _ in 1..100 {
        with_block_on().unwrap();
    }
    println!("done in {}ms", now.elapsed().as_millis());
}
❯ cargo run --release
   Compiling macros v0.1.0 (/tmp/macros)
    Finished release [optimized] target(s) in 2.42s
     Running `/home/mario/.cache/cargo/release/macros`
benchmarking with copypaste ...done in 30172ms
benchmarking with block_on ...done in 30606ms
kerem 2026-02-27 20:22:43 +03:00
Author
Owner

@ramsayleung commented on GitHub (Aug 16, 2020):

Totally agree with your point: the copy-pasted blocking module makes rspotify painful to maintain, since you have to maintain two versions of the code whenever you add a new endpoint.

This solution could use a smaller macro that automatically generates the boilerplate block_on functions to avoid some repetition.

This solution is good: we don't have to generate the code by hand. Can't wait to see it.

And there is one more thing we should keep an eye on: will the state of a blocking call become more complicated to maintain?

After the discussion in PR https://github.com/ramsayleung/rspotify/pull/102, I came up with an intuition almost the same as this:

fn with_block_on() -> Result<String, reqwest::Error> {
    let mut rt = tokio::runtime::Builder::new()
        .basic_scheduler()
        .enable_all()
        .build()
        .unwrap();

    rt.block_on(async move {
        original().await
    })
}

And my concern is whether there are more state machines and errors we have to take care of compared with the copy-pasted code, because we plan to introduce the tokio runtime to handle the execution of the reqwest calls.

This way we don't even need the blocking feature in reqwest

Yeah, but I think it's something like:

Andy gives, Bill takes away.

reqwest gives, tokio takes away :)

Author
Owner

@marioortizmanero commented on GitHub (Aug 16, 2020):

And my concern is whether there are more state machines and errors we have to take care of compared with the copy-pasted code, because we plan to introduce the tokio runtime to handle the execution of the reqwest calls.

I don't understand what you mean by that. The way I see it, performance- and complexity-wise it's basically the same as the copy-paste version. reqwest also introduces a blocking tokio runtime under the hood (https://github.com/seanmonstar/reqwest/blob/master/src/blocking/client.rs#L723), with an even more complicated method to run the blocking code than just using block_on. We have less code to take care of than in the copy-paste version, because we only have to call runtime.block_on(async move { ... }): the implementation only needs to be written once.

I'll start working on this once #95, #110 and #113 are finished, but I still need to figure out how to only create a single global runtime and how to make it more performant.

Author
Owner

@ramsayleung commented on GitHub (Aug 16, 2020):

Reqwest also introduces a blocking tokio runtime under the hood, with an even more complicated method to run the blocking code than just using block_on

Yeap, I get your point

but I still need to figure out how to only create a single global runtime and how to make it more performant.

How about the Singleton design pattern, would it be helpful?

Author
Owner

@marioortizmanero commented on GitHub (Aug 16, 2020):

How about the Singleton design pattern, would it be helpful?

I'm not a big fan of singletons but I guess it's the only way to do it in this case. Isn't what I suggested with lazy_static a singleton anyway? What I don't understand is why the singleton performs worse than using multiple runtimes, perhaps the Arc<Mutex<T>> overhead is greater than instantiating a new runtime? :/

Author
Owner

@Darksonn commented on GitHub (Aug 29, 2020):

I recommend using a shared runtime, but with a small change compared to what you have now:

  1. Use a threaded runtime with one thread.
  2. Do not put it in an Arc<Mutex<...>>
  3. Use Handle::block_on to block on many things concurrently.

Alternatively you could spawn the operation on the shared runtime and use Handle::block_on on the JoinHandle returned by tokio::spawn.

I'm not sure which of the two is better.

As a slight variation on the above, you can use a basic scheduler, but spawn a thread that calls block_on, waiting on e.g. a channel of jobs. This is what reqwest's blocking module does, afaik.

Author
Owner

@marioortizmanero commented on GitHub (Aug 29, 2020):

Thanks for the recommendations, @Darksonn!

I'm extending the initial benchmark with all the possibilities. The benchmark obviously isn't reliable enough to base the choice on, but I think it gives an idea of how performance is affected:

main.rs
use std::time::Instant;
use std::sync::{Arc, Mutex};
use std::thread;

use lazy_static::lazy_static;
use tokio::runtime;

async fn original() -> Result<String, reqwest::Error> {
    reqwest::get("https://www.rust-lang.org")
        .await?
        .text()
        .await
}

fn with_copypaste() -> Result<String, reqwest::Error> {
    reqwest::blocking::get("https://www.rust-lang.org")?
        .text()
}

lazy_static! {
    // Mutex to have mutable access and Arc so that it's thread-safe.
    static ref RT_THREADING: Arc<Mutex<runtime::Runtime>> = Arc::new(Mutex::new(runtime::Builder::new()
        .basic_scheduler()
        .enable_all()
        .build()
        .unwrap()));

    // Without mutexes, it should be used with `handle`.
    static ref RT: runtime::Runtime = runtime::Runtime::new().unwrap();
}

fn with_block_on_local_runtime() -> Result<String, reqwest::Error> {
    let mut rt = tokio::runtime::Builder::new()
        .basic_scheduler()
        .enable_all()
        .build()
        .unwrap();

    rt.block_on(async move {
        original().await
    })
}

fn with_block_on_threaded_runtime() -> Result<String, reqwest::Error> {
    RT_THREADING.lock().unwrap().block_on(async move {
        original().await
    })
}

fn with_block_on_handle() -> Result<String, reqwest::Error> {
    RT.handle().block_on(async move {
        original().await
    })
}

fn with_block_on_spawn_handle() -> Result<String, reqwest::Error> {
    RT.handle().block_on(async {
        RT.spawn(async move {
            original().await
        }).await.unwrap()
    })
}

fn main() {
    macro_rules! benchmark {
        ($name:ident) => {
            let mut total = 0;
            let now = Instant::now();
            print!("benchmarking {} single-threaded ...", stringify!($name));

            for _ in 1..50 {
                $name().unwrap();
            }

            total += now.elapsed().as_millis();
            println!(" done in {}ms", now.elapsed().as_millis());

            let now = Instant::now();
            print!("benchmarking {} multi-threaded ...", stringify!($name));

            // multi threaded
            let mut handles = Vec::new();
            for _ in 1..50 {
                handles.push(thread::spawn(|| {
                    $name().unwrap();
                }))
            }
            for handle in handles {
                handle.join().unwrap();
            }

            total += now.elapsed().as_millis();
            println!(" done in {}ms", now.elapsed().as_millis());

            println!(">> total time taken: {}ms", total);
        }
    }

    benchmark!(with_copypaste);
    benchmark!(with_block_on_local_runtime);
    benchmark!(with_block_on_threaded_runtime);
    benchmark!(with_block_on_handle);
    benchmark!(with_block_on_spawn_handle);
}
Cargo.toml
[package]
name = "benchmarks"
version = "0.1.0"
authors = ["Mario Ortiz Manero <marioortizmanero@gmail.com>"]
edition = "2018"

[dependencies]
reqwest = { version = "0.10.8", features = ["blocking"] }
tokio = { version = "0.2.22", features = ["full"] }
lazy_static = "1.4.0"

Results on an Intel Pentium G4560 (4) @ 3.500GHz (which, to be fair, is a terrible CPU, especially for multithreading). It'd be a good idea to run the benchmark on more systems:

benchmarking with_copypaste single-threaded ... done in 17664ms
benchmarking with_copypaste multi-threaded ... done in 6231ms
>> total time taken: 23895ms
benchmarking with_block_on_local_runtime single-threaded ... done in 17636ms
benchmarking with_block_on_local_runtime multi-threaded ... done in 10506ms
>> total time taken: 28142ms
benchmarking with_block_on_threaded_runtime single-threaded ... done in 21804ms
benchmarking with_block_on_threaded_runtime multi-threaded ... done in 17705ms
>> total time taken: 39509ms
benchmarking with_block_on_handle single-threaded ... done in 16935ms
benchmarking with_block_on_handle multi-threaded ... done in 6104ms
>> total time taken: 23039ms
benchmarking with_block_on_spawn_handle single-threaded ... done in 16600ms
benchmarking with_block_on_spawn_handle multi-threaded ... done in 5886ms
>> total time taken: 22486ms

The reqwest solution you suggested is basically the same as with_copypaste, since it's what it uses under the hood.

Author
Owner

@ramsayleung commented on GitHub (Aug 29, 2020):

It'd be a good idea to run the benchmark on more systems.

Perhaps we could do that with GitHub Actions, using the runs-on label?
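Something along these lines could run the benchmark on each hosted runner via a matrix (a sketch only; the workflow name, trigger, and cargo invocation are assumptions):

```yaml
name: benchmarks
on: [workflow_dispatch]

jobs:
  bench:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v2
      # Runs the benchmark binary; timings would show up in the job logs.
      - run: cargo run --release
```

Keep in mind that hosted runners are shared VMs, so the numbers would be noisy and mostly useful for relative comparisons on the same run.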
