[GH-ISSUE #152] human-readable random url #92

Open
opened 2026-02-27 10:15:41 +03:00 by kerem · 6 comments

Originally created by @mokurin000 on GitHub (Apr 5, 2025).
Original GitHub issue: https://github.com/matze/wastebin/issues/152

Example program:

use cgisf_lib::SentenceConfigBuilder;

fn main() {
    let sentence = cgisf_lib::gen_sentence(
        SentenceConfigBuilder::random()
            .plural(false)
            .adjectives(1)
            .adverbs(1)
            .structure(cgisf_lib::Structure::AdjectivesNounVerbAdverbs)
            .build(),
    );
    // Drop the leading article and the trailing period, then join the words with hyphens.
    let sentence = sentence
        .replace("The ", "")
        .trim_end_matches(".")
        .replace(" ", "-");
    println!("{sentence}");
}

Would output things like:

rust-kiss-alarms-stealthily

Compared to traditional random ID strings (e.g., "GSlZNwBUGKi"), the URL is composed of real words in a sentence-like structure, which has these advantages:

  • Memorability - Uses natural language elements (adjective+noun+verb+adverb) that align with human memory patterns.
  • Readability - Word combinations form pseudo-sentence structures (e.g., "rust-kiss-alarms-stealthily" could be interpreted as "rusty kisses stealthily alarm").

@matze commented on GitHub (Apr 5, 2025):

In principle this is a good idea, but your proposal would mean generating new identifiers that are incompatible with the existing ones, unless there is some bijective function I don't know of that maps to and from the existing 32/64-bit identifiers. Storing additional string identifiers is not a viable alternative for me; I'd like to keep the database schema simple and lean.
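
As a rough illustration of what such a bijective mapping could look like, here is a minimal sketch that splits a 64-bit id into 4-bit chunks and maps each chunk to a word. The word list and the function names `id_to_words` and `words_to_id` are made up for this example and are not part of wastebin. With only 16 words each word carries 4 bits, so an id needs 16 words; for four words to cover 64 bits, each position would need a list of 2^16 = 65,536 words.

```rust
/// Hypothetical 16-entry word list; each word encodes 4 bits.
const WORDS: [&str; 16] = [
    "acid", "bark", "calm", "dune", "echo", "fern", "glow", "hush",
    "iris", "jade", "kite", "lamp", "mint", "nest", "opal", "pine",
];

/// Encode a 64-bit id as 16 hyphen-separated words, most significant chunk first.
fn id_to_words(id: u64) -> String {
    (0..16u32)
        .rev()
        .map(|i| WORDS[((id >> (i * 4)) & 0xF) as usize])
        .collect::<Vec<_>>()
        .join("-")
}

/// Decode the words back into the id.
/// Returns None if a word is unknown or the word count is not exactly 16.
fn words_to_id(s: &str) -> Option<u64> {
    let mut id = 0u64;
    let mut count = 0;
    for word in s.split('-') {
        let idx = WORDS.iter().position(|w| *w == word)? as u64;
        count += 1;
        if count > 16 {
            return None;
        }
        id = (id << 4) | idx;
    }
    (count == 16).then_some(id)
}
```

Because every id maps to exactly one word sequence and back, no extra column would be needed, but the resulting URLs are much longer than the four-word sentences proposed above.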


@cgzones commented on GitHub (Apr 5, 2025):

Not quite related, but I thought of adding alias identifiers with an unambiguous character set, e.g. the one from https://stackoverflow.com/a/58098360. This would avoid confusion between similar-looking characters like I/1/l or 0/O.

The length would increase from the current 11 to roundup(ln(2^64) / ln(number-of-characters := 23)) = 15.
To avoid clashes with current IDs they could be queried via /simple/{ID}.
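
The length calculation and the alias encoding can be sketched as follows. The 23-character alphabet below is a placeholder chosen for this example, not necessarily the set from the linked answer, and `alias_len`, `encode_alias` and `decode_alias` are hypothetical names.

```rust
/// Placeholder alphabet of 23 characters without I/1/l or 0/O (an assumption,
/// not the exact set from the linked answer).
const ALPHABET: &[u8; 23] = b"346789BCDFGHJKMPQRTVWXY";
const ALIAS_LEN: usize = 15;

/// Smallest n such that alphabet_size^n >= 2^bits (valid for bits <= 64).
fn alias_len(bits: u32, alphabet_size: u32) -> u32 {
    let target: u128 = 1u128 << bits;
    let mut capacity: u128 = 1;
    let mut n = 0;
    while capacity < target {
        capacity *= alphabet_size as u128;
        n += 1;
    }
    n
}

/// Encode a 64-bit id as a fixed-width base-23 string.
fn encode_alias(mut id: u64) -> String {
    let mut buf = [ALPHABET[0]; ALIAS_LEN];
    for slot in buf.iter_mut().rev() {
        *slot = ALPHABET[(id % 23) as usize];
        id /= 23;
    }
    String::from_utf8(buf.to_vec()).expect("alphabet is ASCII")
}

/// Decode an alias; None on wrong length, unknown characters, or overflow.
fn decode_alias(s: &str) -> Option<u64> {
    if s.len() != ALIAS_LEN {
        return None;
    }
    let mut id: u64 = 0;
    for b in s.bytes() {
        let digit = ALPHABET.iter().position(|&c| c == b)? as u64;
        id = id.checked_mul(23)?.checked_add(digit)?;
    }
    Some(id)
}
```

Here `alias_len(64, 23)` gives 15 and `alias_len(64, 64)` gives the current 11.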


@mokurin000 commented on GitHub (Apr 5, 2025):

> In principle this is a good idea but your proposal would mean generating new identifiers incompatible with the existing ones. Unless there is some bijective function that allows mapping from and to existing 32/64 bit identifiers I don't know of. Storing additional string identifiers is not a viable alternative for me, I'd like to keep the database schema simple and lean.

AFAIK the current mapping between the id (the number) and the URL path is just a mask that extracts 6 bits at a time (or 2/4 bits), and ids are generated from random i64 numbers. [1]

I would suggest hashing such short strings with ahash (https://docs.rs/ahash/latest/ahash/), which with hardware acceleration would be faster than rustc-hash, and getting a u64 from RandomState::hash_one (https://docs.rs/ahash/latest/ahash/random_state/struct.RandomState.html#method.hash_one).

The only thing I am not sure about: do we really need a bidirectional mapping between the URL path and the id number? For example, if a user accesses https://somedomain.tld/long-readable-string-url, we calculate the corresponding ID and use it to query the related data from the database.

By the way, since the current URL parts can only be 6 or 11 characters long, requiring human-readable ids to be longer than 11 bytes would prevent possible collisions. In any case, because the sentence contains 4 words and it is almost impossible to get four two-letter words (which, with three hyphens, would be exactly 11 characters), the length check is not even required.
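
A minimal sketch of this one-way lookup, under a few assumptions: it uses a hand-written 64-bit FNV-1a instead of ahash so that it has no dependencies, and `slug_to_db_id` and `is_readable_slug` are hypothetical names. Whichever hash is used, it would need fixed keys so that the same slug yields the same id across restarts, and two different slugs can still collide.

```rust
/// 64-bit FNV-1a, used here only as a dependency-free stand-in for ahash.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Map a readable slug to the i64 key stored in the database.
/// This is one-way: the slug cannot be recovered from the key.
fn slug_to_db_id(slug: &str) -> i64 {
    fnv1a_64(slug.as_bytes()) as i64
}

/// Existing URL parts are at most 11 characters, so anything longer is
/// treated as a readable slug.
fn is_readable_slug(path: &str) -> bool {
    path.len() > 11
}
```

A request for `/rust-kiss-alarms-stealthily` would then be looked up by `slug_to_db_id("rust-kiss-alarms-stealthily")`, while an 11-character path such as `GSlZNwBUGKi` would keep going through the existing decoding.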

[1]: https://github.com/matze/wastebin/blob/3c7c84911d942d7966794d8b996ad8c74f8b3835/crates/wastebin_core/src/id.rs#L34

@mokurin000 commented on GitHub (Apr 5, 2025):

> Storing additional string identifiers is not a viable alternative for me

You don't need to store additional identifiers if we just hash them to u64s.

We could let users set an optional boolean, e.g. human_readable, to get a human-readable URL part, while still storing the ids as i64.


@matze commented on GitHub (Apr 5, 2025):

> The only thing I am not sure, do we really need bidirectional mapping between the url path and the id number? For example, if a user access https://somedomain.tld/long-readable-string-url, we calculate the corresponding ID to query related data from database

Okay, got your point. There are two issues I still see left:

  1. To not break backwards compatibility we still need to support old identifiers, i.e. we need two code paths in the same route. Not cool.
  2. I'd hate to change the default, i.e. after pasting a new item, being confronted with a URL that looks different from before, when for some people the old style is even preferable to long "readable" ones. So it would have to be an opt-in change with yet another configuration variable. Not cool either.
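
To make the opt-in point concrete, such a switch could be sketched roughly like this. The variable name `WASTEBIN_ID_STYLE`, the `IdStyle` enum and `parse_id_style` are all hypothetical and do not exist in wastebin; the only point is that an unset or unknown value keeps the current behavior.

```rust
/// How new paste URLs are rendered. `Classic` stays the default so that
/// existing deployments see no change.
#[derive(Debug, PartialEq, Clone, Copy)]
enum IdStyle {
    Classic,
    Readable,
}

/// Parse the value of the (hypothetical) WASTEBIN_ID_STYLE variable.
/// Unset or unrecognized values fall back to `Classic`.
fn parse_id_style(value: Option<&str>) -> IdStyle {
    match value.map(|v| v.trim().to_ascii_lowercase()).as_deref() {
        Some("readable") => IdStyle::Readable,
        _ => IdStyle::Classic,
    }
}
```

It could be called as `parse_id_style(std::env::var("WASTEBIN_ID_STYLE").ok().as_deref())` at startup.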

I'm on the fence to be honest.


@mokurin000 commented on GitHub (Apr 5, 2025):

> > The only thing I am not sure, do we really need bidirectional mapping between the url path and the id number? For example, if a user access https://somedomain.tld/long-readable-string-url, we calculate the corresponding ID to query related data from database
>
> Okay, got your point. There are two issues I still see left:
>
> 1. To not break backwards compatibility we still need to support old identifiers, i.e. need two code paths in the same route. Not cool.
>
> 2. I'd hate to change the default, i.e. after pasting a new item be confronted with a different looking URL than previously which for some people is even preferable over long "readable" ones. So, it would have to be an opt-in change with yet-another configuration variable. Not cool either.
>
> I'm on the fence to be honest.

Okay, I see your concern, so I will leave it in my fork for now. I'm working on the implementation.
