mirror of
https://github.com/matze/wastebin.git
synced 2026-04-25 00:25:59 +03:00
[GH-ISSUE #152] human-readable random url #92
Originally created by @mokurin000 on GitHub (Apr 5, 2025).
Original GitHub issue: https://github.com/matze/wastebin/issues/152
Example program:
Would output things like:
Compared to traditional random hash strings (e.g., "GSlZNwBUGKi"), it achieves semantic structural composition with these advantages:
@matze commented on GitHub (Apr 5, 2025):
In principle this is a good idea but your proposal would mean generating new identifiers incompatible with the existing ones. Unless there is some bijective function that allows mapping from and to existing 32/64 bit identifiers I don't know of. Storing additional string identifiers is not a viable alternative for me, I'd like to keep the database schema simple and lean.
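For illustration only, one shape such a bijection could take is to read the 64-bit identifier as a sequence of fixed-width digits and map each digit to a word from a fixed list. The sketch below assumes a hypothetical 16-entry word list (4 bits per word, so 16 words per identifier); the list, the `-` separator, and the function names are made up for this example and are not part of wastebin. A list with 65,536 entries would bring this down to four words per identifier.

```rust
/// Hypothetical 16-entry word list; each word carries 4 bits of the id.
const WORDS: [&str; 16] = [
    "amber", "birch", "cedar", "delta", "ember", "fjord", "grove", "haze",
    "iris", "jade", "kelp", "lotus", "maple", "nova", "opal", "pine",
];

/// Bits encoded by one word (the list size is assumed to be a power of two).
const BITS_PER_WORD: u32 = WORDS.len().trailing_zeros();
/// Number of words needed to cover all 64 bits.
const WORD_COUNT: u32 = 64 / BITS_PER_WORD;

/// Encodes an id as `WORD_COUNT` words, most significant digit first.
fn id_to_words(id: u64) -> String {
    let mask = (WORDS.len() as u64) - 1;
    (0..WORD_COUNT)
        .rev()
        .map(|i| WORDS[((id >> (i * BITS_PER_WORD)) & mask) as usize])
        .collect::<Vec<_>>()
        .join("-")
}

/// Decodes a word sequence back into the id.
/// Returns `None` for unknown words or a wrong number of words.
fn words_to_id(path: &str) -> Option<u64> {
    let mut id: u64 = 0;
    let mut count = 0;
    for word in path.split('-') {
        let index = WORDS.iter().position(|w| *w == word)? as u64;
        count += 1;
        if count > WORD_COUNT {
            return None;
        }
        id = (id << BITS_PER_WORD) | index;
    }
    (count == WORD_COUNT).then_some(id)
}
```

Because every id has exactly one word sequence and vice versa, nothing beyond the existing integer would need to be stored; the cost is the length of the resulting URL.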
@cgzones commented on GitHub (Apr 5, 2025):
Not quite related, but I thought of adding alias identifiers with an unambiguous character set, e.g. the one from https://stackoverflow.com/a/58098360. This would avoid confusion between similar-looking characters like I/1/l or 0/O. The length would increase from currently 11 to roundup(ln(2^64) / ln(number-of-character := 23)) = 15. To avoid clashes with current IDs they could be queried via /simple/{ID}.
@mokurin000 commented on GitHub (Apr 5, 2025):
AFAIK the current mapping between the id (the number) and the URL path is just a mask that extracts 6 bits at a time (or 2/4 bits); ids are generated from random i64 numbers. [1]
I would suggest running ahash on such short strings (with hardware acceleration this would be faster than rustc-hash) and getting a u64 via RandomState::hash_one.
The only thing I am not sure about: do we really need a bidirectional mapping between the URL path and the id number? For example, if a user accesses https://somedomain.tld/long-readable-string-url, we calculate the corresponding ID and use it to query the related data from the database.
By the way, since the current URL parts can only be 6 or 11 characters long, ensuring that human-readable ids are longer than 11 bytes would prevent possible collisions. Anyway, because the sentence contains 4 words and it is practically impossible to have four 2-letter words, the length check is not even required.
[1] github.com/matze/wastebin@3c7c84911d/crates/wastebin_core/src/id.rs (L34)
@mokurin000 commented on GitHub (Apr 5, 2025):
You do not need to store additional identifiers if we just hash them to u64's. We could allow users to specify an optional boolean, e.g. human_readable, to get a human-readable URL part, but we could still store the ids as i64.
@matze commented on GitHub (Apr 5, 2025):
Okay, got your point. There are two issues I still see left:
I'm on the fence to be honest.
@mokurin000 commented on GitHub (Apr 5, 2025):
Okay. I see your concern, so I will leave it in my fork for now. Working on the implementation.
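A minimal sketch of the one-way lookup discussed in this thread, under stated assumptions: the human-readable URL path is hashed to a 64-bit value, which is then reinterpreted as the i64 used as the database key. To stay dependency-free, the sketch swaps ahash for the standard library's DefaultHasher; the function names are hypothetical and this is not the code from the fork. Whatever hasher is used would need a fixed seed and a pinned algorithm, since a per-process random seed would map the same URL to a different id after a restart, and the standard library does not guarantee that DefaultHasher's output stays the same across Rust releases.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{BuildHasher, BuildHasherDefault};

/// Maps a human-readable URL path to a signed 64-bit database id.
/// Stand-in hasher: std's DefaultHasher with its default (fixed) keys,
/// used here instead of ahash purely to keep the example self-contained.
fn slug_to_id(slug: &str) -> i64 {
    let build = BuildHasherDefault::<DefaultHasher>::default();
    let hash: u64 = build.hash_one(slug);
    // Reinterpret the 64 bits as i64; no information is lost.
    hash as i64
}

/// Length rule mentioned in the thread: existing URL parts are 6 or 11
/// characters long, so anything longer cannot be an existing-style id.
fn looks_human_readable(path: &str) -> bool {
    path.len() > 11
}

/// Picks the lookup strategy for an incoming URL path.
/// Returns `Some(id)` for human-readable paths and `None` when the path
/// should go through the existing id decoding instead.
fn resolve(path: &str) -> Option<i64> {
    looks_human_readable(path).then(|| slug_to_id(path))
}
```

The mapping is one-way only: the original words cannot be recovered from the stored i64, and two different paths can in principle hash to the same id.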