[GH-ISSUE #592] Proposal: collect dataset #313

Closed
opened 2026-02-25 20:33:18 +03:00 by kerem · 1 comment

Originally created by @elexunix on GitHub (Dec 2, 2023).
Original GitHub issue: https://github.com/asciinema/asciinema/issues/592

Hello guys!

Maybe it would be interesting (or just fun) to collect a moderately large dataset of asciinema recordings from many different users. To contribute, you would simply `asciinema rec` your terminal (as long as you aren't doing anything too personal in it) and share the recording. We could then gather the casts into a single dataset and, since the asciinema recording format is luckily simple, train an LLM on that corpus and have fun watching casts of someone working in a terminal that look "realistic" (at least in the view of that NN).

What do you think about collecting such a dataset together? I have a 4090 and can train the LLM on it.

:)
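The "luckily simple" structure refers to the asciicast file format. Below is a minimal sketch that assumes the asciicast v2 format written by asciinema 2.x. In that format, the first line is a JSON header and every following line is a JSON array `[time, code, data]` describing one event. With that assumption, a recording splits into its header and its event stream using only the Python standard library. The `CastEvent` and `parse_asciicast_v2` names and the sample recording are hypothetical and exist only for illustration.

```python
import json
from typing import Any, NamedTuple


class CastEvent(NamedTuple):
    time: float  # seconds since the start of the recording
    code: str    # event type: "o" = output, "i" = input (other codes may appear)
    data: str    # text written to (or typed into) the terminal


def parse_asciicast_v2(text: str) -> tuple[dict[str, Any], list[CastEvent]]:
    """Split an asciicast v2 recording into its header and its event stream.

    Assumes the v2 layout: one JSON object (the header) on the first
    non-empty line, then one JSON array [time, code, data] per line.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = json.loads(lines[0])
    if header.get("version") != 2:
        raise ValueError(f"expected asciicast v2, got version {header.get('version')!r}")
    events = [CastEvent(*json.loads(line)) for line in lines[1:]]
    return header, events


# A tiny made-up recording: a prompt, typing "ls", and its output.
sample = "\n".join([
    '{"version": 2, "width": 80, "height": 24}',
    '[0.25, "o", "$ "]',
    '[1.10, "o", "ls\\r\\n"]',
    '[1.15, "o", "README.md  src\\r\\n$ "]',
])

header, events = parse_asciicast_v2(sample)
print(header["width"], header["height"], len(events))
print("".join(e.data for e in events if e.code == "o"))
```

Concatenating the `"o"` events reproduces the raw terminal output. That concatenated stream is the kind of text a language model could be trained on.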

kerem closed this issue 2026-02-25 20:33:18 +03:00

@ku1ik commented on GitHub (Jan 19, 2024):

How would you train the model? Are you thinking of an RNN such as a GRU, or rather a transformer model? And what about the timing information: would it be part of the model as well, or were you thinking of training on the raw output only?
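The two options in this question differ mainly in how a recording is serialized into training text. Below is a minimal sketch that assumes the events are already parsed into `(time, code, data)` tuples, as in the asciicast v2 format. Option 1 concatenates the output and discards all timing. Option 2 interleaves the output with pause tokens. The `<wait:N>` token format, the 0.1 s quantization step, and the 50-step cap are hypothetical choices for illustration; none of them was proposed in the thread.

```python
from typing import Iterable, Sequence

# Hypothetical pause-token format; not part of asciinema or any tokenizer.
WAIT_TOKEN = "<wait:{}>"


def to_raw_text(events: Iterable[Sequence]) -> str:
    """Option 1: keep only the terminal output and drop all timing."""
    return "".join(data for _t, code, data in events if code == "o")


def to_timed_text(events: Iterable[Sequence], step: float = 0.1, max_steps: int = 50) -> str:
    """Option 2: interleave output with quantized pause tokens.

    The gap before each output event (measured from the previous output
    event, or from the start of the recording for the first one) is
    rounded to a multiple of `step` seconds and capped at `max_steps`,
    so long idle periods collapse into one bounded token.
    """
    parts: list[str] = []
    prev_t = 0.0
    for t, code, data in events:
        if code != "o":
            continue  # only output events are serialized
        steps = min(round((t - prev_t) / step), max_steps)
        if steps > 0:
            parts.append(WAIT_TOKEN.format(steps))
        parts.append(data)
        prev_t = t
    return "".join(parts)


events = [(0.5, "o", "$ "), (1.5, "o", "ls\r\n"), (1.52, "o", "README.md  src\r\n$ ")]
print(repr(to_raw_text(events)))
print(repr(to_timed_text(events)))
```

Quantizing and capping the delays keeps the set of timing tokens small, at the cost of some fine-grained rhythm. A model trained on the raw-text variant would need its playback timing supplied some other way when generating a cast.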
