[GH-ISSUE #444] Issue with archive for GitHub web page #276

Closed
opened 2026-02-25 23:33:50 +03:00 by kerem · 3 comments

Originally created by @juev on GitHub (Jun 28, 2022).
Original GitHub issue: https://github.com/go-shiori/shiori/issues/444

Hello,

When I save a GitHub page to shiori, I get an unreadable result.

I saved the HTML page for both the original and the archive. What's wrong?

[archive.zip](https://github.com/go-shiori/shiori/files/8998260/archive.zip)

As far as I can see, there is a duplicate `meta character`, a wrong `meta name="optimizely-datafile"`, and many JS files were merged into one file.

Why was the canonical URL changed?

```html
<link rel="canonical" href="https-github.com-milgradesec-ddns" data-pjax-transient=""/>
```

As far as I can see, the two pages also differ in many places in the body. What's wrong?
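One way to confirm the canonical-URL symptom is to check whether the archived page's canonical `href` is still an absolute URL. The rewritten value looks like the page URL with `://` and `/` replaced by `-`. The Go sketch below is illustrative only. `canonicalIsAbsolute` is a hypothetical helper, not part of shiori or warc. The regular expression assumes `rel` comes before `href`, as in the snippet from the report. The original page's href `https://github.com/milgradesec/ddns` is an assumption inferred from the rewritten value.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// canonicalRe captures the href of a <link rel="canonical"> tag.
// Hypothetical quick check: a regex is not a real HTML parser, and it
// assumes rel appears before href inside the tag.
var canonicalRe = regexp.MustCompile(`<link[^>]*rel="canonical"[^>]*href="([^"]*)"`)

// canonicalIsAbsolute (hypothetical helper) returns the canonical href it
// found and whether that href is an absolute http(s) URL with a host.
// It returns "" and false when no canonical link is present.
func canonicalIsAbsolute(page string) (string, bool) {
	m := canonicalRe.FindStringSubmatch(page)
	if m == nil {
		return "", false
	}
	u, err := url.Parse(m[1])
	if err != nil {
		return m[1], false
	}
	ok := u.IsAbs() && (u.Scheme == "http" || u.Scheme == "https") && u.Host != ""
	return m[1], ok
}

func main() {
	// Assumed original tag (inferred, not taken from the attachment).
	original := `<link rel="canonical" href="https://github.com/milgradesec/ddns" data-pjax-transient=""/>`
	// Tag as it appears in the archived copy, from the report.
	archived := `<link rel="canonical" href="https-github.com-milgradesec-ddns" data-pjax-transient=""/>`

	for _, page := range []string{original, archived} {
		href, ok := canonicalIsAbsolute(page)
		fmt.Printf("%s absolute=%v\n", href, ok)
	}
}
```

With the assumed original href, this prints `absolute=true` for the original tag and `absolute=false` for the archived one. `url.Parse` treats the rewritten value as a relative path because it contains no `:` scheme separator.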


@stale[bot] commented on GitHub (Jul 28, 2022):

This issue has been automatically marked as stale because it has not had any activity for quite some time.
It will be closed if no further activity occurs.
Thank you for your contributions.


@fmartingr commented on GitHub (Aug 5, 2022):

Hey @juev, we currently use [warc](https://github.com/go-shiori/warc) to archive websites, but there are plans to move to [obelisk](https://github.com/go-shiori/obelisk) (#353), which would hopefully fix some archiving issues. I can't say for sure why it's doing what you describe until I get to that task. Archiving a site requires some processing, though, and it's possible that some of that data handling breaks some headers.

This is low priority until we finish #353. If we don't end up migrating to obelisk, we could look into it in WARC.


@stale[bot] commented on GitHub (Sep 4, 2022):

This issue has been automatically marked as stale because it has not had any activity for quite some time.
It will be closed if no further activity occurs.
Thank you for your contributions.
