[GH-ISSUE #11] Archive Method: Add support for .webarchive output #3029

Closed
opened 2026-03-14 20:40:14 +03:00 by kerem · 5 comments
Owner

Originally created by @rcarmo on GitHub (May 7, 2017).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/11

Just to let you know that I published a sample on how to create Safari .webarchive files: https://github.com/rcarmo/python-webarchive - might be useful, since it also provides a sample Python 3 asyncio crawler :)
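For context on the format mentioned above: a `.webarchive` file is, as far as is publicly documented, a binary property list whose top-level dictionary holds a `WebMainResource` entry plus an optional `WebSubresources` list. The sketch below is not taken from the linked repository. It is a minimal illustration using only Python's standard `plistlib`, and the function name `build_webarchive` is hypothetical.

```python
import plistlib


def build_webarchive(url, html, encoding="UTF-8"):
    """Return the bytes of a single-resource .webarchive (a binary plist).

    Assumption: the key names follow the commonly observed layout of
    Safari webarchives; this is a sketch, not a complete implementation.
    """
    archive = {
        "WebMainResource": {
            "WebResourceData": html,  # raw page bytes
            "WebResourceFrameName": "",
            "WebResourceMIMEType": "text/html",
            "WebResourceTextEncodingName": encoding,
            "WebResourceURL": url,
        },
        # Images, stylesheets, and scripts would each get their own
        # resource dictionary here; left empty in this sketch.
        "WebSubresources": [],
    }
    return plistlib.dumps(archive, fmt=plistlib.FMT_BINARY)


if __name__ == "__main__":
    data = build_webarchive("https://example.com/", b"<html><body>Hello</body></html>")
    with open("example.webarchive", "wb") as fh:
        fh.write(data)
```

A real crawler would also fetch every subresource and add it to `WebSubresources`, which is the part the linked sample repository addresses.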

kerem closed this issue 2026-03-14 20:40:19 +03:00

@pirate commented on GitHub (May 7, 2017):

👍 thanks! I'm also checking out https://github.com/chfoo/wpull for warc support.


@rcarmo commented on GitHub (May 7, 2017):

Nice - will see if I can do something with it!


@pirate commented on GitHub (Jul 6, 2017):

Apparently wget supports saving to warc files!

‘--warc-file=file’
Use file as the destination WARC file.

https://www.gnu.org/software/wget/manual/wget.html
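As a rough illustration of the flag quoted above, the following sketch shells out to `wget` from Python. Only `--warc-file` comes from the wget manual; the helper names `wget_warc_command` and `save_warc` are hypothetical, and the sketch assumes a `wget` build with WARC support is on the `PATH`.

```python
import subprocess


def wget_warc_command(url, warc_name):
    """Build the wget argument list for saving `url` into a WARC file.

    `warc_name` is the base name passed to --warc-file; wget chooses the
    final file extension itself.
    """
    return ["wget", f"--warc-file={warc_name}", url]


def save_warc(url, warc_name):
    """Run wget and raise CalledProcessError if it exits non-zero."""
    return subprocess.run(wget_warc_command(url, warc_name), check=True)


if __name__ == "__main__":
    save_warc("https://example.com/", "example")
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the URL contains characters such as `&` or `?`.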


@rcarmo commented on GitHub (Jul 7, 2017):

Yep. Different format, though.

> On 6 Jul 2017, at 23:02, Nick Sweeting <notifications@github.com> wrote:
>
> Apparently wget supports saving to warc files!
>
> ‘--warc-file=file’
> Use file as the destination WARC file.
> https://www.gnu.org/software/wget/manual/wget.html




@pirate commented on GitHub (Sep 13, 2018):

I've decided to close this for now. Since WARC is more officially supported and .webarchive is Safari-only, I don't think it's worth investing time in this before WARC generation and replay are finished.
