[GH-ISSUE #152] Question about docker install #1614

Closed
opened 2026-03-01 17:52:12 +03:00 by kerem · 4 comments

Originally created by @flip111 on GitHub (Feb 22, 2019).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/152

Hi, I tried the normal Docker install from https://github.com/pirate/ArchiveBox/wiki/Docker#docker (without Compose).

I get the following output, which looks good:

```
» sudo docker run -v archivebox-data:/data archivebox /bin/archive 'https://example.com/'
[*] [2019-02-22 12:53:56] Downloading https://example.com/ > /data/sources/example.com-1550840036.txt
[*] [2019-02-22 12:53:57] Parsing new links from output/sources/example.com-1550840036.txt...
    > Adding 1 new links to index from /data/sources/example.com-1550840036.txt (parsed as Plain Text format)
[*] [2019-02-22 12:53:57] Updating main index files...
    > /data/index.json
    > /data/index.html
[▶] [2019-02-22 12:53:57] Updating content for 1 pages in archive...
[+] [2019-02-22 12:53:57] "http://www.iana.org/domains/example"
    http://www.iana.org/domains/example
    > /data/archive/1550840037 (new)
      > favicon
      > title
      > wget
      > pdf
      > screenshot
      > dom
      > archive_org
      > git
      > media
      √ index.json
      √ index.html
[√] [2019-02-22 12:54:17] Update of 1 pages complete (20.14 sec)
    - 1 entries skipped
    - 8 entries updated
    - 0 errors
    To view your archive, open: /data/index.html
[*] [2019-02-22 12:54:17] Updating main index files...
    > /data/index.json
    > /data/index.html
```

Where can I see the results and the archived files now? Maybe some information about this could be added to the Docker section of the wiki?

kerem closed this issue 2026-03-01 17:52:12 +03:00

@flip111 commented on GitHub (Feb 22, 2019):

I ran

```
sudo mount --bind /var/lib/docker/volumes/archivebox-data/_data /mnt/
```

and then checked the /mnt directory. I saw an index.html file there that looked like this:

(Screenshot of the archived sites: https://user-images.githubusercontent.com/2244480/53246912-e02e1480-36a9-11e9-9de9-74f688990b7c.png)

The saved link looks like https://www.iana.org/domains/reserved, but I expected http://example.com/.

I'm not sure if this is the right way to do it; maybe I shouldn't use mount.
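For reference, the host path used in the mount above does not have to be hard-coded: Docker can report where a named volume lives. The following is a minimal sketch, assuming the `archivebox-data` volume name from the original command and a local Docker engine; the helper names `archivebox_data_dir` and `archivebox_index` are hypothetical, only `docker volume inspect --format` is a real Docker CLI call.

```shell
# Sketch only: assumes the named volume "archivebox-data" from the command above.
# Hypothetical helper: ask Docker where the volume's data lives on the host
# (on a default Linux install this is /var/lib/docker/volumes/archivebox-data/_data).
archivebox_data_dir() {
  docker volume inspect --format '{{ .Mountpoint }}' archivebox-data
}

# Hypothetical helper: print the path of the main index page inside that directory.
# /var/lib/docker is normally readable only by root, so browsing it may need sudo.
archivebox_index() {
  printf '%s/index.html\n' "$(archivebox_data_dir)"
}
```

For example, `sudo ls "$(archivebox_data_dir)"` lists the same files the bind mount exposes; if your user cannot talk to the Docker daemon, run the lookup itself with `sudo`, as in the original command.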


@pirate commented on GitHub (Feb 27, 2019):

Hey, this is correct; you did the right thing.

It added `http://iana.org/domains/reserved` instead of `http://example.com` because you passed the URL as an argument to `/bin/archivebox` instead of via stdin.

As the https://github.com/pirate/ArchiveBox/wiki/Usage page describes, URLs passed via stdin are each archived directly, but a URL passed as the first argument is treated as a feed to scrape URLs from, so that form is usually used for RSS feeds or Netscape-format bookmark lists. Here, https://example.com/ was scraped as a feed, and the one link on it (http://www.iana.org/domains/example in your log) is what got archived. If you want to archive http://example.com itself, pass it in via stdin instead of as an argument.
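The difference between the two invocation styles can be sketched as two small shell functions. This is a minimal sketch, assuming the `archivebox` image tag, the `archivebox-data` volume and the `/bin/archive` entry point from the original command; the function names `archive_url` and `import_feed` are hypothetical. The one Docker detail that matters is `-i` on `docker run`, which keeps the container's stdin open so the piped URL actually reaches `/bin/archive`.

```shell
# Sketch only: image "archivebox", volume "archivebox-data" and /bin/archive are
# taken from the command in this issue; the function names are hypothetical.

# Archive one page: the URL goes in on stdin, so that page itself is archived.
# -i keeps stdin attached; without it the pipe never reaches the container.
archive_url() {
  printf '%s\n' "$1" | docker run -i -v archivebox-data:/data archivebox /bin/archive
}

# Import a feed: the URL is passed as an argument, so it is downloaded and the
# links found inside it (RSS feed, bookmarks export, ...) are archived instead.
import_feed() {
  docker run -v archivebox-data:/data archivebox /bin/archive "$1"
}
```

With these, `archive_url 'https://example.com/'` would archive example.com itself, while `import_feed` reproduces what happened in this issue.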

It says this in the Docker docs, but maybe I could make it clearer.

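(Screen shot 2019-02-26 at 8:00:24 PM: https://user-images.githubusercontent.com/511499/53457689-4ba51880-3a01-11e9-80ef-bd0eb8b833f7.png)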

Comment back if you're still having trouble and I'll help you out.


@flip111 commented on GitHub (Feb 27, 2019):

The Usage page explains it well; I overlooked it, though. The stdin approach isn't very intuitive without the description on the Usage page, but OK.

I saw you added info on accessing the data. Thanks for that.

I will close this now because my initial issue has been resolved.


@pirate commented on GitHub (Feb 27, 2019):

I just edited both the Docker and Usage wiki pages to make it clearer, thanks!

Reference
starred/ArchiveBox#1614