[GH-ISSUE #26] Dockerfile enhancement #1528

Closed
opened 2026-03-01 17:51:26 +03:00 by kerem · 7 comments
Owner

Originally created by @hannah98 on GitHub (Jun 27, 2017).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/26

I recently submitted a [pull request](https://github.com/pirate/bookmark-archiver/pull/25) that paved the way for a Dockerfile for this application. I am opening this enhancement to discuss with you how to proceed.

I created a [repository](https://github.com/hannah98/bookmark-archiver-docker) for the Dockerfile. The image is already built on Docker Hub and works, but it is published under my name. I'm not sure whether you want to support automated builds going forward.

One other discussion point is that some platforms have trouble running google-chrome in a Docker container. You can read some lengthy discussion [here](https://github.com/jessfraz/dockerfiles/issues/65). Running this container (or any container that uses Google Chrome) on an Ubuntu host will generate the "Failed to move to new namespace" error.

One solution I was playing with was to use [PhantomJS](http://phantomjs.org/) instead of Google Chrome. I already have a [Docker container](https://github.com/hannah98/bookmark-archiver-docker/tree/phantomjs) built for it, but your application currently doesn't support PDF and PNG snapshots with PhantomJS. Another bonus of using PhantomJS is that the image is 236MB as opposed to 845MB with Google Chrome.

I am going to experiment with my fork of your application to let the user choose (via an environment variable) between PhantomJS and Google Chrome. If I figure something out, I will test it and submit a pull request.

This should be enough material for us to chat about. Let me know if you have any questions.
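Editor's note: the environment-variable switch proposed above could look roughly like the following. This is only a minimal sketch, not code from the project or from the fork. The variable name `ARCHIVER_BROWSER`, the function `choose_browser`, and the binary names in the table are all hypothetical choices made for illustration.

```python
import os

# Hypothetical mapping from a user-facing browser name to the binary to run.
# Neither the names nor the binaries are taken from the project itself.
SUPPORTED_BROWSERS = {
    "chrome": "google-chrome",
    "phantomjs": "phantomjs",
}


def choose_browser(env=None):
    """Return (name, binary) for the browser selected via the environment.

    ARCHIVER_BROWSER is an assumed variable name; Chrome is the default
    when the variable is unset.
    """
    env = os.environ if env is None else env
    name = env.get("ARCHIVER_BROWSER", "chrome").strip().lower()
    if name not in SUPPORTED_BROWSERS:
        allowed = ", ".join(sorted(SUPPORTED_BROWSERS))
        raise ValueError(f"Unsupported browser {name!r}; expected one of: {allowed}")
    return name, SUPPORTED_BROWSERS[name]
```

In a container, the value would typically be passed with `docker run -e ARCHIVER_BROWSER=phantomjs ...`, so the same image could serve both cases.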
kerem closed this issue 2026-03-01 17:51:26 +03:00

@pirate commented on GitHub (Jun 27, 2017):

Firefox already has [headless mode on Linux](http://scraping.pro/use-headless-firefox-scraping-linux/), so it may be worth exploring as an alternative to Chromium. I'll do some more research and report back.

It's not surprising that Chrome has trouble inside Docker: Chrome uses chroots and kernel namespacing for its own internal sandbox, and I'm not sure how well that can be nested inside Docker's chroots & namespaces.
I definitely want to use a widely used (i.e. WebKit-based) headless browser for the screenshots, since other rendering engines don't provide consistent enough renders of modern webpages.

Personally I don't use Docker for tools like this, because of the huge overhead of re-downloading shared tools that I already have running on my machine (e.g. postgres, nginx, redis, chrome), but I can understand that's not the case for everyone.
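Editor's note: for readers hitting the namespace error discussed above, a common workaround is to start headless Chrome with its own sandbox disabled, so that it does not try to nest inside the container's isolation. The sketch below only builds the command line and does not run anything. The helper name `chrome_command` and its parameters are hypothetical; the flags (`--headless`, `--disable-gpu`, `--no-sandbox`, `--print-to-pdf`, `--screenshot`) are standard Chrome switches. Whether `--no-sandbox` resolves the error on a given host is an assumption, and it trades away Chrome's internal isolation.

```python
def chrome_command(url, pdf_path=None, screenshot_path=None,
                   binary="google-chrome", sandbox=True):
    """Build an argv list for a headless Chrome capture of `url`.

    Hypothetical helper for illustration; the binary name may differ
    per distribution (e.g. chromium-browser).
    """
    cmd = [binary, "--headless", "--disable-gpu"]
    if not sandbox:
        # Disables Chrome's internal sandbox; only sensible when the
        # container itself is the isolation boundary.
        cmd.append("--no-sandbox")
    if pdf_path:
        cmd.append(f"--print-to-pdf={pdf_path}")
    if screenshot_path:
        cmd.append(f"--screenshot={screenshot_path}")
    cmd.append(url)
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)` inside the container.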

@hannah98 commented on GitHub (Jun 28, 2017):

Oh sure - I didn't do much research on the PhantomJS rendering, so it makes sense to use a WebKit-based browser for rendering. I hadn't even thought about Firefox. I will work on a container that can support headless Firefox.

@hannah98 commented on GitHub (Jun 28, 2017):

Looking some more, I don't see an obvious way to take a PDF capture of a page using Firefox, which would give Chrome an advantage over Firefox for your application.

@pirate commented on GitHub (Jul 6, 2017):

I haven't heard of anyone else running this in a Docker container. I think it's reasonable to expect most people will install it directly for now, since the few dependencies it does have would be better off shared with the system and not locked away in a container.

If you do manage to get Chrome working in Docker eventually, comment back and I'd be happy to re-open this.


@Strubbl commented on GitHub (Sep 24, 2018):

I just created another Dockerfile myself, because I did not see one in this repo. Unfortunately, I ran into the issue described here.

My Dockerfile can be found here: https://gitlab.com/Strubbl/docker-bookmark-archiver/blob/master/Dockerfile

@pirate commented on GitHub (Sep 24, 2018):

Try basing your image on https://github.com/justinribeiro/dockerfiles/tree/master/chrome-headless @Strubbl


@pirate commented on GitHub (Sep 24, 2018):

See https://github.com/pirate/bookmark-archiver/issues/62 for the active ticket, specifically this comment: https://github.com/pirate/bookmark-archiver/issues/62#issuecomment-417864095
