[GH-ISSUE #177] Switch all dependencies to pure python and release ArchiveBox pip package #1632

Closed
opened 2026-03-01 17:52:20 +03:00 by kerem · 5 comments

Originally created by @pirate on GitHub (Mar 14, 2019).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/177

I originally thought moving to Python-only dependencies would be intractable, but after some more research I now realize this is quite straightforward.

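  • apt install curl -> pip install requests archivenow (requests docs: http://docs.python-requests.org/en/master/, archivenow docs: https://github.com/oduwsdl/archivenow)
  • apt install wget -> pip install wpull pywb (wpull docs: https://github.com/ArchiveTeam/wpull, pywb docs: https://github.com/webrecorder/pywb)
  • apt install git -> pip install GitPython (docs: http://gitpython.readthedocs.io)
  • apt install youtube-dl -> pip install youtube-dl (docs: https://github.com/ytdl-org/youtube-dl/blob/master/README.md#embedding-youtube-dl)
  • apt install chromium-browser -> pip install pyppeteer (docs: https://pypi.org/project/pyppeteer/)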

Then we won't need users to install any system dependencies anymore, and we can move to using only requirements.txt and setup.py to install ArchiveBox via pip.
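
As a rough illustration of the mapping above, the sketch below checks which of the proposed pip replacements are importable in the current environment. It is not ArchiveBox code: the `PYTHON_REPLACEMENTS` table and the `missing_replacements` helper are hypothetical names, and the import names (for example `git` for GitPython and `youtube_dl` for youtube-dl) are assumptions about how each package is imported.

```python
from importlib.util import find_spec

# System package -> import names of the proposed pip replacements.
# The import names are assumptions (GitPython is imported as `git`,
# youtube-dl as `youtube_dl`); they are not taken from ArchiveBox itself.
PYTHON_REPLACEMENTS = {
    "curl": ["requests", "archivenow"],
    "wget": ["wpull", "pywb"],
    "git": ["git"],
    "youtube-dl": ["youtube_dl"],
    "chromium-browser": ["pyppeteer"],
}


def missing_replacements(replacements=PYTHON_REPLACEMENTS):
    """Return {system_package: [modules that cannot be imported]}."""
    missing = {}
    for system_package, modules in replacements.items():
        # find_spec returns None when a top-level module is not installed.
        absent = [name for name in modules if find_spec(name) is None]
        if absent:
            missing[system_package] = absent
    return missing


if __name__ == "__main__":
    for system_package, modules in missing_replacements().items():
        print(f"{system_package}: missing {', '.join(modules)}")
```

Running the file directly prints one line per system package whose replacements are not installed yet; an empty output would mean every module in the table can be imported.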


@anarcat commented on GitHub (Mar 15, 2019):

awesome, can't wait to see that one fly! :) let me know if you need help testing the stuff or get stuck.


@007 commented on GitHub (Mar 15, 2019):

Anything you're fetching with curl should be replaced with wget or vice versa, and that'll cut down on some dependencies in the pip translation.
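
One way to picture this suggestion is a single helper that every download goes through, so that only one of the two tools has to be present. The sketch below is an assumption-laden illustration, not ArchiveBox's implementation: `build_fetch_command` is a hypothetical name, and the preference for wget over curl is arbitrary.

```python
import shutil


def build_fetch_command(url, output_path, which=shutil.which):
    """Build a download command for whichever single fetcher is installed.

    `which` is injectable so the lookup can be replaced in tests.
    """
    if which("wget"):
        # -O writes the response body to the given file.
        return ["wget", "--quiet", "-O", output_path, url]
    if which("curl"):
        # --location follows redirects, which wget does by default.
        return ["curl", "--silent", "--location", "--output", output_path, url]
    raise RuntimeError("neither wget nor curl is installed")
```

The returned list can be passed to `subprocess.run`, and callers never need to know which binary was found.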


@makew0rld commented on GitHub (Aug 10, 2020):

wpull only officially supports (https://github.com/ArchiveTeam/wpull/issues/404#issuecomment-441120468) Python 3.4 and 3.5, even now it seems. The most recent commit was in Oct. 2019, and the version on PyPI is still outdated (https://github.com/ArchiveTeam/wpull/issues/410). It's a cool tool, but I would not recommend using it, and it doesn't seem to be well maintained.

If you still want to use it anyway, you can install it from Git, and then use a Python dependency manager to only use Python 3.5 for it, but I would not recommend that.

Git install:

pip3 install git+https://github.com/ArchiveTeam/wpull.git@v2.0.3#egg=wpull
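
If wpull were kept as an optional method, a version guard along these lines could decide whether to offer it at all. This is only a sketch: the supported versions are taken from the claim above rather than from wpull's own metadata, and `wpull_supported` is a hypothetical helper name.

```python
import sys

# Assumption from the comment above: only Python 3.4 and 3.5 are officially supported.
WPULL_SUPPORTED_VERSIONS = {(3, 4), (3, 5)}


def wpull_supported(version_info=sys.version_info):
    """Return True if the given interpreter version is one wpull officially supports."""
    return tuple(version_info[:2]) in WPULL_SUPPORTED_VERSIONS


if __name__ == "__main__":
    if not wpull_supported():
        print("wpull is not officially supported on this Python version; skipping it")
```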

@pirate commented on GitHub (Aug 10, 2020):

Yeah I looked at wpull recently and came to the same conclusion. Wget2 looks more promising than wpull.

I think I'm going to close this issue for now. As we start to expand the suite of archiving methods, it's looking more and more like many of them will be node-based. Considering we already support pip install archivebox to get the bulk of ArchiveBox's functionality, and we offer all the methods out of the box via Docker, making everything Python-only is no longer a priority.


@makew0rld commented on GitHub (Aug 10, 2020):

The other issue I see with this is managing conflicting versions of Python dependencies for these tools. I would personally recommend Poetry (https://python-poetry.org/) for that, as it's popular and I've had great experiences with it, but whatever you choose, I still think it's an important step. Apologies if you were already going to do this.
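
To make the conflict problem concrete, here is a minimal sketch that flags packages pinned to different exact versions by different tools. It only understands `name==version` pins and is far simpler than what a real resolver such as Poetry's does; `find_pin_conflicts` and the tool and package names in the usage note are hypothetical.

```python
from collections import defaultdict


def find_pin_conflicts(requirements_by_tool):
    """Return {package: {version: [tools]}} for packages pinned to more than one version."""
    pins = defaultdict(lambda: defaultdict(list))
    for tool, requirements in requirements_by_tool.items():
        for requirement in requirements:
            name, _, version = requirement.partition("==")
            if version:  # ignore anything that is not an exact pin
                pins[name.strip().lower()][version.strip()].append(tool)
    return {
        package: dict(versions)
        for package, versions in pins.items()
        if len(versions) > 1
    }
```

For example, calling it with `{"tool_a": ["libfoo==1.0"], "tool_b": ["libfoo==2.0"]}` reports `libfoo` as pinned to two versions, which is the kind of clash a dependency manager would have to resolve or isolate.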

I also don't see the value in replacing git with a Python version.
