[GH-ISSUE #473] Bugfix: can't use Python API #3330

Closed
opened 2026-03-14 22:10:23 +03:00 by kerem · 7 comments
Owner

Originally created by @digi-ark on GitHub (Sep 12, 2020).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/473

Describe the bug

Importing the python API fails: ImportError: cannot import name 'update' from partially initialized module 'archivebox.main' (most likely due to a circular import)

Steps to reproduce

  1. pip install archivebox
  2. python -c "from archivebox.main import add"

Screenshots or log output

Python 3.8.5 (default, Aug 12 2020, 00:00:00) 
[GCC 10.2.1 20200723 (Red Hat 10.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from archivebox.main import add, remove, info, config
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/projects/news/ArchiveBox/archivebox/main.py", line 10, in <module>
    from .cli import (
  File "/home/user/projects/news/ArchiveBox/archivebox/cli/__init__.py", line 65, in <module>
    SUBCOMMANDS = list_subcommands()
  File "/home/user/projects/news/ArchiveBox/archivebox/cli/__init__.py", line 41, in list_subcommands
    module = import_module('.archivebox_{}'.format(subcommand), __package__)
  File "/home/user/.virtualenvs/archivebox/lib64/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/user/projects/news/ArchiveBox/archivebox/cli/archivebox_update.py", line 11, in <module>
    from ..main import update
ImportError: cannot import name 'update' from partially initialized module 'archivebox.main' (most likely due to a circular import) (/home/user/projects/news/ArchiveBox/archivebox/main.py)
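
For readers unfamiliar with this error, the traceback shape above (main.py imports the cli package, and a cli module imports a name back from main.py before it has been defined) can be reproduced in isolation. The sketch below is not ArchiveBox code: the package name demo_pkg and the module cli_update are made up for illustration, and only the import order mirrors the traceback.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path


def reproduce_circular_import() -> str:
    """Build a throwaway package whose two modules import each other,
    import it, and return the ImportError message Python raises."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"  # hypothetical package, not ArchiveBox
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        # main.py imports the CLI module *before* update() is defined...
        (pkg / "main.py").write_text(textwrap.dedent("""
            from .cli_update import run

            def update():
                return "updated"
        """))
        # ...and the CLI module asks main.py for update() while main.py
        # is still only partially initialized.
        (pkg / "cli_update.py").write_text(textwrap.dedent("""
            from .main import update

            def run():
                return update()
        """))
        sys.path.insert(0, tmp)
        try:
            importlib.import_module("demo_pkg.main")
        except ImportError as err:
            return str(err)
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(tmp)
            for name in [n for n in sys.modules if n.startswith("demo_pkg")]:
                del sys.modules[name]
    return ""


print(reproduce_circular_import())
```

Deferring one of the two imports (for example, importing inside the function that needs it) is a common way to break such a cycle; whether that matches the eventual ArchiveBox fix is not stated in this thread.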

Software versions

  • OS: fedora-32
  • ArchiveBox version: v0.4.21 (from pip: c1f2188 I suppose)
  • Python version: 3.8.5
  • Chrome version: (not relevant)

Additional Notes

Seems to be similar to https://github.com/pirate/ArchiveBox/issues/372


@cdvv7788 commented on GitHub (Sep 12, 2020):

@digi-ark what are you trying to do? Use the add function directly?


@pirate commented on GitHub (Sep 13, 2020):

@cdvv7788 ArchiveBox does have a Python API https://docs.archivebox.io/en/latest/archivebox.html that advertises add as being usable via Python, but it looks broken for this use case.

@digi-ark have you tried calling archivebox.config.setup_django() before that import? It should ensure that all the needed modules are on sys.path for Python to import.

In a future version, setup_django() will become a no-op, but for now it's a temporary workaround that lets some commands run before Django is initialized (e.g. archivebox version, archivebox help, archivebox init).


@digi-ark commented on GitHub (Sep 17, 2020):

Sorry for the delay @pirate and @cdvv7788. Thanks for the responses. (and congrats on the epic project 🙂 )

I was trying to use the python API example in the wiki and got stuck there.

> @cdvv7788 ArchiveBox does have a Python API https://docs.archivebox.io/en/latest/archivebox.html that advertises add as being usable via Python, but it looks broken via this use case.

Thanks for the pointer to the Python API. I was looking for that link but couldn't find any pointers. (I've updated the wiki to point to that)

> @digi-ark have you tried calling archivebox.config.setup_django() before that import? It should ensure that all the needed modules are on sys.path for Python to import.

I hadn't, thank you. But I'm now running into a different problem, since my code doesn't live inside the ArchiveBox directory ([X] No archivebox index found in the current directory.). It looks like I'll have to do some more documentation reading.

Edit: that worked, as long as I call it from the data directory.

May I suggest making the python API example on the wiki self-contained? Perhaps adding a variable with the archive's path or describing the assumption that the code is being called from the archive's root directory.


@digi-ark commented on GitHub (Sep 27, 2020):

Not sure if this is a related bug. I'm almost able to use the API with a slightly modified version of the API basic usage example (under "Usage" on the Wiki). But now it fails to detect that the migrations have been applied.

I dug a bit into the source, added some debug prints, and found that the 22 migrations it reports as not applied are all of the existing migrations. And I made sure I had run archivebox init to apply any migrations.

Proof of Concept

Running Python within the ArchiveBox data directory

Here I'm running python directly from the root directory of the ArchiveBox collection:

(archive) user@computer:archivebox$ python
Python 3.8.5 (default, Aug 12 2020, 00:00:00) 
[GCC 10.2.1 20200723 (Red Hat 10.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> import archivebox.config
>>> DATA_DIR = os.path.abspath('./')
>>> archivebox.config.setup_django(out_dir=DATA_DIR)
>>> 
>>> from archivebox.main import check_data_folder, setup_django, add, remove, server
>>> 
>>> check_data_folder(out_dir=DATA_DIR)
>>>

Running from another Directory

(archive) user@computer:viewer$ python
Python 3.8.5 (default, Aug 12 2020, 00:00:00) 
[GCC 10.2.1 20200723 (Red Hat 10.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> import archivebox.config
>>> DATA_DIR = os.path.abspath('../../data/archivebox/')
>>> archivebox.config.setup_django(out_dir=DATA_DIR)
>>> 
>>> from archivebox.main import check_data_folder, setup_django, add, remove, server
>>> 
>>> check_data_folder(out_dir=DATA_DIR)
[X] This collection was created with an older version of ArchiveBox and must be upgraded first.
    /home/user/projects/personal-archive/data/archivebox

    To upgrade it to the latest version and apply the 22 pending migrations run:
        archivebox init

@digi-ark commented on GitHub (Sep 27, 2020):

Part of the above bug might be that, in the following line, list_migrations() does not receive out_dir, i.e. it is not called as list_migrations(out_dir=out_dir):

https://github.com/pirate/ArchiveBox/blob/0158efb1d0cf142da14808cd39cd508e74fe7e23/archivebox/config/__init__.py#L912

But this does not fully solve the problem.

Edit: After looking a bit more, it looks like Django is getting confused when it's run from somewhere else. The following is what I get after commenting out the migrations check:

>>> add("https://example.com", out_dir=DATA_DIR)
Traceback (most recent call last):
  File "/home/user/.virtualenvs/news/lib/python3.8/site-packages/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
  File "/home/user/.virtualenvs/news/lib/python3.8/site-packages/django/db/backends/sqlite3/base.py", line 396, in execute
    return Database.Cursor.execute(self, query, params)
sqlite3.OperationalError: no such table: core_snapshot
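
One plausible explanation for this error, which the thread does not confirm, is that the database path ends up being resolved against the current working directory instead of the data directory. SQLite silently creates an empty database file when asked to open a path that does not exist, so the first query against it fails with exactly this message. The sketch below uses only the standard library; the directory names and the index.sqlite3 filename are illustrative assumptions, and only the table name core_snapshot comes from the traceback.

```python
import os
import sqlite3
import tempfile


def query_snapshots(db_path: str) -> str:
    """Open db_path, count rows in core_snapshot, and describe the outcome."""
    conn = sqlite3.connect(db_path)  # creates an empty file if none exists
    try:
        count = conn.execute("SELECT COUNT(*) FROM core_snapshot").fetchone()[0]
        return f"{count} snapshots"
    except sqlite3.OperationalError as err:
        return f"OperationalError: {err}"
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    data_dir = os.path.join(tmp, "data")     # stands in for the real collection
    other_dir = os.path.join(tmp, "viewer")  # stands in for the caller's directory
    os.makedirs(data_dir)
    os.makedirs(other_dir)

    # The "real" index: a database that already has the expected table.
    real_db = os.path.join(data_dir, "index.sqlite3")
    conn = sqlite3.connect(real_db)
    conn.execute("CREATE TABLE core_snapshot (url TEXT)")
    conn.commit()
    conn.close()

    # Same filename, wrong directory: SQLite creates a fresh, empty database.
    right = query_snapshots(real_db)
    wrong = query_snapshots(os.path.join(other_dir, "index.sqlite3"))
    print(right)
    print(wrong)
```

If this is what happened here, a stray empty database file would be left behind in the directory the script was launched from, which is a quick thing to check.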


@pirate commented on GitHub (Sep 28, 2020):

We don't currently support running it from any directory other than the data dir. The os.chdir(DATA_DIR) call (before setup_django()) is mandatory for now. This restriction will likely be lifted in a future version once archivebox oneshot is released.
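
A minimal sketch of that ordering, for scripts that live outside the data directory: change into the data dir first, then call setup_django(), then import from archivebox.main. The names inside_data_dir and add_url are hypothetical helpers, not part of ArchiveBox. The archivebox calls used (setup_django() and add(url, out_dir=...)) are the ones that appear in this thread for v0.4.x, and the sketch has not been checked against later versions.

```python
import os
from contextlib import contextmanager


@contextmanager
def inside_data_dir(data_dir):
    """Temporarily make data_dir the current working directory."""
    previous = os.getcwd()
    os.chdir(data_dir)
    try:
        yield os.getcwd()
    finally:
        # Always restore the caller's working directory, even on error.
        os.chdir(previous)


def add_url(data_dir, url):
    """Hypothetical helper: archive one URL into the collection at data_dir."""
    with inside_data_dir(data_dir) as cwd:
        # Imports are deferred so that nothing from archivebox is loaded
        # before the working directory has been changed.
        from archivebox.config import setup_django
        setup_django()
        from archivebox.main import add
        return add(url, out_dir=cwd)
```

A call would then look like add_url('/path/to/data', 'https://example.com'), where the path is a placeholder for an existing, initialized collection.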


@pirate commented on GitHub (Apr 6, 2021):

I believe all the issues reported here have been fixed as of e92db03, but feel free to comment back here if you're still having problems and I'll reopen the ticket.

>>> from archivebox.config import setup_django
>>> setup_django()
...
>>> from main import init
>>> init()
...
>>> from core.models import Snapshot
>>> Snapshot.objects.all()
<QuerySet []>