[GH-ISSUE #1237] Support: Docker v0.7.1 unable to use /tmp directory for crontab mkstemp #3782

Open
opened 2026-03-15 00:26:02 +03:00 by kerem · 7 comments

Originally created by @cutterkom on GitHub (Oct 6, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1237

I want to schedule archiving a list of URLs:

docker-compose run archivebox schedule --every=day urls.txt

urls.txt consists of a list of URLs:

https://example.com
https://example1.com
https://example2.com

This works, no parsing error as described here: https://github.com/ArchiveBox/ArchiveBox/issues/968

[Screenshot]

I restart the scheduler and run it:

docker-compose restart archivebox_scheduler
docker-compose run archivebox schedule --run-all

But it's not working; the logs show a "Failed to parse" error:
[Screenshot]

What am I missing?

Also, cron-style syntax does not work, whereas day, month, etc. work fine with schedule:

docker-compose run archivebox schedule --every="5 * * * *" https://example.com
[Screenshot]

In general, what is your workflow advice to archive a list of URLs regularly?

@pirate commented on GitHub (Oct 9, 2023):

Have you mounted urls.txt in a docker volume or is it in the root of the data folder? Please post your docker-compose.yml volume config to show where urls.txt is located.

Also when using * characters in CLI args you have to use single quotes, not double quotes as otherwise it'll expand to the list of all files in the current directory as you see in your last screenshot.
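
The quoting behavior can be checked locally with a minimal sketch like the one below, assuming a POSIX shell and a scratch directory containing two files (the `count_args` helper and the file names are hypothetical, for illustration only). It counts how many arguments a command receives under each quoting style. Note that in a plain POSIX shell both single and double quotes suppress `*` expansion, so if expansion still occurs with double quotes, a second shell layer re-parsing the arguments (for example inside the container) may be involved; that is an assumption, not something confirmed in this thread.

```shell
# Scratch directory with two known files so glob results are predictable
dir=$(mktemp -d)
cd "$dir" || exit 1
touch a.txt b.txt

# Hypothetical helper: prints the number of arguments it received
count_args() { echo "$#"; }

unquoted=$(count_args 5 * * * *)      # every bare * expands to the file list
double=$(count_args "5 * * * *")      # one literal argument
single=$(count_args '5 * * * *')      # one literal argument

echo "unquoted=$unquoted double=$double single=$single"
```

With two files in the directory, the unquoted form receives nine arguments (the `5` plus two file names for each of the four stars), while both quoted forms receive exactly one.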

@cutterkom commented on GitHub (Oct 15, 2023):

Oh, right. I didn't mount it!

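The docker-compose.yml is here (https://github.com/forummuenchen/forum-archivebox/blob/main/docker-compose.yml#L119). I now mount the URL list in the scheduler section: https://github.com/forummuenchen/forum-archivebox/blob/43bd662c13b961a8d71d14cd4b1de0e26897ca9f/docker-compose.yml#L119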

Before that, I download the list into the archivebox directory with:

curl -O 'https://raw.githubusercontent.com/forummuenchen/forum-archivebox/main/data/urls_test.txt'

When I try to add the URL list:

docker-compose run archivebox -T schedule --every=hour < urls_test.txt

I get:

the input device is not a TTY

@pirate commented on GitHub (Oct 16, 2023):

So < is actually piping outside of docker (because < is greedily parsed by the first shell that sees it), not inside docker. If you want to load the file inside docker, do:

docker-compose run archivebox -T schedule --every=hour /data/urls_test.txt
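
To illustrate why the redirect never reaches the container: the shell that parses the command line opens the file and attaches it to stdin before the command starts, so the command sees no file-name argument at all. A minimal sketch, assuming a plain POSIX `sh` child process stands in for the containerized command (the temp file and variable names are hypothetical):

```shell
dir=$(mktemp -d)
printf 'https://example.com\n' > "$dir/urls.txt"

# The child shell receives zero positional arguments; the file is only its stdin
redirect_argc=$(sh -c 'echo "$#"' sh < "$dir/urls.txt")
redirect_stdin=$(sh -c 'cat' sh < "$dir/urls.txt")

# Passing the path as an argument instead hands the child a file name to open itself
path_argc=$(sh -c 'echo "$#"' sh "$dir/urls.txt")

echo "redirect: argc=$redirect_argc stdin=$redirect_stdin"
echo "path arg: argc=$path_argc"
```

With `docker-compose run`, the same rule means `< urls_test.txt` refers to a file on the host, while `/data/urls_test.txt` is a path resolved inside the container.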

@cutterkom commented on GitHub (Nov 8, 2023):

Sorry for the long delay, I was on vacation....

I tried your proposed command (without the -T), but it's not doing the trick.

docker-compose run archivebox schedule --every=hour /data/urls_test.txt
docker-compose restart archivebox_scheduler
docker-compose run archivebox schedule --run-all

data/logs/schedule.log says it failed to parse:

find: ‘/home/archivebox/.config/chromium/Crash Reports/pending/’: No such file or directory
[i] [2023-11-08 07:02:57] ArchiveBox v0.6.3: archivebox add --depth=0 /data/urls_test.txt
    > /data

find: ‘/home/archivebox/.config/chromium/Crash Reports/pending/’: No such file or directory
[+] [2023-11-08 07:02:57] Adding 1 links to index (crawl depth=0)...
    > Saved verbatim input to sources/1699426977-import.txt
    > Parsed 0 URLs from input (Failed to parse)
    > Found 0 new URLs not already in index

[*] [2023-11-08 07:02:57] Writing 0 links to main index...
    √ ./index.sqlite3

Do you have any other recommendation, or can you point me to projects that use the scheduler in production?

@pirate commented on GitHub (Nov 8, 2023):

Can you try the latest 0.7 image, archivebox/archivebox:latest, and set --depth=1?

@cutterkom commented on GitHub (Nov 9, 2023):

Okay, I updated to the latest image (https://github.com/forummuenchen/forum-archivebox/blob/main/docker-compose.yml#L95) and ran:

docker-compose run archivebox schedule --every=hour --depth=1 /data/urls_test.txt

Unfortunately, it does not do the trick (yet):

chown: cannot access '/browsers/*': No such file or directory
[i] [2023-11-09 18:38:19] ArchiveBox v0.7.1+editable: archivebox schedule --every=hour --depth=1 /data/urls_test.txt
    > /data

Traceback (most recent call last):
  File "/usr/local/bin/archivebox", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/app/archivebox/cli/__init__.py", line 140, in main
    run_subcommand(
  File "/app/archivebox/cli/__init__.py", line 80, in run_subcommand
    module.main(args=subcommand_args, stdin=stdin, pwd=pwd)    # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/cli/archivebox_schedule.py", line 92, in main
    schedule(
  File "/app/archivebox/util.py", line 116, in typechecked_function
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/main.py", line 1215, in schedule
    cron.write()
  File "/usr/local/lib/python3.11/site-packages/crontab.py", line 384, in write
    filed, path = tempfile.mkstemp()
                  ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/tempfile.py", line 334, in mkstemp
    prefix, suffix, dir, output_type = _sanitize_params(prefix, suffix, dir)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/tempfile.py", line 126, in _sanitize_params
    dir = gettempdir()
          ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/tempfile.py", line 299, in gettempdir
    return _os.fsdecode(_gettempdir())
                        ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/tempfile.py", line 292, in _gettempdir
    tempdir = _get_default_tempdir()
              ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/tempfile.py", line 223, in _get_default_tempdir
    raise FileNotFoundError(_errno.ENOENT,
FileNotFoundError: [Errno 2] No usable temporary directory found in ['/tmp', '/var/tmp', '/usr/tmp', '/data']

@pirate commented on GitHub (Jan 19, 2024):

The latest version has a bunch of fixes and improvements that might help with this. Can you try on 0.7.2 or 0.7.3?

Other things to check: make sure your Docker VM has enough disk space available, make sure /data is readable and writable.
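
The traceback earlier in this thread ends in Python's `tempfile` module, which gives up when it cannot create and write a file in any of its candidate directories. A minimal sketch for probing those directories by hand, assuming it is run in a shell inside the container (the function names are hypothetical; the directory list is taken from the error message):

```shell
# Hypothetical helper: succeeds only if a file can be created, written, and removed in "$1"
probe_tmp_dir() {
  f="$1/.tmp_probe_$$"
  ( printf 'x' > "$f" ) 2>/dev/null && rm -f "$f"
}

# Report each candidate directory as usable or not
check_tmp_dirs() {
  for d in "$@"; do
    if probe_tmp_dir "$d"; then
      echo "usable: $d"
    else
      echo "NOT usable: $d"
    fi
  done
}

# Directories listed in the FileNotFoundError message
check_tmp_dirs /tmp /var/tmp /usr/tmp /data
```

A directory can fail the probe because it is missing, not writable by the current user, or on a full disk, so a "NOT usable" line for every entry would be consistent with the error reported above.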
