[GH-ISSUE #1191] Links with an ampersand (&) can not be added (via docker-compose) #739

Closed
opened 2026-03-01 14:45:59 +03:00 by kerem · 2 comments
Owner

Originally created by @jkirk on GitHub (Jul 24, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1191

I noticed that links containing an ampersand cannot be added to ArchiveBox:

❯ docker-compose run archivebox add --extract singlefile "https://grml.org/&"

Escaping & makes it work:

❯ docker-compose run archivebox add --extract singlefile "https://grml.org/\&"             
[i] [2023-07-24 10:36:01] ArchiveBox v0.6.2: archivebox add --extract singlefile https://grml.org/&grml
    > /data

[+] [2023-07-24 10:36:02] Adding 1 links to index (crawl depth=0)...
    > Saved verbatim input to sources/1690194962-import.txt
    > Parsed 1 URLs from input (Generic TXT)                                                                                                                                                                                                  
    > Found 1 new URLs not already in index

[*] [2023-07-24 10:36:02] Writing 1 links to main index...
    √ ./index.sqlite3                                                                                                                                                                                                                         

[▶] [2023-07-24 10:36:02] Starting archiving of 1 snapshots in index...

[+] [2023-07-24 10:36:02] "grml.org/&"
    https://grml.org/&
    > ./archive/1690194962.156108
      > singlefile
        3 files (240.7 KB) in 0:00:05s                                                                                                                                                                                                        

[√] [2023-07-24 10:36:08] Update of 1 pages complete (5.99 sec)
    - 0 links skipped
    - 1 links updated
    - 0 links had errors

    Hint: To manage your archive in a Web UI, run:
        archivebox server 0.0.0.0:8000

~/software/docker/archivebox took 10s 
❯

I've run some tests:

❯ cat test.sh 
#!/bin/bash
bash -c "echo $*"

❯ ./test.sh "http://grml.org/?&" "https://grml.org?&grml"     
http://grml.org/?
bash: line 1: https://grml.org?: No such file or directory
bash: line 1: grml: command not found
❯ cat test.sh 
#!/bin/bash
bash -c "echo '$*'"

❯ ./test.sh "http://grml.org/?&" "https://grml.org?&grml"          
http://grml.org/?& https://grml.org?&grml
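
The two runs differ in when the `&` gets parsed: the outer script expands `$*` into the command string first, and the inner `bash -c` then parses that string as shell code, where an unquoted `&` is a control operator. The sketch below illustrates this with two hypothetical helper functions (`inner_single_quoted` and `inner_positional` are made-up names, not part of ArchiveBox). It assumes `bash` is on the `PATH`.

```sh
#!/bin/bash
# Hypothetical helpers for illustration only; not ArchiveBox code.

# Mirrors the second test.sh: the arguments are pasted into the command
# string inside single quotes, so the inner shell sees ONE combined word.
inner_single_quoted() {
    bash -c "printf '<%s>\n' '$*'"
}

# Hands the arguments to the inner shell as positional parameters instead
# of pasting them into the command string, so they are never re-parsed.
inner_positional() {
    bash -c 'printf "<%s>\n" "$@"' _ "$@"
}
```

Called with the two URLs from the test above, `inner_single_quoted` prints one bracketed word containing both URLs, while `inner_positional` prints each URL in its own brackets. The single-quoted form keeps `&` literal, but it merges all arguments into one and still fails if an argument itself contains a single quote.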

Maybe the following line in docker_entrypoint.sh should be changed:
https://github.com/ArchiveBox/ArchiveBox/blob/40ddd3360207aefd0e2c168833c72a6868c9c80b/bin/docker_entrypoint.sh#L44

to:

exec gosu "$ARCHIVEBOX_USER" bash -c "archivebox '$*'"

Thanks for this awesome software.


@jkirk commented on GitHub (Jul 24, 2023):

Oh, I just noticed that the given URL is listed as 'Pending':

screenshot_20230724T125649

Selecting the item and clicking "⬇ Title" updates the title. And yes, https://grml.org/&grml does not exist and returns a "404 not found". It's just an example.


@pirate commented on GitHub (Jan 19, 2024):

This should be working as of v0.7.2 (I changed the docker_entrypoint.sh to requote shell arguments properly).
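
For readers wondering what "requoting" shell arguments can look like, here is a minimal sketch. It is not the actual code from `docker_entrypoint.sh`. The function name `requote_args` is hypothetical, and the sketch assumes bash, since it relies on the `%q` format of bash's `printf` builtin.

```sh
#!/bin/bash
# Hypothetical helper, NOT the actual ArchiveBox entrypoint code.
# Escapes every argument with bash's printf %q so that a later `bash -c`
# parses each one back as exactly one word, whatever characters it contains.
requote_args() {
    if [ "$#" -gt 0 ]; then
        printf '%q ' "$@"
    fi
}
```

Under these assumptions, a line like the one quoted in the issue might become `exec gosu "$ARCHIVEBOX_USER" bash -c "archivebox $(requote_args "$@")"`. Unlike wrapping `$*` in single quotes, this keeps the arguments separate and also copes with arguments that contain quotes or spaces.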
