[GH-ISSUE #782] Bug: tags parsed as individual characters from Pinboard export #2006

Closed
opened 2026-03-01 17:55:46 +03:00 by kerem · 2 comments

Originally created by @tmladek on GitHub (Jul 6, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/782

Describe the bug

Today I made a JSON export of my Pinboard account. It looks just about fine at a glance, but when I ran archivebox add < pinboard_export.json, the links got tagged with the individual characters of the tags, i.e. not music, but m, u, s, i, c...

Steps to reproduce

  1. Set up ArchiveBox via docker-compose
    1a. Download the latest archivebox docker-compose.yml
    1b. docker-compose run archivebox --init
  2. Download Pinboard JSON export from https://pinboard.in/settings/backup
  3. docker-compose run archivebox add < ../pinboard_export.2021.07.06_08.55.json
  4. Tags are characters (in the Web UI).

Screenshots or log output

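Screenshot: https://user-images.githubusercontent.com/5217341/124574048-60540f80-de4a-11eb-8e83-7b6bc5a442a5.png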

ArchiveBox version

ArchiveBox v0.6.2
Cpython Linux Linux-5.12.13-arch1-2-x86_64-with-glibc2.28 x86_64
IN_DOCKER=True DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep

[i] Dependency versions:
 √  ARCHIVEBOX_BINARY     v0.6.2          valid     /usr/local/bin/archivebox                                                   
 √  PYTHON_BINARY         v3.9.5          valid     /usr/local/bin/python3.9                                                    
 √  DJANGO_BINARY         v3.1.10         valid     /usr/local/lib/python3.9/site-packages/django/bin/django-admin.py           
 √  CURL_BINARY           v7.64.0         valid     /usr/bin/curl                                                               
 √  WGET_BINARY           v1.20.1         valid     /usr/bin/wget                                                               
 √  NODE_BINARY           v15.14.0        valid     /usr/bin/node                                                               
 √  SINGLEFILE_BINARY     v0.3.16         valid     /node/node_modules/single-file/cli/single-file                              
 √  READABILITY_BINARY    v0.0.2          valid     /node/node_modules/readability-extractor/readability-extractor              
 √  MERCURY_BINARY        v1.0.0          valid     /node/node_modules/@postlight/mercury-parser/cli.js                         
 √  GIT_BINARY            v2.20.1         valid     /usr/bin/git                                                                
 √  YOUTUBEDL_BINARY      v2021.04.26     valid     /usr/local/bin/youtube-dl                                                   
 √  CHROME_BINARY         v90.0.4430.93   valid     /usr/bin/chromium                                                           
 √  RIPGREP_BINARY        v0.10.0         valid     /usr/bin/rg                                                                 

[i] Source-code locations:
 √  PACKAGE_DIR           22 files        valid     /app/archivebox                                                             
 √  TEMPLATES_DIR         3 files         valid     /app/archivebox/templates                                                   
 -  CUSTOM_TEMPLATES_DIR  -               disabled                                                                              

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled                                                                              
 -  COOKIES_FILE          -               disabled                                                                              

[i] Data locations:
 √  OUTPUT_DIR            5 files         valid     /data                                                                       
 √  SOURCES_DIR           2 files         valid     ./sources                                                                   
 √  LOGS_DIR              1 files         valid     ./logs                                                                      
 √  ARCHIVE_DIR           5 files         valid     ./archive                                                                   
 √  CONFIG_FILE           81.0 Bytes      valid     ./ArchiveBox.conf                                                           
 √  SQL_INDEX             1.9 MB          valid     ./index.sqlite3 

Speculation

(In my experience with Python and JSON, I kinda think that the tags property used to be an array but now is just a string separated by spaces?)
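The speculation above can be illustrated with a short Python sketch. It assumes, as the reporter guesses, that each entry in the Pinboard JSON export carries `tags` as a single space-separated string. The sample data and the `buggy_tags` / `normalize_tags` helpers are hypothetical; they are not ArchiveBox's actual parser or the fix from #725. Iterating over a Python `str` yields one character at a time, which matches the m, u, s, i, c symptom, while `str.split()` recovers whole tags.

```python
import json

# Hypothetical sample entry, assuming the Pinboard JSON export stores
# "tags" as one space-separated string (as the issue speculates).
SAMPLE_EXPORT = '[{"href": "https://example.com", "description": "Example", "tags": "music jazz"}]'


def buggy_tags(entry):
    # Treats the value as if it were a list; iterating a str yields characters.
    return [tag for tag in entry.get("tags", [])]


def normalize_tags(entry):
    # Hypothetical helper: accept either a list of tags or a space-separated string.
    raw = entry.get("tags") or []
    if isinstance(raw, str):
        # split() with no argument splits on any whitespace and drops empty items.
        return raw.split()
    return [str(tag) for tag in raw]


if __name__ == "__main__":
    entry = json.loads(SAMPLE_EXPORT)[0]
    print(buggy_tags(entry))      # ['m', 'u', 's', 'i', 'c', ' ', 'j', 'a', 'z', 'z']
    print(normalize_tags(entry))  # ['music', 'jazz']
```

Accepting both shapes keeps an importer working whether an exporter emits `tags` as a list or as a string.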

kerem closed this issue 2026-03-01 17:55:46 +03:00

@tmladek commented on GitHub (Jul 6, 2021):

Oh dammit, I just saw #725. Oh well 🙃


@pirate commented on GitHub (Sep 16, 2021):

Closing as duplicate of #725.
