[GH-ISSUE #1361] htmltotext archive results are not recorded #833

Closed
opened 2026-03-01 14:46:38 +03:00 by kerem · 1 comment

Originally created by @jimwins on GitHub (Feb 25, 2024).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1361

Because `cmd` is only set within the `except` path, an `UnboundLocalError` (`cannot access local variable 'cmd' where it is not associated with a value`) is raised when the extraction succeeds, so the ArchiveResult is never saved. As a result, htmltotext.txt exists in the archive data but not in the archive results table.

The assignment to `cmd` should be moved before the `try`.

https://github.com/ArchiveBox/ArchiveBox/blob/f02b27920c41a9a1182da4d1871f7ba693c20c3a/archivebox/extractors/htmltotext.py#L119
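
For illustration, the pattern described above can be reproduced in isolation. This is a minimal sketch, not the actual ArchiveBox code: `run_buggy`, `run_fixed`, the `fail` flag, and the placeholder `cmd` list are all hypothetical names chosen for the example. It only shows why a variable bound solely inside `except` is unavailable on the success path, and how binding it before the `try` avoids that.

```python
def run_buggy(fail: bool) -> dict:
    """Hypothetical stand-in: `cmd` is bound only in the except path."""
    status = "succeeded"
    try:
        if fail:
            raise RuntimeError("extraction failed")
        output = "htmltotext.txt"
    except Exception as err:
        cmd = ["htmltotext"]  # placeholder value, only bound on failure
        status = "failed"
        output = err
    # On the success path `cmd` was never bound, so reading it here
    # raises UnboundLocalError before any result can be returned.
    return {"cmd": cmd, "status": status, "output": output}


def run_fixed(fail: bool) -> dict:
    """Same flow, with `cmd` bound before the try as the issue suggests."""
    cmd = ["htmltotext"]  # placeholder value, bound on every path
    status = "succeeded"
    try:
        if fail:
            raise RuntimeError("extraction failed")
        output = "htmltotext.txt"
    except Exception as err:
        status = "failed"
        output = err
    return {"cmd": cmd, "status": status, "output": output}


if __name__ == "__main__":
    print(run_fixed(fail=False)["status"])
    try:
        run_buggy(fail=False)
    except UnboundLocalError as err:
        print(type(err).__name__)
```

In this sketch the buggy variant only works when the extraction fails, which matches the symptom reported: the output file is written, but the result record is lost on success.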

kerem closed this issue 2026-03-01 14:46:38 +03:00

@jimwins commented on GitHub (Feb 25, 2024):

Already fixed in the dev branch.
