mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 09:06:02 +03:00
[GH-ISSUE #1203] Feature Request: a web clipper #3763
Originally created by @berezovskyi on GitHub (Aug 8, 2023).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/1203
Type
What is the problem that your feature request solves
Being able to clip webpage contents that are hard to fetch using ArchiveBox (captchas, datacenter IP blocks, authentication).
Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes
Evernote Web Clipper did it perfectly.
What hacks or alternative solutions have you tried to solve the problem?
Using Evernote Web Clipper for pages that ArchiveBox cannot archive. Tried Joplin and it seems to do the job too.
How badly do you want this new feature?
@berezovskyi commented on GitHub (Aug 8, 2023):
I started looking into how this could be done technically, and my first idea is to take an open-source clipper extension and fork it to suit ArchiveBox's needs, e.g. https://github.com/go-shiori/shiori-web-ext
Regarding the upload, I think the best way would be to allow ArchiveBox to import WARCs (also see https://github.com/ArchiveBox/ArchiveBox/issues/160). Then an extension like https://github.com/machawk1/warcreate could perhaps be used unchanged, or with only a minimal change (to upload the WARC automatically).
@gerroon commented on GitHub (Dec 15, 2023):
This would be so awesome! Joplin has a good web clipper. Trilium's web clipper is ok.
@pirate commented on GitHub (Dec 15, 2023):
In the meantime, as a workaround if you urgently need this: any files placed into the snapshot folder (./archive/<timestamp>/) will be respected by ArchiveBox. So if you have any external WARC, PNG, PDF, etc. files, you can drag them into the snapshot folder manually or create a small script to place them in there. If you overwrite the existing files or use the default names ArchiveBox uses, it will even display them properly in the UI as part of the snapshot.
I try to respect the UNIX "everything is a file" mentality, and may even move towards supporting more pure filesystem-based manipulation of the archives in future releases.
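The "small script" mentioned in the workaround above could look roughly like the following. This is only a minimal sketch, not part of ArchiveBox: the function name `place_in_snapshot` and its parameters are hypothetical, and it assumes the snapshot folder `./archive/<timestamp>/` already exists because ArchiveBox created it.

```python
import shutil
from pathlib import Path
from typing import Optional


def place_in_snapshot(archive_root: Path, timestamp: str, src: Path,
                      dest_name: Optional[str] = None) -> Path:
    """Copy an externally captured file into an existing snapshot folder.

    Hypothetical helper, not an ArchiveBox API. `archive_root` is the
    `archive/` directory of the data folder, `timestamp` is the snapshot
    folder name, and `dest_name` optionally renames the file on copy.
    """
    snapshot_dir = Path(archive_root) / timestamp
    if not snapshot_dir.is_dir():
        # Refuse to create new snapshot folders; only add to existing ones.
        raise FileNotFoundError(f"snapshot folder not found: {snapshot_dir}")

    # Keep the source file name unless the caller asks for a different one.
    dest = snapshot_dir / (dest_name or Path(src).name)

    # copy2 preserves modification time; an existing file is overwritten.
    shutil.copy2(src, dest)
    return dest
```

A call such as `place_in_snapshot(Path("./archive"), "<timestamp>", Path("clip.pdf"))` would then copy a clipped PDF into that snapshot. To have the file show up in the UI, `dest_name` would need to match the default name ArchiveBox uses for that output type; checking an existing snapshot folder is the safest way to find those names.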