[GH-ISSUE #577] Feature Request: Browser extension to submit either all history or certain URLs to a given ArchiveBox instance #3382

Closed
opened 2026-03-14 22:31:30 +03:00 by kerem · 38 comments
Owner

Originally created by @adamwolf on GitHub (Dec 9, 2020).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/577

Hi folks!

After adding the little bookmarklet, I'd like to add another extension. Once the API is closer, would you rather see an Android/iOS "share to" app extension, or a Chrome extension to quickly submit a URL to your ArchiveBox?

(Of course, if these are both things you don't like, just let me know! :)


@pirate commented on GitHub (Dec 9, 2020):

Yeah for sure, that would be great! We can easily expose an /add endpoint for those. I don't have any Android/iOS app dev experience, so that's definitely something we could use help with.


@pirate commented on GitHub (Jan 23, 2021):

Copying @CodingSpiderFox's message from duplicate ticket here:

Type

  • [ ] General question or discussion
  • [X] Propose a brand new feature
  • [ ] Request modification of existing behavior or design

What is the problem that your feature request solves

I don't want to manually type the URLs in my shell or run the export script regularly, because I tend to forget it and I also want to save my pages right away. Also, I want ArchiveBox running on my NAS and not on my local computer.

Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes

I want to have a plugin for at least Firefox and Chrome where I can

  • configure the URL of my ArchiveBox on my local network and my credentials for my ArchiveBox
  • have two modes:
      • a) it logs every URL I visit automatically to my ArchiveBox, and ArchiveBox saves it right away
      • b) a button in the add-ons toolbar that I can click, which submits the URL open in the current tab (only the current tab) to my ArchiveBox, and ArchiveBox saves it right away

How badly do you want this new feature?

  • [ ] It's an urgent deal-breaker, I can't live without it
  • [X] It's important to add it in the near-mid term future
  • [ ] It would be nice to have eventually

  • [X] I'm willing to contribute dev time / money to fix this issue
  • [X] I like ArchiveBox so far / would recommend it to a friend
  • [ ] I've had a lot of difficulty getting ArchiveBox set up

@adamwolf commented on GitHub (Jan 23, 2021):

Hi! I haven't followed this project as closely as I have in the past, but I keep seeing it in headlines... good work!

Is there an /add or equivalent API endpoint? No worries if not... I'm a little overbooked with billable work at the moment but if there isn't one yet, is there a particular ticket that tracks that? I could subscribe to that so I know when to get started on this.


@pirate commented on GitHub (Jan 23, 2021):

There is an /add endpoint now, but it's the one used by the UI so it requires a CSRF token which is a pain for API-style usage. No ticket for fixing that yet, but I'll be sure to post back here once I stabilize that endpoint more.

I'm also a bit swamped with my day job right now, but I haven't forgotten about this.


@adamwolf commented on GitHub (Jan 23, 2021):

No problem! Do not rush to implement this for my sake! :) Thanks for all your work.


@pirate commented on GitHub (Mar 10, 2021):

Ideally a browser extension for ArchiveBox should be releasable cross-platform with minimal effort on the packaging side (ideally like something equivalent to FPM in the Debian packaging world).

Some of my research so far:

  • https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/Build_a_cross_browser_extension
  • https://medium.com/swlh/bootstrapping-complex-chrome-firefox-edge-extensions-with-create-react-app-667be8df35d7
  • https://www.smashingmagazine.com/2017/04/browser-extension-edge-chrome-firefox-opera-brave-vivaldi/
  • ⭐ https://github.com/EmailThis/extension-boilerplate https://www.emailthis.me/open-source/extension-boilerplate
  • https://extensionizr.com/
  • https://project-awesome.org/bfred-it/Awesome-WebExtensions
  • ⭐ https://github.com/dkthehuman/extension-starter-kit

So far this seems like the best place to get started: https://www.emailthis.me/open-source/extension-boilerplate
Their sample extension is quite close to what the ArchiveBox extension UI would need.

If anyone wants to take a crack at this, PRs are welcome! In theory an extension that submits a POST to http://<user configurable archivebox host>/add? could be accomplished in <200 LOC.
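
For readers who want to see the shape of such a request, here is a minimal sketch in Python (not extension code) that builds a form-encoded POST to the /add path of a user-configured host. The helper name `build_add_request`, the example host, and the form field names `url` and `tag` are assumptions for illustration and are not confirmed anywhere in this thread. The request is only constructed, not sent; as noted earlier in the thread, the UI endpoint required a CSRF token at this point.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request


def build_add_request(base_url: str, page_url: str, tag: str = "") -> Request:
    """Build (but do not send) a form-encoded POST to <base_url>/add/.

    The form field names ("url", "tag") are assumptions for illustration;
    check the add form of your ArchiveBox version for the real ones.
    """
    # Normalise the user-configured host so it ends in exactly one slash.
    endpoint = urljoin(base_url.rstrip("/") + "/", "add/")
    body = urlencode({"url": page_url, "tag": tag}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical host and page URL, for illustration only.
req = build_add_request("http://archivebox.local:8000", "https://example.com/article")
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing the request to `urllib.request.urlopen`; an actual extension would do the equivalent with the browser's own request API.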

@voarsh2 commented on GitHub (Mar 15, 2021):

This extension would be great.
Besides submitting URLs with a click, it could also offer automatic submission of browser history (as an option that has to be turned on).


@pirate commented on GitHub (Apr 1, 2021):

@layderv has written a sample extension for Firefox: https://github.com/layderv/archivefox

(x-posting this here)


@layderv commented on GitHub (Apr 1, 2021):

Would it be useful to add it to the repo's readme? Is there any useful, missing feature?


@LennyPenny commented on GitHub (Apr 1, 2021):

I think it would be cool to have an optional mode in this extension that will just queue every page you visit to be archived

edit: oh nvm https://github.com/ArchiveBox/ArchiveBox/issues/577#issuecomment-799622464 already mentions that


@voarsh2 commented on GitHub (Apr 2, 2021):

> @layderv has written a sample extension for Firefox: https://github.com/layderv/archivefox
>
> (x-posting this here)

Cool, except I'm on Chrome.


@rastacalavera commented on GitHub (May 13, 2021):

So I installed the addon, but my instance is on a Raspberry Pi, not my host computer. It looks like the addon and the instance need to be on the same machine? Is that correct? Or can I put in the URL with the port number and /add at the end?
Image: https://user-images.githubusercontent.com/6867792/118167074-e2482f00-b3eb-11eb-8010-b373a34a839a.png

@layderv commented on GitHub (May 19, 2021):

@rastacalavera the addon's repository is probably the best place to ask this. I didn't add that feature, but if you show me how you use it manually, I can see how to add it.


@tjhorner commented on GitHub (Jun 29, 2021):

Hey @pirate, I can work on this if you'd like. I'm not well-versed in Python/Django, so I'd appreciate if you could add the API endpoint for adding URLs to archive. (Else, I can totally try it myself, doesn't seem too difficult!) How would authentication work? I think for now a simple shared secret that's defined in the config would be fine.

I'll work on the browser extension for now. Since archiving all your history would probably take up way too much space and not be very useful (e.g. for Gmail, Google Photos, and other auth'd services), I think the best way to determine which sites to archive would be:

  • Don't archive any sites by default
  • Users can manually archive the current page (or links) from the context menu
  • Users can add domains/regexes to auto-archive from settings
  • If the extension notices a user browsing a certain domain often, it will ask them if they'd like to archive it or not. If they choose yes, then it'll retroactively archive the history (going back some amount of days; not forever) and any future visit to that domain
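
The matching rule in the list above (nothing archived by default, plus user-supplied domains and regexes) can be sketched as follows. This is a Python illustration rather than the extension's actual code; the function name `should_auto_archive` and the choice to treat subdomains as matching a configured domain are assumptions.

```python
import re
from typing import Iterable
from urllib.parse import urlsplit


def should_auto_archive(url: str,
                        domains: Iterable[str],
                        patterns: Iterable[re.Pattern]) -> bool:
    """Return True if the URL matches a user-configured domain or regex.

    With no rules configured nothing matches, so nothing is archived by default.
    """
    host = (urlsplit(url).hostname or "").lower()
    # Assumption: a configured domain also covers its subdomains.
    if any(host == d or host.endswith("." + d) for d in domains):
        return True
    # Regex rules are matched against the full URL.
    return any(p.search(url) is not None for p in patterns)


rules = [re.compile(r"^https://news\.example\.org/\d{4}/")]
print(should_auto_archive("https://blog.example.com/post", {"example.com"}, rules))
```

Matching on the parsed hostname rather than on a substring of the URL avoids a rule for example.com also catching notexample.com.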

So as to not accidentally DoS your ArchiveBox instance, matched URLs would be buffered and submitted in batches, every 10 minutes or so. But if the user closes their browser while there are buffered URLs, it would submit them immediately before closing.

What do you think?
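
The buffering idea described above (collect matched URLs, submit them in batches roughly every 10 minutes, and flush on shutdown) might look something like this. It is a Python sketch under stated assumptions: `UrlBatcher` and its `submit` callback are hypothetical names, and the interval is checked whenever a URL is added, whereas a real extension would more likely use a browser timer.

```python
import time
from typing import Callable, List


class UrlBatcher:
    """Collect URLs and hand them to `submit` in batches.

    `submit` is a hypothetical callback that would send one batch to ArchiveBox.
    """

    def __init__(self,
                 submit: Callable[[List[str]], None],
                 interval_s: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._submit = submit
        self._interval_s = interval_s
        self._clock = clock
        self._pending: List[str] = []
        self._last_flush = clock()

    def add(self, url: str) -> None:
        # Skip duplicates so a page reloaded many times is queued once per batch.
        if url not in self._pending:
            self._pending.append(url)
        # Assumption: the interval is checked on each add instead of by a timer.
        if self._clock() - self._last_flush >= self._interval_s:
            self.flush()

    def flush(self) -> None:
        """Send everything buffered; also the hook to call on browser shutdown."""
        self._last_flush = self._clock()
        if self._pending:
            batch, self._pending = self._pending, []
            self._submit(batch)
```

Passing the clock in as a parameter keeps the class easy to test without waiting ten minutes, and calling `flush()` from a shutdown hook covers the "submit immediately before closing" case.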


@tjhorner commented on GitHub (Jun 30, 2021):

I've got something working pretty well! Here are some screenshots:

Image: https://user-images.githubusercontent.com/2646487/123902886-a33e5080-d93b-11eb-9b58-014d57d5e62a.png

Image: https://user-images.githubusercontent.com/2646487/123902910-af2a1280-d93b-11eb-899c-3463e8e8d177.png

Image: https://user-images.githubusercontent.com/2646487/123902921-b3eec680-d93b-11eb-9887-389319581a9f.png

And here is the repo: https://github.com/tjhorner/archivebox-exporter

All that's left is to implement the actual API call to ArchiveBox (and some config fields for pointing to the right domain). Let me know if you want to take care of implementing that server-side or if you're fine with me handling it.

@tjhorner commented on GitHub (Jul 1, 2021):

Just an update: I forked ArchiveBox and added a temporary API endpoint just for the extension. You can see that branch here: https://github.com/tjhorner/archivebox/tree/temporary-add-api (More info on how to set it up here: https://github.com/tjhorner/archivebox-exporter/wiki/Setup)

I submitted the extension to both the Chrome and Firefox web stores, and I'll post another comment here when they both pass review. Once ArchiveBox gets a more official API, I'll be glad to update the extension to support that instead of this weird hacky solution I've come up with, hah. But for now I think this is a decent solution.


@voarsh2 commented on GitHub (Jul 2, 2021):

> Just an update: I forked ArchiveBox and added a temporary API endpoint just for the extension. You can see that branch here: https://github.com/tjhorner/archivebox/tree/temporary-add-api (More info on how to set it up here: https://github.com/tjhorner/archivebox-exporter/wiki/Setup)
>
> I submitted the extension to both the Chrome and Firefox web stores, and I'll post another comment here when they both pass review. Once ArchiveBox gets a more official API, I'll be glad to update the extension to support that instead of this weird hacky solution I've come up with, hah. But for now I think this is a decent solution.

Awesome, really pumped to try this!
Hopefully I'll have some time in the next few days.


@pirate commented on GitHub (Jul 2, 2021):

@tjhorner have you tried using the existing POST /core/snapshot/add/ (archivebox/core/admin.py:382) endpoint to add new URLs? I believe the only potential blocker is the CSRF token requirement, which we can probably remove with a @csrf_exempt decorator on that view handler function.

Either way, I should have time to take a closer look in the upcoming weeks and help put whatever you need into ArchiveBox master to get this working.

As a side note, I pass on a subset of the donations that ArchiveBox gets to dependencies we use and other crucial projects in the ecosystem. If one or more user-contributed extensions get reliable and feature-complete enough that we can direct people to them in the README, I'd be happy to pass on some of our $ support to those projects! It's small amounts right now (<$100/mo), but hopefully as the project grows it will become more significant.


@tjhorner commented on GitHub (Jul 2, 2021):

> I believe the only potential blocker is the CSRF token requirement

Yep, I ran into that when trying to use that endpoint in my testing. I was thinking about how the extension would authenticate with ArchiveBox, and I decided an API key would be the best solution. But I just did another test, and it turns out that since the extension has permission to access user data on their ArchiveBox instance, it will send the sessionid cookie along with the request. So as long as the user is signed in and the session remains active (and since SESSION_SAVE_EVERY_REQUEST is set, it should automatically renew), the extension should be authorized.

So, TL;DR: yep, it seems all that's needed is to exempt that view from CSRF, since authentication is shared with the browser session.

I decorated the API view in my branch with @method_decorator(csrf_exempt, name='dispatch') and it worked just fine. I'll decorate the existing /add path with that and see if the extension can successfully make requests to that.


@pirate commented on GitHub (Jul 2, 2021):

Ok, in the future we will likely have to build some infrastructure to authenticate the extension with ArchiveBox and issue it a dedicated bearer token key with CSRF-free endpoints (likely with a broader push towards building a real REST API). For now that should be ok though.

If you want to PR that decorator change you made against dev, I can review and merge it into the 0.6.3 release candidate (https://github.com/ArchiveBox/ArchiveBox/pull/721), though I can't promise that release will go out in the next couple of weeks (I have a lot of travel and non-tech projects coming up). If it takes me any longer than 2 weeks, then I can probably roll a micro-release with only your change and some other small bugfixes and save the other things on the 0.6.3 TODO list (https://github.com/ArchiveBox/ArchiveBox/pull/721) for later, as having this extension would be a huge usability win for many ArchiveBox users.

For anyone who wants to use this early, see the instructions here on how to run the ArchiveBox pre-release dev version on your machine:
https://github.com/ArchiveBox/ArchiveBox#install-and-run-a-specific-github-branch

@tjhorner commented on GitHub (Jul 2, 2021):

Just added the CSRF exempt decorator to AddView in this branch (https://github.com/tjhorner/ArchiveBox/tree/exempt-add-from-csrf). I modified the extension to use that route and it works like a charm! I'll submit a PR with that change against dev. In the meantime I'll update the extension setup instructions and push an update to the Chrome/FF stores with this change.

@tjhorner commented on GitHub (Jul 2, 2021):

The extension's now published on the Chrome and FF webstores! Give it a try and let me know what you think. Make sure you're running the dev branch of ArchiveBox (instructions here: https://github.com/ArchiveBox/ArchiveBox#install-and-run-a-specific-github-branch).

  • Chrome/Edge/Chromium-based: https://chrome.google.com/webstore/detail/archivebox-exporter/habonpimjphpdnmcfkaockjnffodikoj
  • Firefox: https://addons.mozilla.org/en-US/firefox/addon/archivebox-exporter/

Bug reports and feature requests welcome, just make a new issue on the repo: https://github.com/tjhorner/archivebox-exporter/issues

@voarsh2 commented on GitHub (Jul 2, 2021):

Quick question, when I use docker to build from "dev" branch, am I actually building from this branch: https://github.com/tjhorner/archivebox/tree/temporary-add-api?
E.g.: docker build -t archivebox:dev https://github.com/tjhorner/ArchiveBox.git#temporary-add-api


@tjhorner commented on GitHub (Jul 2, 2021):

@voarsh2 No, you should be building from ArchiveBox/ArchiveBox#dev. I updated the instructions on the wiki to reflect that. If there are other places it needs to be updated let me know :)

Edit: also make sure you have the latest version of the extension. It should be 1.2.0


@voarsh2 commented on GitHub (Jul 2, 2021):

> No, you should be building from ArchiveBox/ArchiveBox#dev. I updated the instructions on the wiki to reflect that. If there are other places it needs to be updated let me know :)

Ah okay, I also thought my way above made sense since it's not in the ArchiveBox project yet....

so: docker build -t archivebox:dev https://github.com/ArchiveBox/ArchiveBox.git#dev ?
If I am pulling from the official repo, how are your changes from your repo applied exactly? I assume I'm missing something....


@tjhorner commented on GitHub (Jul 2, 2021):

I ended up going a different route by utilizing the existing /add/ endpoint, just disabling CSRF checks there. I submitted a PR earlier (#777) and it's now in dev here. It's a short-term solution, but it works for now. Once there's a fully fleshed out REST API with proper authorization and stuff, the extension will move to that.

In the very earliest version of the extension you would have needed to build from my fork, yes, but no longer.

edit: if you have any further questions please ask them in the discussions section of the repo (https://github.com/tjhorner/archivebox-exporter/discussions); I don't want to clutter this issue too much 😅

@pirate commented on GitHub (Jul 17, 2021):

One thing I'd like to do is push extension users away from "archive every page I visit" by default. Archives rapidly lose value that way, and people will end up just disabling the tool or deleting large swaths of their archive if that's the default for long periods of time. One-click archiving using a button in the navbar is always better than saving all browser history by default. Curation is really important, and the archives will hold more value on both a decades and a centuries timescale if they are limited to pages deemed worthy of saving.

I'm not proposing removing the "all history" feature, just not making it the default, because despite what people think initially, it's really not a great idea long-term to save everything you visit.

https://youtu.be/7eoz_EU6-wQ?t=1387


@voarsh2 commented on GitHub (Jul 17, 2021):

> I'm not proposing removing the "all history" feature, just not making it the default, because despite what people think initially, it's really not a great idea long-term to save everything you visit.

I think it's clear that "archive all" is not on unless you make it so...

I will tag browsing history as an inbox to sort later....

Author
Owner

@pirate commented on GitHub (Jul 17, 2021):

Yes, that is the case for @tjhorner's extension right now, but there are comments on Reddit asking to make it the default, so I'm linking those people here for an explanation. I also want to stress it here for the other people developing extensions; there are 3 in the works right now, last I counted.

Author
Owner

@mAAdhaTTah commented on GitHub (Jul 18, 2021):

@pirate If there are extensions in the works, would it be worth picking up the REST API? Is that ready to start, or should we wait until the worker rearch w/ Huey is done?

Author
Owner

@pirate commented on GitHub (Jul 19, 2021):

I think a minimal API can be worked on before the Huey refactor, as the user-facing API is going to be relatively stable even with the change to the internals. Maybe just these things to start:

  • /api/core/snapshot/ GET, POST, PUT
  • /api/core/snapshot/<id> GET, PATCH, DELETE
  • /api/core/archiveresult/ GET, POST
  • /api/core/archiveresult/<id> GET, PATCH, DELETE
  • /api/core/tag/ GET, POST, PUT
  • /api/core/tag/<id> GET, PATCH, DELETE

and this bonus escape hatch endpoint to do everything else not possible with the above ^:

  • /api/cli/<command> POST (simulate running any archivebox CLI command with a given dict of args and kwargs to populate the CLI flags and args)
    e.g. /api/cli/add POST {urls: 'https://example.com', depth: 1, extractors: ['wget', 'media', 'screenshot'], ...}
    or /api/cli/schedule POST {urls: 'https://example.com', depth: 1, every: 'day', ...}
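
To make the escape-hatch idea concrete, here is a minimal client-side sketch in Python that only builds (and does not send) a request for the proposed `/api/cli/<command>` route. The endpoint is still a proposal at this point in the thread, so the helper name `build_cli_request`, the base URL, and the JSON body shape are assumptions taken from the example payload above, and authentication is left out because the proposal does not specify it.

```python
import json
import urllib.request


def build_cli_request(base_url: str, command: str, **cli_kwargs) -> urllib.request.Request:
    """Build (but do not send) a POST for the proposed /api/cli/<command> route.

    Hypothetical helper: the route and payload shape follow the proposal in
    this thread, not a released ArchiveBox API.
    """
    url = f"{base_url.rstrip('/')}/api/cli/{command}"
    body = json.dumps(cli_kwargs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_cli_request(
    "http://localhost:8000",  # assumed address of an ArchiveBox instance
    "add",
    urls="https://example.com",
    depth=1,
    extractors=["wget", "media", "screenshot"],
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

An extension or script would pass the resulting object to `urllib.request.urlopen` once such an endpoint exists. Keeping request construction separate from sending makes the payload easy to inspect first.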

I'm leaning towards using FastAPI for the API instead of DRF. I like the pydantic type-based API definitions better than DRF's serializers but I could be convinced either way.

  • https://fastapi.tiangolo.com/features/
  • https://www.stavros.io/posts/fastapi-with-django/
  • https://fastapi.tiangolo.com/advanced/wsgi/
Author
Owner

@adamwolf commented on GitHub (Jul 19, 2021):

I haven't been in the ArchiveBox codebase for a while, but Django Ninja does a pretty good job of doing type-hint-driven APIs in Django!

On Sun, Jul 18, 2021, 9:28 PM Nick Sweeting wrote:

> I think a minimal API can be worked on before the Huey refactor, as the user-facing API is going to be relatively stable even with the change to the internals. Maybe just these things to start:
>
>   • /api/core/snapshot/ GET, POST
>   • /api/core/snapshot/<id> GET, PATCH, DELETE
>   • /api/core/archiveresult/ GET, POST
>   • /api/core/archiveresult/<id> GET, PATCH, DELETE
>   • /api/core/tag/ GET, POST
>   • /api/core/tag/<id> GET, PATCH, DELETE
>
> and this bonus escape hatch endpoint to do everything else not possible with the above ^:
>
>   • /api/cli/<command> POST (simulate running any archivebox CLI command with a given dict of args and kwargs to populate the CLI flags and args)
>
> I'm leaning towards using FastAPI for the API instead of DRF. I like the patterns better but I could be convinced either way.


Author
Owner

@mAAdhaTTah commented on GitHub (Jul 19, 2021):

I am using FastAPI on a side project and like it a lot, but I think the integration with the way ArchiveBox loads Django will be complicated. Django Ninja appears to have a lot of the same trappings as FastAPI, so I'd be inclined to go with that rather than try to shoehorn FastAPI into the current Django integration.

I would be willing to work on this too. I'm trying to consume ArchiveBox for displaying my reading on my site, and pulling it from the SQLite file directly is turning out to be a bit annoying.

Author
Owner

@brunocek commented on GitHub (Jan 5, 2024):

I face this challenge too (as an iOS and Firefox user), and I am actively working on a solution. Contributions are welcome:

( https://codeberg.org/brunoschroeder/archivebox-proxy )

I have a different architectural approach whose main advantage is simplicity. The solution is an ArchiveBox proxy, to be deployed on the same server as the ArchiveBox instance. For now, the proxy will call the CLI.

The proxy will be configured with a regex list of what to archive (or configured to archive everything except what's on a regex list).

The proxy will provide a URL to be used as a prefix to meet requirement b (submit the URL open in the current tab, and only the current tab, to my ArchiveBox, which saves it right away). I don't care about buttons and I am mainly focused on iOS (iOS does not allow Firefox extensions).

For each regex, the config list will carry:

  • tags to be applied
  • how often that link should be archived
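
The per-regex config described above could be modelled roughly as follows. This is only an illustrative Python sketch, not the actual archivebox-proxy config format: the `ArchiveRule` class, its field names, the `every` frequency labels, and the sample patterns are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArchiveRule:
    """One hypothetical config entry: a URL regex plus what to do on a match."""
    pattern: str
    tags: List[str] = field(default_factory=list)  # tags to be applied
    every: str = "once"  # assumed label for how often to re-archive


def match_rule(url: str, rules: List[ArchiveRule]) -> Optional[ArchiveRule]:
    """Return the first rule whose regex matches the URL, or None if none do."""
    for rule in rules:
        if re.search(rule.pattern, url):
            return rule
    return None


# Sample allow-list; the patterns are made up for illustration.
RULES = [
    ArchiveRule(r"^https://example\.com/blog/", tags=["blog"], every="week"),
    ArchiveRule(r"\.pdf$", tags=["papers"]),
]

rule = match_rule("https://example.com/blog/post-1", RULES)
print(rule.tags if rule else "not archived")
```

This sketch covers the allow-list mode only. The "archive everything except" mode would invert the check, so that a URL is archived when `match_rule` returns `None`.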

I invite everyone interested to help me on Codeberg by opening issues with their opinions. I will be documenting the architectural decisions there.

Current workflow:
Currently on iOS, for each tab:

  1. I hit share, and share it to iMarkdown or Obsidian
  2. Obsidian asks me which file to append to - I have one file per tag/subject
  3. iOS appends the URL there (but sometimes it appends the page title and I must redo it)
  4. I must close the tab

When I have the proxy, I can forget about this pain.

On the desktop I am using the BrowseLater plugin (https://addons.mozilla.org/en-US/firefox/addon/browselater/), which has the convenience of closing the tab for me and a button to copy all. From there I paste into vi.

Author
Owner

@brunocek commented on GitHub (Jan 20, 2024):

Folks, this is done now. The repository has a working proxy for ArchiveBox.

May we please mention it in the documentation? How should we proceed?

Author
Owner

@pirate commented on GitHub (Jan 23, 2024):

Great work @brunocek, thanks for building this! I added it to our README here: https://github.com/ArchiveBox/ArchiveBox/blob/main/README.md#input-formats (github.com/ArchiveBox/ArchiveBox@5bdcbaeebd)

If you're interested, I'd also be willing to move this repository under the official ArchiveBox GitHub org (github.com/ArchiveBox). You'd still have admin control over it and be able to make any changes you want, but I can also help respond to support requests and integrate it more as an official ArchiveBox solution when proxy archiving is needed.

If not, no worries, happy to keep it separate and just link to it from our docs/README/tickets/etc.

Author
Owner

@brunocek commented on GitHub (Jan 23, 2024):

Hello.
Thank you. Yes you may move the code to your repo. I will help with anything there as well as in ArchiveBox. Honoured to be a maintainer of good Python free software.

Kind regards,

Bruno

Author
Owner

@pirate commented on GitHub (Jan 23, 2024):

@brunocek I've imported it to https://github.com/ArchiveBox/archivebox-proxy and added you as a maintainer/owner of that repo on GitHub. Thanks again!
