[GH-ISSUE #496] Question: what's the current status of the REST API? #1833

Open
opened 2026-03-01 17:54:04 +03:00 by kerem · 37 comments
Owner

Originally created by @zblesk on GitHub (Oct 1, 2020).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/496

I'd like to add new pages by sending an HTTP request to an endpoint. I saw it mentioned in issues such as #339 and items linked in that thread.

There seemed to be commit names that mentioned adding a REST API, but I haven't been able to find whether those are already implemented and released.

Are they? If so, how do I call them?

I've tried just capturing a request to the "Add" method when I click it in the browser, but it looks like there is some csrf protection, so I can't just copy-paste some bearer token and re-issue requests. I'm asking here before I spend time reverse-engineering something just because I missed an already existing API. :)
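For readers hitting the same CSRF wall: Django's default CSRF check expects a POST to echo a token that was issued together with the page, so a captured request cannot simply be replayed. The following is only a minimal sketch of the token-extraction step, using the Python standard library and a made-up sample page. The hidden field name `csrfmiddlewaretoken` is Django's default; the `url` form field and the `/add/` action are assumptions for illustration, not a confirmed ArchiveBox interface.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class CsrfTokenParser(HTMLParser):
    """Collects the value of the hidden csrfmiddlewaretoken input, if present."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and attributes.get("name") == "csrfmiddlewaretoken":
            self.token = attributes.get("value")


def extract_csrf_token(html):
    """Return the CSRF token embedded in a form page, or None if absent."""
    parser = CsrfTokenParser()
    parser.feed(html)
    return parser.token


def build_add_form_body(html, url):
    """Return a urlencoded form body that echoes the page's CSRF token."""
    token = extract_csrf_token(html)
    if token is None:
        raise ValueError("no csrfmiddlewaretoken field found in the page")
    # "url" is a hypothetical field name; inspect the real form to confirm it.
    return urlencode({"csrfmiddlewaretoken": token, "url": url})


# Made-up page used only to exercise the parser; not copied from ArchiveBox.
SAMPLE_PAGE = """
<form method="post" action="/add/">
  <input type="hidden" name="csrfmiddlewaretoken" value="abc123">
  <textarea name="url"></textarea>
</form>
"""

if __name__ == "__main__":
    print(build_add_form_body(SAMPLE_PAGE, "https://example.com"))
```

In practice the POST would also have to send back the `csrftoken` cookie received with the page, which is why an HTTP client with a cookie jar is usually needed for this kind of scripting.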


@pirate commented on GitHub (Oct 2, 2020):

The current status of the API is "unstable" I'd say. Reverse engineering the UI is the way to go for now, but we have plans to stabilize it more in future versions and split out a proper API with django-rest-framework or something so that external tools don't have to shoehorn their needs into requests used by the UI.


Edit as of v0.8.0 (2024-05): The new REST API is now available! ⬇️

<!-- gh-comment-id:702916535 --> @pirate commented on GitHub (Oct 2, 2020): ~~The current status of the API is "unstable" I'd say. Reverse engineering the UI is the way to go for now, but we have plans to stabilize it more in future versions and split out a proper API with django-rest-framework or something so that external tools don't have to shoehorn their needs into requests used by the UI.~~ --- ## ✨ Edit as of [v0.8.0](https://github.com/ArchiveBox/ArchiveBox/releases/tag/v0.8.0-rc) (2024-05): The [new REST API](https://github.com/ArchiveBox/ArchiveBox/issues/496#issuecomment-2080174235) is now available! ⬇️

@mAAdhaTTah commented on GitHub (Oct 19, 2020):

@pirate I would be interested in working on this. I shot you an email a week or so ago cuz I think the underlying data model needs to be solidified and would love to help move this along. Let me know how I can help.


@cdvv7788 commented on GitHub (Oct 19, 2020):

@mAAdhaTTah that is great! Currently, on master, we have the sqlite database working. We can now start working with django-rest-framework to enable a proper API (Like @pirate mentioned).
What are the issues that you are finding with the data model? Something that needs to be improved? We can start the discussion here, so we can all have the proper context, and find a way to get started soon.


@mAAdhaTTah commented on GitHub (Oct 19, 2020):

@cdvv7788 Generally, I think the split/transformation between Link <-> Snapshot is a bit weird. Snapshot seems to be db-only (it's transformed into Links as it's fetched out of the db for most of the operations I was looking at). I also think the double duty of timestamp being "the time it was bookmarked" as well as "the path in the archive" is a bit of an issue. From my email:

> I believe you're currently looking to move from timestamp -> sha for the Snapshots and their relationship to the on-disk archive. If we want to eventually allow multiple snapshots per link (to avoid the hash hack), reifying the Link model into the database and making the Snapshot a single download of a Link seems like a good way to do it. Part of the benefit for me for moving away from timestamps is I want to track when an article was read so I can group them by read day, and manipulating the timestamp for this seems a bit fragile if it can break the relationship to the archive. Having added, updated, etc. properties for that purpose seems a lot clearer.

(For context, I'd like to use ArchiveBox as a reading list, which I would then pull into my website, hence needing a REST API to pull that from. That's the reference to the "benefit for me" line.)


@cdvv7788 commented on GitHub (Oct 19, 2020):

@mAAdhaTTah We have discussed those topics before. I think that @pirate has some progress on the timestamp issue, and it will be changed once we come up with a good solution.
The Link <-> Snapshot stuff is a leftover of the recent migration. In the latest release (v4.x), Link was generated from the index.json, and Snapshot was updated on a best effort basis. After the refactor, this has changed, and we definitely want to get rid of this relationship, leaving everything directly in Snapshot if possible. Supporting multiple snapshots for the same url is not supported at this moment, but after we remove the dependency on the Link schema, it should not be hard to add if we decide to go that way.
The main blocker at this moment is that Snapshot requires django, so it cannot be used on its own. We need to find a way to circumvent that (@pirate do you know if this is possible?) or we need to get more creative initializing django. Some research on this specific topic would be of great help (this is something in our short term objectives).


@mAAdhaTTah commented on GitHub (Oct 19, 2020):

> Supporting multiple snapshots for the same url is not supported at this moment, but after we remove the dependency on the Link schema, it should not be hard to add if we decide to go that way.

So my thinking/proposal is to actually remove the Link schema, migrate what is currently considered a Snapshot to be a Link instead (mostly as a naming convention change), then add Snapshot that represents a single download of a website. Based on your explanation, I think we'd need to include a migration in v0.5 that migrates the index.json into the db, then once we're solely dependent on the db, performing the above migrations, splitting the existing Snapshot into 2 models: Snapshot & Link, with a one-to-many relationship (plus whatever UI updates are needed to account for this).

Does that make sense? Happy to elaborate and/or provide some code to explain.
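To make the proposed shape concrete, here is a small sketch of a Link that owns many Snapshots, written with plain dataclasses rather than Django models so it runs on its own. It is an illustration of the one-to-many idea only, not ArchiveBox's actual schema: `added` and `updated` come from the discussion above, while `archive_path`, `created`, and `add_snapshot` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Snapshot:
    """One download of a Link; its archive path is independent of bookmark time."""
    archive_path: str  # hypothetical: where this download lives on disk
    created: datetime  # hypothetical: when this download was taken


@dataclass
class Link:
    """A bookmarked URL that can own any number of Snapshots."""
    url: str
    added: datetime
    updated: Optional[datetime] = None
    snapshots: List[Snapshot] = field(default_factory=list)

    def add_snapshot(self, archive_path, created):
        """Attach a new download without touching the bookmark time."""
        snapshot = Snapshot(archive_path=archive_path, created=created)
        self.snapshots.append(snapshot)
        self.updated = created
        return snapshot


link = Link(url="https://example.com", added=datetime(2020, 10, 19))
link.add_snapshot("archive/a1b2c3", datetime(2020, 10, 20))
link.add_snapshot("archive/d4e5f6", datetime(2020, 11, 1))
```

Because the archive location lives on the Snapshot, changing `added` on the Link (for example to group articles by read day) cannot break the relationship to the on-disk archive.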

> The main blocker at this moment is that Snapshot requires django, so it cannot be used on its own.

Not sure I understand this. Could you provide some background here?


@cdvv7788 commented on GitHub (Oct 19, 2020):

> So my thinking/proposal is to actually remove the Link schema, migrate what is currently considered a Snapshot to be a Link instead (mostly as a naming convention change), then add Snapshot that represents a single download of a website. Based on your explanation, I think we'd need to include a migration in v0.5 that migrates the index.json into the db, then once we're solely dependent on the db, performing the above migrations, splitting the existing Snapshot into 2 models: Snapshot & Link, with a one-to-many relationship (plus whatever UI updates are needed to account for this).

At this moment we only have the means to represent a single download per website. I understand what you propose, and that does make sense. At this point we already migrated the index.json into the sqlite database. In fact, if you check https://github.com/pirate/ArchiveBox/pull/502, we are already removing the automatic generation of those indexes completely. This, however, cannot be done without first solving the other issue, which takes me to:

> The main blocker at this moment is that Snapshot requires django, so it cannot be used on its own.

Snapshot is a django model. We cannot use that model in a place where django has not been initialized yet. If you try to do that, it will complain because the module will try to use some django internal stuff. This is the only reason we have not gotten rid of Link as we know it. I am going to spend some time figuring out alternatives to make Snapshot usable in the whole application. You are welcome to help us pursue this. As I mentioned earlier, this is a blocker, and the other stuff cannot be worked on until it is resolved (The REST API could actually be implemented, but once we fix this, we would need to refactor it in a big way... I think it is better to solve this layer first).


@mAAdhaTTah commented on GitHub (Oct 19, 2020):

> We cannot use that model in a place where django has not been initialized yet.

All of this makes sense so far. I can do some investigating and see what I can come up with. Just to clarify, when you say "use that model", is that "interacting with it" or is importing it enough to make it fail?


@cdvv7788 commented on GitHub (Oct 19, 2020):

Importing it is enough to make it fail. There is a method that you will find around named django_setup which initializes what is required.
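The failure mode described here, where merely importing a model module fails until initialization has run, and the common workaround of deferring the import until after setup, can be imitated without Django. The sketch below is a toy simulation in plain Python: `Registry`, `RegistryNotReady`, `define_snapshot_model`, and `get_snapshot_model` are all hypothetical stand-ins and none of them is Django or ArchiveBox API. In a real project the initialization step would be Django's own `django.setup()` (which the `django_setup` helper mentioned above presumably wraps).

```python
class RegistryNotReady(RuntimeError):
    """Stand-in for the error a framework raises when used before setup."""


class Registry:
    """Toy app registry: models may only be registered once setup() has run."""

    def __init__(self):
        self.ready = False
        self.models = {}

    def setup(self):
        self.ready = True

    def register(self, name):
        if not self.ready:
            raise RegistryNotReady(f"cannot define {name!r} before setup()")
        self.models[name] = type(name, (), {})
        return self.models[name]


registry = Registry()


def define_snapshot_model():
    # Plays the role of "import the module that defines Snapshot".
    return registry.register("Snapshot")


def get_snapshot_model():
    """Deferred access: initialize first, then perform the 'import'."""
    if not registry.ready:
        registry.setup()
    return registry.models.get("Snapshot") or define_snapshot_model()
```

Calling `define_snapshot_model()` directly on a fresh registry raises `RegistryNotReady`, while `get_snapshot_model()` succeeds because it guarantees setup has happened first.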


@pirate commented on GitHub (Oct 21, 2020):

I don't believe we need Link or Snapshot anywhere that Django is not initialized, so that is a non-issue. If you're worried about oneshot I have an idea to fix that (we can discuss more in Zulip).


@mAAdhaTTah commented on GitHub (Oct 23, 2020):

@pirate Does that change if the idea is to turn Link & Snapshot into db models?


@zblesk commented on GitHub (Sep 14, 2021):

Hello!
I see there's been some progress here.
What's the current status? Is the api available yet?

One of the linked tasks seems to mention it's available in 'dev' - is that an available docker tag?
Is it safe to use?
To be more specific: I understand the API is still in alpha, and I can accept that. However, I don't understand what else can be unstable in dev - I don't want to risk my instance and my data.

Thank you!


@mAAdhaTTah commented on GitHub (Sep 14, 2021):

I have not made any additional progress since opening my PR here: https://github.com/ArchiveBox/ArchiveBox/pull/529 I don't think we will be continuing down that path, as we were considering using Django Ninja instead of DRF as well. Eventually, I'd like to pick this back up again but haven't had the time.


@pirate commented on GitHub (Apr 12, 2022):

Copying over my earlier message here from the API discussion related to the ArchiveBox browser extension https://github.com/ArchiveBox/ArchiveBox/issues/577:

I think a minimal API can be worked on before the Huey refactor, as the user-facing API is going to be relatively stable even with the change to the internals. These endpoints are already partially available through the Django Admin:

  • /add GET, POST (CSRF exempt, see https://github.com/ArchiveBox/ArchiveBox/pull/777; usable as an API from external origins and used by the browser extension, see https://github.com/ArchiveBox/ArchiveBox/issues/577#issuecomment-872961973)
  • /api/core/snapshot/ GET, POST, PUT
  • /api/core/snapshot/<id> GET, PATCH, DELETE
  • /api/core/archiveresult/ GET, POST
  • /api/core/archiveresult/<id> GET, PATCH, DELETE
  • /api/core/tag/ GET, POST, PUT
  • /api/core/tag/<id> GET, PATCH, DELETE

and this bonus escape hatch endpoint is planned to be added to do everything else not possible with the above:

  • /api/cli/<command> POST (simulate running any archivebox CLI command with a given dict of args and kwargs to populate the CLI flags and args)
    e.g. /api/cli/add POST {urls: 'https://example.com', depth: 1, extractors: ['wget', 'media', 'screenshot'], ...}
    or /api/cli/schedule POST {urls: 'https://example.com', depth: 1, every: 'day', ...}

I'm leaning towards using FastAPI for the API instead of DRF. I like the pydantic type-based API definitions better than DRF's serializers but I could be convinced either way.

  • https://fastapi.tiangolo.com/features/
  • https://www.stavros.io/posts/fastapi-with-django/
  • https://fastapi.tiangolo.com/advanced/wsgi/
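As a rough illustration of how a client might call the proposed escape-hatch route, the sketch below builds (but does not send) a POST for `/api/cli/<command>` with the standard library. Since that endpoint is described above only as planned, everything here is an assumption: the base URL, the JSON body shape, the `build_cli_request` helper name, and the absence of any authentication.

```python
import json
from urllib.request import Request


def build_cli_request(base_url, command, **kwargs):
    """Build (but do not send) a POST for the proposed /api/cli/<command> route."""
    body = json.dumps(kwargs).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/api/cli/{command}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_cli_request(
    "http://localhost:8000",  # assumed address of a local instance
    "add",
    urls="https://example.com",
    depth=1,
    extractors=["wget", "media", "screenshot"],
)
```

Note that the payloads quoted in the list above use a JavaScript-style shorthand with unquoted keys; an actual JSON body needs quoted keys, which `json.dumps` produces.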

@zblesk commented on GitHub (Apr 23, 2022):

Thanks for the update. Looking forward to this.

Though I'm not sure I read those correctly. For instance, what is the difference between a GET and a POST to /add?
Will it support adding many links at once, as well?

And which endpoint should be used for 'return the archive URL for this input URL, if it exists'?


@djkemmet commented on GitHub (Sep 12, 2022):

@pirate hey there are you still working on this / need help? I'm thinking this is possibly something I could put together with FastAPI and the CLI hopefully next weekend. let me know! cheers

> Copying over my earlier message here from the API discussion related to the ArchiveBox browser extension #577:
>
> I think a minimal API can be worked on before the Huey refactor, as the user-facing API is going to be relatively stable even with the change to the internals. These endpoints are already partially available through the Django Admin:
>
>   • /add GET, POST (CSRF exempt, see https://github.com/ArchiveBox/ArchiveBox/pull/777; usable as an API from external origins and used by the browser extension, see https://github.com/ArchiveBox/ArchiveBox/issues/577#issuecomment-872961973)
>   • /api/core/snapshot/ GET, POST, PUT
>   • /api/core/snapshot/<id> GET, PATCH, DELETE
>   • /api/core/archiveresult/ GET, POST
>   • /api/core/archiveresult/<id> GET, PATCH, DELETE
>   • /api/core/tag/ GET, POST, PUT
>   • /api/core/tag/<id> GET, PATCH, DELETE
>
> and this bonus escape hatch endpoint is planned to be added to do everything else not possible with the above:
>
>   • /api/cli/<command> POST (simulate running any archivebox CLI command with a given dict of args and kwargs to populate the CLI flags and args)
>     e.g. /api/cli/add POST {urls: 'https://example.com', depth: 1, extractors: ['wget', 'media', 'screenshot'], ...}
>     or /api/cli/schedule POST {urls: 'https://example.com', depth: 1, every: 'day', ...}
>
> I'm leaning towards using FastAPI for the API instead of DRF. I like the pydantic type-based API definitions better than DRF's serializers but I could be convinced either way.
>
>   • https://fastapi.tiangolo.com/features/
>   • https://www.stavros.io/posts/fastapi-with-django/
>   • https://fastapi.tiangolo.com/advanced/wsgi/

@pirate commented on GitHub (Sep 15, 2022):

Definitely open to contribution on the API front! I'm more focused on internals refactoring at the moment but as mentioned in that quoted comment I believe my changes can be kept insulated from anything external facing.

If you want to share gists or a fork with your work I can leave progress on your mock-up as you go to save time on PR review later.


@joedavison commented on GitHub (Sep 17, 2022):

I would use an API like this.


@djkemmet commented on GitHub (Sep 20, 2022):

hi, if anyone is following this issue and could give me some guidance please see this issue: https://github.com/ArchiveBox/ArchiveBox/issues/1030


@FunctionDJ commented on GitHub (Oct 19, 2022):

i think @zblesk brought up an important point. a route like /add/ feels like violating REST principles by implying an action. ideally, if the API should be REST, the routes should be resources and the action is determined by the HTTP method (GET, PUT etc).
so i feel like it would make more sense to make a GET to /archive to get archived items and to make a POST to /archive to store a new link etc.
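The resource-versus-action distinction can be shown with a tiny in-memory dispatcher: a single `/archive` path, with the HTTP method alone selecting whether entries are listed or created. This is purely a sketch of the REST convention under discussion; the handler names, the `ROUTES` table, and the `url` payload key are hypothetical and say nothing about how ArchiveBox routes requests.

```python
archive = []  # in-memory stand-in for stored links


def list_archive(_payload):
    """GET /archive: return everything stored so far."""
    return 200, list(archive)


def create_archive_entry(payload):
    """POST /archive: store a new link and report it as created."""
    archive.append(payload["url"])
    return 201, payload["url"]


# One resource path; the HTTP method alone selects the action.
ROUTES = {
    ("GET", "/archive"): list_archive,
    ("POST", "/archive"): create_archive_entry,
}


def dispatch(method, path, payload=None):
    """Look up a handler by (method, path) and call it."""
    handler = ROUTES.get((method, path))
    if handler is None:
        known_paths = {route_path for _, route_path in ROUTES}
        # Known resource but unsupported method -> 405, unknown resource -> 404.
        return (405 if path in known_paths else 404), None
    return handler(payload)
```

With this layout, an action-style path such as `/add` simply does not exist as a resource; adding a link is a POST to the collection.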


@joedavison commented on GitHub (Nov 13, 2022):

Sure let's start with POST to /archive in addition to the current command line input method.


@pirate commented on GitHub (Nov 18, 2022):

Let's keep the REST API URLs in line with the model names and use /api/snapshot GET/POST and /api/archiveresult GET/POST.


@FunctionDJ commented on GitHub (Nov 21, 2022):

@pirate good point. my comment was less about the specific endpoint names and more about the REST conformity of using proper HTTP methods and resource endpoints. depending on the application design it might not make sense to map the models to endpoints 1-to-1 because some data is simply always a composition of different data models. i'm not familiar with the archivebox software project so i can't tell.


@pirate commented on GitHub (Nov 23, 2022):

I think keeping endpoints the same as model names is better than the alternative because more layers of indirection/leaky abstraction make it harder to grep through the source code and understand.


@PeterPilley commented on GitHub (Feb 6, 2023):

Hi everyone, can I ask what the status is of the rest API, definitely +1 for fastapi instead of DRF.

Is this something you need help or is there a list of active tasks for the current implementation?


@pirate commented on GitHub (Feb 19, 2023):

It's still on the list but slow going, I haven't had a lot of big blocks of coding time to work on ArchiveBox over the last year, so I've mostly been devoting my time to support and docs.

On the plus side I have interest from a big multinational org to use ArchiveBox, and may be able to turn that into a consulting contract to fund some work towards the API. They are a slow-moving org so it may take 6~12 months, but it's exciting news nonetheless.

@cogscides commented on GitHub (Jun 5, 2023):

Hope this will be implemented. In my case, I want to scrape and store websites on my local network, then process them with AI and put the results in my personal knowledge management system. The AI and PKM stuff is on my side, I just need an API 🙏

@aitorllj93 commented on GitHub (Nov 17, 2023):

hello! what's the current state of this? It's kinda confusing since it says it's in alpha, but reading the comments I can't tell if it's possible to use it on Docker. I'm interested in building an alternative frontend for this application and the REST API would help me a lot

@pirate commented on GitHub (Nov 18, 2023):

Alpha = There are a few POST/GET etc. endpoints exposed by the admin UI and the `/add` page that allow quick things to be hacked together, but it's not a proper REST API by any means. I'm working on a `django-huey-monitor` refactor to add an event-driven queue system in the backend, and the new REST API I'm planning will insert messages into this queue to manage extractor jobs and snapshots.

Can I ask why you're going in the direction of an alternative frontend vs contributing changes to AB directly? I'd definitely be open to PRs improving our existing frontend!

See the discussion here too: https://github.com/ArchiveBox/ArchiveBox/issues/1126
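
For anyone scripting against the alpha-stage endpoints described above, here is a minimal sketch of submitting a URL through the `/add` page using only Python's standard library. The CSRF handling (a cookie plus a hidden `csrfmiddlewaretoken` form field) is standard Django behaviour. The form field names `url` and `depth`, the `/add/` path with a trailing slash, and the base URL are assumptions to check against the form your own instance serves. It also assumes the add page is reachable without logging in.

```python
import re
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

BASE_URL = "http://127.0.0.1:8000"  # assumption: a local `archivebox server`

# Django renders the CSRF token as a hidden <input> inside each form.
CSRF_INPUT = re.compile(
    r"<input[^>]*name=[\"']csrfmiddlewaretoken[\"'][^>]*value=[\"']([^\"']+)[\"']"
)


def extract_csrf_token(html: str) -> str:
    """Return the csrfmiddlewaretoken value embedded in a Django form page."""
    match = CSRF_INPUT.search(html)
    if match is None:
        raise ValueError("no csrfmiddlewaretoken input found in page")
    return match.group(1)


def build_add_form(token: str, url_to_archive: str) -> bytes:
    """URL-encode the form body; `url` and `depth` are assumed field names."""
    fields = {"csrfmiddlewaretoken": token, "url": url_to_archive, "depth": "0"}
    return urllib.parse.urlencode(fields).encode("ascii")


def add_url(url_to_archive: str, base_url: str = BASE_URL) -> int:
    """GET the add page for a token and cookie, then POST the form back."""
    add_page = f"{base_url}/add/"
    # The cookie jar carries the CSRF cookie from the GET over to the POST.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )
    with opener.open(add_page) as response:
        token = extract_csrf_token(response.read().decode("utf-8"))
    request = urllib.request.Request(
        add_page,
        data=build_add_form(token, url_to_archive),
        headers={"Referer": add_page},  # Django checks the Referer on HTTPS requests
    )
    with opener.open(request) as response:
        return response.status
```

Calling `add_url("https://example.com")` against a running instance returns the HTTP status of the form submission. Since this leans on UI internals rather than a stable API, it can break between releases.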

@aitorllj93 commented on GitHub (Nov 19, 2023):

> Alpha = There are a few POST/GET etc. endpoints exposed by the admin UI and the `/add` page that allow quick things to be hacked together, but it's not a proper REST API by any means. I'm working on a `django-huey-monitor` refactor to add an event-driven queue system in the backend, and the new REST API I'm planning will insert messages into this queue to manage extractor jobs and snapshots.
>
> Can I ask why you're going in the direction of an alternative frontend vs contributing changes to AB directly? I'd definitely be open to PRs improving our existing frontend!
>
> See the discussion here too: #1126

@pirate my main issue with contributing to the existing frontend is that the current version is far from what I think would be useful for me, so my changes would probably be too disruptive to include in a PR without prior discussion. If you still think this project could benefit from a total rework of the frontend (which I do), I can think about making some proposals so we can reach an agreement.

@pirate commented on GitHub (Nov 20, 2023):

I'm down to add a new frontend to the existing app as long as we keep the Django admin one available as well in parallel. I was considering using htmx to do this myself (it plays well with Django templates) but haven't gotten around to it.

One of the core principles is that we should rely on JS as little as possible because I want ArchiveBox views to be extremely durable long term and viewable across many different types of devices.

I'm ok with some of the UI requiring JS, but ideally the most critical parts should fall back to working with old-school plain HTML.

If that design direction sounds compatible with your ideas then I'm down to work together to add your UI changes to AB directly, otherwise an independent app/mod may be better.

@aitorllj93 commented on GitHub (Nov 20, 2023):

@pirate sure, that sounds nice. I don't want to include a JavaScript framework either. Regarding htmx, we can give it a try if we need it; I've already done some work with it on a side project and it's great. About the CSS: I saw the current implementation uses Bootstrap. I wonder if we can move to Tailwind, which I think is a better fit for an open source project these days. That way we don't need to implement custom classes and it's easier for external contributors.

@pirate commented on GitHub (Nov 21, 2023):

Nice! I also prefer Tailwind to Bootstrap, happy to move to that.

If you want, open a new issue for your UI ideas as they come up. I think we should move frontend discussion away from the REST API thread so we don't spam everyone.

@zblesk commented on GitHub (Nov 21, 2023):

If you do create a new thread for that, can you please @ me? Thanks.

@pirate commented on GitHub (Apr 26, 2024):

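Hey everyone, check out [the new REST API](https://github.com/ArchiveBox/ArchiveBox/pull/1397#issuecomment-2076913352) on `dev`! Big thanks to @Brandl for the first PR that kickstarted it!

For users who want to try it out, get [v0.8.0-rc](https://github.com/ArchiveBox/ArchiveBox/releases/v0.8.0-rc) (unstable) or later, start `archivebox server`, then visit `http://127.0.0.1:8000/api` (and `/api/v1/docs`) to get started with the interactive Swagger API docs/test page ➡️

![image](https://github.com/ArchiveBox/ArchiveBox/assets/511499/0adff6db-b602-4898-9d4b-a15d80fe8d34)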
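
Since this thread doesn't list the individual endpoints, one cautious way to explore the new API from a script is to read the OpenAPI schema that normally backs a Swagger docs page. The sketch below flattens such a schema into a list of operations. The schema location `/api/v1/openapi.json` is an assumption (it is not confirmed anywhere in this thread), so check the docs page on your instance for the real path, and note that authentication is not handled here.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"    # address given in the comment above
SCHEMA_PATH = "/api/v1/openapi.json"  # assumption: not confirmed in this thread

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def fetch_schema(base_url: str = BASE_URL, schema_path: str = SCHEMA_PATH) -> dict:
    """Download and decode the OpenAPI document from a running server."""
    with urllib.request.urlopen(base_url + schema_path) as response:
        return json.load(response)


def list_operations(schema: dict) -> list[tuple[str, str]]:
    """Flatten an OpenAPI `paths` object into sorted (METHOD, path) pairs."""
    operations = []
    for path, item in schema.get("paths", {}).items():
        for method in item:
            if method in HTTP_METHODS:  # skip non-operation keys such as "parameters"
                operations.append((method.upper(), path))
    return sorted(operations, key=lambda op: (op[1], op[0]))


# Usage against a running server:
#   for method, path in list_operations(fetch_schema()):
#       print(f"{method:7} {path}")
```

Listing operations from the schema avoids hard-coding endpoint names that may still change while the API is unstable.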

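It also supports [sending webhooks](https://github.com/ArchiveBox/ArchiveBox/pull/1418) to external servers whenever archiving events happen.

![image](https://github.com/ArchiveBox/ArchiveBox/assets/511499/c1ed23a1-0843-45ff-b51f-f732611dded4)

![image](https://github.com/ArchiveBox/ArchiveBox/assets/511499/49e598ce-6f9f-4690-92e7-12f51317061d)

![image](https://github.com/ArchiveBox/ArchiveBox/assets/511499/2ddd24f9-3b28-429f-a35c-75476d0f301e)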
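
On the receiving side, a webhook target can be as small as the sketch below, which uses only Python's standard library. The payload format is not described in this thread, so the sketch assumes each event arrives as an HTTP POST with a JSON body and simply stores whatever it gets. The port and all names here (`WebhookHandler`, `parse_event`, `received_events`, `make_server`) are hypothetical and not part of ArchiveBox.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received_events = []  # in-memory log; a real receiver would persist or enqueue these


def parse_event(raw_body: bytes) -> dict:
    """Decode a webhook body, treating an empty body as an empty event."""
    return json.loads(raw_body or b"{}")  # raises ValueError on malformed input


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            event = parse_event(self.rfile.read(length))
        except ValueError:
            self.send_response(400)  # reject bodies that are not valid JSON
            self.end_headers()
            return
        received_events.append(event)
        self.send_response(204)  # acknowledge with an empty response
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def make_server(host: str = "127.0.0.1", port: int = 8001) -> HTTPServer:
    """Create (but do not start) the receiver; call .serve_forever() to run it."""
    return HTTPServer((host, port), WebhookHandler)
```

Running `make_server().serve_forever()` and pointing a webhook at `http://127.0.0.1:8001/` is enough to inspect what actually gets sent before writing real handling logic.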

@zblesk commented on GitHub (May 5, 2024):

I currently can't make a backup of my archive, so I can't switch to `dev`, but I'm really looking forward to trying this. Thanks.

@rcarmo commented on GitHub (May 11, 2024):

I can't wait for this to make it to stable.
