[GH-ISSUE #630] Question: Obey uBlock filter rules? #1903

Closed
opened 2026-03-01 17:54:49 +03:00 by kerem · 3 comments

Originally created by @winteriscariot on GitHub (Jan 22, 2021).
Original GitHub issue: https://github.com/ArchiveBox/ArchiveBox/issues/630

Is it possible for any archive methods to obey uBlock filter rules added via the 'picker' utility in uBlock? (You can find the full list of them in the 'My Filters' tab in uBlock settings.) This would be for uBlock installed in the Chrome profile identified in the config.

My other option would be to process the resulting archived files separately to remove the indicated elements, but if ArchiveBox is using Chrome to capture the website anyway, I figure it might be worth not pulling those elements at all (if possible).

thanks!
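
For the post-processing route mentioned above, a first step would be turning the 'My Filters' text into a list of CSS selectors per site. The following is a minimal sketch under some assumptions: it only handles plain cosmetic rules of the form `hostname##selector`, skips comments, network filters, exception rules (`#@#`), scriptlet and HTML filters, and ignores negated hostnames and procedural operators. The function names and the sample filter list are hypothetical and are not part of ArchiveBox or uBlock.

```python
from collections import defaultdict


def parse_cosmetic_filters(filter_text):
    """Map each hostname ('' for rules that apply everywhere) to its CSS selectors."""
    rules = defaultdict(list)
    for raw in filter_text.splitlines():
        line = raw.strip()
        # Skip blank lines and '!' comment lines.
        if not line or line.startswith("!"):
            continue
        # Skip exception rules and anything without the '##' separator
        # (for example network filters such as '||ads.example.net^').
        if "#@#" in line or "##" not in line:
            continue
        hosts, selector = line.split("##", 1)
        selector = selector.strip()
        # Skip empty selectors, scriptlet injections and HTML filters.
        if not selector or selector.startswith(("+js(", "^")):
            continue
        host_list = hosts.split(",") if hosts else [""]
        for host in host_list:
            host = host.strip()
            # Negated hostnames ('~example.com') are out of scope for this sketch.
            if host.startswith("~"):
                continue
            rules[host].append(selector)
    return dict(rules)


def selectors_for(rules, hostname):
    """Return global selectors plus those for hostname or any of its parent domains."""
    matched = list(rules.get("", []))
    for host, selectors in rules.items():
        if host and (hostname == host or hostname.endswith("." + host)):
            matched.extend(selectors)
    return matched


# Hypothetical sample of what a 'My Filters' export might contain.
MY_FILTERS = """\
! 2021-01-22 https://example.com
example.com##.newsletter-popup
example.com,example.org###cookie-banner
##.sponsored
||ads.example.net^
example.com#@#.whitelisted
"""

rules = parse_cosmetic_filters(MY_FILTERS)
print(selectors_for(rules, "www.example.com"))
```

The resulting selectors could then be fed to any HTML parser with CSS selector support to strip the matching elements from an archived page; that step is left out here because it depends on which output files are being cleaned.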


@winteriscariot commented on GitHub (Jan 22, 2021):

Issue #211 mentions being able to use uBlock:

> Using a chrome extension like Ublock Origin and/or Ghostery (this is already possible)

However, it doesn't appear to be obeying user filters.


@jacobwhall commented on GitHub (Feb 21, 2021):

I would also like to get this set up. According to #211 that you linked, this can be achieved by using the CHROME_USER_DATA_DIR option (https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration#chrome_user_data_dir). You might also find #516 helpful. Good luck!


@pirate commented on GitHub (Feb 25, 2021):

This is a valid question, but I'm actually going to close this in favor of keeping the discussion here: https://github.com/ArchiveBox/ArchiveBox/issues/211, because I already had uBlock Origin + Ghostery in mind for the implementation of that ticket.

Right now, using CHROME_USER_DATA_DIR with a profile that has the extension works for some people, but is buggy/impossible on other OS/browser/docker combos.
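
As a rough illustration of the CHROME_USER_DATA_DIR approach, the sketch below builds the environment and command for a single `archivebox add` run that points at an existing Chrome profile. It assumes the option is picked up from the environment and that the profile already has the extension and its filters set up; the helper name `build_archive_invocation` and the example profile path are hypothetical, and whether the extension actually loads still depends on the OS/browser/Docker combination.

```python
import os
from pathlib import Path


def build_archive_invocation(url, chrome_profile_dir, base_env=None):
    """Return (command, env) for one `archivebox add` run using the given Chrome profile."""
    # Copy the environment so the caller's own environment is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: ArchiveBox reads this config option from the environment.
    env["CHROME_USER_DATA_DIR"] = str(Path(chrome_profile_dir).expanduser())
    return ["archivebox", "add", url], env


# Hypothetical profile location; substitute the directory of the profile
# where uBlock Origin is installed.
cmd, env = build_archive_invocation(
    "https://example.com",
    "/tmp/chrome-profile",
    base_env={"PATH": "/usr/bin"},
)
print(cmd, env["CHROME_USER_DATA_DIR"])

# To actually run it from inside an ArchiveBox data directory:
#   import subprocess
#   subprocess.run(cmd, env=env, check=True)
```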

I have an eventual more elegant solution in mind (involving a config option for chrome extensions + a profile to run) but it is blocked by:

  1. an upcoming event-sourcing refactor in >=v0.7 to allow for faster worker-pool based parallel archiving (3-6 months)
  2. addition of the playwright extractor dependency to replace the old chrome-headless CLI / puppeteer approach (2-4 months)

If you hit "subscribe" over on https://github.com/ArchiveBox/ArchiveBox/issues/211 I'll notify you when we get close to it.

If you can think of any quick-and-dirty solutions to make this easier that would get us there before the playwright implementation, I'm all ears, please suggest them over on #211!
