[GH-ISSUE #160] Doesn't always grab images #118

Closed
opened 2026-02-25 23:33:29 +03:00 by kerem · 5 comments
Owner

Originally created by @danmed on GitHub (Sep 22, 2019).
Original GitHub issue: https://github.com/go-shiori/shiori/issues/160

I'm just testing this, and the archive feature doesn't always seem to grab images...

For example, when archiving the link below, none of the images are captured in the archive, but a thumbnail is generated.

https://imgur.com/a/W5wZxHT

I'm running the Docker image, btw.


@RadhiFadlillah commented on GitHub (Sep 22, 2019):

@danmed yep, unfortunately archiving still doesn't work properly with pages that use a lot of JavaScript, like imgur and Reddit (old Reddit is fine, though), and unfortunately I don't have any idea where to start to solve this.


@danmed commented on GitHub (Sep 22, 2019):

Ah, OK. Well, it still does a pretty good job, so thank you!


@deanishe commented on GitHub (Sep 23, 2019):

> and unfortunately I don't have any idea where to start to solve this.

I was thinking about something like a set of URL rewriting rules, which might help Shiori handle certain sites (like Reddit). For example, a rule might rewrite www.reddit.com/... to old.reddit.com/... to get a more easily parseable version of a page.
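A minimal sketch of such a rule table in Go, assuming each rule is an exact host match that swaps in an alternative host. `hostRewrites` and `rewriteURL` are hypothetical names for illustration, not part of Shiori's code:

```go
package main

import (
	"fmt"
	"net/url"
)

// hostRewrites is a hypothetical rule table mapping a site's host to an
// alternative host that serves a simpler, more parseable version of the page.
var hostRewrites = map[string]string{
	"www.reddit.com": "old.reddit.com",
	"reddit.com":     "old.reddit.com",
}

// rewriteURL applies the rule whose host exactly matches the URL's host.
// URLs that fail to parse or match no rule are returned unchanged.
func rewriteURL(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return raw
	}
	if target, ok := hostRewrites[u.Host]; ok {
		u.Host = target
	}
	return u.String()
}

func main() {
	fmt.Println(rewriteURL("https://www.reddit.com/r/golang/comments/abc/"))
	fmt.Println(rewriteURL("https://imgur.com/a/W5wZxHT"))
}
```

The rewrite would run before fetching, so the archiver only ever sees the simpler page; sites without a rule are fetched as before.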

For other sites, the mobile or print version (if one exists) might be much easier to parse.

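For example, Shiori does a good job with this article from the Spiegel (https://www.spiegel.de/wirtschaft/unternehmen/thomas-cook-gegen-tui-das-duell-der-reisekonzerne-ist-entschieden-a-1288236.html), but it also grabs a bunch of the dumb user comments that aren't in the print version (https://www.spiegel.de/wirtschaft/unternehmen/thomas-cook-gegen-tui-das-duell-der-reisekonzerne-ist-entschieden-a-1288236-druck.html).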
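The Spiegel pair above suggests a path-based rule rather than a host swap: the print URL appends `-druck` before `.html`. A minimal Go sketch, assuming that pattern holds for other Spiegel articles (inferred from this single example only); `pathRule`, `printRules`, and `toPrintURL` are hypothetical names:

```go
package main

import (
	"fmt"
	"regexp"
)

// pathRule is a hypothetical rewrite rule: URLs matching Pattern are
// rewritten using Replacement (regexp expansion syntax, e.g. ${1}).
type pathRule struct {
	Pattern     *regexp.Regexp
	Replacement string
}

// printRules maps article URLs to their print versions. The Spiegel rule is
// inferred from one pair of example URLs; it is an assumption about the
// site's URL scheme, not a documented convention.
var printRules = []pathRule{
	{
		Pattern:     regexp.MustCompile(`^(https://www\.spiegel\.de/.+-a-\d+)\.html$`),
		Replacement: "${1}-druck.html",
	},
}

// toPrintURL returns the rewritten URL from the first matching rule and true,
// or the original URL and false if no rule matches.
func toPrintURL(raw string) (string, bool) {
	for _, r := range printRules {
		if r.Pattern.MatchString(raw) {
			return r.Pattern.ReplaceAllString(raw, r.Replacement), true
		}
	}
	return raw, false
}

func main() {
	u := "https://www.spiegel.de/wirtschaft/unternehmen/thomas-cook-gegen-tui-das-duell-der-reisekonzerne-ist-entschieden-a-1288236.html"
	printURL, ok := toPrintURL(u)
	fmt.Println(ok, printURL)
}
```

Because the pattern requires the article ID to sit directly before `.html`, an already-rewritten `-druck.html` URL does not match again, so applying the rule twice is harmless.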


@linwaytin commented on GitHub (Dec 26, 2019):

I have the same problem.
I think there are two possible ways to deal with it.

First, let users enter the image link manually.
Second, as @deanishe mentioned, rules could help Shiori determine where the image is.

At the very least, I'd like to be able to enter the link manually.
For now, if the image is wrong, there is nothing I can do to fix it.
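A minimal sketch of the manual-override idea in Go, assuming the bookmark keeps both an automatically detected image URL and an optional user-entered one. `chooseImageURL` is a hypothetical helper, not an existing Shiori function:

```go
package main

import (
	"fmt"
	"net/url"
)

// chooseImageURL is a hypothetical helper: it prefers a user-supplied image
// URL over the automatically detected one, but only if the manual value is
// an absolute http(s) URL with a host. Otherwise it falls back to detected.
func chooseImageURL(manual, detected string) string {
	if manual == "" {
		return detected
	}
	u, err := url.Parse(manual)
	if err != nil || (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return detected
	}
	return manual
}

func main() {
	auto := "https://example.com/auto.jpg"
	fmt.Println(chooseImageURL("", auto))
	fmt.Println(chooseImageURL("https://example.com/manual.png", auto))
	fmt.Println(chooseImageURL("not a url", auto))
}
```

Falling back to the detected image on invalid input means a typo in the manual field never leaves a bookmark without a thumbnail.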

Anyway, thank you for this great project.


@fmartingr commented on GitHub (Oct 7, 2022):

This will be worked on with #353. After the switch, we can start fixing this kind of thing directly in obelisk (https://github.com/go-shiori/obelisk).
