[GH-ISSUE #230] Low-res images for sites that use progressive enhancement? #171

Closed
opened 2026-02-25 23:33:36 +03:00 by kerem · 9 comments
Owner

Originally created by @neezer on GitHub (Jan 21, 2020).
Original GitHub issue: https://github.com/go-shiori/shiori/issues/230

I just tried adding a bookmark for a Medium article and noticed the images were imported into Shiori at an atrocious quality:

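![Screen Shot 2020-01-20 at 8 58 11 PM](https://user-images.githubusercontent.com/29997/72776876-83dcb900-3bc8-11ea-91cf-a5e1c35efb61.png)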

I'm guessing this is because Medium lazy-loads the higher-resolution copies with JS, but the Shiori importer doesn't wait around for that. That's my best guess, anyway. Inspecting the Medium page source, I see that the images have a `noscript` tag near them with the full-quality version of the image... perhaps that could be useful when importing?

Think this is fixable?

Author
Owner

@neezer commented on GitHub (Jan 21, 2020):

This was the article I was importing, if it's helpful for testing: https://medium.com/voodoo-engineering/node-js-and-cpu-profiling-on-production-in-real-time-without-downtime-d6e62af173e2


@neezer commented on GitHub (Jan 21, 2020):

Definitely two different URLs:

```diff
- https://miro.medium.com/max/60/1*87KlGgfbuWP38nAaQaj3xw.png?q=20
+ https://miro.medium.com/max/1860/1*87KlGgfbuWP38nAaQaj3xw.png
```

The former is the URL Shiori pulls; the latter is the URL in the fully loaded Medium article, and also the URL found in the adjacent `noscript` tags. In my testing, the width segment of the path (`/max/60/` vs. `/max/1860/`) is the deciding factor; the query parameter `q` does not seem to make a significant difference, and it is missing entirely once the page fully loads and the JS executes on a given image.
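The URL difference above suggests a narrow, Medium-specific workaround: rewrite the width segment and drop `q`. Here is a minimal Go sketch of that idea. It assumes that the `/max/<width>/` path layout seen in these two URLs applies to other Medium images too. That is an observation from this thread, not a documented Medium API. `upscaleMediumImage` is a hypothetical helper and is not part of Shiori or go-readability.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strconv"
)

// maxWidthPrefix matches the "/max/<width>/" prefix observed in the
// miro.medium.com image paths quoted in this thread. This is an observed
// URL pattern (an assumption), not a documented API, and may change.
var maxWidthPrefix = regexp.MustCompile(`^/max/\d+/`)

// upscaleMediumImage is a hypothetical helper. It swaps the width in a
// miro.medium.com image path for the requested one and drops the query
// string (e.g. "q=20"). URLs on other hosts, or without a "/max/<width>/"
// prefix, are returned unchanged.
func upscaleMediumImage(raw string, width int) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Host != "miro.medium.com" {
		return raw, nil
	}
	// EscapedPath keeps characters such as '*' as they appeared in the
	// original URL instead of re-encoding them as %2A.
	path := u.EscapedPath()
	if !maxWidthPrefix.MatchString(path) {
		return raw, nil
	}
	path = maxWidthPrefix.ReplaceAllString(path, "/max/"+strconv.Itoa(width)+"/")
	// Rebuild without RawQuery, which drops the "q" parameter.
	return u.Scheme + "://" + u.Host + path, nil
}

func main() {
	lowRes := "https://miro.medium.com/max/60/1*87KlGgfbuWP38nAaQaj3xw.png?q=20"
	hiRes, err := upscaleMediumImage(lowRes, 1860)
	if err != nil {
		panic(err)
	}
	fmt.Println(hiRes) // https://miro.medium.com/max/1860/1*87KlGgfbuWP38nAaQaj3xw.png
}
```

Hard-coding a width is brittle. Reading the `noscript` fallback, which already carries the full-size URL, avoids having to guess one.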

This problem is further compounded if you have attempted to Archive the page in Shiori; at present the images don't load at all:

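![Screen Shot 2020-01-20 at 9 23 57 PM](https://user-images.githubusercontent.com/29997/72777738-8856a100-3bcb-11ea-82af-fb920845e360.png)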

Seems like this would work fine if Shiori pulled the `noscript` value instead of the given `img` value, but I'm unsure if that's safe/wise to do categorically.

Right now I'm contemplating manually massaging the imported HTML in SQLite to use the correct URLs, but that's obviously pretty labor-intensive and not something I'd like to do routinely. However, I've also noticed that the embedded code examples didn't import at all, so I might have to do that anyway, since I want those archived too.


@8bitgentleman commented on GitHub (Jan 21, 2020):

I would also love to see a fix for Medium articles, as that's one of the sites I use most.


@RadhiFadlillah commented on GitHub (Mar 27, 2020):

@neezer @8bitgentleman sorry for the late reply.

Just wanted to let you know that the fix for this issue has been implemented in [`go-readability`](https://github.com/go-shiori/go-readability/commit/b2fb1ff01cbe3205872b76477f624abd49dca49a).

However, it might take a while to merge it into Shiori because I also want to improve the archival method, at least enough for Shiori to archive pages from GitHub and its Gists.


@fmartingr commented on GitHub (Feb 6, 2022):

Hey everyone, I've tested this and it's currently working on the latest version:

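![Screenshot 2022-02-06 at 17 05 25](https://user-images.githubusercontent.com/812088/152689706-e72a5e5a-fcf8-4aaf-ac6b-b6cb564aa2c6.png)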

I'm closing this as solved, but if you have any other issues please comment again so we can reopen.


@rundx commented on GitHub (Feb 24, 2022):

> Definitely two different URLs:
>
> ```diff
> - https://miro.medium.com/max/60/1*87KlGgfbuWP38nAaQaj3xw.png?q=20
> + https://miro.medium.com/max/1860/1*87KlGgfbuWP38nAaQaj3xw.png
> ```
>
> The former is the URL Shiori pulls; the latter is the URL in the fully-loaded Medium article, and also the URL found in the adjacent `noscript` tags. In my testing, the second URL parameter is the deciding factor; the query parameter `q` does not seem to make a significant change, and is missing entirely when the page fully loads and the JS executes on a given image.
>
> This problem is further compounded if you have attempted to Archive the page in Shiori; at present the images don't load at all:
>
> ![Screen Shot 2020-01-20 at 9 23 57 PM](https://user-images.githubusercontent.com/29997/72777738-8856a100-3bcb-11ea-82af-fb920845e360.png)
>
> Seems like this would work fine if Shiori pulled the `noscript` value instead of the given `img` value, but I'm unsure if that's safe/wise to do categorically.
>
> Right now I'm contemplating manually massaging the imported HTML in SQLite to the correct URLs, but that's obviously pretty labor intensive and not something I'd like to do routinely. However, I've also noticed the embedded code examples didn't import at all, so I might have to do that anyways, as I want to have those archived too.

This is still an issue when loading the archived version of Medium articles.


@fmartingr commented on GitHub (Feb 26, 2022):

> This is still an issue when loading archived version of medium articles

Have you tried updating the cache and checking whether the new archived version has been downloaded correctly?


@rundx commented on GitHub (Feb 26, 2022):

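> > This is still an issue when loading archived version of medium articles
>
> Have you tried updating the cache and see if the new archived version has been correctly downloaded?

Yes, still the same.

![Screen Shot 2022-02-26 at 12 35 52 PM](https://user-images.githubusercontent.com/60038039/155840179-32e8be83-967b-49a6-90ae-17188b665104.png)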

@fmartingr commented on GitHub (Feb 26, 2022):

I believe we have been talking about two different things here. This issue is about the content view of the article (which comes from [`go-readability`](https://github.com/go-shiori/go-readability)), whereas your problem is with the archived version, which right now comes from [`warc`](https://github.com/go-shiori/warc). `warc` is no longer maintained, and we need to migrate to [`obelisk`](https://github.com/go-shiori/obelisk) (#353).
