starred/shiori


mirror of https://github.com/go-shiori/shiori.git synced 2026-04-25 14:35:52 +03:00

[GH-ISSUE #230] Low-res images for sites that use progressive enhancement? #171


Closed

opened 2026-02-25 23:33:36 +03:00 by kerem · 9 comments

kerem commented

2026-02-25 23:33:36 +03:00

Owner

Originally created by @neezer on GitHub (Jan 21, 2020).
Original GitHub issue: https://github.com/go-shiori/shiori/issues/230

I just tried adding a bookmark for a Medium article and noticed the images were imported into Shiori at an atrocious quality:

<img width="1071" alt="Screen Shot 2020-01-20 at 8 58 11 PM" src="https://user-images.githubusercontent.com/29997/72776876-83dcb900-3bc8-11ea-91cf-a5e1c35efb61.png">

I'm guessing this is because Medium will lazy-load the higher-resolution copies with JS, but the Shiori importer doesn't wait around for that. That's my best guess anyways. Inspecting the Medium page source, I see that the images have a `noscript` tag near 'em with the full-quality version of the image... perhaps that could be useful when importing?

Think this is fixable?
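To make the `noscript` idea concrete, here is a minimal sketch of what "use the fallback image" could look like. It is written in Python purely for illustration and is not taken from Shiori's code. The class and function names are hypothetical, and it assumes the full-quality `<img>` sits in a `<noscript>` right after its placeholder and that `src` attributes are double-quoted.

```python
from html.parser import HTMLParser


class NoscriptImageFinder(HTMLParser):
    """Pairs each placeholder <img> with the <img> inside the <noscript> that follows it."""

    def __init__(self):
        super().__init__()
        self.in_noscript = False
        self.last_placeholder = None  # src of the most recent <img> outside <noscript>
        self.pairs = []               # (placeholder_src, full_quality_src)

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self.in_noscript = True
        elif tag == "img":
            src = dict(attrs).get("src")
            if not src:
                return
            if self.in_noscript and self.last_placeholder:
                # Fallback image found: remember which placeholder it replaces.
                self.pairs.append((self.last_placeholder, src))
                self.last_placeholder = None
            elif not self.in_noscript:
                self.last_placeholder = src

    def handle_endtag(self, tag):
        if tag == "noscript":
            self.in_noscript = False


def prefer_noscript_images(page_html):
    """Return page_html with placeholder image URLs swapped for their noscript fallbacks."""
    finder = NoscriptImageFinder()
    finder.feed(page_html)
    finder.close()
    for placeholder, full in finder.pairs:
        # Plain text replacement; assumes double-quoted src attributes.
        page_html = page_html.replace(f'src="{placeholder}"', f'src="{full}"')
    return page_html
```

Whether this is safe to apply to every site is a separate question: a `<noscript>` block can hold things other than a better copy of the image, so a real importer would probably want extra checks before swapping.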

kerem closed this issue and added the type:bug, component:backend, and component:readability labels on 2026-02-25 23:33:36 +03:00.


@neezer commented on GitHub (Jan 21, 2020):

This was the article I was importing, if it's helpful for testing: https://medium.com/voodoo-engineering/node-js-and-cpu-profiling-on-production-in-real-time-without-downtime-d6e62af173e2


@neezer commented on GitHub (Jan 21, 2020):

Definitely two different URLs:

```diff
- https://miro.medium.com/max/60/1*87KlGgfbuWP38nAaQaj3xw.png?q=20
+ https://miro.medium.com/max/1860/1*87KlGgfbuWP38nAaQaj3xw.png
```

The former is the URL Shiori pulls; the latter is the URL in the fully-loaded Medium article, and also the URL found in the adjacent `noscript` tags. In my testing, the second URL parameter is the deciding factor; the query parameter `q` does not seem to make a significant change, and is missing entirely when the page fully loads and the JS executes on a given image.

This problem is further compounded if you have attempted to Archive the page in Shiori; at present the images don't load at all:

<img width="1103" alt="Screen Shot 2020-01-20 at 9 23 57 PM" src="https://user-images.githubusercontent.com/29997/72777738-8856a100-3bcb-11ea-82af-fb920845e360.png">

---

Seems like this would work fine if Shiori pulled the `noscript` value instead of the given `img` value, but I'm unsure if that's safe/wise to do categorically.

Right now I'm contemplating manually massaging the imported HTML in SQLite to the correct URLs, but that's obviously pretty labor intensive and not something I'd like to do routinely. However, I've also noticed the embedded code examples didn't import at all, so I might have to do that anyways, as I want to have those archived too.
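For the manual "massage the URLs" route, the rewrite itself could be scripted. The snippet below is only a rough sketch in Python: the function name is hypothetical, and the default width of 1860 is just the value seen for this one image, so other images may need a different number. It covers only the URL transformation from the diff above (swap the size segment, drop `q`), not the database update.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches the size segment at the start of the path, e.g. "/max/60/".
_MAX_SEGMENT = re.compile(r"^/max/\d+/")


def upscale_medium_image(url, width=1860):
    """Rewrite a miro.medium.com placeholder URL to request a larger image.

    URLs on other hosts, or without a /max/<n>/ segment, are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc != "miro.medium.com" or not _MAX_SEGMENT.match(parts.path):
        return url
    path = _MAX_SEGMENT.sub(f"/max/{width}/", parts.path, count=1)
    # Drop the query string (the low-quality "q" parameter) and any fragment.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

Calling `upscale_medium_image` on the first URL in the diff yields the second one; anything that does not look like a Medium image URL passes through untouched.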


@8bitgentleman commented on GitHub (Jan 21, 2020):

I would also love to see a fix for Medium articles, as that's one of the more common sites I use.


@RadhiFadlillah commented on GitHub (Mar 27, 2020):

@neezer @8bitgentleman sorry for the late reply.

Just want to tell you the fix for this issue has been implemented in [`go-readability`](https://github.com/go-shiori/go-readability/commit/b2fb1ff01cbe3205872b76477f624abd49dca49a).

However, it might take a while to merge it into Shiori because I also want to improve the archival method, at least enough that Shiori can archive pages from GitHub and its gists.


@fmartingr commented on GitHub (Feb 6, 2022):

Hey everyone, I've tested this and it's currently working on the latest version:

![Screenshot 2022-02-06 at 17 05 25](https://user-images.githubusercontent.com/812088/152689706-e72a5e5a-fcf8-4aaf-ac6b-b6cb564aa2c6.png)

I'm closing this as solved, but if you have any other issues please comment again so we can reopen.


@rundx commented on GitHub (Feb 24, 2022):

> Definitely two different URLs:
>
> ```diff
> - https://miro.medium.com/max/60/1*87KlGgfbuWP38nAaQaj3xw.png?q=20
> + https://miro.medium.com/max/1860/1*87KlGgfbuWP38nAaQaj3xw.png
> ```
>
> The former is the URL Shiori pulls; the latter is the URL in the fully-loaded Medium article, and also the URL found in the adjacent `noscript` tags. In my testing, the second URL parameter is the deciding factor; the query parameter `q` does not seem to make a significant change, and is missing entirely when the page fully loads and the JS executes on a given image.
>
> This problem is further compounded if you have attempted to Archive the page in Shiori; at present the images don't load at all:
>
> <img alt="Screen Shot 2020-01-20 at 9 23 57 PM" width="1103" src="https://user-images.githubusercontent.com/29997/72777738-8856a100-3bcb-11ea-82af-fb920845e360.png">
>
> Seems like this would work fine if Shiori pulled the `noscript` value instead of the given `img` value, but I'm unsure if that's safe/wise to do categorically.
>
> Right now I'm contemplating manually massaging the imported HTML in SQLite to the correct URLs, but that's obviously pretty labor intensive and not something I'd like to do routinely. However, I've also noticed the embedded code examples didn't import at all, so I might have to do that anyways, as I want to have those archived too.

This is still an issue when loading archived version of medium articles


@fmartingr commented on GitHub (Feb 26, 2022):

> This is still an issue when loading archived version of medium articles

Have you tried updating the cache to see if the new archived version has been correctly downloaded?


@rundx commented on GitHub (Feb 26, 2022):

> > This is still an issue when loading archived version of medium articles
>
> Have you tried updating the cache to see if the new archived version has been correctly downloaded?

Yes, still the same

<img width="796" alt="Screen Shot 2022-02-26 at 12 35 52 PM" src="https://user-images.githubusercontent.com/60038039/155840179-32e8be83-967b-49a6-90ae-17188b665104.png">


@fmartingr commented on GitHub (Feb 26, 2022):

I believe we have been talking about two different things here. In this issue we're talking about the content view of the article (which comes from [`go-readability`](https://github.com/go-shiori/go-readability)), while your problem comes from the archived version, which right now comes from [`warc`](https://github.com/go-shiori/warc). Warc is not maintained anymore and we need to migrate to [`obelisk`](https://github.com/go-shiori/obelisk) (#353).