mirror of
https://github.com/go-shiori/shiori.git
synced 2026-04-25 14:35:52 +03:00
[GH-ISSUE #230] Low-res images for sites that use progressive enhancement? #171
Originally created by @neezer on GitHub (Jan 21, 2020).
Original GitHub issue: https://github.com/go-shiori/shiori/issues/230
I just tried adding a bookmark for a Medium article and noticed the images were imported into Shiori at an atrocious quality:
I'm guessing this is because Medium will lazy-load the higher-resolution copies with JS, but the Shiori importer doesn't wait around for that. That's my best guess anyways. Inspecting the Medium page source, I see that the images have a `noscript` tag near 'em with the full-quality version of the image... perhaps that could be useful when importing?
Think this is fixable?
@neezer commented on GitHub (Jan 21, 2020):
This was the article I was importing, if it's helpful for testing: https://medium.com/voodoo-engineering/node-js-and-cpu-profiling-on-production-in-real-time-without-downtime-d6e62af173e2
@neezer commented on GitHub (Jan 21, 2020):
Definitely two different URLs:
The former is the URL Shiori pulls; the latter is the URL in the fully-loaded Medium article, and also the URL found in the adjacent `noscript` tags. In my testing, the second URL parameter is the deciding factor; the query parameter `q` does not seem to make a significant change, and is missing entirely when the page fully loads and the JS executes on a given image.
This problem is further compounded if you have attempted to Archive the page in Shiori; at present the images don't load at all:
Seems like this would work fine if Shiori pulled the `noscript` value instead of the given `img` value, but I'm unsure if that's safe/wise to do categorically.
Right now I'm contemplating manually massaging the imported HTML in SQLite to the correct URLs, but that's obviously pretty labor intensive and not something I'd like to do routinely. However, I've also noticed the embedded code examples didn't import at all, so I might have to do that anyways, as I want to have those archived too.
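For anyone who wants to experiment with the `noscript` idea outside Shiori, here is a rough sketch in Python using only the standard library. It is not how Shiori or `go-readability` handle this; the class name, function name, and the example.com URLs are all made up for illustration. It pairs a placeholder `img` with the `img` inside the `noscript` that immediately follows it, then swaps the placeholder's `src` for the full-quality one.

```python
from html.parser import HTMLParser


class NoscriptImageFinder(HTMLParser):
    """Pair a placeholder <img> with the <img> in the <noscript> right after it."""

    def __init__(self):
        super().__init__()
        self.in_noscript = False
        self.last_placeholder = None  # src of the latest <img> seen outside <noscript>
        self.replacements = {}        # placeholder src -> full-quality src

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self.in_noscript = True
            return
        if tag != "img":
            # Another element came between the <img> and any <noscript>,
            # so do not treat a later <noscript> as belonging to that <img>.
            if not self.in_noscript:
                self.last_placeholder = None
            return
        src = dict(attrs).get("src")
        if not src:
            return
        if self.in_noscript:
            if self.last_placeholder:
                self.replacements[self.last_placeholder] = src
                self.last_placeholder = None
        else:
            self.last_placeholder = src

    def handle_endtag(self, tag):
        if tag == "noscript":
            self.in_noscript = False


def prefer_noscript_images(html):
    """Return html with each paired placeholder src replaced by the noscript src."""
    finder = NoscriptImageFinder()
    finder.feed(html)
    finder.close()
    for low, high in finder.replacements.items():
        html = html.replace(f'src="{low}"', f'src="{high}"')
    return html


# Hypothetical markup; the URLs are invented and are not real Medium URLs.
sample = (
    "<figure>"
    '<img src="https://example.com/max/60/photo.png?q=20">'
    '<noscript><img src="https://example.com/max/1400/photo.png"></noscript>'
    "</figure>"
)

print(prefer_noscript_images(sample))
```

The sketch only rewrites an `img` when a `noscript` follows it directly, which is one cautious answer to the "is it safe to do categorically" question: pages that use `noscript` for something else are left alone. It also assumes double-quoted `src` attributes with no escaped characters, so real-world pages would need a proper DOM rewrite instead of string replacement.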
@8bitgentleman commented on GitHub (Jan 21, 2020):
I would also love to see a fix for medium articles, as that's one of the more common sites I use
@RadhiFadlillah commented on GitHub (Mar 27, 2020):
@neezer @8bitgentleman sorry for late reply.
Just want to tell you the fix for this issue has been implemented in `go-readability`.
However, it might take a while to merge it to Shiori because I also want to improve the archival method to make it better, at least to make Shiori able to archive pages from Github and its gist.
@fmartingr commented on GitHub (Feb 6, 2022):
Hey everyone, I've tested this and it's currently working on the latest version:
I'm closing this as solved, but if you have any other issues please comment again so we can reopen.
@rundx commented on GitHub (Feb 24, 2022):
This is still an issue when loading the archived version of Medium articles
@fmartingr commented on GitHub (Feb 26, 2022):
Have you tried updating the cache to see if the new archived version has been downloaded correctly?
@rundx commented on GitHub (Feb 26, 2022):
Yes, still the same
@fmartingr commented on GitHub (Feb 26, 2022):
I believe we have been talking about two different things here. In this issue we're talking about the content view of the article (which comes from `go-readability`), while your problem comes from the archived version, which right now comes from `warc`. Warc is not maintained anymore and we need to migrate to `obelisk` (#353).