mirror of
https://github.com/cypht-org/cypht.git
synced 2026-04-25 04:56:03 +03:00
[GH-ISSUE #284] RSS feed not supported #246
Originally created by @ehanuise on GitHub (Sep 25, 2018).
Original GitHub issue: https://github.com/cypht-org/cypht/issues/284
Originally assigned to: @jasonmunro on GitHub.
I wanted to add two Belgian newspaper feeds:
http://www.lesoir.be/rss/81853/cible_principale_gratuit
http://www.lalibre.be/rss.xml
The first works OK; the second is refused by Cypht: http://www.lalibre.be/rss.xml
Cypht should be able to process both.
I noticed a small difference in the headers, which might be the cause:
This one is Le Soir:
This one is La Libre:
@jasonmunro commented on GitHub (Sep 25, 2018):
Thanks for the feedback. I will re-create the issue and figure out what's wrong!
@jasonmunro commented on GitHub (Sep 25, 2018):
I just popped it in and it worked without issue. I'm running the git master branch, however - if you are running the latest release, could you try switching over to the latest code? The latest release is quite old and out of date - I'm trying to start a new release cycle this week, actually.
@ehanuise commented on GitHub (Sep 25, 2018):
Hi.
I installed with the script from your site. Doesn't it DL the latest version?
I'm not familiar with git - not a dev. If you have a script, I'm game :-)
@jasonmunro commented on GitHub (Sep 25, 2018):
Actually, the script as defined on the install page at cypht.org does use the latest git master branch. I can't explain the issue you are seeing here - I was able to add that RSS source without a problem. Can you tell me more about your PHP version, and which PHP packages are installed?
Thanks!
@ehanuise commented on GitHub (Sep 25, 2018):
It's PHP 7.1.1 on the latest version of Debian.
I can try and capture some logs if you tell me where to look :)
@ehanuise commented on GitHub (Sep 26, 2018):
I get this error on the la libre feed : Cound not add feed: php_network_getaddresses: getaddrinfo failed: Name or service not known
@jasonmunro commented on GitHub (Sep 26, 2018):
Weird. So this is telling me your server cannot resolve the address for that feed. What happens when you try:
nslookup www.lalibre.be
from your server?
@ehanuise commented on GitHub (Sep 27, 2018):
It resolves just fine:
@jasonmunro commented on GitHub (Sep 27, 2018):
Thanks for the follow-up. Looking a bit closer at the code, the first thing we do is split the URL into its parts and try to connect to the host portion - this is where it is failing for you. Is it possible you had a typo? It looks like even a leading space before the address could cause an issue here (which I will fix). Can you retry, and if it still fails, run this from your server:
php -r 'print_r(parse_url(" http://www.lalibre.be/rss.xml"));'
It should return the following:
@ehanuise commented on GitHub (Sep 27, 2018):
Thanks.
I get it differently :
php -r 'print_r(parse_url(" http://www.lalibre.be/rss.xml"));'
Array
(
[path] => http://www.lalibre.be/rss.xml
)
@jasonmunro commented on GitHub (Sep 27, 2018):
Oh shoot, I copied the command with the leading space in the host, which does not work - it should be this:
php -r 'print_r(parse_url("http://www.lalibre.be/rss.xml"));'
@ehanuise commented on GitHub (Sep 27, 2018):
:-)
This one works :
php -r 'print_r(parse_url("http://www.lalibre.be/rss.xml"));'
Array
(
[scheme] => http
[host] => www.lalibre.be
[path] => /rss.xml
)
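The two results above differ only by the leading space in the input. A minimal sketch of the kind of normalization that avoids this, assuming the URL arrives as raw user input; `normalize_feed_url` is a hypothetical helper for illustration and not Cypht's actual code:

```php
<?php
// Hypothetical helper (not from the Cypht source): trim stray whitespace
// before handing a user-supplied feed URL to parse_url().
function normalize_feed_url(string $raw): ?array
{
    $parts = parse_url(trim($raw));
    // Without a scheme and a host there is nothing to connect to.
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return null;
    }
    return $parts;
}

// With the leading space, parse_url() does not report a host.
$untrimmed = parse_url(" http://www.lalibre.be/rss.xml");
var_dump(isset($untrimmed['host']));

// After trimming, scheme, host and path are split out.
print_r(normalize_feed_url(" http://www.lalibre.be/rss.xml"));
```

Returning `null` for anything without a scheme and host lets the caller show a clear "invalid URL" message instead of attempting a DNS lookup on an empty host.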
@jasonmunro commented on GitHub (Sep 27, 2018):
yep, looks good. Did you try adding it again in Cypht making sure there is no leading space?
@ehanuise commented on GitHub (Sep 27, 2018):
Yup, sure, typed it by hand and double-checked; the issue is somewhere else, I'm afraid :(
@jasonmunro commented on GitHub (Sep 27, 2018):
How about this command?
php -r 'print_r(fsockopen("www.lalibre.be", 80));'
@ehanuise commented on GitHub (Sep 27, 2018):
php -r 'print_r(fsockopen("www.lalibre.be", 80));'
Resource id #4USERNAME@HOSTNAME:/home/USERNAME (caps are edited)
@ehanuise commented on GitHub (Sep 27, 2018):
Ah sorry it was my prompt.
it just returns
Resource id #4
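The manual `nslookup` and `fsockopen` checks in this thread could be combined into one small diagnostic. This is only a sketch: `check_feed_host` is a hypothetical name, and the port and timeout defaults are assumptions.

```php
<?php
// Hypothetical diagnostic that mirrors the manual checks above:
// first resolve the host name, then try to open a TCP connection.
function check_feed_host(string $host, int $port = 80, float $timeout = 5.0): array
{
    // gethostbyname() returns the unmodified host name when resolution fails.
    // Note: a literal IP address is also returned unchanged, so this
    // check is only meaningful for host names.
    $ip = gethostbyname($host);
    if ($ip === $host) {
        return ['resolved' => false, 'connected' => false, 'detail' => 'DNS lookup failed'];
    }

    $errno = 0;
    $errstr = '';
    // Suppress the warning; the error details are returned instead.
    $sock = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($sock === false) {
        return ['resolved' => true, 'connected' => false, 'detail' => "$errno $errstr"];
    }
    fclose($sock);
    return ['resolved' => true, 'connected' => true, 'detail' => $ip];
}
```

Calling `print_r(check_feed_host("www.lalibre.be"));` from the server would show in one step whether the failure is in name resolution or in the connection itself.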
@jasonmunro commented on GitHub (Sep 27, 2018):
well this is a puzzle! I can't reproduce this here, so the only thing I can think to try is to give you a patch to insert some debugging info into the feed related code that will output some data to the PHP/webserver error log.
@ehanuise commented on GitHub (Sep 27, 2018):
I enabled debug mode and copied the output from when I try to enter the feed:
[Thu Sep 27 18:44:10.803701 2018] [php7:notice] [pid 20177] [client
XXXXXXXXXXXXX:60378] Array\n(\n [0] => Using Hm_PHP_Session with
Hm_Auth_IMAP\n [1] => Using file based user configuration\n [2] =>
Using sapi: apache2handler\n [3] => Request type: HTTP\n [4] =>
Request path: /webmail/\n [5] => TLS request: 1\n [6] => Mobile
request: 0\n [7] => Page ID: servers\n [8] => LOGGED IN\n [9]
=> XML Parse error: Reserved XML Name\n [10] => Setting cookie: name:
hm_msgs, lifetime: 0, path: /webmail/, domain: www.XXXXXXX.com, secure:
1, html_only 1\n [11] => Redirecting to /webmail/?page=servers\n
[12] => PHP version 7.1.20-1+0
20180910100430.3+jessie1.gbp17c613\n[13] => Zend version 3.1.0\n [14] => Peak Memory: 2048\n [15] =>
PID: 20177\n [16] => Included files: 68\n)\n, referer:
https://www.XXXXXXXX.com/webmail/?page=servers
[Thu Sep 27 18:44:10.919458 2018] [php7:notice] [pid 20177] [client
XXXXXXXXXX:60378] Array\n(\n [0] => Using Hm_PHP_Session with
Hm_Auth_IMAP\n [1] => Using file based user configuration\n [2] =>
Using sapi: apache2handler\n [3] => Request type: HTTP\n [4] =>
Request path: /webmail/\n [5] => TLS request: 1\n [6] => Mobile
request: 0\n [7] => Page ID: servers\n [8] => LOGGED IN\n [9]
=> Deleting cookie: name: hm_msgs, lifetime: 1538063050, path:
/webmail/, domain: www.XXXXXXX.com, secure: 1, html_only 1\n [10] =>
TRANSLATION NOT FOUND :Could not find an RSS or ATOM feed at that
address:\n [11] => TRANSLATION NOT FOUND :Office365:\n [12] =>
TRANSLATION NOT FOUND :STARTTLS or unencrypted:\n [13] => TRANSLATION
NOT FOUND :STARTTLS or unencrypted:\n [14] => TRANSLATION NOT FOUND
:STARTTLS or unencrypted:\n [15] => PHP version
7.1.20-1+0
20180910100430.3+jessie1.gbp17c613\n [16] => Zend version3.1.0\n [17] => Peak Memory: 2048\n [18] => PID: 20177\n [19]
=> Included files: 69\n)\n, referer:
https://www.XXXXXXXX.com/webmail/?page=servers
@jasonmunro commented on GitHub (Sep 27, 2018):
This is interesting, looks like we are getting the XML from the feed but can't parse it. Thanks, this helps! Still can't explain why it's working here (yet), but it's a clue :)
@ehanuise commented on GitHub (Sep 27, 2018):
https://stackoverflow.com/questions/11107592/xml-error-parsing-soap-payload-reserved-xml-name/15604229
I notice the 'lalibre' feed has no whitespace before the <?xml statement and the 'Le Soir' one has.
I can't get them to change it of course, and it works with other feed readers, so maybe this is the root cause?
Here's another feed that works OK elsewhere, doesn't work in Cypht, and has no whitespace before <?xml:
http://www.bitcoin.fr/feed/rss2
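For reference, the XML specification only allows the `<?xml ... ?>` declaration at the very start of a document, so it is whitespace in front of the declaration that a strict parser rejects, not its absence. A small sketch to check this locally, assuming the SimpleXML extension is available; `try_parse_xml` is a hypothetical helper and the sample feed is made up for illustration:

```php
<?php
// Illustration only: shows how libxml-based parsing in PHP reacts to
// whitespace in front of the XML declaration. Not Cypht's parsing code.
function try_parse_xml(string $xml): bool
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc !== false;
}

// Made-up minimal feed used only for this demonstration.
$feed = '<?xml version="1.0" encoding="UTF-8"?>'
      . '<rss version="2.0"><channel><title>t</title></channel></rss>';

var_dump(try_parse_xml($feed));                 // declaration at the first byte
var_dump(try_parse_xml("\n " . $feed));         // whitespace before the declaration
var_dump(try_parse_xml(ltrim("\n " . $feed)));  // whitespace stripped before parsing
```

If a feed really does arrive with leading whitespace, stripping it with `ltrim()` before parsing is a cheap way to make the parser more forgiving.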
@jasonmunro commented on GitHub (Sep 27, 2018):
do you have php curl installed?
@jasonmunro commented on GitHub (Sep 27, 2018):
looks like we try to use curl if it's installed, otherwise we fall back to file_get_contents(). If you don't have curl maybe this is the issue.
@ehanuise commented on GitHub (Sep 27, 2018):
It's installed: php7.1-curl 7.1.20-1+0
20180910100430.3+jessie1.gbp17c613
@jasonmunro commented on GitHub (Sep 27, 2018):
darn, thought we might have a hit on that one :) I saw the leading white-space issues from googling the error as well, but that still does not explain why it works fine here :/
@ehanuise commented on GitHub (Sep 27, 2018):
Sorry can't help much more at this point :)
Maybe a change between different PHP versions ?
I'll email you privately a copy of phpinfo();
@jasonmunro commented on GitHub (Sep 27, 2018):
It's looking like this is not a bug in Cypht, however we could do a few things better to figure out issues like this:
@jasonmunro commented on GitHub (Sep 27, 2018):
Better debugging added in github.com/jasonmunro/cypht@354536bf13
@ehanuise I will leave this open for a while in case you run into further issues!
@dumblob commented on GitHub (Sep 27, 2018):
To cover the cases where the RSS feed (the XML) is invalid, we could switch from an XML parser to an HTML parser, which is far more tolerant of mistakes. Anything that uses libxml2 at its core should be able to use the built-in HTMLparser API, which parses HTML 4.0.
I'm not sure, though, how difficult this switch would be or how high a priority it is (making the RSS parser more tolerant).
@jasonmunro commented on GitHub (Sep 27, 2018):
@dumblob not a bad idea. For the record, this was not a badly formatted feed, but a 403 permission denied response with a small HTML payload. I track about 12 feeds and don't recall seeing any badly formatted XML over the last few years (though maybe I just have not noticed, and YMMV since that is not a very large sample).
For now I think the additional debugging will shine some light on potentially problematic feeds, and if we decide to use something more forgiving as a fallback for bad formatting we can look more closely into it.
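As a rough illustration of the kind of debugging that helps here, a fetcher can look at the HTTP status and the start of the body before handing anything to the XML parser. This is a sketch under assumptions: `describe_feed_response` and its messages are hypothetical and are not taken from the debugging commit mentioned above.

```php
<?php
// Hypothetical pre-check (names are illustrative, not from Cypht):
// decide whether an HTTP response is worth handing to the XML parser,
// and produce a log-friendly reason when it is not.
function describe_feed_response(int $status, string $body): string
{
    if ($status < 200 || $status >= 300) {
        return sprintf('HTTP %d returned instead of a feed', $status);
    }
    $start = ltrim($body);
    // Skip an optional XML declaration and comments, then check the root element.
    $pattern = '/^(?:<\?xml.*?\?>\s*)?(?:<!--.*?-->\s*)*<(rss|feed|rdf:RDF)\b/is';
    if (preg_match($pattern, $start)) {
        return 'ok';
    }
    return 'response body does not look like RSS or ATOM: ' . substr($start, 0, 60);
}

echo describe_feed_response(403, '<html><body>403 Forbidden</body></html>'), "\n";
echo describe_feed_response(200, '<?xml version="1.0"?><rss version="2.0"></rss>'), "\n";
```

Logging the status code and the first few bytes of the body would have pointed at the 403 HTML page directly, rather than at an XML parse error.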
@ehanuise commented on GitHub (Sep 28, 2018):
OK, I dug a bit further.
On http://www.lalibre.be/rss.xml or any other part of that site, I get a Varnish 403 error. Looks like a problem on their end with my server and its fixed IP - I contacted them to investigate.
I also tried http://www.bitcoin.fr/feed and this one is more interesting for our purposes here: when I try to open it in lynx or w3m from the server, it receives an HTML file and offers to download it. The file is in fact the correctly formed RSS XML feed.
In other RSS readers this gets processed OK, but Cypht misses that and can't open the feed.
@ehanuise commented on GitHub (Sep 28, 2018):
Might be a CURL referrer issue :
https://unix.stackexchange.com/questions/139698/why-would-curl-and-wget-result-in-a-403-forbidden
https://stackoverflow.com/questions/26173689/curl-not-able-to-download-image-file-from-server-running-varnish-cache
@jasonmunro commented on GitHub (Sep 28, 2018):
Reproduced and fixed in github.com/jasonmunro/cypht@e7489ed1ca
The issue was that we were not following HTTP redirects, which we should :) @ehanuise you can get this fix and all the additional debugging I have added recently by downloading and copying in this file:
https://raw.githubusercontent.com/jasonmunro/cypht/master/modules/feeds/hm-feed.php
Thanks for the great feedback on feeds - already several great improvements thanks to your reports!
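For readers curious what following redirects looks like with PHP's cURL extension, here is a minimal sketch. The cURL option names are standard, but `feed_curl_options` and `fetch_feed` are hypothetical, the redirect cap and timeout are assumed values, and none of this is the code from the commit referenced above.

```php
<?php
// Sketch only: the option constants are standard PHP cURL, but both
// functions are hypothetical and not taken from the Cypht source.
function feed_curl_options(): array
{
    return [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow 3xx redirects
        CURLOPT_MAXREDIRS      => 5,    // assumed cap to avoid redirect loops
        CURLOPT_CONNECTTIMEOUT => 10,   // assumed connect timeout in seconds
    ];
}

function fetch_feed(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, feed_curl_options());
    $body = curl_exec($ch);
    return [
        'status'    => curl_getinfo($ch, CURLINFO_HTTP_CODE),     // status of the final response
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL), // URL after any redirects
        'error'     => curl_error($ch),
        'body'      => $body === false ? '' : $body,
    ];
}
```

Comparing `final_url` with the URL that was entered shows whether a redirect happened, which is useful to log alongside the status code.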
@ehanuise commented on GitHub (Sep 29, 2018):
Thanks. The http://www.bitcoin.fr/feed feed now works, looks all good so far :)
I will add other feeds and report issues that aren't tied to the Varnish 403.
I added 50 feeds to see what it's like with a loaded set of feeds. I created an improvement ticket for the feeds UI with some suggestions ;-)
Only the lalibre feed still eludes me - I'll use FeedBurner to bypass that 403 issue.
@jasonmunro commented on GitHub (Oct 17, 2018):
@ehanuise I'm going to close this since I think all issues in this thread are resolved. If not, please feel free to open a new issue around any specific problem remaining. Thanks for the feedback!