mirror of
https://github.com/Seldaek/monolog.git
synced 2026-04-27 00:25:50 +03:00
[GH-ISSUE #545] No error on non UTF8 input in NormalizerFormatter #190
Labels
No labels
Bug
Documentation
Feature
Needs Work
Support
pull-request
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
starred/monolog#190
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @nicolashohm on GitHub (Apr 20, 2015).
Original GitHub issue: https://github.com/Seldaek/monolog/issues/545
I'm using the
LogstashFormatter.When I log non UTF8 stuff theNormalizerFormatter::toJsonreturns false which cause finally just an \n in the log and no error message at all. Is that expected? Would it perhaps not be better to throw an error intoJson()?@bd808 commented on GitHub (Nov 10, 2015):
This decision is causing fatal runtime errors for Wikipedia: https://phabricator.wikimedia.org/T118057
The choice to raise a RuntimeException for a logging failure seems to be a bit short sighted. @Seldaek would you be open to some patch to make this configurable behavior (preferably with the default being no errors)?
@Seldaek commented on GitHub (Nov 10, 2015):
Reopening while we discuss. Not sure I see it the same way you do though. I see that it sucks if user input randomly blows up the logging, but before you would just lose those logs silently as well which isn't ideal.
As a quick fix for now (and I'd argue it might be a good thing to do in any case in a context such a wikipedia, you can wrap all your handlers within a WhatFailureGroupHandler, it will ignore any exception thrown by child handlers, which should get you back to where you were before (dropping log records.. but not failing hard).
There are many handlers that can fail while logging for various reasons (backend down and such) and therefore I think being strict rather than hiding errors by default there is a good thing.
@bd808 commented on GitHub (Nov 10, 2015):
We've gone to pretty great lengths to keep logging failures from causing application failures in our configuration, but I understand that others may have different goals. I like the idea of not silently losing events but again I think the Exception route is a bit heavy handed.
It seems that you at least at some level agree with this as can be seen in
github.com/Seldaek/monolog@63e3bfdf7ewhere you introduced the$ignoreErrorsflag toNormalizerFormatter::toJson.For my use case the frequency of having non-UTF-8 data in the log record is pretty low. We are mostly seeing it creep in from user-agent strings and referrer URLs which are sometimes embedded in the log record. My first thought was to create a Filter that did UTF-8 normalization of every record, but I realized that detecting encoding of strings is error prone and unnecessary most of the time. I've got a mostly working class subclass of LogstashFormatter put together that tries to recover the event by scrubbing the encoding of the string elements. It isn't passing our local PHP5.3 tests yet which I need to spend more time looking at.
If I can get some error recovery by sanitizing encoding method working at least semi-reliably would you be interested in having it upstreamed? It would probably be nicer to put it into
NormalizerFormatter::toJsonthan to only have it benefit LogstashFormatter.@Seldaek commented on GitHub (Nov 11, 2015):
Yeah I think that would be nice. Avoiding failures is of course better than
failing hard which is better than failing silently IMO.
Would you have a chance to store a few of those bad inputs (maybe using
serialize+base64encode so we have strings that aren't charset dependent and
can be shared around)? It would be good to have for the test suite and for
trying to find a workaround. I'm thinking doing iconv from iso-8859-15 to
UTF-8//IGNORE might help to get rid of those invalid sequences and preserve
data in most cases.
@bd808 commented on GitHub (Nov 11, 2015):
This may not be the sexiest conversion function, but it seems to be fairly fast and capable of making any random binary input strings I throw at it safe for json_encode():
It also doesn't require any special support in the PHP install other than PCRE UTF-8 support which I think is fairly common these days.
There are quite a few UTF-8 related libraries on Packagist as well that could be used instead of something simple like this including Wikimedia's own utfnormal library.
I'll see if I can come up with a good way to gather some real error data and make it clean enough to share as a test set. For my initial testing I've just been using
openssl_random_pseudo_bytesto generate random binary strings.