[GH-ISSUE #751] GelfMessageFormatter doesn't truncate large data #285

Closed
opened 2026-03-04 02:13:48 +03:00 by kerem · 3 comments

Originally created by @rubao on GitHub (Mar 23, 2016).
Original GitHub issue: https://github.com/Seldaek/monolog/issues/751

monolog/monolog/src/Monolog/Formatter/GelfMessageFormatter.php only converts a value with toJson() when it is not a scalar value; it does not limit the size of the result.

But with big exceptions that have many previous exceptions, we have a problem in Graylog, because it uses Elasticsearch to store the data and Elasticsearch accepts at most 32766 bytes per indexed term.
see https://issues.apache.org/jira/browse/LUCENE-5472

In Graylog we got some indexer failures:
IllegalArgumentException[Document contains at least one immense term in field="ctxt_exception" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[123, 34, 99, 108, 97, 115, 115, 34, 58, 34, 69, 120, 99, 101, 112, 116, 105, 111, 110, 34, 44, 34, 109, 101, 115, 115, 97, 103, 101, 34]...', original message: bytes can be at most 32766 in length; got 33656]; nested: MaxBytesLengthExceededException[bytes can be at most 32766 in length; got 33656];

As a result, the message is not logged in Graylog.
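The byte prefix quoted in the error can be decoded to see what the oversized value starts with. The following is a small sketch using only PHP built-ins; the variable names are illustrative and not part of Monolog or Graylog.

```php
<?php
// Byte values copied from "The prefix of the first immense term" in the error above.
$prefixBytes = [
    123, 34, 99, 108, 97, 115, 115, 34, 58, 34,
    69, 120, 99, 101, 112, 116, 105, 111, 110, 34,
    44, 34, 109, 101, 115, 115, 97, 103, 101, 34,
];

// chr() turns each byte value into a one-byte string; all values here are ASCII.
$prefix = implode('', array_map('chr', $prefixBytes));

echo $prefix, PHP_EOL; // {"class":"Exception","message"
```

The decoded prefix is the start of a JSON-encoded exception, which matches the description above of a non-scalar context value being converted to JSON before it reaches Graylog.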

failing test:

public function testFormatWithLargeData() {
    $formatter = new GelfMessageFormatter();
    $record = array(
        'level' => Logger::ERROR,
        'level_name' => 'ERROR',
        'channel' => 'meh',
        'context' => array('exception' => str_repeat(' ', 32767)),
        'datetime' => new \DateTime("@0"),
        'extra' => array('key' => str_repeat(' ', 32767)),
        'message' => 'log'
    );
    $message = $formatter->format($record);
    $messageArray = $message->toArray();
    $this->assertLessThanOrEqual(32766, strlen($messageArray['_key']));
    $this->assertLessThanOrEqual(32766, strlen($messageArray['_ctxt_exception']));
}
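For illustration, a byte-bounded truncation that would satisfy assertions like the ones in this test could look roughly like the sketch below. The function name `truncateToBytes` is hypothetical and is not Monolog's API or the fix that was eventually committed; it only shows how a string can be cut to a byte limit without splitting a multi-byte UTF-8 character, assuming the input is valid UTF-8.

```php
<?php
// Hypothetical helper, not part of Monolog: cut $value to at most $maxBytes bytes.
function truncateToBytes(string $value, int $maxBytes): string
{
    if ($maxBytes <= 0) {
        return '';
    }
    if (strlen($value) <= $maxBytes) {
        return $value; // strlen() counts bytes, which is what the limit is about
    }

    // If the first byte that would be dropped is a UTF-8 continuation byte
    // (bit pattern 10xxxxxx), the cut falls inside a character: back up.
    $end = $maxBytes;
    while ($end > 0 && (ord($value[$end]) & 0xC0) === 0x80) {
        $end--;
    }

    return substr($value, 0, $end);
}

$field = truncateToBytes(str_repeat(' ', 32767), 32766);
echo strlen($field), PHP_EOL; // 32766
```

Cutting on a byte count rather than a character count matters here because the limit in the error message is expressed in bytes of the UTF-8 encoding.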
kerem closed this issue 2026-03-04 02:13:48 +03:00

@Tobion commented on GitHub (May 19, 2016):

@Seldaek I don't think it's Monolog's job to handle this; it must be fixed at the Elasticsearch level of Graylog. And they have a ticket for that: https://github.com/Graylog2/graylog2-server/issues/873

Also, the solution implemented here looks really strange. The 32766 maximum is per token, and the tokenization depends on the ES configuration per field. But the implementation seems to be doing something based on the total length. And even if it were done per field, Monolog would have to make strange assumptions about the tokenization of the field, which it cannot.
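The difference between total length and per-token length can be sketched as follows. The whitespace split is only a stand-in for whatever analyzer a field is actually configured with, and `longestTokenBytes` is a hypothetical helper, so this shows the distinction rather than Elasticsearch's real behavior.

```php
<?php
// Hypothetical helper: byte length of the longest whitespace-delimited token.
// A real Elasticsearch analyzer may tokenize very differently (or not at all).
function longestTokenBytes(string $text): int
{
    $tokens = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $longest = 0;
    foreach ($tokens as $token) {
        $longest = max($longest, strlen($token));
    }
    return $longest;
}

$manyShortWords = str_repeat('error ', 10000); // long value, short tokens
$oneHugeWord    = str_repeat('x', 40000);      // shorter value, one huge token

echo strlen($manyShortWords), ' ', longestTokenBytes($manyShortWords), PHP_EOL; // 60000 5
echo strlen($oneHugeWord), ' ', longestTokenBytes($oneHugeWord), PHP_EOL;       // 40000 40000
```

Under this assumption the first value is far over 32766 bytes in total yet has no oversized token, while the second is shorter overall but is a single token over the limit. A field indexed as one unanalyzed term would behave like the second case regardless of its content.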


@enleur commented on GitHub (Apr 7, 2017):

Can we revert this or at least make it more configurable?


@Seldaek commented on GitHub (Apr 7, 2017):

I stand by my last comment on the commit https://github.com/Seldaek/monolog/commit/6bc1a444dbd287a0d88278fe0ed105f2b93263ca#commitcomment-17544637. If someone who actually uses GELF can improve this, please send a PR (to the 1.x branch!) :)