[GH-ISSUE #1631] Why might NormalizerFormatter be logging malformed JSON? #694

Closed
opened 2026-03-04 02:17:07 +03:00 by kerem · 4 comments
Owner

Originally created by @garethellis36 on GitHub (Jan 28, 2022).
Original GitHub issue: https://github.com/Seldaek/monolog/issues/1631

Monolog version 1.24.0 on PHP 8, running on Windows

We've just started logging the peak memory usage of all web requests and CLI runs in our application so that we can do some focused performance refactoring.

To do this we've added a shutdown handler which invokes a Monolog Logger instance and records JSON in a file. That file is monitored by Filebeat, which ships each new line to Logstash, which then attempts to parse the line as JSON and store it in ElasticSearch.

I had a report from my colleague in the ELK team that our records were causing some errors in Logstash. On investigation I found that I had a few records (only a small percentage) which had malformed JSON in the log file on disk. For example, yesterday's log file has 220,384 lines, but there are 220,336 records in ElasticSearch (48 missing).

Example malformed JSON:

ct_tasks_steps/ajax_save_qtp/665613","http_method":"PUT"}}

The logger is created as follows:

                $logger = new Logger("memory_usage");

                $logger->pushProcessor(new MemoryPeakUsageProcessor(realUsage: true));

                $handler = new RotatingFileHandler($logFilesDirectory . "memory_usage.log", 10, MonologLogger::INFO);
                $handler->setFormatter(new MemoryUsageFormatter());

                $logger->pushHandler($handler);

                if (!environment()->isCLI()) {
                    $logger->pushProcessor($container[WebProcessor::class]);

                    // Logs the current route in template form, i.e. `/projects/view/:id`
                    $logger->pushProcessor(function (array $record) {
                        $route = Router::parse(Router::url());

                        $controller = $route['controller'] ?? 'unknown_controller';
                        $action = $route['action'] ?? 'unknown_action';
                        $record['extra']['controller_action'] = "/{$controller}/{$action}";

                        return $record;
                    });
                } else {
                    $logger->pushProcessor($container[CommandLineInterfaceArgumentsProcessor::class]);
                }

                return $logger;

MemoryUsageFormatter looks like this:

<?php declare(strict_types=1);

namespace App\Logging\Formatter;

use App\DateTime\DateTime;
use DateTimeZone;
use Monolog\Formatter\NormalizerFormatter;

class MemoryUsageFormatter extends NormalizerFormatter
{
    /**
     * Formats message for storage in ELK
     * Uses ECS with bare-minimum fields to keep file size as small as possible
     */
    public function format(array $record)
    {
        /** @var array $record */
        $record = parent::format($record);

        $message = [
            '@timestamp' => DateTime::createFromMysqlDateTime($record['datetime'], new DateTimeZone("UTC")),
            'ecs' => [
                'version' => '1.7',
            ],
            'message' => "logged_memory_usage", // an easily searchable string

            // deliberately title-cased to distinguish from ECS-mapped fields
            'MemoryUsage' => [
                'bytes' => (int)$record['message'],
                'formatted' => $record['extra']['memory_peak_usage'],
            ],
        ];

        if (isset($record['extra']['controller_action'])) {
            $message['MemoryUsage']['controller_action'] = $record['extra']['controller_action'];
        }

        if (isset($record['extra']['command'])) {
            $message['MemoryUsage']['command_line'] = $record['extra']['command'];
        }

        if (isset($record['extra']['url'])) {
            $message['MemoryUsage']['url'] = $record['extra']['url'];
        }

        if (isset($record['extra']['http_method'])) {
            $message['MemoryUsage']['http_method'] = $record['extra']['http_method'];
        }

        if (isset($record['context']['job'])) {
            $message['MemoryUsage']['job_uuid'] = $record['context']['job']['uuid'];
            $message['MemoryUsage']['job_class'] = $record['context']['job']['class'];
        }

        return $this->toJson($message) . "\n";
    }
}

I'm not really sure where to begin debugging this. Have you ever seen anything like this before? The example malformed message has truncated the JSON in the middle of the MemoryUsage.url property. /projec has been omitted from the beginning of the URL.

Originally created by @garethellis36 on GitHub (Jan 28, 2022). Original GitHub issue: https://github.com/Seldaek/monolog/issues/1631 Monolog version `1.24.0` on PHP 8, running on Windows We've just started logging the peak memory usage of all web requests and CLI runs in our application so that we can do some focused performance refactoring. To do this we've added a shutdown handler which invokes a Monolog `Logger` instance and records JSON in a file. That file is monitored by Filebeat, which ships each new line to Logstash, which then attempts to parse the line as JSON and store it in ElasticSearch. I had a report from my colleague in the ELK team that our records were causing some errors in Logstash. On investigation I found that I had a few records (only a small percentage) which had malformed JSON in the log file on disk. For example, yesterday's log file has 220,384 lines, but there are 220,336 records in ElasticSearch (48 missing). Example malformed JSON: ``` ct_tasks_steps/ajax_save_qtp/665613","http_method":"PUT"}} ``` The logger is created as follows: ``` $logger = new Logger("memory_usage"); $logger->pushProcessor(new MemoryPeakUsageProcessor(realUsage: true)); $handler = new RotatingFileHandler($logFilesDirectory . "memory_usage.log", 10, MonologLogger::INFO); $handler->setFormatter(new MemoryUsageFormatter()); $logger->pushHandler($handler); if (!environment()->isCLI()) { $logger->pushProcessor($container[WebProcessor::class]); // Logs the current route in template form, i.e. `/projects/view/:id` $logger->pushProcessor(function (array $record) { $route = Router::parse(Router::url()); $controller = $route['controller'] ?? 'unknown_controller'; $action = $route['action'] ?? 'unknown_action'; $record['extra']['controller_action'] = "/{$controller}/{$action}"; return $record; }); } else { $logger->pushProcessor($container[CommandLineInterfaceArgumentsProcessor::class]); } return $logger; ``` `MemoryUsageFormatter` looks like this: ``` <?php declare(strict_types=1); namespace App\Logging\Formatter; use App\DateTime\DateTime; use DateTimeZone; use Monolog\Formatter\NormalizerFormatter; class MemoryUsageFormatter extends NormalizerFormatter { /** * Formats message for storage in ELK * Uses ECS with bare-minimum fields to keep file size as small as possible */ public function format(array $record) { /** @var array $record */ $record = parent::format($record); $message = [ '@timestamp' => DateTime::createFromMysqlDateTime($record['datetime'], new DateTimeZone("UTC")), 'ecs' => [ 'version' => '1.7', ], 'message' => "logged_memory_usage", // an easily searchable string // deliberately title-cased to distinguish from ECS-mapped fields 'MemoryUsage' => [ 'bytes' => (int)$record['message'], 'formatted' => $record['extra']['memory_peak_usage'], ], ]; if (isset($record['extra']['controller_action'])) { $message['MemoryUsage']['controller_action'] = $record['extra']['controller_action']; } if (isset($record['extra']['command'])) { $message['MemoryUsage']['command_line'] = $record['extra']['command']; } if (isset($record['extra']['url'])) { $message['MemoryUsage']['url'] = $record['extra']['url']; } if (isset($record['extra']['http_method'])) { $message['MemoryUsage']['http_method'] = $record['extra']['http_method']; } if (isset($record['context']['job'])) { $message['MemoryUsage']['job_uuid'] = $record['context']['job']['uuid']; $message['MemoryUsage']['job_class'] = $record['context']['job']['class']; } return $this->toJson($message) . "\n"; } } ``` I'm not really sure where to begin debugging this. Have you ever seen anything like this before? The example malformed message has truncated the JSON in the middle of the `MemoryUsage.url` property. `/projec` has been omitted from the beginning of the URL.
kerem 2026-03-04 02:17:07 +03:00
  • closed this issue
  • added the
    Support
    label
Author
Owner

@mfn commented on GitHub (Jan 29, 2022):

wild guess: multiple processes are writing in a non-atomic way to the same file and thus sometimes corrupting it?

<!-- gh-comment-id:1024888265 --> @mfn commented on GitHub (Jan 29, 2022): wild guess: multiple processes are writing in a non-atomic way to the same file and thus sometimes corrupting it?
Author
Owner

@garethellis36 commented on GitHub (Jan 29, 2022):

I wondered that, but Windows uses exclusive file locking, so it shouldn’t even be possible for a process to write to a file until locks made by other processes have been released.

<!-- gh-comment-id:1024934507 --> @garethellis36 commented on GitHub (Jan 29, 2022): I wondered that, but Windows uses exclusive file locking, so it shouldn’t even be possible for a process to write to a file until locks made by other processes have been released.
Author
Owner

@Seldaek commented on GitHub (Mar 13, 2022):

I'd say try to upgrade to latest Monolog, 2.3 fixed some issues there with highly concurrent writes by increasing the stream chunk size so that logs are flushed at once to the file instead of being written in smaller chunks which perhaps can result in races like you're seeing.

If that doesn't fix it I am not sure what else to do sorry.

<!-- gh-comment-id:1066170643 --> @Seldaek commented on GitHub (Mar 13, 2022): I'd say try to upgrade to latest Monolog, 2.3 fixed some issues there with highly concurrent writes by increasing the stream chunk size so that logs are flushed at once to the file instead of being written in smaller chunks which perhaps can result in races like you're seeing. If that doesn't fix it I am not sure what else to do sorry.
Author
Owner

@garethellis36 commented on GitHub (Mar 14, 2022):

Thanks, I'll give that a go. It's on my TODO list anyway :)

<!-- gh-comment-id:1066975946 --> @garethellis36 commented on GitHub (Mar 14, 2022): Thanks, I'll give that a go. It's on my TODO list anyway :)
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/monolog#694
No description provided.