[GH-ISSUE #1816] [Bug]: 🐛 Music information is garbled because of the ID3 tag encoding #1005

Open
opened 2026-02-26 02:34:55 +03:00 by kerem · 6 comments

Originally created by @PBK-B on GitHub (Aug 31, 2024).
Original GitHub issue: https://github.com/koel/koel/issues/1816

Originally assigned to: @phanan on GitHub.

Read the Troubleshooting guide.

  • I have read and followed the Troubleshooting guide

Reproduction steps

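  1. Download id3_gb2312_encoding.tar.gz (https://github.com/user-attachments/files/16824057/id3_gb2312_encoding.tar.gz) and extract it into the music folder to be scanned
  2. Run php artisan koel:scan -F -n to scan the music
  3. Check the music list; the garbled information shown in the screenshot is displayed.
img (https://github.com/user-attachments/assets/21bfc48c-c8d4-47e9-a1b7-3c71f2e7c270)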

Expected behavior

Users expect the music information to be displayed accurately.

Actual behavior

The ID3 tags of the music are encoded in GB2312, so the music information is displayed as garbled text.

Logs

null

Koel version

v7.0.11

How did you install Koel?

Compiled from source

Additional information

I found that people from different countries have run into the same problem, for example https://github.com/koel/koel/issues/646

Having users manually convert the ID3 encoding of their music may indeed be one solution, but I wonder whether there is a friendlier one.

I looked through the upstream https://github.com/JamesHeinrich/getID3 code and the Koel logic for scanning music. So far I have only made a simple adaptation for the GB2312 encoding so that it works properly (WIP; I will push this code to my repository later). My original intention, though, was to support more encodings so that it benefits users all over the world.

Please forgive me for knowing too little about character encodings. I would like to ask the maintainers whether they have any better ideas, so that we can keep working on this.

Finally, I would like to thank @phanan for the continuous updates and maintenance work. It is you who make this thing (the world) interesting.


@phanan commented on GitHub (Aug 31, 2024):

Indeed, text encoding is always a headache to deal with, partly because there are so many languages and edge cases. I think Koel attempts to detect the encoding and converts the tags to UTF-8, but I may be wrong (AFK right now).


@PBK-B commented on GitHub (Aug 31, 2024):

> Indeed, text encoding is always a headache to deal with, partly because there are so many languages and edge cases. I think Koel attempts to detect the encoding and converts the tags to UTF-8, but I may be wrong (AFK right now).

@phanan Yes, I have noticed this. The default encoding that getID3 uses for the tags is always ISO-8859-1 (code location: https://github.com/JamesHeinrich/getID3/blob/master/getid3/getid3.php#L96). I also tried setting $encoding_id3v1_autodetect = true at https://github.com/JamesHeinrich/getID3/blob/master/getid3/getid3.php#L103, but for the music files above (id3_gb2312_encoding.tar.gz) the detected encoding is Windows-1251, whereas the correct result would be GB2312 or EUC-CN.

Yesterday I took the raw title bytes from Arr::get($info, 'id3v2.title', null) (code#72: https://github.com/PBK-B/koel/blob/f2c6bc6a98561dffcb5290c98127fc9dc72f94cd/app/Values/SongScanInformation.php#L72) and ran mb_detect_encoding($title, mb_list_encodings(), false) on them; the result is GB18030. However, I'm not sure whether this approach works for other character encodings (maybe we could collect some music files tagged in other encodings for testing?). The test code is roughly as follows:

…
    public static function fromGetId3Info(array $info, string $path): self
    {
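        // Debugging the GB2312 encoding problem: merge the raw tag arrays from getID3.
        // $comments is assumed to be defined in the elided code above.
        $raw_tags = array_merge(
            Arr::get($info, 'id3v1', []),
            Arr::get($info, 'id3v2', []),
            Arr::get($comments, 'id3v2', [])
        );
        // Log the raw title bytes as returned by getID3.
        Log::debug(var_export($raw_tags['title'], true));
        // Log the encoding that mbstring detects for those bytes (strict mode).
        Log::debug(var_export(mb_detect_encoding($raw_tags['title'], mb_list_encodings(), true), true));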
    }
…
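
One way to see why detection is unreliable here, and why an autodetector could plausibly answer Windows-1251 for a GB2312 tag: the same bytes are well-formed in several legacy encodings at once. The sketch below is illustrative only. It assumes a sample title whose bytes are the GB2312 encoding of 中文 (not taken from the attached files) and a hand-picked candidate list, and it uses only mbstring functions:

```php
<?php
// Bytes of the two-character string "中文" as they would be stored in a
// GB2312-encoded tag. This sample value is an assumption for illustration.
$raw = "\xD6\xD0\xCE\xC4";

// Candidate encodings, hand-picked for this demonstration.
$candidates = ['UTF-8', 'GB18030', 'Windows-1251', 'ISO-8859-1'];

$decoded = [];
foreach ($candidates as $encoding) {
    // mb_check_encoding() only tells us whether the bytes are well-formed in
    // $encoding, not whether the decoded text is what the tagger intended.
    if (mb_check_encoding($raw, $encoding)) {
        $decoded[$encoding] = mb_convert_encoding($raw, 'UTF-8', $encoding);
    }
}

// Print every interpretation that passed the well-formedness check.
foreach ($decoded as $encoding => $text) {
    echo $encoding, ' => ', $text, PHP_EOL;
}
```

The bytes are rejected as UTF-8 but accepted by each legacy candidate, and each one yields different text, so a detector needs extra knowledge (for example, the languages expected in the library) to choose between them.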

I'm wondering whether we should file an issue upstream.

References

https://www.php.net/manual/zh/function.mb-detect-encoding.php
https://www.php.net/manual/en/function.mb-list-encodings.php


@phanan commented on GitHub (Aug 31, 2024):

What we can do, without having to rely on getID3, is check the encoding ourselves using PHP’s encoding detection functions (gotta admit I’m not sure how getID3 does it on its side) and do the conversion when necessary.
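
A minimal sketch of that idea, assuming an ordered list of candidate encodings chosen by whoever configures the scanner. The function name tagToUtf8, the candidate order, and the ISO-8859-1 fallback are hypothetical and not part of Koel or getID3. It uses mb_check_encoding rather than mb_detect_encoding so that the outcome depends only on the stated order:

```php
<?php
/**
 * Convert a raw tag value to UTF-8 using the first candidate encoding in
 * which the bytes are well-formed. Hypothetical helper, for illustration.
 *
 * @param string   $raw        Raw tag bytes as read from the file.
 * @param string[] $candidates Encodings to try, most preferred first.
 * @param string   $fallback   Encoding assumed when no candidate matches.
 */
function tagToUtf8(
    string $raw,
    array $candidates = ['UTF-8', 'GB18030'],
    string $fallback = 'ISO-8859-1'
): string {
    foreach ($candidates as $encoding) {
        if (mb_check_encoding($raw, $encoding)) {
            // Already UTF-8: return unchanged; otherwise transcode.
            return $encoding === 'UTF-8'
                ? $raw
                : mb_convert_encoding($raw, 'UTF-8', $encoding);
        }
    }

    // No candidate matched: interpret the bytes in the fallback encoding.
    return mb_convert_encoding($raw, 'UTF-8', $fallback);
}

echo tagToUtf8("\xD6\xD0\xCE\xC4"), PHP_EOL; // GB2312 bytes of "中文"
echo tagToUtf8('Hello'), PHP_EOL;            // valid UTF-8 passes through
```

Because many byte strings are valid in more than one legacy encoding, the candidate order is effectively a per-library guess; a library with mostly Cyrillic tags would want a different list.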


@PBK-B commented on GitHub (Aug 31, 2024):

OK, maybe my description was unclear. In the Koel code, the array $info parameter of the fromGetId3Info function comes from $this->getID3->analyze($this->filePath); the code is at app/Services/FileScanner.php#L60 (https://github.com/koel/koel/blob/master/app/Services/FileScanner.php#L60).

Please take care of your own things first. When you have time, we can look at this logic together. It's a pleasure to work with you.


@phanan commented on GitHub (Aug 31, 2024):

I mean Koel uses getID3 to retrieve the tags, but we can take one further step when
it comes to encoding detection and conversion.


@HNIdesu commented on GitHub (Jun 1, 2025):

Hi, I noticed the issue with garbled music information is likely caused by the inconsistent character encoding in MP3 ID3 tags. Since MP3 files don’t have a unified standard for encoding, and often lack explicit fields indicating which encoding is used, it can lead to misinterpretation of text metadata.

One possible workaround is to convert the MP3 files into a format with more consistent and standardized character encoding, such as M4A. This could help ensure that metadata is correctly read and displayed without garbling.

Just a suggestion—hope it helps!
