[GH-ISSUE #2165] [Bug]: koel:scan is very slow, even with an already scanned library #1104

Closed
opened 2026-02-26 02:35:15 +03:00 by kerem · 15 comments

Originally created by @Itrimel on GitHub (Nov 11, 2025).
Original GitHub issue: https://github.com/koel/koel/issues/2165

Originally assigned to: @phanan on GitHub.

Read the Troubleshooting guide.

  • I have read and followed the Troubleshooting guide

Reproduction steps

  1. Run koel:scan multiple times (in my case, the exact command is sudo docker exec --user=www-data koel php artisan koel:scan --owner=2)

Expected behavior

A few versions ago (I don't remember exactly when the behavior changed), the scan process was quite efficient: already analyzed files were skipped quickly, giving scan rates of ~100-200 files/second.

Actual behavior

Starting some time ago (this year? It's been bothering me for a while, but I'm only now getting around to reporting it), the scan process has gotten quite slow, even on an already fully known library, with scan rates of 5-10 files/second.

When running koel:scan with the verbose option, I see that all files are marked SKIPPED as expected, but the scan still spends time on each file, and that time seems to be proportional to the file size (some 50 MB FLAC files take close to 1 s to scan, while lighter MP3 files seem closer to 0.1 s).

With my library (~2300 files), adding new files now means waiting ~15 min, where it used to take around 30 s.

I don't know whether this behavior change is intended.

Logs

koel:doctor

                                                                                
                             CHECKING KOEL SETUP...                             
                                                                                

  Artifacts directory /tmp/koel/ is readable/writable ..................... OK  
  Session directory storage/framework/sessions is readable/writable ....... OK  
  Cache directory storage/framework/cache is readable/writable ............ OK  
  Log directory storage/logs is readable/writable ......................... OK  
  Checking database connection ............................................ OK  
  Media storage setup (local) ............................................. OK  
  TNT search index directory storage/search-indexes is readable/writable .. OK  
  API is healthy .......................................................... OK  
  FFmpeg binary /usr/bin/ffmpeg is executable ............................. OK  
  PHP extension zip is loaded. Multi-file downloading is supported ........ OK  
  Max upload size ....................................................... 800M  
  Max post size ......................................................... 800M  
  Streaming method ................................................ x-sendfile  
  Last.fm integration ..................................................... OK  
  YouTube integration .......................................... Not available  
  Spotify integration .......................................... Not available  
  Mailer configuration ......................................... Not available  
  Koel scheduler status ........................................ Not installed  
  Koel Plus license status ............................................ Active  

 [OK] Your Koel setup should be good to go!   

Koel version

8.1.0

How did you install Koel?

Official Docker image

Additional information

  • Server OS: Debian 13
  • PHP version: 8.4.8 (from Docker image)
  • Database: PostgreSQL 18
  • Node version:
  • Browser & device:
  • Additional context:
kerem closed this issue 2026-02-26 02:35:15 +03:00

@phanan commented on GitHub (Nov 11, 2025):

Hmm, this certainly isn't the expected behavior. I'll take a look when I have time.


@deckardstp commented on GitHub (Nov 11, 2025):

I can confirm the behavior. I have a larger library with about 77k files, and it scans about 5-10 files per second. Since Koel runs in a KVM on a Proxmox host, I already adjusted the assigned cores and RAM, but the improvements were not big. Thanks for having a look!


@LucasLaprad commented on GitHub (Nov 11, 2025):

I experience this as well; I tend to see 1 to 3 songs per second during a scan.

I have a very, very large library (over 1M tracks) but Koel currently only has access to around 500k individual tracks as I test Koel. The library is being accessed through an NFS share (10Gbps link), so I always presumed that the slow scan speeds are an acceptable issue considering I am probably the outlier with my library size and how my files are accessible to Koel. However, it would be nice if there was a way to speed this up in the future.

I am not using the docker version, currently running it behind nginx on a Rocky Linux VM on Proxmox.


@gelokatil commented on GitHub (Nov 22, 2025):

> I can confirm the behavior. I have a larger library with about 77k files and it scans about 5-10 files per second. Since koel runs in a KVM on a proxmox host I already adjusted used cores and RAM, but the improvements were not big. Thanks for having a look!

Have you checked in Proxmox, in the configuration of the disk assigned to the VM, if you have write caching enabled? In my case, it slightly improved the process speed.


@deckardstp commented on GitHub (Nov 23, 2025):

@gelokatil I assume this has, as you already stated, only minor effects. I think the main reason is that the scan is single-threaded, which prevents higher speeds. My KVM lives on NVMe storage and the data is on a 10G SSD NAS; there are no bottlenecks on this side.


@phanan commented on GitHub (Nov 23, 2025):

Even at single thread, the scan shouldn't be that slow. I must have changed something in the code, but I haven't had time to look into it lately.


@deckardstp commented on GitHub (Nov 23, 2025):

Thanks for the reply, @phanan. I can at least confirm the scan uses about 80-100% of one thread on my machine. If you need more input on this, let me know!


@Itrimel commented on GitHub (Nov 24, 2025):

@phanan Looking at the code, I think the problem comes from this change: https://github.com/koel/koel/pull/2072. Computing a hash of each music file forces every scan to read the full library from disk, throttling the process to the disk read speed (in my case an HDD at ~150 MB/s, which is consistent with the scan speed I was seeing on my FLAC files).

I don't know the exact reasoning behind this change, but if you're OK with it, I can try to write a patch that goes back to the old behavior (using the last modification time), or adds a config option to choose between hash and last modification time (after checking that the problem indeed comes from this).

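The diagnosis above can be sanity-checked with a quick back-of-envelope calculation (a sketch: the disk speed and FLAC size come from the report, the MP3 size is an assumed figure for comparison):

```php
<?php

// If every scan re-reads every file to hash it, throughput is capped by
// sequential disk read speed. ~150 MB/s HDD and 50 MB FLAC files as
// reported above; the MP3 size is an assumption.
$diskMBps = 150.0;
$flacMB   = 50.0;
$mp3MB    = 8.0;

printf("FLAC: ~%.2f s/file\n", $flacMB / $diskMBps); // ~0.33 s/file
printf("MP3:  ~%.3f s/file\n", $mp3MB / $diskMBps);  // ~0.053 s/file
```

That is the same order of magnitude as the ~0.1-1 s per file observed, with seeks and filesystem overhead plausibly accounting for the rest.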

@phanan commented on GitHub (Nov 24, 2025):

File hashes were introduced as a replacement for mtime because, apart from telling that a file has changed, they can be used for more useful features like deduplication. Maybe we can consider xxHash, which is way faster than MD5/SHA and natively supported since PHP 8.1.

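The suggestion can be tried with a minimal sketch (not Koel's code): PHP's hash_file() accepts 'xxh128' from PHP 8.1 onward, but either algorithm still has to read the entire file, so on a slow disk the I/O dominates either way.

```php
<?php

// Time MD5 against xxHash (xxh128) on a throwaway 16 MB sample file.
// Both calls read the whole file; only the CPU cost of hashing differs.
$path = tempnam(sys_get_temp_dir(), 'koel-hash-');
file_put_contents($path, random_bytes(16 * 1024 * 1024));

foreach (['md5', 'xxh128'] as $algo) {
    $start = hrtime(true);
    $hash  = hash_file($algo, $path);
    printf("%-7s %s  (%.1f ms)\n", $algo, $hash, (hrtime(true) - $start) / 1e6);
}

unlink($path);
```

As Itrimel reports in the next comment, swapping the algorithm barely changes the scan speed once disk reads dominate.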

@Itrimel commented on GitHub (Nov 24, 2025):

> Maybe we can consider xxHash, which is way faster than MD5/SHA and natively supported since PHP 8.1.

I just ran some tests, and in my case (files hosted on a local HDD), changing the hash algorithm to 'xxh128' did not change the scan speed (even after running multiple times to make sure the database had been updated correctly); I still get around 10 files/second. I think the real limiting factor is reading all the files from storage into memory rather than computing their hashes.

> File hashes were introduced as a replacement for mtime because apart from telling a file's changed, they can be used for more useful features like deduplication.

If the file hash is to be kept, I'm thinking of another solution: when scanning files, only compute the hash when really needed (e.g. when the last modified time differs from the one stored in the database, or when the force option has been passed).

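The proposed guard might look like this minimal sketch (needsRehash and its parameters are hypothetical names for illustration, not Koel's actual API):

```php
<?php

// Only hash a file when the scan is forced, no mtime is stored yet, or
// the file's mtime differs from the one stored in the database.
function needsRehash(string $path, ?int $storedMtime, bool $force): bool
{
    return $force || $storedMtime === null || filemtime($path) !== $storedMtime;
}

// On an unchanged file, the expensive hash_file() call is skipped entirely.
$path = tempnam(sys_get_temp_dir(), 'koel-scan-');
$storedMtime = filemtime($path);

var_dump(needsRehash($path, $storedMtime, false)); // bool(false): skip
var_dump(needsRehash($path, null, false));         // bool(true): new file
var_dump(needsRehash($path, $storedMtime, true));  // bool(true): forced
unlink($path);
```

The stored hash keeps its value for deduplication; it is only recomputed when the mtime check (or the force option) says the file may have changed.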

@phanan commented on GitHub (Nov 24, 2025):

> Then if file hash should be kept, I am thinking about another solution: when scanning files, only compute the hash when really needed (e.g. when the last modified time is different from the one stored in the database or when the force option has been passed)

Good idea. Do you think you can test this theory?


@Itrimel commented on GitHub (Nov 24, 2025):

I will try to hack something together in the coming days, but it will be my first time really touching PHP so I'll see how it goes 😅


@deckardstp commented on GitHub (Nov 24, 2025):

@Itrimel If you need a bigger library for testing, I would give it a try too. I just need a tagged Docker image. :)


@Itrimel commented on GitHub (Nov 26, 2025):

@phanan I did a test with the following modification: https://github.com/Itrimel/koel/commit/8680633c5f45cbb2234655707f9b0ff5c0a4dbeb

On my fully processed library, with this patch, I am able to process 100-200 files/s by skipping the music-tag processing and file hashing for files that have not been modified since the last scan.
I was also able to verify that adding and removing files still works.


@deckardstp commented on GitHub (Nov 27, 2025):

I added the modifications as well and can confirm. The scan process is now much faster.
