mirror of
https://github.com/koel/koel.git
synced 2026-04-25 08:46:00 +03:00
[GH-ISSUE #2165] [Bug]: koel:scan is very slow, even with an already scanned library #1104
Originally created by @Itrimel on GitHub (Nov 11, 2025).
Original GitHub issue: https://github.com/koel/koel/issues/2165
Originally assigned to: @phanan on GitHub.
Read the Troubleshooting guide.
Reproduction steps
Run: sudo docker exec --user=www-data koel php artisan koel:scan --owner=2
Expected behavior
A few versions ago (I don't remember exactly when the behavior changed), the scan process was quite efficient: already analyzed files were skipped quickly, giving scan rates of ~100-200 files/second.
Actual behavior
Starting some time ago (this year? It's been bothering me for a while, but I'm only now thinking to report it), the scan process has become quite slow, even on an already fully known library, with scan rates of 5-10 files/second.
When running koel:scan with the verbose option, I see that all files are marked SKIPPED as expected, but the scan still spends some time on each file, and that time seems proportional to the file size (some 50 MB FLAC files take close to 1 s to scan, while lighter MP3 files seem closer to 0.1 s). With my library (~2300 files), I now have to wait ~15 min when adding new files, where it used to take around 30 s.
I don't know whether this behavior change is intended.
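For context, a quick back-of-the-envelope check (using only the figures quoted above; this arithmetic is not part of the original report) shows the two reported wait times correspond to these scan rates:

```python
# Back-of-the-envelope check of the reported full-scan times for a
# ~2300-file library (figures taken from the report above).
library_files = 2300

current_rate = library_files / (15 * 60)   # ~15 minutes per scan now
previous_rate = library_files / 30         # ~30 seconds per scan before

print(f"current: ~{current_rate:.1f} files/s")    # ~2.6 files/s
print(f"previous: ~{previous_rate:.0f} files/s")  # ~77 files/s
```

Both figures sit in the same ballpark as the per-file rates reported elsewhere in the thread (5-10 and 100-200 files/s respectively).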
Logs
koel:doctor
Koel version
8.1.0
How did you install Koel?
Official Docker image
Additional information
@phanan commented on GitHub (Nov 11, 2025):
Hmm, this certainly isn't the expected behavior. I'll take a look when I
have time.
@deckardstp commented on GitHub (Nov 11, 2025):
I can confirm the behavior. I have a larger library with about 77k files, and it scans about 5-10 files per second. Since koel runs in a KVM on a Proxmox host, I have already adjusted the assigned cores and RAM, but the improvements were not big. Thanks for having a look!
@LucasLaprad commented on GitHub (Nov 11, 2025):
I experience this as well; I tend to see 1 to 3 songs per second during a scan.
I have a very, very large library (over 1M tracks) but Koel currently only has access to around 500k individual tracks as I test Koel. The library is being accessed through an NFS share (10Gbps link), so I always presumed that the slow scan speeds are an acceptable issue considering I am probably the outlier with my library size and how my files are accessible to Koel. However, it would be nice if there was a way to speed this up in the future.
I am not using the docker version, currently running it behind nginx on a Rocky Linux VM on Proxmox.
@gelokatil commented on GitHub (Nov 22, 2025):
Have you checked in Proxmox, in the configuration of the disk assigned to the VM, if you have write caching enabled? In my case, it slightly improved the process speed.
@deckardstp commented on GitHub (Nov 23, 2025):
@gelokatil I assume this has, as you already stated, only minor effects. I think the main reason is that the scan is single-threaded, which prevents higher speeds. My KVM lives on NVMe storage and the data is on a 10G SSD NAS, so there are no bottlenecks on that side.
@phanan commented on GitHub (Nov 23, 2025):
Even single-threaded, the scan shouldn't be that slow. I must have changed something in the code, but I haven't had time to look lately.
@deckardstp commented on GitHub (Nov 23, 2025):
Thanks for the reply @phanan. At least I can confirm the scan uses about 80-100% of one thread on my machine. If you need more input on this, let me know!
@Itrimel commented on GitHub (Nov 24, 2025):
@phanan Looking at the code, I think the problem comes from this change: https://github.com/koel/koel/pull/2072. Computing a hash of each music file forces every scan to read the full library from disk, throttling the process to the disk read speed (in my case an HDD at ~150 MB/s, which is consistent with the scan speed I was seeing with my FLAC files).
I don't know your exact reasoning behind this change, but if you're OK with it, I can try to write a patch that restores the old behavior (using the last modification time), or adds a config option to choose between hash and last modification time (after verifying that the problem indeed comes from this).
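The disk-throughput argument above can be sketched numerically. The HDD speed and FLAC size are the values quoted in this thread; the average file size is an assumption added purely for illustration:

```python
# If every scan hashes every file, the scan is bounded by raw disk
# throughput, regardless of the hash algorithm. HDD speed (~150 MB/s)
# and the 50 MB FLAC size are quoted in the thread; the 20 MB average
# file size is an assumed round number for illustration.
hdd_mb_per_s = 150
flac_mb = 50
library_mb = 2300 * 20          # ~2300 files, assuming ~20 MB on average

read_one_flac = flac_mb / hdd_mb_per_s          # ~0.33 s for one big file
full_scan_min = library_mb / hdd_mb_per_s / 60  # minutes to read it all

print(f"one 50 MB FLAC: ~{read_one_flac:.2f} s")
print(f"full library read: ~{full_scan_min:.1f} min")
```

Even under these rough assumptions, just reading the library sequentially takes minutes on an HDD, the same order of magnitude as the ~15 min scans reported, before any seek overhead or tag processing.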
@phanan commented on GitHub (Nov 24, 2025):
File hashes were introduced as a replacement for mtime because, apart from telling whether a file has changed, they can be used for more useful features like deduplication. Maybe we can consider xxHash, which is way faster than md5/sha and natively supported since PHP 8.1.
@Itrimel commented on GitHub (Nov 24, 2025):
I just ran some tests, and in my case (files hosted on a local HDD), changing the hash algorithm to 'xxh128' did not change the scan speed (even after running it multiple times to be sure the database had been correctly updated); I still get around 10 files/second. I think the real limiting factor is reading all the files from storage, rather than computing the hashes.
If the file hash should be kept, I'm thinking of another solution: when scanning, only compute the hash when it's really needed, e.g. when the last modification time differs from the one stored in the database, or when the force option has been passed.
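The idea proposed above can be sketched roughly as follows. This is an illustrative Python mock, not Koel's actual PHP code: the `scan` function and the in-memory `db` dict are hypothetical stand-ins for the scanner and its database records.

```python
import hashlib
import os
import tempfile

def scan(path: str, db: dict) -> str:
    """Hash a file only when its mtime differs from the cached record.

    `db` maps path -> {"mtime": float, "hash": str} and stands in for
    the scanner's database. Unchanged files cost one stat() call, not
    a full read of the file's contents.
    """
    mtime = os.path.getmtime(path)
    record = db.get(path)
    if record and record["mtime"] == mtime:
        return "SKIPPED"            # no disk read beyond the stat()
    with open(path, "rb") as f:     # only new/changed files are read fully
        digest = hashlib.md5(f.read()).hexdigest()
    db[path] = {"mtime": mtime, "hash": digest}
    return "SCANNED"

# quick demonstration
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"audio bytes")
    path = f.name
db = {}
print(scan(path, db))  # SCANNED (new file, hash computed)
print(scan(path, db))  # SKIPPED (mtime unchanged, hash reused)
os.unlink(path)
```

With this shape, a fully scanned library costs one stat() per file, which is why skipping unchanged files restores the old 100-200 files/s behavior while the stored hashes remain available for features like deduplication.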
@phanan commented on GitHub (Nov 24, 2025):
Good idea. Do you think you can test this theory?
@Itrimel commented on GitHub (Nov 24, 2025):
I will try to hack something together in the coming days, but it will be my first time really touching PHP so I'll see how it goes 😅
@deckardstp commented on GitHub (Nov 24, 2025):
@Itrimel If you need a bigger library for testing, I would give it a try too. Just need a tagged Docker Image. :)
@Itrimel commented on GitHub (Nov 26, 2025):
@phanan I did a test with the following modification: github.com/Itrimel/koel@8680633c5f
On my fully processed library, with this patch, I am able to process 100-200 files/s by skipping processing of the music tags and the file hash for files which have not been modified since the last scan.
I was also able to test that adding and removing files still works.
@deckardstp commented on GitHub (Nov 27, 2025):
I added the modifications as well and can confirm. The scan process is now much faster.