[GH-ISSUE #157] Mirror makes many copies #77

Closed
opened 2026-02-27 15:54:58 +03:00 by kerem · 14 comments

Originally created by @groke76 on GitHub (Nov 29, 2025).
Original GitHub issue: https://github.com/RayLabsHQ/gitea-mirror/issues/157

Hi

I haven't found anyone else with this issue, but the mirror creates many copies of my repositories. With LFS enabled, I suspect they could take up a lot of space if this continues. Any idea why it does this? Example below. When I check Gitea, I find many repositories like this:

starred/mame-mamedev-7
starred/mame-mamedev-6
starred/mame-mamedev-5
starred/mame-mamedev-4
starred/mame-mamedev-3
starred/mame-mamedev-2
starred/mame-mamedev-1
starred/mame-mamedev
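
For anyone cleaning up after this, here is a minimal Python sketch for spotting suspected duplicates in a list of repository names, based only on the `-N` suffix pattern in the list above. The function name `group_suspected_duplicates` is hypothetical, and the names are assumed to come from whatever listing you export from Gitea. The heuristic only flags a `-N` name when the unsuffixed name also exists, so a repository that is legitimately named with a trailing number can still be flagged and should be checked by hand before deleting anything.

```python
import re
from collections import defaultdict

# Matches names that end in "-<digits>", e.g. "mame-mamedev-7".
SUFFIX = re.compile(r"^(?P<base>.+)-(?P<n>\d+)$")


def group_suspected_duplicates(names):
    """Group "-N" suffixed names under their unsuffixed base name.

    A name is only treated as a suspected duplicate when the base name
    is also present in the input, which avoids flagging names such as
    "python-3" when no "python" repository exists.
    """
    present = set(names)
    groups = defaultdict(list)
    for name in names:
        match = SUFFIX.match(name)
        if match and match.group("base") in present:
            groups[match.group("base")].append(name)
    # Sort each group by its numeric suffix so "-10" comes after "-9".
    return {
        base: sorted(dups, key=lambda s: int(s.rsplit("-", 1)[1]))
        for base, dups in groups.items()
    }


if __name__ == "__main__":
    repos = ["mame-mamedev-2", "mame-mamedev-1", "mame-mamedev", "python-3"]
    for base, dups in group_suspected_duplicates(repos).items():
        print(base, "->", ", ".join(dups))
```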

kerem closed this issue 2026-02-27 15:54:59 +03:00

@arunavo4 commented on GitHub (Nov 30, 2025):

@groke76 First, are you running the latest Gitea and gitea-mirror? I addressed similar race conditions earlier. If you can also send some Docker logs, that would give more context.

Similar:
https://github.com/RayLabsHQ/gitea-mirror/issues/115


@groke76 commented on GitHub (Nov 30, 2025):

Hi, yes, I'm running the latest versions. I have logs for both Gitea and gitea-mirror here. I woke up just now to check and found 113 repos downloaded, although I have no more than 80. There are many duplicates. It seems to start downloading a new copy if the previous one is falsely labeled "synced" instead of "mirrored", and/or if the first attempt fails: it creates a new one and then retries the first one later. Gitea also takes a long time to import (compared to GitLab, for example), so maybe gitea-mirror becomes "impatient" and keeps creating duplicates instead.

[_gitea-mirror_logs.txt](https://github.com/user-attachments/files/23838188/_gitea-mirror_logs.txt)
[_gitea_logs.txt](https://github.com/user-attachments/files/23838187/_gitea_logs.txt)

@arunavo4 commented on GitHub (Nov 30, 2025):

@groke76 Hey, if the repo is too large, it currently hits a timeout. Can you confirm that this only happens with very large repos?


@arunavo4 commented on GitHub (Nov 30, 2025):

Also, the logs you pasted do not show any of the issues. I am guessing it was just the tail of the logs; if you can paste the part of the log where the duplicates happen, that would be super helpful. For large repos, I could maybe expose a custom timeout that you can set using env vars.


@PascalH214 commented on GitHub (Dec 11, 2025):

I have the same problem, and it only occurs with large repos. That's pretty sad. :/


@arunavo4 commented on GitHub (Dec 13, 2025):

@PascalH214 I think the issue is that your network is not fast enough for the current timeouts on large repos, and that's what is causing this. I will make the timeouts configurable for people with large repos, or maybe reimplement it.


@PascalH214 commented on GitHub (Dec 13, 2025):

@arunavo4 Meanwhile, I think the problem lies with my Gitea setup. It is failing to migrate the repository. My server can clone the repository in just four seconds. However, Gitea takes two hours to migrate the same repository. I don't know where the problem lies, so configurable timeouts wouldn't solve it. Sorry for commenting here. I need to find out how to fix the problem within Gitea.


@emrebasarannn commented on GitHub (Dec 22, 2025):

For me it's the issues. They are mostly duplicates. How can I solve this?


@arunavo4 commented on GitHub (Dec 22, 2025):

> For me it's the issues. They are mostly duplicates. How can I solve this?

@emrebasarannn Can you post some related logs here that show the duplicate issues? Also, can you check whether it is happening for one repo or for more than one?


@emrebasarannn commented on GitHub (Dec 22, 2025):

@arunavo4 Hi,

Sharing logs might be overwhelming since I am mirroring more than 200 repos. Here is an example of my situation: we can see that 3 different issues have been written 9 times in total.

![Image](https://github.com/user-attachments/assets/cd8b9c17-4033-4cc1-ac52-48134996ba72)

@emrebasarannn commented on GitHub (Jan 2, 2026):

> @arunavo4 Hi,
>
> Sharing logs might be overwhelming since I am mirroring more than 200 repos. Here is an example of my situation: we can see that 3 different issues have been written 9 times in total.
>
> Image

@arunavo4 Hi,

I think I figured out the problem. My repos are big (some of them have more than 10k issues), and while they were being mirrored for the first time, gitea-mirror's own schedule was also running (I had set it to 5 minutes), so the issues were being duplicated. Giving the initial mirror more time solved my problem.


@arunavo4 commented on GitHub (Jan 2, 2026):

> > @arunavo4 Hi,
> >
> > Sharing logs might be overwhelming since I am mirroring more than 200 repos. Here is an example of my situation: we can see that 3 different issues have been written 9 times in total.
> >
> > Image
>
> @arunavo4 Hi,
>
> I think I figured out the problem. My repos are big (some of them have more than 10k issues), and while they were being mirrored for the first time, gitea-mirror's own schedule was also running (I had set it to 5 minutes), so the issues or the PRs were getting duplicated. Giving the initial mirror more time solved my problem.

Ahh, okay. Maybe we should add some guards, but that would make things very complex. Maybe I will document this somewhere in the README. Thanks for reporting back, @emrebasarannn
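
To illustrate what such a guard could look like in principle, here is a minimal Python sketch of an "in-flight" check that makes a scheduled tick skip a repository whose previous mirror run has not finished yet. This is not gitea-mirror's code: `InFlightGuard`, `scheduled_tick`, and the `mirror` callable are hypothetical names, and the sketch assumes a single process, so it would not protect against several instances running at once.

```python
import threading


class InFlightGuard:
    """Tracks which repositories currently have a mirror run in progress."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = set()

    def try_start(self, repo):
        """Return True and mark the repo busy, or False if it is already busy."""
        with self._lock:
            if repo in self._in_flight:
                return False
            self._in_flight.add(repo)
            return True

    def finish(self, repo):
        """Mark the repo as no longer busy (safe to call more than once)."""
        with self._lock:
            self._in_flight.discard(repo)


def scheduled_tick(guard, repo, mirror):
    """Run mirror(repo) unless a previous run for the same repo is still active."""
    if not guard.try_start(repo):
        return "skipped"
    try:
        mirror(repo)
        return "ran"
    finally:
        # Always release the guard, even if the mirror run raised.
        guard.finish(repo)
```

The trade-off is the one mentioned above: the guard state has to be kept correct across failures and restarts, which is where the extra complexity comes from.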


@arunavo4 commented on GitHub (Feb 24, 2026):

Thanks for the detailed reports. I agree this should be handled operationally rather than by adding complex guards.

I opened #179 to document a safer first-run setup for large repositories:

  • avoid very short intervals (e.g. 5m) during initial bootstrap
  • use 1h to 8h (or temporarily disable scheduling)
  • re-enable your normal interval after initial import/mirror finishes

This addresses the duplicate-looking retry pattern seen when Gitea import/migration is slow.
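
The guidance above can be sketched as a small Python helper that raises a short interval to at least `1h` until the initial import has finished. This is only an illustration under stated assumptions: `parse_interval`, `effective_interval`, and the `initial_import_done` flag are hypothetical, the `<number><unit>` format is inferred from the `5m`, `1h`, and `8h` examples, and none of it reflects how gitea-mirror reads its schedule settings.

```python
import re

# Seconds per unit for interval strings such as "5m" or "8h" (assumed format).
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_interval(text):
    """Convert an interval string like "5m" into seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if not match:
        raise ValueError(f"unsupported interval: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


# Lower bound used while the first import is still running (the "1h" above).
BOOTSTRAP_MIN_SECONDS = parse_interval("1h")


def effective_interval(configured, initial_import_done):
    """Return the interval in seconds, floored at 1h during bootstrap."""
    seconds = parse_interval(configured)
    if initial_import_done:
        return seconds
    return max(seconds, BOOTSTRAP_MIN_SECONDS)
```

Once the first import has completed, the helper returns the configured interval unchanged, which matches the last bullet about re-enabling the normal interval.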


@arunavo4 commented on GitHub (Feb 24, 2026):

Resolved by #179 (merged): documented initial large-repo sync scheduling guidance to prevent duplicate-looking retries during bootstrap.
