mirror of
https://github.com/lldap/lldap.git
synced 2026-04-25 08:15:52 +03:00
[GH-ISSUE #756] [BUG] LLDAP stuck in restart loop with UNIQUE constraint failures on fresh sqlite install - lldap exited code 132 #277
Originally created by @tylerpace on GitHub (Dec 10, 2023).
Original GitHub issue: https://github.com/lldap/lldap/issues/756
Describe the bug

lldap v0.5.0 is stuck in a restart loop on a fresh install due to failed UNIQUE database constraints in sqlite.

To Reproduce

For context, I tried to upgrade from 0.4.3 to 0.5.0 and got the expected UNIQUE constraint error on emails that I had shared across different users. I updated the emails directly in users.db using sqlite3, but lldap continued to restart with failed UNIQUE constraints.

So, I started with a fresh v0.5.0 install with a clean data directory and the following docker compose specification:

lldap continues to get stuck in a restart loop with this output:

I stopped lldap and manually updated the admin email via sqlite3 with this command:

Then, I restarted lldap and got stuck in a similar restart loop due to a failed UNIQUE constraint on users.user_id. The following output will loop every few seconds.

Expected behavior

I expect lldap to instantiate with a default admin account that I can use to log in and complete my configuration.

Additional context

I can't get 0.4.3 to work on a fresh install either. lldap immediately falls into a restart loop even though logs appear to be successful. The following output will loop every few seconds.

@nitnelave commented on GitHub (Dec 11, 2023):
Something seems wrong here: normally, before doing the migration, we check
whether there are duplicate emails and we refuse to upgrade (or is that
only on latest and it hasn't been released?)
It's failing when trying to create the admin user, which doesn't come with
an email. But why is it trying to create an admin user? It should only do
that if there are no users in the lldap_admin group. And certainly, it
should work with an empty database (it's tested on every commit).
Can you:
- post the list of users and their emails (SELECT user_id, email FROM users;)
- post the members of the admin group (SELECT user_id FROM memberships WHERE group_id = 0;)

You can also join Discord and ask over there to try to debug interactively; we'll post the results back here.
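The two queries above can be run with the sqlite3 CLI against LLDAP's database. A minimal sketch, assuming the sqlite3 CLI is installed; the scratch database and its two-table schema here are stand-ins so the snippet is self-contained, and on a real install you would point DB at the users.db in your data directory instead:

```shell
# Stand-in database; on a real install, set DB to e.g. /data/users.db.
DB="$(mktemp)"
sqlite3 "$DB" "CREATE TABLE users (user_id TEXT PRIMARY KEY, email TEXT UNIQUE);
CREATE TABLE memberships (user_id TEXT, group_id INTEGER);
INSERT INTO users VALUES ('admin', 'admin@example.com');
INSERT INTO memberships VALUES ('admin', 0);"

# Which users (and emails) exist?
sqlite3 "$DB" 'SELECT user_id, email FROM users;'

# Who is in the admin group (group_id = 0, per the comment above)?
sqlite3 "$DB" 'SELECT user_id FROM memberships WHERE group_id = 0;'
```

On a healthy fresh install, the second query should return exactly one row for the default admin; an empty result is what would make lldap try to recreate the admin user on every start.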
@tylerpace commented on GitHub (Dec 13, 2023):
Verbose log output from lldap 0.5.0 with a fresh install (empty /data directory).

The user list:
lldap_admin members:
@nitnelave commented on GitHub (Dec 13, 2023):
There's something weird, that cannot be the logs from a fresh install with an empty data directory: the database already exists (you can see that no migration was done, it directly returned the latest schema version).
I'm guessing that there is an issue, and you're not looking at the correct /data directory. Can you post the LLDAP section of your docker compose?
That, or it's not actually the very first logs you get after deleting the DB, only the second time you start the service.
@tylerpace commented on GitHub (Dec 13, 2023):
My compose specification for lldap. ${DOCKER_DIR}/lldap is completely empty at startup. I'm rm-ing it after every attempt as part of this troubleshooting.

But, you raise a good point -- I was grabbing the most recent copy of the lldap logs by mistake. The restart cycle happens almost instantly, so it's hard to capture the initial startup from docker logs. However, I pipe all my container logs to grafana and took a look over there to see if I could find the first execution of lldap.

Here's a lldap log export from grafana that covers the initial startup and subsequent restarts. It looks like lldap starts and silently fails (?) and then gets stuck in the loop caused by the UNIQUE constraint.

lldap_v050_startup.txt
Same logs in a Github Gist
@martadinata666 commented on GitHub (Dec 14, 2023):
Does this work when using a volume?

And about the commands: docker compose up and docker compose down -v (the latter will also wipe the volume, just to ensure it's empty). Is ${DOCKER_DIR}/lldap on a remote mount such as NFS or SMB?

@nitnelave commented on GitHub (Dec 14, 2023):
Interesting! Maybe you can remove the "restart: unless-stopped" from compose? That way you'd only get the first start.
I think grafana only captured stdout, but docker will also give you stderr. Or can you get that in grafana as well?
I feel like there was a panic before it finished the "setting up server" part and didn't get to log anything.
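For reference, dropping the restart policy looks like this in compose. This is a minimal sketch only, since the original compose file was not included in the thread; the image tag, volume path, and ports here are assumptions:

```yaml
services:
  lldap:
    image: lldap/lldap:v0.5.0
    # restart: unless-stopped   # removed, so the container stops after the
    #                           # first crash instead of looping
    volumes:
      - ./lldap-data:/data
    ports:
      - "3890:3890"    # LDAP
      - "17170:17170"  # web UI
```

With the restart policy gone, docker inspect --format '{{.State.ExitCode}}' lldap shows the container's final exit code directly, which is what made the 132 visible later in the thread.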
@tylerpace commented on GitHub (Dec 14, 2023):
@martadinata666 I modified my compose to start with a fresh volume and appear to be in the same restart loop.

Here's the initial output from docker compose up:

But checking docker logs lldap and grafana shows the same restart cycle ending in the UNIQUE constraint problem.

My docker volumes are hosted via NFS on TrueNAS. I know it's not ideal to run sqlite dbs on NFS, but this setup ran fine for a long while on v0.4.x and now I can't even roll back to that version family. I can try a setup without sqlite, but probably not until the middle of next week.

@nitnelave Progress! I removed restart: unless-stopped and got a new exit code of 132.

Thank you both for the troubleshooting help!
@nitnelave commented on GitHub (Dec 14, 2023):
132? That's illegal instruction (unless you have a weird Linux). That would indeed stop the program in its tracks with no hope of logging, to stderr or stdout.
What is your cpu/architecture? And which docker image were you using?
A potential way forward would be to recompile lldap yourself (which is very easy with cargo, check the readme). If you compile it on the machine itself, it shouldn't generate illegal instructions.
Let's see if it solves the problem (you should be able to just copy the new binary into the container and restart it).
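Exit code 132 follows the shell convention of 128 + signal number: signal 4 is SIGILL, "illegal instruction". A quick sketch to confirm the arithmetic on any Linux box:

```shell
# A child killed by SIGILL (signal 4) surfaces as exit code 128 + 4 = 132,
# which is exactly what Docker reports for the lldap container.
sh -c 'kill -s ILL $$'
echo "exit code: $?"   # prints: exit code: 132
```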
@tylerpace commented on GitHub (Dec 14, 2023):
@martadinata666 To clarify my earlier comment about NFS.
My normal process is to host persistent docker volumes on NFS, but in your test case using the lldap-data volume, that volume is hosted directly on my docker host. My docker host is Ubuntu Server virtualized on TrueNAS using a zvol for storage.

@nitnelave commented on GitHub (Dec 14, 2023):
What's the output of uname -a and cat /proc/cpuinfo (you can truncate to just one core)?

@tylerpace commented on GitHub (Dec 14, 2023):
@nitnelave I'm running virtualized Ubuntu on TrueNAS with Host CPU mode (AMD Ryzen). lldap worked great for months on this setup. It all went south when I first tried to upgrade to v0.5.0 and I've been stuck in this restart loop ever since. There have been no changes to server HW or the Ubuntu virtualization settings, just the normal course of Ubuntu OS updates, updates to other containers, etc.

OS info:
CPU info:
Docker info:
Container info:
@tylerpace commented on GitHub (Dec 14, 2023):
@nitnelave commented on GitHub (Dec 14, 2023):
You could try the non-alpine container, but I'm not sure it'd change anything.
Actually if you could get a coredump, we could see where the invalid instruction is and try to see if there's anything specific to the dependency that we could change.
@nitnelave commented on GitHub (Dec 14, 2023):
Note that given the state of the DB after the crash, it's almost certainly in the crypto operations to set the password.
@martadinata666 v4 to v5, could it be due to the libc change? I remember something about musl, did we change anything with the Alpine container?
@nitnelave commented on GitHub (Dec 14, 2023):
Could be relevant: https://gitlab.torproject.org/tpo/core/arti/-/issues/571
Note that SIGILL can also occur when panicking while processing a panic.
@martadinata666 commented on GitHub (Dec 15, 2023):
Oh interesting, 132 is about missing CPU instructions, mostly avx or sse, which are used by some cryptography/random number generation code.

First, let's rule out a VM issue or a host issue:
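One way to check for masked instruction sets is to look at the CPU flags the VM exposes; under a hypervisor "CPU Model" mode, flags the host Ryzen supports may be hidden from the guest. A sketch (x86-only, since /proc/cpuinfo only carries a flags line on x86):

```shell
# List the SIMD-related flags the (virtual) CPU advertises. Under
# "Host Passthrough" the guest should see the host's full feature set;
# missing entries here would explain a SIGILL in vectorized crypto code.
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -E '^(sse4_1|sse4_2|avx|avx2)$'
```

Comparing this output inside the VM against the same command on the TrueNAS host would show exactly which instructions the hypervisor is hiding.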
I predict this will run fine.

@tylerpace commented on GitHub (Dec 15, 2023):
FYI I'll be AFK until Tuesday of next week. I don't want you to think that I've ghosted you after all the support.
I'll take a swing at the latest suggestions upon my return.
@tylerpace commented on GitHub (Dec 19, 2023):
Good news, it appears the issue was related to my virtualization settings in TrueNAS.
Changing CPU Mode from Host Model to Host Passthrough resolved the 132 exit code, which allowed for the proper initialization of the user database and prevented the UNIQUE constraint cycle on future runs.

Thanks for the troubleshooting!