[GH-ISSUE #261] System broke after upgrade to 1.7.0 #193

Closed
opened 2026-02-25 23:41:32 +03:00 by kerem · 8 comments

Originally created by @stevenmcastano on GitHub (Jun 21, 2019).
Original GitHub issue: https://github.com/healthchecks/healthchecks/issues/261

I'm getting an error while trying to look at the checks... I can get to other pages, but anytime I try to look at the checks I get the following:

```
Environment:

Request Method: GET
Request URL: http://localhost:8080/projects/ebc77da9-a809-43be-a3b3-444d809ea3ac/checks/

Django Version: 2.2.2
Python Version: 3.6.8
Installed Applications:
('django.contrib.admin',
 'django.contrib.auth',
 'django.contrib.contenttypes',
 'django.contrib.humanize',
 'django.contrib.sessions',
 'django.contrib.messages',
 'django.contrib.staticfiles',
 'compressor',
 'hc.accounts',
 'hc.api',
 'hc.front',
 'hc.payments')
Installed Middleware:
('django.middleware.security.SecurityMiddleware',
 'django.contrib.sessions.middleware.SessionMiddleware',
 'django.middleware.common.CommonMiddleware',
 'django.middleware.csrf.CsrfViewMiddleware',
 'django.contrib.auth.middleware.AuthenticationMiddleware',
 'django.contrib.messages.middleware.MessageMiddleware',
 'django.middleware.clickjacking.XFrameOptionsMiddleware',
 'hc.accounts.middleware.TeamAccessMiddleware')

Traceback:

File "/opt/hc-venv/lib/python3.6/site-packages/django/core/handlers/exception.py" in inner
  34.             response = get_response(request)

File "/opt/hc-venv/lib/python3.6/site-packages/django/core/handlers/base.py" in _get_response
  115.                 response = self.process_exception_by_middleware(e, request)

File "/opt/hc-venv/lib/python3.6/site-packages/django/core/handlers/base.py" in _get_response
  113.                 response = wrapped_callback(request, *callback_args, **callback_kwargs)

File "/opt/hc-venv/lib/python3.6/site-packages/django/contrib/auth/decorators.py" in _wrapped_view
  21.                 return view_func(request, *args, **kwargs)

File "/opt/hc2/healthchecks-1.7.0/hc/front/views.py" in my_checks
  113.     sortchecks(checks, request.profile.sort)

File "/opt/hc2/healthchecks-1.7.0/hc/front/templatetags/hc_extras.py" in sortchecks
  80.     checks.sort(key=not_down_key)

File "/opt/hc2/healthchecks-1.7.0/hc/front/templatetags/hc_extras.py" in not_down_key
  64.     return check.get_status() != "down"

File "/opt/hc2/healthchecks-1.7.0/hc/api/models.py" in get_status
  161.         grace_end = grace_start + self.grace

Exception Type: TypeError at /projects/ebc77da9-a809-43be-a3b3-444d809ea3ac/checks/
Exception Value: unsupported operand type(s) for +: 'NoneType' and 'datetime.timedelta'
```
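The last frame can be reproduced in isolation: adding a `datetime.timedelta` to `None` raises exactly this `TypeError`, which points to `grace_start` being `None` when line 161 of `hc/api/models.py` runs. The values below are illustrative, not taken from the reporter's database.

```python
from datetime import timedelta

# Illustrative values: grace_start is None (as the traceback implies),
# grace is a one-minute timedelta.
grace_start = None
grace = timedelta(minutes=1)

try:
    grace_end = grace_start + grace
    error_message = None
except TypeError as exc:
    # Capture the message so it can be compared with the traceback's "Exception Value".
    error_message = str(exc)

print(error_message)
```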
kerem closed this issue 2026-02-25 23:41:32 +03:00

@stevenmcastano commented on GitHub (Jun 21, 2019):

It looks like if I create a new project, everything seems to work under that new one... but not my original project that was migrated in.


@cuu508 commented on GitHub (Jun 21, 2019):

One quick thing to check before digging deeper – did you run the database migrations (`./manage.py migrate`)?


@stevenmcastano commented on GitHub (Jun 21, 2019):

Yes, I did run the migrations... and I've "maybe" got it fixed. I dug a little deeper: after seeing that the code actually failing was the sort based on check status, I went into the Django admin console and looked at the data to see if there was something weird...

It turns out one check somehow had *no* status... it wasn't up, down, late, or paused; it was just blank. Apparently that null value was what was throwing the whole thing off. When I manually set it to "down" in the admin console, I was able to log in and see all my checks, and the error went away.

It took until about 3am last night to figure that out, though... I was pretty wiped out. So right now I'm pulling a copy of my production data back into my staging server. I'm going to run the migration again, compare the SQL exports from yesterday and today, and try to figure out whether this was just an insane stroke of luck (catching a check in between status changes) or whether the migration is somehow messing with it.

If it's reproducible, I'll capture as much data as possible... but it could have just been a fluke of freak timing!
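A simplified model of why a blank status crashes the sort, and why setting it to "down" makes the error go away. This is not the actual `hc.api.models.Check` code: `SimpleCheck`, its fields, and the rule that a grace period only exists while the status is "up" are assumptions chosen to be consistent with the traceback.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class SimpleCheck:
    """Hypothetical stand-in for a check record; not the real hc.api model."""
    status: Optional[str]  # "new", "up", "down", "paused", or None (the bad row)
    last_ping: Optional[datetime]
    timeout: timedelta
    grace: timedelta

    def get_grace_start(self) -> Optional[datetime]:
        # Assumption: a grace period is only defined while the check is "up".
        if self.status == "up" and self.last_ping is not None:
            return self.last_ping + self.timeout
        return None

    def get_status(self, now: datetime) -> str:
        # These statuses are returned as-is, without any date arithmetic.
        if self.status in ("new", "paused", "down"):
            return self.status
        grace_start = self.get_grace_start()
        # With status=None, grace_start is None and this line raises TypeError.
        grace_end = grace_start + self.grace
        if now >= grace_end:
            return "down"
        if now >= grace_start:
            return "grace"
        return "up"


now = datetime(2019, 6, 21, 3, 0)
check = SimpleCheck(status=None, last_ping=now - timedelta(seconds=30),
                    timeout=timedelta(minutes=1), grace=timedelta(minutes=1))
try:
    check.get_status(now)
except TypeError:
    print("blank status: TypeError, as in the traceback")

check.status = "down"  # the manual fix made in the Django admin
print(check.get_status(now))  # down
```

In this model, "down" is returned before any arithmetic happens, which is why a manually set status stops the crash even though the underlying cause (how the status became blank) is still unknown.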


@stevenmcastano commented on GitHub (Jun 23, 2019):

As it turns out, this is reproducible... it seems to happen to one check in particular. I've got one check with a 1-minute timeout and a 1-minute grace period. I've tried resetting the data to the old version and running the migration again a few times, and every time it seems to hang up on that one check. Even in my production system that one check sits there with no status in it.


@cuu508 commented on GitHub (Jun 24, 2019):

> Even in my production system that one check sits there with no status in it.

That's the culprit here. If you manually changed it to, say, "down" in the production database I'm pretty sure the issues would go away.

But the obvious question of course is how it ended up with a blank / null status in the first place. I'm not sure how that could have happened. How old is that check? Has it received any pings? If you have database backups, would it be possible to look at backups and pin down the time when it went from a valid status to null?
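If several backups are available, a small script can narrow down when the status went blank. This sketch assumes each backup has already been reduced to a mapping of check code to status (for example, from an export of the checks table). The snapshot format, `first_blank_snapshot`, the set of valid statuses, and the sample data are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumed set of statuses that count as "valid" for a stored check.
VALID_STATUSES = {"new", "up", "down", "paused"}

# One snapshot per backup: (label, {check_code: status}), oldest first.
Snapshot = Tuple[str, Dict[str, Optional[str]]]


def first_blank_snapshot(snapshots: List[Snapshot], code: str) -> Optional[str]:
    """Return the label of the first backup where `code` went from a valid status to blank/NULL."""
    previous: Optional[str] = None
    for label, statuses in snapshots:
        if code not in statuses:
            # The check is absent from this backup (not yet created, or deleted); skip it.
            continue
        status = statuses[code]
        if previous in VALID_STATUSES and not status:
            return label
        previous = status
    return None


# Made-up sample data for illustration only.
backups = [
    ("backup-1", {"abc": "up"}),
    ("backup-2", {"abc": "down"}),
    ("backup-3", {"abc": None}),
]
print(first_blank_snapshot(backups, "abc"))
```

Treating both `None` and an empty string as blank covers either way the admin might display a missing status.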


@stevenmcastano commented on GitHub (Jun 24, 2019):

That's the weird part... it's an active check that actually gets pings constantly. I did change it to down, and up... and it just changes back to NULL again. It seems to be something weird with the 1 minute and 1 minute configuration.

The good news, though, is that once I switch it in the django admin site everything goes back to normal, and on v1.7.0 it seems to reflect its status properly now.


@cuu508 commented on GitHub (Jun 24, 2019):

Still trying to wrap my head around this. Some clarifying questions:

> I did change it to down, and up

Did you do that through the Django admin or some other way?

> and it just changes back to NULL again

Do you reckon that was done by `sendalerts`? That is, did it switch back to null instantly, or a minute or so later?

> It seems to be something weird with the 1 minute and 1 minute configuration.

Is this the only check with that configuration, and also the only check having an issue?

> once I switch it in the django admin site everything goes back to normal

So if you update the status in Django admin, it doesn't flip back to null any more?

> and on v1.7.0 it seems to reflect its status properly now

Did you get the problematic check fixed "permanently" before upgrading to v1.7.0, or only after?


@stevenmcastano commented on GitHub (Jun 24, 2019):

Yes, the manual changes I made were through the Django admin.

As for it switching back... it did so as soon as the status changed away from "Up". It's a check that runs "almost" every minute, so it sometimes comes in a few seconds late. So once it was late and had changed status, a new ping would come in and it would switch back to NULL... and yes, this is the only check that did that. It's also the only check in my system with such short times.

Yes, after the upgrade to v1.7.0 it does seem to switch status properly now. As I look closer, there doesn't appear to be a status for "late"... I wonder if it just kept getting caught in between? Was there one in a previous version?
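With a 1-minute timeout and a 1-minute grace period, a ping that arrives a few seconds late should fall inside the grace window rather than flip the check to down. The sketch below works through that arithmetic under an assumed model consistent with the traceback (the grace period starts at last ping + timeout and lasts one grace period). `state_after` is a hypothetical helper, and the "grace" state is presumably what the web UI labels as late.

```python
from datetime import timedelta

TIMEOUT = timedelta(minutes=1)  # expected period between pings
GRACE = timedelta(minutes=1)    # extra allowance before the check counts as down


def state_after(elapsed: timedelta) -> str:
    """Assumed state of an "up" check, `elapsed` after its most recent ping."""
    if elapsed < TIMEOUT:
        return "up"
    if elapsed < TIMEOUT + GRACE:
        return "grace"  # presumably displayed as "late" in the web UI
    return "down"


for seconds in (50, 65, 119, 120, 130):
    print(seconds, state_after(timedelta(seconds=seconds)))
```

Under this model, a check pinging "almost" every minute spends a few seconds in the grace state on each late ping, so it crosses the up/grace boundary far more often than checks with longer periods.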

The thing I'm finding weird here is that I've created a "Testing" check with a 1-minute interval and a 1-minute grace period... and the normal web interface shows when it's late and when it's down... but if I look in the Django admin, or directly in the database, both still say it's up. It seems like it's just not updating the database. Is it supposed to? Or does it cache it in memory somewhere?
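One likely explanation for the web UI and the database disagreeing: as far as I understand Healthchecks, "late" is never written to the database, because the page computes it from timestamps when it renders, and the stored status only changes to "down" when the `sendalerts` management command processes the check. If `sendalerts` is not running, the database keeps saying "up". The sketch below models that split; `StoredCheck`, `display_status` and `sweep` are hypothetical names, not Healthchecks APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class StoredCheck:
    status: str  # the value persisted in the database ("up", "down", ...)
    last_ping: datetime
    timeout: timedelta
    grace: timedelta

    def display_status(self, now: datetime) -> str:
        """Computed at render time from timestamps; never written back."""
        if self.status != "up":
            return self.status
        if now >= self.last_ping + self.timeout + self.grace:
            return "down"
        if now >= self.last_ping + self.timeout:
            return "grace"
        return "up"


def sweep(checks: List[StoredCheck], now: datetime) -> int:
    """Hypothetical background job (the role sendalerts plays): persist "down"."""
    changed = 0
    for check in checks:
        if check.status == "up" and now >= check.last_ping + check.timeout + check.grace:
            check.status = "down"
            changed += 1
    return changed


now = datetime(2019, 6, 24, 12, 0)
testing = StoredCheck("up", now - timedelta(minutes=3), timedelta(minutes=1), timedelta(minutes=1))
print(testing.display_status(now), testing.status)  # down up: UI says down, database says up
sweep([testing], now)
print(testing.display_status(now), testing.status)  # down down
```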
