[GH-ISSUE #801] [BUG] redis: discarding bad PubSub connection #2420

Open
opened 2026-03-15 20:25:21 +03:00 by kerem · 2 comments

Originally created by @BrandSnob on GitHub (Jan 10, 2024).
Original GitHub issue: https://github.com/hibiken/asynq/issues/801

Originally assigned to: @hibiken on GitHub.

Describe the bug
Hello. This looks like the well-known PubSub bug in go-redis v8, but my project does not use go-redis v8 at all, only go-redis v9.
The error happens randomly: rarely right at startup, sometimes after a few hours.
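
For what it's worth, the first log line below comes from go-redis itself: its PubSub wrapper logs "redis: discarding bad PubSub connection" each time it drops a broken connection before re-dialing, so the message alone is not fatal; the problem is that the re-dials keep failing. A minimal go-redis v9 subscription loop showing that behavior (the address and channel name are made up for illustration):

```go
package main

import (
	"context"
	"log"

	"github.com/redis/go-redis/v9"
)

func main() {
	// Address and channel are hypothetical, for illustration only.
	rdb := redis.NewClient(&redis.Options{Addr: "service_cache:6379"})
	defer rdb.Close()

	pubsub := rdb.Subscribe(context.Background(), "example:channel")
	defer pubsub.Close()

	// Channel() transparently re-subscribes after a dropped connection;
	// each time go-redis discards a broken connection it logs the
	// "redis: discarding bad PubSub connection" line seen in this report.
	for msg := range pubsub.Channel() {
		log.Printf("received %q on %q", msg.Payload, msg.Channel)
	}
}
```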

To Reproduce
Steps to reproduce the behavior (code snippets if applicable):
The server code is below; note that this process is not only an asynq server, it also runs other jobs in goroutines.
Server code:

```go
// Allow bursts of 2 events, refilling one token every 5 minutes.
limiter = rate.NewLimiter(rate.Every(5*time.Minute), 2)
if ctx == nil {
	ctx = context.Background()
}
// Build the worker server.
srv := asynq.NewServer(
	asynq.RedisClientOpt{
		Addr:     fmt.Sprintf("%s:%d", c.Config.Cache.Hostname, c.Config.Cache.Port),
		DB:       db,
		Password: c.Config.Cache.Password,
	},
	asynq.Config{
		Concurrency:    1,
		BaseContext:    func() context.Context { return ctx },
		IsFailure:      func(err error) bool { return !IsRateLimitError(err) }, // app-defined helper
		RetryDelayFunc: retryDelay, // app-defined helper
		Queues: map[string]int{
			"critical": 6,
			"default":  3,
			"low":      1,
		},
		Logger:                   c.Web.Logger,
		ShutdownTimeout:          0, // zero values fall back to asynq defaults
		DelayedTaskCheckInterval: 0,
		GroupGracePeriod:         10 * time.Second,
		GroupMaxDelay:            10 * time.Second,
		GroupMaxSize:             0,
		GroupAggregator:          nil,
	},
)

// Map task types to the handlers.
mux := asynq.NewServeMux()
mux.Handle(..., ...)
if err := srv.Run(mux); err != nil {
	log.Fatalf("could not run worker server: %v", err)
}
```

The context passed to BaseContext is created once in main and used across several parts of the app.
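
Since that shared context is also used elsewhere, one thing worth double-checking: asynq derives every handler's context from BaseContext, so if anything ever cancels the shared context, all in-flight and future tasks are canceled with it. A minimal sketch of the safer pattern, assuming a base context that nothing cancels for the server's lifetime (redisOpt stands in for the RedisClientOpt shown above):

```go
package worker

import (
	"context"

	"github.com/hibiken/asynq"
)

// newServer builds the worker with a base context that nothing cancels.
func newServer(redisOpt asynq.RedisClientOpt) *asynq.Server {
	// Every handler's ctx derives from BaseContext (plus the task's own
	// timeout/deadline); canceling this context cancels all tasks at once.
	baseCtx := context.Background()
	return asynq.NewServer(redisOpt, asynq.Config{
		Concurrency: 1,
		BaseContext: func() context.Context { return baseCtx },
	})
}
```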

Expected behavior
I expect it to just keep running as usual, or at least to reconnect once this happens.
Instead it stays stuck like that the whole time until I restart the server container (not Redis or anything else).
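
As a stopgap, asynq.Config exposes a periodic Redis health check that could at least surface the broken connection. A minimal sketch that fails fast so the container orchestrator restarts the worker instead of leaving it stuck; whether crashing is acceptable here is an assumption on my part:

```go
package worker

import (
	"log"
	"time"

	"github.com/hibiken/asynq"
)

// withHealthCheck adds a periodic Redis liveness probe to the config above.
// Exiting on failure trades uptime for a clean restart by the orchestrator
// instead of a worker stuck on a dead connection.
func withHealthCheck(cfg asynq.Config) asynq.Config {
	cfg.HealthCheckInterval = 15 * time.Second // the default is also 15s
	cfg.HealthCheckFunc = func(err error) {    // receives each ping result
		if err != nil {
			log.Fatalf("redis health check failed: %v", err)
		}
	}
	return cfg
}
```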

Environment (please complete the following information):

  • OS: Linux (Debian) application container running in Docker; service_cache is a redis:alpine container with a health check enabled (and reporting healthy while the error occurs)
  • Containers are all on the same external network:

```
networks:
  app-net:
    external: true
```
  • Version of asynq package: v0.24.1

Additional context
Logs:

```
redis: 2024/01/09 13:26:38 pubsub.go:168: redis: discarding bad PubSub connection: write tcp 172.23.0.16:52106->172.23.0.8:6379: i/o timeout
{"time":"2024-01-09T13:26:47.795096432","level":"WARN","prefix":"echo","file":"log.go","line":"169","message":"recoverer: could not list lease expired tasks: INTERNAL_ERROR: redis eval error: dial tcp: lookup service_cache: i/o timeout"}
asynq: pid=15 2024/01/09 16:32:52.600476 WARN: Scheduler could not write heartbeat data: UNKNOWN: redis command error: ZADD failed: dial tcp: lookup service_cache: i/o timeout
asynq: pid=15 2024/01/09 16:32:58.352098 WARN: Scheduler could not write heartbeat data: UNKNOWN: redis command error: ZADD failed: dial tcp: lookup service_cache: i/o timeout
{"time":"2024-01-09T20:02:58.361862078","level":"WARN","prefix":"echo","file":"log.go","line":"169","message":"recoverer: could not list lease expired tasks: INTERNAL_ERROR: redis eval error: dial tcp: lookup service_cache: i/o timeout"}
{"time":"2024-01-09T20:05:26.076166895","level":"ERROR","prefix":"echo","file":"log.go","line":"176","message":"Failed to forward scheduled tasks: INTERNAL_ERROR: INTERNAL_ERROR: redis eval error: dial tcp: lookup service_cache: i/o timeout"}
```

And it goes on with similar logs
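
Worth noting: after the first write timeout, every subsequent failure is a DNS lookup timeout (lookup service_cache: i/o timeout), i.e. the re-dials are failing at name resolution in Docker's embedded DNS, not at the Redis server itself. If the resolver is merely slow, more generous client timeouts might ride it out; a minimal sketch using asynq.RedisClientOpt's timeout fields (values are illustrative, not a confirmed fix):

```go
package worker

import (
	"time"

	"github.com/hibiken/asynq"
)

// redisOpt sets explicit client timeouts; the go-redis defaults (5s dial,
// 3s read/write) can be too tight when Docker's embedded DNS is slow to
// resolve service_cache. The values below are illustrative, not tuned.
func redisOpt(addr, password string, db int) asynq.RedisClientOpt {
	return asynq.RedisClientOpt{
		Addr:         addr,
		DB:           db,
		Password:     password,
		DialTimeout:  10 * time.Second, // covers DNS lookup + TCP connect
		ReadTimeout:  5 * time.Second,
		WriteTimeout: 5 * time.Second,
	}
}
```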


@kamikazechaser commented on GitHub (Jan 28, 2024):

Check if the issue persists when you use the following versions:

```
# asynq
go get github.com/hibiken/asynq@master
# asynq/x
go get github.com/hibiken/asynq/x@master
# asynq/tools
go get github.com/hibiken/asynq/tools@master
```

@BrandSnob commented on GitHub (Jun 12, 2024):

> Check if the issue persists when you use the following versions:
>
> ```
> # asynq
> go get github.com/hibiken/asynq@master
> # asynq/x
> go get github.com/hibiken/asynq/x@master
> # asynq/tools
> go get github.com/hibiken/asynq/tools@master
> ```

Thanks, I made some changes and also updated to the packages you mentioned, and I have not encountered this issue since.
