[GH-ISSUE #533] A more resilient circuit breaker strategy #183

Closed
opened 2026-02-26 04:34:18 +03:00 by kerem · 0 comments

Originally created by @mageddo on GitHub (Aug 1, 2024).
Original GitHub issue: https://github.com/mageddo/dns-proxy-server/issues/533

Summary & Motivation

Currently the circuit breaker strategy is set as below:

CircuitBreaker
  .builder()
  .failureThreshold(3)
  .failureThresholdCapacity(10)
  .successThreshold(5)
  .testDelay(Duration.ofSeconds(20))
  .build();

This configuration is inflexible. Remote servers that are completely offline are reactivated every 20 seconds, while
stable servers that hit 3 timeouts are removed from the pool for 20 seconds. Neither behavior is ideal: some
requests become slower than they need to be, and others go unresolved for a while because of a temporary, isolated
denial of service on a single remote server.

Description

  • Define a new circuit breaker strategy, named CANARY_RATE_THRESHOLD, that is more flexible and can be
    configured by the user
  • The current strategy will be named STATIC_THRESHOLD
  • When no config file exists, STATIC_THRESHOLD will be configured by default
  • When a config file already defines a circuit breaker config, it will be parsed as STATIC_THRESHOLD
    even if no type is defined, keeping DPS compatible with previous versions.
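
The backward-compatibility rule above can be sketched as follows. This is a minimal illustration, not DPS's actual parsing code: `StrategyType` and `resolve` are hypothetical names, and the config section is assumed to arrive as a plain map whose `strategy` key matches the JSON field proposed in this issue.

```java
import java.util.Map;

// Hypothetical names for illustration only; not the actual DPS API.
enum StrategyType {
  STATIC_THRESHOLD,
  CANARY_RATE_THRESHOLD;

  // Returns the configured strategy, falling back to STATIC_THRESHOLD when the
  // circuit breaker section is missing or has no "strategy" field (legacy config).
  static StrategyType resolve(Map<String, ?> circuitBreakerConfig) {
    if (circuitBreakerConfig == null) {
      return STATIC_THRESHOLD;
    }
    Object raw = circuitBreakerConfig.get("strategy");
    if (raw == null) {
      return STATIC_THRESHOLD;
    }
    return StrategyType.valueOf(raw.toString());
  }
}
```

A legacy config that only sets `failureThreshold` therefore keeps resolving to `STATIC_THRESHOLD`, while a config with `"strategy": "CANARY_RATE_THRESHOLD"` opts in to the new behavior.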

Strategy 2

Unit test use cases [1]

CircuitBreaker
  .builder()
  .failureRateThreshold(21f)
  .minimumNumberOfCalls(50)
  .permittedNumberOfCallsInHalfOpenState(10)
  .build();
....
"solverRemote" : {
    "circuitBreaker" : { 
      "strategy": "CANARY_RATE_THRESHOLD",
      "failureRateThreshold" : 21, // If the failure rate is equal to or greater than the threshold, the CircuitBreaker will transition to open. criteria: values greater than 0 and not greater than 100.
      "minimumNumberOfCalls" : 50, // Configures the minimum number of calls which are required (per sliding window period) before the CircuitBreaker can calculate the error rate.
      "permittedNumberOfCallsInHalfOpenState" : 10 // Configures the number of permitted calls when the CircuitBreaker is half open.
    }
  }

Closed Circuit - Server Goes Down

In the circuit breaker, after 50 calls have been made, if 10 executions have an error rate of 20% or more, the circuit opens

Open Circuit - Server Comes Back

A worker that runs every 1 second checks the circuit with 1 to 3 calls; if one of them returns OK, the state changes
to half-open.

Half-Open Circuit - Healthy Server

In the circuit breaker, after 10 executions with an error rate below 20%, the circuit changes state to closed

Half-Open Circuit - Unhealthy Server

In the circuit breaker, after 10 executions with an error rate above 20%, the circuit changes state to open
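
The four transitions described above can be sketched as a small state machine. This is a simplified illustration under stated assumptions, not the resilience4j or DPS implementation: `CanaryCircuit`, `record`, and `probe` are hypothetical names, and the closed state uses a window that resets after each evaluation rather than a true sliding window.

```java
// Hypothetical sketch; names are illustrative, not the DPS or resilience4j API.
class CanaryCircuit {
  enum State { CLOSED, OPEN, HALF_OPEN }

  private final double failureRateThreshold; // percent, e.g. 21
  private final int minimumNumberOfCalls;    // calls required before evaluating while CLOSED
  private final int permittedHalfOpenCalls;  // calls evaluated while HALF_OPEN

  private State state = State.CLOSED;
  private int calls;
  private int failures;

  CanaryCircuit(double failureRateThreshold, int minimumNumberOfCalls, int permittedHalfOpenCalls) {
    this.failureRateThreshold = failureRateThreshold;
    this.minimumNumberOfCalls = minimumNumberOfCalls;
    this.permittedHalfOpenCalls = permittedHalfOpenCalls;
  }

  State state() {
    return state;
  }

  // Records the outcome of a real request made while CLOSED or HALF_OPEN.
  void record(boolean success) {
    if (state == State.OPEN) {
      return; // an open circuit receives no real traffic
    }
    calls++;
    if (!success) {
      failures++;
    }
    int required = state == State.CLOSED ? minimumNumberOfCalls : permittedHalfOpenCalls;
    if (calls < required) {
      return; // not enough samples to compute a failure rate yet
    }
    double failureRate = 100.0 * failures / calls;
    // Simplification: counters reset after every evaluation.
    transitionTo(failureRate >= failureRateThreshold ? State.OPEN : State.CLOSED);
  }

  // Called by the periodic health-check worker; one successful probe moves OPEN to HALF_OPEN.
  void probe(boolean probeSucceeded) {
    if (state == State.OPEN && probeSucceeded) {
      transitionTo(State.HALF_OPEN);
    }
  }

  private void transitionTo(State next) {
    state = next;
    calls = 0;
    failures = 0;
  }
}
```

With the values from the config above (21, 50, 10), 11 failures in the first 50 calls give a 22% failure rate and open the circuit; a successful probe then moves it to half-open, where the next 10 calls decide between closed and open.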

Tasks

  • [x] Create an abstraction over the circuit breaker implementation. The failsafe dependency stays because resilience4j can't reproduce the STATIC_THRESHOLD use case, so the circuit breaker solution must be abstracted to allow using both libraries at the same time
  • [x] Create the CANARY_RATE_THRESHOLD strategy
  • [x] Make it possible to expose the new circuit breaker strategy in the config file
  • [x] Expose the new circuit breaker strategy config in the JSON file
  • [x] Consider all circuits as open on app start; this prevents the app from getting resolution failures right at startup because the first server in the remote servers list is offline
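
The last task can be illustrated with a short sketch, assuming a hypothetical `RemoteServerPool` that tracks one circuit state per remote server. None of these names come from DPS; the point is only that a server registered as open is skipped until a health check promotes it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; class and method names are illustrative, not the DPS API.
class RemoteServerPool {
  enum State { OPEN, HALF_OPEN, CLOSED }

  // Insertion order is kept so servers are tried in the configured order.
  private final Map<String, State> circuits = new LinkedHashMap<>();

  // Every circuit starts OPEN, so an offline first server is never picked at startup.
  void register(String address) {
    circuits.put(address, State.OPEN);
  }

  // Called after a successful health check; only affects registered servers.
  void markHealthy(String address) {
    circuits.replace(address, State.HALF_OPEN);
  }

  // Returns the first server whose circuit is not OPEN, if any.
  Optional<String> firstUsable() {
    return circuits.entrySet().stream()
        .filter(entry -> entry.getValue() != State.OPEN)
        .map(Map.Entry::getKey)
        .findFirst();
  }
}
```

If the first configured server is offline, it never leaves the open state, and `firstUsable()` returns the next server that passed a health check.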

Alternatives

Strategy 1

  • A server with a closed circuit must fail at least 50% of the time, with a minimum of 10
    attempts within a 20-second period
  • A server with an open circuit will be tested every 3 seconds; only one request will be sent to it each time,
    and if 70% of the requests succeed the circuit closes again

Risks and Assumptions

  • The new strategy will be created but not defined as default for now as it would be a breaking change.
[1]: https://github.com/mageddo/java-examples/blob/d67994da82fe8e6440187e8126b4ef94c8d9b9f4/fault-tolerance/fault-tolerance-libs/src/test/java/com/mageddo/resilience4j/circuitbreaker/DpsCircuitBreakerWithManualHalfOpenTest.java
[2]: https://github.com/mageddo/java-examples/blob/d67994da82fe8e6440187e8126b4ef94c8d9b9f4/fault-tolerance/fault-tolerance-libs/src/test/java/com/mageddo/resilience4j/circuitbreaker/DpsCircuitBreakerTest.java