[PR #386] [MERGED] Add per-agent timeouts and partial committee handling in today workflow service #688

Closed
opened 2026-03-13 21:04:00 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/AJaySi/ALwrity/pull/386
Author: @AJaySi
Created: 3/6/2026
Status: Merged
Merged: 3/7/2026
Merged by: @AJaySi

Base: main ← Head: codex/wrap-propose_daily_tasks-in-asyncio.wait_for


📝 Commits (1)

  • 198143e Add per-agent timeout handling for daily committee proposals

📊 Changes

1 file changed (+90 additions, -26 deletions)

📝 backend/services/today_workflow_service.py (+90 -26)

📄 Description

Motivation

  • Prevent slow or stuck committee agents from blocking the daily plan generation pipeline by bounding each agent call with a short timeout.
  • Collect whatever proposals are available from responsive agents so the system can still consolidate and enforce pillar coverage when only a subset of agents respond.
  • Record per-agent timing/timeout/error metrics to help observability and tuning of agent timeouts.
  • Keep the original LLM fallback only for total committee failure to avoid unnecessary fallbacks when partial results exist.

Description

  • Added _get_agent_proposal_timeout_seconds(grounding) to read a configurable workflow_config.agent_proposal_timeout_seconds (default 4s, min 1s).
  • Replaced the single asyncio.gather over agent calls with a per-agent _collect_agent_proposals runner that wraps each agent.propose_daily_tasks(grounding) in asyncio.wait_for(...) and returns a structured result (agent_key, status, elapsed_ms, proposals, error).
  • Logged per-agent metrics (agent, status, elapsed_ms, timeout_s, proposal_count) and emitted warnings on timeouts/errors.
  • Kept deduplication and memory filtering, but changed final selection to proceed with consolidation and _ensure_pillar_coverage when the committee partially succeeds; only trigger the LLM fallback when the entire committee phase fails or no agents responded successfully.
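A minimal sketch of the pattern described in the bullets above, not the code merged in this PR. It assumes `grounding` is a plain dict, that agents are passed as a mapping of key to object, and that the wrapped per-agent runners are still scheduled concurrently; the un-prefixed function names, `FastAgent`, and `StuckAgent` are hypothetical stand-ins introduced for illustration only.

```python
import asyncio
import logging
import time
from typing import Any, Dict, List

logger = logging.getLogger("today_workflow_sketch")

DEFAULT_TIMEOUT_S = 4.0  # default stated in the PR description
MIN_TIMEOUT_S = 1.0      # minimum stated in the PR description


def get_agent_proposal_timeout_seconds(grounding: Dict[str, Any]) -> float:
    """Read workflow_config.agent_proposal_timeout_seconds, clamped to the minimum."""
    config = grounding.get("workflow_config") or {}  # assumption: dict-shaped grounding
    raw = config.get("agent_proposal_timeout_seconds", DEFAULT_TIMEOUT_S)
    try:
        value = float(raw)
    except (TypeError, ValueError):
        value = DEFAULT_TIMEOUT_S
    return max(MIN_TIMEOUT_S, value)


async def run_agent(agent_key: str, agent: Any, grounding: Dict[str, Any],
                    timeout_s: float) -> Dict[str, Any]:
    """Bound one agent call with asyncio.wait_for and return a structured result."""
    started = time.perf_counter()
    status, proposals, error = "ok", [], None
    try:
        result = await asyncio.wait_for(
            agent.propose_daily_tasks(grounding), timeout=timeout_s
        )
        proposals = list(result or [])
    except asyncio.TimeoutError:
        status, error = "timeout", f"no response within {timeout_s}s"
    except Exception as exc:  # one failing agent must not sink the committee
        status, error = "error", str(exc)
    elapsed_ms = int((time.perf_counter() - started) * 1000)
    log = logger.info if status == "ok" else logger.warning
    log("agent=%s status=%s elapsed_ms=%d timeout_s=%s proposal_count=%d",
        agent_key, status, elapsed_ms, timeout_s, len(proposals))
    return {"agent_key": agent_key, "status": status, "elapsed_ms": elapsed_ms,
            "proposals": proposals, "error": error}


async def collect_agent_proposals(agents: Dict[str, Any],
                                  grounding: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Run every agent concurrently, each under its own timeout."""
    timeout_s = get_agent_proposal_timeout_seconds(grounding)
    runs = (run_agent(key, agent, grounding, timeout_s) for key, agent in agents.items())
    return list(await asyncio.gather(*runs))


def should_use_llm_fallback(results: List[Dict[str, Any]]) -> bool:
    """Fall back only when no agent responded successfully."""
    return not any(r["status"] == "ok" for r in results)


class FastAgent:  # hypothetical agent that answers immediately
    async def propose_daily_tasks(self, grounding):
        return [{"title": "Draft blog outline"}]


class StuckAgent:  # hypothetical agent that never answers in time
    async def propose_daily_tasks(self, grounding):
        await asyncio.sleep(60)
        return []


if __name__ == "__main__":
    demo_grounding = {"workflow_config": {"agent_proposal_timeout_seconds": 1}}
    demo_agents = {"fast": FastAgent(), "stuck": StuckAgent()}
    for item in asyncio.run(collect_agent_proposals(demo_agents, demo_grounding)):
        print(item["agent_key"], item["status"], len(item["proposals"]))
```

Because `run_agent` converts timeouts and exceptions into result records instead of raising, a single slow agent costs at most the timeout, and the caller can still consolidate the proposals from agents whose status is `ok`.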

Testing

  • Ran python -m py_compile backend/services/today_workflow_service.py and it completed successfully.
  • Confirmed via the repository tooling used for the change that the modified file compiles and contains no syntax errors.

Codex Task: https://chatgpt.com/codex/tasks/task_e_69aaf77551848328aec1e42cf81a7981


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
