[PR #430] Add bounded query-page opportunities to GSC analytics and normalization #736

Open
opened 2026-03-13 21:06:25 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/AJaySi/ALwrity/pull/430
Author: @AJaySi
Created: 3/12/2026
Status: 🔄 Open

Base: main ← Head: codex/extend-gsc-data-retrieval-and-processing


📝 Commits (1)

  • 04612c7 Add bounded query-page opportunities to GSC analytics flow

📊 Changes

3 files changed (+68 additions, -5 deletions)


📝 backend/api/content_planning/services/content_strategy/autofill/normalizers/analytics_normalizer.py (+2 -0)
📝 backend/services/analytics/handlers/gsc_handler.py (+36 -0)
📝 backend/services/gsc_service.py (+30 -5)

📄 Description

Motivation

  • Provide query→page opportunity data (query, page, clicks, impressions, ctr, position) to downstream content-planning and agent workflows so they can make refresh-vs-new brief recommendations.
  • Prevent oversized Search Console query+page payloads by bounding the request size and date window to keep API usage predictable and responses manageable.

Description

  • Added constants QUERY_PAGE_OPPORTUNITIES_ROW_LIMIT = 2500 and QUERY_PAGE_OPPORTUNITIES_MAX_WINDOW_DAYS = 90 in backend/services/gsc_service.py and used them to bound the row count and date window of the GSC ['query','page'] request.
  • Implemented _get_query_page_opportunity_window to derive a bounded start/end date for the query+page request, and included query_page_data.requested_window metadata in the returned analytics payload.
  • Added query_page_opportunities processing in backend/services/analytics/handlers/gsc_handler.py. It transforms query+page rows into a prioritized list (fields: query, page, clicks, impressions, ctr, position), sorted by opportunity and capped at the top 100 for downstream consumption. The field is included in both success and error/partial responses.
  • Updated the autofill normalizer backend/api/content_planning/services/content_strategy/autofill/normalizers/analytics_normalizer.py to pass query_page_opportunities through to the normalized analytics structure.
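
The bounding and ranking steps described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function names bounded_window and rank_opportunities, the TOP_N constant, and the scoring formula (impressions × (1 − ctr)) are assumptions, since the description does not say how "opportunity" is computed or how the window is clamped. The row shape (a keys list plus clicks, impressions, ctr, position) follows the Search Console Search Analytics response format.

```python
from datetime import date, timedelta
from typing import Any, Dict, List, Tuple

# Values taken from the PR description.
QUERY_PAGE_OPPORTUNITIES_ROW_LIMIT = 2500
QUERY_PAGE_OPPORTUNITIES_MAX_WINDOW_DAYS = 90
TOP_N = 100  # hypothetical name for the "top 100" cap


def bounded_window(
    start: date,
    end: date,
    max_days: int = QUERY_PAGE_OPPORTUNITIES_MAX_WINDOW_DAYS,
) -> Tuple[date, date]:
    """Clamp the start date so that end - start is at most max_days.

    Assumption: the end date is kept and the start date is moved forward.
    """
    if start > end:
        start, end = end, start
    earliest_allowed = end - timedelta(days=max_days)
    return max(start, earliest_allowed), end


def rank_opportunities(
    rows: List[Dict[str, Any]], limit: int = TOP_N
) -> List[Dict[str, Any]]:
    """Turn ['query', 'page'] rows into a capped, prioritized list."""
    items = []
    for row in rows:
        keys = row.get("keys") or []
        if len(keys) < 2:
            continue  # skip rows that lack both a query and a page
        items.append(
            {
                "query": keys[0],
                "page": keys[1],
                "clicks": row.get("clicks", 0),
                "impressions": row.get("impressions", 0),
                "ctr": row.get("ctr", 0.0),
                "position": row.get("position", 0.0),
            }
        )
    # Assumed score: impressions that did not turn into clicks.
    items.sort(key=lambda it: it["impressions"] * (1.0 - it["ctr"]), reverse=True)
    return items[:limit]
```

Under these assumptions, a one-year request ending on a given date would be narrowed to the final 90 days, and rows with many impressions but a low click-through rate would sort first.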

Testing

  • Compiled the modified modules with python -m compileall backend/services/gsc_service.py backend/services/analytics/handlers/gsc_handler.py backend/api/content_planning/services/content_strategy/autofill/normalizers/analytics_normalizer.py, which completed successfully.
  • No additional automated unit tests were added or run as part of this change.

Codex Task: https://chatgpt.com/codex/tasks/task_e_69b26cbe6d4c832894fb44bfd7c71e69


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
