[PR #363] [MERGED] Implement evidence-backed semantic gap detection and onboarding theme extraction #331

Closed
opened 2026-03-02 23:35:10 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/AJaySi/ALwrity/pull/363
Author: @AJaySi
Created: 3/2/2026
Status: Merged
Merged: 3/2/2026
Merged by: @AJaySi

Base: main ← Head: codex/implement-find_semantic_gaps-method


📝 Commits (1)

  • cd9ffb5 Implement evidence-based semantic gap detection for strategy agents

📊 Changes

3 files changed (+226 additions, -107 deletions)


📝 backend/services/intelligence/agents/specialized_agents.py (+82 -38)
📝 backend/services/intelligence/sif_agents.py (+82 -31)
📝 backend/services/sif_onboarding_service.py (+62 -38)

📄 Description

Motivation

  • Replace placeholder gap detection with a real, evidence-backed analysis that uses indexed metadata to separate user content from competitor content.
  • Surface actionable, explainable outputs (confidence, severity, coverage delta, supporting-document counts, and sample titles) so that onboarding can present trustworthy recommendations.
  • Replace static theme literals in onboarding with metadata-driven theme extraction to reflect actual indexed content.

Description

  • Implemented find_semantic_gaps in StrategyArchitectAgent in both backend/services/intelligence/sif_agents.py and backend/services/intelligence/agents/specialized_agents.py. The method splits indexed docs into user and competitor sets using metadata typing, derives per-topic densities, computes coverage_delta, derives confidence and a combined severity_score, assigns a priority, and returns evidence-backed items that include competitor_supporting_docs, user_supporting_docs, competitor_sample_titles, and coverage_delta.
  • Added helper methods in both agent implementations: _infer_document_role, _extract_topics_from_document, and _map_topic_to_doc_titles to standardize role inference and topic normalization from metadata and lightweight title tokenization.
  • Updated the onboarding flow in backend/services/sif_onboarding_service.py to use real indexed outputs: it fetches indexed documents, builds competitor_doc_ids, passes them into find_semantic_gaps, and replaces the previous static theme_queries approach with indexed_metadata-based theme analysis that returns top_themes, a classification, and evidence.
  • Changed function signatures to accept flexible index identifiers (List[Any]), and changed gap ranking to sort by severity_score (which combines coverage delta and confidence) and to include supporting-document counts in the evidence.
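The gap-detection pipeline described above (role split, topic densities, coverage_delta, confidence, severity_score, priority, evidence) can be sketched as follows. This is a standalone illustration, not the code merged in this PR: the document shape (id, title, metadata.type, metadata.topics), the stopword list, the confidence formula (saturating at three competitor docs), the min_delta cutoff, and the priority thresholds are all assumptions. The helper names loosely mirror _infer_document_role and _extract_topics_from_document but are hypothetical module-level functions.

```python
import re
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Set

# Hypothetical stopwords for the title-token fallback.
_STOPWORDS = {"the", "and", "for", "with", "how", "your", "what", "why"}


def infer_document_role(doc: Dict[str, Any], competitor_ids: Set[Any]) -> str:
    """Classify a document as 'competitor' or 'user' (assumed keys: id, metadata.type)."""
    if doc.get("id") in competitor_ids:
        return "competitor"
    doc_type = str(doc.get("metadata", {}).get("type", "")).lower()
    return "competitor" if "competitor" in doc_type else "user"


def extract_topics_from_document(doc: Dict[str, Any]) -> Set[str]:
    """Prefer metadata topics; otherwise fall back to lightweight title tokens."""
    topics = doc.get("metadata", {}).get("topics") or []
    normalized = {str(t).strip().lower() for t in topics if str(t).strip()}
    if normalized:
        return normalized
    tokens = re.findall(r"[a-z0-9]+", str(doc.get("title", "")).lower())
    return {t for t in tokens if len(t) > 2 and t not in _STOPWORDS}


def find_semantic_gaps(
    docs: Iterable[Dict[str, Any]],
    competitor_doc_ids: Iterable[Any],
    min_delta: float = 0.2,
    max_results: int = 10,
) -> List[Dict[str, Any]]:
    competitor_ids = set(competitor_doc_ids)
    # role -> topic -> titles of that role's documents mentioning the topic
    topic_titles = {"user": defaultdict(list), "competitor": defaultdict(list)}
    doc_counts = {"user": 0, "competitor": 0}
    for doc in docs:
        role = infer_document_role(doc, competitor_ids)
        doc_counts[role] += 1
        for topic in extract_topics_from_document(doc):
            topic_titles[role][topic].append(doc.get("title", ""))

    gaps = []
    for topic, comp_titles in topic_titles["competitor"].items():
        user_titles = topic_titles["user"].get(topic, [])
        # Topic density = share of one side's documents that mention the topic.
        comp_density = len(comp_titles) / max(doc_counts["competitor"], 1)
        user_density = len(user_titles) / max(doc_counts["user"], 1)
        coverage_delta = comp_density - user_density
        if coverage_delta < min_delta:
            continue  # user already covers this topic about as well as competitors
        # Assumed heuristic: confidence saturates at three supporting competitor docs.
        confidence = min(1.0, len(comp_titles) / 3)
        severity_score = round(coverage_delta * confidence, 3)
        if severity_score >= 0.5:
            priority = "high"
        elif severity_score >= 0.25:
            priority = "medium"
        else:
            priority = "low"
        gaps.append({
            "topic": topic,
            "coverage_delta": round(coverage_delta, 3),
            "confidence": round(confidence, 3),
            "severity_score": severity_score,
            "priority": priority,
            "competitor_supporting_docs": len(comp_titles),
            "user_supporting_docs": len(user_titles),
            "competitor_sample_titles": comp_titles[:3],
        })
    # Rank by the combined score, highest first.
    gaps.sort(key=lambda g: g["severity_score"], reverse=True)
    return gaps[:max_results]
```

Multiplying coverage_delta by confidence means a topic that only one competitor page mentions cannot outrank a topic backed by several pages with a similar density gap, which is one way to read "severity_score (combining coverage delta and confidence)".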
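The onboarding change replaces static theme_queries with theme analysis over indexed_metadata that returns top_themes, a classification, and evidence. A minimal sketch of that idea follows, assuming each metadata entry is a dict with title and topics keys. The function name analyze_indexed_themes and the "focused"/"broad"/"unclassified" rule (leading theme covering at least half of the documents) are hypothetical; the PR does not state how classification is computed.

```python
from collections import Counter
from typing import Any, Dict, List


def analyze_indexed_themes(indexed_metadata: List[Dict[str, Any]], top_n: int = 3) -> Dict[str, Any]:
    """Derive top themes from indexed metadata instead of static theme literals."""
    theme_counts: Counter = Counter()
    theme_titles: Dict[str, List[str]] = {}
    for meta in indexed_metadata:
        # Normalize and de-duplicate topics per document; sort for a deterministic order.
        topics = sorted({str(t).strip().lower() for t in meta.get("topics", []) if str(t).strip()})
        for topic in topics:
            theme_counts[topic] += 1
            theme_titles.setdefault(topic, []).append(meta.get("title", ""))

    top = theme_counts.most_common(top_n)
    if not top:
        return {"top_themes": [], "classification": "unclassified", "evidence": {}}

    # Hypothetical rule: "focused" when the leading theme covers at least half the documents.
    top_share = top[0][1] / len(indexed_metadata)
    classification = "focused" if top_share >= 0.5 else "broad"
    # Evidence: how many documents support each theme, plus a few sample titles.
    evidence = {
        theme: {"doc_count": count, "sample_titles": theme_titles[theme][:3]}
        for theme, count in top
    }
    return {
        "top_themes": [theme for theme, _ in top],
        "classification": classification,
        "evidence": evidence,
    }
```

Because the themes come from whatever is actually indexed, an empty index yields an explicit "unclassified" result rather than a set of canned themes.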

Testing

  • Ran Python compilation on the modified modules with python -m py_compile backend/services/intelligence/sif_agents.py backend/services/intelligence/agents/specialized_agents.py backend/services/sif_onboarding_service.py, which completed successfully (no syntax errors).
  • Performed quick content checks (rg) to confirm removal of static theme literals and that find_semantic_gaps is invoked with real competitor_doc_ids; checks succeeded.
  • No unit tests were added in this change; runtime validation should be performed in an environment with the intelligence index available to verify semantic outputs and evidence fields end-to-end.
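The py_compile check from the first testing step can also be run from Python, which makes it easy to collect every failing file instead of stopping at the first one. This is a sketch using the standard-library py_compile.compile(..., doraise=True) call; the helper name compile_check is hypothetical. Like the CLI, it only catches syntax errors and does not exercise the semantic-gap logic.

```python
import py_compile
from typing import Dict, Iterable


def compile_check(paths: Iterable[str]) -> Dict[str, str]:
    """Byte-compile each file and return {path: error message} for syntax failures.

    A missing file is not caught here; it raises FileNotFoundError.
    """
    failures: Dict[str, str] = {}
    for path in paths:
        try:
            # doraise=True turns compile errors into PyCompileError instead of printing them.
            py_compile.compile(path, doraise=True)
        except py_compile.PyCompileError as exc:
            failures[path] = exc.msg
    return failures
```

Called with the three module paths listed above, an empty dict corresponds to the "no syntax errors" result reported for this PR.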

Codex Task


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
