[PR #363] [MERGED] Implement evidence-backed semantic gap detection and onboarding theme extraction #331


Closed

opened 2026-03-02 23:35:10 +03:00 by kerem · 0 comments

kerem (Owner) commented on 2026-03-02 23:35:10 +03:00

📋 Pull Request Information

Original PR: https://github.com/AJaySi/ALwrity/pull/363
Author: @AJaySi
Created: 3/2/2026
Status: ✅ Merged
Merged: 3/2/2026
Merged by: @AJaySi

Base: main ← Head: codex/implement-find_semantic_gaps-method

📝 Commits (1)

cd9ffb5 Implement evidence-based semantic gap detection for strategy agents

📊 Changes

3 files changed (+226 additions, -107 deletions)


📝 backend/services/intelligence/agents/specialized_agents.py (+82 -38)
📝 backend/services/intelligence/sif_agents.py (+82 -31)
📝 backend/services/sif_onboarding_service.py (+62 -38)

📄 Description

Motivation

- Replace placeholder gap detection with a real, evidence-backed analysis that separates user content from competitor content using indexed metadata.
- Surface actionable, explainable outputs (confidence, severity, coverage delta, supporting document counts, and sample titles) so onboarding can present trustworthy recommendations.
- Replace static theme literals in onboarding with metadata-driven theme extraction that reflects the content actually indexed.

Description

- Implemented find_semantic_gaps in StrategyArchitectAgent in both backend/services/intelligence/sif_agents.py and backend/services/intelligence/agents/specialized_agents.py. The method splits indexed documents into user and competitor sets using metadata typing, derives per-topic densities, computes coverage_delta, derives a confidence value and a combined severity_score, assigns a priority, and returns evidence-backed gap items that carry coverage_delta alongside competitor_supporting_docs, user_supporting_docs, and competitor_sample_titles.
- Added helper methods to both agent implementations: _infer_document_role, _extract_topics_from_document, and _map_topic_to_doc_titles. They standardize role inference and topic normalization using metadata and lightweight title tokenization.
- Updated the onboarding flow in backend/services/sif_onboarding_service.py to use real indexed outputs: it fetches indexed documents, builds competitor_doc_ids, and passes them into find_semantic_gaps. The previous static theme_queries approach is replaced with indexed_metadata-based theme analysis that returns top_themes, a classification, and evidence.
- Changed function signatures to accept flexible index identifiers (List[Any]) and improved gap ranking so that severity_score (which combines coverage delta and confidence) takes priority; evidence now also includes supporting-document counts.
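The gap-scoring steps described above can be illustrated with a minimal, standalone Python sketch. It is not the code merged in this PR: the document shape ({"id", "title", "metadata"}), the metadata keys "type" and "topics", the stopword list, and the confidence formula, severity weights, and priority thresholds are all hypothetical assumptions chosen for illustration. The functions infer_document_role and extract_topics loosely stand in for _infer_document_role and _extract_topics_from_document, and the per-topic title lookup plays the role of _map_topic_to_doc_titles.

```python
from collections import Counter
from typing import Any, Dict, List, Set

# Hypothetical stopword list, used only by the title-tokenization fallback.
STOPWORDS = {"with", "from", "your", "what", "that", "this"}


def infer_document_role(doc: Dict[str, Any], competitor_ids: Set[str]) -> str:
    """Label a document 'competitor' or 'user' (assumed metadata key: 'type')."""
    if str(doc.get("id")) in competitor_ids:
        return "competitor"
    doc_type = str(doc.get("metadata", {}).get("type", "")).lower()
    return "competitor" if "competitor" in doc_type else "user"


def extract_topics(doc: Dict[str, Any]) -> Set[str]:
    """Use metadata topics if present, else tokenize the title (assumed key: 'topics')."""
    raw = doc.get("metadata", {}).get("topics") or []
    topics = {str(t).strip().lower() for t in raw if str(t).strip()}
    if topics:
        return topics
    tokens = (t.strip(".,:;!?") for t in str(doc.get("title") or "").lower().split())
    return {t for t in tokens if len(t) > 3 and t not in STOPWORDS}


def find_semantic_gaps(docs: List[Dict[str, Any]], competitor_doc_ids: List[Any],
                       top_n: int = 5) -> List[Dict[str, Any]]:
    """Rank topics that competitors cover more densely than the user (illustrative only)."""
    competitor_ids = {str(i) for i in competitor_doc_ids}  # accept int or str ids
    user_docs: List[Dict[str, Any]] = []
    comp_docs: List[Dict[str, Any]] = []
    for doc in docs:
        role = infer_document_role(doc, competitor_ids)
        (comp_docs if role == "competitor" else user_docs).append(doc)
    if not comp_docs:
        return []

    user_topics = [extract_topics(d) for d in user_docs]
    comp_topics = [extract_topics(d) for d in comp_docs]
    user_counts = Counter(t for topics in user_topics for t in topics)
    comp_counts = Counter(t for topics in comp_topics for t in topics)

    gaps = []
    for topic, comp_n in comp_counts.items():
        user_n = user_counts.get(topic, 0)
        # Topic density = share of each side's documents that mention the topic.
        comp_density = comp_n / len(comp_docs)
        user_density = user_n / len(user_docs) if user_docs else 0.0
        coverage_delta = comp_density - user_density
        if coverage_delta <= 0:
            continue  # the user already covers this topic at least as densely
        # Assumed heuristic: confidence grows with competitor evidence, capped at 1.0.
        confidence = min(1.0, comp_n / 3)
        # Assumed weights for combining coverage delta and confidence.
        severity = round(0.6 * coverage_delta + 0.4 * confidence, 3)
        priority = "high" if severity >= 0.7 else "medium" if severity >= 0.4 else "low"
        titles = [d.get("title", "") for d, topics in zip(comp_docs, comp_topics)
                  if topic in topics]
        gaps.append({
            "topic": topic,
            "coverage_delta": round(coverage_delta, 3),
            "confidence": round(confidence, 3),
            "severity_score": severity,
            "priority": priority,
            "evidence": {
                "competitor_supporting_docs": comp_n,
                "user_supporting_docs": user_n,
                "competitor_sample_titles": titles[:3],
            },
        })

    # Highest severity first; ties go to topics with more competitor evidence.
    gaps.sort(key=lambda g: (g["severity_score"], g["evidence"]["competitor_supporting_docs"]),
              reverse=True)
    return gaps[:top_n]
```

Normalizing competitor_doc_ids to strings lets integer and string index identifiers match the same way, which is one simple way to honor a List[Any] signature. Sorting on (severity_score, competitor_supporting_docs) breaks severity ties in favor of topics backed by more competitor documents.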

Testing

- Ran Python compilation on the modified modules with python -m py_compile backend/services/intelligence/sif_agents.py backend/services/intelligence/agents/specialized_agents.py backend/services/sif_onboarding_service.py, which completed without syntax errors.
- Ran quick content checks with rg to confirm that the static theme literals were removed and that find_semantic_gaps is invoked with real competitor_doc_ids; both checks passed.
- No unit tests were added in this change. Runtime validation should be performed in an environment where the intelligence index is available, to verify the semantic outputs and evidence fields end to end.
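Where shelling out is inconvenient, the compilation and content checks above can be approximated in Python with the standard-library py_compile module. This is a hypothetical helper script, not something included in the PR: compile_modules and files_containing are illustrative names, and the substring test is only a rough substitute for rg.

```python
import py_compile
from pathlib import Path
from typing import Iterable, List

# Module list copied from the Testing section of this PR.
MODULES = [
    "backend/services/intelligence/sif_agents.py",
    "backend/services/intelligence/agents/specialized_agents.py",
    "backend/services/sif_onboarding_service.py",
]


def compile_modules(paths: Iterable[str]) -> List[str]:
    """Byte-compile each module; return one error message per failure (empty = OK)."""
    errors: List[str] = []
    for path in paths:
        try:
            py_compile.compile(path, doraise=True)
        except py_compile.PyCompileError as exc:  # syntax errors and similar
            errors.append(f"{path}: {exc.msg}")
        except OSError as exc:  # e.g. the file does not exist
            errors.append(f"{path}: {exc}")
    return errors


def files_containing(paths: Iterable[str], needle: str) -> List[str]:
    """Rough stand-in for `rg -l needle`: list the files that contain the literal."""
    return [p for p in paths if needle in Path(p).read_text(encoding="utf-8")]


if __name__ == "__main__":
    problems = compile_modules(MODULES)
    print("\n".join(problems) or "all modules compiled")
```

Run from the repository root, an empty result from compile_modules corresponds to the py_compile check passing. A call such as files_containing(MODULES, "theme_queries") could mirror the literal check, assuming that identifier was removed entirely rather than just no longer used.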

Codex Task: https://chatgpt.com/codex/tasks/task_e_69a47b14ec3883288de18de9fa17ce84

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.


kerem closed this issue and added the pull-request label on 2026-03-02 23:35:10 +03:00.

kerem referenced this issue on 2026-03-13 21:01:02 +03:00 in: [PR #331] [MERGED] Align Copilot persistence keys, fix mutable JSON defaults in UserProfile, and update docs #634


Reference: starred/ALwrity#331
