[GH-ISSUE #297] [FEATURE] OpenRouter Support for Custom Models Evaluation #550

Closed
opened 2026-03-13 20:53:48 +03:00 by kerem · 10 comments
Owner

Originally created by @doncat99 on GitHub (Oct 13, 2025).
Original GitHub issue: https://github.com/AJaySi/ALwrity/issues/297

Originally assigned to: @AJaySi on GitHub.

🚀 Feature Description
Integration of OpenRouter to enable support for custom AI models, facilitating comprehensive model evaluation within the platform.
💡 Motivation
This feature is essential to enhance the platform's flexibility by allowing users to incorporate and assess a diverse array of AI models from multiple providers. It addresses the limitation of relying solely on predefined models, enabling users to evaluate performance metrics such as accuracy, response quality, and efficiency in content generation tasks, thereby optimizing outcomes for specific use cases.
📝 Detailed Description
The feature should integrate OpenRouter as an API gateway to route requests to custom AI models. Users would access a dedicated settings panel to input their OpenRouter API key and select from available models. An evaluation module would be implemented, allowing side-by-side comparisons of model outputs based on user-defined prompts, with metrics including generation speed, coherence, relevance, and creativity scores. Integration would involve backend handling of API calls via OpenRouter, ensuring secure authentication and error management. Frontend components would include dashboards for visualization of evaluation results, such as charts displaying comparative performance data.
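Since OpenRouter exposes an OpenAI-compatible chat completions endpoint, the backend call described above could be assembled roughly as follows. This is a hedged sketch: the helper name `build_openrouter_request` and the placeholder key are illustrative, not existing ALwrity code.

```python
def build_openrouter_request(api_key: str, model: str, prompt: str) -> dict:
    """Assemble the URL, headers, and JSON body for an OpenRouter chat completion call."""
    return {
        "url": "https://openrouter.ai/api/v1/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "json": {
            "model": model,  # e.g. "anthropic/claude-sonnet-4" in OpenRouter's provider/model form
            "messages": [{"role": "user", "content": prompt}],
        },
    }

# The pieces could then be sent with any HTTP client (requests, httpx, ...).
req = build_openrouter_request("sk-or-...", "openai/gpt-4o-mini", "Draft a blog intro about model evaluation.")
```

Keeping request assembly separate from the HTTP call also makes the routing layer easy to unit-test without hitting the network.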
🎯 Use Cases
Describe specific use cases for this feature:

A content creator evaluates multiple models for blog writing to identify the one producing the most engaging and SEO-optimized articles.
An SEO specialist compares model outputs for keyword integration and content planning, selecting the optimal model for dashboard analytics.
A social media manager tests custom models for generating LinkedIn or Facebook posts, assessing tone consistency and audience engagement potential.

🎨 Mockups/Designs
Not applicable at this stage; however, wireframes could include a settings interface for API key entry, a model selection dropdown, and a results dashboard with tabular and graphical representations of evaluation metrics.
🔧 Technical Considerations
Any technical considerations or implementation notes:

Requires backend changes
Requires frontend changes
Requires database changes
Requires third-party integration
Other: _______________

Implementation notes: Ensure compliance with OpenRouter's API rate limits and authentication protocols. Handle potential latency variations across models and incorporate fallback mechanisms to default models in case of integration failures.
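The fallback mechanism mentioned in the implementation notes could look something like the sketch below; `call_model` and the model names are hypothetical stand-ins for whatever client function and model identifiers the integration ends up using.

```python
def generate_with_fallback(prompt, call_model, preferred_models, default_model):
    """Try each preferred model in order; fall back to the default model if all fail."""
    for model in preferred_models:
        try:
            return model, call_model(model, prompt)
        except Exception:
            continue  # rate limit, timeout, provider error, ...
    # Last resort: the platform's default model.
    return default_model, call_model(default_model, prompt)

# Demo with a fake client: the first model "fails" with a rate-limit error.
def fake_call(model, prompt):
    if model == "broken/model":
        raise RuntimeError("429 rate limited")
    return f"{model} says: {prompt}"

used, output = generate_with_fallback(
    "hello", fake_call, ["broken/model", "good/model"], "default/model"
)
```

A production version would likely narrow the caught exceptions and add backoff to respect OpenRouter's rate limits.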
🏷️ Component/Feature Area
Which component or feature area does this relate to?

Blog Writer
SEO Dashboard
Content Planning
Facebook Writer
LinkedIn Writer
Onboarding
Authentication
API
UI/UX
Performance
Other: Model Management

🎯 Priority

Critical (essential for core functionality)
High (significant value add)
Medium (nice to have)
Low (future consideration)

🔄 Alternatives Considered
Direct integrations with individual AI providers (e.g., OpenAI, Anthropic, or Google) were evaluated; however, these would require multiple separate implementations, increasing maintenance complexity. OpenRouter offers a unified interface for accessing diverse models, reducing development overhead while providing greater extensibility.
📚 Additional Context
Reference OpenRouter's official documentation for API specifications: https://openrouter.ai/docs. This integration aligns with industry trends toward model-agnostic platforms, as seen in similar tools like LangChain, which emphasize evaluation frameworks for AI performance benchmarking.
🤝 Contribution
Are you willing to contribute to implementing this feature?

Yes, I can help implement this
Yes, I can help with testing
Yes, I can help with documentation
No, but I can provide feedback
No, just suggesting the idea

kerem 2026-03-13 20:53:48 +03:00
Author
Owner

@AJaySi commented on GitHub (Oct 14, 2025):

@doncat99

Thank you so much for the great feature suggestion. @Om-Singh1808 and @Ratna-Babu have been exploring Ollama and also Unsloth.
We are looking to achieve the following in the near future:

1). Fine-tuning small LLMs on the end user's digital presence, and also base models as SMEs for SEO, blogging, social media platforms, etc. Thus, ALwrity will have home-grown/fine-tuned SLMs and will be a lot cheaper to experiment with.

2). We want to do this to inch closer to content hyper-personalization through fine-tuning on end users' data; there is a basic implementation present in onboarding. At present, this allows us to avoid asking for irritating inputs and to produce/mimic the end user's linguistic style from previously written articles.

3). You are absolutely right to point to OpenRouter as a solution to orchestrate ALwrity fine-tuned models for specific tasks.
But then, as CopilotKit is already integrated, maybe using AG-UI with ADK or Dify makes more sense. We also went down the CrewAI path, but AI agent frameworks are always expensive and overkill if one can design better workflows.

4). At present, our onboarding process collects end user website articles, GSC and social media accounts, and competitor data (in progress), and then generates a persona. Also, an SLM with better prompting and context produces better results than the best LLMs.

5). I agree with you on OpenRouter, but let us also know your views on LiteLLM and Ollama (with custom routing)?

Please refer to the previous discussion; maybe we can align this there: https://github.com/AJaySi/ALwrity/issues/287


@doncat99 commented on GitHub (Oct 14, 2025):

Thank you for your thoughtful response to my feature request.

I appreciate your proposal to incorporate model evaluation capabilities, aimed at selecting the most suitable model for specific scenarios or use cases. Regarding LiteLLM and Ollama, I agree they extend beyond mere model providers, offering robust tools for management, local deployment, and routing.

In my configuration, Google's ADK integrates LiteLLM to enable OpenRouter support (https://github.com/google/adk-python/issues/171).

A hybrid approach—leveraging OpenRouter for core routing while integrating LiteLLM for advanced evaluation and fine-tuning—could optimally support ALwrity's goals of cost efficiency and hyper-personalization. I am open to further dialogue on implementation.


@AJaySi commented on GitHub (Oct 14, 2025):

Thank you @doncat99, your hybrid approach makes a lot of sense.
Request your patience, while I clarify my doubts and seek your guidance:


We are in complete agreement on the following core principles:

  • User Choice: Informed users should be able to select their LLMs for the final content generation step, possibly presented via a dedicated column in the Step 4 workflow that includes OpenRouter options.

  • Hybrid Functionality: The end user should choose the model for the final draft, but the platform should use SLMs for iterative refinement and small edits (Copilot/Editor tasks).

  • Advanced Routing: Once we scale specialized fine-tuned models, advanced routing or an Agent framework will be essential.


Assumption 1: ALwrity's target audience is non-tech content creators, digital marketing professionals, solopreneurs, etc., who cannot compete in a biased online market. We need to keep it simple (KIS) for them.

Assumption 2: ALwrity is an AI-first, Copilot- and VUI-based (TBD) platform with multimodal content generation. As an SME digital marketing platform, we want to guide end users and abstract away all AI complexities, including prompting.

Assumption 3: Digital marketing is tough, and ALwrity as an SME platform will need to decide on many models for the end user: SEO, platform-specific, analytics, research, DB, editor-specific, Copilot, etc. Left to the end user, this would prove too much for non-technical marketing users.

Assumption 4: ALwrity is a complete AI content lifecycle platform. While it makes sense to choose the AI model for the final content draft, there is also iterative online research, GSC insights, competitor gap analysis, outline generation, outline refinement, draft generation, hallucination checking, assistive writing, and the AI editor.
Using one AI model for the whole content lifecycle would be too expensive, both financially and environmentally.

  • It should be up to the platform to provide model routing and match the right AI model to each content lifecycle task. The end user improves that AI model through platform interaction, feedback, AI memory, and fine-tuning.

  • Example:

As a specific use case, in ALwrity onboarding step 4, we generate an end user persona based on home page content and structured style and linguistic analysis. The idea is to scale this to multiple blogs and social media content. In this step 4, the end user can see the results of content generated with/without the persona, provide feedback to tweak the persona, and accept/confirm it. This persona will improve with every piece of content generated, plus user feedback and mem0 (or any AI memory layer).

In the above workflow, the non-technical end user can arrive at a personality without knowing the underlying AI model used. In future, a fine-tuned persona-generation SLM will suffice. Also, stuffing the persona into system prompts won't scale. I am mindful of the environmental impact when end users simply throw 700B models at mundane tasks.


  • Should we first go down the path of fine-tuning on end user digital assets and analytics with Unsloth? We have designed our onboarding to help gather all the data needed for fine-tuning a GPT.

  • We can then experiment with our fine-tuned models and route to them with OpenRouter. Thus, we would have fine-tuning with Unsloth and routing with OpenRouter/ADK, and not use LiteLLM?


@doncat99 commented on GitHub (Oct 14, 2025):

Thank you for your detailed reply and for clarifying your assumptions and questions. I appreciate the alignment on core principles like user choice, hybrid functionality, and advanced routing.
From a product manager's perspective, while non-technical end users should indeed be shielded from AI complexities to keep the platform simple (KIS principle), they deserve agency in selecting output quality levels without delving into model specifics. This can be achieved by mapping models to intuitive quality tiers (e.g., "Basic," "Standard," "Premium") based on factors like accuracy, speed, and cost. For instance, extend your existing TASK_LLM_CONFIGS structure with a "quality" key to reflect these relationships:

# Task-specific LLM configs with quality mapping
TASK_LLM_CONFIGS = {
    TASK.DOCUMENT_OUTLINE.value: [
        {"name": LLMModel.CLAUDE_SONNET_4, "model": LLMModel.CLAUDE_SONNET_4.value, "max_tokens": 30_000, "quality": "Standard"},
    ],
    TASK.CHAPTER_OUTLINE.value: [
        {"name": LLMModel.GPT_5, "model": LLMModel.GPT_5.value, "max_tokens": 350_000, "quality": "Premium"},
        {"name": LLMModel.CLAUDE_SONNET_4, "model": LLMModel.CLAUDE_SONNET_4.value, "max_tokens": 900_000, "quality": "Standard"},
        # {"name": LLMModel.CLAUDE_OPUS_4_1, "model": LLMModel.CLAUDE_OPUS_4_1.value, "max_tokens": 200_000, "quality": "Premium"},
    ],
    TASK.LEARNING_OUTCOME.value: [
        {"name": LLMModel.GPT_4_1_MINI, "model": LLMModel.GPT_4_1_MINI.value, "max_tokens": 10_000, "quality": "Basic"},
    ],
    TASK.CASE_STUDY.value: [
        {"name": LLMModel.GPT_4_1_MINI, "model": LLMModel.GPT_4_1_MINI.value, "max_tokens": 50_000, "quality": "Basic"},
        # {"name": LLMModel.CLAUDE_SONNET_4, "model": LLMModel.CLAUDE_SONNET_4.value, "max_tokens": 50_000, "quality": "Standard"},
    ],
    TASK.PRACTICE_PROBLEM.value: [
        {"name": LLMModel.CLAUDE_SONNET_4, "model": LLMModel.CLAUDE_SONNET_4.value, "max_tokens": 80_000, "quality": "Standard"},
    ],
    TASK.MERGE_EXTRACTIONS.value: [
        # {"name": LLMModel.CLAUDE_OPUS_4_1, "model": LLMModel.CLAUDE_OPUS_4_1.value, "max_tokens": 190_000, "quality": "Premium"},
        {"name": LLMModel.CLAUDE_SONNET_4, "model": LLMModel.CLAUDE_SONNET_4.value, "max_tokens": 900_000, "quality": "Standard"},
        # {"name": LLMModel.GPT_5, "model": LLMModel.GPT_5.value, "max_tokens": 400_000, "quality": "Premium"},
    ],
}

This allows users to select quality in the Step 4 workflow or final draft, while the platform routes to appropriate models behind the scenes, supporting evaluation and personalization without overwhelming them.
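As a rough sketch of how such tier-based selection could work: the dict below mimics the TASK_LLM_CONFIGS structure above with plain strings standing in for the TASK/LLMModel enums, and the function name `pick_config` is hypothetical.

```python
# Minimal stand-in for a TASK_LLM_CONFIGS-style structure.
TASK_CONFIGS = {
    "chapter_outline": [
        {"model": "openai/gpt-5", "max_tokens": 350_000, "quality": "Premium"},
        {"model": "anthropic/claude-sonnet-4", "max_tokens": 900_000, "quality": "Standard"},
    ],
}

def pick_config(task: str, quality: str, fallback: str = "Standard") -> dict:
    """Return the first config matching the requested tier, else the fallback tier."""
    configs = TASK_CONFIGS[task]
    for tier in (quality, fallback):
        for cfg in configs:
            if cfg["quality"] == tier:
                return cfg
    return configs[0]  # last resort: first listed config for the task

cfg = pick_config("chapter_outline", "Premium")
```

The user only ever sees "Basic/Standard/Premium"; the model identifier stays an internal detail.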
I must frankly note that the project code remains tightly coupled to Google Gemini from its initial 2024 version. Last year, I attempted to refactor it with a general LLM abstraction layer, achieving partial success before shifting focus to other commitments. To enable flexibility, I recommend decoupling Gemini from business logic and implementing a general model wrapper layer (e.g., using OpenAI-compatible interfaces) to facilitate seamless integration of diverse providers.
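A minimal sketch of such a wrapper layer, assuming nothing about ALwrity's actual code (all class and function names here are illustrative): business logic depends only on an abstract interface, and Gemini, OpenRouter, etc. become interchangeable subclasses.

```python
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Provider-agnostic interface; concrete subclasses wrap a vendor SDK."""
    @abstractmethod
    def generate(self, prompt: str, **kwargs) -> str: ...

class EchoProvider(LLMProvider):
    """Stand-in provider used here for demonstration; a real subclass would
    call the Gemini SDK or an OpenAI-compatible client under the hood."""
    def generate(self, prompt: str, **kwargs) -> str:
        return f"echo: {prompt}"

def run_task(provider: LLMProvider, prompt: str) -> str:
    # Business logic sees only the interface, never a vendor import.
    return provider.generate(prompt)

out = run_task(EchoProvider(), "hello")
```

Swapping providers then becomes a configuration change rather than a refactor.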
Regarding your question on prioritization: Yes, starting with fine-tuning based on end-user digital assets and analytics using Unsloth makes strategic sense, leveraging your onboarding data collection for efficient GPT-like model customization. This can then transition to experimentation and routing via OpenRouter or ADK, bypassing LiteLLM if it adds unnecessary complexity. This path aligns with environmental and cost considerations by reserving larger models for high-value tasks while using fine-tuned SLMs for mundane ones.


@AJaySi commented on GitHub (Oct 15, 2025):

Hello @doncat99

Thank you for the intellectually stimulating dialogue and for bearing with my monologues.

1). "This can be achieved by mapping models to intuitive quality tiers (e.g., "Basic," "Standard," "Premium") based on factors like accuracy, speed, and cost."

  • Should the mapping be with the content lifecycle phase and the models suited for it? For example, ALwrity divides the content lifecycle into: content planning (strategy, research, competitor analysis, calendar generation), content generation, publish, analytics, engage, and remarket.

  • For research, we depend on Google grounding, Exa, and Tavily and feed the results to the LLM; thus the quality of the online research context matters more than LLM selection, if we agree that better context yields better results from most AI models.

  • Across the above six phases, your mapping with quality tiers makes the most sense in the content generation phase.

  • Request you to please review https://github.com/AJaySi/ALwrity/issues/287 and we can shift focus to OpenRouter. @Ratna-Babu has been a great addition, and I request him to take OpenRouter support on priority, pretty please. @doncat99, I would request your guidance to Ratna on "decoupling Gemini from business logic and implementing a general model wrapper layer".
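The phase-based mapping raised in the first bullet above could be sketched as follows; the phase names come from the comment, while the model identifiers are invented placeholders, not real ALwrity models.

```python
from typing import Optional

# Hypothetical phase-to-model routing table: the platform picks cheap SLMs
# for most phases, and only content generation is user-selectable.
PHASE_MODELS = {
    "planning": "slm/research-summarizer",
    "generation": "user-selected",
    "publish": "slm/formatter",
    "analytics": "slm/analytics",
    "engage": "slm/social-replies",
    "remarket": "slm/remarket",
}

def route(phase: str, user_choice: Optional[str] = None) -> str:
    """Resolve a lifecycle phase to a model, honoring user choice only for generation."""
    model = PHASE_MODELS[phase]
    if model == "user-selected":
        return user_choice or "default/model"
    return model
```

This keeps the "platform decides, except for the final draft" split explicit in one place.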


"I must frankly note that the project code remains tightly coupled to Google Gemini from its initial 2024 version."

My Bad, and I do feel ashamed about it.

  • I always thought that our end users are non-technical, so CLI and Streamlit just won't cut it. Content generation is just one phase, so platform integration and AI SEO tools took priority. I am bad at programming and worse at digital marketing, so there has been a lot of learning since then.
  • Gemini, being free and easier to experiment and get started with, has been the default. It's time to move on.
  • Empirically, the same set of prompts didn't give the same results with different AI models in 2024, but that's no longer true. There is code, and tracing 'GPT_Provider' yields nascent/basic thoughts on decoupling Gemini.
  • I felt better prompts were needed to abstract away the need for end user prompting, contextual generation, etc.
  • Now in 2025, I am really struggling with the React + FastAPI migration and providing a usable UI. Ultimately, our end users need a webapp/SaaS, and that has been the priority.
  • You have interacted with @uniqueumesh, and in 2025 we have @Om-Singh1808 and @Ratna-Babu; things are looking better, and they will correct me when I go wrong. I really hope you stick around this time and be a part of it, even by simply commenting and giving directions. They are invaluable to ALwrity.

Thank you so much & Regards.


@Ratna-Babu commented on GitHub (Oct 15, 2025):

I will check #299 and merge it with #288 if possible. After completing it, I will take a look at this.


@Om-Singh1808 commented on GitHub (Oct 15, 2025):

I have checked out everything here and I am ready to work on making ALwrity better.
Special thanks to @doncat99 for helping us with this.


@doncat99 commented on GitHub (Oct 15, 2025):

Indeed, I prefer Google Grounding as well, and I prefer Google ADK and the A2A protocol for multi-agent communication.

Below is my agent code for your reference.

```python
# utilities/base/base_agent.py
from __future__ import annotations
import os
import pickle
import hashlib
from typing import Optional, List, Callable, Any
from pathlib import Path
from functools import wraps
import asyncio

from openinference.instrumentation.google_adk import GoogleADKInstrumentor
from google.adk.agents import LlmAgent as Agent
from google.adk.models.lite_llm import LiteLlm
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types
import litellm

from config import config

# set OPENINFERENCE_DISABLED=1 to disable tracing if needed
if not config.OPENINFERENCE_DISABLED:
    GoogleADKInstrumentor().instrument()

# optional: verbose HTTP/debug for LiteLLM routing
if config.DEBUG:
    try:
        litellm._turn_on_debug()
    except Exception:
        pass


def pickle_cache(cache_subdir: str):
    """Decorator to cache function results using pickle files on disk."""
    def decorator(func):
        @wraps(func)
        async def wrapper(self, *args, **kwargs):
            if not getattr(self, "use_cache", False) or not getattr(self, "cache_dir", None):
                return await func(self, *args, **kwargs)
            cache_dir = Path(self.cache_dir) / cache_subdir
            cache_dir.mkdir(parents=True, exist_ok=True)
            key = hashlib.sha256(pickle.dumps((args, kwargs))).hexdigest()
            fp = cache_dir / f"{key}.pkl"
            if fp.exists():
                return pickle.loads(fp.read_bytes())
            result = await func(self, *args, **kwargs)
            if result is not None:
                fp.write_bytes(pickle.dumps(result))
            return result
        return wrapper
    return decorator


class BaseAgent:
    """
    ADK-based base agent that supports dynamic tool updates.

    Key points:
    - Builds one Agent/Runner up-front.
    - `set_tools` tries to update tools IN-PLACE (`self.agent.tools = ...`).
      If the underlying ADK version requires a rebuild, we fall back to `_build_agent_and_runner()`.
    - Safe in both sync and async environments (lazy session creation).
    """

    def __init__(
        self,
        agent_name: str,
        model_name: str,
        instruction: str,
        tools: Optional[List[Callable[..., Any]]] = None,
        app_name: str = "default_app",
        user_id: str = "default_user",
        session_id: str = "default_session",
        use_cache: bool = True,
        cache_dir: str = ".cache",
    ):
        self.app_name = app_name
        self.user_id = user_id
        self.session_id = session_id
        self.use_cache = use_cache
        self.cache_dir = cache_dir

        # Persist config for rebuilds or in-place updates
        self._agent_name = agent_name
        self._model_name = model_name
        self._instruction = instruction

        # The current tool list we consider "source of truth"
        self.tools: List[Callable[..., Any]] = list(tools or [])

        # In-memory sessions are perfect for local/dev
        self.session_service = InMemorySessionService()
        self._session_task: Optional[asyncio.Task] = None

        # Create or schedule the session
        try:
            loop = asyncio.get_running_loop()
        except RuntimeError:
            # No running loop → safe to block synchronously
            asyncio.run(self._create_session())
        else:
            # Loop running → schedule and await later in run_async()
            self._session_task = loop.create_task(self._create_session())

        # Build agent/runner once
        self._build_agent_and_runner()

    async def _create_session(self):
        # idempotent; ADK will handle duplicates
        await self.session_service.create_session(
            app_name=self.app_name,
            user_id=self.user_id,
            session_id=self.session_id,
        )

    def _normalize_model(self, model_name: str) -> str:
        """Normalize to OpenRouter-style id once."""
        return model_name if model_name.startswith("openrouter/") else f"openrouter/{model_name}"

    def _build_agent_and_runner(self) -> None:
        """(Re)build Agent + Runner from current config + self.tools."""
        model_id = self._normalize_model(self._model_name)
        self.agent = Agent(
            name=self._agent_name,
            model=LiteLlm(
                model=model_id,
                api_key=config.OPEN_ROUTER_API_KEY,
                api_base=config.OPEN_ROUTER_API_BASE,
            ),
            instruction=self._instruction,
            tools=self.tools,
        )
        self.runner = Runner(agent=self.agent, app_name=self.app_name, session_service=self.session_service)

    # ------------------------------------------------------------------
    # Tool management
    # ------------------------------------------------------------------

    def set_tools(self, tools: List[Callable[..., Any]]) -> None:
        """
        Replace the tool list.
        1) Try to update in-place (`self.agent.tools = tools`).
        2) If ADK or the pydantic model refuses, rebuild agent/runner.
        """
        self.tools = list(tools or [])
        try:
            # Many ADK versions allow this (pydantic model kept mutable for lists)
            self.agent.tools = self.tools
        except Exception:
            # Fall back to a full rebuild if in-place update isn't supported
            self._build_agent_and_runner()

    def add_tool(self, tool: Callable[..., Any]) -> None:
        """Append a tool; prefer in-place update, rebuild only if needed."""
        if not callable(tool):
            raise TypeError("Tool must be callable")
        self.tools.append(tool)
        try:
            self.agent.tools = self.tools
        except Exception:
            self._build_agent_and_runner()

    def remove_tool(self, tool: Callable[..., Any]) -> None:
        """Remove a tool if present."""
        try:
            self.tools.remove(tool)
        except ValueError:
            return
        try:
            self.agent.tools = self.tools
        except Exception:
            self._build_agent_and_runner()

    # ------------------------------------------------------------------
    # Run
    # ------------------------------------------------------------------

    # @pickle_cache(cache_subdir="run_async")
    async def run_async(self, message_text: str) -> Optional[str]:
        # If session creation was scheduled at __init__, await it now
        if self._session_task and not self._session_task.done():
            await self._session_task

        user_msg = types.Content(role="user", parts=[types.Part(text=message_text)])
        final_text = None
        try:
            async for event in self.runner.run_async(
                user_id=self.user_id,
                session_id=self.session_id,
                new_message=user_msg,
            ):
                if event.is_final_response():
                    try:
                        final_text = event.content.parts[0].text
                    except Exception:
                        final_text = None
            return final_text
        except Exception as e:
            print(f"Error running agent: {e}")
            return None
```
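The `pickle_cache` decorator in the snippet is self-contained and can be exercised without ADK. Below is a minimal sketch; the `FakeAgent` class is a stand-in invented here for illustration, not part of the code above.

```python
import asyncio
import hashlib
import pickle
import tempfile
from functools import wraps
from pathlib import Path

def pickle_cache(cache_subdir: str):
    """Same disk-caching pattern as in the snippet above."""
    def decorator(func):
        @wraps(func)
        async def wrapper(self, *args, **kwargs):
            if not getattr(self, "use_cache", False) or not getattr(self, "cache_dir", None):
                return await func(self, *args, **kwargs)
            cache_dir = Path(self.cache_dir) / cache_subdir
            cache_dir.mkdir(parents=True, exist_ok=True)
            # Key is a hash of the pickled arguments (self excluded)
            key = hashlib.sha256(pickle.dumps((args, kwargs))).hexdigest()
            fp = cache_dir / f"{key}.pkl"
            if fp.exists():
                return pickle.loads(fp.read_bytes())
            result = await func(self, *args, **kwargs)
            if result is not None:
                fp.write_bytes(pickle.dumps(result))
            return result
        return wrapper
    return decorator

class FakeAgent:
    """Hypothetical stand-in for BaseAgent, just to demo the cache."""
    def __init__(self, cache_dir: str):
        self.use_cache = True
        self.cache_dir = cache_dir
        self.calls = 0  # counts real (uncached) executions

    @pickle_cache("run_async")
    async def run_async(self, message_text: str) -> str:
        self.calls += 1
        return message_text.upper()

agent = FakeAgent(tempfile.mkdtemp())
first = asyncio.run(agent.run_async("hello"))   # computed, written to .pkl
second = asyncio.run(agent.run_async("hello"))  # served from the .pkl file
```

Note that the key hashes only the call arguments, so two decorated methods must use different `cache_subdir` values to avoid colliding on identical arguments.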

@AJaySi commented on GitHub (Oct 16, 2025):

You should also check out Metaphor websets and Tavily AI. ALwrity uses Google grounding for SERP analysis.

@uniqueumesh and I experimented with CrewAI back when ADK, A2A, MCP, et al. were not even coined. It's gathering dust somewhere in the codebase. The following are my reasons for agent-framework loathing:

  • If an AI agent is AI + tool calling, and collaboration among agents gives you an AI agent team, then what am I doing as a software engineer?

  • I am required to build my own tooling, prompt-engineer the agents, and then throw a problem at them, hoping for the best (out-of-tokens). It's like a black box. Of course, iteratively, I will tweak each agent to shut up, talk to the other agents, get this from that agent, also talk to the manager, and if that's not good enough, do HITL as well. That is too much uncertainty for me, and overkill.

  • I like my software to be deterministic, IO-driven, phased, and iteratively built, where each step produces a result I can predict and pass on. That is easily achieved when we build software for the end users, not for the AI agents.

  • Agent frameworks are too chatty. I have not come across an AI agent implementation that could not have been done with plain glue coding: passing the result from one AI to another, getting context from tooling, and moving on to the next step of shitty code.

  • A simple example: 'Hey, agents, write me an SEO-optimized blog on AI agents that will rank in the top 10.' Any AI will NOT refuse and will give you an answer; an AI agent team will work 100x, chatting and convincing you the post will go viral (and I can fly without my paraglider...).

Without an AI framework, we need to do web research, get the target audience, pull GSC and Bing analytics, check existing blogs, find trending topics, analyze competitors, run fact/hallucination checks, outline, draft, generate SEO metadata, produce supporting articles for other social media platforms, publish, analyze/monitor, edit/update, and remarket. To get all of the above done, AI is only 20%; the other 80% is the existing shitty glue code the world is used to. So the idea is to make ALwrity a platform more intelligent than its AI: we only need AI to talk back in human and machine languages, reasoning lives in the algorithms, and tool calling is shitty glue code.

ALwrity is an environmentally conscious, AI-first team; we will not give an extra joule of energy to AI and will starve them whenever possible. We will fine-tune our own SLMs and glue them together, without them even knowing there is another AI in the workflow: let one small AI do one small thing, very well.

I will be working on this soon: https://openrouter.ai/docs/use-cases/oauth-pkce
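For reference, the PKCE handshake that linked doc describes starts with a code_verifier/code_challenge pair. A minimal sketch follows; the callback URL is a placeholder invented here, and the exact parameter names should be confirmed against the OpenRouter docs.

```python
import base64
import hashlib
import secrets
import urllib.parse

def make_pkce_pair():
    """Generate an S256 code_verifier/code_challenge pair (RFC 7636)."""
    # 32 random bytes → 43-char base64url string, padding stripped
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge

verifier, challenge = make_pkce_pair()

# The user is redirected to openrouter.ai/auth carrying only the challenge;
# the callback URL below is a hypothetical placeholder.
auth_url = "https://openrouter.ai/auth?" + urllib.parse.urlencode({
    "callback_url": "https://alwrity.example/callback",
    "code_challenge": challenge,
    "code_challenge_method": "S256",
})
# The code returned to the callback is later exchanged, together with the
# verifier, for a user-scoped API key.
```

Keeping the verifier server-side (or in the client only until the exchange) means the key never has to be pasted by hand, which fits the non-technical end users discussed earlier in the thread.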


@AJaySi commented on GitHub (Oct 30, 2025):

Ok, so I will be supporting the Hugging Face Responses API: https://huggingface.co/docs/inference-providers/guides/responses-api

Note: This is implemented in the AI blog writer and committed; I will now change the onboarding process to use it.
Use: HF_TOKEN and GPT_PROVIDER=gemini|huggingface_response_api

"""
The Responses API (from OpenAI) provides a unified interface for model interactions with Hugging Face Inference Providers. Use your existing OpenAI SDKs to access features like multi-provider routing, event streaming, structured outputs, and Remote MCP tools.
"""
