AIM Operational Modes
The ApiInteractionManager (AIM) operates in different modes, dictating how it configures the primary agent, its tools, and its system prompt. Understanding these modes is key to leveraging AIM effectively, whether you use it directly or via the AgentB facade (which infers the mode).
Currently, the primary supported modes are:
genericOpenApi
hierarchicalPlanner
(An older mode, toolsetsRouter, is conceptually evolving into hierarchicalPlanner as the latter provides a more robust and explicit way to achieve delegation.)
1. genericOpenApi Mode
Purpose: To allow an agent to interact with a single API defined by an OpenAPI specification. The agent can use tools generated for specific API operations, a fallback "generic HTTP request" tool, or both.
Configuration (ApiInteractionManagerOptions):
mode: 'genericOpenApi'
genericOpenApiProviderConfig: OpenAPIConnectorOptions
  You provide the OpenAPIConnectorOptions, which include the API spec (or a URL to it), authentication details, etc.
  Crucially, set sourceId in these options (e.g., sourceId: 'myApiV1').
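As a rough illustration, the configuration above might look like the following. This is a minimal sketch: the interfaces are declared locally to keep the example self-contained, and the specUrl field name and the example URL are assumptions, not confirmed library API. Check the library's exported types for the real shape of OpenAPIConnectorOptions.

```typescript
// Hypothetical local types mirroring the option names used in this section.
// The real AgentB type definitions may differ.
interface OpenAPIConnectorOptions {
  sourceId: string; // identifies this API source
  specUrl?: string; // assumed field name for "URL to the spec"
  tagFilter?: string; // optional: restrict tools to a single tag
  includeGenericToolIfNoTagFilter?: boolean;
}

interface GenericOpenApiOptions {
  mode: 'genericOpenApi';
  genericOpenApiProviderConfig: OpenAPIConnectorOptions;
}

const aimOptions: GenericOpenApiOptions = {
  mode: 'genericOpenApi',
  genericOpenApiProviderConfig: {
    sourceId: 'myApiV1',
    specUrl: 'https://example.com/openapi.json', // placeholder URL
  },
};
```

Leaving tagFilter unset here corresponds to the "all operations plus generic tool" behavior described under Agent Setup.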
Agent Setup:
Primary Agent: Typically BaseAgent (or your custom agentImplementation if provided).
Tool Provider: A single OpenAPIConnector instance configured with genericOpenApiProviderConfig.
Available Tools:
  If genericOpenApiProviderConfig.tagFilter is not set (or includeGenericToolIfNoTagFilter is true, which is the default):
    The agent gets tools for all operations in the OpenAPI spec.
    It also gets the GenericHttpApiTool (named genericHttpRequest). This allows the LLM to make arbitrary calls to the API if it can't find a specific operation tool or prefers a more direct approach.
  If genericOpenApiProviderConfig.tagFilter is set:
    The agent only gets tools for operations matching that specific tag.
    The GenericHttpApiTool is typically not added in this case, as the intent is to focus on a subset of operations.
System Prompt: Generated by generateGenericHttpToolSystemPrompt. It instructs the LLM on how to use the available operation-specific tools and/or the genericHttpRequest tool, listing details of the API's operations.
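The tool-selection rules can be sketched as a small function. This is one reading of the rules described above, not the connector's actual implementation: the Operation shape and the selectToolNames helper are hypothetical, and only the option names tagFilter and includeGenericToolIfNoTagFilter and the tool name genericHttpRequest come from this page.

```typescript
// Hypothetical, simplified view of an OpenAPI operation.
interface Operation {
  operationId: string;
  tags: string[];
}

interface ToolSelectionConfig {
  tagFilter?: string;
  includeGenericToolIfNoTagFilter?: boolean; // treated as true when omitted
}

// Returns the names of the tools the agent would see under this reading.
function selectToolNames(ops: Operation[], cfg: ToolSelectionConfig): string[] {
  if (cfg.tagFilter !== undefined) {
    // Tag filter set: only matching operations, no generic tool.
    const tag = cfg.tagFilter;
    return ops
      .filter((op) => op.tags.some((t) => t === tag))
      .map((op) => op.operationId);
  }
  // No tag filter: every operation, plus the generic tool unless disabled.
  const names = ops.map((op) => op.operationId);
  if (cfg.includeGenericToolIfNoTagFilter ?? true) {
    names.push('genericHttpRequest');
  }
  return names;
}
```

For example, calling selectToolNames with no tagFilter on a spec containing getOrderHistory and getProductDetails yields both operation tools followed by genericHttpRequest.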
Use Cases:
Building an agent that acts as a natural language interface to a single, well-defined API (e.g., "Hey agent, get my user profile from the User API").
When you want the LLM to have broad access to an API's capabilities, including potentially calling endpoints that don't have specific, fine-grained tools generated for them.
Example Flow:
User: "Fetch my order history for the last month."
AIM (in genericOpenApi mode for an e-commerce API):
  LLM sees tools like getOrderHistory, getProductDetails, and genericHttpRequest.
  LLM decides to call getOrderHistory with {"period": "last_month"}.
  OpenAPIConnector executes the actual API call for getOrderHistory.
  Result is returned to LLM, which then formulates a user-friendly response.
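The dispatch step in this flow, where the LLM's chosen tool call is routed to the matching tool, could look roughly like this. The ToolCall shape, the toolRegistry object, and the stubbed tool bodies are all hypothetical stand-ins; a real connector would issue HTTP requests asynchronously rather than return canned strings.

```typescript
// Hypothetical shape of a tool call emitted by the LLM.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

type ToolFn = (args: Record<string, unknown>) => string;

// Stand-ins for tools that would be generated from the OpenAPI spec.
const toolRegistry: Record<string, ToolFn> = {
  getOrderHistory: (args) => `orders for period=${String(args['period'])}`,
  getProductDetails: (args) => `details for product=${String(args['productId'])}`,
};

// Looks up the requested tool and runs it; unknown names produce an error
// string that can be handed back to the LLM instead of throwing.
function executeToolCall(call: ToolCall): string {
  const known = Object.prototype.hasOwnProperty.call(toolRegistry, call.name);
  const tool = known ? toolRegistry[call.name] : undefined;
  if (!tool) {
    return `Error: unknown tool "${call.name}"`;
  }
  return tool(call.args);
}

const orderResult = executeToolCall({
  name: 'getOrderHistory',
  args: { period: 'last_month' },
});
```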
2. hierarchicalPlanner Mode
Purpose: To enable a primary "planning" agent to break down complex, multi-step tasks and delegate sub-tasks to specialized "worker" agents. Each worker agent is equipped with a focused set of tools (an IToolSet).
Configuration (ApiInteractionManagerOptions):
mode: 'hierarchicalPlanner'
toolsetOrchestratorConfig: ToolProviderSourceConfig[]
  An array defining multiple sources of tools (e.g., different OpenAPI specs, or different tag-based groupings from a single large spec).
  The ToolsetOrchestrator uses these configs to create various IToolSet instances. Each IToolSet represents the capabilities of a potential "specialist" or "worker" agent.
agentImplementation?: new () => IAgent
  If not provided or set to BaseAgent, AIM typically defaults to using PlanningAgent as the primary agent.
  You can explicitly provide PlanningAgent or your own custom planning agent class here.
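The agentImplementation defaulting rule can be sketched as follows. The classes and interfaces here are local stand-ins named after the ones on this page, the fields of ToolProviderSourceConfig are assumed, and resolvePrimaryAgent is a hypothetical helper that illustrates the described behavior rather than AIM's actual code.

```typescript
// Local stand-ins for the types named in this section.
interface IAgent {
  readonly kind: string;
}
class BaseAgent implements IAgent {
  readonly kind = 'base';
}
class PlanningAgent implements IAgent {
  readonly kind = 'planner';
}

// Assumed fields; the real ToolProviderSourceConfig may differ.
interface ToolProviderSourceConfig {
  id: string;
  specUrl: string;
  tagFilter?: string;
}

interface HierarchicalPlannerOptions {
  mode: 'hierarchicalPlanner';
  toolsetOrchestratorConfig: ToolProviderSourceConfig[];
  agentImplementation?: new () => IAgent;
}

// Omitted or BaseAgent falls back to PlanningAgent; anything else is kept.
function resolvePrimaryAgent(opts: HierarchicalPlannerOptions): new () => IAgent {
  const impl = opts.agentImplementation;
  return impl === undefined || impl === BaseAgent ? PlanningAgent : impl;
}

const plannerOptions: HierarchicalPlannerOptions = {
  mode: 'hierarchicalPlanner',
  toolsetOrchestratorConfig: [
    { id: 'news', specUrl: 'https://example.com/news.json' }, // placeholder
    { id: 'social', specUrl: 'https://example.com/social.json' }, // placeholder
  ],
};
```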
Agent Setup:
Primary Agent:
  Usually PlanningAgent. This agent is designed to reason about a task, break it into steps, and delegate.
  Its system prompt is typically DEFAULT_PLANNER_SYSTEM_PROMPT, guiding it on this planning and delegation process.
Tool Provider for Planner: The PlanningAgent is primarily given one main tool: the DelegateToSpecialistTool (internally named something like delegateToSpecialistAgent).
DelegateToSpecialistTool:
  When the PlanningAgent calls this tool, it specifies:
    specialistId: The ID of the IToolSet (i.e., the specialist) to delegate to.
    subTaskDescription: A clear instruction for the specialist.
    requiredOutputFormat (optional): How the planner wants the result.
  The DelegateToSpecialistTool then:
    Retrieves the specified IToolSet from the ToolsetOrchestrator.
    Instantiates a temporary "worker" agent (e.g., BaseAgent).
    Equips this worker agent only with the tools from the chosen IToolSet.
    Runs the worker agent with the subTaskDescription.
    Returns the worker agent's final result back to the PlanningAgent.
Worker Agents: These are not long-lived. They are instantiated on demand by the DelegateToSpecialistTool for a single sub-task. They operate with a focused set of tools and their own isolated context.
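The delegation steps above can be sketched in a few lines. Everything here is a simplified stand-in: the Tool and IToolSet shapes, the WorkerAgent class, and the delegateToSpecialist function are hypothetical, and the worker simply runs each of its tools once instead of letting an LLM choose. The sketch is also synchronous for brevity, whereas a real worker run would be asynchronous.

```typescript
// Hypothetical minimal shapes; the library's interfaces are richer.
interface Tool {
  name: string;
  execute(input: string): string;
}
interface IToolSet {
  id: string;
  tools: Tool[];
}
interface DelegateInput {
  specialistId: string;
  subTaskDescription: string;
  requiredOutputFormat?: string;
}

// Short-lived worker: it can only see the tools it was constructed with.
class WorkerAgent {
  private readonly tools: Tool[];
  constructor(tools: Tool[]) {
    this.tools = tools;
  }
  run(task: string): string {
    // Stand-in for an LLM loop: run every available tool on the task.
    return this.tools.map((t) => t.execute(task)).join('\n');
  }
}

function delegateToSpecialist(
  toolsets: Record<string, IToolSet>,
  input: DelegateInput,
): string {
  // 1. Retrieve the requested toolset.
  const known = Object.prototype.hasOwnProperty.call(toolsets, input.specialistId);
  const toolset = known ? toolsets[input.specialistId] : undefined;
  if (!toolset) {
    return `Error: no specialist with id "${input.specialistId}"`;
  }
  // 2-3. Create a fresh worker equipped only with that toolset's tools.
  const worker = new WorkerAgent(toolset.tools);
  // 4. Run it on the sub-task, passing along the requested output format.
  const task = input.requiredOutputFormat
    ? `${input.subTaskDescription}\nOutput format: ${input.requiredOutputFormat}`
    : input.subTaskDescription;
  // 5. Hand the worker's final result back to the planner.
  return worker.run(task);
}
```

Because a new WorkerAgent is created on every call, no state carries over between sub-tasks, which mirrors the isolated-context behavior described for worker agents.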
Use Cases:
Handling complex user requests that require multiple steps or capabilities from different APIs/domains (e.g., "Book a flight to Paris for next Monday, find a hotel near the Eiffel Tower, and add both to my calendar.").
Organizing a large number of available tools into manageable, logical groups (specialists) so the primary planning LLM isn't overwhelmed.
Improving reliability and reasoning by focusing worker agents on narrower tasks.
Example Flow:
User: "Find the latest news about AI, summarize the top 3 articles, and then draft a tweet about the most interesting one."
AIM (in hierarchicalPlanner mode):
  PlanningAgent (Primary Agent) receives the request.
    Thought: "I need to first find news. The 'NewsSearchSpecialist' can do this."
    Action: Calls delegateToSpecialistAgent with specialistId: 'NewsSearchSpecialist_ID', subTaskDescription: "Find recent top news articles about AI".
  DelegateToSpecialistTool executes:
    Instantiates a worker agent with tools from NewsSearchSpecialist_ID (e.g., a webSearchTool).
    Worker agent runs, uses webSearchTool, gets news articles.
    Worker agent returns a list of articles.
  PlanningAgent receives the list of articles.
    Thought: "Now I need to summarize. The 'TextSummarizationSpecialist' is good for this."
    Action: Calls delegateToSpecialistAgent with specialistId: 'TextSummarizationSpecialist_ID', subTaskDescription: "Summarize these articles: [article data...]".
  DelegateToSpecialistTool executes again with the summarization specialist.
  PlanningAgent receives summaries.
    Thought: "Now draft a tweet. The 'SocialMediaDraftingSpecialist' can help."
    Action: Calls delegateToSpecialistAgent...
  Finally, the PlanningAgent assembles the final response for the user.
Choosing the Right Mode
Simple, single-API interaction? genericOpenApi is often a good starting point.
Complex tasks needing multiple steps, different types of tools, or managing many API capabilities? hierarchicalPlanner is more powerful and scalable. It promotes better task decomposition and reasoning.
The AgentB facade attempts to choose an appropriate default mode (usually hierarchicalPlanner if multiple ToolProviderSourceConfigs are registered, or genericOpenApi if only one). However, for explicit control and clarity in more complex applications, instantiating ApiInteractionManager directly with your chosen mode is recommended.
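The facade's default choice described above amounts to a rule of thumb along these lines. The inferMode helper is hypothetical and only encodes the "one source versus several" heuristic stated here; the facade's real logic may weigh other factors, and rejecting zero sources is an assumption of this sketch.

```typescript
type AimMode = 'genericOpenApi' | 'hierarchicalPlanner';

// Picks a default mode from the number of registered tool sources.
function inferMode(registeredSourceCount: number): AimMode {
  if (registeredSourceCount < 1) {
    // Assumption: with no sources there is nothing to build an agent around.
    throw new Error('At least one tool source must be registered');
  }
  return registeredSourceCount > 1 ? 'hierarchicalPlanner' : 'genericOpenApi';
}
```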