Reduce Token Usage & Optimize Costs

Agent-CoreX cuts costs by 30–70% by eliminating unnecessary tool schemas from your LLM context. Learn how to maximize savings.

The Cost Problem

Traditional AI agents include ALL tools in every request:
Agent thinking → LLM prompt + 100 tool schemas (50k tokens) → LLM response
Cost: $0.75 per request

With Agent-CoreX:
Agent thinking → Query Agent-CoreX → Get 5 relevant tools (5k tokens) → LLM response
Cost: $0.20 per request (73% savings!)

How Agent-CoreX Saves Tokens

  1. Dynamic Selection - Only relevant tools in context
  2. Semantic Matching - Find the right tools without knowing their names
  3. Caching - Reuse tool lists when possible
  4. Batching - Combine similar queries
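
To make the first two points concrete, here is a toy sketch of dynamic selection. It is not how Agent-CoreX ranks tools internally: plain keyword overlap stands in for semantic matching, and the `catalog`, `tokenize`, and `selectTools` names are hypothetical.

```javascript
// Toy illustration only: keyword overlap stands in for semantic matching.
// The catalog entries and the scoring rule are hypothetical.
const catalog = [
  { name: "github_create_pr", description: "Create a pull request on GitHub" },
  { name: "aws_deploy_lambda", description: "Deploy a function to AWS Lambda" },
  { name: "slack_send_message", description: "Send a notification message to Slack" },
];

// Split text into a set of lowercase alphanumeric words.
function tokenize(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) || []);
}

// Score each tool by how many query words appear in its description,
// drop tools with no overlap, and keep the topK best matches.
function selectTools(query, tools, topK) {
  const queryWords = tokenize(query);
  return tools
    .map((tool) => {
      const words = tokenize(tool.description);
      let overlap = 0;
      for (const word of queryWords) {
        if (words.has(word)) overlap += 1;
      }
      return { tool, score: overlap };
    })
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((entry) => entry.tool);
}

const selected = selectTools("open pull request GitHub", catalog, 2);
console.log(selected.map((tool) => tool.name));
```

Only the schemas in `selected` would be sent to the LLM, which is where the token savings come from.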

Cost Optimization Strategies

1. Use Specific Queries

// ❌ Vague (uses more tokens)
await agent.retrieveTools({
  query: "do something with GitHub"
});

// ✅ Specific (fewer tokens)
await agent.retrieveTools({
  query: "Create a pull request on GitHub with code changes"
});
Savings: 15-20% per query

2. Set Lower top_k

// ❌ Request far more tools than you need (200 tokens)
await agent.retrieveTools({
  query: "Deploy",
  topK: 50
});

// ✅ Request what you need (50 tokens)
await agent.retrieveTools({
  query: "Deploy",
  topK: 3
});
Savings: 30-40% per query
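
One way to pick a sensible top_k is to measure how much context the retrieved schemas actually take up. The sketch below uses the common rule of thumb of about 4 characters per token, which is only an approximation and not a real tokenizer. The `estimateTokens` and `fitToBudget` helpers are hypothetical and not part of Agent-CoreX.

```javascript
// Rough heuristic: about 4 characters per token. This is an assumption
// for quick estimates; use your model's tokenizer for exact counts.
function estimateTokens(value) {
  const text = typeof value === "string" ? value : JSON.stringify(value);
  return Math.ceil(text.length / 4);
}

// Walk tools in ranked order and keep them until the token budget is spent.
function fitToBudget(rankedTools, maxTokens) {
  const kept = [];
  let used = 0;
  for (const tool of rankedTools) {
    const cost = estimateTokens(tool);
    if (used + cost > maxTokens) break;
    kept.push(tool);
    used += cost;
  }
  return { kept, used };
}
```

If `fitToBudget` regularly drops tools from the end of the list, that suggests top_k is set higher than the context budget can use.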

3. Use Server Filters

// ❌ Search all 100+ servers (higher cost)
await agent.retrieveTools({
  query: "Deploy"
});

// ✅ Filter to AWS only (lower cost)
await agent.retrieveTools({
  query: "Deploy",
  filter: {
    server: "aws-mcp"
  }
});
Savings: 20-25% per query

4. Cache Tool Lists

const toolCache = new Map();
const CACHE_TTL_MS = 60 * 60 * 1000; // 1 hour

async function getCachedTools(query) {
  // Serve from cache while the entry is younger than the TTL
  const cached = toolCache.get(query);
  if (cached && Date.now() - cached.time < CACHE_TTL_MS) {
    return cached.tools;
  }

  // Missing or expired: fetch fresh tools and store them with a timestamp
  const tools = await agent.retrieveTools({ query });
  toolCache.set(query, { tools, time: Date.now() });
  return tools;
}
Savings: 60-80% for repeated queries
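
A cache keyed on the raw query string misses whenever two queries differ only in case or spacing. The following sketch normalizes the key first. It assumes such queries should return the same tools, and `cacheKey` and `createToolCache` are hypothetical helpers. The fetch function is passed in by the caller (for example, a wrapper around `agent.retrieveTools`) and the clock is injectable so expiry can be tested.

```javascript
// Normalize a query so trivially different spellings share one cache entry.
function cacheKey(query) {
  return query.trim().toLowerCase().replace(/\s+/g, " ");
}

// fetchTools: caller-supplied async function that returns tools for a query.
// ttlMs: entry lifetime in milliseconds (defaults to 1 hour).
// now: clock function, injectable for testing.
function createToolCache(fetchTools, ttlMs = 3600000, now = Date.now) {
  const entries = new Map();
  return async function getTools(query) {
    const key = cacheKey(query);
    const hit = entries.get(key);
    if (hit && now() - hit.time < ttlMs) {
      return hit.tools;
    }
    // Missing or expired: fetch and overwrite the entry.
    const tools = await fetchTools(query);
    entries.set(key, { tools, time: now() });
    return tools;
  };
}
```

With this in place, "Deploy" and " deploy " resolve to the same entry, so only the first of them triggers a retrieval.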

5. Batch Similar Requests

// ❌ Three separate queries
const deployTools = await agent.retrieveTools({
  query: "Deploy"
});
const monitorTools = await agent.retrieveTools({
  query: "Monitor"
});
const notifyTools = await agent.retrieveTools({
  query: "Notify"
});

// ✅ One combined query
const allTools = await agent.retrieveTools({
  query: "Deploy, monitor, and notify"
});
Savings: 50-70% on this operation

Cost Calculation

Before Agent-CoreX

Scenario: ChatGPT agent with 100 tools

Per request:
- Prompt (2k tokens): $0.01
- 100 tool schemas (48k tokens): $0.72
- Completion (1k tokens): $0.03
Total: $0.76

1000 requests/month: $760

With Agent-CoreX

Per request:
- Prompt (2k tokens): $0.01
- Retrieve tools (200 tokens): $0.002
- 5 tool schemas (3k tokens): $0.04
- Completion (1k tokens): $0.03
Total: $0.082

1000 requests/month: $82
Savings: 89%!
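
The arithmetic above can be reproduced with a few lines of code, which makes it easy to substitute your own figures. The line items below are copied from the two breakdowns. The `perRequest` and `savingsPercent` helpers are illustrative names, not part of any Agent-CoreX API.

```javascript
// Per-request line items in USD, copied from the breakdowns above.
const before = { prompt: 0.01, toolSchemas: 0.72, completion: 0.03 };
const after = { prompt: 0.01, retrieval: 0.002, toolSchemas: 0.04, completion: 0.03 };

// Sum all line items for one request.
function perRequest(items) {
  return Object.values(items).reduce((sum, cost) => sum + cost, 0);
}

// Percentage saved relative to the baseline cost.
function savingsPercent(baseline, optimized) {
  return ((baseline - optimized) / baseline) * 100;
}

const beforeTotal = perRequest(before);
const afterTotal = perRequest(after);
const requestsPerMonth = 1000;

console.log(`Before: $${(beforeTotal * requestsPerMonth).toFixed(0)}/month`);
console.log(`After: $${(afterTotal * requestsPerMonth).toFixed(0)}/month`);
console.log(`Savings: ${savingsPercent(beforeTotal, afterTotal).toFixed(0)}%`);
```

Changing `requestsPerMonth` or any line item shows how the savings scale with your own traffic and pricing.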

Monitoring Your Costs

  1. Go to Dashboard → Usage
  2. View costs by server
  3. Track trends over time
  4. Set budget alerts
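
If you also want a budget check inside your own application, a small local tracker can complement the dashboard alerts. This is a minimal sketch: `createBudgetTracker` is a hypothetical helper, the 80% warning threshold is an arbitrary default, and nothing here talks to the Agent-CoreX dashboard.

```javascript
// Hypothetical local tracker; it does not call any Agent-CoreX API.
// monthlyBudgetUsd: spending limit for the month.
// alertAtFraction: fraction of the budget at which to start warning.
function createBudgetTracker(monthlyBudgetUsd, alertAtFraction = 0.8) {
  let spent = 0;
  return {
    // Add the cost of one request and return the updated status.
    record(costUsd) {
      spent += costUsd;
      return this.status();
    },
    // Report total spend and whether it is ok, near, or over the budget.
    status() {
      const fraction = spent / monthlyBudgetUsd;
      let level = "ok";
      if (fraction >= 1) level = "over";
      else if (fraction >= alertAtFraction) level = "warning";
      return { spent, fraction, level };
    },
  };
}
```

Call `record` with the estimated cost of each request and act on the returned `level`, for example by logging a warning or pausing non-essential jobs.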

Cost Optimization Checklist

Quick Wins

✅ Be specific in queries
✅ Reduce top_k to 3-5
✅ Cache tool lists
✅ Filter by server

Advanced

✅ Batch requests
✅ Implement rate limiting
✅ Use scheduled jobs
✅ Monitor trends
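
For the rate limiting item, one common approach is a token bucket placed in front of your retrieval calls. The sketch below is a generic client-side limiter. `createRateLimiter` is a hypothetical helper, not an Agent-CoreX feature, and the clock is injectable so the behavior can be tested deterministically.

```javascript
// Token bucket: allows bursts of up to `capacity` calls, then refills
// at `refillPerSecond`. `now` returns the current time in milliseconds.
function createRateLimiter(capacity, refillPerSecond, now = Date.now) {
  let tokens = capacity;
  let last = now();
  return function tryAcquire() {
    // Refill based on the time elapsed since the previous call.
    const current = now();
    const elapsedSeconds = (current - last) / 1000;
    tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
    last = current;
    if (tokens < 1) return false; // Out of budget: caller should wait or skip.
    tokens -= 1;
    return true;
  };
}
```

Check `tryAcquire()` before each retrieval and fall back to a cached tool list when it returns `false`.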

Real-World Example

A customer reduced costs from $2,400/month to $280/month:
  1. Implemented caching (40% savings)
  2. Reduced top_k from 20 to 5 (30% savings)
  3. Used server filters (15% savings)
  4. Combined similar queries (20% savings)
Total: about 88% cost reduction