orchagent enforces rate limits to ensure fair usage across the platform.
| Tier | Daily Calls | Concurrent Requests |
|---|---|---|
| Free | 1,000 | 10 |
| Pro | 10,000 | 50 |
| Enterprise | Custom | Custom |
## Sandbox Agent Limits
Code runtime and managed loop agents have separate limits for compute time (sandbox execution):
| Tier | Daily Calls | Max Timeout | Compute Hours |
|---|---|---|---|
| Free | 50 | 30s | Included |
| Pro | 500 | 5min | Included |
| Enterprise | Custom | Custom | Custom |
Sandbox agent limits are separate from direct LLM agent limits. You can use both with their respective quotas.
Check your compute usage:
```bash
# Via CLI
orch usage --compute

# Via API
GET /usage/compute
```
## Paid Calls
Paid agent calls (where credits are charged) bypass free tier daily limits and use a separate abuse protection cap of 100,000 calls/day. This prevents paid usage from being blocked by free tier limits.
## How Limits Are Counted
### Top-Level Calls
Each call to an agent counts as 1 call against your daily limit:
```bash
orch run acme/summarizer --data '{"text": "..."}'  # +1 call
```
### Orchestrator Calls
When you call an orchestrator that calls other agents, only the top-level call counts:
```
You → security-review → leak-finder
                      → vuln-scanner
                      → license-checker

Your daily count: +1 (not +4)
```
The orchestrator’s sub-calls are handled internally and don’t count against your limit.
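The counting rule above can be illustrated with a minimal sketch (the structure and names here are hypothetical, not the platform's internal representation):

```python
# One top-level call fans out to three sub-agents. Only the
# top-level entry is billed to the caller; the orchestrator's
# internal sub-calls are not.
tree = {"security-review": ["leak-finder", "vuln-scanner", "license-checker"]}

total_calls = 1 + sum(len(subs) for subs in tree.values())  # calls executed: 4
billed = len(tree)                                          # calls billed to you: 1
```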
## Rate Limit Headers
Every response includes headers showing your limit status:
```
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 995
X-RateLimit-Reset: 1704067200
```
| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Your daily limit |
| `X-RateLimit-Remaining` | Remaining calls today |
| `X-RateLimit-Reset` | Unix timestamp when the limit resets |
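As a sketch, these headers can be turned into a usable summary client-side (the helper name and return shape here are illustrative, not part of the API):

```python
from datetime import datetime, timezone

def parse_rate_limit(headers: dict) -> dict:
    """Summarize X-RateLimit-* headers: limit, remaining, calls used,
    and the reset time as an aware UTC datetime."""
    limit = int(headers["X-RateLimit-Limit"])
    remaining = int(headers["X-RateLimit-Remaining"])
    reset_at = datetime.fromtimestamp(int(headers["X-RateLimit-Reset"]), tz=timezone.utc)
    return {"limit": limit, "remaining": remaining, "used": limit - remaining, "reset_at": reset_at}

info = parse_rate_limit({
    "X-RateLimit-Limit": "1000",
    "X-RateLimit-Remaining": "995",
    "X-RateLimit-Reset": "1704067200",
})
```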
## Rate Limit Errors
When you exceed your limit:
```json
{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Rate limit exceeded. Try again after 2024-01-16T00:00:00Z",
    "is_retryable": true,
    "suggested_wait_time": 3600
  },
  "metadata": {
    "request_id": "req_abc123"
  }
}
```
HTTP status: `429 Too Many Requests`
## Timeouts
### Request Timeouts
| Setting | Default | Maximum |
|---|---|---|
| Author-configured | 60s | 300s |
| Platform-enforced | — | 300s |
Authors set timeout in their manifest:
```json
{
  "timeout_seconds": 120
}
```
### Timeout Propagation
For orchestrators, timeouts propagate through the call chain:
```
remaining_time = original_deadline - elapsed_time
```
If a sub-call would exceed the remaining time, it fails fast with `TIMEOUT`.
## Composition Limits
### Max Hops
Limits how deep agent-to-agent calls can go:
```
Caller → Agent A → Agent B → Agent C
         ↑         ↑         ↑
        hop 1     hop 2     hop 3
```
Effective limit: `min(caller's max_hops, agent's max_hops)`
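In code, the effective limit and its enforcement look roughly like this (an illustrative sketch; the function names are not part of the platform API):

```python
def effective_max_hops(caller_max_hops: int, agent_max_hops: int) -> int:
    """The tighter of the caller's and the agent's limits wins."""
    return min(caller_max_hops, agent_max_hops)

def check_hop_depth(depth: int, caller_max_hops: int, agent_max_hops: int) -> None:
    """Reject a call whose hop depth exceeds the effective limit."""
    limit = effective_max_hops(caller_max_hops, agent_max_hops)
    if depth > limit:
        raise RuntimeError(f"max_hops exceeded: depth {depth} > limit {limit}")
```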
### Downstream Cap
Controls the budget passed to each downstream dependency call. This limits what each called agent can spend in further downstream calls — it does not limit the current agent’s own call count:
```json
{
  "manifest": {
    "per_call_downstream_cap": 100
  }
}
```
## Handling Rate Limits
### Check Before Calling
```python
import httpx

response = httpx.get(
    "https://api.orchagent.io/usage",
    headers={"Authorization": f"Bearer {api_key}"},
)
usage = response.json()
remaining = usage["calls_remaining"]
```
### Implement Backoff
```python
import time
import httpx

def call_with_retry(url, data, max_retries=3):
    for attempt in range(max_retries):
        response = httpx.post(url, json=data)
        if response.status_code == 429:
            wait_time = int(response.headers.get("Retry-After", 60))
            time.sleep(wait_time)
            continue
        return response
    raise Exception("Max retries exceeded")
```
### JavaScript Example
```javascript
async function callWithRetry(url, data, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(data),
    });
    if (response.status === 429) {
      const waitTime = parseInt(response.headers.get("Retry-After") || "60", 10);
      await new Promise((resolve) => setTimeout(resolve, waitTime * 1000));
      continue;
    }
    return response;
  }
  throw new Error("Max retries exceeded");
}
```
## Upgrading Limits
### Pro Plan
- 10,000 calls/day
- 50 concurrent requests
- Priority support
### Enterprise
- Custom limits
- SLA guarantees
- Dedicated support
Contact [email protected] for enterprise pricing.
## Service Limits
Always-on services have separate limits from on-demand agent runs:
| Tier | Concurrent Services | Max Instances per Service |
|---|---|---|
| Pro | 5 | 3 |
| Team | 20 | 10 |
| Enterprise | Custom | Custom |
Service compute time is metered by runtime minutes and counts toward your workspace usage. See Billing for details.
## Best Practices
- Check remaining calls before batch operations
- Implement exponential backoff for 429 responses
- Cache responses when appropriate
- Use webhooks instead of polling when available
- Monitor usage in the dashboard