Rate limits
A rate limit defines the maximum number of requests an application can make to the API within a specified time period. This mechanism is crucial to maintain the overall health and reliability of the services.
Why we use rate limiting
- Prevents a single client from overwhelming servers with excessive requests.
- Ensures consistent and predictable response times for all clients.
- Protects backend services from unexpected traffic spikes.
- Maintains a stable and controlled load pattern on the infrastructure.
- Strengthens defense against attack attempts, such as brute force.
- Mitigates the risk of distributed denial-of-service (DDoS) attacks.
- Reduces the impact caused by poorly implemented integrations.
- Limits potential damage resulting from compromised API credentials.
- Ensures balanced access to API resources for all clients.
- Prevents abusive usage by some from degrading other users' experience.
- Allows prioritizing traffic according to business criteria and needs.
- Encourages more efficient and sustainable API consumption practices.
How rate limiting works
Each API call counts toward the allowed requests-per-second limit. The default value is 10 RPS per tenant. When the limit is exceeded, the API returns HTTP 429 Too Many Requests.
Exact limits are configured per tenant. Contact your Onboarding team to confirm the limits applied to your account before sizing for high-throughput integrations.
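One way to stay under a requests-per-second limit is to space calls on the client side. The following is a minimal sketch, not an official client: the `Throttle` class and its parameters are hypothetical names introduced for illustration, and the value 10 is only the default quoted above, so confirm your tenant's actual limit first.

```python
import time


class Throttle:
    """Spaces calls so that at most `rps` calls start per second.

    Hypothetical helper: it only paces the caller, it does not call the API.
    """

    def __init__(self, rps, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rps   # minimum gap between two call starts
        self.clock = clock          # injectable for testing
        self.sleep = sleep          # injectable for testing
        self.next_allowed = None    # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed; return the seconds waited."""
        now = self.clock()
        if self.next_allowed is None or now >= self.next_allowed:
            delay = 0.0
        else:
            delay = self.next_allowed - now
            self.sleep(delay)
        self.next_allowed = now + delay + self.interval
        return delay
```

Create one instance per tenant, for example `throttle = Throttle(rps=10)`, and call `throttle.wait()` immediately before each request. This sketch is not thread-safe; a multi-threaded integration would need a lock around `wait()`.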
Handling 429 errors
If you expect your request volume to increase, temporarily or permanently, ask the Unico team to raise the rate limit for your environment. Make this request before the volume actually increases, so that your application does not become inoperative.
You can also reduce the number of requests your integration sends:
- Audit your code to identify inefficient API usage patterns.
- Check for unintended loops or redundant API calls.
- Distribute requests more evenly over time instead of sending them in large bursts.
- Cache frequently accessed data that rarely changes.
- Use the OAuth2 access token for its full 1-hour TTL; do not call POST /oauth2/token per request.
- Implement appropriate cache invalidation strategies.
- Subscribe to Webhooks and Events for PROCESS_STATE_FINISHED rather than polling GET /client/v1/process/{id}. A polling loop can easily exhaust the GET-process budget under heavy traffic.
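The token-reuse advice in the list above can be sketched as a small in-memory cache. This is an illustration under assumptions: `TokenCache` and `fetch_token` are hypothetical names, `fetch_token` stands for whatever function in your code calls POST /oauth2/token and returns the access token string, and the 60-second safety margin is an arbitrary choice rather than a documented value.

```python
import time


class TokenCache:
    """Reuses one access token until shortly before it expires."""

    def __init__(self, fetch_token, ttl_seconds=3600, safety_margin=60,
                 clock=time.monotonic):
        self._fetch_token = fetch_token  # hypothetical: wraps POST /oauth2/token
        self._ttl = ttl_seconds          # 1-hour TTL, as stated in this page
        self._margin = safety_margin     # assumption: refresh slightly early
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return the cached token, fetching a new one only when needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            self._token = self._fetch_token()
            self._expires_at = now + self._ttl - self._margin
        return self._token
```

With this in place, every request calls `cache.get()` instead of requesting a new token, so the token endpoint is hit roughly once per hour per process.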
What happens when you hit the limit
The platform returns:
HTTP/1.1 429 Too Many Requests
Retry-After: 12
| Header | Meaning |
|---|---|
| Retry-After | Number of seconds to wait before retrying. Honor it. |
If Retry-After is missing, fall back to exponential backoff (1s, 2s, 4s, 8s, …) capped at 60s.
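The retry rule above (honor Retry-After, otherwise back off exponentially up to 60s) could be implemented roughly as follows. This is a sketch, not a prescribed client: `retry_delay`, `call_with_retry`, and the `send` callable are hypothetical names, `send` is assumed to return a `(status_code, headers, body)` tuple with headers as a plain dict, and the limit of 5 attempts is an arbitrary choice.

```python
import time


def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retrying after failed attempt `attempt` (0-based)."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))  # honor the server's value
        except ValueError:
            pass  # not a number of seconds; fall back to backoff
    return min(cap, base * (2 ** attempt))       # 1s, 2s, 4s, 8s, ... capped


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call `send()` until it returns a non-429 status or attempts run out.

    `send` is a hypothetical callable returning (status_code, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt < max_attempts - 1:
            # Assumes the header key is spelled exactly "Retry-After".
            sleep(retry_delay(attempt, headers.get("Retry-After")))
    return status, headers, body  # still 429 after the last attempt
```

Wrap each API call in a function and pass it as `send`. Adding random jitter to the backoff delay is a common refinement when many workers retry at once, but it is not required by the rule stated here.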
Concurrency considerations
Rate limits cap your request rate over time, but the platform also has concurrency caps at the infrastructure level. Even if you have budget for 100 requests in a given minute, firing all 100 in the same second is more likely to be rejected than spreading them across the minute.
For high-throughput integrations, target a steady RPS rather than bursts.
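A steady rate with bounded concurrency can be sketched with `asyncio`. The function `run_paced` and its parameters are hypothetical names for illustration, each job is assumed to be a zero-argument coroutine function that performs one API call, and suitable values for `rps` and `max_in_flight` depend on the limits configured for your tenant.

```python
import asyncio


async def run_paced(jobs, rps, max_in_flight):
    """Start at most `rps` jobs per second, with at most `max_in_flight` running."""
    interval = 1.0 / rps
    gate = asyncio.Semaphore(max_in_flight)

    async def run_one(job):
        try:
            return await job()
        finally:
            gate.release()              # free the slot for the next job

    tasks = []
    for job in jobs:
        await gate.acquire()            # wait for a free concurrency slot
        tasks.append(asyncio.create_task(run_one(job)))
        await asyncio.sleep(interval)   # spread job starts evenly over time
    return await asyncio.gather(*tasks)
```

Run it with `asyncio.run(run_paced(jobs, rps=10, max_in_flight=5))`. Results come back in the same order as `jobs`, and if any job raises, `gather` propagates the first exception to the caller.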
Magic Link webhook delivery
Trully has its own delivery quotas for Webhook V2. Your webhook server is expected to respond within 1 minute; a delivery that takes longer is dropped (the user's process is not affected). For consistent processing, acknowledge the webhook quickly and process it asynchronously.
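The "acknowledge fast, process asynchronously" pattern can be sketched without any particular web framework. Everything named here is hypothetical: `handle_webhook` stands for the function your HTTP framework calls for each delivery, `process` is your own slow business logic, and the payload shape is not specified by this page.

```python
import logging
import queue
import threading

events = queue.Queue()  # in-memory buffer between the handler and the worker


def handle_webhook(payload):
    """Called for each delivery; does no slow work so it can return quickly."""
    events.put(payload)
    return 200  # status code to send back as the acknowledgement


def worker(process):
    """Drain the queue in the background, calling `process` on each event."""
    while True:
        payload = events.get()
        try:
            if payload is None:   # sentinel value used to stop the worker
                return
            process(payload)
        except Exception:
            logging.exception("webhook processing failed")
        finally:
            events.task_done()
```

Start the worker once at application startup, for example `threading.Thread(target=worker, args=(process_event,), daemon=True).start()`, where `process_event` is your own function. An in-memory queue loses pending events if the process restarts, so a durable queue may be a better fit for production use.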
What's next
- Authentication — token caching strategy.
- Webhooks and Events — replacing polling with push.
- Error codes > Retry policy — when (and when not) to retry.