Rate Limits

The Inference API enforces rate limits based on the number of requests. These limits may change in the future to become compute-based or token-based.

The Serverless API is not meant for heavy production applications. If you need higher rate limits, consider Inference Endpoints, which provide dedicated resources.

You need to be authenticated (by passing a token or through your browser) to use the Inference API.
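
As a rough illustration of token-based authentication, the sketch below builds (but does not send) a request that carries the token in an `Authorization: Bearer` header. The endpoint pattern in `API_URL_TEMPLATE`, the helper name `build_request`, and the `{"inputs": ...}` payload shape are assumptions made for this example and should be checked against the current API reference.

```python
import json
import urllib.request

# Assumed endpoint pattern; verify it against the current Inference API docs.
API_URL_TEMPLATE = "https://api-inference.huggingface.co/models/{model_id}"


def build_request(model_id: str, payload: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST request.

    `build_request` is a hypothetical helper for this example, not part of
    any Hugging Face library.
    """
    if not token:
        raise ValueError("an access token is required")
    return urllib.request.Request(
        API_URL_TEMPLATE.format(model_id=model_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The token is sent as a standard bearer credential.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the result to `urllib.request.urlopen`. In practice the token would typically be read from an environment variable rather than hard-coded.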

User Tier	Rate Limit
Signed-up Users	1,000 requests per day
PRO and Enterprise Users	20,000 requests per day
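
One way to stay under these limits is to count requests on the client side before sending them. The following is a minimal sketch, not an official mechanism: the `DailyBudget` class and the tier keys are hypothetical, and it assumes the daily window resets at midnight UTC, which this page does not state.

```python
from datetime import date, datetime, timezone
from typing import Callable, Optional

# Daily request limits taken from the table above; the keys are made up here.
DAILY_LIMITS = {"signed_up": 1_000, "pro_or_enterprise": 20_000}


class DailyBudget:
    """Client-side counter that refuses requests once a daily limit is hit.

    Assumption: the window resets at midnight UTC. The server's real reset
    behaviour may differ, so treat this as a conservative local guard only.
    """

    def __init__(self, limit: int, today: Optional[Callable[[], date]] = None):
        self.limit = limit
        # Injectable clock so the reset logic can be tested deterministically.
        self._today = today or (lambda: datetime.now(timezone.utc).date())
        self._day = self._today()
        self._used = 0

    def _roll_over(self) -> None:
        # Reset the counter when the (assumed) UTC day changes.
        current = self._today()
        if current != self._day:
            self._day = current
            self._used = 0

    @property
    def remaining(self) -> int:
        self._roll_over()
        return self.limit - self._used

    def try_acquire(self) -> bool:
        """Reserve one request; return False if the budget is exhausted."""
        self._roll_over()
        if self._used >= self.limit:
            return False
        self._used += 1
        return True
```

A caller would create `DailyBudget(DAILY_LIMITS["signed_up"])` once and check `try_acquire()` before each API call, skipping or deferring the call when it returns `False`.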