RosalindDB

An object-storage-first vector database for cold and bursty workloads. Apache 2.0.

© 2026 RosalindDB contributors. Apache License 2.0.

    Rate limits & quotas

    When RB_ENABLE_QUOTAS=true, two per-tenant quotas cap how many vectors can be stored and how many queries can run per day, and a short-term token-bucket rate limiter protects the service from bursts. The limits on this page are the defaults applied to each tenant in that mode.

    OSS default: this page does not apply

    The OSS default is RB_ENABLE_QUOTAS=false. In that mode the rate-limit dependency is a no-op, the ingest and query handlers skip the admission checks, and GET /auth/usage returns { "enabled": false }. A self-hoster's own queries are never throttled.

    Set RB_ENABLE_QUOTAS=true to enforce the per-tenant vector and daily-query quotas and the per-key rate limiter. The schema columns exist either way; only the runtime checks are gated. See Architecture · Two env switches.

    Default limits

    What a default tenant gets. The daily query counter resets at UTC midnight (00:00:00 UTC).

    Quota             Default    Resets
    Vectors stored    100,000    never (cumulative)
    Queries per day   10,000     daily, at UTC midnight

    GET /auth/usage

    Check usage

    Check current usage against your quotas at any time from any JWT-authenticated client.

    curl -s http://localhost:8080/auth/usage \
      -H "Authorization: Bearer $RB_API_KEY"

    Response (HTTP 200):

    {
      "vectors_used": 12500,
      "vector_quota": 100000,
      "queries_today": 342,
      "daily_query_quota": 10000,
      "queries_reset_at": "2026-05-16"
    }

    queries_reset_at is a YYYY-MM-DD date string — the UTC day the daily counter was last reset (effectively "today"). queries_today is lazily zeroed on the first usage call of a new UTC day. The call performs that reset before reading, so the values are never stale from a previous day.
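
    As a rough illustration of how a client might read this response, the sketch below computes the remaining headroom and the time until the next UTC midnight. The helper names summarize_usage and seconds_until_utc_midnight are hypothetical and not part of RosalindDB; the field names come from the sample response above, and the { "enabled": false } case is the quotas-disabled shape described earlier on this page.

```python
from datetime import datetime, timedelta, timezone


def summarize_usage(usage: dict) -> dict:
    """Return remaining headroom from a GET /auth/usage response body.

    Hypothetical helper: only the field names are taken from the docs.
    """
    # With RB_ENABLE_QUOTAS=false the endpoint returns {"enabled": false}.
    if usage.get("enabled") is False:
        return {"enabled": False}
    return {
        "enabled": True,
        "vectors_remaining": max(0, usage["vector_quota"] - usage["vectors_used"]),
        "queries_remaining": max(0, usage["daily_query_quota"] - usage["queries_today"]),
    }


def seconds_until_utc_midnight(now: datetime) -> float:
    """Seconds from a timezone-aware `now` until the next 00:00:00 UTC."""
    now_utc = now.astimezone(timezone.utc)
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now_utc).total_seconds()
```

    Because queries_reset_at reports the day of the last reset rather than the next one, this sketch derives the next reset from the current time instead of from that field.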

    What happens at the limit

    Quota breaches return HTTP 429 with the standard error envelope (see Authentication).

    • vector_quota_exceeded — an upload to POST /v1/datasets/{name}/vectors (or a bulk POST /v1/datasets/{name}/imports) would push stored vectors past the 100,000 cap. The upload is rejected whole — there is no partial acceptance up to the cap. details carries limit and used.
    • query_quota_exceeded — a POST /v1/query after the daily 10,000-query cap is reached. details carries limit and reset_at.

    Sample 429 query_quota_exceeded body:

    {
      "error": {
        "code": "query_quota_exceeded",
        "message": "Daily query quota exceeded for this tenant",
        "details": {
          "limit": 10000,
          "reset_at": "2026-05-16"
        }
      }
    }
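
    A client will usually want to treat the two quota errors differently. The following is a minimal sketch, assuming the envelope shape shown above; the function name quota_error_action and the action strings it returns are hypothetical, chosen only for illustration.

```python
def quota_error_action(body: dict) -> tuple[str, dict]:
    """Map a 429 quota error envelope to a suggested client action.

    Hypothetical helper: the action names are illustrative, not part of the API.
    """
    error = body.get("error", {})
    code = error.get("code")
    details = error.get("details", {})

    if code == "vector_quota_exceeded":
        # The upload was rejected whole, so resending the same batch cannot succeed.
        headroom = max(0, details.get("limit", 0) - details.get("used", 0))
        return "do_not_retry_same_batch", {"headroom": headroom}

    if code == "query_quota_exceeded":
        # The daily counter resets at UTC midnight; retrying sooner will fail again.
        return "wait_for_daily_reset", {
            "limit": details.get("limit"),
            "reset_at": details.get("reset_at"),
        }

    # Any other code is not a quota breach and is left to the caller.
    return "not_a_quota_error", {}
```

    The headroom value is simply limit minus used from details. Whether a smaller batch that fits within it would be accepted is an inference from the "would push stored vectors past the cap" wording, not a documented guarantee.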

    Per-key rate limit

    Separate from the quotas above, a short-term limiter caps the request rate per API key. Each key gets roughly 50 requests/second sustained, with a burst allowance of 100 as reported in the error details. Exceeding that returns 429 rate_limited:

    {
      "error": {
        "code": "rate_limited",
        "message": "Rate limit exceeded; slow down and retry",
        "details": { "limit_rps": 50, "burst": 100 }
      }
    }
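
    To make the limit_rps and burst numbers concrete, here is a generic token-bucket sketch. It is not RosalindDB's implementation. It assumes limit_rps is the refill rate and burst is the bucket capacity, which is the usual reading of those two parameters but is not confirmed by this page. The class name TokenBucket and its methods are hypothetical.

```python
class TokenBucket:
    """Generic token bucket: refills at `rate` tokens/second up to `burst` tokens."""

    def __init__(self, rate: float = 50.0, burst: float = 100.0, now: float = 0.0):
        self.rate = rate      # assumed to correspond to limit_rps
        self.burst = burst    # assumed to correspond to burst
        self.tokens = burst   # the bucket starts full
        self.last = now       # timestamp of the last refill, in seconds

    def allow(self, now: float) -> bool:
        """Spend one token if available; return False when the caller should get a 429."""
        elapsed = max(0.0, now - self.last)
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

    Under these assumptions an idle key can send 100 requests at once, after which it is admitted at about 50 requests per second as the bucket refills.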

    Treat rate_limited as transient: retry the request with exponential backoff and jitter. The limiter is an in-memory token bucket and best-effort in the MVP — it is process-local, so it resets on restart and is not shared across pods. It is applied to the customer-facing /v1/* endpoints only; the /auth/* surface is not rate-limited. Requests authenticated with a JWT are bucketed per tenant rather than per key.
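
    One way to implement that retry advice is exponential backoff with "full jitter", where each delay is drawn uniformly between zero and an exponentially growing ceiling. The sketch below is illustrative only: send is a hypothetical callable that returns (status_code, body_dict), and the base, cap, and retry-count defaults are arbitrary choices rather than values recommended by RosalindDB.

```python
import random
import time


def backoff_delays(max_retries=5, base=0.1, cap=5.0, rng=random.random):
    """Yield full-jitter delays: rng() * min(cap, base * 2**attempt) seconds."""
    for attempt in range(max_retries):
        yield rng() * min(cap, base * 2 ** attempt)


def is_rate_limited(status: int, body: dict) -> bool:
    """True only for a 429 whose error code is rate_limited."""
    return status == 429 and body.get("error", {}).get("code") == "rate_limited"


def call_with_retry(send, max_retries=5, sleep=time.sleep, rng=random.random):
    """Call send() again while it returns 429 rate_limited, up to max_retries times.

    `send` is a hypothetical callable returning (status_code, body_dict).
    Quota errors (also 429) are returned immediately, since waiting a few
    seconds will not clear them.
    """
    status, body = send()
    for delay in backoff_delays(max_retries, rng=rng):
        if not is_rate_limited(status, body):
            break
        sleep(delay)
        status, body = send()
    return status, body
```

    Checking the error code, not just the status, matters here because quota breaches share the 429 status but should not be retried on a short timer.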
