proxq: Async HTTP for Backends That Won’t Stop Timing Out

Some HTTP requests take forever. LLM completions, image generation, video transcoding, big report builds, slow webhook processors — they all sit there chewing on something while your reverse proxy in front of them is screaming about timeouts and your client is staring at a hung connection wondering if it should retry. The usual fix is to bolt a job queue onto the backend, expose a “submit” endpoint that returns an ID, and another endpoint to poll for the result. Now you’ve rewritten your API, every client has to learn the new dance, and you still need to keep the synchronous version around for the fast endpoints.

I got tired of doing this every time. So I built proxq — a Redis-backed HTTP proxy that turns any backend into an async job queue without touching the backend at all.

What It Does

You drop proxq in front of your existing service. It accepts any HTTP request — any method, any path, any body — serializes the whole thing into Redis via asynq, hands you back a 202 Accepted with a job ID immediately, and works through the queue in the background. You poll the job ID later to get the upstream’s response replayed exactly: same status code, same headers, same body. As if you’d called upstream directly, just decoupled in time.

Client           proxq            Redis          Upstream
 |                 |                 |               |
 |-- POST /foo --> |                 |               |
 |<- 202 {jobId} - |                 |               |
 |                 |-- enqueue ----> |               |
 |  (go touch      |                 |               |
 |   grass)        |                 | <- worker --- |
 |                 |                 |    wakes up   |
 |                 |                 | -----------> (request)
 |                 |                 | <----------- (response)
 |                 |                 |               |
 |-- GET /{jobId}->|                 |               |
 |<- {status} ---- |                 |               |
 |                 |                 |               |
 |-- GET /content->|                 |               |
 |<- {response} -- |                 |               |
 |                 |                 |               |
 |-- PUT /big ---> | --------- direct proxy ------>  |
 |<- {response} -- | <-----------------------------  |

No SDK to integrate. No new endpoints to design on the backend. Your upstream service has no idea any of this is happening — proxq forwards your original headers, body, and method like a regular reverse proxy. The only difference is the request happens later and you fetch the result with a second call.

Quick Start

services:
  proxq:
    image: psyb0t/proxq
    ports:
      - "8080:8080"
    environment:
      PROXQ_CONFIG: /etc/proxq/config.yaml
    configs:
      - source: proxq_config
        target: /etc/proxq/config.yaml
    depends_on:
      - redis
  redis:
    image: redis:7-alpine
configs:
  proxq_config:
    content: |
      listenAddress: "0.0.0.0:8080"
      redis:
        addr: "redis:6379"
      upstreams:
        - prefix: "/"
          url: "http://your-api:3000"

That’s the whole thing. Your API now returns instantly, processes asynchronously, and lets clients poll for results whenever they feel like it.

Bypass When It Makes Sense

Not everything needs to be queued. Queuing a 1 GB upload through Redis is a memory bomb. Queuing a WebSocket handshake is meaningless. Queuing your /health endpoint is just adding latency to a check that exists specifically to be fast.

proxq bypasses the queue automatically for:

  • WebSocket upgrades (Connection: upgrade + Upgrade: websocket)
  • Chunked transfers (Transfer-Encoding: chunked) — size unknown, could be huge
  • Bodies larger than directProxyThreshold (10 MB by default)
  • Per-upstream pathFilter matches — regex blacklist or whitelist mode

Bypassed requests are either reverse-proxied through normally (streamed in both directions) or answered with a 307 Temporary Redirect to the upstream URL, depending on directProxyMode. Either way, your fast endpoints stay fast and your big uploads don’t get buffered into Redis.
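
The decision logic is small enough to sketch in Go. This isn't proxq's actual source (upstreamConfig is a made-up stand-in, and the whitelist reading is my assumption of that mode's semantics), but the rules are the ones above:

import (
    "net/http"
    "regexp"
    "strings"
)

type upstreamConfig struct {
    directProxyThreshold int64            // bytes; 10 MB by default
    pathFilterMode       string           // "blacklist" or "whitelist"
    pathFilterPatterns   []*regexp.Regexp // compiled once at startup
}

func shouldBypass(r *http.Request, u upstreamConfig) bool {
    // WebSocket upgrade: Connection: upgrade + Upgrade: websocket.
    if strings.Contains(strings.ToLower(r.Header.Get("Connection")), "upgrade") &&
        strings.EqualFold(r.Header.Get("Upgrade"), "websocket") {
        return true
    }
    // Chunked transfer: size unknown up front, could be huge.
    for _, te := range r.TransferEncoding {
        if strings.EqualFold(te, "chunked") {
            return true
        }
    }
    // Body bigger than the direct-proxy threshold.
    if r.ContentLength > u.directProxyThreshold {
        return true
    }
    // Path filter: in blacklist mode a match bypasses the queue; in
    // whitelist mode a non-match bypasses (assumed semantics).
    matched := false
    for _, re := range u.pathFilterPatterns {
        if re.MatchString(r.URL.Path) {
            matched = true
            break
        }
    }
    if u.pathFilterMode == "whitelist" {
        return !matched
    }
    return matched
}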

Multiple Upstreams

One proxq, many backends. Path-prefix routing, longest match wins. The matched prefix gets stripped before forwarding, so your upstreams don’t need to know they’re behind a proxy.

upstreams:
  - prefix: "/api"
    url: "http://api:3000"
    timeout: "5m"
    maxRetries: 3
    pathFilter:
      mode: "blacklist"
      patterns:
        - "^/api/auth"
        - "^/api/health"
  - prefix: "/ml"
    url: "http://ml-service:8080"
    timeout: "15m"
  - prefix: "/uploads"
    url: "http://file-server:9000/storage"
    timeout: "10m"
    maxBodySize: 1073741824
    directProxyMode: "redirect"

Each upstream gets its own timeout, retry policy, body size limits, cache rules, and bypass behavior. Auth and health hit the API directly without queuing. ML calls get a 15-minute window to chew on whatever they’re chewing on. Uploads to the file server get a 1 GB limit and clients are redirected straight to upstream so the bytes don’t pass through proxq at all.

Routing Rules That Don’t Surprise You

A prefix matches when the request path equals the prefix exactly, or starts with the prefix followed by /. So /api matches /api/users and /api, but never /api2. No accidental cross-matching, no regex footguns. The matched prefix gets stripped before forwarding; query strings are preserved.

Request                    Prefix    Forwarded path
GET /api/users?page=1      /api      GET /users?page=1
GET /api                   /api      GET /
POST /uploads/img.png      /uploads  POST /img.png
GET /anything              /         GET /anything
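
The matching rule fits in a few lines of Go. A sketch of the idea, not proxq's literal source:

import "strings"

// matchPrefix returns the longest configured prefix that matches path,
// or "" when nothing matches (proxq answers that with its own 502).
func matchPrefix(path string, prefixes []string) string {
    best := ""
    for _, p := range prefixes {
        // "/" is the catch-all; otherwise: exact match, or prefix + "/".
        if p == "/" || path == p || strings.HasPrefix(path, p+"/") {
            if len(p) > len(best) {
                best = p
            }
        }
    }
    return best
}

// stripPrefix produces the forwarded path; query strings ride along untouched.
func stripPrefix(path, prefix string) string {
    if prefix == "/" {
        return path
    }
    if rest := strings.TrimPrefix(path, prefix); rest != "" {
        return rest
    }
    return "/" // GET /api → GET /
}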

Upstream URLs can include a path. The stripped request path gets appended to it, so you can mount a backend’s subroute under any prefix you want:

upstreams:
  - prefix: "/files"
    url: "http://storage:9000/bucket/data"
# GET /files/img.png       → GET http://storage:9000/bucket/data/img.png
# GET /files               → GET http://storage:9000/bucket/data/

proxq validates everything at startup and refuses to run if anything’s ambiguous:

  • at least one upstream is required, and each needs both prefix and url
  • a single upstream can use prefix: "/" as a catch-all, but not alongside others
  • nested prefixes (/api + /api/v2 together) are an error
  • no prefix can collide with jobsPath
  • path filter regexes must be valid

Bad config = no startup. Better than discovering routing weirdness in production.
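
A couple of those checks are easy to picture in code. A hypothetical sketch of the nested-prefix and jobsPath collision logic, not proxq's actual source:

import (
    "fmt"
    "strings"
)

// nests reports whether one path equals or sits under the other.
func nests(a, b string) bool {
    return a == b || strings.HasPrefix(b, a+"/") || strings.HasPrefix(a, b+"/")
}

func validatePrefixes(prefixes []string, jobsPath string) error {
    for i, a := range prefixes {
        if a == "/" && len(prefixes) > 1 {
            return fmt.Errorf("catch-all %q cannot coexist with other prefixes", a)
        }
        if nests(a, jobsPath) {
            return fmt.Errorf("prefix %q collides with jobsPath %q", a, jobsPath)
        }
        for _, b := range prefixes[i+1:] {
            if nests(a, b) {
                return fmt.Errorf("nested prefixes: %q and %q", a, b)
            }
        }
    }
    return nil
}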

The API

Three endpoints under jobsPath (default /__jobs). Everything else gets proxied.

POST /api/heavy-thing       → 202 Accepted, {"jobId": "..."}
GET  /__jobs/{id}           → {"status": "queued|running|completed|failed"}
GET  /__jobs/{id}/content   → upstream's exact response replayed
DELETE /__jobs/{id}         → cancel

The status lifecycle is straightforward: queued while waiting for a worker, running while the worker is hitting upstream or sleeping between retries, completed once a response is stored (even if upstream returned a 4xx or 5xx — that’s still a successful round-trip), failed only if the transport itself broke after retries are exhausted. The underlying asynq states (pending/scheduled/aggregating, active/retry, completed, archived) are collapsed into those four to keep the surface clean.
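
In Go terms the collapse looks something like this: a sketch against asynq's TaskState constants, not proxq's actual source.

import "github.com/hibiken/asynq"

func jobStatus(s asynq.TaskState) string {
    switch s {
    case asynq.TaskStatePending, asynq.TaskStateScheduled, asynq.TaskStateAggregating:
        return "queued" // waiting for a worker
    case asynq.TaskStateActive, asynq.TaskStateRetry:
        return "running" // hitting upstream, or sleeping between retries
    case asynq.TaskStateCompleted:
        return "completed" // a response is stored, 4xx/5xx included
    default: // asynq.TaskStateArchived
        return "failed" // transport broke and retries are exhausted
    }
}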

Concrete shapes:

POST /api/heavy-thing HTTP/1.1
Content-Type: application/json
Authorization: Bearer token

{"data": "lots of it"}

→ HTTP/1.1 202 Accepted
  X-Proxq-Source: proxq

  {"jobId": "550e8400-e29b-41d4-a716-446655440000"}

GET /__jobs/550e8400-e29b-41d4-a716-446655440000
→ HTTP/1.1 200 OK

  {"id": "550e8400-...", "status": "completed",
   "completedAt": "2026-04-29T12:00:00Z"}

GET /__jobs/550e8400-e29b-41d4-a716-446655440000/content
→ HTTP/1.1 200 OK
  Content-Type: application/json
  X-Custom-Header: from-upstream

  {"result": "done"}

Failed jobs return their failure reason: {"id": "...", "status": "failed", "error": "forward request: dial tcp: connection refused"}. Cancellation is a DELETE against the job ID, returning {"status": "cancelled"}.

Every response proxq generates carries an X-Proxq-Source: proxq header. Responses replayed from upstream don’t have it. That’s how you tell the two apart — useful when upstream returns a 404 and you want to know whether it’s “the job doesn’t exist (proxq saying so)” or “upstream said nothing’s there (and that’s the answer).” Same trick distinguishes proxq’s own 502 (no upstream matched the path) from a 502 your upstream actually emitted.
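
From a script's point of view the whole dance is a few dozen lines. A minimal Go sketch against the Quick Start setup, default jobsPath, with panics standing in for real error handling:

package main

import (
    "encoding/json"
    "fmt"
    "io"
    "net/http"
    "strings"
    "time"
)

const base = "http://localhost:8080"

func main() {
    // Submit: proxq answers with 202 Accepted and a job ID immediately.
    resp, err := http.Post(base+"/api/heavy-thing", "application/json",
        strings.NewReader(`{"data": "lots of it"}`))
    if err != nil {
        panic(err)
    }
    var job struct {
        JobID string `json:"jobId"`
    }
    json.NewDecoder(resp.Body).Decode(&job)
    resp.Body.Close()

    // Poll the status endpoint until the job settles.
    for status := "queued"; status != "completed" && status != "failed"; {
        time.Sleep(2 * time.Second)
        s, err := http.Get(base + "/__jobs/" + job.JobID)
        if err != nil {
            panic(err)
        }
        var st struct {
            Status string `json:"status"`
        }
        json.NewDecoder(s.Body).Decode(&st)
        s.Body.Close()
        status = st.Status
    }

    // Fetch upstream's replayed response: same status, headers, body.
    out, err := http.Get(base + "/__jobs/" + job.JobID + "/content")
    if err != nil {
        panic(err)
    }
    defer out.Body.Close()
    body, _ := io.ReadAll(out.Body)
    fmt.Println(out.Status, string(body))
}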

Caching

Same request twice? Don’t bother upstream. proxq has an optional response cache with three modes: none, memory (in-process LRU), or redis (shared across instances).

cache:
  mode: "redis"
  ttl: "10m"
  redisKeyPrefix: "proxq:"

Cache key is sha256(method + url + headers + body). Volatile headers (X-Request-ID, X-Forwarded-For, X-Real-IP, X-Forwarded-Proto) are excluded from the key by default so they don’t bust cache hits. Override per upstream with cacheKeyExcludeHeaders if you want to also ignore Authorization, trace IDs, or whatever else changes per request without affecting the response.
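
The derivation is simple enough to write down. A Go sketch of the idea; the header canonicalization details here are assumptions, not proxq's exact scheme:

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
    "net/http"
    "sort"
    "strings"
)

// Volatile headers excluded from the key by default (Go's canonical casing).
var volatile = map[string]bool{
    "X-Request-Id":      true,
    "X-Forwarded-For":   true,
    "X-Real-Ip":         true,
    "X-Forwarded-Proto": true,
}

func cacheKey(method, url string, h http.Header, body []byte) string {
    sum := sha256.New()
    fmt.Fprintf(sum, "%s %s\n", method, url)
    // Sort header names so the key is deterministic.
    names := make([]string, 0, len(h))
    for name := range h {
        if !volatile[name] { // skip volatile headers so they don't bust hits
            names = append(names, name)
        }
    }
    sort.Strings(names)
    for _, name := range names {
        fmt.Fprintf(sum, "%s: %s\n", name, strings.Join(h[name], ","))
    }
    sum.Write(body) // same body, same key; different body, cache miss
    return hex.EncodeToString(sum.Sum(nil))
}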

Any method is cacheable — yes, even POST. Same body produces the same key, different body misses. Only 2xx responses get stored. Cached hits come back with X-Cache-Status: HIT.

Retries Done Right

The retry semantics matter. A “failed” job in proxq means the transport broke — connection refused, network timeout, DNS resolution died. An upstream returning 500 Internal Server Error is a completed job, because upstream answered. The 500 is the answer. Storing it as a result and letting the client decide what to do is the correct behavior. Retrying a 500 silently and pretending it didn’t happen is how you turn a transient bug into a debugging nightmare.

Per-upstream maxRetries with either fixed retryDelay or exponential backoff (n^4 seconds: 1s, 16s, 81s, ~4m, ~10m). Configure once, forget about it.
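
That schedule is the shape asynq lets you plug in as a retry-delay function. A sketch of the n^4 curve (my rendering of it, not proxq's literal code):

import (
    "math"
    "time"

    "github.com/hibiken/asynq"
)

// Pluggable as asynq's Config.RetryDelayFunc.
// n=1 → 1s, n=2 → 16s, n=3 → 81s, n=4 → ~4m16s, n=5 → ~10m25s.
func backoff(n int, _ error, _ *asynq.Task) time.Duration {
    return time.Duration(math.Pow(float64(n), 4)) * time.Second
}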

Headers, the Boring-but-Correct Bits

proxq behaves like a proper reverse proxy when forwarding. Every original request header is preserved end-to-end. On top of that, three forwarding headers are injected for upstream’s benefit:

  • X-Forwarded-For — original client IP
  • X-Real-IP — original client IP, alternate spelling some backends prefer
  • X-Forwarded-Proto — original request scheme (http or https)

Hop-by-hop headers (Connection, Keep-Alive, Proxy-Authenticate, Proxy-Authorization, TE, Trailers, Transfer-Encoding, Upgrade) get stripped per RFC 7230 because forwarding them across hops is wrong and breaks connection handling in subtle ways. Hop-by-hop is hop-by-hop. Don’t argue with the RFC.
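
The whole hygiene routine is about a dozen lines. A sketch using the list above; the function shape is illustrative, not proxq's source:

import "net/http"

// The hop-by-hop set, stripped before forwarding.
var hopByHop = []string{
    "Connection", "Keep-Alive", "Proxy-Authenticate", "Proxy-Authorization",
    "TE", "Trailers", "Transfer-Encoding", "Upgrade",
}

func prepareForwardHeaders(h http.Header, clientIP, scheme string) {
    for _, name := range hopByHop {
        h.Del(name) // hop-by-hop: never forwarded across hops
    }
    h.Set("X-Forwarded-For", clientIP)  // original client IP
    h.Set("X-Real-IP", clientIP)        // same IP, alternate spelling
    h.Set("X-Forwarded-Proto", scheme)  // "http" or "https"
}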

On the way back out: X-Cache-Status: HIT or MISS on cached responses when caching is enabled, and X-Proxq-Source: proxq on every response proxq itself generates (the 202, the 502 for unmatched paths, the 307 redirects, the 404 from job endpoints, internal proxy errors). Never set on responses replayed from upstream — that’s the bright line for telling proxq’s voice apart from upstream’s.

The Global Knobs

The top-level config has the boring-but-load-bearing knobs:

listenAddress: "0.0.0.0:8080"   # bind address
redis:
  addr: "redis:6379"
  password: ""
  db: 0
queue: "default"                # asynq queue name
concurrency: 10                 # how many workers hammer upstream simultaneously
jobsPath: "/__jobs"             # base path for the jobs API
taskRetention: "1h"             # how long completed/failed jobs live in Redis

concurrency is the worker pool size: bump it up if you have headroom and a backend that can take parallel load, keep it low if upstream is fragile. taskRetention bounds how long a client has to come back and pick up a result before Redis garbage-collects it. jobsPath is configurable so it can’t collide with anything legitimate your backend serves; pick a leading double-underscore or whatever obscure prefix you like, and proxq will check at startup that it doesn’t overlap any upstream’s prefix. Durations use Go syntax: 30s, 5m, 1h, 1h30m.

Drop-in OpenAI Client

proxq ships with a Go client that’s a drop-in replacement for openai-go. Swap one line and your entire SDK — chat completions, embeddings, images, audio, all of it — routes through the proxq async queue transparently.

import (
    "context"

    "github.com/openai/openai-go"
    proxqopenai "github.com/psyb0t/docker-proxq/pkg/clients/openai"
)

// Before: client := openai.NewClient(option.WithAPIKey("sk-..."))
// After:
client := proxqopenai.NewClient(proxqopenai.Config{
    ProxqBaseURL: "https://proxq.example.com",
    APIKey:       "sk-...",
})

ctx := context.Background()
resp, err := client.Chat.Completions.New(ctx, openai.ChatCompletionNewParams{
    Model: openai.ChatModelGPT4o,
    Messages: []openai.ChatCompletionMessageParamUnion{
        openai.UserMessage("hello"),
    },
})

The client injects a custom http.RoundTripper into the SDK. Non-streaming requests get enqueued, polled, and returned as if you called OpenAI directly. Streaming and direct-proxied responses pass through as-is. Your HTTPClient settings (TLS config, timeouts, cookie jar) are fully preserved. Same code, same types, same return values — but now your LLM calls don’t time out behind a 30-second reverse proxy.
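
If you're curious what that RoundTripper roughly looks like inside, here's a simplified sketch. It assumes the default jobsPath and skips the real client's handling of cancellation, timeouts, and failed-job errors:

import (
    "encoding/json"
    "net/http"
    "net/url"
    "time"
)

type queueTransport struct {
    base  http.RoundTripper // the SDK's original transport, preserved
    proxq *url.URL          // proxq's base URL
}

func (t *queueTransport) RoundTrip(req *http.Request) (*http.Response, error) {
    // Re-target the SDK's request at proxq; method, path, headers,
    // and body stay exactly as the SDK built them.
    req.URL.Scheme, req.URL.Host = t.proxq.Scheme, t.proxq.Host
    resp, err := t.base.RoundTrip(req)
    if err != nil || resp.StatusCode != http.StatusAccepted {
        return resp, err // streaming or direct-proxied: hand straight back
    }
    var job struct {
        JobID string `json:"jobId"`
    }
    decodeErr := json.NewDecoder(resp.Body).Decode(&job)
    resp.Body.Close()
    if decodeErr != nil {
        return nil, decodeErr
    }
    client := &http.Client{Transport: t.base}
    for { // poll until the job settles
        time.Sleep(time.Second)
        s, err := client.Get(t.proxq.String() + "/__jobs/" + job.JobID)
        if err != nil {
            return nil, err
        }
        var st struct {
            Status string `json:"status"`
        }
        json.NewDecoder(s.Body).Decode(&st)
        s.Body.Close()
        if st.Status == "completed" || st.Status == "failed" {
            break
        }
    }
    // Replay upstream's exact response; the SDK never knows it was queued.
    return client.Get(t.proxq.String() + "/__jobs/" + job.JobID + "/content")
}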

When to Reach for It

Slow APIs behind reverse proxies with short timeouts. Your CDN or nginx has a request timeout. Your backend sometimes takes longer. Stick proxq between them, return a job ID instantly, let the worker take however long the backend actually needs.

Webhook relays. Fire webhooks without blocking the sender. Queue them, deliver at your own pace, retry transport failures automatically. The sender gets a fast 202, you handle the actual delivery in the background.

Mixed sync/async APIs. Auth and health are fast. Reports and exports are slow. Path filter blacklists the fast ones (they bypass the queue), everything else gets queued. One config, two behaviors, no separate infrastructure.

LLM and ML endpoints. Long completions, slow inference, anything where the response time is unpredictable. Queue the call, poll for the result, stop fighting timeouts.

Whatever else you can think of that I forgot to mention. Pipelines that emit large outputs, batch processors, scheduled deliveries: if it’s HTTP and slow, proxq will eat it.

Already Powering aigate

proxq is what backs the /q/ route on my personal AI gateway, aigate. LLM requests that would otherwise sit hanging behind a Cloudflare timeout get enqueued, processed by upstream models in the background, and picked up by clients whenever the response is ready. The /v1/ route stays synchronous for fast completions; /q/ is for the long ones. Same backend (LiteLLM), same models, two completely different latency profiles, all driven by the same proxy logic. That’s proxq earning its keep in production right now.

Bottom Line

proxq is the missing piece between “I have an HTTP service” and “I have an HTTP service that doesn’t time out.” Drop it in front of any backend, point it at Redis, get async behavior for free. Caching is built in. Retries are built in. WebSocket and large-body bypass are built in. The OpenAI SDK shim is built in.

It’s MIT-style licensed (do whatever you want; if it breaks, you keep both pieces), tested with testcontainers-based e2e suites, and small enough to read in an afternoon. Source: github.com/psyb0t/docker-proxq. Image: psyb0t/proxq on Docker Hub.

Stop hanging connections. Stop fighting reverse-proxy timeouts. Ship the request, walk away, come back when it’s done.