Project | API Performance: Pooling and Parallelism | Daniel Joffe

Overview

Project: Performance and reliability pass across audit-api and job-api, two FastAPI services running on Railway behind a Next.js frontend

Role: Solo Developer

Duration: April 2026

Purpose: Eliminate per-request resource allocation, parallelize independent I/O operations, and harden authentication across both API services in the Nx monorepo

The problems

Both services shared the same bad habit: they built expensive resources on every request and ran independent operations one after another when they didn't have to.

audit-api launched a fresh Chromium instance for every scan, waited for networkidle (which hangs forever on sites with persistent connections), and ran Lighthouse and axe-core back to back even though the two share no state. Some sites blocked the headless browser with bot detection, and you'd get cryptic errors like ERR_HTTP2_PROTOCOL_ERROR.

job-api spun up a new httpx.AsyncClient per ATS fetch, tearing down and rebuilding TCP connections every time, created a new Supabase client per dependency-injection call, and ran four database operations sequentially per source during polling.

A security review turned up two more gaps: the JWT decode paths didn't require exp or sub claims, and the ALLOWED_HOSTS configuration could silently fall back to permissive behavior.

Audit-api: browser pool and parallel scanning

I swapped the per-scan browser launches for a persistent BrowserPool that keeps one Chromium process alive across scans. Each scan gets a fresh browser context for cookie and state isolation, then closes it; an idle timer shuts the browser down after 30 minutes of inactivity, and on app shutdown the FastAPI lifespan handler calls pool.shutdown().

Navigation moved from wait_until="networkidle" to wait_until="load", with a 10-second best-effort networkidle follow-up that never blocks. A realistic Chrome User-Agent header cut down on bot-detection triggers, and when sites still blocked the scanner, a pattern-matching function mapped the raw Playwright errors to messages a person could actually read.

Lighthouse and axe-core now run in parallel via asyncio.gather with return_exceptions=True. Lighthouse runs as a subprocess and axe runs inside the Playwright page, so they share nothing. The ScanQueue itself stays serialized because Lighthouse uses global performance marks that collide whenever two runs overlap.

The serialized ScanQueue feeds one scan at a time into a persistent browser pool that keeps a single Chromium process alive across scans, handing each scan a fresh isolated context. Within a scan, Lighthouse (a Node subprocess) and axe-core (inside the Playwright page) run in parallel and are joined with asyncio.gather, which produces the grade.

lighthouse_task = asyncio.create_task(_run_lighthouse(url, device))
axe_task = asyncio.create_task(_run_axe_and_capture(url, device))
 
lighthouse_result, axe_results = await asyncio.gather(
    lighthouse_task, axe_task, return_exceptions=True
)

Job-api: connection reuse and DB parallelism

A shared httpx.AsyncClient singleton replaced the per-request client creation. The pool caps out at 20 connections and 10 keepalive, with a 15-second timeout and automatic redirect following, and all six ATS fetchers (Greenhouse, Lever, Ashby, Workday, SmartRecruiters, JSON-LD) share it. The client closes on app shutdown.

def get_http_client() -> httpx.AsyncClient:
    global _client
    if _client is None or _client.is_closed:
        _client = httpx.AsyncClient(
            timeout=15.0,
            limits=httpx.Limits(
                max_connections=20, max_keepalive_connections=10
            ),
            follow_redirects=True,
        )
    return _client

The Supabase client follows the same singleton pattern: it's initialized once at startup via the FastAPI lifespan and shared across all requests.

For the poller, I found two phases of independent database operations. Phase 1 runs the upsert of new and updated jobs alongside the query that fetches existing rows to find stale jobs; Phase 2 runs the archive of stale jobs alongside the update of the last_polled_at timestamp:

# Phase 1: upsert + existing query in parallel
upsert_resp, existing_resp = await asyncio.gather(
    asyncio.to_thread(upsert_query.execute),
    asyncio.to_thread(existing_query.execute),
)
 
# Phase 2: archive stale + update timestamp in parallel
await asyncio.gather(
    asyncio.to_thread(archive_query.execute),
    asyncio.to_thread(last_polled_query.execute),
)

The asyncio.to_thread calls aren't optional here, because supabase-py is a synchronous client. Without them, await never yields the event loop and the operations run one after another despite the asyncio.gather.

Security hardening

The JWT decode paths across all three auth strategies now insist on exp and sub claims:

payload = jwt.decode(
    token,
    s.admin_session_secret,
    algorithms=["HS256"],
    options={"require": ["exp", "sub"]},
)

The ALLOWED_HOSTS configuration now raises a RuntimeError at import time when it's unset, so the server can't even start with permissive defaults.

Results

Change	Impact
Persistent browser pool	Eliminated 2-4s cold start per scan
`load` + best-effort `networkidle`	No more hanging scans on persistent-connection sites
Parallel Lighthouse + axe	Two independent processes overlap instead of running sequentially
Shared httpx client	TCP connections reused across all ATS fetchers
Supabase singleton	One client per app lifetime, not per request
Phased `asyncio.gather` on DB ops	4 serial DB round trips reduced to 2 parallel phases
JWT `exp`/`sub` requirement	Tokens without expiration or subject rejected

Putting numbers on it

The audit-api wins predate the telemetry I keep now, so they stay qualitative. The job-api changes I could measure, so I did — a read-only benchmark that toggles each optimization off and on against the live service, forty trials apiece.

Connection reuse was the one that mattered. Fetching a board eight times with a fresh httpx.AsyncClient per call — a new TCP and TLS handshake every time — averaged 1,860 ms; the shared pooled client did the same work in 947 ms. The tail moved further than the mean: p95 dropped from roughly 8.9 s to 2.6 s, because every cold handshake was another chance to land on a slow path.

Overlapping the two independent reads in a poll phase with asyncio.gather and to_thread took that phase from 284 ms to 191 ms at the mean, 279 ms to 172 ms at the median — about a third. That fits: the phase now costs the slower query instead of the sum of the two. (Its p95 wandered the other way, but at forty trials I trust the mean and median more than the tail.) Reusing the Supabase client instead of rebuilding it per call handed back the ~3.4 ms of construction it had been spending on every request.

Change (job-api)	Before → after
Shared `httpx` client, 8 fetches	1,860 → 947 ms mean (−49%); p95 8.9 s → 2.6 s
Phased `asyncio.gather` DB reads	284 → 191 ms mean (−33%); p50 279 → 172 ms
Reused Supabase client	~3.4 ms of per-request construction removed

These are a benchmark snapshot from June 2026, not production telemetry — they size each effect rather than pin it, and they'll drift with network and load.

Takeaway

Pool the expensive resources at the process level and isolate per-request state with lightweight contexts; parallelize the I/O that doesn't share mutable state. And check that async actually buys you concurrency, because a synchronous library wrapped in async def still blocks the loop unless you hand it off with asyncio.to_thread.