The problem
A scan button in my jobs dashboard was hitting 48 seconds, and Vercel's Hobby tier kills functions at 60. The button triggers a FastAPI poller that fans out to 49 career boards across six ATS providers (Greenhouse, Lever, Ashby, Workday, SmartRecruiters, and a JSON-LD fallback fetcher), pulls thousands of postings, filters them, and writes the survivors to Supabase. One button, one scan, the whole pipeline's cost visible in a single click.
The poller was already async. httpx fetches in parallel, and a for loop iterated sources. Every real operation called await. It looked concurrent.
Fix 1: actually concurrent polling
Kill the for loop, use asyncio.gather with a bounded semaphore:
```python
semaphore = asyncio.Semaphore(POLL_CONCURRENCY)

async def _worker(source):
    async with semaphore:
        return await _poll_one_source(source, supabase)

summaries = await asyncio.gather(*(_worker(s) for s in sources))
```

Ran it. Still 48 seconds.
asyncio.gather should have overlapped 49 sources across 10 concurrent workers. The HTTP fetches were clearly not the bottleneck; httpx does I/O asynchronously. So what was blocking?
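As a sanity check that the shape itself does overlap work, here is a toy version of the same gather-plus-semaphore pattern with the real poll replaced by asyncio.sleep. The task counts and durations are made up for the simulation:

```python
import asyncio
import time

async def simulate(n_tasks=49, concurrency=10, task_s=0.05):
    # same shape as the poller: gather over semaphore-bounded workers
    sem = asyncio.Semaphore(concurrency)

    async def worker():
        async with sem:
            await asyncio.sleep(task_s)  # a body that genuinely yields

    start = time.perf_counter()
    await asyncio.gather(*(worker() for _ in range(n_tasks)))
    return time.perf_counter() - start

elapsed = asyncio.run(simulate())
# 49 tasks, 10 at a time -> about 5 waves of 0.05s, not 49 sequential sleeps
print(f"{elapsed:.2f}s")
```

When the body truly yields, 49 tasks finish in roughly five waves. The pattern was never the problem; something inside the workers was not yielding.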
Fix 2: batch the DB writes
Per source, the old poller did one upsert per job:
```python
for job in jobs:
    row = ...  # the posting serialized into a table row
    supabase.table("job_postings").upsert(row).execute()
```

Across 49 sources with a few hundred jobs each, that is on the order of thousands of round trips per scan. One bulk upsert per source collapses the fan-out, and a single .in_() archive call replaces the stale-job update loop:
```python
if rows_to_upsert:
    supabase.table("job_postings").upsert(
        rows_to_upsert, on_conflict="source_id,external_id"
    ).execute()

if stale_ids:
    supabase.table("job_postings").update(
        {"status": "archived"}
    ).in_("id", stale_ids).execute()
```

Scan time: 48s → 22s. Half the runtime gone, still too slow for a button that fires synchronously from the browser.
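The stale-id computation that feeds that archive call is a plain set difference. A minimal sketch, with field names assumed from the upsert's on_conflict key:

```python
def find_stale_ids(existing_rows, fetched_jobs):
    """IDs of rows already in the DB whose external_id was not seen this scan."""
    fetched = {job["external_id"] for job in fetched_jobs}
    return [row["id"] for row in existing_rows if row["external_id"] not in fetched]

existing = [
    {"id": 1, "external_id": "gh-100"},
    {"id": 2, "external_id": "gh-101"},
    {"id": 3, "external_id": "gh-102"},
]
jobs = [{"external_id": "gh-100"}, {"external_id": "gh-102"}]
print(find_stale_ids(existing, jobs))  # → [2]
```

Batching this into one .in_() call matters for the same reason the bulk upsert does: the cost is per round trip, not per row.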
Fix 3: offload the sync client to a thread pool
Here is the thing I kept missing. supabase-py is a synchronous client. Every .execute() is a regular blocking call. Calling it from an async function does not make it non-blocking; it blocks the event loop at exactly the line where an await would normally yield.
When my ten workers each called .execute() inside their async with semaphore, they were not overlapping DB I/O. Each worker took the event loop, held it for a 30-100ms round trip, then released it. Ten workers "running concurrently" actually executed their DB calls one after another. Pure theater.
asyncio.to_thread fixes this by handing the sync call to Python's default thread pool executor:
```python
upsert_query = supabase.table("job_postings").upsert(
    rows_to_upsert, on_conflict="source_id,external_id"
)
upsert_resp = await asyncio.to_thread(upsert_query.execute)
```

Each worker now submits its blocking call to a thread, releases the event loop, and gets resumed when the thread returns. Ten workers, ten overlapping DB round trips, real concurrency.
Scan time: 22s → 8s.
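The difference is easy to reproduce without Supabase at all. Here time.sleep stands in for a blocking .execute() round trip, and ten workers run both ways:

```python
import asyncio
import time

def blocking_call(ms=50):
    time.sleep(ms / 1000)  # stands in for a sync .execute() round trip

async def run_ten(use_thread):
    async def worker():
        if use_thread:
            await asyncio.to_thread(blocking_call)  # yields while a thread waits
        else:
            blocking_call()  # holds the event loop for the whole call

    start = time.perf_counter()
    await asyncio.gather(*(worker() for _ in range(10)))
    return time.perf_counter() - start

serial = asyncio.run(run_ten(use_thread=False))
overlapped = asyncio.run(run_ten(use_thread=True))
print(f"blocking: {serial:.2f}s, to_thread: {overlapped:.2f}s")
```

The blocking version pays ten full round trips in sequence; the to_thread version overlaps them, bounded only by the default thread pool size.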
What the three fixes added up to
| Change | Runtime |
|---|---|
| Sequential for-loop | 48s |
| `asyncio.gather` + semaphore | 48s (no change) |
| Batched upsert and archive | 22s |
| `asyncio.to_thread` on every `.execute()` | 8s |
Fix 1 on its own did nothing. If I had stopped at fix 2 and declared victory at 22s, I would have shipped theater as a win. The diagnosis that mattered was not "concurrency is broken." It was "I wrote async but my library still blocks, so gather is a shape, not a behavior."
The full poller
```python
async def _poll_one_source(source, supabase):
    # fetch jobs, filter, score ... producing rows_to_upsert
    if rows_to_upsert:
        query = supabase.table("job_postings").upsert(
            rows_to_upsert, on_conflict="source_id,external_id"
        )
        await asyncio.to_thread(query.execute)

    existing_query = (
        supabase.table("job_postings")
        .select("id, external_id")
        .eq("source_id", source["id"])
        .not_.in_("status", ["saved", "applied", "archived"])
    )
    existing_resp = await asyncio.to_thread(existing_query.execute)

    # stale_ids: existing rows whose external_id did not show up in this scan
    if stale_ids:
        archive_query = (
            supabase.table("job_postings")
            .update({"status": "archived"})
            .in_("id", stale_ids)
        )
        await asyncio.to_thread(archive_query.execute)


async def poll_all_sources(supabase):
    sources_query = supabase.table("job_sources").select("*").eq("enabled", True)
    sources_resp = await asyncio.to_thread(sources_query.execute)
    sources = sources_resp.data or []
    semaphore = asyncio.Semaphore(10)

    async def _worker(source):
        async with semaphore:
            return await _poll_one_source(source, supabase)

    return await asyncio.gather(*(_worker(s) for s in sources))
```

The same poller powers a JSON-LD fallback fetcher that extracts schema.org/JobPosting from any careers page with Python's stdlib html.parser, zero new dependencies. Small aside, but worth noting: once the concurrency model is honest, adding a slow fetcher does not compound into a slow scan.
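For the curious, the core of such a fallback is small. This is my reconstruction of the idea, not the project's actual code; the class name and field handling are assumptions:

```python
import json
from html.parser import HTMLParser

class JobPostingExtractor(HTMLParser):
    """Collect schema.org JobPosting objects from JSON-LD script tags."""
    def __init__(self):
        super().__init__()
        self._in_ld = False
        self.postings = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self._in_ld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_ld = False

    def handle_data(self, data):
        if not self._in_ld:
            return
        try:
            doc = json.loads(data)
        except ValueError:
            return  # malformed JSON-LD: skip rather than fail the scan
        items = doc if isinstance(doc, list) else [doc]
        self.postings += [
            d for d in items
            if isinstance(d, dict) and d.get("@type") == "JobPosting"
        ]

html = """<html><head><script type="application/ld+json">
{"@type": "JobPosting", "title": "Backend Engineer"}
</script></head></html>"""
extractor = JobPostingExtractor()
extractor.feed(html)
print([p["title"] for p in extractor.postings])  # → ['Backend Engineer']
```

Because the fetch and parse both run inside a worker that genuinely yields, the slowest careers page only costs its own round trip.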
Takeaway
async is a shape you put on a function. Concurrency is whether the thing inside that function actually yields. A sync library wrapped in async def is still blocking; asyncio.to_thread is how you make it yield. Run the timer before and after every fix, and it will tell you which theater you bought.
