Project | Multi-ATS Job Pipeline: FastAPI Behind Next.js

Updated 2026-04-17: expanded from a single-ATS Greenhouse poller to six providers (adding Lever, Ashby, Workday, SmartRecruiters, and a JSON-LD fallback), added one-input ATS auto-detection, and cut scan time from 48 seconds to 8.

Overview

Project: Multi-ATS job scraper, scoring engine, and admin dashboard for my active job search

Role: Solo Developer

Duration: April 2026

Purpose: Replace daily manual scanning of dozens of company career pages with a single ranked list of relevant postings, refreshed automatically, hosted under a password-protected admin route on the portfolio

Business impact

Polls 49 career boards across six ATS providers (Greenhouse, Lever, Ashby, Workday, SmartRecruiters, and a JSON-LD fallback) and stores every posting with a score
Cut the time to triage a day's new listings from about 40 minutes to under 5
Filtered thousands of raw postings per scan down to a US-only, role-matched top 20
Cut scan runtime from 48 seconds to 8 across three performance passes, well under Vercel's 60-second function ceiling
Gave me one authenticated surface for every job-search tool instead of separate password prompts per dashboard

The challenge

The original pipeline only spoke Greenhouse: ten hand-curated boards, one fetcher, a scoring engine, and a dashboard, and that held up for about a week. Then I needed Lever, then Ashby, then a Workday instance, then two Workdays on different tenants, then a careers page that rendered through JavaScript and exposed nothing but JSON-LD in the HTML. The hard parts were:

Unifying five wildly different ATS APIs behind one fetcher contract
Detecting a provider automatically from either a careers URL or a plain company name, so adding a source is one input instead of three
Scraping JSON-LD job postings without pulling in a dependency for something Python's stdlib can do
Keeping the scan button under 10 seconds even with 49 sources fanning out concurrently
Scoring postings against my actual target profile without hand-curating each one
Authenticating a Next.js admin dashboard against a Python service without stuffing an API key into the browser
Running a Python service inside a JavaScript monorepo without duct tape

Architecture

The pipeline has three services cooperating through Supabase:

Six ATS providers feed the FastAPI job-api, which polls, scores, and sanitizes postings then writes them to Supabase Postgres (job_sources, job_postings, job_status_log). A Vercel cron triggers the poller with an x-api-key. The admin browser only talks to Next.js proxy routes under /api/jobs, which verify the admin JWT cookie and forward a bearer token to FastAPI; the dashboard at /tools/admin/jobs reads through those same proxy routes.

In the Next.js app the cookie check lives in proxy.ts, and the bearer credential FastAPI receives is that same session token. So that's one backend, two valid credentials, and one source of truth for sessions.

The scraper

The ATS clients and poller live in apps/job-api/app/services/, and the loop is small: list the sources, fetch each board with the right fetcher, diff against stored postings, score the new ones, and write them back.
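The diff step of that loop can be sketched as a set-membership check on a stable external ID. This is a minimal illustration under assumptions: the Posting shape and the diff_new_postings name are hypothetical, and the real poller compares against rows in job_postings rather than an in-memory set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    """Hypothetical minimal posting; the real row carries more columns."""

    external_id: str
    title: str


def diff_new_postings(fetched: list[Posting], stored_ids: set[str]) -> list[Posting]:
    """Return only the postings whose external_id is not already stored."""
    seen: set[str] = set()
    new: list[Posting] = []
    for posting in fetched:
        # Skip rows already in the database and duplicates within one fetch.
        if posting.external_id in stored_ids or posting.external_id in seen:
            continue
        seen.add(posting.external_id)
        new.append(posting)
    return new
```

Only the postings this returns would need to be scored and written, which keeps repeat scans cheap.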

Each ATS has its own fetcher, but every fetcher returns the same StandardJob dataclass, so the poller just dispatches off the provider column on job_sources:

FETCHERS: dict[str, Fetcher] = {
    "greenhouse": fetch_board_jobs,
    "lever": fetch_lever_jobs,
    "ashby": fetch_ashby_jobs,
    "workday": fetch_workday_jobs,
    "smartrecruiters": fetch_smartrecruiters_jobs,
    "jsonld": fetch_jsonld_jobs,
}

Adding a new ATS means adding a fetcher and a row to that dict; the poller, scoring, sanitization, and dashboard don't change at all.
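For readers who want to see the contract itself, here is a rough sketch of what a shared dataclass and fetcher type could look like. The field names, the async signature, fetch_example_jobs, DEMO_FETCHERS, and fetch_for_source are all assumptions for illustration; the actual StandardJob and Fetcher definitions are not shown in this write-up.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class StandardJob:
    # Field names are assumptions for illustration only.
    external_id: str
    title: str
    location: str
    url: str


# Assumed contract: a fetcher takes a board token and returns normalized jobs.
Fetcher = Callable[[str], Awaitable[list[StandardJob]]]


async def fetch_example_jobs(board_token: str) -> list[StandardJob]:
    """Hypothetical stub standing in for a real ATS client."""
    url = f"https://example.com/{board_token}/1"
    return [StandardJob("1", "Senior Engineer", "Remote, US", url)]


DEMO_FETCHERS: dict[str, Fetcher] = {"example": fetch_example_jobs}


async def fetch_for_source(provider: str, board_token: str) -> list[StandardJob]:
    """Look up the fetcher by provider name and fail loudly on unknown providers."""
    try:
        fetcher = DEMO_FETCHERS[provider]
    except KeyError:
        raise ValueError(f"unsupported provider: {provider!r}") from None
    return await fetcher(board_token)
```

Failing on an unknown provider, rather than skipping it silently, makes a typo in the provider column show up on the first scan.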

The JSON-LD fetcher is the fallback for career pages with no public API. It reads the page, parses every <script type="application/ld+json"> block with Python's stdlib html.parser, and normalizes the three shapes you find in the wild (single object, array, @graph) into StandardJob rows, all without adding a single new dependency to the scraping layer.

The scoring engine is a weighted keyword config with five tiers: role titles, core technologies, domain skills, seniority signals, and negative keywords. A senior React/Next.js role scores high, while a junior PHP contract lands in the negative zone and never surfaces. The weights live in version control, so recalibrating the filter is a PR rather than a UI toggle.
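To make the tier idea concrete, here is a small sketch of a weighted keyword scorer. The tier names follow the description above, but every keyword, every weight, and the matching rule are invented for illustration; the real config is not reproduced here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    weight: int
    keywords: tuple[str, ...]


# Hypothetical tiers and weights, not the project's actual values.
TIERS: tuple[Tier, ...] = (
    Tier("role_titles", 30, ("frontend engineer", "full stack engineer")),
    Tier("core_technologies", 15, ("react", "next.js", "typescript")),
    Tier("domain_skills", 8, ("accessibility", "design systems")),
    Tier("seniority", 10, ("senior", "staff")),
    Tier("negative", -40, ("junior", "php", "contract")),
)


def _contains(text: str, keyword: str) -> bool:
    """Match the keyword as a whole term so it never fires inside a longer word."""
    pattern = r"(?<![a-z0-9])" + re.escape(keyword) + r"(?![a-z0-9])"
    return re.search(pattern, text) is not None


def score_posting(title: str, description: str) -> int:
    """Add each tier's weight once per distinct keyword found in the posting."""
    text = f"{title}\n{description}".lower()
    return sum(
        tier.weight
        for tier in TIERS
        for keyword in tier.keywords
        if _contains(text, keyword)
    )
```

With these made-up weights, a title like "Senior Frontend Engineer" with a React and Next.js description lands well above zero, while "Junior PHP Developer" on a contract goes strongly negative, which is the behavior the paragraph above describes.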

HTML descriptions get sanitized on write with bleach and a tag allowlist. Assuming a third-party API hands you safe HTML is the wrong bet, and cleaning the markup before the row lands in Postgres means every consumer (the dashboard, email, backup export) inherits the safety without having to remember it.

Auto-detecting the provider

Adding a source used to mean three decisions: the provider dropdown, the board token, and the company name. For every new company I'd open the careers page, work out the ATS, copy the right slug, and paste it in. The dashboard now does all of that from one input:

Company name or careers URL: stripe
                             ↓
              Detected: Greenhouse (stripe), 142 jobs

The detect_ats service first tries to parse a known ATS URL pattern (boards.greenhouse.io/*, jobs.lever.co/*, jobs.ashbyhq.com/*, *.myworkdayjobs.com/*, careers.smartrecruiters.com/*). If the input is a bare slug, it probes each provider's public API and keeps the first one that responds with a non-empty board. That's one input covering five providers, and a collapsible "Advanced" pane preserves manual entry for the edge cases that don't fit the pattern.

Making the scan button fast

Fanning out across 49 sources exposed a chain of bottlenecks, none of which had shown up with ten Greenhouse boards. A single scan started the day at 48 seconds and ended it at 8, through three targeted fixes: concurrent polling with asyncio.gather, batched Supabase writes to collapse the N+1 round-trip pattern, and asyncio.to_thread on every .execute() so supabase-py's sync client stops blocking the event loop.

The most instructive bit is that the first fix did nothing at all on its own. I wrote the whole walk-through up separately: asyncio.gather is not enough for a sync client.
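The effect is easy to reproduce with a stand-in for a synchronous client. In this sketch blocking_execute is a hypothetical placeholder that just sleeps; it is not supabase-py, and the timings it produces are only indicative.

```python
import asyncio
import time
from collections.abc import Awaitable, Callable


def blocking_execute(delay: float = 0.1) -> str:
    """Stand-in for a synchronous client call such as a query's .execute()."""
    time.sleep(delay)
    return "ok"


async def write_blocking() -> str:
    # Calls the sync function directly: the event loop is blocked for the whole call.
    return blocking_execute()


async def write_threaded() -> str:
    # Runs the sync function in a worker thread so other coroutines keep running.
    return await asyncio.to_thread(blocking_execute)


async def timed(factory: Callable[[], Awaitable[str]], n: int = 5) -> float:
    """Run n copies of the coroutine under asyncio.gather and return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(factory() for _ in range(n)))
    return time.perf_counter() - start


async def main() -> tuple[float, float]:
    blocked = await timed(write_blocking)
    threaded = await timed(write_threaded)
    return blocked, threaded


if __name__ == "__main__":
    blocked, threaded = asyncio.run(main())
    print(f"gather + direct sync calls: {blocked:.2f}s")
    print(f"gather + asyncio.to_thread: {threaded:.2f}s")
```

The first timing grows with the number of calls because each coroutine holds the event loop while it sleeps, so gather has nothing to interleave. The second stays close to the duration of a single call, as long as the default thread pool has a free worker for each one.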

Auth across two runtimes

The scraper runs unattended on a cron schedule, so it needs a credential that doesn't expire. The dashboard runs in a browser, so it shouldn't be holding a long-lived API key.

The FastAPI service accepts either, and both checks compare secrets in constant time:

apps/job-api/app/dependencies.py

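import jwt
from fastapi import Depends, HTTPException, Request, Security

# api_key_header, Settings, get_settings, _api_key_matches, and
# _extract_bearer_token are defined elsewhere in the service (not shown).


def verify_api_key_or_session(
    request: Request,
    key: str | None = Security(api_key_header),
    s: Settings = Depends(get_settings),
) -> str:
    # Path 1: the long-lived API key sent by the cron job.
    if _api_key_matches(key, s.job_api_key):
        return "api-key"
    # Path 2: the admin session JWT forwarded by the Next.js proxy.
    token = _extract_bearer_token(request)
    if token:
        try:
            payload = jwt.decode(token, s.admin_session_secret, algorithms=["HS256"])
        except jwt.PyJWTError:
            pass  # invalid or expired token: fall through to the 401 below
        else:
            if payload.get("sub") == "tools-admin":
                return "session"
    raise HTTPException(status_code=401, detail="Unauthorized")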

The Next.js app mints the JWT on /tools/login with jose, stores it as an httpOnly cookie, and verifies it in proxy.ts for every /tools/admin/* request. The Python service verifies the same HS256 signature with pyjwt. So it's one shared secret across two runtimes, and there are no cross-origin quirks because the browser only ever hits same-origin Next.js proxy routes.

The poll endpoint deliberately stays API-key-only, since a session cookie isn't something cron has anyway.

Running Python inside Nx

The service lives in apps/job-api/ next to the Next.js and Playwright workspaces. Nx has no built-in Python integration, which turned out not to matter, because all Nx needs to do is dispatch commands:

apps/job-api/project.json

{
  "name": "job-api",
  "targets": {
    "dev": {
      "executor": "nx:run-commands",
      "options": {
        "command": "uv run --package job-api uvicorn app.main:app --reload --port 8000",
        "cwd": "apps/job-api"
      }
    },
    "test": {
      "executor": "nx:run-commands",
      "options": {
        "command": "uv run --package job-api pytest -v",
        "cwd": "apps/job-api"
      }
    },
    "lint": {
      "executor": "nx:run-commands",
      "options": {
        "command": "uv run --package job-api ruff check .",
        "cwd": "apps/job-api"
      }
    },
    "mypy": {
      "executor": "nx:run-commands",
      "options": {
        "command": "uv run --package job-api mypy app/",
        "cwd": "apps/job-api"
      }
    }
  }
}

A single uv workspace at the repo root locks every Python dependency, and a dedicated ci-python GitHub Actions job runs pnpm nx run-many -t lint test mypy -p job-api on every non-docs PR, gated by ci-status alongside the Node checks. Mypy runs in strict mode, ruff runs with an opinionated select list, and pytest covers 129 tests across sanitize, scoring, dependencies, schemas, every ATS fetcher, the ATS detector, the poller, and all three routers.

The uv-in-Nx plumbing got its own blog post too: Running a uv Python workspace inside an Nx monorepo.

The deploy

FastAPI runs on Railway. The Docker build uses the monorepo root as the build context so it can see the workspace lockfile:

COPY pyproject.toml uv.lock ./
COPY apps/job-api/pyproject.toml ./apps/job-api/pyproject.toml
RUN uv sync --frozen --no-dev --no-editable --package job-api
COPY apps/job-api/app ./apps/job-api/app

railway.toml binds the container to $PORT and points Railway's healthcheck at /health, and a TrustedHostMiddleware wired off an ALLOWED_HOSTS env var refuses requests with forged Host headers. The only public endpoint is /health; everything else sits behind verify_api_key or verify_api_key_or_session.

The result

The dashboard shows a ranked table of everything the poller has found in the last 30 days, with a detail view that renders the sanitized description and a status column I can flip to applied, interviewing, or rejected. Each status change appends a row to job_status_log, so the history stays immutable and exportable.

What it replaced was a folder of browser tabs and a Notion page I kept updating by hand, and what it bought me is actually knowing, at any moment, which postings are new and which are worth the next round of applications. Every piece of the system is small on its own; the value comes from having them all talking to each other behind one login.

Takeaway

Full-stack projects across two languages are easier than they sound as long as each tool owns one job: uv owns Python dependency resolution, Nx owns task dispatch, Next.js owns the browser boundary, and FastAPI owns the database. Pick a shared JWT secret, put constant-time comparisons on both sides, and the rest really is just plumbing.