The OSS benchmark harness freezes a cohort of repositories and re-runs the scanner against that same cohort across scanner iterations. Freezing the cohort lets you reduce false positives without moving the target after seeing results. The harness lives in the CLI repository at tools/oss-benchmark.mjs.

Default cohort

The default cohort uses GitHub Trending repositories from these language tracks:
  • typescript
  • python
  • go
  • rust
  • ruby
  • php
  • java
The default limit is 10 repositories per language.
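
As a rough illustration of how these defaults shape a cohort, the sketch below caps an ordered candidate list at the per-language limit. The function name buildCohort, the input shape, and the output fields are assumptions made for this example. They are not the harness's actual manifest format.

```javascript
// Hypothetical sketch: the real manifest written by tools/oss-benchmark.mjs may differ.
const DEFAULT_LANGUAGES = ["typescript", "python", "go", "rust", "ruby", "php", "java"];
const DEFAULT_LIMIT = 10;

// candidates: { [language]: string[] } of "owner/repo" names, already ordered
// (for example, in the order a trending listing returned them).
function buildCohort(candidates, languages = DEFAULT_LANGUAGES, limit = DEFAULT_LIMIT) {
  const repos = [];
  for (const language of languages) {
    const names = candidates[language] ?? [];
    // Keep at most `limit` repositories per language, preserving input order.
    for (const fullName of names.slice(0, limit)) {
      repos.push({ language, fullName });
    }
  }
  return { languages, limit, repos };
}

// Made-up repository names, two language tracks, limit of 1 per language.
const exampleCohort = buildCohort(
  { typescript: ["acme/app", "acme/lib"], php: ["acme/site"] },
  ["typescript", "php"],
  1
);
console.log(exampleCohort.repos.map((r) => `${r.language}:${r.fullName}`).join(", "));
// prints "typescript:acme/app, php:acme/site"
```

Once a list like this is written to disk, later runs can read it instead of querying the trending listing again, which is what keeps the cohort stable.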

Deterministic scan command

Each benchmark scan runs:
AISLOP_NO_TELEMETRY=1 DO_NOT_TRACK=1 CI=1 NO_COLOR=1 node dist/cli.js scan "<repo>" --json
These settings disable telemetry and ANSI color output, which keeps the output stable for overnight runs.
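
The scan command above can be assembled programmatically. The sketch below only builds the command, arguments, and environment. It does not run anything, and the function name buildScanInvocation is an assumption made for this example, not part of the harness.

```javascript
// Sketch only: assembles the documented scan command without executing it.
function buildScanInvocation(repoPath, baseEnv = {}) {
  return {
    command: "node",
    // Passing arguments as an array avoids shell quoting problems with the repo path.
    args: ["dist/cli.js", "scan", repoPath, "--json"],
    env: {
      ...baseEnv,
      // The four variables from the documented command line.
      AISLOP_NO_TELEMETRY: "1",
      DO_NOT_TRACK: "1",
      CI: "1",
      NO_COLOR: "1",
    },
  };
}

// Made-up path and base environment for illustration.
const invocation = buildScanInvocation("/tmp/repos/acme__app", { PATH: "/usr/bin" });
console.log(invocation.command, invocation.args.join(" "));
// prints "node dist/cli.js scan /tmp/repos/acme__app --json"
```

An object like this could be handed to Node's child_process.spawn(command, args, { env }). Setting the variables explicitly on every invocation means the scan does not depend on whatever the calling shell happens to export.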

Workflow

Build the CLI first:
pnpm build
Capture a new frozen cohort:
pnpm bench:trending:capture
Run the latest frozen cohort:
pnpm bench:trending:run -- --iteration pass-1
Capture and run in one command:
pnpm bench:trending:cycle -- --iteration pass-1
Re-run the same cohort after rule fixes:
pnpm bench:trending:run -- --manifest tools/benchmark-data/cohorts/trending-daily-2026-05-29.json --iteration pass-2
pnpm bench:trending:run -- --manifest tools/benchmark-data/cohorts/trending-daily-2026-05-29.json --iteration pass-3
Run a smaller smoke test:
pnpm bench:trending:cycle -- --languages typescript,php --limit 1 --iteration smoke
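
The re-run step repeats one command with a fixed manifest and a changing iteration label. The sketch below generates those command strings. The helper name rerunCommands is hypothetical, and only flags shown above are used.

```javascript
// Hypothetical helper: builds re-run commands against a single frozen manifest.
function rerunCommands(manifestPath, passNumbers) {
  return passNumbers.map(
    (n) => `pnpm bench:trending:run -- --manifest ${manifestPath} --iteration pass-${n}`
  );
}

const commands = rerunCommands(
  "tools/benchmark-data/cohorts/trending-daily-2026-05-29.json",
  [2, 3]
);
for (const command of commands) {
  console.log(command);
}
```

Only the iteration label changes between passes, so differences between runs can be attributed to rule changes rather than to a different repository set.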

Output layout

Generated benchmark data is ignored by git and written under tools/benchmark-data/.
Path                                   Contents
cohorts/*.json                         Frozen repository lists
repos/<language>/<owner>__<repo>/      Cached clones
runs/<run-id>/summary.json             Machine-readable aggregate report
runs/<run-id>/summary.md               Human review report
runs/<run-id>/repos/.../scan.json      Raw per-repo scan JSON
runs/<run-id>/repos/.../stdout.txt     Captured stdout
runs/<run-id>/repos/.../stderr.txt     Captured stderr
runs/<run-id>/repos/.../metadata.json  Reproduction metadata
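
To make the cached-clone naming concrete, the sketch below maps a language and an "owner/repo" name to the repos/<language>/<owner>__<repo>/ pattern from the layout. The function name cachedClonePath is an assumption made for this example, and the per-run paths are left out because the layout elides part of them.

```javascript
// Hypothetical helper illustrating the cached-clone path pattern.
function cachedClonePath(language, fullName, dataRoot = "tools/benchmark-data") {
  const parts = fullName.split("/");
  if (parts.length !== 2 || parts[0] === "" || parts[1] === "") {
    throw new Error(`Expected "owner/repo", got "${fullName}"`);
  }
  const [owner, repo] = parts;
  // A double underscore joins owner and repo into one directory name.
  return `${dataRoot}/repos/${language}/${owner}__${repo}/`;
}

console.log(cachedClonePath("typescript", "acme/app"));
// prints "tools/benchmark-data/repos/typescript/acme__app/"
```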

Review loop

  1. Check failed scans first so the cohort's results are complete.
  2. Check the lowest-score repositories.
  3. Check the highest-volume rules across many repositories.
  4. Open per-repo scan.json files for likely false positives.
  5. Fix the rules and add regression tests.
  6. Re-run the same manifest as the next iteration.
Only refresh the cohort when you want a new market snapshot. For rule iteration, keep the manifest stable.
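
The first three review steps amount to sorting aggregate results. The sketch below shows one way to build that review queue. The input fields (name, status, score, ruleCounts) and the rule names are assumptions made for this example and do not reflect the real summary.json schema.

```javascript
// Hypothetical sketch: field names are illustrative, not the real summary.json schema.
function reviewQueue(repos, topN = 3) {
  // Step 1: failed scans, so gaps in the cohort are visible first.
  const failures = repos.filter((r) => r.status !== "ok").map((r) => r.name);

  // Step 2: lowest-scoring repositories among the successful scans.
  const lowestScores = repos
    .filter((r) => r.status === "ok")
    .sort((a, b) => a.score - b.score)
    .slice(0, topN)
    .map((r) => r.name);

  // Step 3: rules ranked by how many repositories they fire in, then by total findings.
  const ruleStats = new Map();
  for (const r of repos) {
    for (const [rule, count] of Object.entries(r.ruleCounts ?? {})) {
      const stats = ruleStats.get(rule) ?? { rule, findings: 0, repos: 0 };
      stats.findings += count;
      stats.repos += 1;
      ruleStats.set(rule, stats);
    }
  }
  const topRules = [...ruleStats.values()]
    .sort((a, b) => b.repos - a.repos || b.findings - a.findings)
    .slice(0, topN);

  return { failures, lowestScores, topRules };
}

// Made-up data for illustration.
const queue = reviewQueue([
  { name: "acme/app", status: "ok", score: 62, ruleCounts: { "rule-a": 40, "rule-b": 2 } },
  { name: "acme/lib", status: "failed" },
  { name: "acme/site", status: "ok", score: 91, ruleCounts: { "rule-a": 15 } },
]);
console.log(queue.failures, queue.lowestScores, queue.topRules.map((r) => r.rule));
```

Ranking rules by repository count before total findings favors rules that misfire broadly over rules that are noisy in a single repository. That ordering is a design choice of this sketch.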