OSS benchmark harness - Scanaislop Documentation

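The OSS benchmark harness freezes repository cohorts and re-runs the same repositories across scanner iterations. Use it to reduce false positives without moving the target after seeing results. Because the cohort is fixed before rules are tuned, a change between iterations reflects a rule change rather than a different set of repositories. The harness lives in the CLI repository at tools/oss-benchmark.mjs.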

Default cohort

The default cohort uses GitHub Trending repositories from these language tracks:

typescript
python
go
rust
ruby
php
java

The default limit is 10 repositories per language, so a full default cohort holds up to 70 repositories across the seven tracks.
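Cohort selection amounts to a per-language cap applied to ordered trending lists. The sketch below illustrates this under several assumptions. The input is taken to be repository full names already fetched per language, in trending order. The function name freezeCohort and the manifest fields are hypothetical and are not taken from tools/oss-benchmark.mjs. In this sketch, "freezing" simply means serializing the result so that later iterations read the same list.

```javascript
// Hypothetical sketch: build a cohort manifest from per-language trending lists.
// Function name and manifest fields are assumptions, not the harness's format.
const DEFAULT_LANGUAGES = ["typescript", "python", "go", "rust", "ruby", "php", "java"];
const DEFAULT_LIMIT = 10;

function freezeCohort(trendingByLanguage, options = {}) {
  const { languages = DEFAULT_LANGUAGES, limit = DEFAULT_LIMIT, capturedAt } = options;
  const repos = [];
  for (const language of languages) {
    // Keep trending order and cap each language at `limit` repositories.
    const top = (trendingByLanguage[language] ?? []).slice(0, limit);
    for (const fullName of top) repos.push({ language, fullName });
  }
  // Writing this object to cohorts/*.json is what would make it "frozen".
  return { capturedAt, languages: [...languages], limit, repos };
}

const manifest = freezeCohort(
  { typescript: ["a/one", "b/two", "c/three"], php: ["d/four"] },
  { languages: ["typescript", "php"], limit: 2, capturedAt: "2026-05-29" },
);
console.log(manifest.repos.length); // 3
```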

Deterministic scan command

Each benchmark scan runs:

AISLOP_NO_TELEMETRY=1 DO_NOT_TRACK=1 CI=1 NO_COLOR=1 node dist/cli.js scan "<repo>" --json

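AISLOP_NO_TELEMETRY=1 and DO_NOT_TRACK=1 disable telemetry, NO_COLOR=1 disables ANSI color output, and CI=1 signals a non-interactive CI environment. Combined with --json, these settings keep the output stable and machine-readable for unattended overnight runs.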
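A harness could assemble this invocation as plain data before spawning it. The sketch below is a minimal illustration under assumptions: the helper name buildScanInvocation is hypothetical, and it only restates the command and environment shown above. Passing the repository path as a single argument element avoids shell quoting, which is the same effect as the quotes around "<repo>".

```javascript
// Hypothetical helper: describe the deterministic scan command as data.
// The name buildScanInvocation is illustrative, not part of the CLI.
function buildScanInvocation(repoPath, baseEnv = {}) {
  return {
    command: "node",
    // The repo path is one argv element, so spaces need no shell quoting.
    args: ["dist/cli.js", "scan", repoPath, "--json"],
    env: {
      ...baseEnv,
      // Set after baseEnv so these values always win over inherited ones.
      AISLOP_NO_TELEMETRY: "1", // opt out of aislop telemetry
      DO_NOT_TRACK: "1", // common cross-tool telemetry opt-out
      CI: "1", // signal a non-interactive CI environment
      NO_COLOR: "1", // no ANSI color codes in captured output
    },
  };
}
```

To execute it, one option is Node's `child_process.spawnSync(inv.command, inv.args, { env: inv.env, encoding: "utf8" })`, with `process.env` passed as `baseEnv` so that `PATH` is preserved. Its `stdout` and `stderr` results correspond to the captured files listed under Output layout.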

Workflow

Build the CLI first:

pnpm build

Capture a new frozen cohort:

pnpm bench:trending:capture

Run the latest frozen cohort:

pnpm bench:trending:run -- --iteration pass-1
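Without --manifest, the run uses the latest frozen cohort. This page does not say how "latest" is resolved. The sketch below assumes that resolution uses the ISO date in manifest file names such as trending-daily-2026-05-29.json. The function latestManifest is hypothetical, and the real harness might use file modification times or another rule instead.

```javascript
// Hypothetical sketch: choose the newest frozen cohort by the ISO date in its
// file name (for example trending-daily-2026-05-29.json). The real harness
// may resolve "latest" differently.
function latestManifest(fileNames) {
  const dated = [];
  for (const name of fileNames) {
    const match = /(\d{4}-\d{2}-\d{2})\.json$/.exec(name);
    if (match) dated.push({ name, date: match[1] });
  }
  if (dated.length === 0) return null;
  // YYYY-MM-DD strings order chronologically under plain string comparison.
  dated.sort((a, b) => (a.date < b.date ? -1 : a.date > b.date ? 1 : 0));
  return dated[dated.length - 1].name;
}

console.log(
  latestManifest([
    "trending-daily-2026-05-28.json",
    "trending-daily-2026-05-29.json",
    "README.md",
  ]),
); // trending-daily-2026-05-29.json
```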

Capture and run in one command:

pnpm bench:trending:cycle -- --iteration pass-1

Re-run the same cohort after rule fixes:

pnpm bench:trending:run -- --manifest tools/benchmark-data/cohorts/trending-daily-2026-05-29.json --iteration pass-2
pnpm bench:trending:run -- --manifest tools/benchmark-data/cohorts/trending-daily-2026-05-29.json --iteration pass-3

Run a smaller smoke test:

pnpm bench:trending:cycle -- --languages typescript,php --limit 1 --iteration smoke
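The commands on this page use four flags: --languages, --limit, --iteration and --manifest. The sketch below shows one way a script could parse them. The parser, its name parseBenchFlags and its validation rules are assumptions for illustration and may not match how tools/oss-benchmark.mjs handles its arguments.

```javascript
// Hypothetical flag parser. The flag names (--languages, --limit, --iteration,
// --manifest) appear in the commands on this page; the parsing rules and the
// validation below are assumptions, not the harness's actual behavior.
const VALUE_FLAGS = new Set(["--languages", "--limit", "--iteration", "--manifest"]);

function parseBenchFlags(argv) {
  const opts = {};
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--") continue; // tolerate a forwarded "--" separator
    if (!VALUE_FLAGS.has(arg)) throw new Error(`unknown argument: ${arg}`);
    const value = argv[++i]; // every supported flag takes exactly one value
    if (value === undefined) throw new Error(`missing value for ${arg}`);
    opts[arg.slice(2)] = value;
  }
  if (opts.languages !== undefined) opts.languages = opts.languages.split(",");
  if (opts.limit !== undefined) {
    const n = Number(opts.limit);
    if (!Number.isInteger(n) || n < 1) throw new Error("--limit must be a positive integer");
    opts.limit = n;
  }
  return opts;
}

// Mirrors the smoke-test command above.
console.log(parseBenchFlags(["--languages", "typescript,php", "--limit", "1", "--iteration", "smoke"]));
```

In a real script the input would be `process.argv.slice(2)`. Recent Node versions also ship `util.parseArgs`, which could replace the hand-written loop.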

Output layout

Generated benchmark data is ignored by git and written under tools/benchmark-data/.

Path	Contents
`cohorts/*.json`	Frozen repository lists
`repos/<language>/<owner>__<repo>/`	Cached clones
`runs/<run-id>/summary.json`	Machine-readable aggregate report
`runs/<run-id>/summary.md`	Human review report
`runs/<run-id>/repos/.../scan.json`	Raw per-repo scan JSON
`runs/<run-id>/repos/.../stdout.txt`	Captured stdout
`runs/<run-id>/repos/.../stderr.txt`	Captured stderr
`runs/<run-id>/repos/.../metadata.json`	Reproduction metadata
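The path patterns in the table can be turned into small helpers. The sketch below assumes POSIX-style separators, and the helper names cloneDir and runSummaryPaths are hypothetical. It builds only the fully documented patterns. The per-repo directory inside a run, shown as ... in the table, is left out on purpose.

```javascript
// Hypothetical helpers mirroring the documented layout under tools/benchmark-data/.
// Only fully specified patterns are built; per-repo run directories are not.
const DATA_ROOT = "tools/benchmark-data";

function cloneDir(language, fullName) {
  const parts = fullName.split("/");
  if (parts.length !== 2 || !parts[0] || !parts[1]) {
    throw new Error(`expected "owner/repo", got ${fullName}`);
  }
  // "owner/repo" becomes "owner__repo" in the cache directory name.
  return `${DATA_ROOT}/repos/${language}/${parts[0]}__${parts[1]}/`;
}

function runSummaryPaths(runId) {
  const base = `${DATA_ROOT}/runs/${runId}`;
  return { json: `${base}/summary.json`, markdown: `${base}/summary.md` };
}
```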

Review loop

Check failed scans first, so the cohort is complete before you compare results.
Check the lowest-scoring repositories.
Check the rules that fire most often across many repositories.
Open the per-repo scan.json files to confirm likely false positives.
Fix the rules and add regression tests.
Re-run the same manifest as the next iteration.
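The first three review steps can be scripted. The sketch below is illustrative only. The input shape ({ repo, failed, score, findings: [{ rule }] }) and the function name triage are assumptions, not the documented scan.json or summary.json schema. The sketch ranks rules by how many repositories they fire in before ranking by raw volume, because a rule that fires across many repositories is a better false-positive candidate than one that is noisy in a single repository.

```javascript
// Hypothetical triage over per-repo results. The input shape
// ({ repo, failed, score, findings: [{ rule }] }) is an assumption,
// not the documented scan.json schema.
function triage(results, { lowest = 5, topRules = 10 } = {}) {
  // Step 1: failed scans, so the cohort can be completed first.
  const failures = results.filter((r) => r.failed).map((r) => r.repo);
  const scanned = results.filter((r) => !r.failed);

  // Step 2: lowest-scoring repositories.
  const lowestScores = [...scanned]
    .sort((a, b) => a.score - b.score)
    .slice(0, lowest)
    .map((r) => ({ repo: r.repo, score: r.score }));

  // Step 3: per rule, total findings and the number of distinct repos hit.
  const byRule = new Map();
  for (const r of scanned) {
    for (const f of r.findings) {
      const entry = byRule.get(f.rule) ?? { rule: f.rule, count: 0, repos: new Set() };
      entry.count += 1;
      entry.repos.add(r.repo);
      byRule.set(f.rule, entry);
    }
  }
  const rules = [...byRule.values()]
    .map((e) => ({ rule: e.rule, count: e.count, repoCount: e.repos.size }))
    // Widest spread first, then highest volume.
    .sort((a, b) => b.repoCount - a.repoCount || b.count - a.count)
    .slice(0, topRules);

  return { failures, lowestScores, rules };
}
```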

Only refresh the cohort when you want a new market snapshot. For rule iteration, keep the manifest stable so that differences between iterations reflect rule changes, not a different set of repositories.
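Iteration-to-iteration numbers are comparable only when both runs cover the same repositories. The sketch below compares two iterations, such as pass-1 and pass-2, and refuses to compare runs over different repository sets. The input shape ({ repos, ruleCounts }) and the name compareIterations are hypothetical and do not describe the harness's summary.json format.

```javascript
// Hypothetical comparison of two iterations over the same manifest.
// Input shape { repos: string[], ruleCounts: { [rule]: number } } is assumed.
function compareIterations(before, after) {
  const a = [...before.repos].sort().join("\n");
  const b = [...after.repos].sort().join("\n");
  if (a !== b) {
    throw new Error("repository sets differ; re-run the same manifest before comparing");
  }
  const rules = new Set([...Object.keys(before.ruleCounts), ...Object.keys(after.ruleCounts)]);
  const deltas = [];
  for (const rule of rules) {
    const delta = (after.ruleCounts[rule] ?? 0) - (before.ruleCounts[rule] ?? 0);
    if (delta !== 0) deltas.push({ rule, delta });
  }
  // Largest reductions (most negative deltas) first.
  return deltas.sort((x, y) => x.delta - y.delta);
}
```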
