Goals
- Turn repeated AI-agent failure modes into deterministic rules.
- Keep rule quality honest by scanning real repositories, not only fixtures.
- Publish methods and limits so research posts are credible.
- Feed product decisions with evidence about which rules matter, where noise appears, and what teams need to govern AI-written code.
Public scan protocol
For every public research run:
1. Define the cohort before scanning. Record the selection rule, such as GitHub Trending by language, top npm packages, benchmark tasks, framework repositories, or a public nominated list. Do not swap repositories after seeing results unless the reason is disclosed.
2. Pin every repository. Capture owner/repo, default branch, commit SHA, primary language, package manager, and whether install/build was attempted.
3. Pin the scanner. Capture aislop version, Node version, OS, config file, enabled engines, and exact command.
4. Store raw output. Keep the JSON result for each repository before writing a summary. Do not publish private source.
5. Classify findings. Sample top findings per rule and mark each as true positive, false positive, needs context, or setup/toolchain failure.
6. Convert learning into product changes. Tighten noisy detectors, add regression tests, improve source filtering, or document setup failures.
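The pinning steps above can be captured in a small run manifest that is written before any summary. The following is a minimal sketch, not the project's actual format: the class names (`PinnedRepo`, `PinnedScanner`, `RunManifest`) and field names are hypothetical, chosen to mirror the items the protocol lists. It also assumes 40-character SHA-1 commit IDs, as GitHub uses.

```python
import json
import re
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class PinnedRepo:
    """One repository in the cohort (hypothetical field names)."""
    owner_repo: str
    default_branch: str
    commit_sha: str
    primary_language: str
    package_manager: str
    install_build_attempted: bool

    def __post_init__(self) -> None:
        # Assumption: full 40-hex SHA-1. Branch names and short SHAs
        # can move or collide, so they do not pin a run.
        if not re.fullmatch(r"[0-9a-f]{40}", self.commit_sha):
            raise ValueError(f"not a full commit SHA: {self.commit_sha!r}")


@dataclass(frozen=True)
class PinnedScanner:
    """Scanner environment for the run (hypothetical field names)."""
    aislop_version: str
    node_version: str
    os: str
    config_file: str
    enabled_engines: Tuple[str, ...]
    command: str  # the exact command line, recorded verbatim


@dataclass(frozen=True)
class RunManifest:
    selection_rule: str
    scanner: PinnedScanner
    repos: Tuple[PinnedRepo, ...]

    def to_json(self) -> str:
        # Sorted keys keep diffs between runs stable.
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Rejecting anything other than a full SHA at construction time means an unpinned repository fails before the scan starts rather than after results are published.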
Preferred command
For a published run, prefer a pinned scanner version.
Report template
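As an illustration only, a report could be rendered from the human-reviewed samples roughly as follows. This is a sketch under assumptions, not the project's template: the function names `tally` and `render_report`, the rule IDs, and the column layout are hypothetical. The four labels are the ones named in the classification step.

```python
from collections import Counter, defaultdict

# Labels from the "Classify findings" step of the protocol.
LABELS = (
    "true positive",
    "false positive",
    "needs context",
    "setup/toolchain failure",
)


def tally(samples):
    """Count labels per rule from (rule_id, label) pairs."""
    per_rule = defaultdict(Counter)
    for rule_id, label in samples:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        per_rule[rule_id][label] += 1
    return per_rule


def render_report(selection_rule, scanner_version, samples):
    """Render a plain Markdown summary (hypothetical layout)."""
    per_rule = tally(samples)
    lines = [
        f"Cohort selection rule: {selection_rule}",
        f"Scanner version: {scanner_version}",
        "Classification: human review of sampled findings",
        "",
        "| Rule | " + " | ".join(LABELS) + " |",
        "|---|" + "---|" * len(LABELS),
    ]
    for rule_id in sorted(per_rule):
        counts = per_rule[rule_id]
        cells = " | ".join(str(counts[label]) for label in LABELS)
        lines.append(f"| {rule_id} | {cells} |")
    return "\n".join(lines)
```

Keeping the review labels in a separate table, marked as human classification, keeps them distinct from the scanner's raw JSON output.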
Current research tracks
| Track | Purpose |
|---|---|
| GitHub Trending quality sweep | Scan trending public repositories by language to find noisy rules before users do |
| Agent output benchmark | Run the same tasks across coding agents and score the generated repositories |
| Benchmark-to-rule translation | Convert external benchmark signals into deterministic scanner rules |
| Rule provenance | Tie first-party AI-slop rules to motivating patterns, detector strategy, and legitimate exceptions |
What not to do
- Do not publish leaderboards without pinned versions and a repeatable harness.
- Do not claim a repository is bad because of a single scan.
- Do not tune rules only to make one public report look better.
- Do not use private customer code in public research.
- Do not mix LLM judgment into scanner output. Label human review separately.
