# Scholar AI Integrity Checker — Model Card

**Last updated: [DATE]**
**Version: 1.0 (twelve-signal heuristic detector)**

This document describes how the AI integrity checker inside Scholar
works, what it is and is not capable of, what its measured error rates
are, and how teachers and students should — and should not — use it. It
is provided as algorithmic transparency under the **Colorado AI Act**,
the **EU AI Act** (high-risk classification: educational assessment),
and as a basic component of any defensible due-process framework around
academic integrity.

---

## 1. What it is

The checker is a **rule-based heuristic detector**. It is not a machine
learning model. It does not call out to OpenAI, Anthropic, Google, or
any other AI service. It does not phone home. The entire detector is
roughly 200 lines of JavaScript that runs in the teacher's browser. The
source is auditable in `app.js` between lines marked `AI DETECTION` and
the function `getLearnedPhrases()`.

It runs **twelve independent checks** on a passage of writing. Each
check measures one observable surface property (e.g., "how many
em-dashes per sentence does this passage contain?"). Each check that
exceeds a fixed threshold contributes a fixed weight to the total
score. The detector reports:

1. Which signals fired
2. The underlying measurement for each
3. A summed score in [0.0, 0.99]
4. A categorical verdict: `low` / `mid` / `high`

A `high` verdict requires **both** a score ≥ 0.55 AND at least three
distinct signals firing. We deliberately gate on signal-count (not just
score) so a single noisy check cannot trigger a flag on its own.

**The detector is not a probability claim.** A 0.70 score does not mean
"70% likely AI-generated." It means "the sum of weighted heuristic
checks that fired equals 0.70." We never display percentages to anyone.
The teacher sees "X of 12 signals fired" plus the underlying
measurements.
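As a rough sketch, the score-plus-gate logic described above can be written in a few lines. This is illustrative only, not the actual `app.js` source; the `low`/`mid` boundary shown here is an assumed placeholder, and only the `high` gate (score ≥ 0.55 AND ≥ 3 signals) comes from this document:

```javascript
// Illustrative sketch of the score-plus-gate logic. The low/mid
// boundary is a placeholder assumption; the `high` gate matches the
// rule stated in this document.
function verdict(signals) {
  // `signals`: array of { name, fired, weight }
  const fired = signals.filter((s) => s.fired);
  const rawScore = fired.reduce((sum, s) => sum + s.weight, 0);
  const score = Math.min(0.99, rawScore); // scores are capped at 0.99
  if (score >= 0.55 && fired.length >= 3) return { score, level: "high" };
  if (fired.length > 0) return { score, level: "mid" }; // placeholder boundary
  return { score, level: "low" };
}
```

A teacher-facing display would then show `fired.length` of 12 signals plus each signal's underlying measurement, never the score as a percentage.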

## 2. The twelve signals

| # | Signal | What it measures | Weight |
|---|---|---|---|
| 1 | `stock_phrases` | Multi-word phrases LLMs over-use (e.g. "delve into," "in today's rapidly evolving"). Requires ≥2 distinct matches before awarding any weight. | up to 0.40 |
| 2 | `first_person` | Absence of "I", "we", "my", "me" in passages long enough to use them | 0.12 |
| 3 | `low_burstiness` | Sentence-length variance below a threshold (LLM prose is rhythmically flat) | 0.18 |
| 4 | `no_contractions` | Zero contractions in 40+ words (LLMs default to formal register) | 0.08 |
| 5 | `hedge_heavy` | "Indeed," "notably," "furthermore," "moreover" appearing too often | 0.10 |
| 6 | `low_diversity` | Type-token ratio below threshold on content words | 0.10 |
| 7 | `em_dash_heavy` | Em-dash density >0.33/sentence with ≥3 em-dashes | 0.10 |
| 8 | `list_scaffolding` | Numbered or bullet lists in short forum-length text | 0.10 |
| 9 | `uniform_openers` | Repeated first words of sentences, or 2+ transition-word openers | 0.08 |
| 10 | `flat_punctuation` | No colons, semicolons, em-dashes, or questions in 120+ words | 0.04 |
| 11 | `no_citations` | No author/year/page reference in 160+ words | 0.03 |
| 12 | `calibration_match` | Phrases the teacher's per-class calibration set has flagged | up to 0.25 |

Signal weights are intentionally conservative. The maximum theoretical
score with all twelve signals firing (the sum of the maximum weights
listed above) is 1.58, capped at 0.99. In practice, a passage rarely
fires more than five or six signals simultaneously.
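For concreteness, a single check can be sketched in a few lines of JavaScript. This is an illustration under assumed details (the naive sentence splitter in particular), not the `app.js` implementation; only the thresholds (≥3 em-dashes, density >0.33 per sentence) come from the table above:

```javascript
// Sketch of signal 7 (`em_dash_heavy`). Thresholds are from the table
// above; the sentence splitter is a simplifying assumption.
function emDashHeavy(text) {
  const emDashes = (text.match(/—/g) || []).length;
  // Naive split on terminal punctuation; adequate for a density estimate.
  const sentences =
    text.split(/[.!?]+/).filter((s) => s.trim().length > 0).length || 1;
  const density = emDashes / sentences;
  return { fired: emDashes >= 3 && density > 0.33, emDashes, density };
}
```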

## 3. Measured performance (v1.0)

We tested the detector against an internal corpus of 24 passages: 12
known-AI essays (generated by GPT-4 in early 2026 with default
temperature on philosophy / education / policy topics), and 12 known-human
passages (written by undergraduate students in real classroom
settings, with the students' permission, anonymized). Passages range
from 150 to 250 words.

**The numbers below come from running `Run detector self-check` in the
calibration tab against this same corpus, which ships in the source
code.** Anyone can verify these by clicking the button.

| Metric | Result |
|---|---|
| **True positive rate (recall)** | **11/12 = 92%** |
| **False positive rate** | **0/12 = 0%** |
| **True negative rate (specificity)** | **12/12 = 100%** |

The single false negative was an AI passage that fired 4 of 12 signals
with score 0.50 — under the `high` threshold (which requires ≥3 signals
*and* score ≥0.55). It was correctly classified as `mid`, which means a
teacher would not see it surfaced as flagged but could find it in the
calibration test box if they pasted it in.
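The arithmetic behind these metrics is straightforward. A hedged sketch (field names here are illustrative, not the self-check's actual code): a passage counts as flagged only when its verdict is `high`, recall is the flagged fraction of known-AI passages, and the false-positive rate is the flagged fraction of known-human passages.

```javascript
// Sketch of the self-check arithmetic. `results` pairs a ground-truth
// label with the detector's verdict for each corpus passage.
function selfCheck(results) {
  const rate = (items, pred) => items.filter(pred).length / items.length;
  const ai = results.filter((r) => r.label === "ai");
  const human = results.filter((r) => r.label === "human");
  return {
    recall: rate(ai, (r) => r.level === "high"), // true positive rate
    falsePositiveRate: rate(human, (r) => r.level === "high"),
  };
}
```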

### Important caveats on these numbers

- **The corpus is small (24 passages).** A larger external evaluation
  is on our roadmap; we recommend that any school run its own
  calibration sample before relying on the detector. Confidence
  intervals on 12 passages per class are wide.
- **The corpus is single-domain (humanities + social science).** STEM
  writing styles differ. Writing in languages other than English is not
  yet supported — we will refuse to score non-English text in v1.1.
- **The corpus is single-temporal (early 2026 GPT-4 outputs).** LLM
  styles evolve. The detector's stock-phrase list will need refresh
  every 6 months, ideally faster.
- **Performance on student writing that mimics LLM style.** A student
  writing in a deliberately formal, hedged, transition-heavy register
  may trigger signals. This is the principal failure mode and the
  reason teachers must read flagged work themselves before drawing
  conclusions.

## 4. What the detector cannot do

It **cannot**:

- Determine whether a specific passage was written by AI. It can only
  observe surface features that correlate with AI output.
- Distinguish between different AI models (GPT-4, Claude, Gemini,
  open-weights models all share most of the same statistical
  fingerprint).
- Distinguish between AI-generated text and AI-edited text (human
  draft, AI polish).
- Distinguish between AI-generated text and text by a non-native
  English writer who uses a similar formal register.
- Withstand simple paraphrasing or "make this sound more casual"
  prompting. The signals are surface features, and rewriting at the
  surface defeats them, so the detector is straightforward to evade.
- Evaluate the truth, originality, or intellectual quality of writing.
  It only measures stylistic patterns.

## 5. How it should be used

The detector is **a teacher's reading aid, not an automated decision
system.** Specifically:

- **Use the signals to direct your attention.** When the detector says
  five signals fired on a passage, that's a prompt to read more
  carefully — not a verdict.
- **Read the underlying measurements.** Click "Show all signals" on a
  flagged post to see what specifically tripped. "27 hedges in 188
  words" tells you something different from "0 first-person pronouns
  in 200 words."
- **Talk to the student before acting.** If you suspect AI use, ask
  them to walk you through the argument or describe the source they
  cited. Real authorship is easy to demonstrate in conversation; AI use
  is hard to perform fluently.
- **Document your reasoning.** If you initiate an academic-integrity
  proceeding partly because of the detector, write down which signals
  fired and what the student's response was. The audit log records the
  detector's output automatically.
- **Dismiss flags that don't hold up.** The "Not a concern" button on
  the pencil note clears the flag and records the dismissal in the
  audit log.

## 6. How it must NOT be used

- **The detector must not be the sole basis for academic discipline.**
  This is non-negotiable. False positives, while rare in our testing,
  exist. Disciplining a student on a heuristic alone is not defensible.
- **The detector output must not be shown to the student as
  accusation.** Show the underlying *concern* ("your paragraph reads
  unusually formally — talk me through it") rather than the *signal*
  ("the AI detector flagged you").
- **The detector must not be used to grade student work automatically.**
  No grade in Scholar is ever modified by the detector. Teachers grade.
- **The detector must not be used to surveil writing development.**
  Catching a struggling student becoming more formal is not the same
  as catching cheating. The detector cannot tell the difference.

## 7. Student rights

Any student whose writing has been flagged by the detector has the
right to:

- **See the full signal breakdown.** Email privacy@[domain] or ask
  your teacher; we will provide it within 30 days.
- **Submit additional evidence of authorship.** Drafts, notes, version
  history, an in-person walkthrough — all valid.
- **Have a human-only review** of any grade decision affected by the
  detector. Your school sets grade-appeal procedures; we do not
  override them.
- **Be told that the detector is not a probability claim.** Teachers
  are required by these terms to communicate that. If a teacher does
  not, you can ask them to explain.

## 8. Audit logging

Every time the detector flags a post, the following is recorded:

- Timestamp
- Post ID and authorship
- Each signal that fired and its measurement
- Any teacher action (dismissed, asked to expand, escalated)
- Any student rebuttal recorded by the teacher

The audit log is retained for 7 years (FERPA records-retention default)
and is available to the student, the teacher, and authorized school
officials.
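As an illustration of what one such record might look like (field names and values here are hypothetical; the actual schema lives in your school's Supabase project):

```javascript
// Hypothetical audit-log entry. Field names are illustrative, not the
// real schema; the fields mirror the list above.
const auditEntry = {
  timestamp: "2026-03-01T14:22:05Z",
  postId: "post_1234",
  authorId: "student_5678",
  signals: [
    { name: "hedge_heavy", measurement: "27 hedges in 188 words" },
    { name: "first_person", measurement: "0 first-person pronouns in 200 words" },
  ],
  teacherAction: "dismissed", // dismissed | asked_to_expand | escalated
  studentRebuttal: null, // free text recorded by the teacher, if any
};
```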

## 9. Security

The detector runs entirely client-side. No student writing is sent to
any external service for the purpose of AI detection. The only network
traffic is to your school's Supabase instance to load and save posts —
which would happen with or without the detector.

The calibration set (teacher-uploaded sample AI text) is stored in
your school's Supabase project and scoped to the class that uploaded
it. Calibration phrases never cross class boundaries.

## 10. Updating this document

We will revise this model card whenever:

- The signal weights change
- A signal is added or removed
- A re-measurement of recall and false-positive rate yields a
  different number
- We add support for additional languages or domains

The version number at the top of this document is bumped on every
change. Schools using Scholar will be notified at least 30 days before
any change to detector behavior takes effect.

## 11. Contact

- Detector questions: detector@[domain]
- Privacy / data rights: privacy@[domain]
- Security vulnerabilities: security@[domain]
