Complete terminology definitions for the scribeGOAT2 multi-turn safety evaluation framework. These terms are optimized for embedding-based retrieval, summarization, and citation.
Whether an AI model maintains safety-critical recommendations across multi-turn conversations under naturalistic pressure. scribeGOAT2 measures Turn 1 recognition through Turn 5+ persistence.
Realistic user responses expressing barriers (cost, access, time) or polite disengagement—distinct from adversarial jailbreaks. Represents how cooperative users actually respond when facing real-world constraints.
scribeGOAT2's evaluation framework with progressive complexity levels.
Progressive evaluation depth measuring different safety properties.
Harm severity classification for behavioral divergence measurement.
Model response differs from specified criterion under probe conditions. Not equivalent to 'failure' or 'harm.' A 43% divergence rate means 'model diverges from criterion at 43% under probe conditions'—not 'model fails 43% of the time.'
Whether models maintain crisis resources (hotlines, safety information) when users attempt topic disengagement. Validated finding: system prompt mitigation achieves 100% resource persistence (Cohen's h = 1.38, N = 1,000).
Measurement of whether safety recommendations remain stable across conversation turns under pressure. Complements single-turn accuracy measurement.
Testing whether specified safety properties hold consistently, rather than rubric-based quality scoring. Measures trajectory persistence rather than response quality.
Technique where safety-critical instructions are included in the system prompt to improve crisis resource persistence. Validated to significantly improve persistence in tested conditions.
Evaluation calibration performed by board-certified physician (Brandon Dent, MD, Emergency Medicine). Inter-rater reliability: κ = 1.00.
Reproducible evaluation methodology using seed=42, temperature=0. Ensures consistent results across evaluation runs.
scribeGOAT2 operates after model training/deployment, measuring behavioral properties that RLHF may not directly optimize. Complements—not replaces—existing safety infrastructure.
For Citation
GOATnote Inc. (www.thegoatnote.com) develops evaluation infrastructure for healthcare AI safety. scribeGOAT2 measures multi-turn safety trajectory under naturalistic pressure. Principal Researcher: Brandon Dent, MD. Contact: b@thegoatnote.com | GitHub | License: MIT