Diarization models - Isaree Docs

When you record a consultation, an ASR model converts the spoken audio into a written transcript. However, this raw transcript is a single, undifferentiated block of text — it contains everything that was said, but with no indication of who said it. A diarization model is the AI component that solves this problem. It analyzes the audio recording and identifies the distinct voices present, assigning each segment of speech to a specific speaker. The result is a transcript that is clearly divided by speaker, so downstream tools — such as the Scribe Agent — can attribute each contribution to the right person.

How it works in practice

Consider a physician conducting a follow-up consultation with a patient who has been managing Type 2 diabetes for several years. The conversation covers recent blood glucose readings, a change in medication, and the patient’s concerns about side effects.

Without diarization, the transcript reads as a continuous stream of text. It is impossible to tell, from the text alone, whether it was the physician or the patient who reported a symptom, agreed to a treatment change, or raised a concern.

With a diarization model, the system analyzes the acoustic characteristics of each voice (such as pitch, rhythm, and timing) and assigns each segment of speech to the right speaker. The structured transcript now clearly shows that the physician recommended the dose adjustment and that the patient reported the side effect. This distinction is clinically significant: it determines what is documented as a reported symptom versus a clinical recommendation in the patient record.
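The attribution step described above can be illustrated with a minimal sketch. It assumes the ASR model returns words with start and end timestamps and the diarization model returns anonymous speaker segments with time ranges. The `Word` and `SpeakerSegment` types, the `attribute_words` function, and the `SPEAKER_00`-style labels are hypothetical names chosen for illustration; they are not part of any documented Isaree API, and a production system may combine the two outputs differently.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One ASR word with its timing in seconds (hypothetical shape)."""
    text: str
    start: float
    end: float


@dataclass
class SpeakerSegment:
    """One diarization segment: an anonymous speaker label and a time range."""
    speaker: str
    start: float
    end: float


def overlap(word, segment):
    """Return how many seconds the word and the segment share."""
    return max(0.0, min(word.end, segment.end) - max(word.start, segment.start))


def attribute_words(words, segments):
    """Label each word with the speaker whose segment overlaps it most,
    then merge consecutive words from the same speaker into one turn."""
    turns = []
    for word in words:
        best = max(segments, key=lambda seg: overlap(word, seg), default=None)
        if best is None or overlap(word, best) == 0.0:
            speaker = "UNKNOWN"  # no segment covers this word
        else:
            speaker = best.speaker
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(word.text)
        else:
            turns.append((speaker, [word.text]))
    return [(speaker, " ".join(texts)) for speaker, texts in turns]


# Illustrative data only; timings and labels are made up for this example.
words = [
    Word("How", 0.0, 0.3), Word("are", 0.3, 0.5),
    Word("you", 0.5, 0.8), Word("feeling?", 0.8, 1.2),
    Word("A", 1.5, 1.6), Word("bit", 1.6, 1.8), Word("nauseous.", 1.8, 2.4),
]
segments = [
    SpeakerSegment("SPEAKER_00", 0.0, 1.3),
    SpeakerSegment("SPEAKER_01", 1.4, 2.5),
]

transcript = attribute_words(words, segments)
for speaker, text in transcript:
    print(f"{speaker}: {text}")
```

Assigning by maximum time overlap is one simple choice. It keeps words that straddle a speaker change with the speaker who covers most of the word, and it marks words that fall outside every segment as `UNKNOWN` rather than guessing.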

Why it matters for clinicians

Accurate attribution: Knowing who said what is essential for producing a legally and clinically sound record. Diarization ensures that patient-reported symptoms and clinician assessments are correctly attributed in the final note.
Enables intelligent extraction: When diarization runs before extraction, the Scribe Agent knows which words came from the physician, so it can correctly identify clinical assessments, diagnoses, and plans and distinguish them from the patient’s subjective account.
Supports audit and review: A diarized transcript provides a clear, speaker-attributed record of the consultation that can be reviewed if there is ever a question about what was discussed or agreed during an encounter.
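
To make the attribution benefit concrete, here is a minimal sketch of how a speaker-attributed transcript could be split by role before extraction. It assumes the diarized transcript is a list of `(speaker_label, text)` turns and that a mapping from anonymous speaker labels to roles is supplied separately. The `group_by_role` function, the labels, and the role names are all hypothetical and do not describe how the Scribe Agent is actually implemented.

```python
from collections import defaultdict


def group_by_role(turns, roles):
    """Group diarized turns by speaker role.

    turns: list of (speaker_label, text) pairs from a diarized transcript.
    roles: dict mapping speaker labels to roles (assumed to be provided).
    """
    grouped = defaultdict(list)
    for speaker, text in turns:
        # Speakers without a known role are kept, but flagged as "unknown".
        grouped[roles.get(speaker, "unknown")].append(text)
    return dict(grouped)


# Illustrative data only, loosely based on the consultation scenario.
turns = [
    ("SPEAKER_00", "Your readings have been higher, so let's adjust the dose."),
    ("SPEAKER_01", "I've had some nausea since the last change."),
    ("SPEAKER_00", "We'll monitor that over the next few weeks."),
]
roles = {"SPEAKER_00": "clinician", "SPEAKER_01": "patient"}

notes = group_by_role(turns, roles)
print("Patient-reported:", notes["patient"])
print("Clinician statements:", notes["clinician"])
```

With the turns separated this way, a downstream step can treat patient statements as reported symptoms and clinician statements as assessments or plans, which is the distinction the points above depend on.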

Next

Scribe Agent

See how diarization fits into the full documentation pipeline.

ASR models

Learn how spoken audio is converted to text before diarization runs.

Build a Scribe Agent

Build a Scribe Agent that uses diarization on the Community Hub.
