OCR - Isaree Docs

Healthcare still runs on paper. Referral letters arrive by fax, discharge summaries are printed and handed to patients, and old records often exist only as scanned PDFs. None of that information is searchable or usable by an LLM until it has been turned into digital text.

OCR stands for Optical Character Recognition. It is the technology that reads text from an image (a photograph, a scanned document, or a PDF made from a scan) and converts it into machine-readable text that other systems can search, extract, and process.

What OCR can read

OCR works reliably on clearly printed text — typed documents, printed forms, and standard templates. Modern OCR systems handle a wide range of fonts, layouts, and document qualities. Handwriting is significantly harder. Accuracy depends on how legible the writing is, the language it is written in, and whether the model has seen similar handwriting during training. Irregular or hurried handwriting often requires manual review.

Why source format matters

The format of the source document has a direct effect on the quality of the extracted text.

Digital-born PDFs that are mostly text are the easiest case. The characters are crisp and consistent, and OCR can extract them with high accuracy.
Scanned documents are harder. Scanning introduces noise, skew, and compression artifacts, all of which reduce accuracy.
Photographs are the hardest. Lighting, focus, angle, and shadows all affect what the model sees. A well-lit, in-focus photograph of a flat document produces far better text than a blurry photograph taken at an angle.
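To make the point about focus concrete, here is a minimal sketch of one common sharpness heuristic, the variance of the Laplacian. It is offered only as an illustration and is not described anywhere as part of Isaree. The function name `focus_score` and the plain list-of-lists grayscale input are assumptions chosen so the example runs without extra libraries.

```python
from statistics import pvariance


def focus_score(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    `gray` is a 2D list of brightness values (0-255). Sharp edges give
    large positive and negative responses, so a higher variance suggests
    a sharper image. Hypothetical helper, for illustration only.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3 pixels")

    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            laplacian = (
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
            responses.append(laplacian)
    return pvariance(responses)


# A crisp checkerboard versus a featureless grey patch.
sharp = [[255 if (x + y) % 2 else 0 for x in range(6)] for y in range(6)]
flat = [[128] * 6 for _ in range(6)]
print(focus_score(sharp) > focus_score(flat))  # True
```

A score like this is only meaningful relative to other images of similar documents. Any cut-off for "too blurry to scan" would have to be chosen empirically.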

OCR is not understanding

OCR extracts text, not meaning. The output of OCR is raw text — the same characters that were on the page, in roughly the same order. Interpreting that text, summarizing it, or pulling structured information out of it is a separate step that needs an LLM or an Agent on top. In a typical workflow, OCR is the first step in a longer chain: the document is converted to text, and the text is then handed to an Agent that summarizes it, extracts a medication list, or stores it in the context window for later use.
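The two-step chain described above can be sketched as follows. This is an illustration under stated assumptions, not Isaree's implementation: `run_ocr`, `extract_medications`, and `process_document` are hypothetical names, the OCR step returns made-up sample text, and the "Agent" step is a simple regular expression standing in for an LLM so that the example runs offline.

```python
import re
from dataclasses import dataclass


@dataclass
class OcrResult:
    text: str  # raw characters, in roughly the order they appear on the page


def run_ocr(image_bytes: bytes) -> OcrResult:
    """Hypothetical OCR step. Returns canned sample text instead of reading pixels."""
    return OcrResult(
        text=(
            "Discharge summary\n"
            "Medications:\n"
            "Metformin 500 mg twice daily\n"
            "Lisinopril 10 mg once daily\n"
            "Follow up in 2 weeks"
        )
    )


def extract_medications(text: str) -> list[str]:
    """Stand-in for the Agent step: pull drug names from lines like 'Name 500 mg'.

    A real Agent would use an LLM here; the regex only marks where
    interpretation happens.
    """
    pattern = re.compile(r"^([A-Z][a-z]+)\s+\d+\s*mg\b", re.MULTILINE)
    return pattern.findall(text)


def process_document(image_bytes: bytes) -> dict:
    ocr = run_ocr(image_bytes)                      # step 1: image -> raw text
    medications = extract_medications(ocr.text)     # step 2: raw text -> structure
    return {"raw_text": ocr.text, "medications": medications}


result = process_document(b"")
print(result["medications"])  # ['Metformin', 'Lisinopril']
```

Note that `run_ocr` knows nothing about medications. Everything that looks like understanding lives in the second step.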

OCR in Isaree

Isa runs OCR on-device. The image of the document, which may contain sensitive patient information, is processed entirely on your iPhone or iPad and never leaves the device.

OCR runs in Patient Chat via the Scan Doc button: capture a paper document with the camera, or pick an existing image from Photos. The image source affects quality. Photos captured live vary with lighting and focus, while files picked from your photo library, or digital-born PDFs, tend to produce cleaner text.

OCR is distinct from the Camera button, which sends a photo straight to the Primary Agent as an image, with no text extraction step, and is only available when the Primary Agent is a VLM.

Give each scanned document a descriptive title — for example, lab_report_blood_20260812 rather than photo_1. Descriptive titles help the Primary Agent find the right document later when you ask a question about it.
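If you name many documents, a small helper can keep titles consistent. The sketch below assumes a type, detail, and date pattern inferred from the example title above. Isaree does not require this format, and `make_document_title` is a hypothetical name.

```python
import re
from datetime import date


def make_document_title(doc_type: str, detail: str, doc_date: date) -> str:
    """Build a title such as lab_report_blood_20260812 (assumed naming pattern)."""
    # Lowercase each part and collapse anything that is not a letter or digit into "_".
    slugs = [
        re.sub(r"[^a-z0-9]+", "_", part.lower()).strip("_")
        for part in (doc_type, detail)
    ]
    slugs = [slug for slug in slugs if slug]  # drop parts that ended up empty
    return "_".join(slugs + [doc_date.strftime("%Y%m%d")])


print(make_document_title("Lab report", "blood", date(2026, 8, 12)))
# lab_report_blood_20260812
```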

Why it matters for clinicians

Unlocks paper records. Any printed or scanned document becomes immediately usable by AI, bridging the gap between paper-based and digital workflows.
Reduces manual transcription. Instead of retyping a form into the EMR, you capture it once and let the system extract the relevant content.
Enables downstream AI. Once a document is text, it can be summarized, compared against guidelines, or stored in the patient’s context for future reference.

Next

LLMs and VLMs

The models that interpret the text OCR extracts.

On-device vs. cloud

Why running OCR on your phone keeps patient data private.

Agent

Chain OCR into an Agent that summarizes or extracts from the result.
