Healthcare still runs on paper. Referral letters arrive by fax, discharge summaries are printed and handed to patients, and old records often exist only as scanned PDFs. None of that information is searchable or usable by an LLM until it has been turned into digital text. OCR stands for Optical Character Recognition. It is the technology that reads text from an image (a photograph, a scanned document, or a PDF made from a scan) and converts it into machine-readable text that other systems can search, extract, and process.
What OCR can read
OCR works reliably on clearly printed text: typed documents, printed forms, and standard templates. Modern OCR systems handle a wide range of fonts, layouts, and document qualities. Handwriting is significantly harder. Accuracy depends on how legible the writing is, the language it is written in, and whether the model has seen similar handwriting during training. Irregular or hurried handwriting often requires manual review.
Why source format matters
The format of the source document has a direct effect on the quality of the extracted text.
- Digital-born PDFs that are mostly text are the easiest case. The characters are crisp and consistent, and OCR can extract them with high accuracy.
- Scanned documents are harder. Scanning introduces noise, skew, and compression artifacts, all of which reduce accuracy.
- Photographs are the hardest. Lighting, focus, angle, and shadows all affect what the model sees. A well-lit, in-focus photograph of a flat document produces far better text than a blurry photograph taken at an angle.
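To make the point about focus concrete, here is a minimal sketch of one common sharpness heuristic, the variance of the Laplacian, which could be used to warn about a blurry capture before OCR runs. It is an illustration only: nothing in this page says Isaree uses this check, and the function names and the threshold of 100.0 are assumptions chosen for the example.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over the interior pixels.

    `gray` is a grayscale image given as a list of rows, each a list of
    brightness values (0-255). Sharp edges give large responses, so a
    higher variance suggests a sharper image.
    """
    height, width = len(gray), len(gray[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3 pixels")

    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return pvariance(responses)


def looks_blurry(gray, threshold=100.0):
    """Return True when the sharpness score falls below the threshold.

    The default threshold is an arbitrary placeholder (an assumption);
    a real value would have to be tuned on representative documents.
    """
    return laplacian_variance(gray) < threshold


# Usage: a high-contrast checkerboard versus a smooth horizontal gradient.
sharp = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]
smooth = [[x * 10 for x in range(8)] for y in range(8)]
print(looks_blurry(sharp), looks_blurry(smooth))
```

A check like this only measures focus. Skew, shadows, and compression artifacts would each need their own handling.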
OCR is not understanding
OCR extracts text, not meaning. The output of OCR is raw text: the same characters that were on the page, in roughly the same order. Interpreting that text, summarizing it, or pulling structured information out of it is a separate step that needs an LLM or an Agent on top. In a typical workflow, OCR is the first step in a longer chain: the document is converted to text, and the text is then handed to an Agent that summarizes it, extracts a medication list, or stores it in the context window for later use.
OCR in Isaree
Isa runs OCR on-device. The image of the document, which may contain sensitive patient information, is processed entirely on your iPhone or iPad and never leaves the device. OCR runs in Patient Chat via the Scan Doc button: capture a paper document with the camera, or pick an existing image from Photos. The image source affects quality: photos captured live vary with lighting and focus, while files picked from your photo library or a digital-born PDF tend to produce cleaner text. OCR is distinct from the Camera button, which sends a photo straight to the Primary Agent as an image, with no text extraction step, and is only available when the Primary Agent is a VLM.
Why it matters for clinicians
- Unlocks paper records. Any printed or scanned document becomes immediately usable by AI, bridging the gap between paper-based and digital workflows.
- Reduces manual transcription. Instead of retyping a form into the EMR, you capture it once and let the system extract the relevant content.
- Enables downstream AI. Once a document is text, it can be summarized, compared against guidelines, or stored in the patient’s context for future reference.
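The difference between the two capture paths described under "OCR in Isaree" can be sketched in a few lines. This is a conceptual model under stated assumptions, not Isaree's implementation: `AgentRequest`, `build_request`, the button identifiers, and the `run_ocr` callable are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class AgentRequest:
    """What is handed to the agent: extracted text or the raw image."""
    kind: str                     # "text" or "image"
    payload: Union[str, bytes]


def build_request(
    button: str,
    image: bytes,
    run_ocr: Callable[[bytes], str],   # hypothetical on-device OCR function
    primary_agent_is_vlm: bool,
) -> AgentRequest:
    """Route a captured image according to which button was used."""
    if button == "scan_doc":
        # Scan Doc: OCR runs first, and only the extracted text moves on.
        return AgentRequest("text", run_ocr(image))
    if button == "camera":
        # Camera: no text extraction; the image itself goes to the agent,
        # which therefore has to be a vision-capable model.
        if not primary_agent_is_vlm:
            raise ValueError("Camera requires a VLM as the Primary Agent")
        return AgentRequest("image", image)
    raise ValueError(f"unknown button: {button!r}")


# Usage with a stand-in OCR function that returns fixed text.
def fake_ocr(_image: bytes) -> str:
    return "Metformin 500 mg twice daily"


request = build_request("scan_doc", b"\x00", fake_ocr, primary_agent_is_vlm=False)
print(request.kind, request.payload)
```

In the Scan Doc path the downstream Agent only ever sees text, which is what makes the later summarizing and extracting steps possible with a text-only model.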
Next
- LLMs and VLMs. The models that interpret the text OCR extracts.
- On-device vs. cloud. Why running OCR on your phone keeps patient data private.
- Agent. Chain OCR into an Agent that summarizes or extracts from the result.

