LLMs and VLMs - Isaree Docs

LLM stands for Large Language Model. It is the type of AI model that reads, understands, and generates text. VLM stands for Vision-Language Model — the same capability extended to images, allowing the model to interpret visual content alongside written text. These models are the reasoning engine inside an Agent. They have been trained on large volumes of text and, in the case of VLMs, image data. This training allows them to understand complex language, summarize long documents, extract structured information from free text, and — in the case of VLMs — describe or analyze the content of an image.

What they are not

A common misconception is that a large language model is an all-knowing database — a system that has memorized every medical fact and can be queried like a search engine. This is not accurate. An LLM is a pattern-recognition system. It has learned the statistical relationships between words and concepts, which allows it to reason and write fluently. However, it does not “know” facts in the way a database stores them. It can generate plausible-sounding text that is factually incorrect, particularly on highly specific or recently updated clinical topics. This is why grounding an agent in a Knowledge Base of verified, up-to-date documents (such as your hospital’s own guidelines) is critical.

How it works in practice

You are covering a night shift and receive a handover for a patient with a complex, multi-system history. You ask the Primary Agent to summarize the last two weeks of clinical notes. The LLM reads through the documentation, understands the medical context, and produces a concise summary of the patient’s current active problems, recent interventions, and outstanding investigations. It does this in seconds. If you also send a photograph of the patient’s wound taken during the day shift, a VLM can describe the visual characteristics — size, color, tissue type — and incorporate that description into the clinical note. This is a documentation support task, not a diagnostic one.

LLMs as reasoners in an agentic world

In an agentic AI system, language models are not primarily used as knowledge stores. They are used as reasoners — the component that decides what to do next, how to interpret a result, and how to structure an output. The model’s role is to think through a problem, not to recall facts from memory. This is a critical distinction. An LLM in an agent does not need to know every drug interaction by heart. Instead, it reasons about a problem, recognizes that it needs authoritative information, and calls the appropriate tool — a drug database, a hospital formulary, a clinical guideline document — to retrieve verified ground truth. The model then interprets that retrieved information and incorporates it into a coherent, structured response. This means the accuracy of an agent depends not on the LLM’s internal knowledge alone, but on the quality of the tools and verified sources it has been given access to. A well-designed agent is grounded in reliable external data; the LLM provides the reasoning layer that connects the data to the clinical task.

Why it matters for clinicians

Rapid synthesis: LLMs act as a high-speed reading assistant, distilling hours of reading into seconds of structured review.
Structured extraction: They can pull specific data points — medications, dates, diagnoses — out of unstructured free text and organize them into a consistent format.
Know the limits: Understanding that these models reason by pattern rather than by verified fact helps you use them safely — always reviewing AI-generated content before it enters the clinical record.

Choose a model

Pick the right model for your device and use case.

On-device vs. cloud

Understand where the model runs and what that means for your data.

Documentation Index

​What they are not

​How it works in practice

​LLMs as reasoners in an agentic world

​Why it matters for clinicians

​Next