AI Glossary

Plain-English definitions of the AI, LLM, voicebot, chatbot, computer vision and MLOps terms we use every day at Iedeo. No PhD required.

Architecture

RAG — Retrieval-Augmented Generation

A pattern where an LLM is grounded in retrieved documents before generating a response — reducing hallucinations and enabling answers from private data.

Retrieval-Augmented Generation (RAG) combines an information retrieval system with a large language model. When a query arrives, the system retrieves relevant chunks from a vector or hybrid index of documents, then passes those chunks plus the query to an LLM. The LLM generates an answer grounded in the retrieved evidence. RAG is the dominant architecture for production enterprise chatbots because it grounds the LLM in your private data, lets you cite sources, and reduces hallucinations dramatically compared to relying on the LLM's parametric memory alone.

AI Agent

An LLM-powered system that can plan, take actions across tools, observe results and re-plan to achieve goals.

An AI Agent goes beyond chat — it uses an LLM to plan multi-step actions, call external tools (APIs, databases, services), observe results, and iterate until a goal is met. Common patterns include ReAct (reason + act), plan-and-execute, and tree-of-thought. Production agents need scoped tool access, evaluation suites, human-in-the-loop gates and observability — not just an LLM in a loop.

Agentic RAG

Combining AI agents with RAG — agents that dynamically decide what to retrieve, when to escalate, and how to combine multiple knowledge sources.

Agentic RAG is the evolution of static RAG into adaptive retrieval. Instead of a single retrieval step, an agent decomposes the query, decides which knowledge sources to query (docs, structured DBs, APIs), re-retrieves if confidence is low, and combines results. Particularly powerful for complex enterprise workflows where a single retrieval pass isn't enough.

Multi-Modal AI

Models that handle multiple input/output types — text, image, audio, video.

Multi-modal AI processes more than one type of data simultaneously. GPT-4V, Claude 3 Vision and Gemini 1.5 can take text + images as input. Modern voicebots are multi-modal — combining ASR (audio), NLU (text) and TTS (audio). Multi-modal agents are emerging for use cases like medical imaging + patient history, retail product photos + descriptions.

Function Calling / Tool Use

LLM's ability to invoke external functions or APIs based on conversation context.

Function calling (also called tool use) is the LLM's ability to return structured JSON requests to invoke external functions, then receive results and continue. It is the building block of AI agents. OpenAI, Anthropic and most modern LLM APIs support function calling natively. Good production agents use it for CRUD operations, API queries, calculations and any deterministic operation that doesn't belong inside the LLM.

Model

LLM — Large Language Model

A neural network with billions of parameters trained on vast text corpora — used to generate, classify, summarise and reason about language.

A Large Language Model is a transformer-architecture neural network with billions to trillions of parameters, trained on internet-scale text corpora. Examples include GPT-4, Claude 3, Gemini, Llama 3, Mistral and Mixtral. LLMs can be used out of the box for text generation, summarisation, classification, code generation and reasoning — and can be further customised via fine-tuning, RAG and tool-use.

Foundations

Embeddings

High-dimensional numerical representations of text, image or audio used for semantic search and similarity.

Embeddings are dense vectors (typically 384-3072 dimensions) that represent the meaning of a piece of content. Similar content has vectors close together in this high-dimensional space. Embeddings power semantic search, recommendation, clustering and retrieval in RAG pipelines. Popular embedding models include OpenAI text-embedding-3, Cohere embed, Voyage AI, and open-source BGE / E5 models.

NLU — Natural Language Understanding

Extracting intent, entities and sentiment from natural language input.

NLU is the subset of NLP focused on understanding — intent classification (what does the user want?), entity extraction (what specific values are mentioned?), sentiment and tone. Modern NLU is increasingly built on top of LLMs, replacing older statistical approaches (CRF, BERT classifiers). In voicebot/chatbot stacks, NLU sits between ASR/text input and the dialogue engine.

Hallucination

When an LLM produces confident but incorrect or fabricated information.

Hallucination is a fundamental limitation of LLMs — they can generate plausible-sounding but factually wrong content. Mitigations include RAG (grounding in real documents), self-consistency checks, confidence scoring, output validation and human-in-the-loop review for critical decisions. Production systems must be designed assuming some hallucination will occur and handle it gracefully.

Context Window

The maximum amount of text an LLM can process at once, measured in tokens.

Context window is the LLM's working memory — the maximum tokens it can attend to in a single prompt. GPT-4 Turbo has 128K, Claude 3.5 has 200K, Gemini 1.5 Pro has 2M, Llama 3 70B Instruct has 8K-128K depending on variant. Larger context windows enable longer documents, more retrieved chunks, and longer conversations — but cost and latency scale with usage.

Tokens

Sub-word units of text used by LLMs — billing, context window and latency are all measured in tokens.

LLMs tokenise text into sub-word units before processing. Roughly 1 English word ≈ 1.3 tokens; Indian languages typically tokenise to more tokens per word due to vocabulary distribution. LLM APIs price per token (input + output), so understanding token counts is essential for cost engineering production LLM applications.

Voice

ASR — Automatic Speech Recognition

Converting speech audio into text. The first step in any voicebot pipeline.

ASR (also called STT — speech-to-text) takes audio input and produces text. Quality is measured by Word Error Rate (WER). Modern ASR models include OpenAI Whisper, Deepgram, AssemblyAI, Google Speech-to-Text and custom-trained models. For Indian-language voicebots, ASR tuning for accent, code-switching and noise is critical.

TTS — Text-to-Speech

Converting text into natural-sounding speech audio. The output layer of voicebots.

TTS converts text into speech audio. Modern TTS systems use neural networks (Tacotron, FastSpeech, VITS, ElevenLabs, Coqui XTTS) to produce natural-sounding voices. For multilingual voicebots, TTS that handles code-switching and accent is essential. Voice cloning extends TTS to clone a specific human voice from a few minutes of training audio.

Vision

OCR — Optical Character Recognition

Converting images of text into machine-readable text. A foundational layer of document AI.

OCR identifies and digitises text from images, scans and photos. Modern OCR engines (Azure Form Recognizer, AWS Textract, Google Document AI, PaddleOCR) handle typed and handwritten text with high accuracy. OCR is typically one stage of an IDP pipeline — extraction, layout understanding and downstream validation sit on top.

Computer Vision

The field of AI focused on extracting information from images and video.

Computer Vision (CV) covers detection, segmentation, classification, tracking, OCR, generation and understanding of visual data. Production applications include defect detection in manufacturing, ANPR, retail shelf monitoring, medical imaging and surveillance. Modern CV blends classical CNN architectures (YOLO, EfficientDet) with transformers (DETR, SAM) and increasingly multi-modal LLMs (GPT-4V, Claude 3 Vision).

Security

Prompt Injection

An attack where adversarial input manipulates an LLM's instructions or extracts secrets.

Prompt injection is the LLM equivalent of SQL injection. Attackers craft inputs that override the system prompt, exfiltrate data, or coerce the model into unauthorised actions. Defenses include input filtering, output validation, structured output enforcement, separating instructions from user data, and red-team testing. Listed as #1 in OWASP LLM Top 10.

Guardrails

Input/output filters and policies that constrain LLM behaviour in production.

Guardrails are runtime policies enforced around an LLM — input filtering (PII, toxicity, prompt injection), output validation (structured format, no PII leakage, no out-of-scope claims), and behavioural constraints (refuse certain topics). Tools like Guardrails AI, NeMo Guardrails and custom-built filters are common in production stacks.

Practice

Prompt Engineering

Designing and refining the text instructions given to an LLM to produce desired behaviour.

Prompt engineering covers structuring instructions, providing examples (few-shot), defining output schemas, setting personas and adding guardrails. Good prompts are critical to production LLM applications — minor wording changes can dramatically affect output quality, cost and latency. Tools like LangChain, LlamaIndex and Guidance help structure prompts in production code.

Evals — Evaluations

Automated tests that measure LLM/agent quality on a held-out set of scenarios.

Evals are the unit tests of LLM applications. They run a fixed set of inputs through your system and check outputs against expected results, scoring functions or human review. Evals are critical for catching regression when prompts, models or retrieval changes — a tweak that improves one case can break ten others. Good production teams ship evals before they ship LLM applications.

MLOps

Engineering practices for deploying, monitoring, retraining and governing machine-learning systems.

MLOps adapts DevOps principles to ML — model versioning, automated training pipelines, data validation, drift detection, monitoring, A/B testing of models, retraining triggers and rollback. For computer vision and traditional ML, MLOps is mature (MLflow, Kubeflow, SageMaker). For LLM applications, "LLMOps" practices are still emerging.

Model Drift

Degradation of model accuracy over time as the real world changes from training data.

Drift comes in two flavours — data drift (input distribution changes) and concept drift (the relationship between inputs and correct outputs changes). Production ML systems must monitor drift and retrain when accuracy degrades. For CV, lighting changes, new products and seasonal patterns cause drift. For LLMs, slang, current events and new use cases cause drift.

Infrastructure

Vector Database

A database optimised for storing and searching high-dimensional vectors using nearest-neighbour algorithms.

Vector databases store embeddings and support efficient approximate-nearest-neighbour (ANN) search. They are the storage layer of RAG. Popular options include Pinecone, Weaviate, Qdrant, Milvus, pgvector (PostgreSQL extension), Chroma and Vespa. Modern PostgreSQL with pgvector handles many workloads and is often the right choice for production teams already on Postgres.

Edge Inference

Running AI models on local devices (cameras, phones, Jetson) instead of the cloud.

Edge inference runs ML models on devices close to the data source — IoT cameras, smartphones, Jetson modules, industrial PCs. Benefits include lower latency, no internet dependency, privacy (data stays local) and lower cost at scale. Trade-offs include hardware constraints, model size limits and harder updates. Modern edge stacks include NVIDIA Jetson, Google Coral, Intel OpenVINO and ONNX Runtime.

Training

Fine-Tuning

Adapting a pre-trained model to a specific domain or task by training on labelled examples.

Fine-tuning takes an existing pre-trained LLM and continues training on a curated dataset of input-output pairs from your domain. It is used to teach the model your style, terminology, or task-specific behaviour. Fine-tuning is most effective for tasks where RAG falls short — e.g., style transfer, structured output adherence, or low-latency inference where retrieval overhead matters.

Workflow

IDP — Intelligent Document Processing

A pipeline that extracts structured data from unstructured documents — invoices, KYC docs, claims — using OCR + ML/LLM.

IDP combines OCR, classification, extraction, validation and integration to automate document workflows. Modern IDP uses LLMs in addition to traditional OCR to handle variations, ambiguity and structured extraction. Production IDP pipelines typically achieve 90%+ straight-through processing on well-scoped document types.

Ready to put these concepts to work?

If you are building an LLM application, voicebot, or agent and want production-grade engineering — talk to us.

Talk to an Iedeo AI Architect