Embeddings
High-dimensional numerical representations of text, images or audio, used for semantic search and similarity.
Embeddings are dense vectors (typically 384-3072 dimensions) that represent the meaning of a piece of content. Semantically similar content maps to vectors that lie close together in this high-dimensional space, with closeness usually measured by cosine similarity or dot product. Embeddings power semantic search, recommendation, clustering and retrieval in RAG pipelines. Popular embedding models include OpenAI text-embedding-3, Cohere embed, Voyage AI, and open-source BGE / E5 models.
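As a rough illustration of "close together", the sketch below ranks a few documents against a query by cosine similarity. The 4-dimensional vectors are made-up stand-ins for real embeddings, and the document names are hypothetical; in practice the vectors would come from an embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings; real models emit hundreds or thousands of dimensions.
docs = {
    "refund policy": [0.9, 0.1, 0.0, 0.2],
    "return an item": [0.8, 0.2, 0.1, 0.3],
    "office opening hours": [0.0, 0.9, 0.4, 0.1],
}
# Stands in for the embedding of a query such as "how do I get my money back?"
query = [0.85, 0.15, 0.05, 0.25]

# Most similar document first.
ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)
print(ranked)
```

With these toy values the two refund-related documents score far higher than the unrelated one, which is the behaviour semantic search relies on.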
NLU — Natural Language Understanding
Extracting intent, entities and sentiment from natural language input.
NLU is the subset of NLP focused on understanding: intent classification (what does the user want?), entity extraction (what specific values are mentioned?), and sentiment and tone detection. Modern NLU is increasingly built on top of LLMs, replacing older task-specific models (CRF taggers, fine-tuned BERT classifiers). In voicebot/chatbot stacks, NLU sits between ASR/text input and the dialogue engine.
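To make the shape of NLU output concrete, here is a deliberately simple rule-based sketch that returns an intent and extracted entities. The intent names, the keyword rules and the `NLUResult` type are all hypothetical; a production system would use a trained classifier or an LLM in place of the keyword matching.

```python
import re
from dataclasses import dataclass, field

@dataclass
class NLUResult:
    intent: str
    entities: dict = field(default_factory=dict)

# Hypothetical keyword rules, for illustration only.
INTENT_KEYWORDS = {
    "check_balance": ("balance",),
    "transfer_money": ("transfer", "send"),
}

def parse(utterance: str) -> NLUResult:
    text = utterance.lower()
    # Intent classification: first intent whose keywords appear in the text.
    intent = next(
        (name for name, words in INTENT_KEYWORDS.items() if any(w in text for w in words)),
        "unknown",
    )
    # Entity extraction: pull out the first number as an amount, if any.
    entities = {}
    amount = re.search(r"\b(\d+(?:\.\d+)?)\b", text)
    if amount:
        entities["amount"] = float(amount.group(1))
    return NLUResult(intent=intent, entities=entities)

result = parse("Please transfer 250 rupees to Asha")
print(result)
```

The dialogue engine would then act on `result.intent` and `result.entities` rather than on the raw text.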
Hallucination
When an LLM produces confident but incorrect or fabricated information.
Hallucination is a fundamental limitation of LLMs — they can generate plausible-sounding but factually wrong content. Mitigations include RAG (grounding in real documents), self-consistency checks, confidence scoring, output validation and human-in-the-loop review for critical decisions. Production systems must be designed assuming some hallucination will occur and handle it gracefully.
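One of the mitigations above, a self-consistency check, can be sketched as a majority vote over several sampled answers to the same prompt. The sample lists and the `min_agreement` threshold are assumptions chosen for illustration; the function only does the voting and does not call any model.

```python
from collections import Counter

def self_consistent_answer(samples, min_agreement=0.6):
    """Return the majority answer, or None if agreement is too low to trust."""
    if not samples:
        return None
    # Normalise case and whitespace so trivially different strings count as equal.
    normalised = [s.strip().lower() for s in samples]
    answer, count = Counter(normalised).most_common(1)[0]
    return answer if count / len(normalised) >= min_agreement else None

# Hypothetical outputs from sampling the same prompt five times.
agreeing = ["Paris", "paris", "Paris ", "Lyon", "Paris"]
scattered = ["1947", "1950", "1949", "1947", "1952"]

print(self_consistent_answer(agreeing))   # majority clears the threshold
print(self_consistent_answer(scattered))  # no answer is trusted
```

A `None` result is the signal to fall back gracefully, for example by routing the request to human review instead of returning a guess.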
Context Window
The maximum amount of text an LLM can process at once, measured in tokens.
The context window is the LLM's working memory: the maximum number of tokens it can attend to in a single request, counting both the prompt and the generated output. GPT-4 Turbo has 128K, Claude 3.5 Sonnet has 200K, Gemini 1.5 Pro has up to 2M, and Llama 3 70B Instruct has 8K (128K in the Llama 3.1 release). Larger context windows enable longer documents, more retrieved chunks, and longer conversations, but cost and latency grow with the number of tokens actually sent.
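In a RAG pipeline the context window acts as a budget that the prompt, the retrieved chunks and the expected output must share. The sketch below shows one simple way to spend it, a greedy pass over chunks already sorted by relevance. The function name, the token counts and the 8K window are illustrative assumptions.

```python
def pack_chunks(chunks, context_window, reserved_for_output, prompt_tokens):
    """Greedily keep the highest-ranked chunks that fit the remaining token budget."""
    budget = context_window - reserved_for_output - prompt_tokens
    selected, used = [], 0
    for text, token_count in chunks:  # assumed to be sorted by relevance, best first
        if used + token_count > budget:
            continue  # does not fit; a smaller, lower-ranked chunk may still fit
        selected.append(text)
        used += token_count
    return selected, used

# Hypothetical (text, token_count) pairs.
chunks = [("chunk A", 3000), ("chunk B", 4000), ("chunk C", 1500), ("chunk D", 500)]
selected, used = pack_chunks(
    chunks, context_window=8000, reserved_for_output=1000, prompt_tokens=500
)
print(selected, used)
```

Here the budget for chunks is 6,500 tokens, so chunk B is skipped and the remaining three are packed using 5,000 tokens.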
Tokens
Sub-word units of text used by LLMs. Billing and context window limits are measured in tokens, and latency scales with token count.
LLMs tokenise text into sub-word units before processing. Roughly 1 English word ≈ 1.3 tokens; Indian languages typically produce more tokens per word because tokeniser vocabularies are built mostly from English-heavy data. LLM APIs price per token, usually with separate rates for input and output, so understanding token counts is essential for cost engineering in production LLM applications.
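A back-of-the-envelope cost estimate can be built from the 1.3 tokens-per-word rule of thumb. The prices in this sketch are placeholders, not any provider's actual rates, and a real tokeniser should be used whenever exact counts matter.

```python
import math

TOKENS_PER_WORD = 1.3  # rough English average; other languages are often higher

def estimate_tokens(text, tokens_per_word=TOKENS_PER_WORD):
    """Approximate the token count from a whitespace word count."""
    return math.ceil(len(text.split()) * tokens_per_word)

def estimate_cost(input_tokens, output_tokens, input_price_per_million, output_price_per_million):
    """Cost in the same currency as the per-million-token prices."""
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000

prompt = "Summarise the attached support ticket in two sentences"
input_tokens = estimate_tokens(prompt)

# Placeholder prices per million tokens; substitute the provider's published rates.
cost = estimate_cost(input_tokens, 100, input_price_per_million=3.0, output_price_per_million=15.0)
print(input_tokens, cost)
```

Multiplying the per-request figure by expected daily traffic gives a first estimate of running cost, and shows how output tokens can dominate when they are priced higher than input tokens.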