Skip to content
About

Frameworks & Libraries

The open AI ecosystem is large and fast-moving. You don’t need to know every tool — you need a map: what categories exist and what each is for. Then you can place any new library in seconds.

Hugging Face is the center of gravity for open AI — effectively the GitHub of models. It hosts hundreds of thousands of models, datasets, and demos, plus core libraries:

  • transformers — load and run virtually any open model with one consistent API.
  • datasets — access and process training/evaluation datasets.
  • tokenizers, accelerate, peft (LoRA fine-tuning), and more.

If you self-host or fine-tune open models, you’ll pass through Hugging Face.

These wire LLM calls together with retrieval, tools, and memory into applications — the architecture layer:

  • LangChain — broad framework for chains, agents, and integrations; large surface area.
  • LlamaIndex — focused on RAG — data ingestion, indexing, retrieval.
  • Lighter / lower-level options — many teams use minimal frameworks, or none, calling provider SDKs directly for full control.

Run models efficiently — see AI Infrastructure:

  • vLLM, TGI, TensorRT-LLM — high-throughput GPU serving.
  • Ollama, llama.cpp — local and laptop-scale serving. See Running Models Locally.

The storage and search layer for RAG and embeddings:

  • Vector databases — pgvector, Qdrant, Weaviate, Milvus, Chroma. See Vector Databases.
  • sentence-transformers — run open embedding and reranking models.
  • FAISS — fast in-process similarity search.

The least glamorous category, and the one that most separates serious systems from demos — see LLMOps:

  • Evaluation — RAGAS (RAG-specific), and general LLM eval frameworks.
  • Tracing / observability — Langfuse, LangSmith, Arize Phoenix: trace multi-step requests, track cost and latency.
Models & training Hugging Face hub · transformers · datasets · peft (LoRA) Orchestration LangChain · LlamaIndex · lightweight frameworks · plain SDK Retrieval pgvector · Qdrant · Weaviate · Chroma · sentence-transformers Serving & inference vLLM · TGI · TensorRT-LLM · Ollama · llama.cpp Evaluation & observability RAGAS · Langfuse · LangSmith · Phoenix

The ecosystem moves fast and every layer has loud new entrants. Stay grounded:

  • Start minimal. Add a tool when you hit a real problem it solves — not preemptively. Every dependency is a maintenance and abstraction cost.
  • Judge maturity — maintenance activity, docs, community size, production track record — over launch-week buzz.
  • Prefer standard interfaces so swapping a tool later is cheap.
  • Know what’s underneath. A framework is a convenience over LLM calls, retrieval, and loops. Understand those primitives so you can debug — and drop the framework when it’s in the way.

Hold a mental map of the ecosystem — models/training, orchestration, retrieval, serving, evaluation/observability — and you can place any tool fast. Hugging Face anchors open models. Orchestration frameworks speed prototyping but add abstraction; many production systems use them lightly or not at all. Don’t skip the evaluation and observability layer. Start minimal, add tools to solve real problems, judge maturity over hype, and understand the primitives beneath every framework.