Frameworks & Libraries
The open AI ecosystem is large and fast-moving. You don’t need to know every tool — you need a map: what categories exist and what each is for. Then you can place any new library in seconds.
The Hugging Face hub
Hugging Face is the center of gravity for open AI, effectively the GitHub of models. It hosts hundreds of thousands of models, datasets, and demos, plus core libraries:
- transformers: load and run virtually any open model with one consistent API.
- datasets: access and process training/evaluation datasets.
- tokenizers, accelerate, peft (LoRA fine-tuning), and more.
If you self-host or fine-tune open models, you’ll pass through Hugging Face.
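As a rough illustration of the "one consistent API" idea, the sketch below wraps the `pipeline` helper from transformers. It assumes transformers and a backend such as PyTorch are installed and that the model can be downloaded. The model id `distilgpt2` is only an example choice and the wrapper name `generate_text` is hypothetical.

```python
def generate_text(prompt: str, model_id: str = "distilgpt2", max_new_tokens: int = 20) -> str:
    """Generate a continuation of `prompt` with a model hosted on the Hub.

    `model_id` is an assumed example; other text-generation model ids
    are loaded the same way.
    """
    # Imported lazily so that defining this helper does not require the
    # heavy dependency or trigger a model download.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_id)
    outputs = generator(prompt, max_new_tokens=max_new_tokens)
    # The text-generation pipeline returns a list of dicts, one per sequence.
    return outputs[0]["generated_text"]
```

Calling `generate_text("Open models are")` downloads the model on first use, so expect the first call to be slow.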
Orchestration frameworks
These wire LLM calls together with retrieval, tools, and memory into applications. This is the architecture layer:
- LangChain: broad framework for chains, agents, and integrations; large surface area.
- LlamaIndex: focused on RAG (data ingestion, indexing, retrieval).
- Lighter / lower-level options: many teams use a minimal framework, or none at all, and call provider SDKs directly for full control.
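To make the "or none" option concrete, here is a minimal sketch of retrieval plus one LLM call written as plain functions. Everything in it is illustrative: the keyword-overlap retriever stands in for real retrieval, and `echo_llm` is a hypothetical placeholder for whatever provider SDK call you would actually make.

```python
from typing import Callable

# A tiny in-memory "knowledge base"; a real system would use a vector store.
DOCUMENTS = [
    "vLLM is a high-throughput GPU serving engine.",
    "LlamaIndex focuses on data ingestion, indexing, and retrieval for RAG.",
    "Ollama runs models locally on laptop-scale hardware.",
]


def _words(text: str) -> set[str]:
    """Lowercase tokens with surrounding punctuation stripped."""
    return {token.strip(".,?!").lower() for token in text.split()}


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by shared words with the query (a stand-in for real retrieval)."""
    query_words = _words(query)
    return sorted(documents, key=lambda d: len(query_words & _words(d)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble retrieved context and the question into one prompt string."""
    context_block = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {question}"


def answer(question: str, llm: Callable[[str], str]) -> str:
    """Retrieve, build the prompt, and make a single LLM call."""
    context = retrieve(question, DOCUMENTS)
    return llm(build_prompt(question, context))


def echo_llm(prompt: str) -> str:
    """Hypothetical stand-in for a provider SDK call; it only reports the prompt size."""
    return f"(stub reply to a {len(prompt)}-character prompt)"


print(answer("What does LlamaIndex focus on?", echo_llm))
```

Because `answer` takes the model as a plain callable, swapping the stub for a real client is a one-line change, which is the kind of control the lighter approach is after.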
Inference and serving
Run models efficiently (see AI Infrastructure):
- vLLM, TGI, TensorRT-LLM — high-throughput GPU serving.
- Ollama, llama.cpp — local and laptop-scale serving. See Running Models Locally.
Vector and retrieval tooling
The storage and search layer for RAG and embeddings:
- Vector databases: pgvector, Qdrant, Weaviate, Milvus, Chroma. See Vector Databases.
- sentence-transformers: run open embedding and reranking models.
- FAISS: fast in-process similarity search.
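The core operation behind this layer is nearest-neighbor search over embedding vectors. The sketch below shows the idea as a brute-force cosine-similarity scan in plain Python. It is not how FAISS or any vector database is implemented (those use optimized and often approximate indexes), and the three-dimensional vectors are made-up toy data.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Score every stored vector against the query and return the k best matches."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings"; real ones come from an embedding model and
# typically have hundreds of dimensions.
INDEX = {
    "doc_serving": [0.9, 0.1, 0.0],
    "doc_retrieval": [0.1, 0.9, 0.1],
    "doc_eval": [0.0, 0.2, 0.9],
}

for doc_id, score in top_k([0.0, 1.0, 0.0], INDEX):
    print(doc_id, round(score, 3))
```

A linear scan like this is fine for a few thousand vectors; dedicated tooling earns its place when the collection or the query rate grows.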
Evaluation and observability
The least glamorous category, and the one that most separates serious systems from demos (see LLMOps):
- Evaluation — RAGAS (RAG-specific), and general LLM eval frameworks.
- Tracing / observability — Langfuse, LangSmith, Arize Phoenix: trace multi-step requests, track cost and latency.
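To show what "trace multi-step requests, track cost and latency" means at its simplest, here is a hand-rolled sketch that records one span per step. It does not use or imitate the API of Langfuse, LangSmith, or Arize Phoenix; the class names, the `step` method, and the cost figure are all hypothetical.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Span:
    """One timed step of a request."""
    name: str
    duration_s: float = 0.0
    cost_usd: float = 0.0


@dataclass
class Trace:
    """Collects one Span per step of a multi-step request."""
    spans: list[Span] = field(default_factory=list)

    @contextmanager
    def step(self, name: str):
        span = Span(name)
        start = time.perf_counter()
        try:
            yield span  # the caller may set span.cost_usd inside the block
        finally:
            # Record the span even if the step raised an exception.
            span.duration_s = time.perf_counter() - start
            self.spans.append(span)

    def total_cost(self) -> float:
        return sum(s.cost_usd for s in self.spans)

    def total_latency(self) -> float:
        return sum(s.duration_s for s in self.spans)


trace = Trace()
with trace.step("retrieve"):
    time.sleep(0.01)  # stand-in for a retrieval call
with trace.step("generate") as span:
    span.cost_usd = 0.002  # hypothetical cost reported by a provider

for s in trace.spans:
    print(f"{s.name}: {s.duration_s:.3f}s, ${s.cost_usd:.4f}")
```

Dedicated tools add storage, dashboards, and nested spans on top of this basic shape, but the per-step record of name, latency, and cost is the part worth having from day one.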
The ecosystem map
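One way to hold the map is as a plain lookup table. The sketch below only restates the categories and tools named on this page; the category labels and the `place` helper are illustrative, not an established taxonomy.

```python
ECOSYSTEM_MAP = {
    "models and training": ["transformers", "datasets", "tokenizers", "accelerate", "peft"],
    "orchestration": ["LangChain", "LlamaIndex"],
    "inference and serving": ["vLLM", "TGI", "TensorRT-LLM", "Ollama", "llama.cpp"],
    "vector and retrieval": [
        "pgvector", "Qdrant", "Weaviate", "Milvus", "Chroma",
        "sentence-transformers", "FAISS",
    ],
    "evaluation and observability": ["RAGAS", "Langfuse", "LangSmith", "Arize Phoenix"],
}


def place(tool: str) -> str:
    """Return the category a tool belongs to, or 'unmapped' if it is not listed."""
    wanted = tool.lower()
    for category, tools in ECOSYSTEM_MAP.items():
        if wanted in (t.lower() for t in tools):
            return category
    return "unmapped"


print(place("FAISS"))
print(place("some-new-library"))
```

When a new library shows up, the useful question is which of these five rows it belongs in and which existing entry it would replace.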
Choosing tools without chasing hype
The ecosystem moves fast and every layer has loud new entrants. Stay grounded:
- Start minimal. Add a tool when you hit a real problem it solves — not preemptively. Every dependency is a maintenance and abstraction cost.
- Judge maturity — maintenance activity, docs, community size, production track record — over launch-week buzz.
- Prefer standard interfaces so swapping a tool later is cheap.
- Know what’s underneath. A framework is a convenience over LLM calls, retrieval, and loops. Understand those primitives so you can debug — and drop the framework when it’s in the way.
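The "prefer standard interfaces" advice can be sketched with a small interface that application code depends on instead of a specific SDK or framework. The `ChatModel` protocol and both backends below are hypothetical stand-ins, chosen only to show that swapping one for the other requires no change to the caller.

```python
from typing import Protocol


class ChatModel(Protocol):
    """The only thing application code needs to know about a model backend."""

    def complete(self, prompt: str) -> str: ...


class LocalStubModel:
    """Hypothetical backend, e.g. for tests or offline work."""

    def complete(self, prompt: str) -> str:
        return f"[local] {prompt[:20]}"


class UppercaseStubModel:
    """A second hypothetical backend with different behavior."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


def summarize(text: str, model: ChatModel) -> str:
    """Application code written against the interface, not a vendor."""
    return model.complete(f"Summarize: {text}")


print(summarize("hello", LocalStubModel()))
print(summarize("hello", UppercaseStubModel()))
```

The same shape works for retrievers, embedders, and tracers: keep the interface narrow, and a framework or vendor becomes one implementation among several rather than a dependency of every call site.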
Key takeaways
Hold a mental map of the ecosystem (models/training, orchestration, retrieval, serving, evaluation/observability) and you can place any tool fast. Hugging Face anchors open models. Orchestration frameworks speed prototyping but add abstraction; many production systems use them lightly or not at all. Don’t skip the evaluation and observability layer. Start minimal, add tools to solve real problems, judge maturity over hype, and understand the primitives beneath every framework.