Core Algorithms
You don’t need to derive these from scratch. You need to recognize them, know what they’re good at, and pick sensibly. This is a field guide, not a textbook.
Linear & logistic regression
The simplest models, and a smart default baseline.
- Linear regression predicts a number as a weighted sum of features.
- Logistic regression does the same, then squashes the result into a 0–1 probability for classification.
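They’re fast, need little data, and, crucially, are interpretable: each weight tells you how much the prediction changes when that feature increases by one unit, holding the others fixed (for logistic regression, the change is in the log-odds). Always try a linear model first. If a complex model can’t beat it, it isn’t earning its complexity.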
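A minimal sketch of what both models compute at prediction time. The weights and bias are hypothetical stand-ins for values a real model would learn from data, and the function names are illustrative, not any library’s API.

```python
import math

def linear_predict(weights, bias, features):
    """Linear regression: a weighted sum of the features plus a bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def sigmoid(z):
    """Squash any real number into (0, 1); branches avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logistic_predict(weights, bias, features):
    """Logistic regression: the same weighted sum, passed through the sigmoid."""
    return sigmoid(linear_predict(weights, bias, features))

# Hypothetical weights for two features; a real model learns them from data.
weights, bias = [2.0, -1.0], 0.5
x = [1.0, 3.0]
print(linear_predict(weights, bias, x))    # -0.5  (2*1 - 1*3 + 0.5)
print(logistic_predict(weights, bias, x))  # sigmoid(-0.5), about 0.38
```

The interpretability comes straight from the weights: here, raising the first feature by one unit adds 2.0 to the linear output (for the logistic model, 2.0 to the log-odds).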
Decision trees
A tree of yes/no questions on features: “income > 50k? → yes → age < 30? → …”. Each leaf is a prediction. Trees handle non-linear patterns, need no feature scaling, and a single tree is easy to read. Their weakness: one deep tree overfits badly. The fix is to combine many of them.
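To make the structure concrete, here is a hand-built tree mirroring the example questions above. This is only a sketch: a real learner chooses the questions and thresholds itself from data, and the leaf labels here are hypothetical placeholders. Returning the path of answers shows why a single tree is easy to read.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Node:
    question: str                  # human-readable text of the yes/no question
    test: Callable[[dict], bool]   # the actual check on one row of features
    yes: "Tree"                    # subtree followed when the answer is yes
    no: "Tree"                     # subtree followed when the answer is no

Tree = Union[Node, str]  # a leaf is just the predicted label

def predict(tree, row):
    """Walk from the root to a leaf; return the prediction and the questions asked."""
    path = []
    while isinstance(tree, Node):
        answer = tree.test(row)
        path.append(f"{tree.question} {'yes' if answer else 'no'}")
        tree = tree.yes if answer else tree.no
    return tree, path

# Hypothetical tree; thresholds follow the example, labels are placeholders.
tree = Node("income > 50k?", lambda r: r["income"] > 50_000,
            yes=Node("age < 30?", lambda r: r["age"] < 30, yes="label A", no="label B"),
            no="label C")

print(predict(tree, {"income": 80_000, "age": 25}))
# ('label A', ['income > 50k? yes', 'age < 30? yes'])
```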
Ensembles: random forests & gradient boosting
Ensembles combine many models, each imperfect on its own, into one stronger predictor.
- Random forest — train hundreds of trees on random subsets of data and features, then average them. Robust, hard to misuse, a great default.
- Gradient boosting — build trees sequentially, each one correcting the previous ensemble’s errors. Libraries: XGBoost, LightGBM, CatBoost.
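The two strategies can be contrasted in a small from-scratch sketch on a single numeric feature, using one-question trees (“stumps”) in place of full trees. `bagged_stumps` shows the averaging-over-bootstrap-resamples half of a random forest; with only one feature there is no per-split feature subsampling, so it is plain bagging. `boosted_stumps` is gradient boosting for squared-error loss, where the residuals are exactly the negative gradient. All names and defaults are illustrative, not the API of XGBoost, LightGBM, or CatBoost.

```python
import random
from statistics import mean

# A stump is (threshold, value if x <= threshold, value otherwise).
def fit_stump(xs, ys):
    """Pick the split on one numeric feature that minimizes squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not right:  # splitting at the largest x leaves one side empty
            continue
        lv, rv = mean(left), mean(right)
        err = sum((y - lv) ** 2 for y in left) + sum((y - rv) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    if best is None:  # every x identical: no split possible, predict the mean
        m = mean(ys)
        return (xs[0], m, m)
    return best[1:]

def predict_stump(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv

def bagged_stumps(xs, ys, n_models=100, seed=0):
    """Bagging: fit each stump on a bootstrap resample, then average predictions."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n rows drawn with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(predict_stump(s, x) for s in stumps)

def boosted_stumps(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting (squared loss): each new stump fits the current residuals."""
    base = mean(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink each correction by the learning rate before adding it in.
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(predict_stump(s, x) for s in stumps)
```

The structural difference is visible in the loops: bagging’s models are independent (each sees its own resample), while boosting’s are sequential (each sees what the ensemble so far still gets wrong).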
k-Nearest Neighbors (kNN)
To classify a new point, find the k closest known points and take a majority vote. There’s no real “training” — it just stores the data. Simple and intuitive, but slow at prediction time on large datasets. Its core idea — “similar inputs have similar outputs” — is the exact intuition behind vector search.
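A minimal brute-force sketch, assuming Euclidean distance and points stored as tuples of numbers. It makes the cost visible: every prediction scans the entire training set, which is why kNN gets slow at prediction time as data grows. The function name and toy data are illustrative.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (point, label) pairs, each point a tuple of numbers.
    """
    # No fitting step: all the work happens here, a full scan over stored points.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_predict(train, (0.5, 0.5)))  # → a
print(knn_predict(train, (5.5, 5.0)))  # → b
```

With two classes, an odd k avoids tied votes; on a tie, `Counter.most_common` returns whichever label among the nearest was counted first.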
k-Means clustering
The go-to unsupervised algorithm. Pick k, and it partitions data into k groups by iteratively assigning points to the nearest cluster center and recomputing centers. Used for customer segmentation and exploratory analysis. You must choose k yourself, and it assumes roughly round, similar-sized clusters.
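The assign-then-recompute loop described above (often called Lloyd’s algorithm) fits in a few lines. This sketch assumes points are tuples of numbers and initializes with k randomly chosen data points, a simplification: libraries such as scikit-learn default to the smarter k-means++ initialization. The function name and parameters are illustrative.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Return (centers, labels) after alternating assignment and update steps."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of the points assigned to it.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centers.append(centers[c])  # empty cluster: keep its old center
        if new_centers == centers:  # converged: assignments can no longer change
            break
        centers = new_centers
    return centers, labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
print(labels)  # the first three points share one label, the last three the other
```

Because the result depends on the starting centers, practical implementations usually run several initializations and keep the one with the lowest total within-cluster distance.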
Neural networks
Covered in depth in Deep Learning. In one line: layers of simple units that, stacked deep, learn their own features from raw data. They dominate unstructured input (images, audio, text) and typically underperform boosted trees on small tabular datasets.
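A tiny forward pass shows what “layers of simple units” means: each unit takes a weighted sum of its inputs plus a bias and applies a nonlinearity (ReLU here). The weights below are hand-picked for illustration, not learned, and make the two-layer network compute XOR, a function no single weighted sum of the two inputs can separate. In a real network, training (gradient descent) finds the weights instead. Function names are illustrative.

```python
def relu(z):
    """The nonlinearity each hidden unit applies: negative values become 0."""
    return max(0.0, z)

def dense(inputs, weights, biases, activation=lambda z: z):
    """One layer: every unit takes a weighted sum of all inputs plus its bias."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def xor_net(x1, x2):
    # Hand-picked weights for illustration; a trained network would learn its own.
    # Hidden units: h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1).
    hidden = dense([x1, x2], weights=[[1, 1], [1, 1]], biases=[0, -1], activation=relu)
    # Output unit: h1 - 2*h2, which is 1 exactly when one input is 1.
    (output,) = dense(hidden, weights=[[1, -2]], biases=[0])
    return output

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```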
Picking an algorithm
| Situation | Start with |
|---|---|
| Tabular data, need a baseline | Logistic / linear regression |
| Tabular data, want best accuracy | Gradient boosting (XGBoost / LightGBM) |
| Need a human-explainable model | Linear model or a shallow decision tree |
| Images, audio, or text | A neural network |
| No labels, want groups | k-Means or hierarchical clustering |
| Small dataset, simple relationship | kNN or linear regression |
Feature engineering: still the highest-leverage work
A feature is an input variable. Feature engineering is transforming raw data into inputs that expose the signal — and for classical ML it routinely matters more than the algorithm choice.
```python
# Assumes df is a pandas DataFrame with datetime columns "ts" and "signup".
# Raw timestamp -> features a model can actually use.
df["hour"] = df["ts"].dt.hour
df["is_weekend"] = df["ts"].dt.dayofweek >= 5  # dayofweek: Monday=0 ... Sunday=6
df["days_since_signup"] = (df["ts"] - df["signup"]).dt.days
```

Common transformations: encoding categories as numbers (one-hot, target encoding), scaling numeric ranges, bucketing continuous values, extracting parts of dates, and combining columns into ratios.
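Three of those transformations, sketched in plain Python so the mechanics are visible. The function names and the example bucket edges are hypothetical; pandas offers built-in counterparts such as `pd.get_dummies` and `pd.cut`, whose exact options (for example, which side of a bin edge is closed) differ from this sketch.

```python
import bisect

def one_hot(values):
    """One-hot encode a categorical column: one 0/1 indicator column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

def min_max_scale(values):
    """Rescale numbers linearly so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: nothing to scale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def bucketize(values, edges):
    """Replace each number with the index of the bucket it falls in.

    `edges` must be sorted; a value equal to an edge goes into the bucket above it.
    """
    return [bisect.bisect_right(edges, v) for v in values]

print(one_hot(["red", "green", "red"]))           # (['green', 'red'], [[0, 1], [1, 0], [0, 1]])
print(min_max_scale([10, 20, 30]))                # [0.0, 0.5, 1.0]
print(bucketize([10, 18, 30, 70], [18, 35, 60]))  # [0, 1, 1, 3]
```

When scaling for a real model, compute the minimum and maximum on the training split only and reuse them for validation and production data; recomputing them on new data leaks information and shifts the feature’s meaning.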
Key takeaways
Start every tabular problem with a linear baseline, then try gradient boosting, which tends to win on structured data. Use neural networks for images, audio, and text. Use k-Means when you have no labels. And remember that for classical ML, thoughtful feature engineering often beats a fancier algorithm.