Glossary

A quick reference for the key terms and concepts introduced in this ebook.

Activation Function : A function used in a neural network neuron to transform the weighted sum of inputs into an output. Common examples include Sigmoid, ReLU, and Tanh.
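
As an illustration, the three activations named above can be sketched in plain Python. The function names, inputs, weights, and bias below are made up for the example and are not taken from the ebook.

```python
import math

def sigmoid(z: float) -> float:
    """Squash z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z: float) -> float:
    """Pass positive values through unchanged; clamp negatives to 0."""
    return max(0.0, z)

def tanh(z: float) -> float:
    """Squash z into the open interval (-1, 1)."""
    return math.tanh(z)

# Hypothetical neuron: weighted sum of inputs plus a bias, then an activation.
inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, -0.2]
bias = 0.1
z = sum(w * x for w, x in zip(weights, inputs)) + bias

outputs = {"sigmoid": sigmoid(z), "relu": relu(z), "tanh": tanh(z)}
```

Because the weighted sum here is negative, ReLU outputs 0 while Sigmoid and Tanh return small non-zero values.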

AutoML (Automated Machine Learning) : The practice of automating the end-to-end process of applying machine learning, including data preparation, model selection, and hyperparameter tuning.

Backpropagation : The algorithm used to train neural networks. It applies the chain rule backward from the output layer to calculate the gradient of the cost function with respect to each of the network's weights, so that gradient descent can adjust those weights to reduce the error.

Bagging (Bootstrap Aggregating) : An ensemble learning technique where multiple models are trained on different random samples of the training data, drawn with replacement, and their predictions are combined by voting or averaging. Random Forest is a prime example.

Bias (in Bias-Variance Tradeoff) : Error that arises from erroneous or overly simple assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relationships between features and target outputs (underfitting).

Boosting : An ensemble learning technique where models are built sequentially, with each new model attempting to correct the errors of the previous one. Gradient Boosting and XGBoost are examples.

Classification : A type of supervised learning where the goal is to predict a categorical label (e.g., 'spam' or 'not spam').

Clustering : An unsupervised learning task that involves grouping a set of objects in such a way that objects in the same group (or cluster) are more similar to each other than to those in other clusters.

Collaborative Filtering : A technique used by recommendation systems that makes predictions about the interests of a user by collecting preferences from many users.

Confusion Matrix : A table used to evaluate the performance of a classification algorithm. It summarizes the counts of True Positives, False Positives, True Negatives, and False Negatives.
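
A minimal sketch of how the four counts could be tallied for a binary problem. The helper name `confusion_counts` and the sample labels are hypothetical, chosen only for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary classification problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

# Made-up labels: 1 is the positive class, 0 the negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
# counts == {"TP": 3, "FP": 1, "TN": 3, "FN": 1}
```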

Cost Function : A function that measures the average error of a machine learning model over the entire training dataset. The goal of training is to minimize this function.

Cross-Validation (k-fold) : A resampling procedure used to evaluate machine learning models on a limited data sample. It involves partitioning the data into 'k' subsets, training the model 'k' times, and using a different subset for validation each time.
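
The partitioning step can be sketched as follows. This is a simplified illustration with a hypothetical helper, `k_fold_indices`; it does not shuffle the data, which is usually done first in practice.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

# 10 samples split into 3 folds: each sample is used for validation exactly once.
folds = list(k_fold_indices(10, 3))
```

A model would be trained on each `train` list and scored on the matching `validation` list, and the k scores are then typically averaged.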

Deep Learning : A subfield of machine learning based on artificial neural networks with multiple hidden layers (deep networks).

Dimensionality Reduction : The process of reducing the number of features (variables) in a dataset while retaining as much of the important information as possible. PCA is a common technique.

Ensemble Learning : A machine learning technique where multiple models (often simple ones called "weak learners") are combined to produce a more powerful "strong learner" with better predictive performance than any single model.

Epoch : One complete pass through the entire training dataset during the training of a neural network.

F1-Score : The harmonic mean of Precision and Recall, used as a single metric to evaluate a classification model's performance.
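
A short sketch of the calculation from confusion-matrix counts. The function name and the counts are invented for the example, and returning 0.0 when a denominator is zero is an assumed convention rather than a universal rule.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Return (precision, recall, f1) computed from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        # Harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Made-up counts: precision = 0.8, recall = 0.5.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
```

Here the F1-Score is about 0.615, lower than the arithmetic mean of 0.65, because the harmonic mean is pulled toward the weaker of the two metrics.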

Feature Engineering : The process of using domain knowledge to create new features from existing data to improve model performance.

Feature Selection : The process of selecting a subset of relevant features (variables, predictors) for use in model construction.

Gini Impurity : A metric used by decision trees to measure how mixed the class labels in a node are. A score of 0 means every sample in the node belongs to the same class, so a lower Gini score indicates a more homogeneous node.
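
Gini impurity is commonly computed as 1 minus the sum of the squared class proportions in a node. A minimal sketch, with an illustrative function name and made-up labels:

```python
from collections import Counter

def gini_impurity(labels) -> float:
    """Return 1 - sum(p_i ** 2), where p_i is the proportion of class i."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumed convention for an empty node
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

pure = gini_impurity(["a", "a", "a", "a"])   # 0.0: one class only
mixed = gini_impurity(["a", "a", "b", "b"])  # 0.5: two classes, evenly split
```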

Gradient Descent : An iterative optimization algorithm used to find a minimum of a cost function by repeatedly adjusting the parameters in the direction of the negative gradient. It is the primary way that neural networks and many other models learn.
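
A minimal sketch of the update loop on a one-dimensional function. The function f(x) = (x - 3)^2, the learning rate, and the step count are assumptions chosen for the example.

```python
def gradient_descent(grad, start: float, learning_rate: float = 0.1, n_steps: int = 100) -> float:
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(n_steps):
        x -= learning_rate * grad(x)  # move downhill by a small step
    return x

# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
```

With these settings the distance to the minimum shrinks by a factor of 0.8 on every step, so `x_min` ends up very close to 3. A learning rate that is too large would make the steps overshoot instead.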

Hyperparameter : A configuration that is external to the model and whose value is set before the learning process begins (e.g., learning rate, number of trees in a random forest).

Loss Function : A function that measures the error for a single training example.

Mean Squared Error (MSE) : A common cost function for regression problems, calculated as the average of the squared differences between the predicted and actual values.
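
As a worked example, MSE can be computed in a few lines. The function name and the values are illustrative only.

```python
def mean_squared_error(y_true, y_pred) -> float:
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)

# Errors are 1, 0, -2, -1, so the squared errors are 1, 0, 4, 1.
mse = mean_squared_error([3.0, 5.0, 2.0, 7.0], [2.0, 5.0, 4.0, 8.0])
# mse == 1.5
```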

Model Deployment : The process of integrating a machine learning model into a production environment to make live predictions.

Normalization : The process of rescaling numeric data to a fixed range, typically between 0 and 1.
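
One common form is min-max scaling, sketched below. The function name is hypothetical, and mapping a constant column to all zeros is an assumed convention.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumed convention for constant input
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_normalize([10, 20, 15, 30])
# scaled == [0.0, 0.5, 0.25, 1.0]
```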

One-Hot Encoding : A technique for converting categorical variables into a numerical format by creating a new binary column for each unique category.
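
A small sketch of the idea, using a hypothetical helper and made-up color values. Columns are ordered alphabetically here purely for reproducibility.

```python
def one_hot_encode(values):
    """Return (categories, rows), with one binary column per unique category."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories] for value in values]
    return categories, rows

categories, rows = one_hot_encode(["red", "green", "blue", "green"])
# categories == ["blue", "green", "red"]
# rows == [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Each row contains exactly one 1, marking the category of that sample.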

Overfitting : A modeling error that occurs when a model learns the detail and noise in the training data so closely that its performance on new, unseen data suffers.

Precision : A classification metric that measures the proportion of positive predictions that were actually correct. It is calculated as TP / (TP + FP).

Principal Component Analysis (PCA) : A popular dimensionality reduction technique that transforms the data into a new set of uncorrelated variables called principal components.

R-squared (R²) : A statistical measure that represents the proportion of the variance in a dependent variable that is explained by the independent variable or variables in a regression model.
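
R² is commonly computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch with an illustrative function name and made-up values:

```python
def r_squared(y_true, y_pred) -> float:
    """Return 1 - SS_res / SS_tot for paired actual and predicted values."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)             # total variation
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot

# SS_res = 0.5 and SS_tot = 5.0, so R-squared is 0.9.
r2 = r_squared([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 3.0, 3.5])
```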

Recall (Sensitivity) : A classification metric that measures the proportion of actual positives that were identified correctly. It is calculated as TP / (TP + FN).

Regression : A type of supervised learning where the goal is to predict a continuous numerical value (e.g., price, temperature).

Regularization : A technique used to prevent overfitting by adding a penalty term to the cost function. L1 (Lasso) and L2 (Ridge) are common types.

Residual : The difference between the actual value and the predicted value in a regression analysis.

Standardization : The process of rescaling data to have a mean of 0 and a standard deviation of 1 by subtracting the mean from each value and dividing by the standard deviation. The resulting values are known as z-scores.
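
A brief sketch of the calculation. The function name is hypothetical, and using the population standard deviation (dividing by n rather than n - 1) is an assumption of this example.

```python
import math

def standardize(values):
    """Return the z-score of each value: (value - mean) / standard deviation."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation (divides by n); an assumption of this sketch.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0] * n  # assumed convention for constant input
    return [(v - mean) / std for v in values]

# Mean is 5 and standard deviation is 2 for this made-up sample.
z_scores = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
# z_scores == [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```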

Supervised Learning : A type of machine learning where the model learns from data that has been labeled with the correct outcomes.

Underfitting : A modeling error that occurs when a model is too simple to capture the underlying patterns in the data, resulting in poor performance on both the training and test sets.

Unsupervised Learning : A type of machine learning where the model learns from unlabeled data, discovering patterns and structures on its own.

Variance (in Bias-Variance Tradeoff) : Error that arises from sensitivity to small fluctuations in the training set. High variance can cause an algorithm to model the random noise in the training data rather than the intended outputs (overfitting).