Sébastien De Greef committed on
Commit 7cade5b
1 Parent(s): 9735e61

feat: Add hyperparameter tuning to theory section

src/_quarto.yml CHANGED
@@ -72,6 +72,10 @@ website:
   text: "Dying Neurons"
 - href: theory/overfitting.qmd
   text: "Overfitting"
+- href: theory/underfitting.qmd
+  text: "Underfitting"
+- href: theory/hyperparameter_tuning.qmd
+  text: "Hyperparameter Tuning"

 - href: theory/perplexity_in_ai.qmd
   text: "Perplexity and Quantization"

src/theory/hyperparameter_tuning.qmd ADDED
@@ -0,0 +1,79 @@
# Understanding Hyperparameter Tuning

Hyperparameters are the settings that govern how a machine learning model learns during training. They play an essential role in determining how well the model learns from data and generalizes to unseen examples. In this article, we explore what hyperparameters are, why tuning them matters, and the main techniques used to optimize them.

## What are Hyperparameters?

Hyperparameters are settings that control the learning process of a machine learning model. Unlike model parameters (such as weights), they cannot be learned directly from the data during training; they must be set beforehand and typically remain fixed throughout training. Some common examples of hyperparameters, illustrated in the sketch below, include:

- Learning rate
- Number of hidden layers and neurons in a neural network
- Kernel type and regularization parameters for Support Vector Machines (SVM)
- Tree depth or number of trees in ensemble methods like Random Forest or Gradient Boosting
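
As a minimal sketch (assuming scikit-learn's `RandomForestClassifier` and a toy dataset, neither of which is prescribed here), hyperparameters such as `n_estimators` and `max_depth` are chosen when the model is constructed, while the model's parameters, the individual trees' split rules, are learned when `fit` is called:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data purely for illustration
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hyperparameters: chosen by us before training starts
model = RandomForestClassifier(n_estimators=50, max_depth=4, random_state=0)

# Parameters (the trees' split rules) are learned from the data here
model.fit(X, y)
print(model.score(X, y))  # training accuracy
```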

## Why is Hyperparameter Tuning Important?

Hyperparameters significantly impact the performance of machine learning models. Well-tuned hyperparameters can lead to higher accuracy, faster convergence during training, and better generalization to unseen data. Poorly chosen hyperparameters, on the other hand, can cause underfitting or overfitting and lead to suboptimal predictions.
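
As a small illustration of how much a single hyperparameter can matter (a sketch assuming scikit-learn and a synthetic dataset; the exact numbers will vary), compare the cross-validated accuracy of the same model class with two different tree depths:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# The only difference between the two runs is the max_depth hyperparameter
for depth in (1, 8):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores = cross_val_score(tree, X, y, cv=5)
    print(f"max_depth={depth}: mean CV accuracy = {scores.mean():.3f}")
```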

## Hyperparameter Tuning Techniques

There are several techniques available for optimizing hyperparameters:

1. Grid Search
2. Random Search
3. Bayesian Optimization
4. Gradient-based optimization
5. Evolutionary Algorithms
6. Population Based Training (PBT)

### 1. Grid Search

Grid search is a brute-force approach that exhaustively searches through all possible combinations of hyperparameters within predefined ranges or values. It evaluates the model's performance for each combination and selects the best one based on a chosen metric, such as accuracy or loss. Below is a minimal sketch with scikit-learn, using a gradient boosting classifier on toy data as a stand-in for whichever model you are tuning:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Define the parameter grid to search exhaustively
param_grid = {
    'learning_rate': [0.1, 0.01, 0.001],
    'max_depth': [3, 5, 7]
}

# Create a model instance and evaluate every combination with cross-validation
X, y = make_classification(n_samples=200, random_state=42)
model = GradientBoostingClassifier()
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=5)
grid_search.fit(X, y)
print(grid_search.best_params_)
```

### 2. Random Search

Random search is an alternative to grid search that randomly samples hyperparameter combinations from a predefined distribution or range. It can be more efficient than grid search when the number of hyperparameters and their possible values is large. The same stand-in model, tuned by random sampling:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Define parameter distributions (lists work too; scipy.stats distributions
# such as uniform or randint allow sampling from continuous ranges)
param_distributions = {
    'learning_rate': [0.1, 0.01, 0.001],
    'max_depth': [3, 5, 7]
}

# Sample a fixed number of random combinations and keep the best one
X, y = make_classification(n_samples=200, random_state=42)
model = GradientBoostingClassifier()
random_search = RandomizedSearchCV(estimator=model, param_distributions=param_distributions,
                                   n_iter=5, cv=5, random_state=42)
random_search.fit(X, y)
print(random_search.best_params_)
```

### 3. Bayesian Optimization

Bayesian optimization uses a probabilistic surrogate model to estimate how hyperparameter combinations will perform and chooses new ones based on that estimate. It can be more efficient than grid or random search, especially when each evaluation is expensive (for example, when training is slow). A sketch using scikit-optimize's `BayesSearchCV`, again with a gradient boosting classifier standing in for your model:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from skopt import BayesSearchCV
from skopt.space import Integer, Real

# Define the parameter space as named dimensions
param_space = {
    'learning_rate': Real(0.1, 0.3),
    'max_depth': Integer(2, 8)
}

# Fit a surrogate model of the score and explore promising regions first
X, y = make_classification(n_samples=200, random_state=42)
model = GradientBoostingClassifier()
bayes_search = BayesSearchCV(estimator=model, search_spaces=param_space, n_iter=15, cv=5)
bayes_search.fit(X, y)
print(bayes_search.best_params_)
```

## Conclusion

Hyperparameter tuning is an essential step in building effective machine learning models. By using techniques like grid search, random search, or Bayesian optimization, we can find good hyperparameter values for our model and improve its performance on unseen data.

src/theory/underfitting.qmd ADDED
@@ -0,0 +1,105 @@
# Understanding Underfitting: Detection Using Training Metrics & Visualizations

## Introduction

In machine learning, underfitting occurs when a model is too simple to capture the underlying patterns in the data it is trying to learn from. This results in poor performance on both the training and test datasets. Detecting underfitting early helps you improve your models by guiding you towards more complex architectures or better feature engineering. In this article, we explore how to detect underfitting using training metrics and visualizations, with Python code examples.

## Training Metrics for Underfitting Detection

To identify whether a model is underfitting, it is essential to monitor its performance on the training data over time. The key metrics are accuracy for classification problems, and mean squared error (MSE) together with R-squared (R²) for regression tasks.

### Accuracy (Classification Problems)

For a binary classification problem, the model's accuracy is the fraction of correctly classified samples:

```python
# true_positives and true_negatives are counts from the confusion matrix
accuracy = (true_positives + true_negatives) / total_samples
```

A low training accuracy indicates that the model may be underfitting.

### Mean Squared Error & R-squared (Regression Problems)

For regression problems, we use MSE and R² to evaluate performance:

#### Mean Squared Error (MSE)

```python
import numpy as np

mse = np.mean((y_true - y_pred) ** 2)  # y_true: targets, y_pred: predictions
```

A high MSE on the training data indicates that the model's predictions are far from the true values, suggesting underfitting.

#### R-squared

R² measures how well the regression line approximates the real data points. An R² close to 0 suggests a poor fit and potential underfitting.

```python
# One minus the ratio of the residual sum of squares to the total sum of squares
r_squared = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)
```

## Visualizing Underfitting with Python Code Examples

To better understand underfitting, let's visualize it with a simple linear regression example in Python. We will use `sklearn` to fit a deliberately over-simple model, a straight line, to data with a nonlinear trend, and then look at its performance metrics.

First, install necessary libraries:

```bash
pip install numpy matplotlib scikit-learn seaborn
```

Now, let's generate some synthetic data for our regression problem. The target follows a sine wave, so a straight line cannot capture its shape:

```{python}
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import seaborn as sns

# Generate synthetic data with a nonlinear (sinusoidal) trend plus noise
np.random.seed(42)
X = np.random.rand(100, 1) * 10  # Features (inputs)
y = 3 * np.sin(X).ravel() + np.random.randn(100) * 0.5  # Nonlinear target with some noise
```

Next, we'll fit a simple linear regression model and calculate its MSE and R²:

```{python}
# Fit the model
model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

# Calculate metrics
mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)
print("MSE:", mse)
print("R²:", r2)
```

Finally, let's visualize the data and the model's fit using seaborn:

```{python}
# Plot the synthetic data points
plt.figure(figsize=(10, 6))
sns.scatterplot(x=X.ravel(), y=y, color='blue', label="Data Points")

# Plot the fitted regression line
sns.lineplot(x=X.ravel(), y=model.predict(X), color='red', label="Regression Line")
plt.title("Underfitting Example: Linear Regression on Synthetic Data")
plt.legend()
plt.show()
```

In this example, the model's MSE and R² values indicate that it is underfitting the data:

- The regression line does not capture the underlying pattern of the synthetic dataset.
- The MSE is roughly as large as the variance of the target itself, indicating a poor fit to the true target variable.
- The R² value is close to 0, meaning the model explains only a small portion of the variance in the target variable.
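
A quick way to confirm the diagnosis is to give the model more capacity and check whether the training metrics improve. The sketch below (one possible remedy, not the only one) adds polynomial features in a scikit-learn pipeline; the exact scores depend on the data generated above, but the MSE should drop and R² should rise substantially:

```{python}
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# A more flexible model: polynomial features followed by linear regression
poly_model = make_pipeline(PolynomialFeatures(degree=7), LinearRegression())
poly_model.fit(X, y)
y_poly = poly_model.predict(X)

print("MSE:", mean_squared_error(y, y_poly))
print("R²:", r2_score(y, y_poly))
```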

## Conclusion

Detecting underfitting is crucial for improving machine learning models' performance. By monitoring training metrics such as accuracy, MSE, and R², we can identify when a model may be too simple to capture the underlying patterns in our data. Visualizations using Python code examples help us better understand these concepts and guide us towards more complex architectures or feature engineering techniques that could improve our models' performance.