Machine Learning 102: Deep Dive into Neural Networks with TensorFlow
Neural networks explained step by step.
Leveling Up Your AI Game
If you thought Machine Learning 101 was fun, buckle up, because now we’re getting into the good stuff. We’ve covered basic models and simple predictions, but today, we’re taking a deeper dive into neural networks, understanding their architecture, and training more complex models.
Step 1: Understanding Neural Network Architectures
At its core, a neural network consists of multiple layers of neurons, each passing data to the next in a way loosely inspired by the human brain. Neural networks process data by applying mathematical transformations, and their goal is to learn patterns from input data to make accurate predictions or classifications. Let's break down the fundamental components (a minimal code sketch follows the list):
Input Layer: This is the starting point where raw data is fed into the network. Each neuron in this layer represents a feature of the input.
Hidden Layers: These layers apply transformations to the input data by adjusting the connections between neurons. Each hidden layer refines the input progressively, detecting more abstract patterns.
Output Layer: The final layer provides the predicted output based on the information processed by the hidden layers. In classification tasks, this layer assigns probabilities to different categories.
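To make the three-part structure concrete, here is a minimal sketch in Keras (the layer sizes and the 4-feature input are illustrative, not tuned for any real task):
from keras import layers, models
# A tiny network: 4 input features -> one hidden layer -> 3 output classes
tiny_net = models.Sequential([
    layers.Dense(8, activation='relu', input_shape=(4,)),  # hidden layer; input_shape defines the input layer
    layers.Dense(3, activation='softmax')                  # output layer: probabilities over 3 classes
])
tiny_net.summary()  # prints each layer with its output shape and parameter count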
Overfitting
Overfitting occurs when a model learns the training data too well, capturing noise and minor details that do not generalize well to new data. This leads to a model that performs exceptionally well on training data but poorly on unseen test data.
Signs of Overfitting:
High training accuracy but low test accuracy.
Training loss continues decreasing, but validation loss starts increasing.
The model memorizes training data instead of generalizing patterns.
How to Prevent Overfitting:
Use More Training Data: A larger dataset helps the model learn general patterns instead of noise.
Add Dropout Layers: Randomly disabling neurons during training prevents the model from relying too much on specific patterns.
Early Stopping: Monitor validation loss and stop training when it starts increasing.
Data Augmentation: Create variations of training data by flipping, rotating, or slightly altering images.
Regularization (L1/L2): Penalize overly complex models to avoid over-reliance on certain neurons (see the sketch after this list).
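As a quick illustration of two of these techniques, here is a sketch using the Keras API we rely on throughout this article (the penalty strength and patience values are placeholders, not tuned):
from keras import callbacks, layers, regularizers
# L2 regularization: penalizes large weights in this layer
regularized = layers.Dense(64, activation='relu',
                           kernel_regularizer=regularizers.l2(0.001))
# Early stopping: halt training once validation loss stops improving
early_stop = callbacks.EarlyStopping(monitor='val_loss', patience=3,
                                     restore_best_weights=True)
# Pass the callback to fit(), e.g.:
# model.fit(..., validation_data=(x_val, y_val), callbacks=[early_stop])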
Now, let’s implement a convolutional neural network (CNN), which is specifically designed for processing visual data.
Step 2: Building an Enhanced MNIST Classifier
We previously built a simple MNIST classifier, but now, let’s construct a more sophisticated version using convolutional layers. Convolutional layers help extract features from images, such as edges, textures, and shapes, making them crucial for image classification tasks.
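Before building the full model, it helps to see what a single convolution step actually computes. Here is a hand-rolled NumPy sketch (for intuition only; the patch and filter values are made up) of a 3x3 vertical-edge filter applied at one image position:
import numpy as np
# A 3x3 patch of pixel intensities: bright on the left, dark on the right
patch = np.array([[1.0, 1.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [1.0, 1.0, 0.0]])
# A vertical-edge filter: responds where brightness changes from left to right
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])
# One convolution step: multiply elementwise, then sum
response = np.sum(patch * kernel)
print(response)  # 3.0 -- a strong response, signaling a vertical edge here
A Conv2D layer slides many such filters over the whole image and learns their values during training, instead of using hand-designed ones like this.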
First, we need to load and preprocess the data:
from keras import datasets, layers, models
import matplotlib.pyplot as plt
# Load the dataset
(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()
# Reshape the data to include a single channel for grayscale images
train_images = train_images.reshape((60000, 28, 28, 1))
test_images = test_images.reshape((10000, 28, 28, 1))
# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0
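It is worth sanity-checking the shapes after preprocessing; for MNIST, the output should match the comments below:
print(train_images.shape)  # (60000, 28, 28, 1): 60,000 images, 28x28 pixels, 1 channel
print(test_images.shape)   # (10000, 28, 28, 1)
print(train_images.min(), train_images.max())  # 0.0 1.0 after normalization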
Defining the Model
We now define a CNN that includes multiple convolutional and pooling layers to extract meaningful patterns from images. Each layer plays a distinct role:
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])
Layer Breakdown
Conv2D Layer: This applies 3x3 filters that scan the image, detecting edges and textures.
MaxPooling2D Layer: Reduces the spatial size of the feature maps, keeping the most important information while lowering computation costs.
Flatten Layer: Converts the 2D feature maps into a single 1D vector that the dense layers can process.
Dense Layers: Fully connected layers that interpret the extracted features and classify the image.
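To watch these layers at work, print a summary of the model; the output shapes shrink as the pooling layers downsample the feature maps:
model.summary()
# Output shapes step down roughly like this: 26x26x32 after the first Conv2D,
# 13x13x32 after the first pooling layer, down to a flat vector of
# 576 values (3 * 3 * 64) entering the dense layers.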
Compiling and Training the Model
Now, we compile the model and train it using the Adam optimizer and the sparse categorical cross-entropy loss function (the "sparse" variant is used because our labels are plain integers rather than one-hot vectors):
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Train the model
model.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))
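A useful habit: fit() returns a History object whose .history dict records loss and accuracy per epoch. Capturing it lets us plot the learning curves and spot the overfitting signature described earlier (validation loss climbing while training loss keeps falling). A minimal sketch:
history = model.fit(train_images, train_labels, epochs=5,
                    validation_data=(test_images, test_labels))
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()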
Step 3: Improving Model Performance
Just because we have a working CNN doesn’t mean it’s fully optimized. There are several techniques we can use to improve performance:
Increasing Epochs: Training for more iterations can improve accuracy but might lead to overfitting.
Adding Dropout Layers: Randomly disabling neurons during training prevents over-reliance on certain features.
Using Data Augmentation: Generating variations of training images can make the model more robust.
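For the augmentation idea, Keras ships preprocessing layers that transform images on the fly during training. A sketch (the rotation and shift amounts are illustrative; note that flips are a poor fit for digits, since a mirrored digit is no longer the same digit):
from keras import layers, models
augmentation = models.Sequential([
    layers.RandomRotation(0.05),         # rotate by up to ~5% of a full turn
    layers.RandomTranslation(0.1, 0.1),  # shift up to 10% vertically and horizontally
])
# These layers can sit at the front of the model and are only active during training.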
Let’s add a dropout layer to improve generalization:
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.5),  # Helps prevent overfitting
    layers.Dense(10, activation='softmax')
])
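One thing to keep in mind: redefining model gives us a fresh, untrained network, so it must be compiled and trained again before evaluation, exactly as before:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5,
          validation_data=(test_images, test_labels))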
Step 4: Evaluating the Model
Once training is complete, let’s test the model’s accuracy on unseen data:
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f'Accuracy: {test_acc:.4f}')
Visualizing Predictions
To see how well our model performs, let’s plot some predictions.
import numpy as np
def plot_predictions(model, images, labels):
    predictions = model.predict(images)
    for i in range(5):
        plt.imshow(images[i].reshape(28, 28), cmap='gray')
        plt.title(f'Predicted: {np.argmax(predictions[i])}, Actual: {labels[i]}')
        plt.show()
plot_predictions(model, test_images[:5], test_labels[:5])
Putting It All Together
from typing import Tuple

from keras import datasets, layers, models
import matplotlib.pyplot as plt
import numpy as np


def load_data() -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
    """Load the MNIST dataset and preprocess it."""
    (train_images, train_labels), (test_images, test_labels) = (
        datasets.mnist.load_data()
    )
    train_images = train_images.reshape((-1, 28, 28, 1)) / 255.0
    test_images = test_images.reshape((-1, 28, 28, 1)) / 255.0
    return train_images, train_labels, test_images, test_labels


def build_model() -> models.Model:
    """Define the CNN model."""
    model = models.Sequential(
        [
            layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
            layers.MaxPooling2D((2, 2)),
            layers.Conv2D(64, (3, 3), activation="relu"),
            layers.MaxPooling2D((2, 2)),
            layers.Conv2D(64, (3, 3), activation="relu"),
            layers.Flatten(),
            layers.Dense(64, activation="relu"),
            layers.Dropout(0.5),
            layers.Dense(10, activation="softmax"),
        ]
    )
    return model


def train_model(
    model: models.Model,
    train_images: np.ndarray,
    train_labels: np.ndarray,
    test_images: np.ndarray,
    test_labels: np.ndarray,
    epochs: int = 5,
) -> models.Model:
    """Compile and train the CNN model."""
    model.compile(
        optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"]
    )
    model.fit(
        train_images,
        train_labels,
        epochs=epochs,
        validation_data=(test_images, test_labels),
    )
    return model


def evaluate_model(
    model: models.Model, test_images: np.ndarray, test_labels: np.ndarray
) -> None:
    """Evaluate the trained model and print accuracy."""
    test_loss, test_acc = model.evaluate(test_images, test_labels)
    print(f"Accuracy: {test_acc:.4f}")


def plot_predictions(
    model: models.Model, images: np.ndarray, labels: np.ndarray, num_samples: int = 5
) -> None:
    """Plot predictions for a given number of test samples."""
    predictions = model.predict(images[:num_samples])
    for i in range(num_samples):
        plt.imshow(images[i].reshape(28, 28), cmap="gray")
        plt.title(f"Predicted: {np.argmax(predictions[i])}, Actual: {labels[i]}")
        plt.axis("off")
        plt.show()


def main():
    """Run the full training and evaluation pipeline."""
    train_images, train_labels, test_images, test_labels = load_data()
    model = build_model()
    model = train_model(model, train_images, train_labels, test_images, test_labels)
    evaluate_model(model, test_images, test_labels)
    plot_predictions(model, test_images, test_labels)


if __name__ == "__main__":
    main()
Welcome to the Next Level!
You’ve just built a Convolutional Neural Network that can recognize handwritten digits. This marks a major step forward from simple neural networks. Now, you can experiment further by:
Tuning hyperparameters such as the number of filters, dropout rates, and learning rates (a small sketch follows this list).
Exploring different architectures like deeper CNNs or adding residual connections.
Training the model on a custom dataset, such as your own handwritten numbers.
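As a starting point for the first suggestion, here is one way to go about it: a hypothetical variant of build_model with a tunable dropout rate, compared across a couple of values using the pipeline above (the rates tried are arbitrary examples):
def build_model(dropout_rate: float = 0.5) -> models.Model:
    """Variant of build_model with a tunable dropout rate (illustrative)."""
    return models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(10, activation="softmax"),
    ])

for rate in (0.3, 0.5):
    model = train_model(build_model(rate), train_images, train_labels,
                        test_images, test_labels)
    evaluate_model(model, test_images, test_labels)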
Machine learning is a continuous learning process. Keep experimenting and refining your models. Happy coding!