Introduction to Deep Learning I
U of A InfoSci + DataLab
Provide a theoretical foundation for deep learning, focusing on the core concepts of neural networks, activation functions, and training strategies.
No code today 🙂
Machine learning has played a very important role in solving problems in:
Medical imaging and diagnostics
Health economics and predictive healthcare
Biomedical research and drug discovery
Medical devices and robotics
Clinical natural language processing
Use machine learning for complex tasks with big data and many variables when the underlying formula or equation is unknown
Supervised learning trains a model on labeled data to predict outputs.
Unsupervised learning finds hidden patterns in unlabeled data.
Deep learning (DL) achieves higher diagnostic accuracy than was previously possible.
It’s used in critical healthcare applications like disease detection.
Although developed in the 1980s, deep learning wasn’t widely adopted at first because labeled medical data and computing power were limited.
Today, vast labeled datasets, like millions of medical images, enable high-accuracy training.
High-performance GPUs and cloud computing now power deep learning efficiently in healthcare.
Use deep learning for highly complex tasks with vast amounts of data and intricate patterns when traditional methods struggle to define the relationships.
DL is a specific type of machine learning.
In ML, features are manually extracted from images to develop a classification model.
In DL, relevant features are automatically extracted from images.
DL uses “end-to-end learning,” where a neural network learns directly from raw data to perform classification.
DL algorithms improve with more data, while traditional ML models plateau.
Finally… 😤
Still no code…
The neural network, in its common interpretation, is at the heart of DL:
Neurons
Layers
Input
Hidden
Output
Activation Functions
Weights
Biases
The building blocks of neural networks.
Each neuron performs a weighted sum of its inputs, followed by an activation function (sketched below).
Data enters here \(\rightarrow\)
Processed here \(\rightarrow\)
Result / prediction
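As a minimal sketch, a single neuron with hypothetical inputs \(x_1, x_2\), weights \(w_1, w_2\), and bias \(b\) computes:
\[ a = f(w_1 x_1 + w_2 x_2 + b) \]
where \(f\) is the activation function (covered on the next slides).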
Formula: \(f(x) = \max(0, x)\)
Characteristics:
Simple and fast to compute.
Passes positive values through unchanged and zeroes out negatives, helping the network learn efficiently.
Widely used in hidden layers of neural networks.
Use Case: Great for deep networks, especially in image recognition.
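A quick worked example with made-up inputs:
\[ f(-2) = \max(0, -2) = 0, \qquad f(3.5) = \max(0, 3.5) = 3.5 \]
Negative values are zeroed out; positive values pass through unchanged.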
Formula: \(f(x)=\frac{1}{1+e^{-x}}\)
Characteristics:
Produces values between 0 and 1, ideal for binary decisions.
Saturates for extreme inputs (very high or low values), which can slow down learning.
Use Case: Often used in the final layer for binary classification tasks.
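For illustration, a few example inputs:
\[ f(0) = 0.5, \qquad f(4) = \frac{1}{1+e^{-4}} \approx 0.98, \qquad f(-4) \approx 0.02 \]
Outputs near 0 or 1 sit on the flat ends of the curve, which is where learning slows.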
Formula: \(f(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}\)
Characteristics:
Turns outputs into probabilities that sum to 1.
Used for multi-class classification problems.
Use Case: Essential when there are multiple possible outcomes, like in image classification with many categories.
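A small worked example, assuming three made-up class scores \((2, 1, 0)\):
\[ \left( \frac{e^{2}}{e^{2}+e^{1}+e^{0}},\; \frac{e^{1}}{e^{2}+e^{1}+e^{0}},\; \frac{e^{0}}{e^{2}+e^{1}+e^{0}} \right) \approx (0.67,\; 0.24,\; 0.09) \]
The three outputs are probabilities that sum to 1, so the first class would be the predicted category.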
Importance in Neural Networks:
Non-Linearity: Activation functions enable networks to tackle complex tasks, like image and speech recognition.
Layer-wise Learning: They allow each layer to learn different features of the data.
Training Dynamics: The activation function impacts the network’s learning efficiency and speed.
In Medical Applications: e.g., sigmoid in the output layer for binary disease detection, softmax for multi-class image diagnosis.
Weights transform input data in the network’s hidden layers.
Each weight indicates the connection strength between nodes.
Weights are initialized to small random values (see Stochastic Gradient Descent).
Biases are extra parameters that shift activation functions.
They allow neurons to activate even with zero input (see the example below).
Biases help the model fit data more effectively.
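As a minimal illustration with a made-up weight \(w = 0.5\) and bias \(b = 0.2\), a neuron can still produce output when its input is zero:
\[ z = w \cdot 0 + b = 0.2, \qquad \text{ReLU}(0.2) = 0.2 \]
Without the bias, \(z\) would be 0 and the ReLU output would also be 0.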
\(\text{batches per epoch} = \frac{\text{dataset size}}{\text{batch size}}\)
A sample is 1 row of data
Batch size is the number of samples processed before updating the model
The number of epochs is the number of complete passes through the training dataset.
This was one pass through the data
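A quick arithmetic example with made-up numbers: a dataset of 1,000 samples and a batch size of 100 gives
\[ \text{batches per epoch} = \frac{1000}{100} = 10 \]
so the model’s weights are updated 10 times per epoch; training for 5 epochs would mean 50 updates.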
Let’s say your model output is off target:
First, how can we tell?
A cost function is a mathematical function that calculates the difference between the actual target values (ground truth) and the values predicted by the model.
Functionally similar to model evaluation
e.g., the cross-entropy cost for logistic regression (labels 0 or 1):
\[ J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]\]
Yikes… 😣
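It looks worse than it is. A worked example for a single sample (\(m = 1\)) with made-up values, true label \(y = 1\) and predicted probability \(h_\theta(x) = 0.9\):
\[ J(\theta) = -\left[ 1 \cdot \log(0.9) + 0 \cdot \log(0.1) \right] \approx 0.105 \]
A confident, correct prediction gives a small cost; had the model predicted 0.1 instead, the cost would jump to \(-\log(0.1) \approx 2.3\).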
Let’s say your model output is off target:
What do you do now?
A method for updating model weights based on output errors.
Helps the network learn by correcting mistakes.
Iteratively reduces the cost function after each epoch.
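As a sketch of the idea (not a full derivation), each weight is nudged against its gradient, with the chain rule carrying the error backward from the output layer:
\[ w \leftarrow w - \eta \, \frac{\partial J}{\partial w} \]
where \(\eta\) is the learning rate and \(J\) is the cost function.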
Optimization refers to the task of minimizing/maximizing an objective function \(f(x)\) parameterized by \(x\).
Goals:
Find the global minimum of the objective function, possible if it’s convex, meaning any local minimum is also the global minimum.
Find the lowest value in the local area of the objective function, typical when the function isn’t convex, as in most deep learning problems.
Example in Health Sciences: Training models to accurately diagnose diseases from medical images by iteratively improving predictions.
Gradient Descent
Minimizes a function by moving in the direction of steepest descent.
It updates parameters using the gradient of the cost function.
The learning rate controls step size; balance is key.
Continues until reaching a minimum.
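A one-step numeric illustration with made-up values: suppose the current parameter is \(\theta = 2.0\), the gradient of the cost at that point is \(\nabla J(\theta) = 0.8\), and the learning rate is \(\alpha = 0.1\):
\[ \theta_{\text{new}} = \theta - \alpha \, \nabla J(\theta) = 2.0 - 0.1 \times 0.8 = 1.92 \]
Repeating this update moves \(\theta\) step by step toward a minimum; a larger \(\alpha\) takes bigger steps but risks overshooting.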
Feedforward neural networks: simple, one-way flow of information.
Commonly used for tasks like classification and regression.
Convolutional neural networks (CNNs): designed for image processing.
Extracts hierarchical features from images.
Health Sciences Application: Identifying tumors in radiology images.
Recurrent neural networks (RNNs): handle sequence data, like time series or language.
Captures dependencies over time.
Health Sciences Application: Predicting patient outcomes over time.
Feedforward Networks:
CNNs:
Ideal for image and video analysis.
Health Sciences Example: Detecting skin cancer from images.
RNNs:
Suited for time-dependent data, such as patient monitoring.
Health Sciences Example: Forecasting disease progression.
Overfitting:
When a model performs well on training data but fails to generalize to new data.
Solution: Regularization techniques like dropout and data augmentation.
Vanishing Gradients:
Gradients diminish as they are propagated back, slowing learning.
Solution: Activation functions like ReLU and proper weight initialization.
Regularization Techniques:
Dropout: Randomly deactivating neurons during training to prevent overfitting.
L2 Regularization: Penalizes large weights to simplify the model (sketched below).
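As a sketch, one common form of L2 regularization adds a penalty term to the cost function from earlier, where \(\lambda\) is a hyperparameter controlling the penalty strength:
\[ J_{\text{reg}}(\theta) = J(\theta) + \frac{\lambda}{2m} \sum_{j} \theta_j^2 \]
Large weights increase the cost, so the optimizer is pushed toward simpler models.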
DL is revolutionizing diagnostics and personalized treatments.
Success depends on understanding neural networks, architectures, and overcoming challenges.