Advanced AI for Healthcare

Introduction to Deep Learning I

Dr. Greg Chism

U of A InfoSci + DataLab

Objective

Provide a theoretical foundation for deep learning, focusing on the core concepts of neural networks, activation functions, and training strategies.

(Almost) no code today 🙂

Overview of topics

  1. AI vs. Machine Learning vs. Deep Learning
  2. Intro to Deep Learning
    1. Structure
    2. Backpropagation
    3. Architectures
    4. Challenges
  3. Summary

Overview of Artificial Intelligence

You mean ChatGPT?

Artificial Intelligence

First, machine learning

Why machine learning in health sciences?

Machine learning has played a very important role in solving problems in:

  • Medical imaging and diagnostics

  • Health economics and predictive healthcare

  • Biomedical research and drug discovery

  • Medical devices and robotics

  • Clinical natural language processing

What is machine learning?

Use machine learning for complex tasks involving big data and many variables, where the underlying formula or equation is unknown.

How does machine learning work?

  • Supervised learning trains a model on labeled data to predict outputs.

  • Unsupervised learning finds hidden patterns in unlabeled data.

Now, deep learning

Why deep learning in health sciences?

  • DL achieves diagnostic accuracy that can rival or exceed human performance on some tasks.

  • It’s used in critical healthcare applications like disease detection.

  • Developed in the 1980s, it wasn’t widely adopted due to limited labeled medical data and computing power.

  • Today, vast labeled datasets, like millions of medical images, enable high-accuracy training.

  • High-performance GPUs and cloud computing now power deep learning efficiently in healthcare.

What is deep learning?

Use deep learning for highly complex tasks with vast amounts of data and intricate patterns when traditional methods struggle to define the relationships.

Machine Learning vs. Deep Learning

  • DL is a specific type of machine learning.

  • In ML, features are manually extracted from images to develop a classification model.

  • In DL, relevant features are automatically extracted from images.

  • DL uses “end-to-end learning,” where a neural network learns directly from raw data to perform classification.

  • DL algorithms improve with more data, while traditional ML models plateau.

Intro to Deep Learning (DL)

Finally… 😤

Still (almost) no code…

Artificial Neural Networks

A common interpretation of the biological neuron, and the heart of DL.

Simple Neural Networks

Deep Learning Networks

DL Components

  • Neurons

  • Layers

    • Input

    • Hidden

    • Output

  • Activation Functions

  • Weights

  • Biases

DL Components: Neurons

  • The building blocks of neural networks.

  • Each neuron performs a weighted sum of inputs, followed by an activation function.
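For instance, with illustrative values \(x_1 = 0.5\), \(x_2 = -1.0\), weights \(w_1 = 0.8\), \(w_2 = 0.3\), and bias \(b = 0.1\), a neuron using ReLU (defined below) computes:

\[ z = w_1 x_1 + w_2 x_2 + b = 0.8(0.5) + 0.3(-1.0) + 0.1 = 0.2, \qquad a = \max(0, z) = 0.2 \]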

DL Components: Layers

  • Input layer: data enters here

  • Hidden layers: data is processed here

  • Output layer: result / prediction

DL Components: Activation functions

ReLU (Rectified Linear Unit)

  • Formula: \(f(x) = \max(0, x)\)

  • Characteristics:

    • Simple and fast to compute.

    • Keeps positive values, helping the network learn better.

    • Widely used in hidden layers of neural networks.

  • Use Case: Great for deep networks, especially in image recognition.

Sigmoid

  • Formula: \(f(x)=\frac{1}{1+e^{-x}}\)

  • Characteristics:

    • Produces values between 0 and 1, ideal for binary decisions.

    • Can slow down learning for extreme inputs (very high or low values).

  • Use Case: Often used in the final layer for binary classification tasks.

Softmax

  • Formula: \(f(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}\)

  • Characteristics:

    • Turns outputs into probabilities that sum to 1.

    • Used for multi-class classification problems.

  • Use Case: Essential when there are multiple possible outcomes, like in image classification with many categories.
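To make the three formulas concrete, here is a minimal NumPy sketch (illustrative only; the input values are arbitrary):

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x), applied element-wise
    return np.maximum(0, x)

def sigmoid(x):
    # f(x) = 1 / (1 + e^{-x}); squashes values into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Subtract the max for numerical stability; outputs sum to 1
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))     # [0. 0. 3.]
print(sigmoid(x))  # values strictly between 0 and 1
print(softmax(x))  # probabilities that sum to 1
```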

Activation functions: so what?

Importance in Neural Networks:

  • Non-Linearity: without activation functions, stacked layers would collapse into a single linear transformation; non-linearity lets networks tackle complex tasks like image and speech recognition.

  • Layer-wise Learning: They allow each layer to learn different features of the data.

  • Training Dynamics: The activation function impacts the network’s learning efficiency and speed.

In Medical Applications:

  • Key towards DL accurately analyzing and predicting medical data, such as diagnosing diseases or forecasting patient outcomes, by capturing complex biological relationships.

DL Components: Weights

  • Weights transform input data in the network’s hidden layers.

  • Each weight indicates the connection strength between nodes.

  • Weights start as small random values and are updated during training (see stochastic gradient descent).

DL Components: Biases

  • Biases are extra parameters that shift activation functions.

  • They allow neurons to activate even with zero input.

  • Biases help the model fit data more effectively.
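Putting neurons, weights, and biases together: a minimal sketch of a single neuron (illustrative values, with ReLU as the activation):

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.array([0.5, -1.0, 2.0])   # inputs: one sample with three features
w = rng.normal(0, 0.1, size=3)   # weights: small random initial values
b = 0.0                          # bias: shifts the activation threshold

z = w @ x + b                    # weighted sum of inputs plus bias
a = np.maximum(0, z)             # ReLU activation
print(z, a)
```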

Batches & Epochs

Training progression

Samples vs. Batches vs. Epochs

\(\text{batches per epoch} = \frac{\text{dataset size}}{\text{batch size}}\)

  • A sample is one row of data.

  • Batch size is the number of samples processed before the model is updated.

  • The number of epochs is the number of complete passes through the training dataset.
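A skeletal training loop makes the sample/batch/epoch relationship explicit (illustrative; update_model is a hypothetical placeholder):

```python
import numpy as np

dataset = np.arange(1000)   # 1,000 samples (one row each)
batch_size = 100
epochs = 3

batches_per_epoch = len(dataset) // batch_size   # 1000 / 100 = 10

for epoch in range(epochs):             # one epoch = one full pass
    np.random.shuffle(dataset)          # reshuffle between epochs
    for i in range(batches_per_epoch):  # one model update per batch
        batch = dataset[i * batch_size:(i + 1) * batch_size]
        # update_model(batch)           # hypothetical: weights change here
```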

Training progression

This was one pass through the data.

Bad output

Let’s say your model output is off target:

First, how can we tell?

Cost function

A cost function is a mathematical function that calculates the difference between the actual target values (ground truth) and the values predicted by the model.

  • Functionally similar to a model evaluation metric, but used during training to guide learning

  • e.g., binary cross-entropy for logistic regression (labels 0 or 1):

\[ J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]\]

Yikes… 😣
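Written out as code, it is less scary. A minimal NumPy sketch of this binary cross-entropy cost (illustrative labels and predictions):

```python
import numpy as np

def cost(y, y_hat, eps=1e-12):
    # J = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) )
    y_hat = np.clip(y_hat, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y     = np.array([1, 0, 1, 1])           # ground truth
y_hat = np.array([0.9, 0.2, 0.6, 0.3])   # model predictions
print(cost(y, y_hat))   # lower is better; perfect predictions give ~0
```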

Bad output

Let’s say your model output is off target:

What do you do now?

Intro to Backpropagation

Backpropagation

Backpropagation: What is it?

  • A method for updating model weights based on output errors.

  • Helps the network learn by correcting mistakes.

  • Iteratively reduces the cost function after each epoch

Backpropagation: Optimization

Optimization refers to the task of minimizing/maximizing an objective function \(f(x)\) parameterized by \(x\).

Goals:

  • Find the global minimum of the objective function, possible if it’s convex, meaning any local minimum is also the global minimum.

  • Find the lowest value in the local area of the objective function, typical when the function isn’t convex, as in most deep learning problems.

Example in Health Sciences: Training models to accurately diagnose diseases from medical images by iteratively improving predictions.

Backpropagation: How?

Gradient Descent

  • Minimizes a function by moving towards the steepest descent.

  • It updates parameters using the gradient of the cost function.

  • The learning rate controls step size; balance is key.

  • Continues until reaching a minimum.
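A minimal sketch of gradient descent on a toy convex objective \(f(x) = (x - 3)^2\) (illustrative; chosen so the true minimum, \(x = 3\), is known):

```python
def f(x):        # toy convex objective; minimum at x = 3
    return (x - 3) ** 2

def grad(x):     # analytic gradient: df/dx = 2(x - 3)
    return 2 * (x - 3)

x = 0.0          # initial parameter value
lr = 0.1         # learning rate: controls the step size
for step in range(50):
    x -= lr * grad(x)   # step against the gradient (steepest descent)

print(x, f(x))   # x approaches 3, f(x) approaches 0
```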

Neural Network Architectures

Feedforward

  • Simple, one-way flow of information.

  • Commonly used for tasks like classification and regression.

Convolutional Neural Networks (CNNs)

  • Designed for image processing.

  • Extracts hierarchical features from images.

  • Health Sciences Application: Identifying tumors in radiology images.

Recurrent Neural Networks (RNNs)

  • Handles sequence data, like time series or language.

  • Captures dependencies over time.

  • Health Sciences Application: Predicting patient outcomes over time.

When to use each architecture

Feedforward Networks:

  • Best for tabular data and basic classification tasks.

CNNs:

  • Ideal for image and video analysis.

  • Health Sciences Example: Detecting skin cancer from images.

RNNs:

  • Suited for time-dependent data, such as patient monitoring.

  • Health Sciences Example: Forecasting disease progression.
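For reference, minimal sketches of all three architectures, assuming TensorFlow/Keras is available (the input shapes are illustrative, not from the slides):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Feedforward: tabular data -> binary classification
ffn = keras.Sequential([
    layers.Input(shape=(20,)),              # 20 tabular features
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# CNN: e.g., 64x64 grayscale scans
cnn = keras.Sequential([
    layers.Input(shape=(64, 64, 1)),
    layers.Conv2D(16, 3, activation="relu"),  # learn local image features
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),
])

# RNN: e.g., 48 time steps of 8 vital signs
rnn = keras.Sequential([
    layers.Input(shape=(48, 8)),
    layers.SimpleRNN(32),                     # carries state across time
    layers.Dense(1, activation="sigmoid"),
])
```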

Challenges

Overfitting:

  • When a model performs well on training data but fails to generalize to new data.

  • Solution: Regularization techniques like dropout and data augmentation.

Vanishing Gradients:

  • Gradients diminish as they are propagated back, slowing learning.

  • Solution: Activation functions like ReLU and proper weight initialization.
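A quick NumPy illustration of why sigmoid gradients vanish: the derivative \(\sigma'(x) = \sigma(x)(1 - \sigma(x))\) never exceeds 0.25, so gradients shrink geometrically across many sigmoid layers:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dsigmoid(x):
    s = sigmoid(x)
    return s * (1 - s)    # peaks at 0.25 when x = 0

print(dsigmoid(0.0))   # 0.25: the best case
print(dsigmoid(5.0))   # ~0.0066: a saturated neuron barely learns
print(0.25 ** 10)      # ~9.5e-07: upper bound after 10 sigmoid layers
```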

Regularization Techniques:

  • Dropout: Randomly deactivating neurons during training to prevent overfitting.

  • L2 Regularization: Penalizes large weights to simplify the model.
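Minimal sketches of both techniques (illustrative; keep_prob and lam are hypothetical settings):

```python
import numpy as np

rng = np.random.default_rng(0)

# Dropout (inverted): randomly zero activations during training
a = rng.normal(size=8)             # activations of a hidden layer
keep_prob = 0.8                    # keep ~80% of neurons each pass
mask = rng.random(8) < keep_prob
a_dropped = a * mask / keep_prob   # rescale so the expected value is unchanged

# L2 regularization: penalize large weights in the loss
w = rng.normal(size=8)
lam = 0.01                         # regularization strength
data_loss = 0.5                    # placeholder data-fit loss
total_loss = data_loss + lam * np.sum(w ** 2)
```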

Summary

  • Deep learning is revolutionizing diagnostics and personalized treatment.

  • Success depends on understanding neural networks and their architectures, and on overcoming training challenges.

Thank You 😊