An introduction to deep learning (with applications to text classification)

Lecture 15

Dr. Benjamin Soltoff

Cornell University
INFO 4940/5940 - Fall 2024

October 22, 2024

Announcements

Project teams

Learning objectives

Distinguish the major workflow differences between shallow and deep learning.
Understand the basic architecture of a dense neural network.
Implement deep learning models using Keras and the {keras3} package.
Estimate a series of dense neural networks for text classification.
Incorporate GloVe word embeddings into a deep learning model.

Recap: What is machine learning?

What is machine learning?

Forms of machine learning

Deep learning applications

Image recognition
Natural language processing
Time series forecasting
Text to image
Video/audio synthesis

Benefits and drawbacks of deep learning

Benefits

Learns complex relationships between features
Requires minimal feature engineering
Scales well to large datasets

Drawbacks

Requires lots of data
More susceptible to overfitting
Extremely computationally expensive
Data sourcing often leads to bias

Infrastructure for training deep learning models

TensorFlow

Open-source machine learning library developed by Google¹
Performs low-level mathematical operations on numerical tensors
Runs on CPUs, GPUs, and TPUs
Widely utilized for deep learning development (but not the only option)

Keras

Deep learning API
High-level interface for defining and training any kind of deep learning model
Intended for high-level modeling tasks

Relationship between Keras and TensorFlow

flowchart TD
    A[Keras] --> B[TensorFlow]
    A --> C[PyTorch]
    A --> D[JAX]

Setting up a deep learning workspace

🆓 Use free GPU runtime from Kaggle, Google Colab, or others

🔐 Access research computing clusters through Cornell

💵 Use GPU instances on Google Cloud or Amazon EC2

💰 Buy and install an NVIDIA GPU on a desktop computer

A tutorial to set up an R deep learning platform on AWS

A simple application using the fashion MNIST dataset

Multilayer perceptron (MLP)

Fashion MNIST

library(keras3)

# load Fashion MNIST: 60,000 training images and 10,000 test images
mnist <- dataset_fashion_mnist()
x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y
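Each label in `y_train` and `y_test` is an integer from 0 to 9 identifying a clothing category. A small lookup sketch, assuming the standard Fashion-MNIST label ordering; `fashion_classes` and `label_to_name()` are hypothetical names, not {keras3} objects:

```r
# Standard Fashion-MNIST category names, ordered by integer label 0-9
# (fashion_classes is a hypothetical helper vector, not part of keras3)
fashion_classes <- c(
  "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
  "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"
)

# Labels are 0-indexed but R vectors are 1-indexed, so shift by one
label_to_name <- function(y) fashion_classes[y + 1]

label_to_name(c(9, 0, 0, 3, 0, 2))
```

Applied to the first six training labels (9, 0, 0, 3, 0, 2), it returns Ankle boot, T-shirt/top, T-shirt/top, Dress, T-shirt/top, Pullover.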

`x_train` structure

Stored as a 3D array with dimensions (60000, 28, 28): 60,000 images, each 28 × 28 pixels, with grayscale values between 0 and 255.

class(x_train)

[1] "array"

# first observation
x_train[1, , ]

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
 [1,]    0    0    0    0    0    0    0    0    0     0     0     0     0
 [2,]    0    0    0    0    0    0    0    0    0     0     0     0     0
 [3,]    0    0    0    0    0    0    0    0    0     0     0     0     0
 [4,]    0    0    0    0    0    0    0    0    0     0     0     0     1
 [5,]    0    0    0    0    0    0    0    0    0     0     0     0     3
 [6,]    0    0    0    0    0    0    0    0    0     0     0     0     6
 [7,]    0    0    0    0    0    0    0    0    0     0     0     0     0
 [8,]    0    0    0    0    0    0    0    0    0     0     0     1     0
 [9,]    0    0    0    0    0    0    0    0    0     1     1     1     0
[10,]    0    0    0    0    0    0    0    0    0     0     0     0     0
[11,]    0    0    0    0    0    0    0    0    0     0     0     0     0
[12,]    0    0    0    0    0    0    0    0    0     1     3     0    12
[13,]    0    0    0    0    0    0    0    0    0     0     6     0    99
[14,]    0    0    0    0    0    0    0    0    0     4     0     0    55
[15,]    0    0    1    4    6    7    2    0    0     0     0     0   237
[16,]    0    3    0    0    0    0    0    0    0    62   145   204   228
[17,]    0    0    0    0   18   44   82  107  189   228   220   222   217
[18,]    0   57  187  208  224  221  224  208  204   214   208   209   200
[19,]    3  202  228  224  221  211  211  214  205   205   205   220   240
[20,]   98  233  198  210  222  229  229  234  249   220   194   215   217
[21,]   75  204  212  204  193  205  211  225  216   185   197   206   198
[22,]   48  203  183  194  213  197  185  190  194   192   202   214   219
[23,]    0  122  219  193  179  171  183  196  204   210   213   207   211
[24,]    0    0   74  189  212  191  175  172  175   181   185   188   189
[25,]    2    0    0    0   66  200  222  237  239   242   246   243   244
[26,]    0    0    0    0    0    0    0   40   61    44    72    41    35
[27,]    0    0    0    0    0    0    0    0    0     0     0     0     0
[28,]    0    0    0    0    0    0    0    0    0     0     0     0     0
      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
 [1,]     0     0     0     0     0     0     0     0     0     0     0     0
 [2,]     0     0     0     0     0     0     0     0     0     0     0     0
 [3,]     0     0     0     0     0     0     0     0     0     0     0     0
 [4,]     0     0    13    73     0     0     1     4     0     0     0     0
 [5,]     0    36   136   127    62    54     0     0     0     1     3     4
 [6,]     0   102   204   176   134   144   123    23     0     0     0     0
 [7,]     0   155   236   207   178   107   156   161   109    64    23    77
 [8,]    69   207   223   218   216   216   163   127   121   122   146   141
 [9,]   200   232   232   233   229   223   223   215   213   164   127   123
[10,]   183   225   216   223   228   235   227   224   222   224   221   223
[11,]   193   228   218   213   198   180   212   210   211   213   223   220
[12,]   219   220   212   218   192   169   227   208   218   224   212   226
[13,]   244   222   220   218   203   198   221   215   213   222   220   245
[14,]   236   228   230   228   240   232   213   218   223   234   217   217
[15,]   226   217   223   222   219   222   221   216   223   229   215   218
[16,]   207   213   221   218   208   211   218   224   223   219   215   224
[17,]   226   200   205   211   230   224   234   176   188   250   248   233
[18,]   159   245   193   206   223   255   255   221   234   221   211   220
[19,]    80   150   255   229   221   188   154   191   210   204   209   222
[20,]   241    65    73   106   117   168   219   221   215   217   223   223
[21,]   213   240   195   227   245   239   223   218   212   209   222   220
[22,]   221   220   236   225   216   199   206   186   181   177   172   181
[23,]   210   200   196   194   191   195   191   198   192   176   156   167
[24,]   188   193   198   204   209   210   210   211   188   188   194   192
[25,]   221   220   193   191   179   182   182   181   176   166   168    99
[26,]     0     0     0     0     0     0     0     0     0     0     0     0
[27,]     0     0     0     0     0     0     0     0     0     0     0     0
[28,]     0     0     0     0     0     0     0     0     0     0     0     0
      [,26] [,27] [,28]
 [1,]     0     0     0
 [2,]     0     0     0
 [3,]     0     0     0
 [4,]     1     1     0
 [5,]     0     0     3
 [6,]    12    10     0
 [7,]   130    72    15
 [8,]    88   172    66
 [9,]   196   229     0
[10,]   245   173     0
[11,]   243   202     0
[12,]   197   209    52
[13,]   119   167    56
[14,]   209    92     0
[15,]   255    77     0
[16,]   244   159     0
[17,]   238   215     0
[18,]   232   246     0
[19,]   228   225     0
[20,]   224   229    29
[21,]   221   230    67
[22,]   205   206   115
[23,]   177   210    92
[24,]   216   170     0
[25,]    58     0     0
[26,]     0     0     0
[27,]     0     0     0
[28,]     0     0     0

Reshape features

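# reshape: flatten each 28 x 28 image into a vector of 784 (= 28 * 28) values
x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test <- array_reshape(x_test, c(nrow(x_test), 784))

# rescale: convert pixel values from [0, 255] to [0, 1]
x_train <- x_train / 255
x_test <- x_test / 255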

dim(x_train)

[1] 60000   784
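`array_reshape()` flattens each image using row-major (C-style) ordering by default, which differs from R's column-major `dim<-`. A base R sketch on a toy array of two 3 × 3 "images" reproduces that ordering; `flatten_row_major()` is a hypothetical helper for illustration, not a {keras3} function:

```r
# Toy dataset: 2 "images", each 3 x 3 pixels, shaped (samples, rows, cols)
x <- array(1:18, dim = c(2, 3, 3))

# Hypothetical helper: flatten each image row by row (row-major / C order),
# mimicking array_reshape(x, c(nrow(x), 9))
flatten_row_major <- function(x) {
  # apply() passes each x[i, , ] as a rows x cols matrix;
  # as.vector(t(img)) then reads that matrix row by row
  flat <- apply(x, 1, function(img) as.vector(t(img)))
  t(flat)  # apply() returns one column per image, so transpose back
}

x_flat <- flatten_row_major(x)
dim(x_flat)  # 2 rows (images) x 9 columns (pixels)

# Rescale the same way the slide does (divide by the maximum pixel value)
x_scaled <- x_flat / 255
```

The first three columns of each flattened row are that image's first pixel row, which is the ordering a dense layer sees after `array_reshape()`.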

Reshape outcome

# original integer structure
head(y_train)

[1] 9 0 0 3 0 2

# convert integer labels to one-hot (binary class) matrices
y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)

head(y_train)

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    0    0    0    0    0    0    0    0    0     1
[2,]    1    0    0    0    0    0    0    0    0     0
[3,]    1    0    0    0    0    0    0    0    0     0
[4,]    0    0    0    1    0    0    0    0    0     0
[5,]    1    0    0    0    0    0    0    0    0     0
[6,]    0    0    1    0    0    0    0    0    0     0
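`to_categorical()` places a single 1 in the column matching each label. A base R sketch of the same one-hot encoding; `one_hot()` is a hypothetical helper, not the {keras3} implementation:

```r
# Hypothetical base R equivalent of to_categorical(y, num_classes)
one_hot <- function(y, num_classes) {
  m <- matrix(0, nrow = length(y), ncol = num_classes)
  # labels run 0..(num_classes - 1); R columns run 1..num_classes
  m[cbind(seq_along(y), y + 1)] <- 1
  m
}

y <- c(9, 0, 0, 3, 0, 2)  # first six training labels from head(y_train)
y_onehot <- one_hot(y, 10)

# Recover the integer labels: column holding the single 1, shifted back to 0-based
max.col(y_onehot) - 1
```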

Define an MLP

model <- keras_model_sequential(input_shape = c(784)) |>
  layer_dense(units = 128, activation = "relu") |>
  layer_dense(units = 10, activation = "softmax")

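Fully connected (dense) neural network: every unit in a layer receives input from every unit in the previous layer
Activation functions: non-linear transformations applied to each layer's output
Softmax function: converts the output layer's values into class probabilities that sum to 1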
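The model above maps each 784-value input to 128 ReLU units and then to 10 softmax outputs. A minimal base R sketch of that forward pass, using small random weights in place of trained ones; the names `W1`, `b1`, `W2`, `b2` are hypothetical, and the normal initialization here is not Keras's default Glorot uniform:

```r
set.seed(1)

relu <- function(z) pmax(z, 0)

# Row-wise softmax; subtracting each row's max keeps exp() from overflowing
softmax <- function(z) {
  e <- exp(z - apply(z, 1, max))
  e / rowSums(e)
}

n <- 5                                 # a mini-batch of 5 flattened images
x <- matrix(runif(n * 784), nrow = n)  # stand-in pixels already scaled to [0, 1]

# Dense layer 1: 784 inputs -> 128 units, ReLU activation
W1 <- matrix(rnorm(784 * 128, sd = 0.05), nrow = 784)
b1 <- rep(0, 128)
h <- relu(sweep(x %*% W1, 2, b1, "+"))

# Dense layer 2: 128 inputs -> 10 units, softmax activation
W2 <- matrix(rnorm(128 * 10, sd = 0.05), nrow = 128)
b2 <- rep(0, 10)
probs <- softmax(sweep(h %*% W2, 2, b2, "+"))

dim(probs)      # 5 rows (images) x 10 columns (classes)
rowSums(probs)  # each row sums to 1, up to floating-point error
```

Each row of `probs` plays the role of one prediction from the Keras model; training adjusts the weights so the largest probability lands on the true class.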

MLP

summary(model)

Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                      ┃ Output Shape             ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense (Dense)                     │ (None, 128)              │       100,480 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ dense_1 (Dense)                   │ (None, 10)               │         1,290 │
└───────────────────────────────────┴──────────────────────────┴───────────────┘
 Total params: 101,770 (397.54 KB)
 Trainable params: 101,770 (397.54 KB)
 Non-trainable params: 0 (0.00 B)
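The Param # column follows from the dense-layer formula: each unit has one weight per input plus one bias, so a layer with p inputs and u units has p × u + u parameters. A quick check in R (`dense_params()` is a hypothetical helper), assuming 4-byte float32 parameters for the memory figure:

```r
# Parameters in a dense layer: one weight per (input, unit) pair plus one bias per unit
dense_params <- function(inputs, units) inputs * units + units

layer1 <- dense_params(784, 128)  # 784 * 128 + 128 = 100,480
layer2 <- dense_params(128, 10)   # 128 * 10 + 10  = 1,290
total <- layer1 + layer2          # 101,770

# Memory at 4 bytes per float32 parameter, in KB (1 KB = 1024 bytes)
round(total * 4 / 1024, 2)        # 397.54
```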

ReLU activation function
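ReLU returns its input when positive and zero otherwise, f(z) = max(0, z). A base R sketch of the function and its derivative, which is 1 for positive inputs and 0 for negative ones; the derivative is undefined at exactly 0, and this sketch uses 0 there as a convention:

```r
relu <- function(z) pmax(z, 0)

# Derivative used during backpropagation (set to 0 at z = 0 by convention)
relu_grad <- function(z) as.numeric(z > 0)

z <- c(-2, -0.5, 0, 0.5, 2)
relu(z)       # 0, 0, 0, 0.5, 2
relu_grad(z)  # 0, 0, 0, 1, 1
```

Because the gradient is a constant 1 for every active unit, ReLU avoids the shrinking gradients that saturating functions such as the sigmoid produce for large inputs.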

Compile the model

model |> compile(
  loss = "categorical_crossentropy",
  optimizer = "rmsprop",
  metrics = c("accuracy", "auc")
)
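With one-hot targets, categorical cross-entropy is the average negative log of the probability the model assigns to each observation's true class. A base R sketch of the quantity being minimized; the clipping constant `eps` is an assumption added to avoid log(0), not necessarily what Keras does internally:

```r
# Mean categorical cross-entropy for one-hot targets and predicted probabilities
categorical_crossentropy <- function(y_true, y_pred, eps = 1e-7) {
  y_pred <- pmin(pmax(y_pred, eps), 1 - eps)  # clip to avoid log(0)
  mean(-rowSums(y_true * log(y_pred)))
}

y_true <- rbind(c(0, 1, 0),
                c(1, 0, 0))
y_pred <- rbind(c(0.1, 0.8, 0.1),
                c(0.6, 0.3, 0.1))

categorical_crossentropy(y_true, y_pred)  # equals -(log(0.8) + log(0.6)) / 2
```

Only the probability in each row's true-class column contributes, so the loss falls toward 0 as those probabilities approach 1.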

Fit the model

history <- model |> fit(
  x_train, y_train,
  epochs = 10,
  batch_size = 128,
  validation_split = 0.2
)

Epoch 1/10
375/375 - 1s - 3ms/step - accuracy: 0.7891 - auc: 0.9782 - loss: 0.6128 - val_accuracy: 0.8433 - val_auc: 0.9876 - val_loss: 0.4465
Epoch 2/10
375/375 - 0s - 1ms/step - accuracy: 0.8472 - auc: 0.9883 - loss: 0.4290 - val_accuracy: 0.8493 - val_auc: 0.9888 - val_loss: 0.4139
Epoch 3/10
375/375 - 0s - 1ms/step - accuracy: 0.8619 - auc: 0.9904 - loss: 0.3821 - val_accuracy: 0.8590 - val_auc: 0.9896 - val_loss: 0.3978
Epoch 4/10
375/375 - 0s - 1ms/step - accuracy: 0.8723 - auc: 0.9917 - loss: 0.3512 - val_accuracy: 0.8771 - val_auc: 0.9920 - val_loss: 0.3456
Epoch 5/10
375/375 - 0s - 1ms/step - accuracy: 0.8800 - auc: 0.9925 - loss: 0.3294 - val_accuracy: 0.8765 - val_auc: 0.9922 - val_loss: 0.3410
Epoch 6/10
375/375 - 0s - 1ms/step - accuracy: 0.8851 - auc: 0.9932 - loss: 0.3131 - val_accuracy: 0.8663 - val_auc: 0.9914 - val_loss: 0.3612
Epoch 7/10
375/375 - 0s - 1ms/step - accuracy: 0.8903 - auc: 0.9938 - loss: 0.2980 - val_accuracy: 0.8858 - val_auc: 0.9924 - val_loss: 0.3274
Epoch 8/10
375/375 - 0s - 1ms/step - accuracy: 0.8945 - auc: 0.9942 - loss: 0.2857 - val_accuracy: 0.8768 - val_auc: 0.9918 - val_loss: 0.3426
Epoch 9/10
375/375 - 0s - 1ms/step - accuracy: 0.8988 - auc: 0.9946 - loss: 0.2735 - val_accuracy: 0.8860 - val_auc: 0.9921 - val_loss: 0.3296
Epoch 10/10
375/375 - 0s - 1ms/step - accuracy: 0.9028 - auc: 0.9949 - loss: 0.2659 - val_accuracy: 0.8891 - val_auc: 0.9932 - val_loss: 0.3096
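The `375/375` in the log is the number of mini-batch steps per epoch. With `validation_split = 0.2`, Keras holds out the last 20% of the training samples (taken before shuffling) for the `val_*` metrics, so the arithmetic is:

```r
n_total <- 60000
validation_split <- 0.2
batch_size <- 128

# Samples used for gradient updates, truncated to a whole number of samples
n_train <- floor(n_total * (1 - validation_split))  # 48,000
n_val <- n_total - n_train                          # 12,000 held out for validation

ceiling(n_train / batch_size)                       # 375 steps per epoch
```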

Evaluate performance

plot(history)

Evaluate performance

evaluate(model, x_test, y_test)

313/313 - 0s - 399us/step - accuracy: 0.8780 - auc: 0.9912 - loss: 0.3411

$accuracy
[1] 0.878

$auc
[1] 0.9911949

$loss
[1] 0.3410828
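`evaluate()` reports `313/313` because no batch size was given, so Keras falls back to its default of 32: 10,000 test images / 32 = 312.5, rounded up to 313 batches. The accuracy metric is the share of images whose highest-probability class matches the true class. A base R sketch on made-up probabilities; `categorical_accuracy()` is a hypothetical helper, and the numbers are illustrative, not output from the model:

```r
# Hypothetical helper: share of rows whose predicted class (argmax) matches the true class
categorical_accuracy <- function(y_true, y_pred) {
  mean(max.col(y_pred, ties.method = "first") == max.col(y_true, ties.method = "first"))
}

y_true <- rbind(c(1, 0, 0),
                c(0, 1, 0),
                c(0, 0, 1),
                c(0, 1, 0))
y_pred <- rbind(c(0.7, 0.2, 0.1),   # correct
                c(0.3, 0.6, 0.1),   # correct
                c(0.5, 0.1, 0.4),   # wrong: highest probability in column 1
                c(0.2, 0.5, 0.3))   # correct

categorical_accuracy(y_true, y_pred)  # 0.75

# Evaluation steps at the default batch size of 32
ceiling(10000 / 32)                   # 313
```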

Application exercise

`ae-14`

Go to the course GitHub org and find your ae-14 (repo name will be suffixed with your GitHub name).
Clone the repo in RStudio, run renv::restore() to install the required packages, open the Quarto document in the repo, and follow along and complete the exercises.
Render, commit, and push your edits by the AE deadline – end of the day

Wrap-up

Recap

Deep learning is a subset of machine learning that uses neural networks to model complex relationships.
TensorFlow is a low-level library for numerical computations, while Keras is a high-level API for building deep learning models.
Building and training deep learning models is very different from shallow ML methods, both in terms of infrastructure and workflow.
