An introduction to deep learning (with applications to text classification)

Lecture 15

Dr. Benjamin Soltoff

Cornell University
INFO 4940/5940 - Fall 2024

October 22, 2024

Announcements

  • Project teams

Learning objectives

  • Distinguish the major workflow differences between shallow and deep learning.
  • Understand the basic architecture of a dense neural network.
  • Implement deep learning models using Keras and the {keras3} package.
  • Estimate a series of dense neural networks for text classification.
  • Incorporate GloVe word embeddings into a deep learning model.

Recap: What is machine learning?

What is machine learning?

Forms of machine learning

Deep learning applications

  • Image recognition
  • Natural language processing
  • Time series forecasting
  • Text to image
  • Video/audio synthesis

Benefits and drawbacks to deep learning

Benefits

  • Learn complex relationships between features
  • Requires minimal feature engineering
  • Scales well to large datasets

Drawbacks

  • Requires lots of data
  • More susceptible to overfitting
  • Extremely computationally expensive
  • Data sourcing often leads to bias

Infrastructure for training deep learning models

TensorFlow

  • Open-source machine learning library developed by Google
  • Performs low-level mathematical operations on numerical tensors
  • Runs on CPUs, GPUs, and TPUs
  • Widely used for deep learning development (but not the only option)

Keras

  • Deep learning API
  • High-level interface for defining and training any kind of deep learning model
  • Intended for high-level modeling tasks

Relationship between Keras and TensorFlow

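[Diagram: Keras as the high-level API layer, running on top of TensorFlow, Torch (PyTorch), or JAX as its computational backend]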

Setting up a deep learning workspace

🆓 Use a free GPU runtime from Kaggle, Google Colab, or others

Access research computing clusters through Cornell

💵 Use GPU instances on Google Cloud or Amazon EC2

💰 Buy and install an NVIDIA GPU in a desktop computer

A simple application using the Fashion MNIST dataset

Multilayer perceptron (MLP)

Fashion MNIST

library(keras3)

# load Fashion MNIST (downloaded on first use) as train/test lists of arrays
mnist <- dataset_fashion_mnist()
x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y

x_train structure

The training images are stored in a 3D array with dimensions (60000, 28, 28): 60,000 images, each 28 × 28 pixels, with grayscale values from 0 to 255.

class(x_train)
[1] "array"
# first observation
x_train[1, , ]
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
 [1,]    0    0    0    0    0    0    0    0    0     0     0     0     0
 [2,]    0    0    0    0    0    0    0    0    0     0     0     0     0
 [3,]    0    0    0    0    0    0    0    0    0     0     0     0     0
 [4,]    0    0    0    0    0    0    0    0    0     0     0     0     1
 [5,]    0    0    0    0    0    0    0    0    0     0     0     0     3
 [6,]    0    0    0    0    0    0    0    0    0     0     0     0     6
 [7,]    0    0    0    0    0    0    0    0    0     0     0     0     0
 [8,]    0    0    0    0    0    0    0    0    0     0     0     1     0
 [9,]    0    0    0    0    0    0    0    0    0     1     1     1     0
[10,]    0    0    0    0    0    0    0    0    0     0     0     0     0
[11,]    0    0    0    0    0    0    0    0    0     0     0     0     0
[12,]    0    0    0    0    0    0    0    0    0     1     3     0    12
[13,]    0    0    0    0    0    0    0    0    0     0     6     0    99
[14,]    0    0    0    0    0    0    0    0    0     4     0     0    55
[15,]    0    0    1    4    6    7    2    0    0     0     0     0   237
[16,]    0    3    0    0    0    0    0    0    0    62   145   204   228
[17,]    0    0    0    0   18   44   82  107  189   228   220   222   217
[18,]    0   57  187  208  224  221  224  208  204   214   208   209   200
[19,]    3  202  228  224  221  211  211  214  205   205   205   220   240
[20,]   98  233  198  210  222  229  229  234  249   220   194   215   217
[21,]   75  204  212  204  193  205  211  225  216   185   197   206   198
[22,]   48  203  183  194  213  197  185  190  194   192   202   214   219
[23,]    0  122  219  193  179  171  183  196  204   210   213   207   211
[24,]    0    0   74  189  212  191  175  172  175   181   185   188   189
[25,]    2    0    0    0   66  200  222  237  239   242   246   243   244
[26,]    0    0    0    0    0    0    0   40   61    44    72    41    35
[27,]    0    0    0    0    0    0    0    0    0     0     0     0     0
[28,]    0    0    0    0    0    0    0    0    0     0     0     0     0
      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
 [1,]     0     0     0     0     0     0     0     0     0     0     0     0
 [2,]     0     0     0     0     0     0     0     0     0     0     0     0
 [3,]     0     0     0     0     0     0     0     0     0     0     0     0
 [4,]     0     0    13    73     0     0     1     4     0     0     0     0
 [5,]     0    36   136   127    62    54     0     0     0     1     3     4
 [6,]     0   102   204   176   134   144   123    23     0     0     0     0
 [7,]     0   155   236   207   178   107   156   161   109    64    23    77
 [8,]    69   207   223   218   216   216   163   127   121   122   146   141
 [9,]   200   232   232   233   229   223   223   215   213   164   127   123
[10,]   183   225   216   223   228   235   227   224   222   224   221   223
[11,]   193   228   218   213   198   180   212   210   211   213   223   220
[12,]   219   220   212   218   192   169   227   208   218   224   212   226
[13,]   244   222   220   218   203   198   221   215   213   222   220   245
[14,]   236   228   230   228   240   232   213   218   223   234   217   217
[15,]   226   217   223   222   219   222   221   216   223   229   215   218
[16,]   207   213   221   218   208   211   218   224   223   219   215   224
[17,]   226   200   205   211   230   224   234   176   188   250   248   233
[18,]   159   245   193   206   223   255   255   221   234   221   211   220
[19,]    80   150   255   229   221   188   154   191   210   204   209   222
[20,]   241    65    73   106   117   168   219   221   215   217   223   223
[21,]   213   240   195   227   245   239   223   218   212   209   222   220
[22,]   221   220   236   225   216   199   206   186   181   177   172   181
[23,]   210   200   196   194   191   195   191   198   192   176   156   167
[24,]   188   193   198   204   209   210   210   211   188   188   194   192
[25,]   221   220   193   191   179   182   182   181   176   166   168    99
[26,]     0     0     0     0     0     0     0     0     0     0     0     0
[27,]     0     0     0     0     0     0     0     0     0     0     0     0
[28,]     0     0     0     0     0     0     0     0     0     0     0     0
      [,26] [,27] [,28]
 [1,]     0     0     0
 [2,]     0     0     0
 [3,]     0     0     0
 [4,]     1     1     0
 [5,]     0     0     3
 [6,]    12    10     0
 [7,]   130    72    15
 [8,]    88   172    66
 [9,]   196   229     0
[10,]   245   173     0
[11,]   243   202     0
[12,]   197   209    52
[13,]   119   167    56
[14,]   209    92     0
[15,]   255    77     0
[16,]   244   159     0
[17,]   238   215     0
[18,]   232   246     0
[19,]   228   225     0
[20,]   224   229    29
[21,]   221   230    67
[22,]   205   206   115
[23,]   177   210    92
[24,]   216   170     0
[25,]    58     0     0
[26,]     0     0     0
[27,]     0     0     0
[28,]     0     0     0

Reshape features

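# flatten each 28 x 28 image into a 784-element vector (row-major order)
x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test <- array_reshape(x_test, c(nrow(x_test), 784))

# rescale pixel values from [0, 255] to [0, 1]
x_train <- x_train / 255
x_test <- x_test / 255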

dim(x_train)
[1] 60000   784
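
`array_reshape()` fills the new array in row-major (C-style) order by default, which differs from R's usual column-major reshaping with `dim<-` or `matrix()`. The minimal base-R sketch below mimics that behavior on a hypothetical toy batch of two 2 × 3 "images" so the pixel ordering is easy to see. `flatten_row_major()` is an illustrative helper, not part of {keras3}.

```r
# Hypothetical toy batch: 2 images, each 2 rows x 3 columns of pixel values
x <- array(1:12, dim = c(2, 2, 3))

# Flatten each image row by row (C-style order), as array_reshape() does by default
flatten_row_major <- function(x) {
  # apply() passes each image x[i, , ] as a matrix; t() + as.vector() reads it row by row
  flat <- apply(x, 1, function(img) as.vector(t(img)))
  t(flat)  # apply() returns one column per image, so transpose to one row per image
}

flat <- flatten_row_major(x)
flat
# row 1: 1 5 9 3 7 11
# row 2: 2 6 10 4 8 12

# For comparison, R's column-major reshaping orders the same pixels differently
matrix(x, nrow = dim(x)[1])
# row 1: 1 3 5 7 9 11
# row 2: 2 4 6 8 10 12
```

Either ordering keeps one image per row; what matters is that training and test images are flattened the same way.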

Reshape outcome

# original integer structure
head(y_train)
[1] 9 0 0 3 0 2
# convert integer labels to binary class matrices (one-hot encoding)
y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)

head(y_train)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    0    0    0    0    0    0    0    0    0     1
[2,]    1    0    0    0    0    0    0    0    0     0
[3,]    1    0    0    0    0    0    0    0    0     0
[4,]    0    0    0    1    0    0    0    0    0     0
[5,]    1    0    0    0    0    0    0    0    0     0
[6,]    0    0    1    0    0    0    0    0    0     0
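
One-hot encoding places a single 1 in the column for each label. The base-R sketch below is a hypothetical stand-in for `to_categorical()`, assuming integer labels 0 to 9 as in Fashion MNIST; it reproduces the six rows printed above and shows how to recover the labels.

```r
# Hypothetical base-R stand-in for to_categorical(); labels are integers 0-9
one_hot <- function(y, num_classes) {
  m <- matrix(0, nrow = length(y), ncol = num_classes)
  # R columns are 1-based, so label k goes in column k + 1
  m[cbind(seq_along(y), y + 1)] <- 1
  m
}

labels <- c(9, 0, 0, 3, 0, 2)   # the first six training labels shown above
y_onehot <- one_hot(labels, 10)
y_onehot

# Recover the original labels: column index of the 1 in each row, minus 1
max.col(y_onehot) - 1
# [1] 9 0 0 3 0 2
```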

Define an MLP

model <- keras_model_sequential(input_shape = c(784)) |>
  layer_dense(units = 128, activation = "relu") |>
  layer_dense(units = 10, activation = "softmax")
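
  • Fully connected (dense) layers: each unit receives every output from the previous layer
  • Activation functions: ReLU in the hidden layer adds nonlinearity
  • Softmax in the output layer: converts the 10 outputs into class probabilities that sum to 1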
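
The softmax step can be written in a few lines of base R. This is a minimal sketch using hypothetical raw scores (logits) for the 10 classes; subtracting the maximum score first is a standard trick to avoid overflow in `exp()` and does not change the result.

```r
# Numerically stable softmax: shifting by max(z) leaves the probabilities unchanged
softmax <- function(z) {
  e <- exp(z - max(z))
  e / sum(e)
}

# Hypothetical raw scores (logits) from a 10-unit output layer
z <- c(2.0, 1.0, 0.1, -1.0, 0.5, 0.0, 3.0, -0.5, 1.5, 0.2)
p <- softmax(z)
round(p, 3)
sum(p)         # should be 1 (up to floating-point rounding)
which.max(p)   # 7: the largest score gets the largest probability
```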

MLP

summary(model)
Model: "sequential"
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ Layer (type)                      โ”ƒ Output Shape             โ”ƒ       Param # โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ dense (Dense)                     โ”‚ (None, 128)              โ”‚       100,480 โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ dense_1 (Dense)                   โ”‚ (None, 10)               โ”‚         1,290 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
 Total params: 101,770 (397.54 KB)
 Trainable params: 101,770 (397.54 KB)
 Non-trainable params: 0 (0.00 B)
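
The parameter counts in the summary follow from the layer sizes: a dense layer has one weight for every (input, unit) pair plus one bias per unit. The sketch below checks the arithmetic, assuming Keras's default 32-bit (4-byte) floats for the memory figure; `dense_params()` is a hypothetical helper.

```r
# Weights (inputs x units) plus one bias per unit
dense_params <- function(n_inputs, n_units) n_inputs * n_units + n_units

hidden <- dense_params(784, 128)  # 100480
output <- dense_params(128, 10)   # 1290
total  <- hidden + output         # 101770

# Each parameter stored as a 4-byte float32; 1 KB = 1024 bytes
round(total * 4 / 1024, 2)        # 397.54
```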

ReLU activation function
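
ReLU, the activation used in the hidden layer above, returns its input when positive and 0 otherwise. The sketch below writes it in base R and applies it to one hidden-layer computation, h = relu(x W + b), using a hypothetical random image and random weights rather than trained values. Note that `pmax(x, 0)` keeps the matrix dimensions of `x` because attributes are copied from the first argument.

```r
# ReLU: elementwise max(x, 0); x first so matrix dims are preserved
relu <- function(x) pmax(x, 0)

relu(c(-2, -0.5, 0, 0.5, 2))
# [1] 0.0 0.0 0.0 0.5 2.0

# One hidden-layer pass with hypothetical inputs and weights
set.seed(1)
x <- matrix(runif(784), nrow = 1)              # one flattened image (1 x 784)
W <- matrix(rnorm(784 * 128, sd = 0.05), 784)  # random weights (784 x 128)
b <- rep(0, 128)                               # biases, one per hidden unit
h <- relu(x %*% W + b)                         # with one row, b lines up with the 128 units
dim(h)                                         # one row, 128 hidden units
```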

Compile the model

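model |> compile(
  loss = "categorical_crossentropy",  # multiclass loss for one-hot encoded labels
  optimizer = "rmsprop",              # gradient descent with adaptive learning rates
  metrics = c("accuracy", "auc")      # reported each epoch, not optimized directly
)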
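
For one-hot labels, categorical cross-entropy averages -log(predicted probability of the true class) over observations. The base-R sketch below is an illustration, not Keras's exact implementation; like Keras, it clips predictions by a small epsilon (assumed here to be 1e-7) to avoid log(0). The labels and predictions are hypothetical.

```r
# Mean over rows of -sum(y_true * log(y_pred)); only the true-class term is nonzero
categorical_crossentropy <- function(y_true, y_pred, eps = 1e-7) {
  y_pred <- pmin(pmax(y_pred, eps), 1 - eps)   # clip to avoid log(0)
  mean(-rowSums(y_true * log(y_pred)))
}

# Hypothetical one-hot labels and predicted probabilities for 2 observations, 3 classes
y_true <- rbind(c(0, 1, 0),
                c(1, 0, 0))
y_pred <- rbind(c(0.2, 0.7, 0.1),
                c(0.6, 0.3, 0.1))
categorical_crossentropy(y_true, y_pred)
# equals mean(c(-log(0.7), -log(0.6))), about 0.434
```

Confident, correct predictions push the loss toward 0; a low probability on the true class is penalized heavily.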

Fit the model

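history <- model |> fit(
  x_train, y_train,
  epochs = 10,              # full passes over the training data
  batch_size = 128,         # observations per gradient update
  validation_split = 0.2    # hold out 20% of training data to monitor performance
)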
Epoch 1/10
375/375 - 5s - 14ms/step - accuracy: 0.7879 - auc: 0.9785 - loss: 0.6091 - val_accuracy: 0.8162 - val_auc: 0.9837 - val_loss: 0.5105
Epoch 2/10
375/375 - 5s - 12ms/step - accuracy: 0.8474 - auc: 0.9879 - loss: 0.4323 - val_accuracy: 0.8514 - val_auc: 0.9892 - val_loss: 0.4074
Epoch 3/10
375/375 - 5s - 12ms/step - accuracy: 0.8621 - auc: 0.9904 - loss: 0.3825 - val_accuracy: 0.8622 - val_auc: 0.9903 - val_loss: 0.3770
Epoch 4/10
375/375 - 5s - 13ms/step - accuracy: 0.8739 - auc: 0.9919 - loss: 0.3511 - val_accuracy: 0.8691 - val_auc: 0.9909 - val_loss: 0.3672
Epoch 5/10
375/375 - 5s - 12ms/step - accuracy: 0.8794 - auc: 0.9926 - loss: 0.3311 - val_accuracy: 0.8690 - val_auc: 0.9912 - val_loss: 0.3551
Epoch 6/10
375/375 - 5s - 12ms/step - accuracy: 0.8855 - auc: 0.9932 - loss: 0.3132 - val_accuracy: 0.8709 - val_auc: 0.9911 - val_loss: 0.3599
Epoch 7/10
375/375 - 5s - 13ms/step - accuracy: 0.8906 - auc: 0.9937 - loss: 0.3009 - val_accuracy: 0.8687 - val_auc: 0.9912 - val_loss: 0.3531
Epoch 8/10
375/375 - 5s - 13ms/step - accuracy: 0.8943 - auc: 0.9942 - loss: 0.2880 - val_accuracy: 0.8863 - val_auc: 0.9925 - val_loss: 0.3210
Epoch 9/10
375/375 - 5s - 13ms/step - accuracy: 0.8980 - auc: 0.9945 - loss: 0.2783 - val_accuracy: 0.8869 - val_auc: 0.9922 - val_loss: 0.3228
Epoch 10/10
375/375 - 5s - 14ms/step - accuracy: 0.9016 - auc: 0.9950 - loss: 0.2672 - val_accuracy: 0.8838 - val_auc: 0.9923 - val_loss: 0.3259
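
The "375/375" in each line of the log is the number of batches per epoch. The arithmetic below reproduces it from the settings passed to `fit()`. Per the Keras documentation, the held-out 20% are the last rows of `x_train`/`y_train`, taken before shuffling.

```r
n_train          <- 60000   # images in x_train
validation_split <- 0.2
batch_size       <- 128

# validation_split holds out 20% of the training rows for validation
n_fit   <- round(n_train * (1 - validation_split))  # 48000
n_valid <- n_train - n_fit                          # 12000

# One gradient update per batch; a final partial batch would still count as a step
steps_per_epoch <- ceiling(n_fit / batch_size)      # 375
```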

Evaluate performance

plot(history)
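
The plot can also be checked numerically. The sketch below copies the loss values from the `fit()` log above (they should also be available in `history$metrics`) and finds the epoch with the lowest validation loss. Training loss keeps falling while validation loss bottoms out and then rises, a pattern that can signal the start of overfitting.

```r
# Validation and training loss copied from the fit() log above
val_loss <- c(0.5105, 0.4074, 0.3770, 0.3672, 0.3551,
              0.3599, 0.3531, 0.3210, 0.3228, 0.3259)
loss     <- c(0.6091, 0.4323, 0.3825, 0.3511, 0.3311,
              0.3132, 0.3009, 0.2880, 0.2783, 0.2672)

best_epoch <- which.min(val_loss)
best_epoch                           # 8
all(diff(loss) < 0)                  # TRUE: training loss decreases every epoch
val_loss[10] > val_loss[best_epoch]  # TRUE: validation loss has crept back up
```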

Evaluate performance

evaluate(model, x_test, y_test)
313/313 - 3s - 10ms/step - accuracy: 0.8761 - auc: 0.9911 - loss: 0.3525
$accuracy
[1] 0.8761

$auc
[1] 0.9911444

$loss
[1] 0.3524756
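
With one-hot labels, the reported accuracy compares the class with the highest predicted probability to the true class. The sketch below computes that by hand on hypothetical predictions for four test images and three classes. It also reproduces the "313/313" in the log, because `evaluate()` uses a default batch size of 32.

```r
# Hypothetical predicted probabilities (4 images x 3 classes)
probs <- rbind(c(0.7, 0.2, 0.1),
               c(0.1, 0.3, 0.6),
               c(0.2, 0.5, 0.3),
               c(0.3, 0.5, 0.2))
# One-hot true labels (classes 1, 3, 1, 2)
truth <- rbind(c(1, 0, 0),
               c(0, 0, 1),
               c(1, 0, 0),
               c(0, 1, 0))

# Predicted class = column with the highest probability
pred_class <- max.col(probs, ties.method = "first")  # 1 3 2 2
true_class <- max.col(truth, ties.method = "first")  # 1 3 1 2
mean(pred_class == true_class)                       # 0.75

# 10,000 test images in batches of 32
ceiling(10000 / 32)                                  # 313
```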

Application exercise

ae-14

  • Go to the course GitHub org and find your ae-14 (repo name will be suffixed with your GitHub name).
  • Clone the repo in RStudio, run renv::restore() to install the required packages, open the Quarto document in the repo, and follow along and complete the exercises.
  • Render, commit, and push your edits by the AE deadline – end of the day

Wrap-up

Recap

  • Deep learning is a subset of machine learning that uses neural networks to model complex relationships.
  • TensorFlow is a low-level library for numerical computations, while Keras is a high-level API for building deep learning models.
  • Building and training deep learning models is very different from shallow ML methods, both in terms of infrastructure and workflow.