Convolutional neural networks: 2. zero padding
Convolutional neural networks (CNN)

In the previous posting, we discussed how a (two-dimensional) convolutional layer works in comparison to a fully connected layer, which is a basic building block of a feedforward neural network (FNN). A convolutional layer is a basic building block of a convolutional neural network (CNN), but a CNN has other components as well, one of which we discuss in this posting.

Reference

Our main reference is the lecture notes by Smets, an excellent reference for mathematicians pursuing deep learning.

Review: convolutional layers

Let's first recall the general setting of a convolutional layer from the previous posting. We expect $c$ matrices $X = (X[0], X[1], \dots, X[c-1])$ of fixed size, say $h \times w$. To each such matrix $X[k]$, we have $c'$ trainable matrices $$K[0,k], K[1,k], \dots, K[c'-1,k]$$ with which we filter $X[k]$. The recipe for the weight matrix $W : \mathbb{R}^{chw} \rightarrow \mathbb{R}^{c'(h-m...
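The multi-channel filtering recalled above can be sketched in a few lines of NumPy. This is a minimal illustration, not the notation of the lecture notes: the function name `conv2d_valid` and the kernel shape convention `(c', c, m+1, n+1)` are assumptions for this sketch. The point to notice is that, without zero padding, each output map is strictly smaller than the input.

```python
import numpy as np

def conv2d_valid(X, K):
    """Multi-channel 2D cross-correlation with no padding ("valid" mode).

    X: input of shape (c, h, w), i.e. c matrices of size h-by-w.
    K: trainable kernels of shape (c', c, m+1, n+1).
    Returns an array of shape (c', h-m, w-n): each of the c' output maps
    shrinks by the kernel size minus one, since the kernel must fit
    entirely inside the input (no zeros are appended at the borders).

    Note: conv2d_valid and the shape convention here are illustrative
    assumptions, not taken from the source.
    """
    c_out, c_in, kh, kw = K.shape
    c, h, w = X.shape
    assert c == c_in, "input channels must match the kernels' second axis"
    out = np.zeros((c_out, h - kh + 1, w - kw + 1))
    for j in range(c_out):          # output channel
        for k in range(c):          # sum contributions over input channels
            for i in range(h - kh + 1):
                for l in range(w - kw + 1):
                    out[j, i, l] += np.sum(X[k, i:i + kh, l:l + kw] * K[j, k])
    return out


# A 2-channel 5x5 input filtered by 3x3 kernels yields 3x3 output maps.
Y = conv2d_valid(np.ones((2, 5, 5)), np.ones((3, 2, 3, 3)))
print(Y.shape)  # (3, 3, 3): spatial size shrinks from 5x5 to 3x3
```

The shrinking output size is exactly what zero padding, the topic of this posting, is designed to counteract.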