Posts

Convolutional neural networks: 2. zero paddings

Convolutional neural networks (CNN)

In the previous posting, we discussed how a (two-dimensional) convolutional layer works in comparison to a fully connected layer, which is the basic building block of a feedforward neural network (FNN). A convolutional layer is the basic building block of a convolutional neural network (CNN), but a CNN has other components as well, one of which we discuss in this posting.

Reference

Our main reference is the lecture notes by Smets, an excellent reference for mathematicians who pursue deep learning.

Review: convolutional layers

Let's first recall the general setting of a convolutional layer from the previous posting. We expect $c$ matrices $X = (X[0], X[1], \dots, X[c-1])$ of a fixed size, say $h \times w$. To each such matrix $X[k]$, we associate $c'$ trainable matrices $$K[0,k], K[1,k], \dots, K[c'-1,k]$$ with which we filter $X[k]$. The recipe for the weight matrix $W : \mathbb{R}^{chw} \rightarrow \mathbb{R}^{c'(h-m...
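To make the setting above concrete, here is a minimal pure-Python sketch. It rests on assumptions that are not spelled out in this excerpt: each kernel $K[j,k]$ is $m \times n$, "filtering" means sliding-window cross-correlation with no padding (the usual deep-learning convention), and output channel $j$ is the sum over $k$ of $X[k]$ filtered by $K[j,k]$, so each output matrix is $(h-m+1) \times (w-n+1)$. The function names are hypothetical. The helper `weight_matrix` builds the dense matrix $W$ acting on the row-major flattening of $X$ in $\mathbb{R}^{chw}$, which shows that the layer is an ordinary linear map whose entries are tied to the kernel weights.

```python
def conv2d_valid(X, K):
    """Multichannel 2D filtering without padding ("valid" mode).

    X: list of c input matrices, each h x w (nested lists).
    K: K[j][k] is the m x n kernel that filters X[k] for output channel j
       (j = 0..c'-1, k = 0..c-1).
    Returns c' output matrices of size (h - m + 1) x (w - n + 1).
    """
    c, h, w = len(X), len(X[0]), len(X[0][0])
    m, n = len(K[0][0]), len(K[0][0][0])
    ho, wo = h - m + 1, w - n + 1
    out = []
    for j in range(len(K)):
        Y = [[0.0] * wo for _ in range(ho)]
        for k in range(c):
            for a in range(ho):
                for b in range(wo):
                    # Cross-correlation: slide K[j][k] over X[k] without flipping it,
                    # and accumulate the contributions of all input channels k.
                    Y[a][b] += sum(K[j][k][p][q] * X[k][a + p][b + q]
                                   for p in range(m) for q in range(n))
        out.append(Y)
    return out


def flatten(mats):
    """Row-major flattening of a list of matrices into one vector."""
    return [v for M in mats for row in M for v in row]


def weight_matrix(K, h, w):
    """Dense matrix W with W @ flatten(X) == flatten(conv2d_valid(X, K))."""
    c_out, c = len(K), len(K[0])
    m, n = len(K[0][0]), len(K[0][0][0])
    ho, wo = h - m + 1, w - n + 1
    W = [[0.0] * (c * h * w) for _ in range(c_out * ho * wo)]
    for j in range(c_out):
        for a in range(ho):
            for b in range(wo):
                row = (j * ho + a) * wo + b  # index of output entry Y[j][a][b]
                for k in range(c):
                    for p in range(m):
                        for q in range(n):
                            col = (k * h + a + p) * w + (b + q)  # index of X[k][a+p][b+q]
                            W[row][col] = K[j][k][p][q]
    return W


def matvec(W, x):
    """Plain matrix-vector product."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in W]


# Usage: c = 2 input channels of size 3 x 3, c' = 1 output channel, 2 x 2 kernels.
X = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
     [[1, 0, 0], [0, 1, 0], [0, 0, 1]]]
K = [[[[1, 0], [0, 1]],      # K[0][0] filters X[0]
      [[1, 1], [1, 1]]]]     # K[0][1] filters X[1]
print(conv2d_valid(X, K))    # [[[8.0, 9.0], [13.0, 16.0]]]
W = weight_matrix(K, 3, 3)   # a 4 x 18 matrix, mostly zeros
print(matvec(W, flatten(X)) == flatten(conv2d_valid(X, K)))  # True
```

Most entries of $W$ are zero, and the nonzero ones repeat the same few kernel weights; this sparsity and weight sharing is what distinguishes a convolutional layer from a fully connected one.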

Convolutional neural networks: 1. convolutional layers (2-dimensional)

From super flexible to somewhat flexible

In the first posting of this entire blog, we discussed how to build the most flexible form of a neural network, called the feedforward neural network (FNN). There are various versions of universal approximation theorems stating that such neural networks can approximate any map $\mathbb{R}^m \rightarrow \mathbb{R}^d$ in a large class (e.g., continuous maps on a compact set with the supremum norm, or $L^p$ maps with the $L^p$-norm); we shall refer to such a statement as a universality result. Unfortunately, no universality result provides an algorithm to find an FNN.

Disclaimer. I have personally gone through a decent number of references, and the previous sentence is what seems to be true based on my search rather than the absolute truth. However, it is evident that stochastic gradient descent (SGD) or any of its variants cannot guarantee that each step (with gradients computed by back propagation) is actually a descent step, so it is safe to say that a ...
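A toy illustration of why a single SGD step need not descend, using a made-up one-parameter least-squares problem (the data and function names are hypothetical, not from the post). With two conflicting samples, the full loss is $\frac{(w-1)^2 + (w+1)^2}{2} = w^2 + 1$, minimized at $w = 0$. At that minimizer, the gradient of one sample's loss is still nonzero, so a step on that sample alone moves $w$ to $2\eta$ for learning rate $\eta$ and raises the full loss to $1 + 4\eta^2$, for every $\eta > 0$.

```python
def full_loss(w, data):
    """Mean squared error of the one-parameter model y ~ w * x over the whole dataset."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def sample_grad(w, x, y):
    """Derivative of the single-sample loss (w * x - y)^2 with respect to w."""
    return 2 * (w * x - y) * x


def sgd_step(w, sample, lr):
    """One SGD update using the gradient of a single sample's loss."""
    x, y = sample
    return w - lr * sample_grad(w, x, y)


# Two conflicting samples: the full loss is w^2 + 1, minimized at w = 0,
# yet each individual sample pulls w away from 0.
data = [(1.0, 1.0), (1.0, -1.0)]
w0 = 0.0
for lr in (0.01, 0.1, 0.5):
    w1 = sgd_step(w0, data[0], lr)
    # Full loss before vs. after the stochastic step: it increases.
    print(lr, full_loss(w0, data), full_loss(w1, data))
```

Averaged over the random choice of sample, the step direction is the true gradient, which is why SGD still works in practice; the guarantee is only in expectation, not step by step.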

First posting: How does a neural network work?

Purpose of this blog

The goal of this blog is to unravel ideas in artificial intelligence (AI) and machine learning (ML) for people who have some training in mathematics but not necessarily in computer science. This does not mean that I will always "prove" things. Rather, I will often try to find heuristic explanations that are mathematically sound or summarize ideas of some important proofs. As the first posting of this blog, we start off with understanding how a neural network works, which is central to AI. Of course, this is a vast topic, so for now we will only study the essential principles of how it works and discuss more specific topics in separate postings. The content of this posting is technically about feedforward neural networks, which are neural networks of the simplest architecture. However, at the end I will try to explain some intuitions on how to generalize to other architectures, so that those feel more natural when we discuss them in later postings. Remark about r...
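A feedforward neural network composes affine maps $x \mapsto Wx + b$ with a nonlinear activation applied in between. Below is a minimal pure-Python sketch of such a forward pass, assuming ReLU as the activation and no activation on the last layer; these choices, the function names, and the hand-picked weights are illustrative assumptions, not taken from the post.

```python
def affine(W, b, x):
    """Compute W x + b for a matrix W (list of rows), vector b, and vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    """Apply max(0, t) entrywise."""
    return [max(0.0, t) for t in v]


def fnn_forward(layers, x):
    """Feedforward pass through a list of (W, b) layers.

    ReLU is applied after every layer except the last, which stays affine.
    """
    for i, (W, b) in enumerate(layers):
        x = affine(W, b, x)
        if i < len(layers) - 1:
            x = relu(x)
    return x


# A hand-picked network R^2 -> R^1 that computes |x1 - x2|:
# the hidden layer outputs relu(x1 - x2) and relu(x2 - x1), and the last layer adds them.
layers = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]
print(fnn_forward(layers, [3.0, 5.0]))  # [2.0]
```

Training would adjust the entries of each `W` and `b` instead of picking them by hand; the forward pass itself stays the same.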