# Autoencoder

The autoencoder is a versatile artificial neural network with multiple variations that can be used for unsupervised learning. Depending on the specific variant, it can perform tasks ranging from anomaly detection and feature extraction to image denoising and even image generation.

This is achieved by compressing the input data into a smaller intermediate representation and then decompressing that representation again to produce the output.
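As a concrete illustration of this compress-then-decompress pipeline, here is a minimal sketch of a fully connected autoencoder in PyTorch. PyTorch and all layer sizes (784 inputs for flattened 28x28 images, a 32-dimensional intermediate code) are illustrative assumptions, not something prescribed by this text.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=32):  # sizes are illustrative assumptions
        super().__init__()
        # Encoder: compresses the input into a smaller intermediate representation
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, code_dim),  # the bottleneck
        )
        # Decoder: decompresses the intermediate representation back to the input size
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 128),
            nn.ReLU(),
            nn.Linear(128, input_dim),
            nn.Sigmoid(),  # outputs in [0, 1], matching normalized pixel values
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.rand(16, 784)        # a batch of 16 flattened 28x28 "images"
reconstruction = model(x)      # same shape as the input: torch.Size([16, 784])
```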

## Structure

The basic structure of the autoencoder consists of three parts: the encoder, the decoder, and a bottleneck in the middle.

*Basic autoencoder schema. Source: https://en.wikipedia.org/wiki/File:Autoencoder_schema.png (retrieved 28.12.2021)*

### Encoder

The encoder compresses the input into the latent space representation in the middle. This is done by passing the input through one or more convolutional and pooling layers in series, with each consecutive layer producing a smaller output than the previous one. The output of the last encoding layer is generally much smaller than the input, and since the encoding process is lossy, the original data cannot be perfectly reconstructed from this compressed representation.
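A sketch of such a convolutional encoder in PyTorch (the channel counts and the 28x28 input size are illustrative assumptions): each pooling stage halves the spatial resolution, so the final output is much smaller than the input.

```python
import torch
import torch.nn as nn

# Each convolution/pooling stage produces a smaller output than the previous one,
# shrinking the representation from 1x28x28 (784 values) to 8x7x7 (392 values).
encoder = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 1x28x28  -> 16x28x28
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 16x28x28 -> 16x14x14
    nn.Conv2d(16, 8, kernel_size=3, padding=1),  # 16x14x14 -> 8x14x14
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 8x14x14  -> 8x7x7
)

x = torch.rand(1, 1, 28, 28)
code = encoder(x)
print(code.shape)  # torch.Size([1, 8, 7, 7])
```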

### Bottleneck

The output of the last encoding layer is the smallest representation of the data inside the network and forms a bottleneck that restricts how much information can pass from the encoder to the decoder. This restriction forces the network to keep only the parts of the information that matter for a given use case. In a denoising autoencoder, for example, the bottleneck should filter out the noise.

A smaller bottleneck lowers the risk of overfitting, since it cannot hold enough information relative to the input size to memorize specific training inputs. However, the smaller the bottleneck, the greater the risk of losing important information.

### Decoder

The last stage is the decoder, which takes in the compressed representation and tries to decompress it. In the simplest case, the goal is just to reconstruct the original image from the compressed form as accurately as possible. Since the bottleneck restricts how much information can pass through, the reconstruction won't be perfect but only an approximation. In the more interesting case of the denoising autoencoder, the decoder should reconstruct the input image while removing the noise in the process.
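A minimal sketch of this denoising setup in PyTorch (model shape, noise level, and learning rate are illustrative assumptions): the network receives the noisy image as input, but the reconstruction loss is computed against the clean original, which pushes the bottleneck to discard the noise.

```python
import torch
import torch.nn as nn

# A small fully connected autoencoder, as in the earlier sketch.
model = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32),                # encoder
    nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784), nn.Sigmoid(),  # decoder
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

clean = torch.rand(16, 784)  # placeholder for a real batch of training images
noisy = (clean + 0.2 * torch.randn_like(clean)).clamp(0.0, 1.0)

reconstruction = model(noisy)            # input:  the noisy image
loss = loss_fn(reconstruction, clean)    # target: the clean image, not the input
optimizer.zero_grad()
loss.backward()
optimizer.step()
```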

The decoding step is generally performed by a deconvolutional network. As the name suggests, this is quite similar to the convolutional network used in the encoding step, but operates in reverse. Where the convolutional network takes a large amount of input data and reduces it to a much smaller representation in order to isolate certain pieces of information, the deconvolutional network maps a small representation onto a much larger one. This makes it possible to generate data from a given set of isolated features, such as the compressed representation created by the encoder.
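In PyTorch, such a "reverse" mapping is commonly built from transposed convolutions (`nn.ConvTranspose2d`); here is a sketch of a decoder that mirrors the convolutional encoder above, with the same illustrative sizes:

```python
import torch
import torch.nn as nn

# Each transposed convolution with stride 2 doubles the spatial resolution,
# mapping the small 8x7x7 code back onto a full-size 1x28x28 image.
decoder = nn.Sequential(
    nn.ConvTranspose2d(8, 16, kernel_size=2, stride=2),  # 8x7x7    -> 16x14x14
    nn.ReLU(),
    nn.ConvTranspose2d(16, 1, kernel_size=2, stride=2),  # 16x14x14 -> 1x28x28
    nn.Sigmoid(),
)

code = torch.rand(1, 8, 7, 7)
image = decoder(code)
print(image.shape)  # torch.Size([1, 1, 28, 28])
```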

## References

{{#include ../../References.md:AUTOENCODER}}

Written by Daniel Müller