# Autoencoder

The autoencoder is a versatile artificial neural network with multiple variations that can be used for unsupervised learning. Depending on the specific variant, it can perform tasks ranging from anomaly detection and feature extraction to image denoising and even image generation.

This is achieved by compressing the input data into a smaller intermediate representation and then decompressing that representation again to produce the output.
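As a concrete illustration of this compress-then-decompress pipeline, here is a minimal sketch of a fully connected autoencoder in PyTorch. PyTorch and all layer sizes (784 inputs for flattened 28x28 images, a 32-dimensional intermediate code) are illustrative assumptions, not something prescribed by this text.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=32):  # sizes are illustrative assumptions
        super().__init__()
        # Encoder: compresses the input into a smaller intermediate representation
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, code_dim),  # the bottleneck
        )
        # Decoder: decompresses the intermediate representation back to the input size
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 128),
            nn.ReLU(),
            nn.Linear(128, input_dim),
            nn.Sigmoid(),  # outputs in [0, 1], matching normalized pixel values
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.rand(16, 784)        # a batch of 16 flattened 28x28 "images"
reconstruction = model(x)      # same shape as the input: torch.Size([16, 784])
```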

## Structure

The basic structure of the autoencoder consists of three parts: the encoder, the decoder, and a bottleneck in the middle.

*Basic autoencoder schema. Source: https://en.wikipedia.org/wiki/File:Autoencoder_schema.png (retrieved 28.12.2021)*

### Encoder

The encoder compresses the input into the latent space representation in the middle. This is done by passing the input through one or more convolutional and pooling layers in series, with each consecutive layer producing a smaller output than the previous one. The output of the last encoding layer is generally much smaller than the input, and since the encoding process is lossy, the original data cannot be perfectly reconstructed from this compressed representation.
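A sketch of such a convolutional encoder in PyTorch (the channel counts and the 28x28 input size are illustrative assumptions): each pooling stage halves the spatial resolution, so the final output is much smaller than the input.

```python
import torch
import torch.nn as nn

# Each convolution/pooling stage produces a smaller output than the previous one,
# shrinking the representation from 1x28x28 (784 values) to 8x7x7 (392 values).
encoder = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 1x28x28  -> 16x28x28
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 16x28x28 -> 16x14x14
    nn.Conv2d(16, 8, kernel_size=3, padding=1),  # 16x14x14 -> 8x14x14
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 8x14x14  -> 8x7x7
)

x = torch.rand(1, 1, 28, 28)
code = encoder(x)
print(code.shape)  # torch.Size([1, 8, 7, 7])
```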

### Bottleneck

The output of the last encoding layer is the smallest representation of the data inside the network and forms a bottleneck that restricts how much information can pass from the encoder to the decoder. This restriction forces the network to keep only the parts of the information that matter for a given use case. In a denoising autoencoder, for example, the bottleneck should filter out the noise.

A smaller bottleneck lowers the risk of overfitting, since it cannot hold enough information relative to the input size to memorize specific training inputs. However, the smaller the bottleneck, the greater the risk of losing important information.

### Decoder

The last stage is the decoder, which takes in the compressed representation and tries to decompress it. In the simplest case, the goal is just to reconstruct the original image from the compressed form as accurately as possible. Since the bottleneck restricts how much information can pass through, the reconstruction won't be perfect but only an approximation. In the more interesting case of the denoising autoencoder, the decoder should reconstruct the input image while removing the noise in the process.
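A minimal sketch of this denoising setup in PyTorch (model shape, noise level, and learning rate are illustrative assumptions): the network receives the noisy image as input, but the reconstruction loss is computed against the clean original, which pushes the bottleneck to discard the noise.

```python
import torch
import torch.nn as nn

# A small fully connected autoencoder, as in the earlier sketch.
model = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32),                # encoder
    nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784), nn.Sigmoid(),  # decoder
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

clean = torch.rand(16, 784)  # placeholder for a real batch of training images
noisy = (clean + 0.2 * torch.randn_like(clean)).clamp(0.0, 1.0)

reconstruction = model(noisy)            # input:  the noisy image
loss = loss_fn(reconstruction, clean)    # target: the clean image, not the input
optimizer.zero_grad()
loss.backward()
optimizer.step()
```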

The decoding step is generally performed by a deconvolutional network. As the name suggests, this is quite similar to the convolutional network used in the encoding step, but operates in reverse. Where the convolutional network takes a large amount of input data and reduces it to a much smaller representation in order to isolate certain pieces of information, the deconvolutional network maps a small representation onto a much larger one. This makes it possible to generate data from a given set of isolated features, such as the compressed representation created by the encoder.
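In PyTorch, such a "reverse" mapping is commonly built from transposed convolutions (`nn.ConvTranspose2d`); here is a sketch of a decoder that mirrors the convolutional encoder above, with the same illustrative sizes:

```python
import torch
import torch.nn as nn

# Each transposed convolution with stride 2 doubles the spatial resolution,
# mapping the small 8x7x7 code back onto a full-size 1x28x28 image.
decoder = nn.Sequential(
    nn.ConvTranspose2d(8, 16, kernel_size=2, stride=2),  # 8x7x7    -> 16x14x14
    nn.ReLU(),
    nn.ConvTranspose2d(16, 1, kernel_size=2, stride=2),  # 16x14x14 -> 1x28x28
    nn.Sigmoid(),
)

code = torch.rand(1, 8, 7, 7)
image = decoder(code)
print(image.shape)  # torch.Size([1, 1, 28, 28])
```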

## References

{{#include ../../References.md:AUTOENCODER}}

Written by Daniel Müller