.. _mae:

MAE
===

Example implementation of the Masked Autoencoder (MAE) architecture. MAE is a
transformer model based on the `Vision Transformer (ViT) <https://arxiv.org/abs/2010.11929>`_
architecture. It learns image representations by predicting pixel values for masked
patches of the input images. The network is split into an encoder and a decoder: the
encoder generates the image representation, and the decoder predicts the pixel values
from that representation. MAE increases training efficiency compared to other
transformer architectures by encoding only the visible part of the input image and by
using a shallow decoder. A minimal PyTorch sketch of this masking and reconstruction
scheme follows the examples at the end of this page.

Reference:
    `Masked Autoencoders Are Scalable Vision Learners, 2021 <https://arxiv.org/abs/2111.06377>`_

.. note::

    MAE requires `TIMM <https://github.com/huggingface/pytorch-image-models>`_ to be installed.

    .. code-block:: bash

        pip install "timm>=0.9.9"

.. tabs::

    .. tab:: PyTorch

        This example can be run from the command line with::

            python lightly/examples/pytorch/mae.py

        .. literalinclude:: ../../../examples/pytorch/mae.py

    .. tab:: Lightning

        This example can be run from the command line with::

            python lightly/examples/pytorch_lightning/mae.py

        .. literalinclude:: ../../../examples/pytorch_lightning/mae.py

    .. tab:: Lightning Distributed

        This example runs on multiple GPUs using Distributed Data Parallel (DDP)
        training with PyTorch Lightning. At least one GPU must be available on
        the system. The example can be run from the command line with::

            python lightly/examples/pytorch_lightning_distributed/mae.py

        The model differs in the following ways from the non-distributed
        implementation:

        - Distributed Data Parallel is enabled
        - Distributed Sampling is used in the dataloader

        Distributed Sampling makes sure that each distributed process sees only
        a subset of the data (a short sketch of this mechanism is shown after
        the examples).

        .. literalinclude:: ../../../examples/pytorch_lightning_distributed/mae.py
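
The examples above use Lightly's building blocks together with a TIMM vision
transformer. To make the encoder/decoder split and the masked-patch reconstruction
loss more concrete, the following is a minimal, self-contained sketch of the same
training scheme in plain PyTorch. It is not the Lightly implementation: the
``TinyMAE`` class, the ``patchify`` helper, and all hyperparameters (patch size,
embedding dimensions, mask ratio) are illustrative assumptions.

.. code-block:: python

    import torch
    import torch.nn as nn


    def patchify(images: torch.Tensor, patch_size: int) -> torch.Tensor:
        """Split images of shape (B, C, H, W) into flattened patches (B, N, C * P * P)."""
        p = patch_size
        patches = images.unfold(2, p, p).unfold(3, p, p)        # (B, C, H/p, W/p, p, p)
        patches = patches.permute(0, 2, 3, 1, 4, 5).flatten(3)  # (B, H/p, W/p, C*p*p)
        return patches.flatten(1, 2)                            # (B, N, C*p*p)


    class TinyMAE(nn.Module):
        def __init__(self, patch_size=16, in_chans=3, embed_dim=256,
                     decoder_dim=128, num_patches=196, mask_ratio=0.75):
            super().__init__()
            patch_dim = patch_size * patch_size * in_chans
            self.patch_size = patch_size
            self.mask_ratio = mask_ratio
            # Encoder: only sees the visible (unmasked) patches.
            self.patch_embed = nn.Linear(patch_dim, embed_dim)
            self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
            self.encoder = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True),
                num_layers=6,
            )
            # Shallow decoder: reconstructs pixel values for all patches.
            self.decoder_embed = nn.Linear(embed_dim, decoder_dim)
            self.mask_token = nn.Parameter(torch.zeros(1, 1, decoder_dim))
            self.decoder_pos_embed = nn.Parameter(torch.zeros(1, num_patches, decoder_dim))
            self.decoder = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(decoder_dim, nhead=4, batch_first=True),
                num_layers=2,
            )
            self.to_pixels = nn.Linear(decoder_dim, patch_dim)

        def forward(self, images: torch.Tensor) -> torch.Tensor:
            patches = patchify(images, self.patch_size)  # (B, N, patch_dim)
            b, n, _ = patches.shape
            # Randomly choose which patches stay visible for each image.
            num_keep = int(n * (1 - self.mask_ratio))
            shuffle = torch.rand(b, n, device=images.device).argsort(dim=1)
            keep_idx, mask_idx = shuffle[:, :num_keep], shuffle[:, num_keep:]
            # Encode only the visible patches; this is where the efficiency gain comes from.
            tokens = self.patch_embed(patches) + self.pos_embed
            visible = torch.gather(
                tokens, 1, keep_idx[..., None].expand(-1, -1, tokens.size(-1))
            )
            encoded = self.encoder(visible)
            # Decoder input: encoded visible tokens plus mask tokens at the masked positions.
            dec_dim = self.mask_token.size(-1)
            dec_tokens = torch.zeros(b, n, dec_dim, device=images.device)
            dec_tokens.scatter_(
                1, keep_idx[..., None].expand(-1, -1, dec_dim), self.decoder_embed(encoded)
            )
            dec_tokens.scatter_(
                1, mask_idx[..., None].expand(-1, -1, dec_dim),
                self.mask_token.repeat(b, mask_idx.size(1), 1),
            )
            pred = self.to_pixels(self.decoder(dec_tokens + self.decoder_pos_embed))
            # Compute the reconstruction loss only on the masked patches.
            target = torch.gather(patches, 1, mask_idx[..., None].expand(-1, -1, patches.size(-1)))
            pred_masked = torch.gather(pred, 1, mask_idx[..., None].expand(-1, -1, pred.size(-1)))
            return nn.functional.mse_loss(pred_masked, target)


    # Usage: one reconstruction loss on a dummy batch of 224x224 images (196 patches).
    model = TinyMAE()
    images = torch.randn(4, 3, 224, 224)
    loss = model(images)
    loss.backward()

Encoding only the visible quarter of the patches and using a decoder that is much
smaller than the encoder is what makes MAE pretraining efficient; after pretraining,
the decoder is discarded and only the encoder is kept as the image backbone.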
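
The Lightning Distributed example relies on distributed sampling so that each process
trains on a disjoint shard of the dataset. PyTorch Lightning typically adds the sampler
automatically when DDP is enabled; the snippet below is only a plain-PyTorch
illustration of the mechanism, with the rank and world size hard-coded instead of being
read from the distributed environment.

.. code-block:: python

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from torch.utils.data.distributed import DistributedSampler

    dataset = TensorDataset(torch.randn(1000, 3, 224, 224))

    # In a real DDP run, rank and num_replicas come from the distributed
    # environment (e.g. torch.distributed.get_rank() after init_process_group).
    sampler = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=True)
    dataloader = DataLoader(dataset, batch_size=64, sampler=sampler)

    for epoch in range(10):
        # Reshuffle the shard each epoch so every process sees a new ordering.
        sampler.set_epoch(epoch)
        for (batch,) in dataloader:
            ...  # training step on this process's subset of the data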