(quick-start)=

# Quick Start

## Installation

```bash
pip install lightly-train[tensorboard]
```

## Prepare Data

You can use any image dataset for training. No labels are required, and the dataset can be structured in any way, including subdirectories. If you don't have a dataset at hand, you can download one like this:

```bash
git clone https://github.com/lightly-ai/dataset_clothing_images.git my_data_dir
```

See the [data guide](#train-data) for more information on supported data formats.

## Train

Once the data is ready, you can train the model like this:

````{tab} Python
```python
import lightly_train

lightly_train.train(
    out="out/my_experiment",        # Output directory
    data="my_data_dir",             # Directory with images
    model="torchvision/resnet18",   # Model to train
    method="dino",                  # Self-supervised learning method
    epochs=100,                     # Number of epochs to train
    batch_size=128,                 # Batch size
)
```
````

````{tab} Command Line
```bash
lightly-train train out="out/my_experiment" data="my_data_dir" model="torchvision/resnet18" method="dino" epochs=100 batch_size=128
```
````

```{tip}
Decrease the number of epochs and batch size for faster training.
```

This will pretrain a Torchvision ResNet-18 model using images from `my_data_dir` and the DINO self-supervised learning method. All training logs and checkpoints are saved to the output directory at `out/my_experiment`.

Once the training is complete, the `out/my_experiment` directory should contain the following files:

```text
out/my_experiment
├── checkpoints
│   ├── epoch=99-step=123.ckpt    # Intermediate checkpoint
│   └── last.ckpt                 # Last checkpoint
├── events.out.tfevents.123.0     # Tensorboard logs
├── metrics.jsonl                 # Training metrics
└── train.log                     # Training logs
```

The most important file is `out/my_experiment/checkpoints/last.ckpt`, which contains the final model checkpoint.

While the trained model has already learned good representations of the images, it cannot yet make any predictions for tasks such as classification, detection, or segmentation. To solve these tasks, the model needs to be fine-tuned on a labeled dataset.

## Export

Before the model can be fine-tuned, it needs to be exported. The `export` command makes sure that only the model weights needed for fine-tuning are exported from the Lightly**Train** checkpoint:

````{tab} Python
```python
import lightly_train

lightly_train.export(
    out="my_exported_model.pth",                            # Exported model
    checkpoint="out/my_experiment/checkpoints/last.ckpt",   # LightlyTrain checkpoint
    part="model",                                           # Export ResNet-18 model with classification head
    format="torch_state_dict",                              # Export format
)
```
````

````{tab} Command Line
```bash
lightly-train export out="my_exported_model.pth" checkpoint="out/my_experiment/checkpoints/last.ckpt" part="model" format="torch_state_dict"
```
````
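The exported file is a regular PyTorch state dict. As a quick sanity check (a minimal sketch, assuming the export step above has been run), you can verify that it loads into a plain torchvision ResNet-18 before moving on:

```python
import torch
from torchvision import models

# The exported weights should match the standard torchvision ResNet-18 architecture
model = models.resnet18()
model.load_state_dict(torch.load("my_exported_model.pth", weights_only=True))
print("Exported weights loaded successfully")
```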
## Fine-Tune

Now the model is ready for fine-tuning! You can use your favorite library for this step. Below is a simple example using PyTorch:

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder(root="my_data_dir", transform=transform)
dataloader = DataLoader(dataset, batch_size=16, shuffle=True, drop_last=True)

# Load the exported model
model = models.resnet18()
model.load_state_dict(torch.load("my_exported_model.pth", weights_only=True))

# Update the classification head with the correct number of classes
model.fc = nn.Linear(model.fc.in_features, len(dataset.classes))

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

print("Starting fine-tuning...")
num_epochs = 10
for epoch in range(num_epochs):
    for inputs, labels in dataloader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}")
```

The output should show the loss decreasing over time:

```text
Starting fine-tuning...
Epoch [1/10], Loss: 2.1686
Epoch [2/10], Loss: 2.1290
Epoch [3/10], Loss: 2.1854
Epoch [4/10], Loss: 2.2936
Epoch [5/10], Loss: 1.9303
Epoch [6/10], Loss: 1.9949
Epoch [7/10], Loss: 1.8429
Epoch [8/10], Loss: 1.9873
Epoch [9/10], Loss: 1.8179
Epoch [10/10], Loss: 1.5360
```

Congratulations! You just trained and fine-tuned a model using Lightly**Train**!

```{tip}
Lightly**Train** has integrated support for popular libraries such as [Ultralytics](#ultralytics) and [SuperGradients](#super-gradients), which allow you to fine-tune the exported models directly from the command line.
```

## Embed

Instead of fine-tuning the model, you can also use it to generate image embeddings. This is useful for clustering, retrieval, or visualization tasks. The `embed` command generates embeddings for all images in a directory:

````{tab} Python
```python
import lightly_train

lightly_train.embed(
    out="my_embeddings.pth",                                # Exported embeddings
    checkpoint="out/my_experiment/checkpoints/last.ckpt",   # LightlyTrain checkpoint
    data="my_data_dir",                                     # Directory with images
    format="torch",                                         # Embedding format
)
```
````

````{tab} Command Line
```bash
lightly-train embed out="my_embeddings.pth" checkpoint="out/my_experiment/checkpoints/last.ckpt" data="my_data_dir" format="torch"
```
````

The embeddings are saved to `my_embeddings.pth` and can be loaded like this:

```python
import torch

embeddings = torch.load("my_embeddings.pth")

embeddings["filenames"]   # List of filenames
embeddings["embeddings"]  # Tensor with embeddings of shape (num_images, embedding_dim)
```
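For example, the embeddings can power a simple nearest-neighbor image search. The following is a minimal sketch, assuming the embeddings were loaded as shown above; using the first image in the dataset as the query is an arbitrary choice for illustration:

```python
import torch.nn.functional as F

# Use the first image as the query (arbitrary choice for this sketch)
query = embeddings["embeddings"][0]

# Cosine similarity between the query and every embedding
similarities = F.cosine_similarity(query.unsqueeze(0), embeddings["embeddings"])

# The five most similar images (the query itself ranks first with similarity 1.0)
top5 = similarities.topk(5)
for idx, score in zip(top5.indices, top5.values):
    print(f"{embeddings['filenames'][idx]}: {score:.3f}")
```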