
Semantic Segmentation Guide

This guide walks you through creating and training a semantic segmentation model with AnyLearning. Semantic segmentation is a computer vision task in which every pixel of an image is assigned to a specific class (a foreground class). For example, you might segment road markings, vehicles, pedestrians, and buildings in street scene images for autonomous driving applications. There is also an implicit class called background: any pixel that does not belong to a foreground class is considered background.
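Conceptually, the training target for each image is a 2-D array of class IDs with the same height and width as the image. The toy NumPy snippet below illustrates this, assuming the common convention that background is encoded as class 0 and foreground classes start at 1 (the class names are hypothetical):

import numpy as np

# 4x4 toy mask: 0 = implicit background, 1 = "vehicle", 2 = "pedestrian"
mask = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 0, 0, 0],
    [2, 2, 0, 0],
])

# Every pixel not assigned to a foreground class remains background
print((mask == 0).sum(), "background pixels out of", mask.size)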

🚀 Step 1: Create a Project

First, create a new project specifically for semantic segmentation:

  1. Click on the “New Project” button
  2. Select “Image Segmentation” as the project type
  3. Provide a meaningful project name and description

Project creation

📊 Step 2: Data Preparation

🏷️ 2.1. Create the Label Set

The label set defines all possible pixel classes that your model will learn to segment. For example, in a microscopy particle segmentation project, your label set might contain a single class, particle.

To create your label set:

  1. Navigate to the “Overview” tab
  2. Enter each class name individually in the input field
  3. Click “+” after each class name
  4. Ensure class names are descriptive and consistent

Create label

💡 Convenient tip: If you already have an annotated dataset, you can let AnyLearning infer the label set for you instead of creating the labels manually. Please see section 2.2 for more details.

📁 2.2. Upload the Datasets

Go to the “Dataset” tab to manage your datasets.

For effective model training, split your data into three sets:

  • Training set: The largest portion (typically 70-80%) used to train the model
  • Validation set: A smaller portion (typically 10-15%) used to tune hyperparameters and prevent overfitting
  • Test set: The remaining portion (typically 10-15%) used to evaluate the final model performance
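If your images currently live in a single folder, the split can be scripted before uploading. The snippet below is a minimal sketch using an 80/10/10 split and hypothetical folder names (images/, train/, val/, test/); adjust the paths and ratios to your project, and note that it also copies each image's LabelMe .json if one exists:

import random
import shutil
from pathlib import Path

random.seed(42)
files = sorted(Path("images").glob("*.jpg"))   # hypothetical source folder
random.shuffle(files)

n = len(files)
splits = {
    "train": files[:int(0.8 * n)],
    "val": files[int(0.8 * n):int(0.9 * n)],
    "test": files[int(0.9 * n):],
}

for name, subset in splits.items():
    out_dir = Path(name)
    out_dir.mkdir(exist_ok=True)
    for img in subset:
        shutil.copy(img, out_dir / img.name)
        ann = img.with_suffix(".json")          # copy the LabelMe annotation if present
        if ann.exists():
            shutil.copy(ann, out_dir / ann.name)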

Upload Process:

  1. Navigate to the “Dataset” tab
  2. Each training/validation/test folder should contain the images for that set. If your dataset is already labeled, include the corresponding JSON annotations in LabelMe format alongside the images; see (1) for an example. Zip and upload each of the training/validation/test sets individually (a short zipping sketch follows these steps). To infer the label set automatically from the labeled data instead of going through step 2.1, tick the “Auto create categories” checkbox before uploading the zip files.
  3. Use the respective upload buttons for each set
  4. Wait for the upload and verification process to complete
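For reference, each set can be zipped with a few lines of Python. A hypothetical train/ folder containing the images and, optionally, their LabelMe .json files is assumed; check the trial dataset below if you are unsure how the archive should be laid out:

import shutil

# Produces train.zip containing the files inside the train/ folder;
# repeat for the val/ and test/ folders
shutil.make_archive("train", "zip", root_dir="train")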

Trial Dataset: A sample dataset is available here.

💡 Important: Ensure masks are correctly labeled, with each pixel corresponding to a class in your label set.

Dataset upload

(1) LabelMe JSON annotation format example:

{
  "version": "4.2.10",
  "flags": {},
  "shapes": [
    {
      "label": "cat",
      "points": [
        [100, 100],
        [150, 100],
        [150, 150],
        [100, 150]
      ],
      "group_id": null,
      "shape_type": "polygon",
      "flags": {}
    },
    {
      "label": "dog",
      "points": [
        [200, 200],
        [300, 200],
        [300, 300],
        [200, 300]
      ],
      "group_id": null,
      "shape_type": "polygon",
      "flags": {}
    }
  ],
  "imagePath": "example.jpg",
  "imageData": null,
  "imageHeight": 400,
  "imageWidth": 400
}
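To see how these polygon annotations relate to the per-pixel masks the model learns from, the sketch below rasterizes a LabelMe file into a class-ID mask with OpenCV. The label-to-ID mapping is hypothetical (in AnyLearning it comes from your label set), and unannotated pixels stay 0, i.e. background:

import json

import cv2
import numpy as np

LABEL_TO_ID = {"cat": 1, "dog": 2}  # hypothetical mapping; 0 is the implicit background

with open("example.json", "r") as f:
    ann = json.load(f)

mask = np.zeros((ann["imageHeight"], ann["imageWidth"]), dtype=np.uint8)
for shape in ann["shapes"]:
    if shape["shape_type"] != "polygon":
        continue
    points = np.array(shape["points"], dtype=np.int32)
    cv2.fillPoly(mask, [points], LABEL_TO_ID[shape["label"]])

cv2.imwrite("example_mask.png", mask)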

🏷️ 2.3. Label the Data

If your dataset lacks annotations, you can use the built-in labeling tool to label your data manually:

  1. Click the “Label Now” button on the dataset tab
  2. Use the polygon tool to draw masks around objects
  3. Select the appropriate class label for each painted region
  4. Repeat for all images in your dataset

Labeling Tips:

  • Zoom in for fine-grained labeling
  • Be consistent in your labeling approach (apply the same foreground/background rules across all images)
  • Use keyboard shortcuts to speed up labeling

💡 Pro Tip: For large datasets, consider using pre-trained models for auto-labeling and then refining the masks manually to save time.
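As an illustration of that tip (not an AnyLearning feature), the sketch below runs torchvision's VOC-pretrained DeepLabV3 to produce draft masks you can refine in the labeling tool. It only helps if the pretrained classes overlap with yours, and the image path is hypothetical:

import torch
from PIL import Image
from torchvision.models.segmentation import deeplabv3_resnet50, DeepLabV3_ResNet50_Weights

weights = DeepLabV3_ResNet50_Weights.DEFAULT
model = deeplabv3_resnet50(weights=weights).eval()
preprocess = weights.transforms()

image = Image.open("street.jpg").convert("RGB")   # hypothetical image
inp = preprocess(image).unsqueeze(0)

with torch.no_grad():
    logits = model(inp)["out"]        # [1, 21, H, W]: 20 VOC classes + background
draft_mask = logits.argmax(1)[0]      # per-pixel class IDs to refine manually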

Labeling

🔧 Step 3: Model Training

Training Configuration:

  1. Go to the “Training” tab
  2. Click “New Training Session”
  3. Configure the following hyperparameters:
    • Batch size: Number of images processed together (typically 8 or more for segmentation)
    • Learning rate: Controls how much the model adjusts its weights (typically 0.001 or 0.0001)
    • Epochs: The number of passes the model makes over the entire training dataset
    • Model Variant: Choose the model architecture
    • Pretrained: Choose default or fine-tune from a pre-trained model
  4. Click “Start Training” to begin the process

Create a new training

Monitor Training Progress:

  • View all training sessions in the “Training” tab
  • Click on any session to see detailed information

Create a new training

Training Metrics and Logs:

  • 📈 Monitor loss values (segmentation loss)
  • ✅ Check IoU (Intersection over Union) metrics
  • 📝 View training logs for detailed progress information
  • ⚠️ Watch for signs of overfitting (validation loss increasing while training loss keeps decreasing)

View training logs
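For reference, per-class IoU compares the predicted and ground-truth masks one class at a time. A minimal NumPy version (assuming both masks contain integer class IDs) looks like this:

import numpy as np

def per_class_iou(pred, target, num_classes):
    """Return the IoU of each class ID; NaN when a class is absent from both masks."""
    ious = []
    for c in range(num_classes):
        intersection = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        ious.append(float("nan") if union == 0 else intersection / union)
    return ious

# Toy example with 3 classes (0 = background)
pred = np.array([[0, 1], [2, 2]])
target = np.array([[0, 1], [2, 0]])
print(per_class_iou(pred, target, num_classes=3))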

🧪 Step 4: Test the Trained Model

After training completes, validate your model’s performance:

  1. Go to the “Model” tab
  2. Click the “Try” button
  3. Upload test images that weren’t used in training
  4. Analyze the model’s predictions

Go to model

The model will display predictions as segmentation masks overlaid on the input images, highlighting each class with a distinct color.

Make prediction

📦 Step 5: Export the model and use with your code

Click the download button to download the trained model. You can choose either the raw PyTorch model or its ONNX conversion. Inference code for both options is shown below.

Export models

5.1. Raw (PyTorch) model usage

  • Install the necessary libraries:
pip install numpy torch torchvision Pillow opencv-python pyyaml
  • Run the code:
import yaml
import json
 
import cv2
import numpy as np
from PIL import Image
import torch
from torchvision import transforms
 
CONFIG_PATH = "/path/to/config.yml"
MODEL_PATH = "/path/to/model.pth"
IMAGE_PATH = "/path/to/image.png"
 
 
with open(CONFIG_PATH, "r") as f:
    config = yaml.safe_load(f)
 
def hex_to_rgb(hex_color: str) -> tuple:
    hex_color = hex_color.lstrip('#')
    return tuple(int(hex_color[i:i+2], 16) for i in range(0, len(hex_color), 2))
 
def get_transformations(img_size, mean, std):
    transform = transforms.Compose(
        [
            transforms.Resize((img_size, img_size)),
            transforms.ToTensor(),
            transforms.Normalize(mean, std),
        ]
    )
    return transform
 
label_set = config["data"]["label_set"]
label_id_to_name = {v["id"]: v["name"] for v in label_set}
label_id_to_color = {v["id"]: hex_to_rgb(v["color"]) for v in label_set}
 
model = torch.load(MODEL_PATH, map_location="cpu")  # load the full model object on CPU
model.eval()
transform = get_transformations(config["data"]["img_size"], 
                                mean=config["data"]["normalize"]["mean"],
                                std=config["data"]["normalize"]["std"])
 
pil_image = Image.open(IMAGE_PATH).convert('RGB')
inp = transform(pil_image)
inp = inp.unsqueeze(0)
    
with torch.no_grad():
    pred_mask = model(inp)
    pred_mask = torch.argmax(pred_mask, dim=1)[0]
 
# Visualize
np_image = np.array(pil_image)
pred_mask = pred_mask.cpu().numpy()
pred_mask = pred_mask.astype(np.uint8)
pred_mask = cv2.resize(pred_mask, (np_image.shape[1], np_image.shape[0]), 
                        interpolation=cv2.INTER_NEAREST)
 
visualization_image = np.zeros_like(np_image)
for label_id, color in label_id_to_color.items():
    visualization_image[pred_mask == label_id] = color
 
# Blend with original image
alpha = 0.5
visualization_image = cv2.addWeighted(
    np_image, alpha, visualization_image, 1-alpha, 0
)
 
# save the visualization image (convert RGB to BGR so OpenCV writes correct colors)
cv2.imwrite("visualization.png", cv2.cvtColor(visualization_image, cv2.COLOR_RGB2BGR))
 
# formatted predictions, to polygon format
formatted_predictions = []
for label_id, name in label_id_to_name.items():
    mask = (pred_mask == label_id).astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    all_polygons = []
    for contour in contours:
        if cv2.contourArea(contour) > 5: # filter out small polygons (noise)
            all_polygons.append(contour.flatten().tolist())
    if all_polygons:
        formatted_predictions.append({
            "id": label_id,
            "label": name,
            "color": label_id_to_color[label_id],
            "points": all_polygons,
            "type": "polygon"
        })
 
# save the formatted predictions
with open("formatted_predictions.json", "w") as f:
    json.dump(formatted_predictions, f)

5.2. Exported (ONNX) model usage

  • Install the necessary libraries:
pip install numpy torch torchvision onnxruntime Pillow opencv-python pyyaml
  • Run the code:
import yaml
import json
import onnxruntime
import cv2
import numpy as np
from PIL import Image
from torchvision import transforms
 
CONFIG_PATH = "/path/to/config.yml"
MODEL_PATH = "/path/to/model.onnx"
IMAGE_PATH = "/path/to/image.png"
 
with open(CONFIG_PATH, "r") as f:
    config = yaml.safe_load(f)
 
def hex_to_rgb(hex_color: str) -> tuple:
    hex_color = hex_color.lstrip('#')
    return tuple(int(hex_color[i:i+2], 16) for i in range(0, len(hex_color), 2))
 
def get_transformations(img_size, mean, std):
    transform = transforms.Compose(
        [
            transforms.Resize((img_size, img_size)),
            transforms.ToTensor(),
            transforms.Normalize(mean, std),
        ]
    )
    return transform
 
label_set = config["data"]["label_set"]
label_id_to_name = {v["id"]: v["name"] for v in label_set}
label_id_to_color = {v["id"]: hex_to_rgb(v["color"]) for v in label_set}
 
# Initialize ONNX Runtime session
session = onnxruntime.InferenceSession(MODEL_PATH)
input_name = session.get_inputs()[0].name
 
transform = get_transformations(config["data"]["img_size"], 
                              mean=config["data"]["normalize"]["mean"],
                              std=config["data"]["normalize"]["std"])
 
pil_image = Image.open(IMAGE_PATH).convert('RGB')
inp = transform(pil_image)
inp = inp.unsqueeze(0).numpy()  # Convert to numpy array for ONNX Runtime
 
# Run inference with ONNX Runtime
pred_mask = session.run(None, {input_name: inp})[0]
pred_mask = np.argmax(pred_mask, axis=1)[0]
 
# Visualize
np_image = np.array(pil_image)
pred_mask = pred_mask.astype(np.uint8)
pred_mask = cv2.resize(pred_mask, (np_image.shape[1], np_image.shape[0]), 
                      interpolation=cv2.INTER_NEAREST)
 
visualization_image = np.zeros_like(np_image)
for label_id, color in label_id_to_color.items():
    visualization_image[pred_mask == label_id] = color
 
# Blend with original image
alpha = 0.5
visualization_image = cv2.addWeighted(
    np_image, alpha, visualization_image, 1-alpha, 0
)
 
# save the visualization image (convert RGB to BGR so OpenCV writes correct colors)
cv2.imwrite("visualization.png", cv2.cvtColor(visualization_image, cv2.COLOR_RGB2BGR))
 
# formatted predictions, to polygon format
formatted_predictions = []
for label_id, name in label_id_to_name.items():
    mask = (pred_mask == label_id).astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    all_polygons = []
    for contour in contours:
        if cv2.contourArea(contour) > 5:  # filter out small polygons (noise)
            all_polygons.append(contour.flatten().tolist())
    if all_polygons:
        formatted_predictions.append({
            "id": label_id,
            "label": name,
            "color": label_id_to_color[label_id],
            "points": all_polygons,
            "type": "polygon"
        })
 
# save the formatted predictions
with open("formatted_predictions.json", "w") as f:
    json.dump(formatted_predictions, f)