Handpose Classification Guide
This guide walks you through creating and training a hand pose classification model using AnyLearning. Hand pose classification is a key computer vision task where the goal is to identify and categorize specific hand gestures or poses into predefined classes. For example, you might want to classify hand poses for sign language recognition, gaming controls, or touchless device interaction. This technique is widely applicable in areas like assistive technology, virtual reality, and human-computer interaction.
🚀 Step 1: Create a Project
First, create a new project for handpose classification:
- Click the "New Project" button
- Select "Handpose Classification" as the project type
- Give your project a meaningful name and description
📊 Step 2: Data Preparation
🏷️ 2.1. Create the Label Set
The label set defines all possible classes that your model will learn to distinguish between. For example, in an American Sign Language recognition project, your classes might range from "A" to "Z".
To create your label set:
- Navigate to the "Overview" tab
- Enter each class name individually in the input field
- Click "+" after each class name
- Ensure class names are descriptive and consistent
📁 2.2. Upload the Datasets
Go to the "Dataset" tab to manage your datasets.
For effective model training, you need to split your data into three sets:
- Training set: The largest portion (typically 70-80%) used to train the model
- Validation set: A smaller portion (typically 10-15%) used to tune hyperparameters and prevent overfitting
- Test set: The remaining portion (typically 10-15%) used to evaluate the final model performance
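If your images are currently organized as one folder per class, a short script can create this split before you upload anything. The following is a minimal sketch, assuming a hypothetical `dataset/<class_name>/*.jpg` layout and roughly the 80/10/10 ratios above; adjust the paths, file extensions, and ratios to your own data.

```python
import random
import shutil
from pathlib import Path

# Hypothetical input layout: dataset/<class_name>/*.jpg
SOURCE_DIR = Path("dataset")
OUTPUT_DIR = Path("split_dataset")
RATIOS = {"training": 0.8, "validation": 0.1}  # the remainder goes to the test set

random.seed(42)  # make the split reproducible

for class_dir in SOURCE_DIR.iterdir():
    if not class_dir.is_dir():
        continue
    images = sorted(class_dir.glob("*.jpg"))
    random.shuffle(images)

    n_train = int(len(images) * RATIOS["training"])
    n_val = int(len(images) * RATIOS["validation"])
    splits = {
        "training": images[:n_train],
        "validation": images[n_train:n_train + n_val],
        "test": images[n_train + n_val:],
    }

    # Copy each image into split_dataset/<split>/<class_name>/
    for split_name, split_images in splits.items():
        target_dir = OUTPUT_DIR / split_name / class_dir.name
        target_dir.mkdir(parents=True, exist_ok=True)
        for image_path in split_images:
            shutil.copy2(image_path, target_dir / image_path.name)
```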
Upload Process:
- Navigate to the "Dataset" tab
- Compress each main folder (training, validation, test) into a separate zip file (a sketch is shown after this list)
- Use the respective upload buttons for each set
- Wait for the upload and verification process to complete
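To produce the three zip files, you can archive each split folder with a couple of lines of Python (any archiving tool works just as well). This sketch assumes the `split_dataset` folders produced by the split script above:

```python
import shutil
from pathlib import Path

OUTPUT_DIR = Path("split_dataset")  # hypothetical folder from the split sketch above

for split_name in ("training", "validation", "test"):
    # Creates training.zip, validation.zip and test.zip, each containing the
    # corresponding folder; adjust root_dir/base_dir if a different internal
    # layout is expected.
    shutil.make_archive(split_name, "zip", root_dir=OUTPUT_DIR, base_dir=split_name)
```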
Trial Dataset: We have prepared a trial dataset to get you started. You can download it from here.
💡 Important: Use different images for training, validation, and testing to ensure accurate model evaluation.
🔧 Step 3: Model Training
Training Configuration:
- Go to the "Training" tab
- Click "New Training Session"
- Configure the following hyperparameters (a typical starting configuration is sketched after this list):
  - Batch size: Number of images processed together (typically 32 or 64)
  - Learning rate: Controls how much the model adjusts its weights (typically 0.001)
  - Epochs: How many times the model will see the entire dataset
  - Model Variant: Choose the model architecture, balancing speed and accuracy
  - Pretrained: Choose the default weights or fine-tune from a pre-trained model
- Click "Start Training" to begin the process
Monitor Training Progress:
- View all training sessions in the "Training" tab
- Click on any session to see detailed information
Training Metrics and Logs:
- 📈 Monitor loss values to ensure the model is learning
- ✅ Check accuracy metrics on validation data
- 📝 View training logs for detailed progress information
- ⚠️ Watch for signs of overfitting (validation metrics getting worse)
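If you copy the per-epoch metrics out of the training logs, a quick comparison of training and validation loss makes overfitting easy to spot: training loss keeps falling while validation loss bottoms out and starts rising. A minimal sketch with made-up numbers:

```python
# Hypothetical per-epoch losses taken from the training logs.
train_loss = [2.10, 1.40, 0.95, 0.60, 0.42, 0.30, 0.22, 0.17]
val_loss   = [2.05, 1.45, 1.05, 0.80, 0.74, 0.78, 0.85, 0.93]

# Epoch with the lowest validation loss (0-indexed here, printed 1-indexed).
best_epoch = min(range(len(val_loss)), key=lambda i: val_loss[i])
print(f"Best validation loss: {val_loss[best_epoch]:.2f} at epoch {best_epoch + 1}")

# Validation loss climbing after its minimum while training loss keeps falling
# is the classic sign of overfitting.
if val_loss[-1] > val_loss[best_epoch]:
    print("Validation loss is rising again -- the model may be overfitting.")
```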
🧪 Step 4: Test the Trained Model
After training completes, validate your model's performance:
- Go to the "Model" tab
- Click the "Try" button
- Upload test images that weren't used in training
- Analyze the model's predictions
The model will display its prediction along with confidence scores for each class.
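These confidence scores are typically obtained by applying softmax to the model's raw outputs, as the export code in the next step does. As a small illustration of how a single prediction can be read (the class names and logit values here are made up):

```python
import numpy as np

def top_predictions(logits, class_names, k=3):
    """Turn raw model outputs into the k most confident (class, probability) pairs."""
    exp = np.exp(logits - np.max(logits))  # numerically stable softmax
    probs = exp / exp.sum()
    order = np.argsort(probs)[::-1][:k]
    return [(class_names[i], float(probs[i])) for i in order]

# Hypothetical raw outputs for a three-class example.
print(top_predictions(np.array([2.3, 0.1, -1.2]), ["A", "B", "C"]))
```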
📦 Step 5: Export the model and use with your code
Click the download button to download the trained model. You can choose the raw PyTorch model or its ONNX conversion. The inference code is shown below.
5.1. Exported (ONNX) model usage
- Install the necessary libraries:
pip install numpy torch onnxruntime opencv-python mediapipe
- Run the code:
import onnxruntime as ort
import mediapipe as mp
import numpy as np
import cv2
import torch
MODEL_PATH = "exported_model.onnx"
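# hand_landmarker.task is MediaPipe's hand landmark detector model file;
# download it separately from the MediaPipe model assets.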
HAND_LANDMARK_MODEL_DIR = "hand_landmarker.task"
CLASS_NAMES = [
"A", "B", "C", "D", "E", "F", "G", "H", "I", "J",
"K", "L", "M", "N", "O", "P", "Q", "R", "S", "T",
"U", "V", "W", "X", "Y", "Z"
]
IMAGE_PATH = "test_image.jpeg"
def softmax(x):
    max_x = np.max(x, axis=0)
    return np.exp(x - max_x) / np.sum(np.exp(x - max_x), axis=0)
# Load the ONNX model
ort_session = ort.InferenceSession(MODEL_PATH)
# Load Hand Landmarks Detection model
BaseOptions = mp.tasks.BaseOptions
HandLandmarkerOptions = mp.tasks.vision.HandLandmarkerOptions
VisionRunningMode = mp.tasks.vision.RunningMode
HandLandmarker = mp.tasks.vision.HandLandmarker
options = HandLandmarkerOptions(
    base_options=BaseOptions(model_asset_path=HAND_LANDMARK_MODEL_DIR),
    running_mode=VisionRunningMode.IMAGE,
)
detector = HandLandmarker.create_from_options(options)
# Read and preprocess image
img = cv2.imread(IMAGE_PATH)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
mp_image = mp.Image(
    image_format=mp.ImageFormat.SRGB, data=np.array(img)
)
# Detect landmarks
landmarks_result = detector.detect(mp_image)
if len(landmarks_result.hand_landmarks) > 0:
    landmarks_list = []
    for landmark in landmarks_result.hand_landmarks[0]:
        landmarks_list.append([landmark.x, landmark.y, landmark.z])
    landmarks_tensor = torch.Tensor(landmarks_list)
    landmarks_tensor = landmarks_tensor.reshape(1, -1).numpy()

    # Run inference
    outputs = ort_session.run(None, {ort_session.get_inputs()[0].name: landmarks_tensor})
    output = softmax(outputs[0][0])
    result = CLASS_NAMES[np.argmax(output)]
    print(result)
else:
    print("Cannot classify handpose because no landmarks detected")