Handpose Classification Guide
This guide walks you through creating and training a hand pose classification model using AnyLearning. Hand pose classification is a key computer vision task where the goal is to identify and categorize specific hand gestures or poses into predefined classes. For example, you might want to classify hand poses for sign language recognition, gaming controls, or touchless device interaction. This technique is widely applicable in areas like assistive technology, virtual reality, and human-computer interaction.
🚀 Step 1: Create a Project
First, create a new project for handpose classification:
- Click the "New Project" button
- Select "Handpose Classification" as the project type
- Give your project a meaningful name and description
📊 Step 2: Data Preparation
🏷️ 2.1. Create the Label Set
The label set defines all possible classes that your model will learn to distinguish between. For example, in an American Sign Language recognition project, your classes might range from "A" to "Z".
To create your label set:
- Navigate to the "Overview" tab
- Enter each class name individually in the input field
- Click "+" after each class name
- Ensure class names are descriptive and consistent
📁 2.2. Upload the Datasets
Go to the "Dataset" tab to manage your datasets.
For effective model training, you need to split your data into three sets:
- Training set: The largest portion (typically 70-80%) used to train the model
- Validation set: A smaller portion (typically 10-15%) used to tune hyperparameters and prevent overfitting
- Test set: The remaining portion (typically 10-15%) used to evaluate the final model performance
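If your images are currently organized as one folder per class, a short script can create this split before you upload anything. The following is a minimal sketch, assuming a hypothetical `dataset/<class_name>/*.jpg` layout and roughly the 80/10/10 ratios above; adjust the paths, file extensions, and ratios to your own data.

```python
import random
import shutil
from pathlib import Path

# Hypothetical input layout: dataset/<class_name>/*.jpg
SOURCE_DIR = Path("dataset")
OUTPUT_DIR = Path("split_dataset")
RATIOS = {"training": 0.8, "validation": 0.1}  # the remainder goes to the test set

random.seed(42)  # make the split reproducible

for class_dir in SOURCE_DIR.iterdir():
    if not class_dir.is_dir():
        continue
    images = sorted(class_dir.glob("*.jpg"))
    random.shuffle(images)

    n_train = int(len(images) * RATIOS["training"])
    n_val = int(len(images) * RATIOS["validation"])
    splits = {
        "training": images[:n_train],
        "validation": images[n_train:n_train + n_val],
        "test": images[n_train + n_val:],
    }

    # Copy each image into split_dataset/<split>/<class_name>/
    for split_name, split_images in splits.items():
        target_dir = OUTPUT_DIR / split_name / class_dir.name
        target_dir.mkdir(parents=True, exist_ok=True)
        for image_path in split_images:
            shutil.copy2(image_path, target_dir / image_path.name)
```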
Upload Process:
- Navigate to the "Dataset" tab
- Compress each main folder (training, validation, test) into a separate zip file (a sketch is shown after this list)
- Use the respective upload buttons for each set
- Wait for the upload and verification process to complete
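To produce the three zip files, you can archive each split folder with a couple of lines of Python (any archiving tool works just as well). This sketch assumes the `split_dataset` folders produced by the split script above:

```python
import shutil
from pathlib import Path

OUTPUT_DIR = Path("split_dataset")  # hypothetical folder from the split sketch above

for split_name in ("training", "validation", "test"):
    # Creates training.zip, validation.zip and test.zip, each containing the
    # corresponding folder; adjust root_dir/base_dir if a different internal
    # layout is expected.
    shutil.make_archive(split_name, "zip", root_dir=OUTPUT_DIR, base_dir=split_name)
```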
Trial Dataset: We have prepared a trial dataset to get you started. You can download it from here.
💡 Important: Use different images for training, validation, and testing to ensure accurate model evaluation.
🔧 Step 3: Model Training
Training Configuration:
- Go to the "Training" tab
- Click "New Training Session"
- Configure the following hyperparameters (a typical starting configuration is sketched after this list):
  - Batch size: Number of images processed together (typically 32 or 64)
  - Learning rate: Controls how much the model adjusts its weights (typically 0.001)
  - Epochs: How many times the model will see the entire dataset
  - Model Variant: Choose the model architecture, balancing speed and accuracy
  - Pretrained: Choose the default weights or fine-tune from a pre-trained model
- Click "Start Training" to begin the process
Monitor Training Progress:
- View all training sessions in the "Training" tab
- Click on any session to see detailed information
Training Metrics and Logs:
- 📈 Monitor loss values to ensure the model is learning
- ✅ Check accuracy metrics on validation data
- 📝 View training logs for detailed progress information
- ⚠️ Watch for signs of overfitting (validation metrics getting worse)
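If you copy the per-epoch metrics out of the training logs, a quick comparison of training and validation loss makes overfitting easy to spot: training loss keeps falling while validation loss bottoms out and starts rising. A minimal sketch with made-up numbers:

```python
# Hypothetical per-epoch losses taken from the training logs.
train_loss = [2.10, 1.40, 0.95, 0.60, 0.42, 0.30, 0.22, 0.17]
val_loss   = [2.05, 1.45, 1.05, 0.80, 0.74, 0.78, 0.85, 0.93]

# Epoch with the lowest validation loss (0-indexed here, printed 1-indexed).
best_epoch = min(range(len(val_loss)), key=lambda i: val_loss[i])
print(f"Best validation loss: {val_loss[best_epoch]:.2f} at epoch {best_epoch + 1}")

# Validation loss climbing after its minimum while training loss keeps falling
# is the classic sign of overfitting.
if val_loss[-1] > val_loss[best_epoch]:
    print("Validation loss is rising again -- the model may be overfitting.")
```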
🧪 Step 4: Test the Trained Model
After training completes, validate your model's performance:
- Go to the "Model" tab
- Click the "Try" button
- Upload test images that weren't used in training
- Analyze the model's predictions
The model will display its prediction along with confidence scores for each class.
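These confidence scores are typically obtained by applying softmax to the model's raw outputs, as the export code in the next step does. As a small illustration of how a single prediction can be read (the class names and logit values here are made up):

```python
import numpy as np

def top_predictions(logits, class_names, k=3):
    """Turn raw model outputs into the k most confident (class, probability) pairs."""
    exp = np.exp(logits - np.max(logits))  # numerically stable softmax
    probs = exp / exp.sum()
    order = np.argsort(probs)[::-1][:k]
    return [(class_names[i], float(probs[i])) for i in order]

# Hypothetical raw outputs for a three-class example.
print(top_predictions(np.array([2.3, 0.1, -1.2]), ["A", "B", "C"]))
```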
📦 Step 5: Export the model and use with your code
Click the download button to download the trained model. You can choose the raw PyTorch model or its ONNX conversion. The inference code is shown below.
5.1. Exported (ONNX) model usage
- Install the necessary libraries:
pip install numpy torch onnxruntime opencv-python mediapipe
- Run the code:
import onnxruntime as ort
import mediapipe as mp
import numpy as np
import cv2
import torch
MODEL_PATH = "exported_model.onnx"
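# hand_landmarker.task is MediaPipe's hand landmark detector model file;
# download it separately from the MediaPipe model assets.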
HAND_LANDMARK_MODEL_DIR = "hand_landmarker.task"
CLASS_NAMES = [
"A", "B", "C", "D", "E", "F", "G", "H", "I", "J",
"K", "L", "M", "N", "O", "P", "Q", "R", "S", "T",
"U", "V", "W", "X", "Y", "Z"
]
IMAGE_PATH = "test_image.jpeg"
def softmax(x):
    max_x = np.max(x, axis=0)
    return np.exp(x - max_x) / np.sum(np.exp(x - max_x), axis=0)
# Load the ONNX model
ort_session = ort.InferenceSession(MODEL_PATH)
# Load Hand Landmarks Detection model
BaseOptions = mp.tasks.BaseOptions
HandLandmarkerOptions = mp.tasks.vision.HandLandmarkerOptions
VisionRunningMode = mp.tasks.vision.RunningMode
HandLandmarker = mp.tasks.vision.HandLandmarker
options = HandLandmarkerOptions(
    base_options=BaseOptions(model_asset_path=HAND_LANDMARK_MODEL_DIR),
    running_mode=VisionRunningMode.IMAGE,
)
detector = HandLandmarker.create_from_options(options)
# Read and preprocess image
img = cv2.imread(IMAGE_PATH)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
mp_image = mp.Image(
    image_format=mp.ImageFormat.SRGB, data=np.array(img)
)
# Detect landmarks
landmarks_result = detector.detect(mp_image)
if len(landmarks_result.hand_landmarks) > 0:
    landmarks_list = []
    for landmark in landmarks_result.hand_landmarks[0]:
        landmarks_list.append([landmark.x, landmark.y, landmark.z])
    landmarks_tensor = torch.Tensor(landmarks_list)
    landmarks_tensor = landmarks_tensor.reshape(1, -1).numpy()

    # Run inference
    outputs = ort_session.run(None, {ort_session.get_inputs()[0].name: landmarks_tensor})
    output = softmax(outputs[0][0])
    result = CLASS_NAMES[np.argmax(output)]
    print(result)
else:
    print("Cannot classify handpose because no landmarks detected")