Avoid These Mistakes When You Create a Custom Dataset for Object Detection

Ashani Sansala Kodithuwakku
4 min read · Jun 14, 2024


Image by jannoon028 on Freepik

Introduction

In the field of computer vision, object detection has become a cornerstone for applications ranging from autonomous vehicles to medical imaging. However, the success of these applications heavily relies on the quality of the datasets used for training the models.

Creating a custom dataset for object detection is a critical task that requires meticulous attention to detail. Even minor mistakes can lead to significant setbacks, affecting the accuracy and reliability of our model.

In this article, we will explore the common pitfalls to avoid when creating a custom dataset for object detection.

Class Imbalance

Class imbalance is a common challenge that can significantly impact the performance of our object detection model. It occurs when the distribution of object instances across classes is highly skewed, with some classes having a large number of instances while others have very few.

In an imbalanced dataset, the model tends to become biased towards the majority classes, leading to poor performance on the minority classes. This can be particularly problematic in real-world scenarios where it’s crucial to detect rare or uncommon objects accurately.

However, there are ways to tackle this problem. We can gather more data, assign weights to the classes in the loss function, or augment the minority classes by applying transformations such as rotation, flipping, scaling, or adding noise to generate additional synthetic instances.
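A useful first step is simply to count how many instances of each class the annotations contain and derive inverse-frequency class weights from those counts. Here is a minimal sketch, assuming YOLO-format label files (one "class_id x_center y_center width height" line per object) stored in a labels/ directory; the directory layout is an assumption, so adjust it to your own dataset.

```python
from collections import Counter
from pathlib import Path

def class_counts(label_dir="labels"):
    # Count object instances per class across all YOLO-format label files.
    counts = Counter()
    for label_file in Path(label_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            if line.strip():
                counts[int(line.split()[0])] += 1
    return counts

def inverse_frequency_weights(counts):
    # Weight each class by the inverse of its frequency, so rare classes
    # contribute more to the loss during training.
    total = sum(counts.values())
    n_classes = len(counts)
    return {c: total / (n_classes * n) for c, n in counts.items()}

counts = class_counts()
print(counts)
print(inverse_frequency_weights(counts))
```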

Oversampling Frames from a Video

Including too many frames from the same video can lead to a dataset that lacks diversity and doesn’t generalize well to new data.

Models trained on oversampled frames may memorize the specific scenarios from the video rather than learning general features.

The dataset may become biased towards the specific conditions (lighting, background, object positions) present in the oversampled video. So, the model may not perform well in varied real-world conditions because it has not been exposed to enough different scenarios during training.

For example, let’s consider training a pedestrian detection model with thousands of frames from a single surveillance camera feed. The model learns to detect pedestrians in that specific scene but fails in different environments.
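Instead of dumping every frame, we can sample at a fixed interval so that a single video does not dominate the dataset. Below is a minimal sketch using OpenCV; the file names and step size are placeholders to tune for your own footage.

```python
import cv2
from pathlib import Path

def sample_frames(video_path, out_dir, step=30):
    # Save only one frame every `step` frames from the video.
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    index, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            cv2.imwrite(f"{out_dir}/frame_{index:06d}.jpg", frame)
            saved += 1
        index += 1
    cap.release()
    return saved

print(sample_frames("camera_feed.mp4", "frames", step=30))
```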

Missing or Incorrect Labels

Missing labels occur when some objects in an image are not annotated. Consider an image with multiple cars where only a few are labeled: the model may learn to treat the unlabeled cars as background (a negative class), which reduces its accuracy.

Incorrect labels occur when annotations are inaccurate or assigned to the wrong class, for example when an object is boxed correctly but given the wrong class name.

Models trained on data with missing or incorrect labels learn misleading patterns, leading to higher error rates and severely degraded performance.
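A quick audit script can catch some of these issues early by flagging images with missing or empty label files and out-of-range class ids. Here is a rough sketch, again assuming a YOLO-style layout with images/ and labels/ directories; the paths, image extension, and class count are placeholders.

```python
from pathlib import Path

def audit_labels(image_dir="images", label_dir="labels", num_classes=3):
    # Flag images with no label file, an empty label file,
    # or a class id outside the expected range.
    issues = []
    for image in Path(image_dir).glob("*.jpg"):
        label = Path(label_dir) / (image.stem + ".txt")
        if not label.exists() or not label.read_text().strip():
            issues.append((image.name, "missing or empty label"))
            continue
        for line in label.read_text().splitlines():
            if not line.strip():
                continue
            class_id = int(line.split()[0])
            if not 0 <= class_id < num_classes:
                issues.append((image.name, f"unexpected class id {class_id}"))
    return issues

for name, problem in audit_labels():
    print(name, "->", problem)
```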

Training-Validation Overlap

The training set should be kept separate from the validation set. If we use the same images in both sets, we encourage our model to memorize them.

The model will then show good accuracy on the validation set, but that accuracy is misleading: it tells us nothing about performance on unseen data. The validation set should be completely disjoint from the training set.
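One simple safeguard is to hash every image in each split and report files that appear in both. A minimal sketch follows; the split directory names are placeholders, and note that exact hashing only catches byte-identical duplicates, so near-identical frames from the same video still need a manual check.

```python
import hashlib
from pathlib import Path

def file_hashes(directory):
    # Map each image's content hash to its file name.
    return {
        hashlib.md5(p.read_bytes()).hexdigest(): p.name
        for p in Path(directory).glob("*.jpg")
    }

train = file_hashes("dataset/train/images")
val = file_hashes("dataset/val/images")
overlap = set(train) & set(val)

print(f"{len(overlap)} images appear in both splits")
for h in overlap:
    print(train[h], "==", val[h])
```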

Inconsistent Labeling

There are no fixed guidelines for how much of an object must be visible before it counts as an object to annotate. What matters is consistency: for example, if we decide that an object is annotated whenever at least 50% of it is visible, we should stick to that rule throughout the dataset.

Whatever assumptions we make when annotating, we should apply them in all cases.

We also need to understand what should be annotated in the first place. Annotate only the objects the model needs to detect and avoid including irrelevant objects.
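Consistency rules like a visibility threshold are hard to enforce automatically, but a small script can at least flag suspicious annotations, such as unusually small boxes, for manual review. A rough sketch, again assuming YOLO-format label files; the area threshold is an arbitrary example, not a recommendation.

```python
from pathlib import Path

def flag_tiny_boxes(label_dir="labels", min_area=0.001):
    # Flag boxes whose normalized area falls below the threshold so a
    # human can check whether they follow the dataset's labeling rules.
    flagged = []
    for label_file in Path(label_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            parts = line.split()
            if len(parts) != 5:
                continue
            class_id, _, _, w, h = parts
            area = float(w) * float(h)
            if area < min_area:
                flagged.append((label_file.name, class_id, area))
    return flagged

for name, class_id, area in flag_tiny_boxes():
    print(f"{name}: class {class_id} box with area {area:.5f}")
```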

Too Small Dataset

A dataset is considered too small when it doesn’t contain enough images or labeled instances of objects to effectively train an object detection model.

Insufficient data can prevent the model from learning the necessary features and variations of the objects it needs to detect. Therefore, we need to collect a sufficiently large and diverse dataset to train robust models.

Tips💡

Before starting to train the model, visualize the dataset to get familiar with it, and check the annotations to make sure the dataset doesn’t contain the mistakes described above.
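One easy way to do this is to draw the annotated boxes back onto a few images and inspect them by eye. Here is a minimal sketch for YOLO-format labels using OpenCV; the file paths and class names are placeholders for your own dataset.

```python
import cv2
from pathlib import Path

CLASS_NAMES = ["car", "pedestrian", "bicycle"]  # placeholder class names

def draw_boxes(image_path, label_path, out_path="preview.jpg"):
    # Convert normalized YOLO boxes to pixel coordinates and draw them.
    image = cv2.imread(str(image_path))
    h, w = image.shape[:2]
    for line in Path(label_path).read_text().splitlines():
        class_id, xc, yc, bw, bh = line.split()
        xc, yc, bw, bh = float(xc) * w, float(yc) * h, float(bw) * w, float(bh) * h
        x1, y1 = int(xc - bw / 2), int(yc - bh / 2)
        x2, y2 = int(xc + bw / 2), int(yc + bh / 2)
        cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(image, CLASS_NAMES[int(class_id)], (x1, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imwrite(out_path, image)

draw_boxes("images/frame_000001.jpg", "labels/frame_000001.txt")
```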

Avoid adding too many frames from the same video to your dataset. Instead, add images with different perspectives, lighting conditions and angles.

Additionally, data augmentation techniques can be used to artificially increase the diversity of your dataset. This includes changing brightness, contrast and rotation, and applying other transformations to existing frames.
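For object detection, the augmentation should update the bounding boxes along with the image; one option is the Albumentations library. Below is a small sketch with illustrative transforms and probabilities; the file path, boxes and class labels are placeholders, not a recommended recipe.

```python
import albumentations as A
import cv2

# Compose a few bbox-aware transforms; boxes are kept in YOLO format.
transform = A.Compose(
    [
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.5),
        A.Rotate(limit=15, p=0.5),
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

image = cv2.imread("images/frame_000001.jpg")
bboxes = [[0.5, 0.5, 0.2, 0.3]]   # placeholder box: x_center, y_center, w, h
class_labels = [0]

augmented = transform(image=image, bboxes=bboxes, class_labels=class_labels)
cv2.imwrite("augmented.jpg", augmented["image"])
print(augmented["bboxes"], augmented["class_labels"])
```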

Finally, use different color codes to annotate different classes.

Conclusion

Creating a high-quality dataset is essential for model performance in object detection. By avoiding common mistakes such as class imbalance, oversampling frames, missing or incorrect labels, training-validation overlap, inconsistent labeling, and having too small a dataset, we can significantly enhance the performance of our models.

Remember, attention to detail and consistent quality control are key to successful object detection.
