Introducing Advanced Dataset Filters for Efficient Data Management | Supervisely Tutorial

Lisa Uspenyeva

Learn how to use conditional filters and build custom queries for your training data. Easily search, filter, sample and explore your Computer Vision datasets.

The Supervisely team is happy to introduce new advanced dataset filtering capabilities that improve data filtering, search, querying and management for your custom Computer Vision training datasets.

In this complete tutorial, you will learn how to configure conditional filters, combine them into groups using logical operators and use them to build training data pipelines. This comprehensive filtering toolkit opens up a new dimension in your training data preparation and annotation workflows.

Filtering modules have been significantly upgraded and now offer a wealth of new features:

  1. Easy to use - all filters are self-explanatory, so they suit users of any level, from beginners just starting their CV journey to advanced data scientists.

  2. Works in real time so you get the results in less than a second.

  3. Optimized to handle huge datasets with millions of images and annotations.

  4. Provides quick data preview with main statistics.

  5. Available in both the training data dashboard and inside the annotation tool.

  6. Highly flexible - configure filtering operations based on a variety of criteria: annotation classes, tags, labeling jobs, image statuses, number of objects, authors, issues, file names, date and time, and more.

  7. Quickstart with predefined presets for the most popular tasks.

  8. Go beyond simple previewing - perform actions on your filtered results (copy, move, delete, create labeling jobs, annotate, etc.).

Let's take a closer look at each feature of our updated filters!

Filters page breakdown

Video tutorial

In this 3-minute video guide, you will learn how to use advanced conditional filters for Computer Vision datasets, aiding in searching, filtering, and exploring annotated images of any size.

Discover ways to use:

  1. Search and Sort: The filter tab offers sorting options by any attribute like name, size, annotation count, labeling jobs, and associated tags.

  2. Utilization of filters: Users can apply preconfigured presets or create custom queries using a variety of filters.

  3. Practical applications: Finding images with specific objects and tags, and filtering based on the exact number of objects.

  4. Additional features: Remove or create labeling jobs, and use filters as a tool for Neural Network model evaluation. Filters can also be integrated into the labeling toolbox for dynamic data exploration during labeling tasks.

Find a needle in a haystack

Experience improved data querying with our advanced conditional filters, which allow users to quickly find specific images and annotations in datasets and move seamlessly from filtering to annotating. Our advanced, multi-level filter combinations allow you to uncover exactly the data you need, customized to your specific requirements.

Moreover, advanced filters are available in the images panel in the Image Annotation Toolbox.

Instant Pascal VOC 2012 dataset filtering with preview of search results

Instantly access real-time insights

Our filters operate in real time, ensuring blazing-fast results for datasets of any size, even hundreds of millions of images and annotations. The Supervisely Database is optimized for production use cases and lets you navigate through your data with outstanding performance.

Quick data preview

Users can access specific images either by opening them directly or by applying filters and then starting annotation. When annotation is started after applying filters, the Image Labeling Toolbox automatically opens with only selected images that meet the specified criteria, allowing targeted annotation of the most relevant data.

Quick data preview in Image Annotation Tool

Even simple sorting provides valuable insights

Now, users can access and gain valuable insight from all project data in one place. All images in a project are organized into a single table, with basic statistics in columns. By working with these columns, users can sort and extract relevant information. Let's explore some basic use cases that can be useful in any annotation project:

Sort by Last Update: Sorting by the "last update" column lets you quickly identify images with the most recent modifications, which simplifies reviewing updated annotations. Managers can use it to track the annotation process and fresh updates in labeling.

Sort by Objects Count: Sorting by object count enables users to quickly find images with a large number of objects or completely unlabeled ones. It makes it easier to prioritize images for annotation or highlight outliers in training datasets.

Sorting data by columns

Sort or Filter by Dataset: Projects in Supervisely can contain multiple datasets (folders with images). Users can now preview all images or keep only images from specific datasets for analysis.

Search or Sort Images by Name: Search with regular expressions or sort by image name to arrange images in a convenient way, making it easier to find specific items.
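As a rough illustration of what a regular-expression search by name does, the sketch below uses Python's standard re module on a hypothetical list of file names. The names and the pattern are assumptions made for this example, and the exact regular expression syntax accepted by the search field may differ.

```python
import re

# Hypothetical image names; in a real project these come from your datasets.
image_names = [
    "img_001.jpg",
    "img_250.png",
    "cat_17.jpg",
    "IMG_9.jpg",
    "street_img_3.jpg",
]

# Match names that start with "img_", continue with one or more digits,
# and end with a .jpg or .png extension. The match is case-sensitive.
pattern = re.compile(r"^img_\d+\.(jpg|png)$")

# Keep only the matching names and sort them alphabetically.
matched = sorted(name for name in image_names if pattern.match(name))
print(matched)
```

Anchoring the pattern with ^ and $ keeps names such as "street_img_3.jpg" out of the results, which is usually what you want when searching by prefix.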

Filter by datasets and quickly annotate the filtering results

Customized filters or default presets

Users can either use ready-made filter presets or build custom conditions according to their own requirements and preferences. This lets you tailor the search to specific tasks, providing more flexible and efficient dataset exploration and management.

Predefined filters

All Images: Allows you to display all images in the project without any limitations. It shows all images, regardless of the presence of annotations or issues. You can use it as an alternative way to navigate across your data.

With one object or more: Finds images that have at least one object. Useful for identifying images that already contain any annotations.

Unlabeled (without any objects): Finds images that don't have any annotations or objects. This can be useful for finding images that require labeling or reviewing.

Has Issues: Finds images that have any problems or mistakes that a reviewer assigned to objects or images during a labeling job. These can be images with incorrect annotations, data anomalies, outliers, etc.

Labeled by me: Finds images that have been labeled by the current user. This can be useful for quick access to images that the user has worked on previously.

Preconfigured filters and customizable filter parameters

Custom filters

Images Name: Allows the user to find images by name. For example, you can find images whose names begin with "img" followed by any combination of characters between 0 and 1000.

Images Tag: Allows you to find images that include specific tags. The user can specify any tag value to search for.

Images Tags Author: Allows the user to find images that have been tagged by a particular tagging author.

Images Without Annotations: Allows you to find images that do not contain any annotations or labels.

Objects Class: Allows the user to customize the search for objects by their classes. You can select a specific object class to search for.

Objects Tag: Allows you to find objects that include specific tags. The user can specify any tag value to search for.

Objects Author: Allows the user to find objects that have been labeled by a particular author.

Issues: Allows you to find images with any problems or mistakes in annotation.

Labeling Job: Allows you to find images involved in labeling jobs with a particular status. The user can specify any job status to search for (pending, annotated, accepted, rejected).

Tracking image status in labeling jobs

Users can easily view the status of each image within a labeling job. This information makes it easier to monitor annotation progress. Additionally, by clicking on the status of a labeling job, users can also access statistics on job activity, labeling time per object and total labeling time.

Furthermore, tables with detailed statistics are available for individual images, providing information such as labeler time in the labeling tool, editing durations, and object counts. This comprehensive information ensures thorough tracking and analysis of the annotation process, enabling managers to see the actual state of their annotation workflow.

Image job status and move to statistics by labeling job

Labeling job management

When selecting multiple images, users can quickly create labeling jobs or delete unnecessary data, allowing more precise data selection for labeling.

Applying actions to images of tomatoes from different datasets

Data Operations: Copy and move

Users can easily copy or move images with their annotations from filtering results to other datasets, streamlining training data management.

How to use advanced filters in Supervisely

Open the images project you are working on and navigate to the Filter tab. Here, you can easily search among all images, subsample the desired dataset or configure filters and preview filtering results.

To create a custom group of conditional filters, simply click the Filter button. This opens a modal window that allows you to customize your filters based on datasets, names, image and object tags, classes, assignees, issues, or labeling job status. Fine-tune your view by including or excluding specific criteria, such as conditions on the number of objects (range), according to your preferences. For example, you can filter for images with the tag validation that have more than 5 objects of class plant.

Use cases for advanced dataset filters

In this chapter, you will explore some illustrative examples and frequent use cases showing how filters can be used in real projects.

1. Find unique data with custom filters

You can easily find unique data by combining different filters. Let's look at an example using the Pascal VOC dataset. Suppose we need to find images from the "train" dataset that contain objects of both the bus and car classes at the same time. The number of buses in the image should not exceed 5 and the number of cars should be 1 or more. Here are the steps you can follow:

  1. Apply a simple filter to subsample images from the "train" dataset.

  2. Add a filter to find images with buses and set the maximum number of buses in the image to 5.

  3. Add another filter to find images with cars and set the minimum number of cars in the image to 1.

  4. Press the Apply button to retrieve the data that satisfies all the given conditions and review the filtering results.

Combining filters to search for object classes "bus" and "car" in specific quantities
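To make the logic of this bus and car example explicit, here is a minimal Python sketch that applies the same conditions to a handful of in-memory records. The ImageRecord class, the matches function and the sample data are hypothetical and exist only for illustration; they are not part of the Supervisely API, and in the platform the same result is obtained through the Filter window.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    """Hypothetical stand-in for one row of the images table."""
    name: str
    dataset: str
    class_counts: Counter = field(default_factory=Counter)


def matches(image: ImageRecord) -> bool:
    """Return True only when every condition holds (logical AND)."""
    buses = image.class_counts["bus"]  # a Counter returns 0 for a missing class
    cars = image.class_counts["car"]
    return (
        image.dataset == "train"  # only images from the "train" dataset
        and 1 <= buses <= 5       # buses are present, but no more than 5
        and cars >= 1             # at least one car
    )


images = [
    ImageRecord("a.jpg", "train", Counter(bus=2, car=1)),
    ImageRecord("b.jpg", "train", Counter(bus=6, car=3)),  # too many buses
    ImageRecord("c.jpg", "val", Counter(bus=1, car=1)),    # wrong dataset
    ImageRecord("d.jpg", "train", Counter(car=4)),         # no buses
    ImageRecord("e.jpg", "train", Counter(bus=5, car=2)),
]

selected = [image.name for image in images if matches(image)]
print(selected)
```

Each filter narrows the result further, so an image has to satisfy all conditions at once to appear in the output.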

2. Manage huge datasets at any scale with ease

Conditional dataset filters make managing large datasets simple. These filters help you explore and identify images that can be merged into a new labeling job, moved or submitted for additional review and verification. Instead of using the API and writing custom Python scripts, data annotation managers can quickly configure custom conditions and integrate them into their labeling pipelines in a few clicks.

Creating a new labeling job from unannotated images

3. Use filters to explore model predictions

Data scientists can use custom filters to evaluate predictions from custom Neural Networks. For example, Supervisely users can apply a custom object detection model, save the model predictions and use them for analysis or as initial prelabeling. In that case, every object (bounding box) will have the tag Confidence with a value from 0 to 1. You can then create a custom filter to find all images with the most or least confident predictions (e.g. "Confidence < 0.5"). Analyzing bounding boxes with low confidence levels helps you better understand model performance and uncover ways to improve it.
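The "Confidence < 0.5" condition can be pictured with a short Python sketch. The predictions dictionary below is hypothetical sample data that maps each image name to the confidence values of its predicted boxes; it is not produced by any Supervisely call and only mirrors what the filter does conceptually.

```python
# Hypothetical predictions: image name -> Confidence values of its bounding boxes.
predictions = {
    "img_01.jpg": [0.92, 0.81],
    "img_02.jpg": [0.47, 0.88],
    "img_03.jpg": [0.30],
    "img_04.jpg": [],  # the model found nothing on this image
}

THRESHOLD = 0.5

# Keep images that have at least one box below the threshold,
# and remember which confidence values triggered the match.
low_confidence = {
    name: [value for value in confidences if value < THRESHOLD]
    for name, confidences in predictions.items()
    if any(value < THRESHOLD for value in confidences)
}

for name, values in low_confidence.items():
    print(name, values)
```

Images without any predictions are not selected here, because no object carries a Confidence value that could satisfy the condition.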

Developers can also leverage the Supervisely platform for exploratory data analysis, finding data outliers or possible errors in training data. Filters can support adaptive learning strategies by dynamically filtering images based on model feedback, performance metrics, or user-defined criteria to iteratively sample, annotate, and improve model performance.

Filters can also be used to carve smaller subsets out of large training datasets for model evaluation, for example when preparing splits for the K-fold cross-validation technique.
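For readers unfamiliar with K-fold cross-validation, the sketch below shows how a list of images can be divided into K folds, each used once for validation while the rest is used for training. It relies only on the Python standard library; the k_fold_splits function and the image names are hypothetical, and this is one common way to build the splits rather than a description of how the platform does it.

```python
import random


def k_fold_splits(items, k, seed=0):
    """Yield (training, validation) pairs for K-fold cross-validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps splits reproducible
    # Deal the shuffled items into k folds of nearly equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [item for j, fold in enumerate(folds) if j != i for item in fold]
        yield training, validation


# Hypothetical image names standing in for a filtered subset of a project.
image_names = [f"img_{i:03d}.jpg" for i in range(10)]
splits = list(k_fold_splits(image_names, k=5))

for training, validation in splits:
    print(len(training), "training images,", len(validation), "validation images")
```

Every image lands in exactly one validation fold, so each model trained on a split is evaluated on data it has not seen.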

Searching for images in the training dataset with a low threshold and removing them

4. Labeling job tracking for team collaboration

Efficient team collaboration requires streamlined processes for tracking and managing annotation tasks. Supervisely's advanced dataset filtering capabilities optimize the job tracking process. By filtering datasets based on labeling jobs and their status and combining them with other conditional filters, you can quickly find relevant images, explore unusual patterns in your annotations, find hidden mistakes and perform simple yet effective quality assurance operations. This also improves visibility of job progress and ensures quick access to job activity and statistics.

Labeling job statuses indicate the progress of the annotation or the quality of the data annotation: for example, images with rejected status may require review or correction. Sorting by these statuses helps to quickly identify and respond to potential data quality issues. Optimized job tracking through dataset filtering enables your team to be more productive, resulting in faster annotation project completion.

Filters by annotated images that have an issue

Conclusion

With Supervisely's new advanced conditional dataset filters, users can explore, manage and operate on their custom training datasets easily and quickly. Configuring custom conditions allows users to efficiently navigate through their data, uncover unique edge cases and optimize their annotation workflows, whether they are searching for images whose annotations have specific properties, tracking labeling job progress or evaluating model performance.

Using this tutorial, you can try advanced filters on your custom training datasets in the Supervisely Computer Vision Platform.

. . .

Supervisely for Computer Vision

Supervisely is an online and on-premise platform that helps researchers and companies build computer vision solutions. We cover the entire development pipeline: from data labeling of images, videos and 3D to model training.

The big difference from other products is that Supervisely is built like an OS with countless Supervisely Apps: interactive web tools running in your browser, yet powered by Python. This makes it possible to integrate all those awesome open-source machine learning tools and neural networks, enhance them with a user interface and let everyone run them with a single click.

You can order a demo or try it yourself for free on our Community Edition, no credit card needed!

. . .
Lisa Uspenyeva
About the author

Training data expert