Nicola Croce, Marcos Nieto
Over the last decade, progress in computer vision has been driven by image, video, and multimodal benchmark datasets, fueling the growth of machine learning methods for object detection, classification, and scene understanding. These advances have, however, produced static, goal-specific, and heterogeneous datasets, with little to no emphasis on the taxonomies and semantics behind their class definitions, leaving them ill-defined and hardly mappable to one another. This approach limits the long-term usability of datasets, their interoperability, their extensibility, and the ability to repurpose them. In this work we propose a new data labeling methodology, which we call Ontolabeling, that decouples data structure from semantics by defining two data model layers. The first layer organizes spatio-temporal labels for multi-sensor data, while the second layer uses ontologies to structure, organize, maintain, extend, and repurpose the semantics of the annotations.
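To make the two-layer separation concrete, the following is a minimal sketch of what such an annotation could look like. The JSON-style schema and all field names (`ontology_ref`, `frame_intervals`, `coordinate_system`, the example IRI) are hypothetical illustrations of the idea, not the actual format proposed in this work:

```python
import json

# Layer 1: data structure -- spatio-temporal labels tied to sensors and frames.
# Layer 2: semantics -- class meaning delegated to an external ontology.
# All field names and the ontology IRI below are hypothetical.
annotation = {
    "objects": {
        "0": {
            "name": "vehicle-0",
            # Semantic layer: the class is not a free-form string but a
            # pointer into a versioned, maintained ontology.
            "ontology_ref": "https://example.org/automotive-ontology#Car",
            "frame_intervals": [{"frame_start": 0, "frame_end": 120}],
        }
    },
    "frames": {
        "0": {
            "objects": {
                "0": {
                    # Structural layer: geometry expressed per sensor,
                    # in that sensor's coordinate system.
                    "bbox": {
                        "coordinate_system": "front_camera",
                        "val": [312.5, 188.0, 64.0, 48.0],
                    }
                }
            }
        }
    },
}

print(json.dumps(annotation, indent=2))
```

Under this split, remapping the dataset to a different taxonomy (say, a finer-grained vehicle hierarchy) would only require updating the ontology references, while the spatio-temporal geometry in the first layer stays untouched.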