Data-centric AI
Search by



      Finding Label Errors in Autonomous Vehicle Data With Learned Observation Assertions

      Published on
      Daniel Kang, Nikos Arechiga, Sudeep Pillai, Peter D Bailis, Matei Zaharia

      ML is being deployed in complex, real-world scenarios where errors have impactful consequences. As such, thorough testing of the ML pipelines is critical. A key component in ML deployment pipelines is the curation of labeled training data, which is assumed to be ground truth. However, in our experience in a large autonomous vehicle development center, we have found that labels can have errors, which can lead to downstream safety risks in trained models. To address these issues, we propose a new abstraction, learned observation assertions, and implement it in a system, Fixy . Fixy leverages existing organizational resources, such as existing labeled datasets or trained ML models, to learn a probabilistic model for finding errors in labels. Given user-provided features and these existing resources, Fixy learns priors that specify likely and unlikely values (e.g., a speed of 30mph is likely but 300mph is unlikely). It then uses these priors to score labels for potential errors. We show Fixy can automatically rank potential errors in real datasets with up to 2× higher precision compared to recent work on model assertions and standard techniques such as uncertainty sampling.

      This video is from the NeurIPS 2021 Data-centric AI workshop proceedings.

      Join the Data-centric AI Movement

      We want to share your Data-centric AI story. Fill out this Google form so we can feature your work!



      © 2022 Data-centric AI