The impact of data quality on model performance by Daniel Langkilde
Much of the focus in the machine learning community is on the choice of algorithm. This talk will instead focus on the dataset. While most people have by now realized that dataset quality is critical to the success of machine learning projects, most teams still struggle to quantify it. This talk will provide hands-on examples of how to quantify dataset quality, and will reason about the impact that various types of quality issues can be expected to have on model performance.

Daniel Langkilde is CEO and co-founder of Annotell, which provides the analytics and annotation platform used to ensure the performance of autonomous vehicle perception systems. He has focused for almost 10 years on the relationship between data quality and machine learning product performance. He has an M.Sc. in Engineering Mathematics and has been a Visiting Scholar at both UC Berkeley and MIT. Before starting Annotell he was Team Lead for Collection Analysis at Recorded Future. Besides that, he is also on the Board of Directors a