Katerina Fragkiadaki: 3D Vision with 3D View-Predictive Neural Scene Representations
September 29th, 2020. MIT CSAIL

Abstract: Current state-of-the-art CNNs localize rare object categories in internet photos, yet they miss basic facts that a two-year-old has mastered: that objects have 3D extent, that they persist over time despite changes in the camera view, that they do not intersect in 3D, and others. We will discuss models that learn to map 2D and 2.5D images and videos into amodally completed 3D feature maps of the scene and the objects in it by predicting views. We will show that the proposed models learn object permanence, have objects emerge in 3D without human annotations, support grounding of language in 3D visual simulations, and learn intuitive physics that generalizes across scene arrangements and camera configurations. In this way, they overcome many limitations of 2D CNNs for video perception, model learning, and language grounding.

Bio: Katerina Fragkiadaki is an Assistant Professor in the Machine Learning Department at Carnegie Mellon University. She received her from University