Yannis Kalantidis (Facebook AI, California)
hosted by Christian Theobalt
"Learning efficient representations for image and video understanding"
Two important challenges in image and video understanding are designing more efficient deep Convolutional Neural Networks and learning models that are able to achieve higher-level understanding. In this talk, I will present some of my recent work on tackling these challenges. Specifically, I will introduce the Octave Convolution [ICCV 2019], a plug-and-play replacement for the convolution operator that exploits the spatial redundancy of CNN activations and can be used without any adjustments to the network architecture. I will also present the Global Reasoning Networks [CVPR 2019], a new approach for reasoning over arbitrary sets of features of the input, by projecting them from a coordinate space into an interaction space where relational reasoning can be efficiently computed. The two methods presented are complementary and achieve state-of-the-art performance on both image and video tasks. Aiming for higher-level understanding, I will also present our recent work on vision and language modeling, specifically on learning state-of-the-art image and video captioning models that are also able to better visually ground the generated sentences with [CVPR 2019] or without [arXiv 2019] explicit localization supervision. The talk will conclude with current research and a brief vision for the future.
Bio: Yannis Kalantidis was a research scientist at Facebook AI in California for the last three years. He received his PhD on large-scale visual search and clustering from the National Technical University of Athens in 2014. He was a postdoc and research scientist at Yahoo Research in San Francisco from 2015 until 2017, where he led the visual similarity search project at Flickr and participated in the Visual Genome dataset efforts with Stanford. At Facebook Research he was part of the video understanding group, conducting research on representation learning, video understanding and modeling of vision and language. He also leads the Computer Vision for Global Challenges Initiative (cv4gc.org), which has organized impactful workshops at top venues like CVPR and ICLR. Personal website: https://www.skamalas.com/
|Time:||Wednesday, 18.03.2020, 10:00|
|Place:||SB E1 5 room 029. NOTE: Owing to the coronavirus situation, we ask that interested parties attend the talk remotely if at all possible (see instructions below). People who are unable to attend remotely may come to listen to the talk at the Institute, but note that we will impose a hard limit of 15 people physically present in the room.|
|Video:||videocast to KL G26 room 111|