Thank you. Thank you for the nice introduction. It's a pleasure to be here and to contribute
a presentation to this wonderful seminar series. I should say that for me this presentation
is entirely new; it's the first time I have ever talked about this line of work, and it's
quite a fascinating line of work. In contrast to what I originally announced, this talk
will not be on 3D reconstruction but solely on 3D shape analysis. And I should say upfront
the work that I'm presenting is not really my own work but the work of our students,
current and former. In particular, these are Marvin Eisenberger, Aysim Toker, and
Zorah Lähner; these three did most of the work that I'm going to present
in the following. Where should I start? Well, this is about deep learning. So let's start
with deep learning. I think this is a slide that should be familiar to most of you.
If you've looked into the field of deep learning, this is at least for me what marks the breakthrough
of deep neural networks. And this was the ImageNet challenge that Fei-Fei Li and her
students set up for recognizing objects in images. I guess you're quite familiar with this challenge.
And at the time, around 2011, the best-performing methods on that benchmark that they put out
made roughly 26 percent incorrect classifications. Whereas humans, in terms of
seeing what is in an image, airplanes, cars, bicycles, et cetera, made only about
5 percent mistakes on average on the same data set. And so there was a huge gap between the performance
of computer vision algorithms and the performance of human observers. And it seemed at the time
pretty much impossible to ever achieve human level performance on this task, which is a
task that humans, one could argue, were designed for: recognizing things in images.
But then something surprising happened: Alex Krizhevsky proposed this deep neural
network architecture. Neural networks have been around for decades, but this one was
significantly deeper than previous ones and more sophisticated in its architecture with
multiple layers, et cetera. I think it's an architecture that everyone knows by now:
it's called AlexNet. And what he achieved in 2012 was an error of roughly 16 percent. So
a dramatic drop in error, a dramatic boost in performance. And in the wake of this work,
there came a whole series of deep network architectures, further refined and deeper networks, et cetera,
to a level where the deep network-based approaches, and I'm saying now, this is 2016, so it
isn't even fully up to date here, make fewer errors than humans. And this I
find is quite impressive and to me far more impressive than say computers playing chess,
right? For chess, you could say it's no surprise that a machine is better,
because humans were not designed by nature to play chess, whereas you could argue they were
designed to recognize objects in images. And so this for me is the real breakthrough
in artificial intelligence that was achieved here on this benchmark. In the wake of this
work, the community, including our team, focused on how we can carry the
success of deep networks beyond object recognition to more low-level image processing tasks
like optical flow estimation. There we proposed a technique called FlowNet, where
we basically train a network to predict optical flow from two given images.
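To make that idea concrete, here is a minimal sketch of such a network, not the actual FlowNet architecture: it simply stacks the two frames along the channel axis and regresses a dense two-channel flow field. All layer sizes and names below are illustrative assumptions.

    import torch
    import torch.nn as nn

    class TinyFlowNet(nn.Module):
        """Minimal sketch of the idea behind FlowNet: concatenate two RGB frames
        along the channel axis and regress a dense 2-channel flow field (u, v).
        Layer sizes are illustrative, not the published architecture."""
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(6, 32, kernel_size=7, stride=2, padding=3), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),
                nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, 2, kernel_size=4, stride=2, padding=1),  # (u, v) flow
            )

        def forward(self, img1, img2):
            x = torch.cat([img1, img2], dim=1)    # B x 6 x H x W
            return self.decoder(self.encoder(x))  # B x 2 x H x W

    # Training would then regress against ground-truth flow, e.g. on synthetic data:
    # loss = torch.nn.functional.l1_loss(net(img1, img2), gt_flow)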
We also showed in this video object segmentation paper that you can train neural networks to track
an object pixel-accurately across a video in very challenging sequences such as this one.
We also showed more recently that you can leverage the power of deep learning to boost
visual simultaneous localization and mapping and get completely unprecedented precision
and robustness for large-scale mapping from just a single moving camera. And most recently,
this is actually this year, upcoming next week at CVPR, we showed that you can train
neural networks to recover, from just a single moving camera, dense and faithful reconstructions
of the world, computed with a suitably trained deep network. So all of these are great achievements.
I think we see that deep learning can not only be deployed for classification as it
was traditionally designed, but for all sorts of data analysis challenges: segmentation,
optical flow, reconstruction, et cetera. What I want to talk about today, though, is how to
extend the power of deep networks to the analysis of 3D shapes. This is a topic that's ...
Daniel Cremers on "Self-supervised Learning for 3D Shape Analysis":
While neural networks have swept the field of computer vision and replaced classical methods in most areas of image analysis and beyond, extending their power to the domain of 3D shape analysis remains an important open challenge. In my presentation, I will focus on the problems of shape matching, correspondence estimation and shape interpolation and develop suitable deep learning approaches to tackle these challenges. In particular, I will focus on the difficult problem of computing correspondence and interpolation for pairs of shapes from different classes — say a human and a horse — where traditional isometry assumptions no longer hold.
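As a rough illustration of what correspondence estimation and interpolation mean in this setting, and explicitly not the approach presented in the talk, one could match two shapes by comparing per-vertex descriptors (however those descriptors are obtained) and then blend matched vertex positions. All function names and the linear blending below are illustrative assumptions.

    import torch

    def soft_correspondence(feat_x, feat_y, temperature=0.05):
        """Soft correspondence between two shapes from per-vertex descriptors.

        feat_x: Nx x D descriptors of shape X, feat_y: Ny x D descriptors of shape Y.
        Returns an Nx x Ny row-stochastic matrix Pi with Pi[i, j] ~ P(x_i matches y_j).
        """
        fx = torch.nn.functional.normalize(feat_x, dim=1)
        fy = torch.nn.functional.normalize(feat_y, dim=1)
        sim = fx @ fy.t()                       # cosine similarities between descriptors
        return torch.softmax(sim / temperature, dim=1)

    def interpolate(verts_x, verts_y, Pi, t):
        """Naive shape interpolation at time t in [0, 1]: linearly blend each vertex
        of X with its (soft) match on Y. Only a toy baseline for illustration."""
        matched = Pi @ verts_y                  # Nx x 3 matched positions on Y
        return (1.0 - t) * verts_x + t * matched

Such straight-line blending of matched points quickly distorts the geometry, which is exactly why dedicated interpolation and deformation models are needed, in particular for pairs from different classes where the isometry assumption no longer holds.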