World-wide Image Localization
Student project: EPFL Computer Vision Lab, Fall semester 2019.


In this project we want to tackle one of the fundamental problems in Computer Vision: finding the precise location where an image was taken. Given an image and a representation of the world, the goal is to find the exact position and orientation of the viewer, which is crucial to enable applications such as Augmented Reality (AR). This is a very challenging problem due to large differences in viewpoint and illumination, partial occlusions, and its sheer scale.

Most Computer Vision problems, such as object detection, human pose estimation, or segmentation, are currently dominated by learned solutions: specifically, dense convolutional neural networks which process entire images and produce per-pixel estimates. However, general pose estimation remains an outlier due to its unconstrained nature, and state-of-the-art solutions still rely on local features, which allow us to establish correspondences between images and then use them to estimate the pose.

Historically, this has been done with hand-crafted features, such as SIFT [1]. There is, however, a large margin for improvement in this area. The Computer Vision lab at EPFL has pioneered multiple works in this direction over the past few years [2,3]. In this project, we will build on this work, leveraging our existing frameworks and in-house datasets, identify failure cases in real-world settings, and find ways to solve them by exploiting additional cues, such as depth data or semantics.
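
To make the correspondence step concrete, here is a minimal sketch of how local feature descriptors from two images might be matched. It is an illustration only and not the lab's framework: the function names (l2, match_descriptors), the toy 2-D descriptors, and the 0.8 threshold are assumptions made for this example. It combines a nearest-neighbour search with the ratio test commonly used alongside SIFT and a mutual-consistency check.

```python
import math


def l2(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Return (i, j) pairs matching desc_a[i] to desc_b[j].

    A pair is kept only if (1) the best candidate is clearly closer than
    the second best (ratio test) and (2) desc_a[i] is in turn the nearest
    neighbour of desc_b[j] (mutual check). The threshold is an assumption.
    """
    if not desc_a or not desc_b:
        return []

    matches = []
    for i, query in enumerate(desc_a):
        # Rank all candidates in the second image by distance to the query.
        ranked = sorted(range(len(desc_b)), key=lambda j: l2(query, desc_b[j]))
        best = ranked[0]

        # Ratio test: reject ambiguous matches.
        if len(ranked) > 1:
            d1 = l2(query, desc_b[best])
            d2 = l2(query, desc_b[ranked[1]])
            if d1 >= ratio * d2:
                continue

        # Mutual check: the match must also hold in the other direction.
        back = min(range(len(desc_a)), key=lambda k: l2(desc_b[best], desc_a[k]))
        if back == i:
            matches.append((i, best))
    return matches


# Toy descriptors; real ones (e.g. SIFT) are 128-dimensional.
desc_a = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
desc_b = [[1.1, 0.9], [0.1, 0.0], [9.0, 0.0]]
print(match_descriptors(desc_a, desc_b))
```

In a full pipeline, the surviving correspondences would typically be passed to a robust estimator such as RANSAC to recover the camera pose. The third toy descriptor is deliberately ambiguous and is discarded by the ratio test.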


For other projects available at the Computer Vision Lab, please visit our website.


This project is a collaboration with Google Zurich. It is aimed at first-year PhD students. The candidate should be proficient in Python. Experience with Computer Vision, Machine Learning, and libraries such as PyTorch or TensorFlow is a plus. The project is 30% theory, 30% implementation, and 40% experimentation.


For further information, please send us an e-mail: