Fisheye cameras are commonly used in applications such as autonomous driving and surveillance to provide a large field of view (180° or more). However, this comes at the cost of strong non-linear distortion, which requires more complex algorithms. We hope to encourage further research in this area by releasing our WoodScape dataset.

The following is an incomplete list of research papers that use the WoodScape dataset.


This work presents a multi-task visual perception network that operates on unrectified fisheye images to enable a vehicle to sense its surrounding environment. It covers six primary tasks necessary for an autonomous driving system: depth estimation, visual odometry, semantic segmentation, motion segmentation, object detection, and lens soiling detection. We demonstrate that the jointly trained model performs better than the respective single-task versions.


In this paper, we explore Euclidean distance estimation on fisheye cameras for automotive scenes. Obtaining accurate and dense depth supervision is difficult in practice, but self-supervised learning approaches show promising results and could potentially overcome the problem. We present a novel self-supervised scale-aware framework for learning Euclidean distance and ego-motion from raw monocular fisheye videos without applying rectification. While it is possible to perform a piece-wise linear approximation of the fisheye projection surface and apply standard rectilinear models, doing so introduces its own issues, such as re-sampling distortion and discontinuities in transition regions.
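To make the difference between depth and Euclidean distance concrete, the following is a minimal sketch rather than the paper's method. It assumes an idealized equidistant fisheye model (image radius r = f·θ), chosen only for illustration; the function names and the focal length value are hypothetical. It shows that a point 90° off the optical axis still lands on a fisheye image and has a well-defined Euclidean distance, while its depth along the optical axis is zero and a rectilinear (pinhole) model cannot project it at all.

```python
import math


def pinhole_project(point, f):
    """Rectilinear projection; only defined for points in front of the camera."""
    x, y, z = point
    if z <= 0:
        raise ValueError("pinhole model cannot project points with z <= 0")
    return (f * x / z, f * y / z)


def equidistant_project(point, f):
    """Idealized equidistant fisheye projection (assumption): r = f * theta."""
    x, y, z = point
    theta = math.atan2(math.hypot(x, y), z)  # angle from the optical axis
    phi = math.atan2(y, x)                   # direction in the image plane
    r = f * theta
    return (r * math.cos(phi), r * math.sin(phi))


def depth(point):
    """Depth: the coordinate along the optical axis."""
    return point[2]


def euclidean_distance(point):
    """Euclidean distance from the camera centre to the point."""
    return math.sqrt(sum(c * c for c in point))


# A point 1 m from the camera, exactly 90 degrees off the optical axis.
side_point = (1.0, 0.0, 0.0)
print(equidistant_project(side_point, f=300.0))  # roughly (471.2, 0.0)
print(depth(side_point))                         # 0.0
print(euclidean_distance(side_point))            # 1.0
```

For points near the optical axis the two quantities almost coincide, so the distinction matters mainly in the wide-angle periphery that fisheye lenses are used for.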


This paper introduces a novel multi-task learning strategy to improve self-supervised monocular distance estimation on fisheye and pinhole camera images. First, we introduce a novel distance estimation network architecture that uses a self-attention-based encoder coupled with robust semantic feature guidance to the decoder and can be trained in a one-stage fashion. Second, we integrate a generalized robust loss function, which improves performance significantly while removing the need for hyperparameter tuning with the reprojection loss. Finally, we reduce the artifacts caused by dynamic objects that violate the static-world assumption using a semantic masking strategy.
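The abstract does not name the generalized robust loss it uses. As an illustration only, the sketch below assumes it is similar to Barron's general and adaptive robust loss, in which a single shape parameter `alpha` interpolates between familiar losses (L2 at `alpha = 2`, Charbonnier at `alpha = 1`, Cauchy at `alpha = 0`). The function name is hypothetical, and the scalar form shown here leaves out how such a loss would be applied per pixel or learned during training.

```python
import math


def general_robust_loss(x, alpha, c):
    """Barron-style general robust loss for a scalar residual x (assumed form).

    alpha controls robustness to outliers; c > 0 sets the width of the
    quadratic region around x = 0.
    """
    z = (x / c) ** 2
    if alpha == 2:                # limiting case: L2 loss
        return 0.5 * z
    if alpha == 0:                # limiting case: Cauchy (Lorentzian) loss
        return math.log(0.5 * z + 1.0)
    if alpha == -math.inf:        # limiting case: Welsch loss
        return 1.0 - math.exp(-0.5 * z)
    b = abs(alpha - 2.0)
    return (b / alpha) * ((z / b + 1.0) ** (alpha / 2.0) - 1.0)


# A large residual is penalized less and less as alpha decreases.
for alpha in (2, 1, 0, -2):
    print(alpha, general_robust_loss(10.0, alpha, c=1.0))
```

Lower values of `alpha` flatten the loss for large residuals, which is why a loss of this kind can reduce the influence of outlier pixels in a reprojection error.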


In this paper, we formulate a unified self-supervised scale-aware framework for learning depth, Euclidean distance, and visual odometry from raw monocular videos without applying rectification. We demonstrate a level of precision on the unrectified KITTI dataset with barrel distortion comparable to the rectified KITTI dataset. Our approach does not suffer from a reduced field of view and avoids computational costs for rectification at inference time. To further demonstrate the general applicability of the proposed framework, we apply it to wide-angle fisheye cameras with a 190° horizontal field of view.


Object detection is a comprehensively studied problem in autonomous driving. However, it has been comparatively less explored for fisheye cameras. The standard bounding box fails on fisheye images due to the strong radial distortion, particularly in the image's periphery. In this work, we explore better representations, such as oriented bounding boxes, ellipses, and generic polygons, for object detection in fisheye images. We use the IoU metric to compare these representations against accurate instance segmentation ground truth. We design a novel curved bounding box model that has optimal properties for fisheye distortion models.
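The comparison of representations by IoU against an instance mask can be sketched as follows. This is a toy example under stated assumptions, not the paper's evaluation code: the 10×10 grid, the diagonal band standing in for a tilted object, and all function names are hypothetical. Each representation is rasterized to a set of pixels and scored against the ground-truth mask.

```python
def iou(pixels_a, pixels_b):
    """Intersection over union of two sets of (row, col) pixels."""
    union = pixels_a | pixels_b
    return len(pixels_a & pixels_b) / len(union) if union else 0.0


def rasterize(predicate, height, width):
    """Collect the pixels of a height x width grid that satisfy predicate(r, c)."""
    return {(r, c) for r in range(height) for c in range(width) if predicate(r, c)}


def tight_axis_aligned_box(mask):
    """Pixels of the smallest axis-aligned box that encloses the mask."""
    rows = [r for r, _ in mask]
    cols = [c for _, c in mask]
    return {
        (r, c)
        for r in range(min(rows), max(rows) + 1)
        for c in range(min(cols), max(cols) + 1)
    }


H = W = 10
# Toy ground truth: a thin diagonal band, standing in for an object that
# appears tilted in the image.
gt_mask = rasterize(lambda r, c: abs(r - c) <= 1, H, W)

# Representation 1: standard axis-aligned bounding box around the mask.
aabb = tight_axis_aligned_box(gt_mask)
# Representation 2: a slightly loose box oriented along the diagonal.
obb = rasterize(lambda r, c: abs(r - c) <= 2, H, W)

print(iou(gt_mask, aabb))  # 0.28
print(iou(gt_mask, obb))   # 28/44, about 0.64
```

In this toy case the axis-aligned box must cover the whole grid to enclose the band, so most of its area is background, while the oriented box stays close to the object. Ellipses, polygons, or curved boxes could be scored the same way by supplying a different rasterization.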