WoodScape comprises four surround-view cameras and nine tasks, including segmentation, depth estimation, 3D bounding box detection, and a novel soiling detection task. Semantic annotation of 40+ classes at the instance level is provided for over 10,000 images. With WoodScape, we would like to encourage the community to adapt computer vision models to the fisheye camera instead of relying on naive rectification.
One of the main goals of this dataset is to encourage the research community to develop vision algorithms natively on fisheye images without undistortion. There are very few public fisheye datasets, and none of them provide semantic segmentation annotation. Fisheye cameras are particularly beneficial in automotive low-speed manoeuvring scenarios such as parking, where accurate, full-coverage near-field sensing can be achieved with just four cameras.
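To make concrete what "native fisheye" means, the sketch below projects 3D points with a polynomial radial model, where the image radius is a polynomial in the angle of incidence rather than the pinhole tangent relation. This is a generic illustration, not WoodScape's published calibration: the function name, coefficients, and principal point are all assumed for the example.

```python
import numpy as np

def project_fisheye(points_cam, k, cx, cy):
    """Project 3D camera-frame points with a polynomial fisheye model.

    The image radius rho is a polynomial in the incident angle theta:
        rho(theta) = k[0]*theta + k[1]*theta**2 + k[2]*theta**3 + k[3]*theta**4
    (coefficients here are illustrative, not calibrated values).
    """
    x, y, z = points_cam.T
    theta = np.arctan2(np.hypot(x, y), z)       # angle from the optical axis
    rho = sum(c * theta ** (i + 1) for i, c in enumerate(k))
    phi = np.arctan2(y, x)                      # azimuth in the image plane
    u = cx + rho * np.cos(phi)
    v = cy + rho * np.sin(phi)
    return np.stack([u, v], axis=1)

# illustrative coefficients and principal point (not real calibration data)
k = [340.0, 0.0, -1.0, 0.1]
pts = np.array([[0.0, 0.0, 10.0],   # on the optical axis -> principal point
                [5.0, 0.0, 5.0]])   # 45 degrees off-axis
uv = project_fisheye(pts, k, cx=640.0, cy=480.0)
print(uv)
```

Unlike a pinhole model, this mapping stays finite as theta approaches 90 degrees, which is why a fisheye lens can cover a very wide field of view that rectification would otherwise crop or stretch severely.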
Surround view systems have at least four cameras rigidly connected to the body of the car. Pless did pioneering work in deriving a framework for modeling a network of cameras as one; this approach is useful for geometric vision algorithms like visual odometry. However, for semantic segmentation algorithms, there is no literature on joint modeling of rigidly connected cameras. In addition, most prior work focuses on a single camera with fixed intrinsics and extrinsics, whereas our dataset provides large variations in both.
Autonomous driving involves various vision tasks, and most work has focused on solving individual tasks independently. There is, however, a recent trend towards solving tasks with a single multi-task model, which enables efficient reuse of encoder features and also provides regularization while learning multiple tasks. In these cases, only the encoder is shared and there is no synergy among decoders. Existing datasets are primarily designed to facilitate task-specific learning and do not provide simultaneous annotation for all the tasks. We have designed our dataset so that simultaneous annotation is provided for various tasks, with some exceptions due to practical limitations of optimal dataset design.
Most public automotive image datasets were captured using off-the-shelf, non-automotive-grade cameras. We have used automotive-grade cameras that are already deployed in series-production vehicles, together with a state-of-the-art ground truth system, to capture the dataset. The fisheye camera setup, in terms of camera-vehicle integration and pose, mimics a series-production vehicle configuration.
Non-intrusive privacy anonymization:
WoodScape has publicly collected image data from several countries, so there is a significant risk of violating privacy regulations. Anonymizing personally identifiable information like faces and license plates with traditional approaches such as pixelation causes artifacts in the image and can have a significant negative impact on the quality of the trained model. To resolve this dilemma, we made use of brighter AI’s Deep Natural Anonymization. The technology automatically generates synthetic, irreversible replacements that protect people’s identities while keeping the value of the data for machine learning. Comparing a model trained on original image data to a model trained on naturally anonymized image data, brighter AI showed that using Deep Natural Anonymization has no significant impact on the accuracy of the model. Read more about their analysis in this white paper.
Our WoodScape dataset provides labels for several autonomous driving tasks including semantic segmentation, monocular depth estimation, object detection (2D & 3D bounding boxes), visual odometry, visual SLAM, motion segmentation, soiling detection and end-to-end driving (driving controls). In addition to providing fisheye data, we cover many more tasks than is typical (nine in total), including completely novel tasks such as soiled lens detection. In terms of recognition tasks, we provide labels for up to forty classes.
2D Bounding Box
3D Bounding Box
Odometry & SLAM
How was the data collected?
Our diverse dataset originates from distinct geographical locations in Europe. While the majority of data was obtained from saloon vehicles, there is a significant subset from a sports utility vehicle, ensuring a strong mix of sensor mechanical configurations. Driving scenarios are divided across highway, urban driving and parking use cases.
For the recordings, three different cars were used, each with a different setup. Relevant vehicle mechanical data (e.g. wheel circumference, wheelbase) are included. High-quality data is ensured via quality checks at all stages of the data collection process. Intrinsic and extrinsic calibrations are provided for all sensors, as well as timestamp files to allow synchronization of the data.
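Synchronizing streams from the timestamp files typically comes down to matching each camera frame to the nearest sample of another sensor. The sketch below shows one common way to do this with a binary search; the function name, the 50 ms tolerance, and the example timestamps are assumptions for illustration, not part of the dataset tooling.

```python
import numpy as np

def match_nearest(cam_ts, sensor_ts, max_gap=0.05):
    """For each camera timestamp, return the index of the nearest
    sensor timestamp, or -1 where the nearest sample is further away
    than max_gap seconds (assumed tolerance; both arrays sorted)."""
    idx = np.searchsorted(sensor_ts, cam_ts)
    idx = np.clip(idx, 1, len(sensor_ts) - 1)
    left, right = sensor_ts[idx - 1], sensor_ts[idx]
    nearest = np.where(cam_ts - left <= right - cam_ts, idx - 1, idx)
    gap = np.abs(sensor_ts[nearest] - cam_ts)
    return np.where(gap <= max_gap, nearest, -1)

cam = np.array([0.00, 0.033, 0.066, 1.500])   # ~30 fps frames, one dropout
odo = np.array([0.00, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07])
print(match_nearest(cam, odo))   # → [ 0  3  7 -1]
```

The -1 for the last frame shows why a tolerance matters: without it, a gap in one stream would silently pair a frame with a sample 1.4 s away.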
Images are provided at 1 Megapixel 24-bit resolution, and videos are uncompressed at 30 fps, ranging in duration from 30 s to 120 s. The dataset also provides a set of synthetic data using accurate models of the real cameras, enabling investigation of additional tasks. The camera has an HDR sensor with a rolling shutter and a dynamic range of 120 dB. It has features including black level correction, auto-exposure control, auto-gain control, lens shading (optical vignetting) compensation, gamma correction and automatic white balance for color correction.
The laser scanner point cloud provided in our dataset is accurately preprocessed to provide a denser point cloud ground truth for tasks such as depth estimation and visual SLAM.
Vehicle odometry is derived from dual-antenna, dGNSS-aided INS hardware with RTK position accuracy. This data is postprocessed to obtain the maximum accuracy possible from this configuration.
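For odometry tasks, the absolute poses from such a trajectory are typically converted to frame-to-frame relative transforms. A minimal sketch, assuming poses are available as 4x4 homogeneous world-from-vehicle matrices (the helper and example values are illustrative, not the dataset's format):

```python
import numpy as np

def relative_pose(T_a, T_b):
    """Relative SE(3) transform expressing pose b in the frame of pose a.

    T_a, T_b: 4x4 homogeneous world-from-vehicle poses, e.g. from a
    postprocessed dGNSS/INS trajectory. Odometry is inv(T_a) @ T_b.
    """
    return np.linalg.inv(T_a) @ T_b

def planar_pose(yaw, x, y):
    """Planar pose as a 4x4 matrix (illustrative helper)."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    T[:3, 3] = [x, y, 0.0]
    return T

# vehicle moves 1 m along x and yaws 10 degrees between two frames
T0 = planar_pose(0.0, 0.0, 0.0)
T1 = planar_pose(np.deg2rad(10.0), 1.0, 0.0)
T_rel = relative_pose(T0, T1)
print(np.round(T_rel[:3, 3], 3))   # translation in the first frame
```

Expressing motion relative to the previous frame keeps the ground truth independent of the global coordinate origin, which is what visual odometry methods actually estimate.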