DeepDrive Project Releases Dataset for Training Autonomous Vehicles


Volvo's autonomous car at a motor show in São Paulo, Brazil, in 2014. / Photo by: Mariordo via Wikimedia Commons


Technologies for autonomous vehicles (AVs) must perform reliably to ensure passenger safety. To support this effort, experts at the University of California, Berkeley released a large dataset designed for self-driving software development.

The dataset is part of the Berkeley DeepDrive project and consists of video clips, images, and annotations that are useful for training software created for AVs.

1. Video Data: The dataset has at least 100,000 high-definition video sequences, equivalent to more than 1,100 hours of driving experience. The videos were recorded at various times of day and in different weather conditions. The clips also include details such as GPS locations and timestamps.

2. Road Object Detection: This part of the dataset helps train AVs to recognize obstacles found on roads and in traffic. DeepDrive contains 100,000 images of people, buses, trucks, motorcycles, cars, trains, and more, with 2D box annotations.

3. Instance Segmentation: The images in the dataset are also diverse, which helps AVs learn to identify road details. About 10,000 images carry pixel-level and rich instance-level annotations.

4. Driveable Area: Autonomous vehicles need software that readily recognizes which paths are driveable. The dataset has 100,000 images to guide AV software in making complicated driving decisions.

5. Lane Markings: AV software identifies and classifies each lane marking on the road when making a driving decision. The project offers 100,000 images of several types of lane markings with relevant annotations.
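
To make the components above more concrete, here is a minimal Python sketch of how a single annotated image might be represented in memory, along with the average clip length implied by the figures quoted in this article. The class names, field names, and sample values are illustrative assumptions and do not reflect the project's actual file format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Box2D:
    """Hypothetical 2D box annotation: a category plus pixel corner coordinates."""
    category: str
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        # Clamp at zero so a malformed box never yields a negative area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


@dataclass
class FrameAnnotation:
    """Hypothetical record for one image; field names are assumptions."""
    image_name: str
    weather: str
    time_of_day: str
    boxes: List[Box2D] = field(default_factory=list)

    def count_by_category(self) -> Dict[str, int]:
        # Tally how many boxes of each category appear in this image.
        counts: Dict[str, int] = {}
        for box in self.boxes:
            counts[box.category] = counts.get(box.category, 0) + 1
        return counts


def average_clip_seconds(total_hours: float, num_clips: int) -> float:
    """Average clip length in seconds, given total hours and clip count."""
    return total_hours * 3600 / num_clips


# Made-up sample values, for illustration only.
frame = FrameAnnotation(
    image_name="example.jpg",
    weather="rainy",
    time_of_day="night",
    boxes=[
        Box2D("car", 10, 20, 110, 80),
        Box2D("car", 200, 40, 260, 90),
        Box2D("person", 300, 50, 320, 120),
    ],
)
print(frame.count_by_category())
print(average_clip_seconds(1100, 100_000))
```

Using the rounded figures from the article (1,100 hours across 100,000 clips), the average works out to roughly 40 seconds per clip. Since the article gives both numbers only as lower bounds, this is an estimate.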

"To achieve a rich annotation at scale, we found that existing tooling was insufficient, and therefore develop novel schemes to annotate driving data more efficiently and flexibly than the previous method. Current tools are difficult to deploy at scale and are rarely extensible to new tasks or data-structures," the project's researchers explained.