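A First Look at the Deep Drive Dataset

UC Berkeley recently released this street-scene dataset for autonomous driving. It is billed as larger than Cityscapes and as giving better generalization.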

Semantic Instance Segmentation

The dataset covers 40 object classes in total.

Comparison with Cityscapes

Street scenes come from US cities

Models trained on it are therefore more familiar with American street scenes.

Image tags

Time of day: daytime, nighttime, dawn/dusk;

Scene: Residential, Highway, City street, Parking lot, Gas station, Tunnel;

Weather: Clear, Partly cloudy, Overcast, Rainy, Snowy, Foggy;
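
To make the three tag groups concrete, here is a minimal sketch that tallies and filters images by tag. It assumes each image record carries an `attributes` dictionary with `timeofday`, `scene`, and `weather` keys. Those key names, the helpers `count_tags` and `filter_by_tags`, and the sample records are all assumptions for illustration, not something taken from the official files.

```python
from collections import Counter

# Hypothetical records; the "attributes" layout and key names are assumed.
SAMPLE_RECORDS = [
    {"name": "a.jpg",
     "attributes": {"timeofday": "daytime", "scene": "City street", "weather": "Clear"}},
    {"name": "b.jpg",
     "attributes": {"timeofday": "nighttime", "scene": "Highway", "weather": "Rainy"}},
    {"name": "c.jpg",
     "attributes": {"timeofday": "daytime", "scene": "Tunnel", "weather": "Clear"}},
]


def count_tags(records, key):
    """Count how many records carry each value of the given tag key."""
    return Counter(r["attributes"].get(key, "undefined") for r in records)


def filter_by_tags(records, **wanted):
    """Return the records whose attributes match every requested key/value pair."""
    return [
        r for r in records
        if all(r["attributes"].get(k) == v for k, v in wanted.items())
    ]


if __name__ == "__main__":
    print(count_tags(SAMPLE_RECORDS, "weather"))
    print([r["name"] for r in filter_by_tags(SAMPLE_RECORDS, timeofday="daytime")])
```

Filtering like this is one way to build a night-only or rain-only subset before training.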

Label Maps

Semantic segmentation uses label maps rather than training indices.
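
The difference can be illustrated with a small remapping sketch. A label map stores one class ID per pixel, while training code often remaps those IDs to a compact range of training indices and sends everything else to an ignore value. The ID table, the ignore value of 255, and the helper `to_train_indices` below are made up for illustration and are not the dataset's real class list.

```python
IGNORE_INDEX = 255  # assumed convention for pixels excluded from the loss

# Hypothetical label-ID -> training-index table; not the dataset's real mapping.
LABEL_TO_TRAIN = {7: 0, 8: 1, 11: 2}


def to_train_indices(label_map, table=LABEL_TO_TRAIN, ignore=IGNORE_INDEX):
    """Remap a 2-D list of label IDs to training indices; unknown IDs become `ignore`."""
    return [[table.get(pixel, ignore) for pixel in row] for row in label_map]


if __name__ == "__main__":
    label_map = [[7, 7, 8], [0, 11, 8]]
    print(to_train_indices(label_map))
```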

Better generalization

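Testing both datasets with a Dilated Residual Network (same hyperparameters) gives the relationship shown in the table below:

Train	Test	Accuracy
Deep Drive	Deep Drive	High
Deep Drive	Cityscapes	Low
Cityscapes	Deep Drive	Low
Cityscapes	Cityscapes	High

When a model is trained and tested on the same dataset, results are good in both cases, but accuracy drops markedly when the test set comes from the other dataset. A model trained on Deep Drive performs worse on the Cityscapes test set, yet some of its results are still better than those of models trained on that specific scene. This suggests that the dataset covers more scenes, so models trained on it should generalize better.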
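
The table only reports High and Low, so the exact metric is not stated here. As a hedged illustration, the sketch below computes mean intersection-over-union (mIoU), a common segmentation score, on toy flattened predictions. The choice of mIoU, the function names, and the toy arrays are assumptions, not results from either dataset.

```python
def confusion_matrix(pred, truth, num_classes):
    """Build a num_classes x num_classes matrix; rows are truth, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, truth):
        matrix[t][p] += 1
    return matrix


def mean_iou(pred, truth, num_classes):
    """Average IoU over the classes that appear in either pred or truth."""
    m = confusion_matrix(pred, truth, num_classes)
    ious = []
    for c in range(num_classes):
        tp = m[c][c]
        fn = sum(m[c]) - tp                    # truth is c, predicted something else
        fp = sum(row[c] for row in m) - tp     # predicted c, truth is something else
        union = tp + fp + fn
        if union:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    truth = [0, 0, 1, 1, 2, 2]   # toy ground-truth class per pixel
    pred = [0, 1, 1, 1, 2, 0]    # toy predicted class per pixel
    print(mean_iou(pred, truth, 3))
```

Running the same function on each train/test pairing would fill a grid like the one above with numbers instead of High and Low.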

Reference for the above: https://arxiv.org/abs/1805.04687

Dataset details

File structure:

bdd100k
|   seg
|    |  images
|    |    |  train
|    |    |  val
|    |    |  test
|    |  color_labels
|    |    |  train
|    |    |  val
|    |  labels
|    |    |  train
|    |    |  val
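
Before counting files, it can help to confirm that every directory in the tree above exists. The following is a minimal sketch; the helper name `missing_dirs` and the example root path `bdd100k/seg` are assumptions based on the layout shown.

```python
from pathlib import Path

# Sub-directories from the tree above, relative to the seg directory.
EXPECTED_DIRS = [
    "images/train", "images/val", "images/test",
    "color_labels/train", "color_labels/val",
    "labels/train", "labels/val",
]


def missing_dirs(seg_root):
    """Return the expected sub-directories that do not exist under seg_root."""
    root = Path(seg_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # An empty list means the layout matches the tree.
    print(missing_dirs("bdd100k/seg"))
```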

A python3 script for checking dataset integrity

import os
import sys

# Expect exactly one argument naming the split to check.
if len(sys.argv) != 2:
    print('Usage: python checkdata.py <train|val>')
    sys.exit(1)

dataset_category = sys.argv[1]
if dataset_category not in {'train', 'val'}:
    print(f'Invalid argument "{dataset_category}"')
    sys.exit(2)

# Expected number of files per split.
data_size = 7000 if dataset_category == 'train' else 1000

# Run this script from inside the seg directory.
dir_root = '.'
dir_color = os.path.join(dir_root, 'color_labels', dataset_category)
dir_imgs = os.path.join(dir_root, 'images', dataset_category)
dir_label = os.path.join(dir_root, 'labels', dataset_category)

# os.listdir returns names in arbitrary order, so sort before comparing by index.
color_names = sorted(os.listdir(dir_color))
img_names = sorted(os.listdir(dir_imgs))
label_names = sorted(os.listdir(dir_label))

assert len(color_names) == len(img_names) == len(label_names) == data_size

# Each image should have a color label and a label file sharing its name prefix.
for color_name, img_name, label_name in zip(color_names, img_names, label_names):
    prefix_color = color_name.split('_')[0]
    prefix_img = img_name.split('.')[0]
    prefix_label = label_name.split('_')[0]
    assert prefix_color == prefix_img == prefix_label, f'{prefix_color}, {prefix_img}, {prefix_label}'

print('All Good!')

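The JSON file containing the segmentation polygon information has not been released yet, so the data can only be used for segmentation, not for detection + segmentation. The detection-only annotation files are already provided, though, and the viewer tool can display the annotated bounding boxes and the three image tags (time of day, scene, weather).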
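
For readers who prefer a script to the viewer, here is a rough sketch of pulling bounding boxes out of a detection annotation file. The field names (`labels`, `category`, `box2d` with `x1`, `y1`, `x2`, `y2`) and the sample JSON are assumptions for illustration; check them against the actual file before relying on this.

```python
import json

# Hypothetical excerpt; the field names are assumed, not confirmed by this post.
SAMPLE_JSON = """
[
  {"name": "example.jpg",
   "labels": [
     {"category": "car", "box2d": {"x1": 10.0, "y1": 20.0, "x2": 110.0, "y2": 80.0}},
     {"category": "lane"}
   ]}
]
"""


def iter_boxes(records):
    """Yield (image name, category, x1, y1, x2, y2) for every label that has a box."""
    for record in records:
        for label in record.get("labels", []):
            box = label.get("box2d")
            if box is None:
                continue  # skip labels that carry no rectangle
            yield (record["name"], label["category"],
                   box["x1"], box["y1"], box["x2"], box["y2"])


if __name__ == "__main__":
    for row in iter_boxes(json.loads(SAMPLE_JSON)):
        print(row)
```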

Current pitfalls in the official code

https://github.com/ucbdrive/bdd-data/issues/17

https://github.com/ucbdrive/bdd-data/issues/5

https://github.com/ucbdrive/bdd-data/issues/15

Of these, issue #15 is still unresolved.

Written with StackEdit.

Original post: https://www.cnblogs.com/LexLuc/p/9653229.html

Time: 2024-11-01 15:00:33
