Fully Sparse 3D Panoptic Occupancy Prediction


SparseOcc, a new method, improves occupancy prediction for autonomous driving with an efficient sparse representation and instance differentiation, achieving high accuracy at real-time speed.

Authors

Haisong Liu

Haiguang Wang

Yang Chen

Zetong Yang

Jia Zeng

Li Chen

Limin Wang

Published

December 28, 2023

Major Takeaways:

SparseOcc proposes a novel fully sparse panoptic occupancy network for 3D occupancy prediction in the context of autonomous driving. It leverages the inherent sparsity of the scene and ensures instance-awareness, achieving a mean Intersection over Union (mIoU) of 26.0 on the Occ3D-nus dataset at a real-time inference speed of 25.4 FPS.
The network consists of a sparse voxel decoder to reconstruct sparse 3D geometry and a mask transformer using sparse instance queries to predict object instances in the sparse 3D space.
Incorporating temporal modeling over the preceding 8 frames further raises SparseOcc's mIoU to 30.9 without sacrificing real-time inference speed.

Abstract

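Occupancy prediction is critical to autonomous driving. Prior methods typically construct dense 3D volumes, which ignores the inherent sparsity of the scene and incurs high computational cost. These methods are also limited to semantic occupancy and cannot distinguish individual instances. SparseOcc, a fully sparse panoptic occupancy network, addresses both issues while achieving real-time inference speed.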

Introduction

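Vision-centric 3D occupancy prediction takes camera images as input and aims to divide the 3D scene into a structured voxel grid, assigning each voxel a label that indicates whether it is occupied and, if so, by which semantic class.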
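As a concrete illustration of the grid, the sketch below maps a 3D point in the ego frame to the index of the voxel that contains it. The grid settings (an 80 m x 80 m x 6.4 m range split into 0.4 m voxels, giving 200 x 200 x 16 cells) are an assumption modeled on Occ3D-nus (Occ3D-nuScenes), and the helper names are hypothetical. Because only a small fraction of cells contain anything, storing the set of occupied indices is far cheaper than storing the full dense grid, which is the scene sparsity SparseOcc is designed to exploit.

```python
import math
from typing import Iterable, Optional, Set, Tuple

# Assumed grid settings, modeled on Occ3D-nus:
# (x_min, y_min, z_min, x_max, y_max, z_max) in meters.
PC_RANGE = (-40.0, -40.0, -1.0, 40.0, 40.0, 5.4)
VOXEL_SIZE = 0.4
GRID_SHAPE = tuple(
    round((PC_RANGE[i + 3] - PC_RANGE[i]) / VOXEL_SIZE) for i in range(3)
)  # (200, 200, 16)

Voxel = Tuple[int, int, int]


def point_to_voxel(x: float, y: float, z: float) -> Optional[Voxel]:
    """Return the (i, j, k) index of the voxel containing the point, or None if outside the grid."""
    index = []
    for axis, coord in enumerate((x, y, z)):
        lo, hi = PC_RANGE[axis], PC_RANGE[axis + 3]
        if not lo <= coord < hi:
            return None
        # math.floor of a true division avoids float floor-division surprises
        # (in Python, 1 // 0.1 == 9.0).
        i = math.floor((coord - lo) / VOXEL_SIZE)
        index.append(min(i, GRID_SHAPE[axis] - 1))  # guard against rounding at the upper edge
    return (index[0], index[1], index[2])


def occupied_voxels(points: Iterable[Tuple[float, float, float]]) -> Set[Voxel]:
    """Sparse representation: the set of voxel indices that contain at least one point."""
    voxels = set()
    for p in points:
        v = point_to_voxel(*p)
        if v is not None:
            voxels.add(v)
    return voxels


if __name__ == "__main__":
    pts = [(0.1, 0.1, 0.1), (0.15, 0.15, 0.15), (10.1, 10.1, 1.1), (50.0, 0.0, 0.0)]
    occ = occupied_voxels(pts)
    total = GRID_SHAPE[0] * GRID_SHAPE[1] * GRID_SHAPE[2]
    print(f"{len(occ)} of {total} voxels occupied")
```

The first two points fall into the same voxel and the last lies outside the range, so the sparse set holds only two entries out of 640,000 cells.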

SparseOcc

SparseOcc is a vision-centric occupancy model consisting of three modules: an image encoder, a sparse voxel decoder, and a mask transformer.
The sparse voxel decoder reconstructs only the occupied part of the 3D geometry rather than a dense volume, which keeps computation low enough for real-time inference while still reaching high mIoU.
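The mask transformer uses sparse instance queries to predict each object instance in the sparse 3D space, distinguishing individual instances as well as semantic classes. The queries interact with 2D image features through mask-guided sparse sampling, avoiding costly dense features or global attention.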
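One way to build a sparse voxel decoder, assumed here for illustration rather than taken from the paper's code, is coarse-to-fine: start from a small dense grid, repeatedly split each surviving voxel into its eight children, score the children, and discard those unlikely to be occupied. The scoring function (score_fn), the budget (keep_k), and the threshold below are hypothetical stand-ins for a learned occupancy head.

```python
from typing import Callable, Iterable, Set, Tuple

Voxel = Tuple[int, int, int]


def upsample(voxels: Iterable[Voxel]) -> Set[Voxel]:
    """Split each voxel into its 8 children on a grid twice as fine."""
    children = set()
    for x, y, z in voxels:
        for dx in (0, 1):
            for dy in (0, 1):
                for dz in (0, 1):
                    children.add((2 * x + dx, 2 * y + dy, 2 * z + dz))
    return children


def coarse_to_fine_decode(
    coarse_shape: Tuple[int, int, int],
    num_levels: int,
    score_fn: Callable[[int, Voxel], float],  # hypothetical occupancy head: (level, voxel) -> score
    keep_k: int,
    threshold: float = 0.5,
) -> Set[Voxel]:
    """Return the sparse set of voxels predicted occupied at the finest level."""
    nx, ny, nz = coarse_shape
    # Level 0: a small dense grid, cheap enough to keep in full.
    voxels = {(x, y, z) for x in range(nx) for y in range(ny) for z in range(nz)}
    for level in range(1, num_levels + 1):
        candidates = upsample(voxels)
        scored = [(score_fn(level, v), v) for v in candidates]
        # Prune: drop voxels at or below the threshold, then keep at most keep_k of the rest.
        kept = sorted((sv for sv in scored if sv[0] > threshold), reverse=True)
        voxels = {v for _, v in kept[:keep_k]}
    return voxels


if __name__ == "__main__":
    # Toy ground truth at the finest level (2 levels above a 2x2x2 grid, i.e. 8x8x8).
    truth = {(0, 0, 0), (5, 6, 7), (7, 7, 7)}

    def oracle(level: int, v: Voxel) -> float:
        # A voxel scores 1.0 if any occupied fine voxel lies inside it.
        shift = 2 - level
        parents = {(x >> shift, y >> shift, z >> shift) for x, y, z in truth}
        return 1.0 if v in parents else 0.0

    print(sorted(coarse_to_fine_decode((2, 2, 2), 2, oracle, keep_k=100)))
```

Only children of surviving voxels are ever scored, so the work grows with the number of occupied cells rather than with the volume of the full-resolution grid.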

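To make the relationship between queries and voxels concrete, here is a minimal Mask2Former-style inference sketch. It is an assumption about how a mask transformer's outputs can be decoded, not SparseOcc's exact procedure. Each query has an embedding and class logits, its mask over the sparse voxels is the sigmoid of a dot product with each voxel's feature, and every voxel is assigned to the query with the highest class confidence times mask probability. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    # Branch on the sign so math.exp never overflows.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def panoptic_inference(
    query_embeds: Sequence[Sequence[float]],  # Q query embeddings, each of dimension D
    class_logits: Sequence[Sequence[float]],  # Q rows of C class logits
    voxel_feats: Sequence[Sequence[float]],   # N sparse voxel features, each of dimension D
) -> List[Tuple[int, int]]:
    """For each sparse voxel, return (index of winning query, that query's predicted class)."""
    probs = [softmax(row) for row in class_logits]
    labels = [max(range(len(p)), key=p.__getitem__) for p in probs]
    confidences = [p[c] for p, c in zip(probs, labels)]

    assignments = []
    for feat in voxel_feats:
        # Score each query by class confidence times its mask probability at this voxel.
        scores = [c * sigmoid(dot(emb, feat)) for c, emb in zip(confidences, query_embeds)]
        best = max(range(len(scores)), key=scores.__getitem__)
        assignments.append((best, labels[best]))
    return assignments
```

In this sketch, all voxels won by the same query form one predicted segment, so two queries of the same class yield two separate instances.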
Panoptic Occupancy Benchmark

The panoptic occupancy ground truth is generated by using the object bounding boxes from the 3D detection task to separate occupied voxels into individual instances. The resulting benchmark covers eight instance categories and ten semantic categories.
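A minimal sketch of how boxes can turn semantic occupancy into panoptic labels: every occupied voxel whose center falls inside a yaw-rotated 3D box and whose semantic class matches the box's class inherits that box's instance ID, and all other voxels keep instance ID 0. The class-matching rule, the Box3D fields, and the function names are assumptions for illustration. The benchmark's actual generation procedure may differ in its details.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Box3D:
    cx: float  # box center (m)
    cy: float
    cz: float
    length: float  # extent along the heading direction (m)
    width: float
    height: float
    yaw: float  # heading around the z axis (rad)
    label: str
    instance_id: int


def contains(box: Box3D, x: float, y: float, z: float) -> bool:
    """True if (x, y, z) lies inside the box, tested in the box's own frame."""
    dx, dy = x - box.cx, y - box.cy
    # Rotate the offset by -yaw so the box axes align with x and y.
    c, s = math.cos(box.yaw), math.sin(box.yaw)
    local_x = c * dx + s * dy
    local_y = -s * dx + c * dy
    return (
        abs(local_x) <= box.length / 2
        and abs(local_y) <= box.width / 2
        and abs(z - box.cz) <= box.height / 2
    )


def assign_instance_ids(
    voxels: Sequence[Tuple[Tuple[float, float, float], str]],  # (voxel center, semantic label)
    boxes: Sequence[Box3D],
) -> List[int]:
    """Instance ID per voxel: the first box with a matching class that contains it, else 0."""
    ids = []
    for center, label in voxels:
        match = next(
            (b.instance_id for b in boxes if b.label == label and contains(b, *center)), 0
        )
        ids.append(match)
    return ids
```

Requiring the class to match keeps, for example, road voxels under a car from being absorbed into the car instance.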

Experiments

Evaluation on the Occ3D-nus dataset shows SparseOcc's advantage over prior methods: it achieves high mIoU even with a smaller image backbone and a lower input resolution.
A lite version of SparseOcc retains high accuracy while running substantially faster.
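Since the results are reported as mIoU, a short sketch of the metric may help. For each class, IoU = TP / (TP + FP + FN), accumulated over all evaluated voxels, and mIoU is the mean over classes. This simplified version is an assumption, not the official Occ3D evaluation code. It omits details such as the camera-visibility mask, lets a class (for example "free") be excluded from the average while still counting toward other classes' errors, and skips classes absent from both prediction and ground truth.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[int],
    gt: Sequence[int],
    num_classes: int,
    excluded_classes: Sequence[int] = (),
) -> float:
    """mIoU over voxel labels in [0, num_classes); excluded classes are left out of the mean."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, g in zip(pred, gt):
        if p == g:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p where it is absent
            fn[g] += 1  # missed the true class g
    ious = [
        tp[c] / (tp[c] + fp[c] + fn[c])
        for c in range(num_classes)
        if c not in excluded_classes and tp[c] + fp[c] + fn[c] > 0
    ]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    print(mean_iou(pred=[0, 1, 1, 1], gt=[0, 0, 1, 1], num_classes=2))
```

In the example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.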

Ablations

Comparative and ablation studies verify the effectiveness of each module in SparseOcc, validating the architecture’s robustness and efficiency.

Limitations

The paper discusses SparseOcc's limitations, including the reliability of the ground truth and accumulative errors, where mistakes made early in the pipeline propagate to later predictions.

Conclusion

SparseOcc presents a significant advancement in fully sparse 3D occupancy prediction, offering state-of-the-art performance while maintaining real-time inference speed.

Critique

The paper does not address how environmental conditions (such as adverse weather or poor lighting) or complex scenarios might affect SparseOcc's performance.
The limitations section lacks a discussion of potential solutions to the identified limitations, leaving room for further exploration and development.

Appendix

Model	gpt-3.5-turbo-1106
Date Generated	2024-02-26
Abstract	http://arxiv.org/abs/2312.17118v1
HTML	https://browse.arxiv.org/html/2312.17118v1
Truncated	False
Word Count	7015