Fully Sparse 3D Panoptic Occupancy Prediction

New method SparseOcc improves autonomous driving occupancy prediction with efficient sparse representation and instance differentiation, achieving high accuracy and real-time speed.
Authors

Haisong Liu, Haiguang Wang, Yang Chen, Zetong Yang, Jia Zeng, Li Chen, Limin Wang

Published

December 28, 2023

Major Takeaways:

  1. SparseOcc is a fully sparse panoptic occupancy network for 3D occupancy prediction in autonomous driving. It exploits the inherent sparsity of the scene and is instance-aware, reaching a mean Intersection over Union (mIoU) of 26.0 on the Occ3D-nus dataset at a real-time inference speed of 25.4 FPS.

  2. The network consists of a sparse voxel decoder to reconstruct sparse 3D geometry and a mask transformer using sparse instance queries to predict object instances in the sparse 3D space.

  3. SparseOcc also benefits from temporal modeling over preceding frames, reaching an mIoU of 30.9 without sacrificing real-time inference speed.
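
Since the takeaways are stated in terms of mIoU, a minimal sketch of how the metric is computed over per-voxel class labels may help. The function name `mean_iou`, the `ignore_label` parameter, and the toy labels are illustrative assumptions; the exact Occ3D-nus evaluation protocol (class set, visibility masks) is not reproduced here.

```python
from collections import defaultdict


def mean_iou(pred, target, num_classes, ignore_label=None):
    """Average the per-class IoU over classes present in pred or target."""
    intersection = defaultdict(int)
    pred_count = defaultdict(int)
    target_count = defaultdict(int)

    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # skip voxels excluded from evaluation
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # classes absent from both sides are not averaged
            ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six voxels, three classes (labels are made up).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
miou = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {miou:.3f}")
```

Here the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. Benchmark figures such as 26.0 are this quantity expressed as a percentage.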

Abstract

Occupancy prediction is critical in autonomous driving. Prior methods often use dense 3D volumes and so disregard the sparsity of the scene. SparseOcc addresses this with a fully sparse panoptic occupancy network that reaches real-time inference speeds.

Introduction

  • Vision-centric 3D occupancy prediction aims to divide 3D scenes into structured grids with assigned labels indicating occupancy.
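
To make the grid formulation concrete, the sketch below maps labeled 3D points to voxel indices and stores only the occupied cells. The helper `voxelize`, the voxel size, the integer labels, and the dense grid shape are all hypothetical values chosen for illustration, not settings taken from the paper.

```python
import math


def voxelize(points, voxel_size, origin=(0.0, 0.0, 0.0)):
    """Map (x, y, z, label) points to a sparse {voxel index: label} dict."""
    grid = {}
    for x, y, z, label in points:
        index = tuple(
            math.floor((coord - start) / voxel_size)
            for coord, start in zip((x, y, z), origin)
        )
        # Last point wins; a majority vote per voxel would be more robust.
        grid[index] = label
    return grid


# Three labeled points (coordinates in meters, labels are made-up class ids).
points = [
    (0.1, 0.2, 0.3, 1),
    (0.3, 0.1, 0.2, 1),
    (1.2, 0.0, 0.1, 2),
]
sparse_grid = voxelize(points, voxel_size=0.5)

# Hypothetical dense grid shape, used only to contrast storage cost.
dense_shape = (200, 200, 16)
dense_cells = dense_shape[0] * dense_shape[1] * dense_shape[2]
occupied_fraction = len(sparse_grid) / dense_cells
print(sparse_grid, dense_cells, occupied_fraction)
```

A dense volume allocates every cell whether or not anything is there, while the dictionary grows only with the number of occupied voxels. That gap is the sparsity the method sets out to exploit.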

SparseOcc

  • The vision-centric occupancy model consists of three modules: an image encoder, a sparse voxel decoder, and a mask transformer.
  • The sparse voxel decoder reconstructs the scene's sparse 3D geometry, supporting real-time inference at high mIoU.
  • The mask transformer uses sparse instance queries to predict semantic classes and distinguish object instances.
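
The summary does not spell out how the sparse voxel decoder works internally. One way such a decoder can avoid dense volumes is a coarse-to-fine scheme that splits each kept voxel into eight children and prunes low-scoring ones. The sketch below illustrates that idea only; `split_voxel`, `refine`, the `keep` budget, and the toy `toy_score` function (a stand-in for a learned occupancy score) are assumptions, not the authors' implementation.

```python
def split_voxel(voxel):
    """Split one voxel into its 8 children at twice the resolution."""
    x, y, z = voxel
    return [
        (2 * x + dx, 2 * y + dy, 2 * z + dz)
        for dx in (0, 1)
        for dy in (0, 1)
        for dz in (0, 1)
    ]


def refine(voxels, score_fn, keep):
    """Upsample the kept voxels, then retain the `keep` highest-scoring children."""
    children = [child for voxel in voxels for child in split_voxel(voxel)]
    children.sort(key=score_fn, reverse=True)
    return children[:keep]


def toy_score(voxel):
    """Hypothetical occupancy score: only the ground layer z == 0 is occupied."""
    return 1.0 if voxel[2] == 0 else 0.0


# Start from a single coarse voxel and refine twice.
level1 = refine([(0, 0, 0)], toy_score, keep=4)
level2 = refine(level1, toy_score, keep=16)
print(len(level1), len(level2))
```

In this toy scene the second level scores 32 candidate voxels and keeps 16, whereas a dense 4 x 4 x 4 grid at the same resolution would hold 64 cells. The saving grows with each further level because empty regions are never subdivided.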

Panoptic Occupancy Benchmark

  • Panoptic occupancy ground truth is generated from the object bounding boxes of the 3D detection task and covers eight instance categories and ten semantic categories.
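
As a rough illustration of deriving instance labels from detection boxes, the sketch below assigns each voxel center to the first box that contains it. It assumes axis-aligned boxes for simplicity, whereas real detection boxes are typically oriented; `label_voxels` and the box tuple layout are hypothetical.

```python
def label_voxels(voxel_centers, boxes):
    """Assign (category, instance_id) to each voxel center inside a box.

    boxes: list of (instance_id, category, min_corner, max_corner),
    with axis-aligned corners (a simplifying assumption).
    """
    labels = {}
    for center in voxel_centers:
        for instance_id, category, lo, hi in boxes:
            inside = all(l <= c <= h for c, l, h in zip(center, lo, hi))
            if inside:
                labels[center] = (category, instance_id)
                break  # first matching box wins
    return labels


# Two cars of the same category become two distinct instances.
boxes = [
    (1, "car", (0.0, 0.0, 0.0), (2.0, 2.0, 1.0)),
    (2, "car", (4.0, 4.0, 0.0), (6.0, 6.0, 1.0)),
]
centers = [(1.0, 1.0, 0.5), (5.0, 5.0, 0.5), (9.0, 9.0, 0.5)]
labels = label_voxels(centers, boxes)
print(labels)
```

Voxels outside every box get no instance id, which matches the intuition that background categories carry only a semantic label.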

Experiments

  • Evaluation on the Occ3D-nus dataset shows SparseOcc reaching high mIoU with a smaller backbone and a lower resolution.
  • A lite version of SparseOcc maintains high performance at a considerably higher speed.

Ablations

  • Comparative and ablation studies verify the effectiveness of each module in SparseOcc, validating the architecture’s robustness and efficiency.

Limitations

  • The paper discusses SparseOcc’s limitations, including the reliability of the ground truth and the accumulation of errors.

Conclusion

  • SparseOcc presents a significant advancement in fully sparse 3D occupancy prediction, offering state-of-the-art performance while maintaining real-time inference speed.

Critique

  • The paper does not address the potential impact of environmental conditions or complex scenarios on the performance of SparseOcc.
  • The limitations section lacks a discussion of potential solutions to the identified limitations, leaving room for further exploration and development.

Appendix

Model: gpt-3.5-turbo-1106
Date Generated: 2024-02-26
Abstract: http://arxiv.org/abs/2312.17118v1
HTML: https://browse.arxiv.org/html/2312.17118v1
Truncated: False
Word Count: 7015