Fully Sparse 3D Panoptic Occupancy Prediction
Major Takeaways:
The paper proposes SparseOcc, a fully sparse panoptic occupancy network for 3D occupancy prediction in autonomous driving. It exploits the inherent sparsity of the scene and is instance-aware, achieving a mean Intersection over Union (mIoU) of 26.0 on the Occ3D-nus dataset at a real-time inference speed of 25.4 FPS.
The network consists of a sparse voxel decoder to reconstruct sparse 3D geometry and a mask transformer using sparse instance queries to predict object instances in the sparse 3D space.
With temporal modeling over preceding frames, SparseOcc reaches an mIoU of 30.9 without sacrificing real-time inference speed.
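The mIoU figures quoted above are averages of per-class intersection-over-union scores. As a minimal sketch of the generic metric (not the Occ3D-nus evaluation protocol, whose details are not covered in this summary), the function below computes mIoU over flattened voxel labels. The name `mean_iou`, the `ignore_label` parameter, and the toy labels are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes, ignore_label=None):
    """Average per-class IoU over classes that occur in pred or target.

    pred, target: equal-length sequences of integer class ids, one per voxel.
    ignore_label: hypothetical label for voxels excluded from scoring.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue
        if p == t:
            # Correct voxel: counts once in both intersection and union.
            intersection[p] += 1
            union[p] += 1
        else:
            # Wrong voxel: enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [intersection[c] / union[c] for c in range(num_classes) if union[c] > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six voxels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(score)  # per-class IoUs are 1/3, 2/3 and 1/2, so the mean is 0.5
```

Note that this returns a fraction in [0, 1]; benchmark tables usually report the same quantity multiplied by 100.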
Abstract
Occupancy prediction is critical in autonomous driving. Prior methods often use dense 3D volumes, disregarding the sparsity of the scene. The paper addresses this with SparseOcc, a fully sparse panoptic occupancy network that achieves real-time inference speed.
Introduction
- Vision-centric 3D occupancy prediction aims to divide 3D scenes into structured grids with assigned labels indicating occupancy.
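To make the idea of a structured grid with labels concrete, here is a small sketch that maps 3D points to voxel indices and stores only the occupied voxels. The grid origin, the voxel size, the labels, and the helper name `voxel_index` are all hypothetical values chosen for illustration; they are not taken from the paper.

```python
import math


def voxel_index(point, origin, voxel_size):
    """Return the integer (i, j, k) index of the voxel containing `point`."""
    return tuple(math.floor((p - o) / voxel_size) for p, o in zip(point, origin))


# Hypothetical grid parameters (metres).
ORIGIN = (-40.0, -40.0, -1.0)
VOXEL_SIZE = 0.5

# A sparse grid keeps an entry only for occupied voxels, instead of
# allocating a dense 3D array in which most cells would be empty.
occupied = {}
labelled_points = [
    ((1.2, 0.3, 0.1), "car"),
    ((1.3, 0.35, 0.15), "car"),   # falls into the same voxel as the first point
    ((-3.1, 2.2, -0.8), "road"),
]
for xyz, label in labelled_points:
    occupied[voxel_index(xyz, ORIGIN, VOXEL_SIZE)] = label

print(occupied)  # {(82, 80, 2): 'car', (73, 84, 0): 'road'}
```

Three points yield two occupied voxels here; every other cell of the grid is simply absent from the dictionary, which is the kind of sparsity a fully sparse method can exploit.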
SparseOcc
- The vision-centric occupancy model comprises three modules: an image encoder, a sparse voxel decoder, and a mask transformer.
- The sparse voxel decoder reconstructs sparse 3D geometry, achieving real-time inference speed and high mIoU.
- The mask transformer utilizes sparse instance queries for semantic and instance distinction.
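The summary does not spell out how per-query predictions become a panoptic labelling. One common approach, borrowed from mask-classification models and assumed here rather than confirmed by the source, is to let every query predict a class plus a mask over the sparse voxels, then assign each voxel to the best-scoring query. The function name `merge_query_masks`, the 0.5 threshold, and the toy numbers are illustrative assumptions.

```python
def merge_query_masks(class_scores, mask_probs, threshold=0.5):
    """Assign each sparse voxel to at most one instance query.

    class_scores[q] = (label, confidence) predicted by query q.
    mask_probs[q][v] = probability that voxel v belongs to query q.
    Returns one entry per voxel: (label, query_id), or None if no query
    claims the voxel with mask probability >= threshold.
    """
    num_voxels = len(mask_probs[0])
    assignment = []
    for v in range(num_voxels):
        best_query, best_score = None, 0.0
        for q, (_, confidence) in enumerate(class_scores):
            prob = mask_probs[q][v]
            score = confidence * prob  # joint class and mask confidence
            if prob >= threshold and score > best_score:
                best_query, best_score = q, score
        if best_query is None:
            assignment.append(None)
        else:
            assignment.append((class_scores[best_query][0], best_query))
    return assignment


# Three queries over four sparse voxels (toy values).
class_scores = [("car", 0.9), ("car", 0.8), ("road", 0.7)]
mask_probs = [
    [0.9, 0.1, 0.2, 0.1],
    [0.1, 0.8, 0.3, 0.2],
    [0.2, 0.6, 0.9, 0.3],
]
panoptic = merge_query_masks(class_scores, mask_probs)
print(panoptic)  # [('car', 0), ('car', 1), ('road', 2), None]
```

The query index doubles as the instance id, so two queries that both predict "car" yield two distinct car instances, which is what separates panoptic from purely semantic output.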
Panoptic Occupancy Benchmark
- Panoptic occupancy ground truth is built from the object bounding boxes of the 3D detection task and covers eight instance and ten semantic categories.
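A rough sketch of how bounding boxes could turn semantic voxel labels into panoptic ones: each voxel of an instance category receives the id of the box that contains its centre. This is a simplification under stated assumptions. The boxes here are axis-aligned, whereas detection boxes in driving datasets are generally rotated, and the function name `assign_instances`, the data layout, and the sample values are hypothetical.

```python
def assign_instances(voxels, boxes, instance_classes):
    """Attach an instance id to every voxel of an instance category.

    voxels: list of (centre_xyz, semantic_label).
    boxes: list of (instance_id, label, min_xyz, max_xyz), axis-aligned.
    instance_classes: labels that are split into individual objects.
    Returns a list of (semantic_label, instance_id or None).
    """
    panoptic = []
    for centre, label in voxels:
        instance_id = None
        if label in instance_classes:
            for box_id, box_label, lo, hi in boxes:
                inside = all(l <= c <= h for c, l, h in zip(centre, lo, hi))
                if box_label == label and inside:
                    instance_id = box_id
                    break
        panoptic.append((label, instance_id))
    return panoptic


voxels = [
    ((1.0, 1.0, 0.5), "car"),
    ((5.0, 1.0, 0.5), "car"),
    ((1.0, 1.0, 0.0), "road"),   # not an instance category, stays without an id
    ((9.0, 9.0, 0.5), "car"),    # outside every box, stays without an id
]
boxes = [
    (1, "car", (0.0, 0.0, 0.0), (2.0, 2.0, 2.0)),
    (2, "car", (4.0, 0.0, 0.0), (6.0, 2.0, 2.0)),
]
labels = assign_instances(voxels, boxes, instance_classes={"car"})
print(labels)  # [('car', 1), ('car', 2), ('road', None), ('car', None)]
```

The last voxel shows why the quality of such ground truth depends on the boxes: a car voxel that no box covers ends up with a semantic label but no instance.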
Experiments
- On the Occ3D-nus dataset, SparseOcc achieves a high mIoU while using a smaller backbone and a lower resolution.
- A lite version of SparseOcc maintains high performance at a much faster speed.
Ablations
- Comparative and ablation studies verify the contribution of each module in SparseOcc and support the robustness and efficiency of the architecture.
Limitations
- The paper discusses two limitations of SparseOcc: the reliability of the ground truth and accumulative errors.
Conclusion
- SparseOcc presents a significant advancement in fully sparse 3D occupancy prediction, offering state-of-the-art performance while maintaining real-time inference speed.
Critique
- The paper does not address the potential impact of environmental conditions or complex scenarios on the performance of SparseOcc.
- The limitations section does not propose solutions to the issues it identifies, leaving room for further exploration and development.
Appendix
Model | gpt-3.5-turbo-1106 |
Date Generated | 2024-02-26 |
Abstract | http://arxiv.org/abs/2312.17118v1 |
HTML | https://browse.arxiv.org/html/2312.17118v1 |
Truncated | False |
Word Count | 7015 |