Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding (PDF download)


Posted: 2025-05-31 11:01 | Source: http://sh6999.cn | Author: reposted
Main content:
1. Introduction
Pretrained backbones with fine-tuning have been widely applied to various 2D vision and NLP tasks [132103], where a backbone network pretrained on a large dataset is concatenated with a task-specific back end and then fine-tuned for different downstream tasks. This approach demonstrates superior performance and great advantages in reducing the workload of network design and training, as well as the amount of labeled data required for different vision tasks.

* Interns at Microsoft Research Asia. † Contact person.
In this work, we present a pretrained 3D backbone, named SWIN3D, for 3D indoor scene understanding tasks. Our method represents the 3D point cloud of an input 3D scene as sparse voxels in 3D space and adapts the Swin Transformer [30], designed for regular 2D images, to unorganized 3D points as the 3D backbone. We analyze the key issues that prevent the naïve 3D extension of the Swin Transformer from exploring large models and achieving high performance, i.e., the high memory complexity and the ignorance of signal irregularity. Based on our analysis, we develop a novel 3D self-attention operator to compute the self-attention of sparse voxels within each local window, which reduces the memory cost of self-attention from quadratic to linear with respect to the number of sparse voxels within a window and computes efficiently, and enhances self-attention by capturing various signal irregularities via our generalized contextual relative positional embedding [4826].
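The window-restricted attention described above can be sketched in a few lines. This is a simplified NumPy illustration with hypothetical function names, assuming non-negative voxel coordinates; it uses plain per-window dot-product attention (still quadratic within each window) and omits the paper's memory-efficient operator and contextual positional embedding entirely:

```python
import numpy as np

def window_partition(coords, window_size):
    """Group sparse voxels by the 3D window each one falls into.
    coords: (N, 3) integer voxel coordinates (assumed non-negative here)."""
    win = coords // window_size                       # (N, 3) window indices
    # collapse each 3D window index into a single hashable key
    keys = win[:, 0] * 1_000_000 + win[:, 1] * 1_000 + win[:, 2]
    groups = {}
    for i, k in enumerate(keys):
        groups.setdefault(int(k), []).append(i)
    return list(groups.values())

def window_self_attention(feats, coords, window_size):
    """Self-attention restricted to voxels that share a local window.
    feats: (N, C) per-voxel features. Naive softmax attention per window."""
    out = np.zeros_like(feats)
    for idx in window_partition(coords, window_size):
        x = feats[idx]                                # (n, C) voxels in one window
        scores = x @ x.T / np.sqrt(x.shape[1])        # (n, n) similarity scores
        scores = np.exp(scores - scores.max(axis=1, keepdims=True))
        attn = scores / scores.sum(axis=1, keepdims=True)  # row-wise softmax
        out[idx] = attn @ x                           # attention-weighted mix
    return out
```

A voxel alone in its window attends only to itself, so its feature passes through unchanged; voxels sharing a window exchange information.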
The novel design of our SWIN3D backbone enables us to
scale up the backbone model and the amount of data used
for pretraining. To this end, we pretrained a large SWIN3D
model with 60M parameters via a 3D semantic segmentation task over a synthetic 3D indoor scene dataset [60] that
includes 21K rooms and is about ten times larger than the
ScanNet dataset. After pretraining, we cascade the pretrained
SWIN3D backbone with task-specific back-end decoders
and fine-tune the models for various downstream 3D indoor
scene understanding tasks.
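The cascade-and-fine-tune workflow can be illustrated schematically. The class names, dimensions, and random weights below are hypothetical stand-ins for a pretrained encoder and a task-specific decoder, not the actual SWIN3D code:

```python
import numpy as np

rng = np.random.default_rng(0)

class Backbone:
    """Stand-in for a pretrained encoder (real weights would be loaded)."""
    def __init__(self, in_dim, feat_dim):
        self.W = rng.standard_normal((in_dim, feat_dim)) * 0.1
    def __call__(self, x):
        return np.maximum(x @ self.W, 0.0)            # per-voxel features

class SegHead:
    """Task-specific back-end decoder: per-voxel class logits."""
    def __init__(self, feat_dim, num_classes):
        self.W = rng.standard_normal((feat_dim, num_classes)) * 0.1
    def __call__(self, f):
        return f @ self.W

# cascade pretrained backbone with a downstream head, e.g. segmentation
backbone = Backbone(in_dim=6, feat_dim=32)            # e.g. xyz + rgb inputs
head = SegHead(feat_dim=32, num_classes=13)
points = rng.standard_normal((100, 6))
logits = head(backbone(points))                       # (100, 13) class scores
```

During fine-tuning, both the backbone and the head would be updated on the downstream task's labels; only the head is newly initialized.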
 


 
