

Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding (PDF download)


Date: 2025-05-31 11:01 | Source: http://sh6999.cn | Author: reposted

 
 
 

Main content:
 

 

1. Introduction
Pretrained backbones with fine-tuning have been widely
applied to various 2D vision and NLP tasks [132103],
where a backbone network pretrained on a large dataset is
concatenated with a task-specific back-end and then fine-tuned
for different downstream tasks. This approach demonstrates
superior performance and offers great advantages in reducing
the workload of network design and training, as well as the
amount of labeled data required for different vision tasks.

* Interns at Microsoft Research Asia. † Contact person.

In this work, we present a pretrained 3D backbone, named
SWIN3D, for 3D indoor scene understanding tasks. Our
method represents the 3D point cloud of an input 3D scene as
sparse voxels in 3D space and adapts the Swin Transformer
[30], designed for regular 2D images, to unorganized 3D
points as the 3D backbone. We analyze the key issues that
prevent the naïve 3D extension of Swin Transformer from
exploring large models and achieving high performance,
i.e., the high memory complexity and the ignorance of signal
irregularity. Based on our analysis, we develop a novel
3D self-attention operator to compute the self-attentions of
sparse voxels within each local window, which (1) reduces the
memory cost of self-attention from quadratic to linear with
respect to the number of sparse voxels within a window and
computes efficiently; and (2) enhances self-attention by capturing
various signal irregularities through our generalized contextual
relative positional embedding [4826].
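
To make the "sparse voxels within each local window" idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It quantizes points into occupied voxels and groups those voxels into non-overlapping cubic windows. The function names `voxelize` and `group_by_window`, the parameters `voxel_size` and `window_size`, and the sample points are all assumptions made for illustration. The last line counts the pairwise scores a dense attention would store per window, which is the quadratic cost the text refers to; the paper's linear-memory operator itself is not reproduced here.

```python
from collections import defaultdict
from math import floor


def voxelize(points, voxel_size):
    """Map each 3D point to an integer voxel coordinate.

    Only occupied voxels are stored, so the result is sparse.
    (Hypothetical helper for illustration.)
    """
    voxels = defaultdict(list)
    for x, y, z in points:
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        voxels[key].append((x, y, z))
    return dict(voxels)


def group_by_window(voxel_coords, window_size):
    """Assign each occupied voxel to a non-overlapping cubic window.

    A window spans window_size voxels along each axis.
    (Hypothetical helper for illustration.)
    """
    windows = defaultdict(list)
    for i, j, k in voxel_coords:
        win = (i // window_size, j // window_size, k // window_size)
        windows[win].append((i, j, k))
    return dict(windows)


# Made-up sample points (in metres); not data from the paper.
points = [
    (0.01, 0.02, 0.03),
    (0.03, 0.01, 0.04),   # falls into the same voxel as the first point
    (0.06, 0.02, 0.03),   # neighbouring voxel, same window
    (0.26, 0.02, 0.03),   # a different window
    (-0.01, 0.02, 0.03),  # negative coordinates floor towards -infinity
]

voxels = voxelize(points, voxel_size=0.05)
windows = group_by_window(voxels.keys(), window_size=5)

# A dense attention matrix inside a window with n voxels holds n * n scores.
dense_scores_per_window = {w: len(v) ** 2 for w, v in windows.items()}
```

Because only occupied voxels are stored, the number of voxels per window varies from window to window, which is one reason a fixed-size dense attention layout designed for regular 2D images does not carry over directly.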
The novel design of our SWIN3D backbone enables us to
scale up the backbone model and the amount of data used
for pretraining. To this end, we pretrained a large SWIN3D
model with 60M parameters via a 3D semantic segmentation
task over a synthetic 3D indoor scene dataset [60] that
includes 21K rooms and is about ten times larger than the
ScanNet dataset. After pretraining, we cascade the pretrained
SWIN3D backbone with task-specific back-end decoders
and fine-tune the models for various downstream 3D indoor
scene understanding tasks.
 


 
