[Question] CUDA shared-memory

Time: Tue Oct 3 10:12:05 2017

Platform: Win10
Compiler + target environment: VC2017
Libraries used: CUDA 9.0

Question: I want to use shared memory to speed up my kernel. I use threadIdx to assign the data in parallel and I call __syncthreads to synchronize, but the data still differs from what the loop version produces (the result is wrong). Is my usage wrong? Also, is it possible to single-step in VC to inspect CUDA variables?

Input: pointers to one-dimensional input and output arrays

Expected output: the data values must be identical for USE_SHARED_MEM = 0 and USE_SHARED_MEM = 1

Wrong output: github: https://github.com/ChiFang/question/blob/master/CUDA_SharedMem.cu
USE_SHARED_MEM = 1 makes the final result wrong, which means the data values differ (the rest of the program is exactly the same).

Code:

    #define USE_SHARED_MEM 1

    __global__ void kernal_test(const int a_RangeUpScale, const int *a_CostData, int *a_Input)
    {
        // Get the work index of the current element to be processed
        int y = blockIdx.x*blockDim.x + threadIdx.x;  // position of this thread in the array

    #if USE_SHARED_MEM == 1
        __shared__ int Buff[32];
    #else
        int Buff[32];
    #endif

        // Do the operation
        for (int x = 1; (x < g_ImgWidth_CUDA); x++)
        {
            int TmpPos = y*Area + (x-1)*a_RangeUpScale;

    #if USE_SHARED_MEM == 1
            // Synchronize to make sure the sub-matrices are loaded before starting the computation
            __syncthreads();
            if (threadIdx.x < 32)
            {
                Buff[threadIdx.x] = a_CostSmooth[TmpPos + threadIdx.x];
            }
            // Synchronize to make sure the sub-matrices are loaded before starting the computation
            __syncthreads();
    #else
            for (int cnt = 0; cnt < 32 ;cnt++)
            {
                Buff[cnt] = a_CostSmooth[TmpPos + cnt];
            }
    #endif

            // use Buff to do something
        }
    }

Supplement: grid size = 8, block size = 135, so the thread id is certain to exceed 32.

--

※ Origin: PTT (ptt.cc), from: 114.34.230.27
※ Article URL: https://webptt.com/cn.aspx?n=bbs/C_and_CPP/M.1506996728.A.C64.html

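1F: → johnjohnlin: "do something" usually needs another syncthread inside it 10/03 10:30

2F: → hardman1110: What is the reason? Doesn't the earlier sync count? Confused = = 10/03 10:40

3F: → johnjohnlin: When your loop wraps around, it writes to shared memory 10/03 10:44

4F: → hardman1110: After "do something" the values are not modified any more 10/03 11:27

5F: → hardman1110: That is why I sync at the very start 10/03 11:28

6F: → hardman1110: Even when it wraps around, it should sync once more, shouldn't it? 10/03 11:32

7F: push a1u1usul3: During "do something", some threads finish early and go on to overwrite the values, while 10/03 12:16

8F: → a1u1usul3: other threads have not finished and still need the old values, which have now been changed 10/03 12:16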
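
The race described in 7F and 8F can be sketched with a small kernel that refills a shared tile on every loop iteration. This is only a minimal sketch, not the poster's kernel: `tile_sum`, `run_tile_sum`, `TILE`, and `STEPS` are hypothetical names, and the "do something" step is assumed to be a plain sum over the tile. The first barrier keeps readers from seeing a half-written tile; the second keeps fast threads from refilling the tile while slower ones are still reading it.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE  = 32;  // hypothetical tile width, same size as Buff[32] in the post
constexpr int STEPS = 4;   // hypothetical number of loop iterations

// Every thread of the block reads the whole tile, so the tile really is shared.
__global__ void tile_sum(const int *in, int *out)
{
    __shared__ int buff[TILE];
    int acc = 0;
    for (int step = 0; step < STEPS; ++step) {
        if (threadIdx.x < TILE)                 // the first 32 threads refill the tile
            buff[threadIdx.x] = in[step * TILE + threadIdx.x];
        __syncthreads();   // barrier 1: tile fully written before anyone reads it
        for (int i = 0; i < TILE; ++i)          // assumed stand-in for "do something"
            acc += buff[i];
        __syncthreads();   // barrier 2: all reads finished before the next refill
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
}

// Hypothetical host driver: true if every thread saw every tile intact.
bool run_tile_sum()
{
    const int n = TILE * STEPS, threads = 128, blocks = 2;
    std::vector<int> h_in(n), h_out(threads * blocks, -1);
    int expected = 0;
    for (int i = 0; i < n; ++i) { h_in[i] = i; expected += i; }

    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, h_out.size() * sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    tile_sum<<<blocks, threads>>>(d_in, d_out);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, h_out.size() * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;
    for (int v : h_out)
        if (v != expected) return false;
    return true;
}
```

Both barriers sit on a path that every thread of the block executes the same number of times, which is what `__syncthreads()` requires. Calling `run_tile_sum()` from a host `main` is enough to try it.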

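9F: → hardman1110: So I only need to sync right before using the values? 10/03 12:37

10F: → hardman1110: And also sync before assigning the values 10/03 13:00

※ Edited: hardman1110 (114.34.230.27), 10/03/2017 13:24:40

11F: → hardman1110: I have tried syncing both before and after the assignment, but the result is still wrong (dizzy) 10/03 13:26

※ Edited: hardman1110 (114.34.230.27), 10/03/2017 13:31:48

12F: → johnjohnlin: Then does "do something" contain a break or something like that? 10/03 14:00

13F: → johnjohnlin: BTW, blockDim.x = 135 is a terrible choice; avoid it whenever you can 10/03 14:01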
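
On the block-size remark in 13F: 135 is not a multiple of the 32-thread warp size (135 = 4 × 32 + 7), so the fifth warp of every block runs with 25 idle lanes. A common alternative is to pick a multiple of 32 and guard the surplus threads with a bounds check. The sketch below assumes one work item per thread and 8 × 135 = 1080 items in total; `round_up_to_warp`, `blocks_for`, `mark_rows`, and `run_mark_rows` are hypothetical helpers, and 128 is just one reasonable block size.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int WARP_SIZE = 32;  // threads per warp

// Round a thread count up to the next multiple of the warp size.
constexpr int round_up_to_warp(int n) { return (n + WARP_SIZE - 1) / WARP_SIZE * WARP_SIZE; }

// Number of blocks needed so that blocks * threads >= total.
constexpr int blocks_for(int total, int threads) { return (total + threads - 1) / threads; }

// Hypothetical kernel: one thread per item; surplus threads in the last block skip the work.
__global__ void mark_rows(int *rows, int total)
{
    int y = blockIdx.x * blockDim.x + threadIdx.x;
    if (y < total)
        rows[y] = y;
}

// Hypothetical host driver: true if every item was written exactly as expected.
bool run_mark_rows()
{
    const int total   = 8 * 135;                     // 1080 items, as in the post
    const int threads = 128;                         // a multiple of the warp size
    const int blocks  = blocks_for(total, threads);  // 9 blocks cover 1080 items

    std::vector<int> h_rows(total, -1);
    int *d_rows = nullptr;
    if (cudaMalloc(&d_rows, total * sizeof(int)) != cudaSuccess) return false;
    mark_rows<<<blocks, threads>>>(d_rows, total);
    cudaError_t err = cudaMemcpy(h_rows.data(), d_rows, total * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_rows);
    if (err != cudaSuccess) return false;
    for (int i = 0; i < total; ++i)
        if (h_rows[i] != i) return false;
    return true;
}
```

The guard is written as an `if` around the work rather than an early `return`, so that a later `__syncthreads()` in the same kernel would still be reached by every thread.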

14F: → a1u1usul3: Better to post the code on codepad 10/03 14:09

15F: → a1u1usul3: Also, could it be that your code is simply wrong logically? 10/03 14:19

16F: → a1u1usul3: Without shared memory, each thread fetches from its own TmpPos 10/03 14:19

17F: → a1u1usul3: 32 elements into private memory, and TmpPos is different for every thread 10/03 14:20

※ Edited: hardman1110 (114.34.230.27), 10/03/2017 14:21:21

18F: → a1u1usul3: But with shared memory, 32 threads each fetch one element from their own TmpPos 10/03 14:22

19F: → a1u1usul3: into shared memory 10/03 14:23
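
To make 16F through 19F concrete: the private version gives every thread its own 32 elements, while the shared version stores one element from each of 32 different rows in a single 32-entry buffer. If shared memory is still wanted, one possible layout (an assumption, not the poster's eventual fix) is one shared row per thread, filled cooperatively by the whole block. `row_sum_shared`, `run_row_sum_shared`, `COLS`, `THREADS`, and `AREA` are hypothetical; `AREA` stands in for the post's `Area`, and a per-row sum stands in for "do something".

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int COLS    = 32;   // elements each thread needs, as in the post
constexpr int THREADS = 128;  // hypothetical block size; the launch must use exactly this
constexpr int AREA    = 64;   // hypothetical row stride, standing in for `Area`

__global__ void row_sum_shared(const int *in, int *out)
{
    // One row per thread; the +1 padding keeps per-thread row reads off the same bank.
    __shared__ int tile[THREADS][COLS + 1];
    int firstRow = blockIdx.x * blockDim.x;

    // Cooperative load: each warp copies one row, reading 32 consecutive ints.
    for (int i = threadIdx.x; i < THREADS * COLS; i += blockDim.x) {
        int r = i / COLS, c = i % COLS;
        tile[r][c] = in[(firstRow + r) * AREA + c];
    }
    __syncthreads();  // all rows loaded before any thread reads its own row

    int acc = 0;
    for (int c = 0; c < COLS; ++c)   // assumed stand-in for "do something"
        acc += tile[threadIdx.x][c];
    out[firstRow + threadIdx.x] = acc;
}

// Hypothetical host driver: compares against the same per-row sum computed on the CPU.
bool run_row_sum_shared()
{
    const int blocks = 2, rows = blocks * THREADS;
    std::vector<int> h_in(rows * AREA), h_out(rows, -1);
    for (size_t i = 0; i < h_in.size(); ++i) h_in[i] = static_cast<int>(i % 101);

    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, h_in.size() * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, rows * sizeof(int)) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in.data(), h_in.size() * sizeof(int), cudaMemcpyHostToDevice);
    row_sum_shared<<<blocks, THREADS>>>(d_in, d_out);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, rows * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int y = 0; y < rows; ++y) {
        int expected = 0;
        for (int c = 0; c < COLS; ++c) expected += h_in[y * AREA + c];
        if (h_out[y] != expected) return false;
    }
    return true;
}
```

Here the tile costs 128 × 33 × 4 bytes, about 16.5 KB per block. Whether this beats a plain private array depends on the access pattern of the real "do something" step.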

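20F: → hardman1110: a1, I have added a link to an easier-to-read version on GitHub 10/03 14:24

21F: → hardman1110: I simply wanted several threads to assign the values at once instead of running the loop 10/03 14:25

22F: → a1u1usul3: The point of shared memory is that data shared by several threads does not have to be duplicated 10/03 14:27

23F: → a1u1usul3: But your data is not actually shared, is it @@? 10/03 14:27

24F: → hardman1110: I get it now~ Sorry, I did split the threads by y, so y is different for every thread 10/03 14:28

25F: → hardman1110: For plain reuse, registers seem good enough since the array is not large 10/03 14:30

26F: → a1u1usul3: Shared within a thread -> registers; shared between threads -> shared memory 10/03 14:31

27F: → a1u1usul3: Shared between blocks -> global memory 10/03 14:32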
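
The three levels in 26F and 27F can be seen together in a block-wise sum. This is an illustrative sketch under assumptions: `sum_all`, `run_sum_all`, and `THREADS` are hypothetical, and the tree reduction requires the block size to be a power of two.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int THREADS = 256;  // hypothetical block size; must be a power of two here

__global__ void sum_all(const int *in, int n, int *total)
{
    __shared__ int partial[THREADS];   // visible to every thread of this block

    int acc = 0;                       // private to this thread, normally kept in a register
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += gridDim.x * blockDim.x)
        acc += in[i];

    partial[threadIdx.x] = acc;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {   // tree reduction inside the block
        if (threadIdx.x < s)
            partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        atomicAdd(total, partial[0]);  // global memory: the level that all blocks can see
}

// Hypothetical host driver: true if the GPU total matches the CPU total.
bool run_sum_all()
{
    const int n = 1000, blocks = 2;
    std::vector<int> h_in(n);
    int expected = 0;
    for (int i = 0; i < n; ++i) { h_in[i] = i % 10; expected += h_in[i]; }

    int *d_in = nullptr, *d_total = nullptr, h_total = -1;
    if (cudaMalloc(&d_in, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&d_total, sizeof(int)) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_total, 0, sizeof(int));
    sum_all<<<blocks, THREADS>>>(d_in, n, d_total);
    cudaError_t err = cudaMemcpy(&h_total, d_total, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_total);
    return err == cudaSuccess && h_total == expected;
}
```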

28F: → a1u1usul3: There also seems to be the trendy shuffle, which apparently shares data between threads 10/03 14:36
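
The shuffle mentioned in 28F exchanges register values between the lanes of one warp without going through shared memory. A minimal sketch, assuming a single full warp of 32 threads and the `__shfl_down_sync` intrinsic introduced with CUDA 9.0; `warp_sum` and `run_warp_sum` are hypothetical names.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Sums 32 values with warp shuffles; assumes a launch of exactly one full warp.
__global__ void warp_sum(const int *in, int *out)
{
    int v = in[threadIdx.x];
    // Each step adds the value held by the lane `offset` positions higher.
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffffu, v, offset);
    if (threadIdx.x == 0)
        *out = v;   // lane 0 ends up holding the sum over all 32 lanes
}

// Hypothetical host driver: true if the warp sum of 1..32 equals 528.
bool run_warp_sum()
{
    std::vector<int> h_in(32);
    int expected = 0;
    for (int i = 0; i < 32; ++i) { h_in[i] = i + 1; expected += h_in[i]; }

    int *d_in = nullptr, *d_out = nullptr, h_out = -1;
    if (cudaMalloc(&d_in, 32 * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, h_in.data(), 32 * sizeof(int), cudaMemcpyHostToDevice);
    warp_sum<<<1, 32>>>(d_in, d_out);
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess && h_out == expected;
}
```

Shuffles only reach lanes of the same warp, so sharing across warps of a block still needs shared memory.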

29F: → hardman1110: To speed things up further, it seems surface memory could also be used for reads and writes? 10/03 14:55

30F: → hardman1110: Thanks, everyone, for the pointers 10/03 14:56
