Author: caifu (C將。)
Board: C_and_CPP
Title: [Question] CUDA matrix multiplication gives wrong values for a few elements
Time: Tue Dec 15 23:03:36 2009
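Problem encountered: (please describe the problem clearly)
I recently started learning CUDA
and wrote a matrix multiplication exercise following the NCHC (國網中心) tutorial.
Some elements of the result come out wrong,
and with the same input and the same code,
the output can still differ from run to run (it varies among a few versions).
These are my two input matrices: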
M =
1.000000 3.000000 2.000000 0.000000
1.000000 0.000000 2.000000 2.000000
2.000000 0.000000 1.000000 1.000000
1.000000 3.000000 1.000000 3.000000
N =
3.000000 2.000000 3.000000 0.000000
3.000000 0.000000 2.000000 1.000000
0.000000 2.000000 1.000000 0.000000
2.000000 3.000000 3.000000 2.000000
Expected correct result:
P =
12.000000 6.000000 11.000000 3.000000
7.000000 12.000000 11.000000 4.000000
8.000000 9.000000 10.000000 2.000000
18.000000 13.000000 19.000000 9.000000
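A quick way to confirm the expected P above is a plain CPU reference multiply. The following is only a minimal sketch in host-only C++ (no GPU needed); matMulReference is a hypothetical helper name that does not come from the tutorial, and it assumes square row-major matrices like the ones in the post.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference (CPU) product P = M * N for two Width x Width row-major matrices.
// matMulReference is a hypothetical name used only for this illustration.
std::vector<float> matMulReference(const std::vector<float>& M,
                                   const std::vector<float>& N,
                                   int Width) {
    assert(M.size() == static_cast<std::size_t>(Width) * Width);
    assert(N.size() == static_cast<std::size_t>(Width) * Width);
    std::vector<float> P(M.size(), 0.0f);
    for (int row = 0; row < Width; ++row) {
        for (int col = 0; col < Width; ++col) {
            float sum = 0.0f;
            // Dot product of row `row` of M with column `col` of N.
            for (int k = 0; k < Width; ++k)
                sum += M[row * Width + k] * N[k * Width + col];
            P[row * Width + col] = sum;
        }
    }
    return P;
}
```

Calling matMulReference(M, N, 4) with the two matrices listed above and comparing element by element against the GPU output shows which entries of a given run are wrong.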
Incorrect results produced by the program:
The output mainly varies among the following versions...
P =
12.000000 6.000000 11.000000 3.000000
8.000000 5.000000 11.000000 4.000000
8.000000 5.000000 10.000000 0.000000
9.000000 5.000000 17.000000 5.000000
P =
12.000000 6.000000 11.000000 3.000000
8.000000 5.000000 11.000000 4.000000
8.000000 9.000000 8.000000 3.000000
18.000000 13.000000 5.000000 3.000000
P =
12.000000 6.000000 11.000000 3.000000
8.000000 5.000000 11.000000 4.000000
8.000000 9.000000 10.000000 2.000000
7.000000 4.000000 19.000000 9.000000
P =
12.000000 6.000000 7.000000 6.000000
7.000000 12.000000 7.000000 6.000000
8.000000 9.000000 8.000000 3.000000
18.000000 13.000000 5.000000 3.000000
Development platform: (e.g. VC++ or gcc/g++ or Dev-C++, Windows or Linux)
XP - VS2005
GeForce GT 220 - CUDA 2.3
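Problematic code: (please make good use of the coloring feature described in the pinned post)
Actually I don't think there is anything wrong with the code,
but I am posting the part that processes the data anyway,
in case I wrote something wrong ->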
// Matrix multiplication kernel - per thread code
__global__ void MatrixMulKernel(float* Md, float* Nd, float* Pd, int Width) {
    // Block ID
    int bx = blockIdx.x;
    int by = blockIdx.y;
    // Thread ID
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    // Pvalue stores the element of the block sub-matrix
    // that is computed by the thread - automatic variable!
    float Pvalue = 0;
    // Loop over all the sub-matrices of M and N
    // required to compute the block sub-matrix
    for (int m = 0; m < Width/TILE_WIDTH; ++m) {
        // Get a pointer to the current sub-matrix Msub of M
        float *Mdsub = GetSubMatrix(Md, m, by, Width);
        // Get a pointer to the current sub-matrix Nsub of N
        float *Ndsub = GetSubMatrix(Nd, bx, m, Width);
        __shared__ float Mds[TILE_WIDTH][TILE_WIDTH];
        __shared__ float Nds[TILE_WIDTH][TILE_WIDTH];
        // each thread loads one element of the sub-matrix
        Mds[ty][tx] = GetMatrixElement(Mdsub, tx, ty, Width);
        // each thread loads one element of the sub-matrix
        Nds[ty][tx] = GetMatrixElement(Ndsub, tx, ty, Width);
        // synchronize to make sure the sub-matrices are loaded
        // before starting the computation
        __syncthreads();
        // each thread computes one element of the block sub-matrix
        for (int k = 0; k < TILE_WIDTH; ++k)
            Pvalue += Mds[ty][k] * Nds[k][tx];
        // Synchronize to make sure that the preceding
        // computation is done before loading two new
        // sub-matrices of M and N in the next iteration
        __syncthreads();
    }
    // Get a pointer to the block sub-matrix of P
    float *Psub = GetSubMatrix(Pd, bx, by, Width);
    // Write the block sub-matrix to device memory;
    // each thread writes one element
    SetMatrixElement(Psub, tx, ty, Pvalue, Width);
}
__device__ float* GetSubMatrix(float* Md, int x, int y, int Width) {
    return (Md + y*TILE_WIDTH*Width + x*TILE_WIDTH);
}
__device__ float GetMatrixElement(float* Mdsub, int x, int y, int Width) {
    return *(Mdsub + y*Width + x);
}
__device__ void SetMatrixElement(float* Psub, int x, int y, float Pvalue, int Width) {
    *(Psub + y*Width + x) = Pvalue;
}
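The kernel above relies on a few things that the posted code never checks: the loop runs Width/TILE_WIDTH whole tiles, Mds and Nds hold exactly TILE_WIDTH x TILE_WIDTH elements with one thread loading each, and one block is needed per tile of P. A rough host-side sanity check could look like the sketch below. It is plain C++ rather than CUDA; LaunchShape is a hypothetical stand-in for the two dim3 values passed at launch, and launchShapeMatchesKernel is likewise an invented name.

```cpp
// Hypothetical stand-in for the (dimGrid, dimBlock) pair of a kernel launch;
// only the x and y components matter for this kernel.
struct LaunchShape {
    int gridX, gridY;    // number of blocks in x and y
    int blockX, blockY;  // threads per block in x and y
};

// Returns true when the launch shape satisfies what MatrixMulKernel assumes.
bool launchShapeMatchesKernel(int Width, int tileWidth, const LaunchShape& s) {
    if (Width <= 0 || tileWidth <= 0) return false;
    // The tile loop uses integer division, so a remainder would be skipped.
    if (Width % tileWidth != 0) return false;
    // Mds/Nds are tileWidth x tileWidth and each thread loads one element,
    // so the block must be exactly one tile in size.
    if (s.blockX != tileWidth || s.blockY != tileWidth) return false;
    // One block per tile of P in each direction.
    const int tiles = Width / tileWidth;
    return s.gridX == tiles && s.gridY == tiles;
}
```

With Width = 4 and an assumed TILE_WIDTH of 2 (the post does not state its value), a 2x2 grid of 2x2 blocks passes the check, while a 2x2 grid of 4x4 blocks does not.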
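Additional notes:
From reading other posts, it seems someone mentioned that hardware-related factors can also affect the result,
but I don't know how to deal with that,
so I would appreciate any help. Thank you!!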
--
※ Posted from: 批踢踢實業坊 (ptt.cc)
◆ From: 163.22.18.83
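1F:→ lgen7604: The kernel part looks fine, and my test run also gave the correct result 12/16 11:58
2F:→ lgen7604: How about using one of the sites from the pinned post to attach the complete code? 12/16 11:58
3F:→ lgen7604: That would make it easier to find where the problem is 12/16 11:59
4F:push andyjy12: Try debugging it in emu mode 12/16 13:15
6F:→ caifu: Sorry for the trouble, thank you 12/16 17:24
7F:push lgen7604: About the dimension declaration: dim3 dimBlock(Width, Width); 12/16 19:25
8F:→ lgen7604: Please be especially careful.. dim3 dimBlock(TILE_WIDTH, TILE_WIDTH); 12/16 19:25
9F:→ caifu: May I ask why this line needs special care?! 12/17 12:28
10F:push lgen7604: Because your main matrix multiplication kernel is fine; the problem is in the dimension declaration 12/17 13:29
11F:→ lgen7604: So I'm reminding you to check that part. This kind of problem is usually hard to spot 12/17 13:29
12F:→ caifu: lgen7604, sorry, I still don't quite understand this part. Is there any article or tutorial on it?! 12/17 21:39
13F:→ caifu: lgen7604, I understand what you mean now. The program is fixed. Thanks!! ^^ 12/18 16:41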
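One possible reading of the dimBlock hint in 7F/8F, sketched on the host in plain C++ so it runs without a GPU. It assumes TILE_WIDTH = 2, a 2x2 grid, and that the original launch used dimBlock(Width, Width); none of these values appear in the post. countWriters and everyElementWrittenOnce are hypothetical helper names. The code only replays the index arithmetic of GetSubMatrix and SetMatrixElement and counts how many threads would store to each offset of Pd; it does not model the shared-memory loads.

```cpp
#include <cstddef>
#include <map>

// For a gridDim x gridDim grid of blockDim x blockDim blocks, count how many
// threads store to each flat offset of Pd. The offset mirrors the kernel:
//   Psub = Pd + by*tileWidth*Width + bx*tileWidth;  *(Psub + ty*Width + tx) = ...
std::map<int, int> countWriters(int Width, int tileWidth, int gridDim, int blockDim) {
    std::map<int, int> writers;
    for (int by = 0; by < gridDim; ++by)
        for (int bx = 0; bx < gridDim; ++bx)
            for (int ty = 0; ty < blockDim; ++ty)
                for (int tx = 0; tx < blockDim; ++tx) {
                    const int row = by * tileWidth + ty;
                    const int col = bx * tileWidth + tx;
                    ++writers[row * Width + col];
                }
    return writers;
}

// True when every offset in [0, Width*Width) has exactly one writer and no
// thread stores outside that range.
bool everyElementWrittenOnce(int Width, int tileWidth, int gridDim, int blockDim) {
    const std::map<int, int> writers = countWriters(Width, tileWidth, gridDim, blockDim);
    if (writers.size() != static_cast<std::size_t>(Width) * Width) return false;
    for (const auto& entry : writers) {
        const bool inBounds = entry.first >= 0 && entry.first < Width * Width;
        if (!inBounds || entry.second != 1) return false;
    }
    return true;
}
```

Under those assumptions, everyElementWrittenOnce(4, 2, 2, 2) holds, while with 4x4 blocks several threads store to the same element of Pd and some store past the end of the 16-element array. Which store lands last is not defined, which would be consistent with results that change between runs.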