Error Handling in CUDA Programming

2022-03-01 04:31:57

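Reposted from: Error Handling in CUDA Programming

About error handling

Whether you program for the CPU or for the GPU, an error in an API call can make the program produce results that contradict what theory predicts, or make it crash outright. Error detection and error handling are therefore essential in programming. Once you can pin down the cause of an error, you can fix it faster and more accurately.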

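Error handling in Linux C system programming

Let's first review error handling in Linux system programming (CPU-based).

In Linux system programming, errors are reported through a function's return value together with the special variable errno. The return value of a call only tells you that an error occurred, not which error it was. When a system call or library function fails, it sets the global variable errno to a value that identifies the specific error. For example, if errno is set to EACCES (13 on Linux), the error is "Permission denied"; many other error codes are mapped in the same way.

Traditionally, the header file errno.h declares the errno variable as follows (modern C libraries instead define errno as a macro that expands to a modifiable, per-thread int lvalue).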

extern int errno;

So how do we report the error message that an errno value stands for?
Method 1

#include 
If the file "file.txt" cannot be found, the following is printed on the screen (at this point you can check that errno == ENOENT):
fopen: No such file or directory
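
The listing for this method is incomplete. Given the output shown, one function that produces a message of this form is perror(); that this is the method meant here is an assumption. A minimal sketch, in which try_open is a hypothetical helper name:

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: try to open `path` for reading.
 * On failure, perror() prints "fopen: <message for errno>" to stderr.
 * Returns 0 on success, or the errno value set by fopen() on failure. */
int try_open(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        int saved = errno;   /* copy first: later calls may change errno */
        perror("fopen");     /* prefix "fopen", then ": ", then the message */
        return saved;
    }
    fclose(fp);
    return 0;
}
```

Calling try_open("file.txt") when the file does not exist returns ENOENT and writes the message to standard error.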
Method 2
#include 
If the file "file.txt" cannot be found, the following is printed on the screen:
fopen: No such file or directory
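
Another standard way to obtain the same text is strerror(), which returns the message for a given errno value so that the caller can format it freely. Whether this is the second method the text had in mind is an assumption, and report_open_error is a hypothetical helper name:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: try to open `path` for reading.
 * On failure, format the message ourselves with strerror().
 * Returns 0 on success, or the errno value set by fopen() on failure. */
int report_open_error(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        int saved = errno;   /* copy first: fprintf may change errno */
        fprintf(stderr, "fopen: %s\n", strerror(saved));
        return saved;
    }
    fclose(fp);
    return 0;
}
```

Compared with perror(), this form lets the message go to any stream or log and be combined with other details such as the file name.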
Notes on using errno

When errno has to survive across other function calls, save its value in a temporary variable first so that a later call cannot overwrite it.
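
The note above can be sketched as follows. The helper name open_and_log is hypothetical; the point is only the save-and-restore pattern around a call (here fprintf) that is allowed to modify errno:

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: open `path`, log a line on failure, and make sure
 * the caller still sees the errno value that fopen() set.
 * Returns 0 on success and -1 on failure. */
int open_and_log(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        int saved_errno = errno;                      /* save immediately */
        fprintf(stderr, "could not open %s\n", path); /* may overwrite errno */
        errno = saved_errno;                          /* restore for the caller */
        return -1;
    }
    fclose(fp);
    return 0;
}
```

After open_and_log() returns -1, the caller can still inspect errno and get the reason fopen() failed, regardless of what the logging call did.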

In multi-threaded programs, each thread maintains its own errno, so using it is thread-safe.

Error handling in CUDA programming

Similar to the errno variable, CUDA programming defines an error type, cudaError_t. Unlike errno, this error value is normally the return value of the runtime API call itself. An API call succeeded if and only if it returned cudaSuccess.

typedef enum cudaError cudaError_t;
A cudaError_t can take many values (more than 70); see NVIDIA's CUDA Runtime API manual for the full list.
So when an API call fails, how do we find out what went wrong? Mainly through the following two functions.

__host__ __device__ cudaError_t cudaGetLastError ( void )
__host__ __device__ cudaError_t cudaPeekAtLastError ( void )

Both functions return the most recent error as a cudaError_t. The difference is that cudaGetLastError() resets the stored error to cudaSuccess, while cudaPeekAtLastError() leaves it unchanged.

Caution.
Any CUDA API call can fail. To determine whether one particular call is at fault, make sure that every API call before it succeeded, and then call cudaGetLastError() immediately after that call.
Kernel launches are asynchronous, so to find out whether a kernel failed, call cudaDeviceSynchronize() to wait for it to finish, and then call cudaGetLastError().
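
The difference between reading an error and reading-and-resetting it can be illustrated without a GPU. The following is a toy model in plain C, not CUDA code: every name in it (toy_error_t, toy_failing_call, toy_peek_at_last_error, toy_get_last_error) is hypothetical and only imitates the behavior described above for cudaPeekAtLastError() and cudaGetLastError().

```c
#include <stdio.h>

/* Toy stand-in for a runtime's "last error" slot. These are NOT CUDA APIs. */
typedef enum { TOY_SUCCESS = 0, TOY_ERROR_INVALID_VALUE = 1 } toy_error_t;

static toy_error_t toy_last_error = TOY_SUCCESS;

/* A call that fails: it records its error code and also returns it. */
toy_error_t toy_failing_call(void)
{
    toy_last_error = TOY_ERROR_INVALID_VALUE;
    return toy_last_error;
}

/* Imitates the "peek" behavior: report the error and leave it in place. */
toy_error_t toy_peek_at_last_error(void)
{
    return toy_last_error;
}

/* Imitates the "get" behavior: report the error, then reset to success. */
toy_error_t toy_get_last_error(void)
{
    toy_error_t e = toy_last_error;
    toy_last_error = TOY_SUCCESS;
    return e;
}
```

In this model, after toy_failing_call(), repeated peeks keep returning TOY_ERROR_INVALID_VALUE, the first toy_get_last_error() returns it as well, and the next one returns TOY_SUCCESS. This is why an unchecked earlier failure can be mistaken for a failure of a later call.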
Once we have the error value, how do we find out what it means? Looking up every code in the reference manual would be tedious, so CUDA provides two functions that return the corresponding text.

__host__ __device__ const char* cudaGetErrorName ( cudaError_t error )
__host__ __device__ const char* cudaGetErrorString ( cudaError_t error )

Both take a cudaError_t value as input; cudaGetErrorName() returns the name of the error code and cudaGetErrorString() returns a description of it. For example:

cudaError_t err;
err = cudaMemcpy(p_d, p_h, sizeof(float)*1024, cudaMemcpyHostToDevice);
if (err != cudaSuccess) {
    /* Report the code that cudaMemcpy itself returned. */
    fprintf(stderr, "cudaMemcpy : %s\n", cudaGetErrorString(err));
    exit(EXIT_FAILURE);
}

Tip.
The CUDA samples wrap API calls in checkCudaErrors( ... ), a macro that verifies that the call succeeded; this keeps the code short and clear. We can also define such a macro ourselves.

/* Evaluate the call once, keep its result, and abort with file and line on failure.
   There is no semicolon after while(0), so the macro behaves like a single statement. */
#define checkCudaErrors( a ) do { \
    cudaError_t err_ = (a); \
    if (cudaSuccess != err_) { \
        fprintf(stderr, "Cuda runtime error in line %d of file %s : %s\n", \
                __LINE__, __FILE__, cudaGetErrorString(err_)); \
        exit(EXIT_FAILURE); \
    } \
} while(0)

We can then use this macro to check whether a runtime API call succeeded:

checkCudaErrors( cudaMemcpy(p_d, p_h, sizeof(float)*1024, cudaMemcpyHostToDevice) );

Another book defines its error-handling helpers as follows.

static void HandleError( cudaError_t err, const char *file, int line ) {
    if (err != cudaSuccess) {
        printf( "%s in %s at line %d\n", cudaGetErrorString( err ), file, line );
        exit( EXIT_FAILURE );
    }
}
#define HANDLE_ERROR( err ) (HandleError( err, __FILE__, __LINE__ ))

/* Abort if a host allocation returned NULL. */
#define HANDLE_NULL( a ) {if ((a) == NULL) { \
    printf( "Host memory failed in %s at line %d\n", \
            __FILE__, __LINE__ ); \
    exit( EXIT_FAILURE );}}