Error Handling in CUDA Programming

2022-03-01 04:31:57

About error handling

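In both CPU-based and GPU-based programming, an error in an API call can make a program produce results that differ from the expected ones, or make it crash. Detecting and handling errors is therefore very important: once you can pin down the cause of an error, you can fix it faster and more accurately.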

Error handling in Linux C system programming

Let's first review how errors are handled in Linux system programming (CPU-based).

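In Linux system programming, an error is described by two things: the function's return value and the special variable errno. The return value only tells you that the call failed, not what kind of error occurred. When a system call fails, errno is set to a code that identifies the specific error. For example, if errno is set to EACCES (13 on Linux), the call failed with "Permission denied" (insufficient privileges). There are many other error codes of this kind.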

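The header errno.h has traditionally declared errno as follows:

extern int errno;

Modern C libraries such as glibc instead define errno as a macro that expands to a per-thread int lvalue. It is still read and assigned exactly like a variable.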
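The following is a minimal sketch of the return-value-plus-errno pattern. It uses open() on a path that is assumed not to exist, and the helper name try_open is hypothetical. The -1 return value only says that the call failed, while errno says why.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 on success, or the errno value
   describing why open() failed. */
int try_open(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)          /* the return value only signals "failed" */
        return errno;      /* errno identifies the error, e.g. ENOENT */
    close(fd);
    return 0;              /* errno is not meaningful after a success */
}
```

For example, try_open("/nonexistent/file.txt") would return ENOENT on a typical Linux system.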

So how do we turn the error stored in errno into a readable message?
Method 1
If the file "file.txt" cannot be found, the program prints the following (at this point errno == ENOENT):
fopen: No such file or directory
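The output above has the form "fopen: <message>", which is exactly what perror() prints. The following is a minimal sketch of this method, assuming it uses perror(); the helper open_with_perror is hypothetical.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: tries to open `path` and, on failure, reports the
   error with perror(). Returns the errno value (0 on success). */
int open_with_perror(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        int saved = errno;  /* ENOENT if the file does not exist */
        perror("fopen");    /* prints "fopen: <description of errno>" to stderr */
        return saved;
    }
    fclose(fp);
    return 0;
}
```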
Method 2
Again, if "file.txt" cannot be found, the program prints:
fopen: No such file or directory
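The second method presumably produces the same line by converting errno to text with strerror() and printing the result yourself. The following is a minimal sketch under that assumption; open_with_strerror is a hypothetical name.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: like fopen(), but prints "fopen: <message>" on
   failure using strerror(). Returns the errno value (0 on success). */
int open_with_strerror(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        int saved = errno;                               /* copy before any other call */
        fprintf(stderr, "fopen: %s\n", strerror(saved)); /* errno -> text */
        return saved;
    }
    fclose(fp);
    return 0;
}
```

Unlike perror(), strerror() returns the message string, so you can place it anywhere in your own output format.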
errno usage notes
When errno is needed across several function calls, copy it into a temporary variable immediately, because later library calls may overwrite it.
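The sketch below shows why the copy matters. The C standard allows fprintf() to change errno even when it succeeds. The helper therefore copies errno first and restores it before returning. The name open_and_log and its signature are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: opens `path` into *out. On failure it logs a
   message and returns -1, with errno still describing the fopen() error.
   On success it returns 0 and the caller must fclose(*out). */
int open_and_log(const char *path, FILE **out)
{
    *out = fopen(path, "r");
    if (*out == NULL) {
        int saved = errno;  /* copy errno before calling anything else */
        /* fprintf() may modify errno even when it succeeds */
        fprintf(stderr, "open_and_log: cannot open %s: %s\n",
                path, strerror(saved));
        errno = saved;      /* restore the original error for the caller */
        return -1;
    }
    return 0;
}
```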


In multi-threaded programs each thread has its own errno, so errno is thread-safe.
Error handling in CUDA programming
Much like errno, CUDA defines an error type, cudaError_t. The difference is that the error value is usually the return value of a runtime API call. A call succeeded if and only if it returned cudaSuccess.
typedef enum cudaError cudaError_t;


cudaError_t has many possible values (over 70, with more added in newer CUDA releases). See NVIDIA's CUDA Runtime API documentation for the full list.


So when an API call fails, how do we find out what went wrong?
Mainly through the following two functions:
__host__ __device__ cudaError_t cudaGetLastError ( void )
__host__ __device__ cudaError_t cudaPeekAtLastError ( void )
Both functions return the most recent error as a cudaError_t. The difference is that cudaGetLastError() also resets the last-error state (kept per host thread) to cudaSuccess, whereas cudaPeekAtLastError() leaves it unchanged.
Caution:
Any CUDA API call can fail. To pinpoint whether a particular call (say, API 1) failed, make sure every API call before it succeeded, then call cudaGetLastError() immediately after API 1.


Kernel launches are asynchronous, so an error inside a kernel may not be visible when the launch returns. To find out whether a kernel failed, call cudaDeviceSynchronize() to wait for it to finish, then call cudaGetLastError().


Once we have the error value, how do we find out what it means? Looking each value up in the reference manual is tedious, so CUDA provides two APIs that return the corresponding text:
__host__ __device__ const char* cudaGetErrorName ( cudaError_t error )
__host__ __device__ const char* cudaGetErrorString ( cudaError_t error )
Both take a cudaError_t value. cudaGetErrorName() returns the name of the error and cudaGetErrorString() returns a description of it. For example:
    cudaError_t err;
    err = cudaMemcpy(p_d, p_h, sizeof(float)*1024, cudaMemcpyHostToDevice);
    if (err != cudaSuccess) {
        /* err already holds this call's error; report it directly */
        fprintf(stderr, "cudaMemcpy : %s\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
Tip:

The CUDA samples wrap runtime calls in checkCudaErrors( ... ), a macro (defined in the samples' helper_cuda.h) that checks whether the call succeeded. This keeps the code short and readable. We can also define our own version:
#define checkCudaErrors( a ) do { \
    cudaError_t err_ = (a);  /* evaluate the call exactly once */ \
    if (err_ != cudaSuccess) { \
        fprintf(stderr, "Cuda runtime error in line %d of file %s: %s\n", \
                __LINE__, __FILE__, cudaGetErrorString(err_)); \
        exit(EXIT_FAILURE); \
    } \
} while (0)
We can then use this macro to check whether a runtime API call succeeded:
checkCudaErrors( cudaMemcpy(p_d, p_h, sizeof(float)*1024, cudaMemcpyHostToDevice) );
Another error-handling macro, taken from a different book:
static void HandleError( cudaError_t err,
                         const char *file,
                         int line ) {
    if (err != cudaSuccess) {
        printf( "%s in %s at line %d\n", cudaGetErrorString( err ),
                file, line );
        exit( EXIT_FAILURE );
    }
}
#define HANDLE_ERROR( err ) (HandleError( err, __FILE__, __LINE__ ))


#define HANDLE_NULL( a ) {if (a == NULL) { \
                            printf( "Host memory failed in %s at line %d\n", \
                                    __FILE__, __LINE__ ); \
                            exit( EXIT_FAILURE );}}
