wandb: ディープラーニングのための軽量な可視化ツールを始めるためのチュートリアル

2022-02-12 16:13:55

<ブロッククオート

今回は、AI開発のプロセスをより簡単に、より分かりやすくする新しい機械学習可視化ツールを紹介します。

wandb: ディープラーニング軽量可視化ツールの入門編

はじめに
ワンドブ
- バリデーションデータの可視化
- 自然言語処理
重要ツール
ミニマムチュートリアル
PyTorch アプリケーションの wandb
リファレンス

紹介文

人工知能の方向性を示すプロジェクトは、データの可視化と密接に結びついている。

モデル学習時の勾配降下プロセスはどのようなものか？損失関数はどうなっているのか？学習したモデルの精度はどのように変化するのか？

このデータを意識することが、モデルの最適化には欠かせないのです。

AIプロジェクトは膨大なデータを伴うことが多いので、データを一つ一つ肉眼で調べて分析しても、なかなか表には出てきません。そこで、データの可視化やログ分析レポートの出番となるわけです。

TensorFlowにはTensorboardが付属しており、モデルや学習の過程を可視化するのにますます良い仕事をします。しかし、どんどん肥大化しているのも事実で、AIに初めて触れる学生には一定の敷居があります。

人工知能のプロジェクトは標準化が進んでおり、例えばモデルの学習やデータセットの準備については、現在では多くの大企業が自社で自動化した機械学習プラットフォームをリリースしており、エンジニアは戦略の最適化により注力し、データの準備やデータの可視化にはあまり力を入れないようになっています。

ワンドビー

wandbはWeights & Biasesの略で、機械学習プロジェクトを追跡するのに役立つツールです。モデル学習中にハイパーパラメータと出力指標を自動的に記録し、結果を可視化して比較し、同僚と素早く結果を共有することができます。

wandbを使用すると、機械学習プロジェクトに強力なインタラクティブなビジュアルデバッグ体験をもたらし、Pythonスクリプトのアイコンの記録を自動化し、その結果をWebダッシュボードでリアルタイムに表示できます。例えば、損失関数、精度、再現率など、機械学習プロジェクトのビジュアルイメージをできるだけ短時間で作成することを可能にします。

要約すると、wandbは4つのコア機能を持っています。

カンバン：トレーニングのプロセスを追跡し、結果を視覚的に表示する
レポート：トレーニングプロセスに関する詳細で貴重な情報を保存し、共有することができます。
チューニング: ハイパーパラメータ・チューニングを使用して、学習済みモデルを最適化する。
ツール：データセットとモデルのバージョン管理

とはいえ、wandbは単なるデータ可視化ツールではありません。もっと強力なモデルやデータのバージョン管理機能を備えています。さらに、学習させたモデルをチューニングする機能もあります。

Jupyter、TensorFlow、Pytorch、Keras、Scikit、fast.ai、LightGBM、XGBoostで使用できるため、強力な互換性があることもwandbの見どころの一つです。

そのため、時間や労力の節約になるだけでなく、結果に質的な違いをもたらすことができます。

バリデーションデータの可視化

wandbは、検証データの一部を自動的に選択し、パネルに表示します。例えば、手書き文字予測の結果や、ターゲット識別のためのラップアラウンドボックスなどです。

自然言語処理

カスタムチャートを使用したNLPベースのアテンションモデルの可視化

ここでは2つの例しか挙げていませんが、現在はそれ以上の便利で価値のある機能を備えています。そして、常に新しい機能を追加しています。

重要なツール

wandb (Weights & Biases)は、tensorboardに似た非常に滑らかなオンラインモデルトレーニング可視化ツールです。

wandbは、実験を追跡し、実行しながらハイパーパラメータと出力メトリクスを記録し、結果を可視化し、結果を共有することを支援するライブラリである。

次の図は、wandbライブラリの機能を示しています。Framework Agnosticとは、どのフレームワークを使用していても、wandbを使用できることを意味します。AWS、GCP、Kubernetes、Azure、そしてローカルマシン。

以下は、wandbの重要なツールです。

ダッシュボード。実験を追跡し、結果を可視化します。
レポート。再現性のある知見を保存し、共有することができます。
スイープ。ハイパーパラメータチューニングでモデルを最適化します。
アーティファクトデータセットとモデルのバージョン管理、パイプラインの追跡。

ミニマムチュートリアル

1 ライブラリのインストール

pip install wandb

wandb login

# Inside my model training code
import wandb
wandb.init(project="my-project")

wandb.config.dropout = 0.2
wandb.config.hidden_layer_size = 128

def my_train_loop():
    for epoch in range(10):
        loss = 0 # change as appropriate :)
        wandb.log({'epoch': epoch, 'loss': loss})

# by default, this will save to a new subfolder for files associated
# with your run, created in wandb.run.dir (which is . /wandb by default)
wandb.save("mymodel.h5")

# you can pass the full path to the Keras model API
model.save(os.path.join(wandb.run.dir, "mymodel.h5"))

2 アカウントの作成

from __future__ import print_function
import argparse
import random # to set the python random seed
import numpy # to set the numpy random seed
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
# Ignore excessive warnings
import logging

logging.propagate = False
logging.getLogger().setLevel(logging.ERROR)

# WandB - Import the wandb library
import wandb

3 初期化

# WandB - Login to your wandb account so you can log all your metrics
! wandb login

4 ハイパーパラメータの宣言

# Define Convolutional Neural Network.

class Net(nn.Module):
    def __init__(self):
        super(Net, self). __init__()

        # In our constructor, we define our neural network architecture that we'll use in the forward pass.
        # Conv2d() adds a convolution layer that generates 2 dimensional feature maps
        # to learn different aspects of our image.
        self.conv1 = nn.Conv2d(3, 6, kernel_size=5)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)

        # Linear(x,y) creates dense, fully connected layers with x inputs and y outputs.
        # Linear layers simply output the dot product of our inputs and weights.
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Here we feed the feature maps from the convolutional layers into a max_pool2d layer.
        # The max_pool2d layer reduces the size of the image representation our convolutional layers learnt,
        # and in doing so it reduces the number of parameters and computations the network needs to perform.
        # Finally we apply the relu activation function which gives us max(0, max_pool2d_output)
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2))

        # Reshapes x into size (-1, 16 * 5 * 5)
        # so we can feed the convolutio

5 ログを記録する

def train(config, model, device, train_loader, optimizer, epoch):
    # This is necessary for layers like dropout, batchNorm etc., which behave differently in training and evaluation mode.
    This is necessary for layers like dropout, batchNorm etc. # which behave differently in training and evaluation mode.
    model.train()

    # we loop over the data iterator, and feed the inputs to the network and adjust the weights.
    for batch_id, (data, target) in enumerate(train_loader):
        if batch_id > 20:
            break
        # Loop the input features and labels from the training dataset.
        data, target = data.to(device), target.to(device)

        # Reset the gradients to 0 for all learnable weight parameters
        optimizer.zero_grad()

        # Forward pass: Pass image data from training dataset, make predictions
        # about class image belongs to (0-9 in this case).
        output = model(data)

        # Define our loss function, and compute the loss
        loss = F.nll_loss(output, target)

        # Backward pass:compute the gradients of loss, the model's parameters
        loss.backward()

        # update the neural network weights
        optimizer.step()

6 ファイルを保存する

# wandb.log is used to record some logs (accuracy, loss and epoch), so that you can check the performance of the network at any time
def test(args, model, device, test_loader, classes):
    model.eval()
    # switch model to evaluation mode.
    # This is necessary for layers like dropout, batchNorm etc. which behave differently in training and evaluation mode
    test_loss = 0
    correct = 0
    example_images = []

    with torch.no_grad():
        for data, target in test_loader:
            # Load the input features and labels from the test dataset
            data, target = data.to(device), target.to(device)

            # Make predictions: Pass image data from test dataset,
            # make predictions about class image belongs to(0-9 in this case)
            output = model(data)

            # Compute the loss sum up batch loss
            test_loss += F.nll_loss(output, target, reduction='sum').item()

            # Get the index of the max log-probability
            pred = output.max(1, keepdim=True)[1]
            correct += pred.eq(target.view_as(pred)).sum().item()

            # Log images in your test dataset automatically,
            # along with predicted and true labels by passing pytorch tensors with image data into wandb.
            example_images.append(wandb.Image(
                data[0], caption="Pred:{} Truth:{}".format(classes[pred[0].item()], classes[target[0]])))

   # wandb.log(a_dict) logs the keys and values of the dictionary passed in and associates the values with a step.
   # You can log anything by passing it to wandb.log(),
   # including histograms, custom matplotlib objects, images, video, text, tables, html, pointclounds and other 3D objects.
   # Here we use it to log test accuracy, loss and some test images (along with their true and predicted labels).
    wandb.log({
        "Examples": example_images,
        "Test Accuracy": 100. * correct / len(test_loader.dataset),
        "Test Loss": test_loss
    })

wandbを使用した後、モデルの出力、ログ、保存するファイルがクラウドに同期されます。

PyTorchアプリケーション wandb

最も単純なニューラルネットワークの例で、wandbの使い方をデモします。

まず、必要なライブラリをインポートします。

# Initialize a wandb run, and set hyperparameters
# Initialize a new run
wandb.init(project="pytorch-intro")
wandb.watch_called = False # Re-run the model without restarting the runtime, unnecessary after our next release

# config is a variable that holds and saves hyper parameters and inputs
config = wandb.config # Initialize config
config.batch_size = 4 # input batch size for training (default:64)
config.test_batch_size = 10 # input batch size for testing(default:1000)
config.epochs = 50 # number of epochs to train(default:10)
config.lr = 0.1 # learning rate(default:0.01)
config.momentum = 0.1 # SGD momentum(default:0.5)
config.no_cuda = False # disables CUDA training
config.seed = 42 # random seed(default:42)
config.log_interval = 10 # how many batches to wait before logging training status

でwandbのアカウントにログインします。

def main():
    use_cuda = not config.no_cuda and torch.cuda.is_available()
    device = torch.device("cuda:0" if use_cuda else "cpu")
    kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}

    # Set random seeds and deterministic pytorch for reproducibility
    # random.seed(config.seed) # python random seed
    torch.manual_seed(config.seed) # pytorch random seed
    # numpy.random.seed(config.seed) # numpy random seed
    torch.backends.cudnn.deterministic = True

    # Load the dataset: We're training our CNN on CIFAR10.
    # First we define the transformations to apply to our images.
    transform = transforms.Compose([
        ToTensor(),
        Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

    # Now we load our training and test datasets and apply the transformations defined above
    train_loader = DataLoader(datasets.CIFAR10(
        root='. /data',
        train=True,
        download=True,
        transform=transform
    ), batch_size=config.batch_size, shuffle=True, **kwargs)

    test_loader = DataLoader(datasets.CIFAR10(
        root='. /data',
        train=False,
        download=True,
        transform=transform
    ), batch_size=config.batch_size, shuffle=False, **kwargs)

    classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
    # Initialize our model, recursively go over all modules and convert their parameters
    # and buffers to CUDA tensors (if device is set to cuda)
    model = Net().to(device)
    optimizer = optim.SGD(model.parameters(), lr=config.lr, momentum=config.momentum)

    # wandb.watch() automatically fetches all layer dimensions, gradients, model parameters
    # and logs them automatically to your dashboard.
    # using log="all" log histograms of parameter values in addition to gradients
    wandb.watch(model, log="all")
    for epoch in range(1, config.epochs + 1):
        train(config, model, device, train_loader, optimizer, epoch)
        test(config, model, device, test_loader, classes)

    # This automatically saves a file to the cloud
    torch.save(model.state_dict(), 'model.h5')
    wandb.save('model.h5')


if __name__ == '__main__':
    main()

Convolutional Neural Network を定義する。

# Define Convolutional Neural Network.

class Net(nn.Module):
    def __init__(self):
        super(Net, self). __init__()

        # In our constructor, we define our neural network architecture that we'll use in the forward pass.
        # Conv2d() adds a convolution layer that generates 2 dimensional feature maps
        # to learn different aspects of our image.
        self.conv1 = nn.Conv2d(3, 6, kernel_size=5)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)

        # Linear(x,y) creates dense, fully connected layers with x inputs and y outputs.
        # Linear layers simply output the dot product of our inputs and weights.
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Here we feed the feature maps from the convolutional layers into a max_pool2d layer.
        # The max_pool2d layer reduces the size of the image representation our convolutional layers learnt,
        # and in doing so it reduces the number of parameters and computations the network needs to perform.
        # Finally we apply the relu activation function which gives us max(0, max_pool2d_output)
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2))

        # Reshapes x into size (-1, 16 * 5 * 5)
        # so we can feed the convolutio

学習関数の定義

def train(config, model, device, train_loader, optimizer, epoch):
    # This is necessary for layers like dropout, batchNorm etc., which behave differently in training and evaluation mode.
    This is necessary for layers like dropout, batchNorm etc. # which behave differently in training and evaluation mode.
    model.train()

    # we loop over the data iterator, and feed the inputs to the network and adjust the weights.
    for batch_id, (data, target) in enumerate(train_loader):
        if batch_id > 20:
            break
        # Loop the input features and labels from the training dataset.
        data, target = data.to(device), target.to(device)

        # Reset the gradients to 0 for all learnable weight parameters
        optimizer.zero_grad()

        # Forward pass: Pass image data from training dataset, make predictions
        # about class image belongs to (0-9 in this case).
        output = model(data)

        # Define our loss function, and compute the loss
        loss = F.nll_loss(output, target)

        # Backward pass:compute the gradients of loss, the model's parameters
        loss.backward()

        # update the neural network weights
        optimizer.step()

テスト関数の定義

# wandb.log is used to record some logs (accuracy, loss and epoch), so that you can check the performance of the network at any time
def test(args, model, device, test_loader, classes):
    model.eval()
    # switch model to evaluation mode.
    # This is necessary for layers like dropout, batchNorm etc. which behave differently in training and evaluation mode
    test_loss = 0
    correct = 0
    example_images = []

    with torch.no_grad():
        for data, target in test_loader:
            # Load the input features and labels from the test dataset
            data, target = data.to(device), target.to(device)

            # Make predictions: Pass image data from test dataset,
            # make predictions about class image belongs to(0-9 in this case)
            output = model(data)

            # Compute the loss sum up batch loss
            test_loss += F.nll_loss(output, target, reduction='sum').item()

            # Get the index of the max log-probability
            pred = output.max(1, keepdim=True)[1]
            correct += pred.eq(target.view_as(pred)).sum().item()

            # Log images in your test dataset automatically,
            # along with predicted and true labels by passing pytorch tensors with image data into wandb.
            example_images.append(wandb.Image(
                data[0], caption="Pred:{} Truth:{}".format(classes[pred[0].item()], classes[target[0]])))

   # wandb.log(a_dict) logs the keys and values of the dictionary passed in and associates the values with a step.
   # You can log anything by passing it to wandb.log(),
   # including histograms, custom matplotlib objects, images, video, text, tables, html, pointclounds and other 3D objects.
   # Here we use it to log test accuracy, loss and some test images (along with their true and predicted labels).
    wandb.log({
        "Examples": example_images,
        "Test Accuracy": 100. * correct / len(test_loader.dataset),
        "Test Loss": test_loss
    })

wandbの実行を初期化し、ハイパーパラメータを設定する。

# Initialize a wandb run, and set hyperparameters
# Initialize a new run
wandb.init(project="pytorch-intro")
wandb.watch_called = False # Re-run the model without restarting the runtime, unnecessary after our next release

# config is a variable that holds and saves hyper parameters and inputs
config = wandb.config # Initialize config
config.batch_size = 4 # input batch size for training (default:64)
config.test_batch_size = 10 # input batch size for testing(default:1000)
config.epochs = 50 # number of epochs to train(default:10)
config.lr = 0.1 # learning rate(default:0.01)
config.momentum = 0.1 # SGD momentum(default:0.5)
config.no_cuda = False # disables CUDA training
config.seed = 42 # random seed(default:42)
config.log_interval = 10 # how many batches to wait before logging training status

主な機能

def main():
    use_cuda = not config.no_cuda and torch.cuda.is_available()
    device = torch.device("cuda:0" if use_cuda else "cpu")
    kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}

    # Set random seeds and deterministic pytorch for reproducibility
    # random.seed(config.seed) # python random seed
    torch.manual_seed(config.seed) # pytorch random seed
    # numpy.random.seed(config.seed) # numpy random seed
    torch.backends.cudnn.deterministic = True

    # Load the dataset: We're training our CNN on CIFAR10.
    # First we define the transformations to apply to our images.
    transform = transforms.Compose([
        ToTensor(),
        Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

    # Now we load our training and test datasets and apply the transformations defined above
    train_loader = DataLoader(datasets.CIFAR10(
        root='. /data',
        train=True,
        download=True,
        transform=transform
    ), batch_size=config.batch_size, shuffle=True, **kwargs)

    test_loader = DataLoader(datasets.CIFAR10(
        root='. /data',
        train=False,
        download=True,
        transform=transform
    ), batch_size=config.batch_size, shuffle=False, **kwargs)

    classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
    # Initialize our model, recursively go over all modules and convert their parameters
    # and buffers to CUDA tensors (if device is set to cuda)
    model = Net().to(device)
    optimizer = optim.SGD(model.parameters(), lr=config.lr, momentum=config.momentum)

    # wandb.watch() automatically fetches all layer dimensions, gradients, model parameters
    # and logs them automatically to your dashboard.
    # using log="all" log histograms of parameter values in addition to gradients
    wandb.watch(model, log="all")
    for epoch in range(1, config.epochs + 1):
        train(config, model, device, train_loader, optimizer, epoch)
        test(config, model, device, test_loader, classes)

    # This automatically saves a file to the cloud
    torch.save(model.state_dict(), 'model.h5')
    wandb.save('model.h5')


if __name__ == '__main__':
    main()

<イグ

参考文献

https://www.jianshu.com/p/148c108b00f0
https://zhuanlan.zhihu.com/p/266337608