[Solved] How to create a new gym environment in OpenAI?

2023-02-15 23:44:13

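Question

I have an assignment to build an AI agent that learns to play a video game using ML. I don't want to use an existing environment, so I would like to create a new one with OpenAI Gym. How can I create a new custom environment?

Also, is there any other way to start developing an AI agent that plays a specific video game without the help of OpenAI Gym?

How to solve it

Take a look at my banana-gym.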

Creating a new environment

See the main page of the repository:

https://github.com/openai/gym/blob/master/docs/creating_environments.md

The steps are:

Create a new repository with a PIP-package structure.

It should look like this:

gym-foo/
  README.md
  setup.py
  gym_foo/
    __init__.py
    envs/
      __init__.py
      foo_env.py
      foo_extrahard_env.py
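
The layout above can be scaffolded with a short script. This is only a sketch: the file contents are minimal placeholders, and the environment id 'MyEnv-v0', the version number, and the `scaffold` helper are assumptions made for illustration. The `register` call from `gym.envs.registration` is what makes the environment available to `gym.make`; it is written out as file content here, so the script itself runs without Gym installed.

```python
import tempfile
from pathlib import Path

# File contents for the skeleton. The id 'MyEnv-v0' and the version number
# are illustrative assumptions, not values from the original answer.
FILES = {
    "README.md": "# gym-foo\n",
    "setup.py": (
        "from setuptools import setup\n"
        "\n"
        "setup(name='gym_foo',\n"
        "      version='0.0.1',\n"
        "      install_requires=['gym'])\n"
    ),
    "gym_foo/__init__.py": (
        "from gym.envs.registration import register\n"
        "\n"
        "register(id='MyEnv-v0',\n"
        "         entry_point='gym_foo.envs:FooEnv')\n"
    ),
    "gym_foo/envs/__init__.py": "from gym_foo.envs.foo_env import FooEnv\n",
    "gym_foo/envs/foo_env.py": "# FooEnv goes here\n",
    "gym_foo/envs/foo_extrahard_env.py": "# FooExtraHardEnv goes here\n",
}


def scaffold(root):
    """Create the gym-foo layout under `root` and return the package path."""
    package = Path(root) / "gym-foo"
    for relative, content in FILES.items():
        target = package / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
    return package


if __name__ == "__main__":
    # Build the skeleton in a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        package = scaffold(tmp)
        for path in sorted(package.rglob("*")):
            print(path.relative_to(package))
```

After filling in foo_env.py, the package would typically be installed in development mode with `pip install -e gym-foo`.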

See the link above for what goes into each file. The details it leaves out are mostly what some of the functions in foo_env.py should look like. Looking at examples and at gym.openai.com/docs/ helps. Here is an example:

import gym


class FooEnv(gym.Env):
    metadata = {'render.modes': ['human']}

    def __init__(self):
        pass

    # Note: newer Gym releases expect step/reset/render without the
    # leading underscore.
    def _step(self, action):
        """

        Parameters
        ----------
        action :

        Returns
        -------
        ob, reward, episode_over, info : tuple
            ob (object) :
                an environment-specific object representing your observation of
                the environment.
            reward (float) :
                amount of reward achieved by the previous action. The scale
                varies between environments, but the goal is always to increase
                your total reward.
            episode_over (bool) :
                whether it's time to reset the environment again. Most (but not
                all) tasks are divided up into well-defined episodes, and
                episode_over being True indicates the episode has terminated.
                (For example, perhaps the pole tipped too far, or you lost
                your last life.)
            info (dict) :
                 diagnostic information useful for debugging. It can sometimes
                 be useful for learning (for example, it might contain the raw
                 probabilities behind the environment's last state change).
                 However, official evaluations of your agent are not allowed to
                 use this for learning.
        """
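        # self.env and hfo_py are not defined in this snippet; they stand
        # for the wrapped game and its status constants.
        self._take_action(action)
        self.status = self.env.step()
        reward = self._get_reward()
        ob = self.env.getState()
        episode_over = self.status != hfo_py.IN_GAME
        return ob, reward, episode_over, {}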

    def _reset(self):
        pass

    def _render(self, mode='human', close=False):
        pass

    def _take_action(self, action):
        pass

    def _get_reward(self):
        """ Reward is given for XY. """
        # FOOBAR and ABC are placeholder status constants.
        if self.status == FOOBAR:
            return 1
        elif self.status == ABC:
            return self.somestate ** 2
        else:
            return 0
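
To make the step/reset contract concrete, here is a small toy environment that can be run as-is. It is a hypothetical example, not part of the original answer: `CoinGuessEnv` and its parameters are invented for illustration, and it deliberately does not inherit from `gym.Env` so that it runs without Gym installed. A real environment would subclass `gym.Env` as shown above and would normally also declare `action_space` and `observation_space`.

```python
import random


class CoinGuessEnv:
    """Toy environment: guess the outcome of the next coin flip.

    Hypothetical example. A real Gym environment would inherit from gym.Env
    and also declare action_space and observation_space.
    """

    def __init__(self, max_steps=10, seed=None):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.steps = 0
        self.last_flip = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.steps = 0
        self.last_flip = self.rng.randint(0, 1)
        return self.last_flip

    def step(self, action):
        """Apply `action` (0 or 1); return (ob, reward, episode_over, info)."""
        flip = self.rng.randint(0, 1)
        reward = 1.0 if action == flip else 0.0
        self.steps += 1
        self.last_flip = flip
        episode_over = self.steps >= self.max_steps
        return flip, reward, episode_over, {"steps": self.steps}


if __name__ == "__main__":
    env = CoinGuessEnv(max_steps=5, seed=0)
    ob = env.reset()
    done = False
    total = 0.0
    while not done:
        # Naive policy: guess that the next flip repeats the last one.
        ob, reward, done, info = env.step(ob)
        total += reward
    print("total reward:", total)
```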

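Using the environment

import gym
import gym_foo  # importing the package runs its register() call
env = gym.make('MyEnv-v0')  # the id must match the one passed to register()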
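
Once an environment object exists, an agent interacts with it in a reset/step loop. The sketch below assumes the older API used in this answer, where `reset()` returns an observation and `step(action)` returns a 4-tuple; newer Gym releases return an extra value from both calls, so the unpacking would need adjusting there. `run_episode` and `CountdownEnv` are hypothetical names; `CountdownEnv` is only a stand-in so the loop can run without the gym_foo package.

```python
import random


def run_episode(env, policy, max_steps=1000):
    """Run one episode and return the total reward.

    Assumes the older 4-tuple API used in the answer:
    reset() -> ob and step(action) -> (ob, reward, done, info).
    """
    ob = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(ob)
        ob, reward, done, _info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


class CountdownEnv:
    """Hypothetical stand-in for gym.make('MyEnv-v0'); ends after n steps."""

    def __init__(self, n=3):
        self.n = n
        self.remaining = n

    def reset(self):
        self.remaining = self.n
        return self.remaining

    def step(self, action):
        # Every step gives a reward of 1.0; the episode ends at zero.
        self.remaining -= 1
        return self.remaining, 1.0, self.remaining == 0, {}


if __name__ == "__main__":
    def random_policy(ob):
        # Ignore the observation and pick one of two actions at random.
        return random.choice([0, 1])

    print(run_episode(CountdownEnv(n=3), random_policy))
```

With a real package, `CountdownEnv(n=3)` would be replaced by the object returned from `gym.make`.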

Examples

https://github.com/openai/gym-soccer
https://github.com/openai/gym-wikinav
https://github.com/alibaba/gym-starcraft
https://github.com/endgameinc/gym-malware
https://github.com/hackthemarket/gym-trading
https://github.com/tambetm/gym-minecraft
https://github.com/ppaquette/gym-doom
https://github.com/ppaquette/gym-super-mario
https://github.com/tuzzer/gym-maze
