Selenium webdriver: Modifying the navigator.webdriver flag to prevent selenium detection

2023-08-08 02:35:35

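Question

I'm trying to automate a very basic task on a website using selenium and chrome, but somehow the website detects when chrome is driven by selenium and blocks every request. I suspect that the website relies on an exposed DOM variable like the one described in https://stackoverflow.com/a/41904453/648236 to detect a selenium-driven browser.

My question is: is there a way I can make the navigator.webdriver flag false? I am willing to go so far as to recompile the selenium source after making modifications, but I cannot seem to find the NavigatorAutomationInformation source anywhere in the repository https://github.com/SeleniumHQ/selenium

Any help is much appreciated.

P.S.: I also tried the following, based on https://w3c.github.io/webdriver/#interface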

Object.defineProperty(navigator, 'webdriver', {
    get: () => false,
  });

However, it only updates the property after the initial page load. I think the site detects the variable before my script is executed.

How to solve it?

First Update ¹

With the availability of execute_cdp_cmd(cmd, cmd_args), you can now easily execute google-chrome-devtools commands using Selenium. Using this feature you can modify navigator.webdriver easily to prevent Selenium from getting detected.
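This copy of the answer does not show the CDP call itself. A minimal sketch, assuming the DevTools command Page.addScriptToEvaluateOnNewDocument is used so that the override is evaluated in each new document before the page's own scripts; the names WEBDRIVER_OVERRIDE_JS and install_webdriver_override are hypothetical and not part of Selenium:

```python
# JavaScript to inject into every new document before the page's scripts run.
WEBDRIVER_OVERRIDE_JS = (
    "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
)


def install_webdriver_override(driver):
    """Register the override through CDP (hypothetical helper).

    `driver` is assumed to be a Chromium-based Selenium driver that exposes
    execute_cdp_cmd(cmd, cmd_args).
    """
    return driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument",
        {"source": WEBDRIVER_OVERRIDE_JS},
    )
```

The helper would be called once, after the driver is created and before the first driver.get(...). Unlike a plain execute_script call, the registered script should then apply to later page loads in that tab as well, which addresses the timing problem raised in the question.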

Preventing Detection ²

To prevent a Selenium-driven WebDriver from getting detected, a niche approach would include either or all of the steps below:

Adding the argument --disable-blink-features=AutomationControlled

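from selenium import webdriver
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()
# Disable the Blink AutomationControlled feature
options.add_argument('--disable-blink-features=AutomationControlled')
# Selenium 4 takes the chromedriver path through a Service object
service = Service(executable_path=r'C:\WebDrivers\chromedriver.exe')
driver = webdriver.Chrome(service=service, options=options)
driver.get("https://www.website.com")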

You can find a relevant detailed discussion in: Selenium can't open a second page

Rotating the user-agent through the execute_cdp_cmd() command as follows:

#Setting up Chrome/83.0.4103.53 as useragent
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})

Changing the value of the webdriver property of navigator to undefined:
```
driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
```

Excluding the collection of enable-automation switches:

options.add_experimental_option("excludeSwitches", ["enable-automation"])

Turning off useAutomationExtension:

options.add_experimental_option('useAutomationExtension', False)
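The user-agent step in the list above sets a single fixed string, while the text speaks of rotating it. A minimal sketch of rotation, assuming a small pool of strings chosen at random; the helper name rotate_user_agent and the second pool entry are illustrative assumptions, not values from the original answer:

```python
import random

# Placeholder pool: the first entry is the string used in the step above; the
# second is a made-up variant for illustration only. Replace with real strings.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36",
]


def rotate_user_agent(driver, pool=USER_AGENTS, rng=random):
    """Pick one user agent from `pool` and apply it via CDP (hypothetical helper).

    `driver` is assumed to expose execute_cdp_cmd(cmd, cmd_args).
    """
    user_agent = rng.choice(pool)
    driver.execute_cdp_cmd("Network.setUserAgentOverride", {"userAgent": user_agent})
    return user_agent
```

Calling rotate_user_agent(driver) before each driver.get(...) would give every navigation a string drawn from the pool. Passing a seeded random.Random instance as rng makes the choice reproducible when debugging.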

Sample Code ³

Clubbing up all the steps mentioned above, an effective code block would be:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
# Selenium 4 takes the chromedriver path through a Service object
service = Service(executable_path=r'C:\WebDrivers\chromedriver.exe')
driver = webdriver.Chrome(service=service, options=options)
# This override applies only to the document that is currently loaded
driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})
# Print the user agent the page now sees
print(driver.execute_script("return navigator.userAgent;"))
driver.get('https://www.httpbin.org/headers')
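To see whether the settings above had any effect, it can help to read the values back from the loaded page. A minimal sketch, assuming a Selenium driver with execute_script(); the helper names read_automation_fingerprint and looks_automated are hypothetical, and the check covers only the navigator.webdriver signal, not other detection methods a site might use:

```python
def read_automation_fingerprint(driver):
    """Return what the current page sees (hypothetical helper).

    `driver` is assumed to be a Selenium WebDriver with execute_script().
    """
    return {
        "webdriver": driver.execute_script("return navigator.webdriver"),
        "userAgent": driver.execute_script("return navigator.userAgent"),
    }


def looks_automated(fingerprint):
    """True only when navigator.webdriver was reported as true."""
    return fingerprint.get("webdriver") is True
```

Running looks_automated(read_automation_fingerprint(driver)) after driver.get(...) reports what the page observes after navigation, which is the moment that matters for detection.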

History

As per the W3C Editor's Draft, the current implementation strictly mentions:

The webdriver-active flag is set to true when the user agent is under remote control, and is initially set to false.

Further,

Navigator includes NavigatorAutomationInformation;

It is to be noted that:

The NavigatorAutomationInformation interface should not be exposed on WorkerNavigator.

The NavigatorAutomationInformation interface is defined as:

interface mixin NavigatorAutomationInformation {
    readonly attribute boolean webdriver;
};

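which returns true if the webdriver-active flag is set, and false otherwise.

Finally, navigator.webdriver defines a standard way for co-operating user agents to inform the document that it is controlled by WebDriver, so that alternate code paths can be triggered during automation.

Caution: Altering or tweaking the parameters mentioned above may block the navigation and get the WebDriver instance detected.

Update (6-Nov-2019)

As of the current implementation, an ideal way to access a web page without getting detected would be to use the ChromeOptions() class to add a couple of arguments to: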

Exclude the collection of enable-automation switches
Turn off useAutomationExtension

through an instance of ChromeOptions as follows:

Java Example:

System.setProperty("webdriver.chrome.driver", "C:\\Utility\\BrowserDrivers\\chromedriver.exe");
ChromeOptions options = new ChromeOptions();
options.setExperimentalOption("excludeSwitches", Collections.singletonList("enable-automation"));
options.setExperimentalOption("useAutomationExtension", false);
WebDriver driver =  new ChromeDriver(options);
driver.get("https://www.google.com/");

Python Example:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
# Selenium 4 takes the chromedriver path through a Service object
service = Service(executable_path=r'C:\path\to\chromedriver.exe')
driver = webdriver.Chrome(service=service, options=options)
driver.get("https://www.google.com/")

Ruby Example:

  options = Selenium::WebDriver::Chrome::Options.new
  options.add_argument("--disable-blink-features=AutomationControlled")
  driver = Selenium::WebDriver.for :chrome, options: options

Legends

¹ : Applies to Selenium's Python clients only.

² : Applies to Selenium's Python clients only.

³ : Applies to Selenium's Python clients only.
