Setting Request Headers in Python Selenium

2022-03-15 06:04:24

Google Chrome



I. ChromeOptions settings

ChromeOptions is the class used to configure Chrome's startup properties. Through it you can set the following Chrome parameters (you can confirm this in the selenium source code):


1. Set the Chrome binary location (binary_location)

2. Add startup arguments (add_argument)

3. Add extensions (add_extension, add_encoded_extension)

4. Add experimental options (add_experimental_option)

5. Set the debugger address (debugger_address)

Relevant part of the source code:

# . \Lib\site-packages\selenium\webdriver\chrome\options.py
class Options(object):
    def __init__(self):
        self._binary_location = '' # set chrome binary location
        self._arguments = [] # Add startup arguments
        self._extension_files = [] # Add extensions
        self._extensions = []
        self._experimental_options = {} # Add experimental setup parameters
        self._debugger_address = None # Set the debugger address

1. Emulating a mobile device

# Emulate a mobile device by overriding the user agent
from selenium import webdriver
options = webdriver.ChromeOptions()
user_ag = ('MQQBrowser/26 Mozilla/5.0 (Linux; U; Android 2.3.7; zh-cn; MB200 Build/GRJ22; '
           'CyanogenMod-7) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1')
options.add_argument('user-agent=%s' % user_ag)
# options.add_argument('--user-agent=iphone')
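
Overriding the user agent changes the request header but leaves the desktop viewport in place. As a minimal sketch, chromedriver's `mobileEmulation` experimental option accepts a user agent together with device metrics. The helper name `mobile_emulation` and the metric values are illustrative assumptions, not part of the original article.

```python
def mobile_emulation(user_agent, width=360, height=640, pixel_ratio=3.0):
    """Build the dict passed to chromedriver's "mobileEmulation" option."""
    return {
        "deviceMetrics": {"width": width, "height": height, "pixelRatio": pixel_ratio},
        "userAgent": user_agent,
    }


# The same mobile user agent used in the snippet above.
UA = ("MQQBrowser/26 Mozilla/5.0 (Linux; U; Android 2.3.7; zh-cn; MB200 Build/GRJ22; "
      "CyanogenMod-7) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1")
emulation = mobile_emulation(UA)

# With Selenium installed, the dict would be applied like this:
#   options = webdriver.ChromeOptions()
#   options.add_experimental_option("mobileEmulation", emulation)
#   driver = webdriver.Chrome(options=options)
```

The width, height, and pixel ratio are placeholders; pick values that match the device you want to imitate.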


2. Disabling image loading

from selenium import webdriver
options = webdriver.ChromeOptions()
prefs = {"profile.managed_default_content_settings.images": 2}
options.add_experimental_option("prefs", prefs)
# Or use the following switch to increase speed
# (options must be set before the driver is created)
options.add_argument('blink-settings=imagesEnabled=false')
driver = webdriver.Chrome(options=options)
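
In the prefs above, the value 2 means "block" and 1 means "allow". The sketch below builds such a prefs dict for more than one content type. `content_prefs` is a hypothetical helper, and it assumes other content types follow the same `profile.managed_default_content_settings.<type>` naming as images.

```python
ALLOW, BLOCK = 1, 2  # Chrome content-setting values


def content_prefs(**blocked):
    """Map content types to Chrome prefs; True blocks the type, False allows it."""
    prefix = "profile.managed_default_content_settings."
    return {prefix + name: (BLOCK if block else ALLOW)
            for name, block in blocked.items()}


prefs = content_prefs(images=True, javascript=False)

# With Selenium installed:
#   options.add_experimental_option("prefs", prefs)
```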


3. Adding a proxy

from selenium import webdriver
# Static IP: 102.23.1.105:2005
PROXY = "proxy_host:proxy_port"
options = webdriver.ChromeOptions()
desired_capabilities = options.to_capabilities()
desired_capabilities['proxy'] = {
    "httpProxy": PROXY,
    "ftpProxy": PROXY,
    "sslProxy": PROXY,
    "noProxy": None,
    "proxyType": "MANUAL",
    "class": "org.openqa.selenium.Proxy",
    "autodetect": False
}
driver = webdriver.Chrome(desired_capabilities = desired_capabilities)
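
The proxy capability is just a dictionary, so it can be built and checked before a browser is launched. A minimal sketch, assuming a plain HTTP proxy given as host and port. `manual_proxy` is a hypothetical helper, and the address is the static IP from the comment above.

```python
def manual_proxy(host, port):
    """Return a manual-proxy capability dict for the given host and port."""
    port = int(port)
    if not 0 < port < 65536:
        raise ValueError("port out of range: %r" % (port,))
    address = "%s:%d" % (host, port)
    return {
        "proxyType": "MANUAL",
        "httpProxy": address,
        "sslProxy": address,
    }


proxy = manual_proxy("102.23.1.105", 2005)

# The same address can also be passed as a Chrome switch:
chrome_switch = "--proxy-server=%s" % proxy["httpProxy"]
#   options.add_argument(chrome_switch)
```

Validating the port up front gives a clear error in Python instead of a browser that silently fails to connect.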


4. Installing a crx extension at browser startup

# -*- coding: utf-8 -*-
from selenium import webdriver
option = webdriver.ChromeOptions()
option.add_extension(r'd:\crx\AdBlock_v2.17.crx')  # Path to the crx file you downloaded (raw string keeps the backslashes literal)
driver = webdriver.Chrome(options=option)
driver.get('http://www.taobao.com/')


5. Loading all of Chrome's settings

Enter chrome://version/ in Chrome's address bar to see the "Profile Path". The following code loads that profile when the browser starts.

# -*- coding: utf-8 -*-
from selenium import webdriver
option = webdriver.ChromeOptions()
p = r'C:\Users\Administrator\AppData\Local\Google\Chrome\User Data'
option.add_argument('--user-data-dir=' + p)  # Set to the user's own data directory
driver = webdriver.Chrome(options=option)


6. Carrying cookies

Use Chrome's user-data-dir option to persist all logins between sessions.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument("user-data-dir=selenium")
driver = webdriver.Chrome(options=chrome_options)
driver.get("http://www.baidu.com")
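
Because logins live in the profile directory, the same directory has to be reused on every run. A minimal sketch of preparing it with the standard library; the helper name `profile_argument` and the `selenium-profile` folder name are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def profile_argument(base_dir, name="selenium-profile"):
    """Create the profile directory if needed and return the Chrome switch for it."""
    profile = Path(base_dir) / name
    profile.mkdir(parents=True, exist_ok=True)
    return "--user-data-dir=%s" % profile


base = tempfile.mkdtemp()          # stand-in for a permanent location
first = profile_argument(base)
second = profile_argument(base)    # a later run resolves to the same directory

# With Selenium installed:
#   chrome_options.add_argument(first)
```

In real use the base directory would be a fixed path rather than a temporary one, otherwise the saved cookies are lost with it.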

7. Other

# -*- coding: utf-8 -*-
from selenium import webdriver
options = webdriver.ChromeOptions()

# Google headless mode
options.add_argument('--headless')
options.add_argument('--disable-gpu') # Google documentation mentions the need to add this attribute to circumvent bugs

options.add_argument('disable-infobars')#Hide "Chrome is being controlled by automated software"
options.add_argument('lang=zh_CN.UTF-8') # set Chinese
options.add_argument('window-size=1920,3000') # Specify browser resolution (width,height separated by a comma)
options.add_argument('--hide-scrollbars') # Hide scrollbars, for some special pages
options.add_argument('--remote-debugging-port=9222')
options.binary_location = r'/Applications/Chrome' # Manually specify the browser location to use

# Replace the header
user_agent = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) " +
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36"
    )
options.add_argument('user-agent=%s'%user_agent)

# Set images not to load
prefs = {
    'profile.default_content_setting_values': {
        'images': 2
    }
}
options.add_experimental_option('prefs', prefs)
# or use the following settings to increase the speed
options.add_argument('blink-settings=imagesEnabled=false')


#Set proxy
options.add_argument('proxy-server=' +'192.168.0.28:808')

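driver = webdriver.Chrome(options=options)

# Set cookies (a page on the cookie's domain must be loaded before add_cookie)
driver.get('https://www.baidu.com')
driver.delete_all_cookies()  # delete all cookies
driver.add_cookie({'name': 'ABC', 'value': 'DEF'})  # add a cookie for the current domain
driver.get_cookies()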

# open a new window via js
driver.execute_script('window.open("https://www.baidu.com");')
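
`add_cookie` in the listing above takes one dictionary per cookie, while cookies copied from a browser usually come as a single `Cookie` header string. A minimal sketch of the conversion using the standard library; `cookies_from_header` is a hypothetical helper and the header value is made up for illustration.

```python
from http.cookies import SimpleCookie


def cookies_from_header(header):
    """Split a Cookie header string into dicts in the shape add_cookie expects."""
    jar = SimpleCookie()
    jar.load(header)
    return [{"name": morsel.key, "value": morsel.value} for morsel in jar.values()]


cookies = cookies_from_header("ABC=DEF; token=xyz")

# After driver.get(...) on the matching domain:
#   for cookie in cookies:
#       driver.add_cookie(cookie)
```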

Chrome address bar commands
  about:version - shows the current version
  about:memory - shows the browser's memory usage
  about:plugins - shows the installed plugins
  about:histograms - shows the browser's internal metrics histograms
  about:dns - shows DNS status
  about:cache - shows cached pages
  about:gpu - shows whether hardware acceleration is available
  about:flags - opens the experimental features page; it warns "Please be careful, these experiments may be risky", so change settings there with care
  chrome://extensions/ - shows the installed extensions

III. Useful Chrome command-line parameters
  -user-data-dir="[PATH]" Specify the path of the User Data folder, so that user data such as bookmarks can be kept on a partition other than the system partition
  -disk-cache-dir="[PATH]" Specify the cache path
  -disk-cache-size= Specify the maximum cache size in bytes
  -media-cache-size= Specify the maximum multimedia cache size in bytes
  -first-run Reset to the initial state, as on first run
  -incognito Start in incognito mode
  -disable-javascript Disable JavaScript (try this if pages feel slow)
  -omnibox-popup-count="num" Change the number of suggestions in the address bar popup to num. I've changed it to 15.
  -user-agent="xxxxxxxxx" Change the User-Agent string in the HTTP request header; you can see the effect on the about:version page
  -disable-plugins Disable loading of all plugins to increase speed; you can see the effect on the about:plugins page
  -disable-java Disable Java
  -disable-images Disable images
  -disable-popup-blocking Disable popup blocking
  -start-maximized Maximize the window on startup
  -no-sandbox Run without the sandbox
  -single-process Run Chrome in a single process
  -process-per-tab Use a separate process for each tab
  -process-per-site Use a separate process for each site
  -in-process-plugins Run plugins inside the browser process instead of separate processes
  -enable-udd-profiles Enable the account switching menu
  -proxy-pac-url Use a PAC proxy
  -lang=zh-CN Set the language to Simplified Chinese
  -bookmark-menu Add a bookmark button to the toolbar
  -enable-sync Enable bookmark synchronization
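
When many of these switches are combined, it can help to keep them in one dictionary and generate the argument strings from it. A minimal sketch; `build_switches` is a hypothetical helper, and the chosen switches and values are only examples taken from the list above.

```python
def build_switches(settings):
    """Turn {"name": value} into Chrome switches.

    True yields a bare flag, False or None drops the switch,
    and any other value is rendered as --name=value.
    """
    switches = []
    for name, value in settings.items():
        if value is True:
            switches.append("--%s" % name)
        elif value is not None and value is not False:
            switches.append("--%s=%s" % (name, value))
    return switches


switches = build_switches({
    "incognito": True,
    "start-maximized": True,
    "lang": "zh-CN",
    "disk-cache-size": 52428800,   # 50 MiB, in bytes
    "no-sandbox": False,           # left out of the result
})

# With Selenium installed:
#   for switch in switches:
#       options.add_argument(switch)
```

Using a dictionary also avoids passing the same switch twice, since each name can appear only once.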




Capturing chromedriver network logs with selenium

# -*- coding: utf-8 -*-
from selenium import webdriver
from selenium.webdriver.common.by import By
options = webdriver.ChromeOptions()
# chromedriver 75 and later expects the vendor-prefixed 'goog:loggingPrefs' key
options.set_capability('goog:loggingPrefs', {'performance': 'ALL'})
driver = webdriver.Chrome(options=options)
driver.get("https://www.baidu.com")
register = driver.find_element(By.PARTIAL_LINK_TEXT, "login")
register.click()
# Each entry is a dict whose 'message' field holds a JSON-encoded DevTools event
for entry in driver.get_log('performance'):
    print(entry)
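
Each performance log entry carries a JSON string in its `message` field that wraps a DevTools event, so the request headers Chrome actually sent can be read back from the `Network.requestWillBeSent` events. A minimal sketch of that parsing step; `request_headers` is a hypothetical helper, and the sample entries are hand-written in the shape of `get_log('performance')` output rather than captured from a real session.

```python
import json


def request_headers(entries):
    """Yield (url, headers) for every Network.requestWillBeSent event."""
    for entry in entries:
        message = json.loads(entry["message"])["message"]
        if message.get("method") == "Network.requestWillBeSent":
            request = message["params"]["request"]
            yield request["url"], request.get("headers", {})


# Hand-written sample entries (not real browser output).
sample = [
    {"level": "INFO", "timestamp": 0, "message": json.dumps({"message": {
        "method": "Network.requestWillBeSent",
        "params": {"request": {"url": "https://www.baidu.com/",
                               "headers": {"User-Agent": "demo-agent"}}}}})},
    {"level": "INFO", "timestamp": 1, "message": json.dumps({"message": {
        "method": "Network.responseReceived", "params": {}}})},
]
captured = list(request_headers(sample))

# With a live driver: captured = list(request_headers(driver.get_log('performance')))
```

This is a convenient way to check that a user-agent override took effect.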


Reference: https://blog.csdn.net/Ambulong/article/details/52672384

Reference blog post: https://blog.csdn.net/zwq912318834/article/details/78933910

Selenium options for launching Chrome: https://blog.csdn.net/liaojianqiu0115/article/details/78353267

Chrome command-line switches: https://peter.sh/experiments/chromium-command-line-switches/

Some settings for driving Chrome with selenium: https://blog.csdn.net/hellozhxy/article/details/80245296

Original article: https://blog.csdn.net/u013440574/article/details/81911954


Setting PhantomJS request headers with selenium

#-------------------------------------------------------------------------------------
# Set phantomjs request headers and proxy, method one.
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
# Set the proxy
ip_html = '192.168.0.28:808'  # Placeholder proxy address in "IP:port" form
service_args = [
    '--proxy=%s' % ip_html, # Proxy IP:port (e.g. 192.168.0.28:808)
    '--proxy-type=http', # Proxy type: http/https
    '--load-images=true', # Image loading (optional). Setting it to false turns images off, but that is buggy under linux: memory keeps increasing and the process eventually hangs
    '--disk-cache=true', # Turn on caching (optional)
    '--ignore-ssl-errors=true' # Ignore https errors (optional)
]

# Set request headers
user_agent = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) " +
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36"
    )
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = user_agent
driver = webdriver.PhantomJS(executable_path="phantomjs.exe",
                             desired_capabilities=dcap,
                             service_args=service_args)

driver.get(url='http://www.baidu.com')




#-------------------------------------------------------------------------------------
# Set phantomjs request headers and proxy method two.
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.proxy import ProxyType
desired_capabilities = DesiredCapabilities.PHANTOMJS.copy()
# Use a desktop browser User-Agent to disguise the browser (one could also be picked at random from a list)
user_agent = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) " +
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36"
    )
desired_capabilities["phantomjs.page.settings.userAgent"] = user_agent
# Crawl pages much faster without loading images
desired_capabilities["phantomjs.page.settings.loadImages"] = False

# reopen a sessionId using the value of the DesiredCapabilities(proxy settings) parameter
# Equivalent to the browser clearing the cache, plus the proxy to re-access the url once
proxy = webdriver.Proxy()
proxy.proxy_type = ProxyType.MANUAL
proxy.http_proxy = '192.168.0.28:808'
# Add proxy settings to webdriver.DesiredCapabilities.PHANTOMJS
proxy.add_to_capabilities(desired_capabilities)
# Open phantomJS browser with configuration information
driver = webdriver.PhantomJS(executable_path='phantomjs.exe',
                             desired_capabilities=desired_capabilities)
driver.start_session(desired_capabilities)

driver.get(url='http://www.baidu.com')

# ========================== or ==========================
from selenium import webdriver
from selenium.webdriver.common.proxy import ProxyType
# `browser` is assumed to be an already running webdriver.PhantomJS instance
proxy = webdriver.Proxy()
proxy.proxy_type = ProxyType.MANUAL
proxy.http_proxy = '192.168.0.28:808'
# Add the proxy settings to webdriver.DesiredCapabilities.PHANTOMJS
proxy.add_to_capabilities(webdriver.DesiredCapabilities.PHANTOMJS)
browser.start_session(webdriver.DesiredCapabilities.PHANTOMJS)
browser.get('http://www.baidu.com')

# -------------------------------------------------------------------------------------
# Switch back to a direct connection (ProxyType.DIRECT means no proxy)
proxy=webdriver.Proxy()
proxy.proxy_type=ProxyType.DIRECT
proxy.add_to_capabilities(webdriver.DesiredCapabilities.PHANTOMJS)
browser.start_session(webdriver.DesiredCapabilities.PHANTOMJS)
browser.get('http://1212.ip138.com/ic.asp')
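
The listings above only override the user agent. For other request headers, the sketch below collects everything into one capabilities dictionary. It assumes that PhantomJS's GhostDriver honours capabilities prefixed with `phantomjs.page.customHeaders.`, which is worth verifying against your PhantomJS version; `phantomjs_capabilities` is a hypothetical helper and the header values are examples.

```python
def phantomjs_capabilities(user_agent, headers=None, load_images=True):
    """Build a PhantomJS capabilities dict with a user agent and extra headers."""
    caps = {
        "browserName": "phantomjs",
        "phantomjs.page.settings.userAgent": user_agent,
        "phantomjs.page.settings.loadImages": load_images,
    }
    # Assumption: GhostDriver maps this prefix to extra HTTP request headers.
    for name, value in (headers or {}).items():
        caps["phantomjs.page.customHeaders." + name] = value
    return caps


caps = phantomjs_capabilities(
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36",
    headers={"Accept-Language": "zh-CN,zh;q=0.8", "Referer": "http://www.baidu.com"},
    load_images=False,
)

# With a Selenium release that still ships PhantomJS support:
#   driver = webdriver.PhantomJS(executable_path='phantomjs.exe',
#                                desired_capabilities=caps)
```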


Firefox settings

# -*- coding: utf-8 -*-
from selenium import webdriver
options = webdriver.FirefoxOptions()
# Firefox headless mode
options.add_argument('--headless')
options.add_argument('--disable-gpu')
# options.add_argument('window-size=1200x600')

driver = webdriver.Firefox(executable_path='geckodriver.exe',
                           options=options)
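
Firefox has no user-agent command-line switch like Chrome's; the usual route is a preference. A minimal sketch that collects the relevant preferences in a dictionary, assuming the standard Firefox preferences `general.useragent.override` and `permissions.default.image`; `firefox_prefs` is a hypothetical helper.

```python
def firefox_prefs(user_agent=None, block_images=False):
    """Return Firefox preferences for a custom user agent and image blocking."""
    prefs = {}
    if user_agent:
        prefs["general.useragent.override"] = user_agent
    if block_images:
        prefs["permissions.default.image"] = 2  # 2 = block all images
    return prefs


prefs = firefox_prefs(
    user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) "
               "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36",
    block_images=True,
)

# With Selenium installed:
#   for name, value in prefs.items():
#       options.set_preference(name, value)
```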

Common browser User-Agent request headers

https://blog.csdn.net/mouday/article/details/80182397


Blog references:

https://blog.csdn.net/xc_zhou/article/details/80823855 

https://www.zhihu.com/question/35547395 

https://segmentfault.com/a/1190000013067705 

https://www.cnblogs.com/lgh344902118/p/6339378.html