
AttributeError: 'NoneType' object has no attribute 'get'

2022-02-21 05:08:51

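A problem I ran into recently while writing a crawler for Zhihu:
AttributeError: 'NoneType' object has no attribute 'get'
This means the object that .get() was called on is None, and None has no get attribute. BeautifulSoup's find() returns None when no tag matches, so chaining .get() directly onto it fails.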
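
The failure can be reproduced without any network access. The sketch below is only an illustration: `find_tag` and `page_tags` are hypothetical stand-ins (plain dicts instead of parsed HTML) for a lookup that returns None when nothing matches.

```python
def find_tag(tags, name):
    """Hypothetical stand-in for a lookup such as BeautifulSoup's find():
    return the first tag (a plain dict here) whose "name" matches, or None."""
    for tag in tags:
        if tag.get("name") == name:
            return tag
    return None


# A page that no longer contains an "_xsrf" field (assumed sample data).
page_tags = [{"name": "username", "value": ""}, {"name": "password", "value": ""}]

result = find_tag(page_tags, "_xsrf")  # no match, so result is None

try:
    token = result.get("value")  # calling .get() on None raises AttributeError
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

Checking the result against None before calling .get() on it avoids the exception.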

The complete program is as follows.

#! /usr/bin/env python
# -*- coding:utf-8 -*-

from bs4 import BeautifulSoup
import requests
import time

def captcha(captcha_data):
    with open("captcha.jpg", "wb") as f:
        f.write(captcha_data)
    text = input("Please enter a verification code: ")
    # return the verification code entered by the user
    return text

def zhihuLogin():
    # Build a Session object that can hold page cookies
    sess = requests.Session()

    # Request headers
    headers = {"User-Agent" : "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}

    # Fetch the login page first; the session records the page's cookies, and the page is where the _xsrf value to POST used to be found
    html = sess.get("https://www.zhihu.com/#signin", headers = headers).text

    # Parse the page with the lxml parser
    bs = BeautifulSoup(html, "lxml")

    # _xsrf is a token used to prevent CSRF (cross-site request forgery) attacks,
    # in which an attacker abuses the trust a website places in a logged-in user's cookies
    # to send forged requests on that user's behalf.
    # To defend against this, the site stores this token (an MD5 string) in a hidden form field
    # and checks it against the server-side session when the form is submitted.

    # Find the input tag whose name attribute is _xsrf and read its value attribute.
    # Zhihu no longer seems to serve this field, so find() returns None and calling .get()
    # on it raises the AttributeError; the lookup is therefore commented out.
    # _xsrf = bs.find("input", attrs={"name": "_xsrf"}).get("value")

    # Build the captcha URL from the current UNIX timestamp in milliseconds
    captcha_url = "https://www.zhihu.com/captcha.gif?r=%d&type=login" % (time.time() * 1000)
    # Send a request for an image, get the image data stream
    captcha_data = sess.get(captcha_url, headers = headers).content
    # Get the text in the captcha, which needs to be entered manually
    text = captcha(captcha_data)

    data = {
        # "_xsrf" : _xsrf,
        "username" : "***",
        "password" : "***",
        "captcha" : text
    }

    # Send the POST data needed to log in and get the cookies after logging in (saved in sess)
    response = sess.post("https://www.zhihu.com/login/email", data = data, headers = headers)
    # print(response.text)

    # send a request with a cookie that has a login status to get the source code of the target page
    response = sess.get("https://www.zhihu.com/people/hrycici/activities", headers = headers)
    with open("my.html", "wb") as f:
        f.write(response.text.encode("utf-8"))

if __name__ == "__main__":
    zhihuLogin()

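The program above refers to _xsrf. This used to be part of Zhihu's login mechanism, but it appears to have been removed, so the lookup for the hidden field finds nothing and returns None. After commenting out that part, logging in to Zhihu works without problems.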
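
Instead of commenting the lookup out, it can also be guarded so the script works whether or not the field is present. This is a minimal sketch, not the original author's code: the helper names `extract_xsrf` and `build_login_data` and the sample HTML strings are assumptions made for illustration, and it uses the built-in "html.parser" so that lxml is not required.

```python
from bs4 import BeautifulSoup


def extract_xsrf(html):
    """Return the value of the hidden _xsrf input, or None if the page has none."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("input", attrs={"name": "_xsrf"})
    # find() returns None when nothing matches, so check before calling .get()
    return tag.get("value") if tag is not None else None


def build_login_data(html, username, password, captcha_text):
    """Build the POST data, adding _xsrf only when the page actually provides it."""
    data = {"username": username, "password": password, "captcha": captcha_text}
    token = extract_xsrf(html)
    if token is not None:
        data["_xsrf"] = token
    return data


# Assumed sample pages: one with the hidden field, one without.
with_token = '<form><input type="hidden" name="_xsrf" value="abc123"></form>'
without_token = '<form><input name="username"></form>'

print(build_login_data(with_token, "user", "secret", "1234"))
print(build_login_data(without_token, "user", "secret", "1234"))
```

With this guard, a missing field simply leaves _xsrf out of the POST data rather than crashing the script.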
