[Solved] TypeError: must be str, not NoneType

2022-02-05 17:53:16

Question

I'm writing a web crawler as my first project, and I don't know how to fix this error. Here is my code.

import requests
from bs4 import BeautifulSoup

def main_spider(max_pages):
    page = 1
    for page in range(1, max_pages+1):
        url = "https://en.wikipedia.org/wiki/Star_Wars" + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        for link in soup.findAll("a"):
            href = "https://en.wikipedia.org/wiki/Star_Wars" + link.get("href")
            print(href)
    page += 1

main_spider(1)

Here is the error.

href = "https://en.wikipedia.org/wiki/Star_Wars" + link.get("href")
TypeError: must be str, not NoneType

Solution

As Shiping points out, your code is not indented properly; I have fixed that below. Also, link.get('href') does not return a string in every case.

import requests
from bs4 import BeautifulSoup

def main_spider(max_pages):
    for page in range(1, max_pages+1):
        url = "https://en.wikipedia.org/wiki/Star_Wars" + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        for link in soup.findAll("a"):
            href = "https://en.wikipedia.org/wiki/Star_Wars" + link.get("href")
            print(href)

main_spider(1)

To see what is going on, I added a couple of lines between your existing ones and removed the offending line (for the time being).

        soup = BeautifulSoup(plain_text, "html.parser")
        print('All anchor tags:', soup.findAll('a'))     ### ADDED
        for link in soup.findAll("a"): 
            print(type(link.get("href")), link.get("href"))  ### ADDED

Here is the output from the lines I added (truncated for brevity). Note: the first anchor has no href attribute, so link.get('href') has nothing to return and gives back None.

[<a id="top"></a>, <a href="#mw-head">navigation</a>, 
<a href="#p-search">search</a>, 
<a href="/wiki/Special:SiteMatrix" title="Special:SiteMatrix">sister...   
<class 'NoneType'> None
<class 'str'> #mw-head
<class 'str'> #p-search
<class 'str'> /wiki/Special:SiteMatrix
<class 'str'> /wiki/File:Wiktionary-logo-v2.svg      
...
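To make the failure concrete without hitting the network, here is a minimal sketch that reproduces it. The plain dict `attrs` is only a stand-in I made up for the attributes of the first anchor, `<a id="top"></a>`; a tag's `.get()` behaves like `dict.get()` here and returns None for a missing attribute. The exact wording of the TypeError message depends on your Python version.

```python
# Stand-in (assumption) for the attributes of the first anchor: <a id="top"></a>
attrs = {"id": "top"}

value = attrs.get("href")  # no "href" key, so .get() returns None
print(type(value), value)

error = None
try:
    href = "https://en.wikipedia.org/wiki/Star_Wars" + value
except TypeError as exc:
    # Concatenating a str with None is what raises the error from the question.
    error = exc
    print("TypeError:", exc)
```

So the crash comes from the string concatenation, not from BeautifulSoup itself.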

To prevent this error, you can add either a conditional or a try/except block to your code. Here I demonstrate the conditional.

        soup = BeautifulSoup(plain_text, "html.parser")
        for link in soup.findAll("a"):
            link_target = link.get("href")
            if link_target is None:
                # Skip anchors that have no href attribute.
                continue
            href = "https://en.wikipedia.org/wiki/Star_Wars" + link_target
            print(href)
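For completeness, the try/except alternative could look roughly like the sketch below. The helper `build_urls`, the constant `BASE`, and the `sample` list are hypothetical names of mine, not part of the original code; the sample values are taken from the debugging output above, so this runs without requests or BeautifulSoup.

```python
BASE = "https://en.wikipedia.org/wiki/Star_Wars"


def build_urls(hrefs, base=BASE):
    """Prefix each href with base, skipping values that cannot be concatenated."""
    urls = []
    for value in hrefs:
        try:
            urls.append(base + value)
        except TypeError:
            # value was None (an anchor without an href attribute), so skip it.
            continue
    return urls


# Values as returned by link.get("href") in the debugging output.
sample = [None, "#mw-head", "#p-search", "/wiki/Special:SiteMatrix"]
for url in build_urls(sample):
    print(url)
```

In the crawler you would pass in the link.get("href") values from the loop. The conditional version is probably easier to read; try/except is mostly a matter of taste when only one failure mode is expected.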
