_compile(pattern, flags).findall(string) TypeError: cannot use a string pattern on a bytes-like object

2022-02-18 23:09:47

I've recently been teaching myself Python and built an image crawler, but I ran into a few errors. I'm writing them up here so that anyone who hits the same error can fix it quickly.

#coding=utf-8
import urllib
import urllib.request
import re

url = "http://tieba.baidu.com/p/2460150866"
page = urllib.request.urlopen(url)
html = page.read()
print(html)

# regular-expression match
reg = r'src="(.+?\.jpg)" pic_ext'
imgre = re.compile(reg)
imglist = re.findall(imgre, html)  # TypeError here: html is bytes, but reg is a str pattern
x = 0
print("start download pic")
for imgurl in imglist:
    print(imgurl)
    resp = urllib.request.urlopen(imgurl)
    respHtml = resp.read()
    picFile = open('%s.jpg' % x, "wb")
    picFile.write(respHtml)
    picFile.close()
    x = x + 1
print("done")

The error message is as follows:

  File "C:\Python35\lib\re.py", line 213, in findall
    return _compile(pattern, flags).findall(string)
TypeError: cannot use a string pattern on a bytes-like object
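The same error can be reproduced offline, which confirms that the type of the page content, not the crawler logic, is the problem. This is a minimal sketch; the sample markup and the example.com URL are made up for illustration and stand in for what page.read() returns.

```python
import re

# Same pattern as the crawler, compiled from a str literal (a "string pattern")
pattern = re.compile(r'src="(.+?\.jpg)" pic_ext')

# Hypothetical sample markup, as bytes, like the value returned by page.read()
data = b'<img src="http://example.com/a.jpg" pic_ext="jpeg">'

try:
    pattern.findall(data)
    message = None
except TypeError as exc:
    # Python refuses to apply a str pattern to bytes
    message = str(exc)

print(message)  # cannot use a string pattern on a bytes-like object
```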

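The key part of the error is:

TypeError: cannot use a string pattern on a bytes-like object

In Python 3, reading the response returned by urlopen() gives bytes, not a string (str). The regular expression reg, however, is a str pattern, and the re module will not match a str pattern against bytes.

The fix is to change the type of html: decode it with html.decode('utf-8') so that the bytes become a string.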
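Decoding is one way to make the types agree; the other is to keep the data as bytes and compile a bytes pattern (an rb'...' literal) instead. The sketch below shows both on hypothetical sample markup. Note that the bytes version returns bytes URLs, which would need decoding before being printed as text.

```python
import re

# Hypothetical sample markup standing in for the downloaded page
html_bytes = (b'<img src="http://example.com/a.jpg" pic_ext="jpeg">'
              b'<img src="http://example.com/b.jpg" pic_ext="jpeg">')

# Option 1: decode bytes -> str, then use a str pattern (the fix used in this post)
str_pattern = re.compile(r'src="(.+?\.jpg)" pic_ext')
from_str = str_pattern.findall(html_bytes.decode("utf-8"))

# Option 2: keep the bytes and use a bytes pattern (note the rb prefix)
bytes_pattern = re.compile(rb'src="(.+?\.jpg)" pic_ext')
from_bytes = bytes_pattern.findall(html_bytes)

print(from_str)    # ['http://example.com/a.jpg', 'http://example.com/b.jpg']
print(from_bytes)  # [b'http://example.com/a.jpg', b'http://example.com/b.jpg']
```

Decoding first is usually simpler for a crawler, because the extracted URLs are then ordinary strings that can be passed straight to urlopen().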

The corrected code is as follows:

#coding=utf-8
# In Python 3, urllib2 was merged into urllib.request
import urllib.request
import re

url = "http://tieba.baidu.com/p/2460150866"
page = urllib.request.urlopen(url)
html = page.read()  # bytes, not str
print(html)  # print is a function in Python 3; Python 2 also accepted "print html"

# regular-expression match
reg = r'src="(.+?\.jpg)" pic_ext'
imgre = re.compile(reg)
# decode the bytes to str so that the str pattern can be applied
imglist = re.findall(imgre, html.decode('utf-8'))
x = 0
print("start download pic")
for imgurl in imglist:
    print(imgurl)
    resp = urllib.request.urlopen(imgurl)
    respHtml = resp.read()
    # image data is bytes, so the file is opened in binary mode ("wb")
    with open('%s.jpg' % x, "wb") as picFile:
        picFile.write(respHtml)
    x = x + 1
print("done")
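The corrected code hard-codes 'utf-8'. If a site serves its pages in another encoding, decode('utf-8') can raise UnicodeDecodeError. One option, sketched here as an assumption rather than part of the original crawler, is to read the charset from the response's Content-Type header and fall back to UTF-8. The helper name decode_body is hypothetical; it relies only on email.message.Message.get_content_charset(), which urlopen() responses expose through their headers attribute.

```python
from email.message import Message


def decode_body(body: bytes, headers: Message, default: str = "utf-8") -> str:
    """Decode an HTTP response body to str (hypothetical helper).

    Uses the charset parameter of the Content-Type header when present,
    otherwise `default`. Undecodable bytes become U+FFFD instead of raising.
    """
    charset = headers.get_content_charset() or default
    try:
        return body.decode(charset, errors="replace")
    except LookupError:
        # The header named an encoding Python does not recognize
        return body.decode(default, errors="replace")


# Offline demo with hand-built headers (no network access needed)
gbk_headers = Message()
gbk_headers["Content-Type"] = "text/html; charset=GBK"
print(decode_body("中文".encode("gbk"), gbk_headers))  # 中文

no_charset = Message()
no_charset["Content-Type"] = "text/html"
print(decode_body(b"<p>hello</p>", no_charset))  # <p>hello</p>
```

In the crawler this would replace html.decode('utf-8') with decode_body(html, page.headers).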
