Source code
# -*- coding: utf-8 -*-
"""
Created on Tue Mar 15 08:53:08 2016
Scraping chemical industry standards (supplementary entry project)
@author: Administrator
"""
import requests, bs4

text = open("hb.txt", 'w', encoding='utf-8')
webpage = "http://www.bzwxw.com/html/2016/1988_0116/9.html"
res = requests.get(webpage)
res.status_code == requests.codes.ok  # True when the request succeeded
# the Chinese text comes out completely garbled
res.text
# soup1 = bs4.BeautifulSoup(res.text, "lxml", from_encoding="gb18030")
soup1 = bs4.BeautifulSoup(res.text, "lxml")
elems = soup1.select('title')
len(elems)
content = elems[0].getText()
# text.write("hello")
text.write(content)
text.close()
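The title that bs4 extracts comes out garbled.
Looking at the page source,
I found charset=gbk, a Chinese character encoding, so res.text was probably decoded with the wrong codec.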
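To see why a wrong codec produces garbage, here is a minimal offline sketch. It does not touch the real site: `SAMPLE_HTML`, `sniff_charset` and `extract_title` are made-up names for illustration, and the ISO-8859-1 decode is only an assumption about what happened to the original response (requests falls back to it for text responses whose headers carry no charset).

```python
import re

# Hypothetical stand-in for the raw bytes of a GBK-encoded page.
SAMPLE_HTML = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=gbk">'
    '<title>化工标准</title></head><body></body></html>'
).encode("gbk")


def sniff_charset(raw, default="utf-8"):
    """Return the charset named in the page markup, or `default` if none."""
    match = re.search(rb"charset\s*=\s*[\"']?([A-Za-z0-9_-]+)", raw)
    return match.group(1).decode("ascii").lower() if match else default


def extract_title(html):
    """Return the text between <title> and </title>, or an empty string."""
    match = re.search(r"<title>(.*?)</title>", html, re.S)
    return match.group(1) if match else ""


# Decoding GBK bytes as ISO-8859-1 never raises, it just yields wrong characters.
wrong = extract_title(SAMPLE_HTML.decode("iso-8859-1"))

# Reading the declared charset first gives the intended text.
charset = sniff_charset(SAMPLE_HTML)
right = extract_title(SAMPLE_HTML.decode(charset))

print(repr(wrong))
print(charset)
print(right)
```

The garbled string still holds the original bytes, which is why re-decoding with the right codec recovers the title.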
Add one line right after the request: res.encoding = 'gbk'
# -*- coding: utf-8 -*-
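"""
Created on Tue Mar 15 08:53:08 2016
Scraping chemical industry standards (supplementary entry project)
@author: Administrator
"""
import requests, bs4

text = open("hb.txt", 'w', encoding='utf-8')
webpage = "http://www.bzwxw.com/html/2016/1988_0116/9.html"
res = requests.get(webpage)
res.encoding = 'gbk'  # decode the body with the charset the page declares
res.status_code == requests.codes.ok  # True when the request succeeded
# without the line above, the Chinese text was completely garbled
res.text
# soup1 = bs4.BeautifulSoup(res.text, "lxml", from_encoding="gb18030")
soup1 = bs4.BeautifulSoup(res.text, "lxml")
elems = soup1.select('title')
len(elems)
content = elems[0].getText()
# text.write("hello")
text.write(content)
text.close()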
The output is now correct,
and the Chinese text written to the txt file also displays properly.
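For the file-writing step, a small sketch of the same idea using a `with` block, so the file is closed even if the write fails. `save_text`, `load_text`, the sample title and the temporary path are all hypothetical, not part of the original script.

```python
import os
import tempfile


def save_text(path, content):
    """Write `content` as UTF-8; the file is closed even if writing fails."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(content)


def load_text(path):
    """Read the file back using the same explicit encoding."""
    with open(path, encoding="utf-8") as fh:
        return fh.read()


# Hypothetical title text standing in for the scraped content.
title = "化工标准"
path = os.path.join(tempfile.mkdtemp(), "hb.txt")

save_text(path, title)
round_trip = load_text(path)
print(round_trip == title)
```

Passing the encoding on both the write and the read keeps the result independent of the platform default, which on a Chinese Windows install is often not UTF-8.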