
Garbled output when scraping Douban, unsolved for days, looking for expert help

2014-04-19 
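My environment is 32-bit Windows 7 and my IDE is Wing IDE. I have already set Wing IDE to UTF-8, but the output is still garbled every time I run the script, which is strange: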

#coding:utf-8

import urllib
import urllib2
import re
import sys

default_encoding = 'utf-8'
if sys.getdefaultencoding() != default_encoding:
    reload(sys)
    sys.setdefaultencoding(default_encoding)

# Douban movie categories
#doubanlist = ["剧情","喜剧","动作","爱情","科幻","动画","悬疑","惊悚","恐怖","纪录片","短片","情色","同性","音乐","歌舞","家庭","儿童","传记","历史","战争","犯罪","西部","奇幻","冒险","灾难","武侠","古装","鬼怪","运动","戏曲"]
urls = "http://movie.douban.com/category/q"
headers = {
        "Host":"movie.douban.com",
        "Connection":"keep-alive",
        "X-Requested-With":"XMLHttpRequest",
        "Accept-Encoding":"gzip,deflate,sdch",
        "Accept-Language":"zh-CN,zh;q=0.8,en;q=0.6",
        "Content-Type":"application/x-www-form-urlencoded",
        "User-Agent":"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36",
        "Referer":"http://movie.douban.com/category/"
               }
postdata=urllib.urlencode({  
            "district":"",  
            "era":"",  
            "category":"all",  
            "unwatched":"false",  
            "available":"false",  
            "sortBy":"score",
            "page":"1",
            "ck":"null",
            "source":"paginator",
            "types[]":"剧情"
            })      
req = urllib2.Request(
            url = urls,
            data = postdata,
            headers = headers
                  )
content = urllib2.urlopen(req).read()
print content
    



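[Solution]
"Accept-Encoding":"gzip,deflate,sdch"
With this header the server sends back a compressed stream, so printing it directly naturally produces garbage. Try decompressing it with the gzip module...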
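
A minimal sketch of what that decompression step could look like, assuming the server labels the body with a Content-Encoding response header. The helper name decompress_body is hypothetical and not part of the original script. It handles gzip and deflate only; the standard library has no decoder for sdch, so any other encoding is returned unchanged.

```python
import gzip
import io
import zlib


def decompress_body(raw, content_encoding):
    """Return the decompressed bytes of an HTTP response body.

    raw              -- the bytes returned by response.read()
    content_encoding -- value of the Content-Encoding response header, or None
    (Hypothetical helper for illustration; not from the original post.)
    """
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        # Wrap the bytes in a file-like object so GzipFile can read them.
        return gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    if encoding == "deflate":
        try:
            # Standard zlib-wrapped deflate stream.
            return zlib.decompress(raw)
        except zlib.error:
            # Some servers send a raw deflate stream without the zlib header.
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    # No known compression applied: hand the bytes back untouched.
    return raw
```

In the script above this would be used roughly as resp = urllib2.urlopen(req) followed by content = decompress_body(resp.read(), resp.info().get("Content-Encoding")). A simpler alternative is to drop the "Accept-Encoding" entry from headers, since urllib2 does not decompress responses on its own.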
