Nokogiri notes
http://nokogiri.org
On CentOS 5, Nokogiri failed to scrape pages. The fix was to update libxml2 and libxslt.
Retrying on 502 Bad Gateway:
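require 'open-uri'
require 'nokogiri'

io = nil
loop do
  begin
    io = URI.open(url) # Kernel#open(url) no longer fetches URLs as of Ruby 3.0
  rescue OpenURI::HTTPError => e
    if e.io.status[0] == '502' # status is e.g. ["502", "Bad Gateway"]
      log = "#{Time.now} #{e.message}"
      puts log
      Rails.logger.info log
      sleep(rand / 3 + 0.05) # pause briefly
      next # fetch this page again
    end
    raise # any other HTTP error is re-raised
  end
  break
end
# Without retries this would simply be: io = URI.open(url)
doc = Nokogiri::HTML(io)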
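The loop above retries forever if the server keeps answering 502. Below is a minimal sketch of a capped alternative, assuming Ruby 2.5+ (for `URI.open`). `with_502_retry`, `max_attempts`, `base_delay` and `sleeper` are hypothetical names, not open-uri or Nokogiri APIs. The exponential backoff with random jitter is a swapped-in choice that replaces the original fixed `rand / 3 + 0.05` pause.

```ruby
require 'open-uri'

# Hypothetical helper (not an open-uri or Nokogiri API): yields, and retries
# the block when it raises OpenURI::HTTPError with a 502 status, calling the
# block at most max_attempts times in total.
def with_502_retry(max_attempts: 5, base_delay: 0.05, sleeper: ->(secs) { sleep(secs) })
  attempts = 0
  begin
    attempts += 1
    yield
  rescue OpenURI::HTTPError => e
    # For an HTTP error, e.io.status is e.g. ["502", "Bad Gateway"].
    retryable = e.io.status[0] == '502' && attempts < max_attempts
    raise unless retryable # re-raise non-502 errors and the final 502
    # Assumed policy: exponential backoff plus up to 1/3 s of random jitter.
    delay = base_delay * 2**(attempts - 1) + rand / 3
    warn "#{Time.now} #{e.message}, retrying in #{delay.round(2)}s"
    sleeper.call(delay) # injectable so tests need not actually sleep
    retry
  end
end
```

Usage would be `doc = Nokogiri::HTML(with_502_retry { URI.open(url) })`. Passing `sleeper` as a lambda keeps the waiting logic swappable, e.g. a no-op in tests.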