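C#: fetching an Alibaba page fails with "Unable to connect to the remote server". Strange. Can anyone help?
URL being fetched: http://detail.china.alibaba.com/buyer/offerdetail/106312402.html
The page loads fine in a browser.
Here is the code: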
using System;
using System.IO;
using System.Net;
using System.Text;

public class PageFetcher
{
    public string getPage(String url, Encoding coding, out int err)
    {
        string errorMsg = "";
        string resultstring = "";
        try
        {
            //WebRequest req = WebRequest.Create(url);
            Uri ri = new Uri(url);
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(ri);
            req.CookieContainer = new CookieContainer();
            // using blocks close the response and the reader even if ReadToEnd throws
            using (WebResponse result = req.GetResponse())
            using (StreamReader sr = new StreamReader(result.GetResponseStream(), coding))
            {
                // read the stream into a string
                resultstring = sr.ReadToEnd();
            }
            err = 0;
        }
        catch (WebException exp)
        {
            err = 1;
            // errorMsg is built here but never returned or logged
            errorMsg = url + " failed to fetch page, reason: " + exp.Message;
        }
        return resultstring;
    }
}
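When it runs, execution goes straight into the catch block with "Unable to connect to the remote server". What causes this? Also, I recall that many scraping tools can collect data from Alibaba. How do they do it?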
[Solution]
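You may need to fully mimic the data a browser sends, for example the browser type (the User-Agent header).
Taobao blocks Baidu's search-engine crawler, so they have surely put technical measures in place against this kind of access.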
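A minimal sketch of the browser-mimicking idea, assuming the server filters requests by headers such as User-Agent (this is not confirmed for this site). The class and method names (BrowserLikeFetcher, CreateBrowserLikeRequest, Fetch) and the header values are hypothetical, illustrative choices. The sketch also reports WebException.Status. A header-based block would normally come back as an HTTP error (Status == ProtocolError). By contrast, "Unable to connect to the remote server" usually corresponds to ConnectFailure, which points at the network path (proxy, firewall, DNS) rather than at the request headers.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class BrowserLikeFetcher
{
    // Builds a request whose headers resemble a desktop browser's.
    // The header values are illustrative assumptions, not values known
    // to be required by the target site.
    public static HttpWebRequest CreateBrowserLikeRequest(string url)
    {
        var req = (HttpWebRequest)WebRequest.Create(new Uri(url));
        req.Method = "GET";
        // User-Agent and Accept are restricted headers, so they must be set via properties
        req.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0 Safari/537.36";
        req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        // Accept-Language is not restricted, so it can go through the Headers collection
        req.Headers[HttpRequestHeader.AcceptLanguage] = "zh-CN,zh;q=0.8";
        req.KeepAlive = true;
        req.AllowAutoRedirect = true;
        // keep cookies across redirects, as a browser would
        req.CookieContainer = new CookieContainer();
        // fail after 15 s instead of the default 100 s
        req.Timeout = 15000;
        return req;
    }

    // Fetches the page. On failure, returns "" and reports WebException.Status
    // so a connection-level failure can be told apart from an HTTP error.
    public static string Fetch(string url, Encoding coding, out string error)
    {
        error = null;
        try
        {
            HttpWebRequest req = CreateBrowserLikeRequest(url);
            using (var resp = (HttpWebResponse)req.GetResponse())
            using (var reader = new StreamReader(resp.GetResponseStream(), coding))
            {
                return reader.ReadToEnd();
            }
        }
        catch (WebException ex)
        {
            // ex.Status: ConnectFailure = network path; ProtocolError = server answered with an error code
            error = url + " failed: " + ex.Status + " - " + ex.Message;
            return "";
        }
    }
}
```

Call Fetch with the page URL and the page's encoding (the original getPage already takes the encoding as a parameter). If the error string still reports ConnectFailure with these headers, the headers are probably not the cause.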