Java: web page source is garbled after reading and saving it
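I'm using Java to read a website's source, but everything I get back is garbled. I don't know the site's URL in advance, which means I don't know its encoding either, so please don't give me the suggestion below.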
// The usual suggestion: only works if charSet is known in advance.
// Note that the charset belongs to InputStreamReader, not BufferedReader.
URL url = new URL(baseUrl);
BufferedReader buff = new BufferedReader(new InputStreamReader(url.openStream(), charSet));
StringBuilder sb = new StringBuilder();
String s;
while ((s = buff.readLine()) != null) {
    sb.append(s);
}
return sb.toString();

// What I have now: no charset given, so the platform default is used.
URL url = new URL("http://www.baidu.com");
BufferedReader buff = new BufferedReader(new InputStreamReader(url.openStream()));
StringBuilder sb = new StringBuilder();
String s = null;
while ((s = buff.readLine()) != null) {
    sb.append(s + "\n");
}
System.out.println(sb);
[Solution]
You should use:
URLConnection cn = url.openConnection();
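Then get the charset setting from the response headers. It is carried in the charset= parameter of the Content-Type header (getContentEncoding() returns the Content-Encoding header, e.g. gzip, not the charset):
cn.getContentType();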
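To make the header approach concrete, here is a minimal sketch. The names PageFetcher, charsetFromContentType and fetch are made up for illustration, and falling back to UTF-8 when the server sends no charset is an assumption, not something the header guarantees. A page that declares its encoding only in a <meta> tag would still need extra handling that this sketch does not attempt.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class PageFetcher {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=GBK". Returns the fallback when the header is
    // missing, has no charset parameter, or names a charset this JVM lacks.
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for the "charset=" prefix.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Reads the page at the given address, decoding it with the charset
    // announced in the Content-Type header (UTF-8 fallback is an assumption).
    static String fetch(String address) throws IOException {
        URLConnection cn = new URL(address).openConnection();
        Charset cs = charsetFromContentType(cn.getContentType(), StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(cn.getInputStream(), cs))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}
```

Usage would be something like String html = PageFetcher.fetch("http://www.baidu.com"); the charset is then decided per response, so the caller never has to know it beforehand.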
[Solution]