
Fetching Remote Web Pages with Java

2012-12-18 
I. Retrieving response header information

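Steps:

1. Define a URL object and initialize it;
2. Define a URLConnection object and obtain it through the URL object's openConnection() method;
3. Call the URLConnection object's connect() method to connect to the server;
4. Read the response header fields from the URLConnection object (getHeaderFields(), getHeaderField(key));
5. Use the URLConnection convenience methods (getContentType(), getContentLength(), and so on) to read commonly used values.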

Example:

        // The host is elided in the original; substitute a real URL here.
        String urlName = "http://www....com";
        try {
            URL url = new URL(urlName);
            URLConnection connection = url.openConnection();
            connection.connect();

            // Print every response header field. For HTTP connections the
            // status line is stored under a null key, so it prints as "null: ...".
            Map<String, List<String>> headers = connection.getHeaderFields();
            for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
                String key = entry.getKey();
                for (String value : entry.getValue()) {
                    System.out.println(key + ": " + value);
                }
            }

            // Print the values returned by the convenience methods
            System.out.println("------------------------");
            System.out.println("getContentType:" + connection.getContentType());
            System.out.println("getContentLength:" + connection.getContentLength());
            System.out.println("getContentEncoding:" + connection.getContentEncoding());
            System.out.println("getDate:" + connection.getDate());
            System.out.println("getExpiration:" + connection.getExpiration());
            System.out.println("getLastModified:" + connection.getLastModified());
            System.out.println("------------------------");

            // Print the first ten lines of the body; try-with-resources
            // closes the Scanner and the underlying stream
            try (Scanner in = new Scanner(connection.getInputStream())) {
                for (int n = 1; in.hasNextLine() && n <= 10; n++) {
                    System.out.println(in.nextLine());
                }
                if (in.hasNextLine()) {
                    System.out.println("...");
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
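The example above reads the body with new Scanner(InputStream), which decodes bytes using the platform default charset, so pages served in another encoding (GB2312, for instance) may come out garbled. The following is a minimal sketch, not part of the original article, of picking the charset out of the Content-Type value returned by getContentType(). The class name CharsetSketch and the method charsetOf are hypothetical names chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetSketch {
    /**
     * Returns the charset named in a Content-Type value such as
     * "text/html; charset=UTF-8", or the fallback when the value is
     * null, has no charset parameter, or names an unknown charset.
     * (Hypothetical helper, not a JDK API.)
     */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for the "charset=" prefix (8 characters)
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset cs = charsetOf("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8);
        System.out.println(cs);
    }
}
```

The result can then be passed to the two-argument Scanner constructor, for example new Scanner(connection.getInputStream(), cs.name()). This only looks at the header; a page that declares its encoding solely in an HTML meta tag would still need separate handling.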

  
II. Requests with parameters

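       By default, a connection only provides an input stream for reading from the server; it has no output stream for writing. To obtain an output stream (for example, to submit data to a web server), call connection.setDoOutput(true); before connecting.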

 
Example:

        String urlName = "……";
        Map<String, String> paras = new HashMap<String, String>();
        paras.put("flightway", "Single");

        String result;
        try {
            result = doPost(urlName, paras);
            System.out.println(result);
        } catch (IOException e) {
            e.printStackTrace();
        }

 
    public static String doPost(String urlString,
            Map<String, String> nameValuePairs) throws IOException {
        URL url = new URL(urlString);
        URLConnection connection = url.openConnection();
        // Enable the output stream so the request body can be written
        connection.setDoOutput(true);

        // Write the parameters as name=value pairs separated by '&'
        try (PrintWriter out = new PrintWriter(connection.getOutputStream())) {
            boolean first = true;
            for (Map.Entry<String, String> pair : nameValuePairs.entrySet()) {
                if (first) {
                    first = false;
                } else {
                    out.print('&');
                }
                // Use the charset the server expects, e.g. "UTF-8" instead of "GB2312"
                out.print(URLEncoder.encode(pair.getKey(), "GB2312"));
                out.print('=');
                out.print(URLEncoder.encode(pair.getValue(), "GB2312"));
            }
        }

        // Read the response; on an HTTP error, fall back to the error stream
        Scanner in;
        try {
            in = new Scanner(connection.getInputStream());
        } catch (IOException e) {
            if (!(connection instanceof HttpURLConnection)) {
                throw e;
            }
            InputStream err = ((HttpURLConnection) connection).getErrorStream();
            if (err == null) {
                throw e;
            }
            in = new Scanner(err);
        }

        StringBuilder response = new StringBuilder();
        try {
            while (in.hasNextLine()) {
                response.append(in.nextLine());
                response.append("\n");
            }
        } finally {
            in.close();
        }
        return response.toString();
    }
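The body that doPost writes to the output stream is an ordinary form-encoded string. As a rough sketch, assuming the same name=value&name=value layout, the encoding step can be pulled out into a separate method so it can be checked without any network access. The class name FormBodySketch, the method encode, and the second parameter "city" are hypothetical and do not appear in the original.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormBodySketch {
    /** Builds "name=value&name=value", URL-encoding both names and values. */
    static String encode(Map<String, String> params, String charset) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> pair : params.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(pair.getKey(), charset));
                body.append('=');
                body.append(URLEncoder.encode(pair.getValue(), charset));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output is predictable
        Map<String, String> params = new LinkedHashMap<>();
        params.put("flightway", "Single");
        params.put("city", "a b&c"); // hypothetical second parameter
        System.out.println(encode(params, "UTF-8")); // flightway=Single&city=a+b%26c
    }
}
```

URLEncoder turns a space into + and & into %26, which keeps a value containing & from being read as a parameter separator.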

Notes:

import java.io.*;
import java.net.*;
import java.util.*;


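For an HTTP connection, the request method and request headers can be set on the HttpURLConnection (huc below):

huc.setDoOutput(true);

// Set the request method to POST

huc.setRequestMethod("POST");

huc.setRequestProperty("user-agent", "mozilla/4.7 [en] (win98; i)");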
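The huc variable in the note above is never declared in the original; presumably it is the connection cast to HttpURLConnection. A minimal sketch under that assumption follows. The class name PostSetupSketch, the method openForPost, the Content-Type header, and the 5-second timeouts are illustrative additions rather than anything the original specifies.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;

class PostSetupSketch {
    /**
     * Opens (but does not connect) an HTTP connection configured for a
     * form POST. openConnection() performs no network I/O by itself.
     */
    static HttpURLConnection openForPost(String urlString) {
        try {
            // For http:// URLs the returned connection is an HttpURLConnection
            HttpURLConnection huc =
                    (HttpURLConnection) new URL(urlString).openConnection();
            huc.setDoOutput(true);
            huc.setRequestMethod("POST");
            huc.setRequestProperty("User-Agent", "mozilla/4.7 [en] (win98; i)");
            // Assumed additions: declare the body type and bound the wait times
            huc.setRequestProperty("Content-Type",
                    "application/x-www-form-urlencoded");
            huc.setConnectTimeout(5000); // milliseconds
            huc.setReadTimeout(5000);    // milliseconds
            return huc;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection huc = openForPost("http://localhost/");
        System.out.println(huc.getRequestMethod());
    }
}
```

All of these setters must be called before the connection is established, that is, before connect(), getOutputStream(), or getInputStream() is invoked; afterwards they throw an exception.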


Reposted from: http://incan.iteye.com/blog/279000
