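Today I finished the book search feature. It was fairly involved, because the HTML of the search results page is not very well-formed, and parsing it takes a lot of patience.
First we need to fetch the result HTML for a given set of query conditions. Many kinds of conditions are possible; for practicality and convenience I deliberately limited them to: keyword, East campus, and available for loan.
The method that fetches the result HTML is as follows: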
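```java
// Imports used by this method (Apache HttpClient 4.x, as bundled with older Android versions)
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.utils.URLEncodedUtils;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

/**
 * Searches books by keyword.
 *
 * Searching works both before and after login. Currently a new HttpClient is
 * created, so no login is needed. If searching should only be allowed after
 * login, use the global (logged-in) HttpClient instead of creating a new one.
 *
 * @param keyword
 *            the search keyword
 * @return the HTML of the search results, or "" on failure
 */
public static String serchBook(String keyword) {
    HttpGet httpGet = null;
    HttpClient httpclient = new DefaultHttpClient();

    /*
     * Field order matters.
     *
     * Query conditions: keyword, East campus, available for loan.
     */
    List<NameValuePair> params = new ArrayList<NameValuePair>();
    params.add(new BasicNameValuePair("searchtype", "X"));
    params.add(new BasicNameValuePair("searcharg", keyword)); // search keyword
    params.add(new BasicNameValuePair("searchscope", "1")); // 1 = East campus
    params.add(new BasicNameValuePair("sortdropdown", "-"));
    params.add(new BasicNameValuePair("SORT", "DZ")); // sort by date, newest first
    params.add(new BasicNameValuePair("extended", "0"));
    params.add(new BasicNameValuePair("SUBMIT", "检索")); // the search button ("检索" = "Search")
    params.add(new BasicNameValuePair("availlim", "1")); // limit: available for loan
    params.add(new BasicNameValuePair("searchlimits", ""));
    params.add(new BasicNameValuePair("searchorigarg", "")); // previous query's keyword and sort order
    // URL-encode the parameters
    String param = URLEncodedUtils.format(params, "UTF-8");
    System.out.println(param);
    try {
        // Append the encoded parameters to the search URL
        String searchUrl = "http://innopac.lib.xjtu.edu.cn/search~S1*chx/";
        httpGet = new HttpGet(searchUrl + "?" + param);
        httpGet.setHeader("Host", "innopac.lib.xjtu.edu.cn");
        httpGet.setHeader("Referer", searchUrl + "?" + param);
        HttpResponse response = httpclient.execute(httpGet);
        System.out.println("---------------searchbook------------------------");
        System.out.println(response.getStatusLine());
        if (response.getStatusLine().getStatusCode() == 200) {
            return EntityUtils.toString(response.getEntity(), "UTF-8");
        }
    } catch (ClientProtocolException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // httpGet is still null if building the request failed
        if (httpGet != null) {
            httpGet.abort();
        }
        // Safe only because this client is created per call; don't do this to a shared client
        httpclient.getConnectionManager().shutdown();
    }
    return "";
}
```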
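On the "field order matters" comment: `URLEncodedUtils.format` joins the name=value pairs in list order, URL-encoding each one. The stdlib-only sketch below rebuilds the same query string with `java.net.URLEncoder` and a `LinkedHashMap` (which preserves insertion order), so the encoding can be inspected without HttpClient. The class `QueryStringSketch` and its methods are hypothetical, not part of the original project, and the sketch assumes `URLEncoder`'s form encoding matches `URLEncodedUtils` for these particular values.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stdlib-only illustration of the query string serchBook sends. */
class QueryStringSketch {

    /** Joins URL-encoded name=value pairs with '&', in the map's iteration order. */
    static String build(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            // Form encoding: space -> '+', non-ASCII -> UTF-8 bytes as %XX
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8
            throw new IllegalStateException(e);
        }
    }

    /** Same fields and order as serchBook; LinkedHashMap keeps insertion order. */
    static Map<String, String> searchParams(String keyword) {
        Map<String, String> p = new LinkedHashMap<String, String>();
        p.put("searchtype", "X");
        p.put("searcharg", keyword);
        p.put("searchscope", "1");
        p.put("sortdropdown", "-");
        p.put("SORT", "DZ");
        p.put("extended", "0");
        p.put("SUBMIT", "\u68c0\u7d22"); // "检索"
        p.put("availlim", "1");
        p.put("searchlimits", "");
        p.put("searchorigarg", "");
        return p;
    }
}
```

For the keyword `Java`, `QueryStringSketch.build(QueryStringSketch.searchParams("Java"))` gives `searchtype=X&searcharg=Java&searchscope=1&sortdropdown=-&SORT=DZ&extended=0&SUBMIT=%E6%A3%80%E7%B4%A2&availlim=1&searchlimits=&searchorigarg=`; note how the Chinese button label becomes six percent-encoded UTF-8 bytes.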
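This gives us the HTML of the search results. Next, as before, we parse it with jsoup and wrap the results in objects.
First, let's look at how the results appear on the page:
Based on the information we need to extract, two wrapper classes are needed: one for the bibliographic record and one for its holdings. They are as follows:
1. Class BookInfo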
```java
package com.ali.login.bean;

import java.util.List;

/**
 * Details of one bibliographic record in the search results.
 *
 * @author shuyan
 *
 */
public class BookInfo {
    private String imgLink; // cover image link
    // brief title, e.g. "Java JDK 7实例宝典 Java JDK 7 shi li bao dian / 韩雪, 郭天娇编著"
    private String briefTitle;
    // e.g. "2014 文字印刷资料": the year followed by the material type (printed text)
    private String year;
    private List<BookAddress> bookAddresses; // holdings of this record
    private String reserveLink; // reservation link

    public BookInfo() {
        super();
    }

    public BookInfo(String imgLink, String briefTitle, String year,
            List<BookAddress> bookAddresses, String reserveLink) {
        super();
        this.imgLink = imgLink;
        this.briefTitle = briefTitle;
        this.year = year;
        this.bookAddresses = bookAddresses;
        this.reserveLink = reserveLink;
    }

    public String getImgLink() {
        return imgLink;
    }

    public void setImgLink(String imgLink) {
        this.imgLink = imgLink;
    }

    public String getBriefTitle() {
        return briefTitle;
    }

    public void setBriefTitle(String briefTitle) {
        this.briefTitle = briefTitle;
    }

    public String getYear() {
        return year;
    }

    public void setYear(String year) {
        this.year = year;
    }

    public List<BookAddress> getBookAddresses() {
        return bookAddresses;
    }

    public void setBookAddresses(List<BookAddress> bookAddresses) {
        this.bookAddresses = bookAddresses;
    }

    public String getReserveLink() {
        return reserveLink;
    }

    public void setReserveLink(String reserveLink) {
        this.reserveLink = reserveLink;
    }

    @Override
    public String toString() {
        return "BookInfo [imgLink=" + imgLink + ", briefTitle=" + briefTitle
                + ", year=" + year + ", bookAddresses=" + bookAddresses
                + ", reserveLink=" + reserveLink + "]";
    }
}
```
2. Class BookAddress
```java
package com.ali.login.bean;

/**
 * Holdings information for a bibliographic record.
 *
 * @author shuyan
 *
 */
public class BookAddress {
    private String holdLand; // location (which library/collection holds the copy)
    private String callNumber; // call number
    private String status; // status

    public BookAddress() {
        super();
    }

    public BookAddress(String holdLand, String callNumber, String status) {
        this.holdLand = holdLand;
        this.callNumber = callNumber;
        this.status = status;
    }

    public String getHoldLand() {
        return holdLand;
    }

    public void setHoldLand(String holdLand) {
        this.holdLand = holdLand;
    }

    public String getCallNumber() {
        return callNumber;
    }

    public void setCallNumber(String callNumber) {
        this.callNumber = callNumber;
    }

    public String getStatus() {
        return status;
    }

    public void setStatus(String status) {
        this.status = status;
    }

    @Override
    public String toString() {
        return "BookAddress [holdLand=" + holdLand + ", callNumber="
                + callNumber + ", status=" + status + "]";
    }
}
```
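Judging by its comment, `BookInfo.year` actually holds the year plus the material type, e.g. "2014 文字印刷资料". If only the year is wanted, a small regex can pull it out. This is a sketch: the helper `YearExtractor.extractYear` is hypothetical, and the rule "the year is the first standalone run of exactly four digits" is an assumption based on that single sample value, not on any documented page format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extract a four-digit year from text like "2014 文字印刷资料". */
class YearExtractor {
    // Exactly four digits, not part of a longer digit run (so ISBNs don't match)
    private static final Pattern YEAR = Pattern.compile("(?<!\\d)(\\d{4})(?!\\d)");

    /** Returns the first standalone four-digit number, or null if there is none. */
    static String extractYear(String text) {
        if (text == null) {
            return null;
        }
        Matcher m = YEAR.matcher(text);
        return m.find() ? m.group(1) : null;
    }
}
```

The lookbehind/lookahead pair is what keeps a 13-digit ISBN such as `9787121217074` from being mistaken for a year.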
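With these two classes in place, we can parse the HTML and populate them.
This is where things get a bit tedious, because the markup here is quite irregular.
The code is as follows: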
```java
// Imports used by this method (jsoup)
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/**
 * Processes the HTML of the search results.
 *
 * @param searchResultHtml
 *            the HTML string
 * @return the list of bibliographic records
 */
public static List<BookInfo> getSearchResult(String searchResultHtml) {
    List<BookInfo> bookInfos = new ArrayList<BookInfo>();
    Document document = Jsoup.parse(searchResultHtml);
    Elements items = document.getElementsByClass("briefCitRow"); // one row per record
    for (Element item : items) {
        // Cover image, e.g. http://202.117.24.227/bibimage/zycover.php?isbn=9787121217074
        String imgLink = "";
        Element ele_par = item.select("a[href]").first();
        if (ele_par != null && ele_par.children().size() > 0) {
            imgLink = ele_par.child(0).attr("src");
        }

        // Reservation link. It is relative, so the host still has to be prepended, e.g.
        // /availlim/search~S1*chx?/XJava&searchscope=1&SORT=DZ/XJava&searchscope=1&SORT=DZ&extended=0&SUBKEY=Java/1%2C2973%2C2973%2CC/requestbrowse~b3838346&FF=XJava&searchscope=1&SORT=DZ&1%2C1%2C
        String reserveLink = "";
        Element ele_reserve = item.getElementsByClass("briefcitRequest").first();
        if (ele_reserve != null) {
            Element ele_ahref = ele_reserve.select("a[href]").first();
            if (ele_ahref != null) {
                reserveLink = ele_ahref.attr("href");
            }
        }

        // Note: there are two elements with class="briefcitDetail"
        Elements ele_briefcitDetails = item.getElementsByClass("briefcitDetail");
        if (ele_briefcitDetails.size() < 2) {
            continue; // unexpected markup: skip this record instead of crashing
        }
        // The first one contains the brief title
        String briefTitle = ele_briefcitDetails.get(0)
                .getElementsByClass("briefcitTitle").text();
        // The second one contains the year
        String year = ele_briefcitDetails.get(1).text();

        // Holdings of this record: briefcitItems > bibItems > bibItemsEntry
        // (reservation could also be handled here; not implemented yet)
        List<BookAddress> bookAddresses = new ArrayList<BookAddress>();
        Elements ele_addresses = item.select(".briefcitItems .bibItems .bibItemsEntry");
        for (Element ele_add : ele_addresses) {
            // Each entry has three <td> cells: location, call number, status
            Elements ele_tds = ele_add.getElementsByTag("td");
            if (ele_tds.size() < 3) {
                continue;
            }
            bookAddresses.add(new BookAddress(ele_tds.get(0).text(),
                    ele_tds.get(1).text(), ele_tds.get(2).text()));
        }

        bookInfos.add(new BookInfo(imgLink, briefTitle, year, bookAddresses, reserveLink));
    }
    return bookInfos;
}
```
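The comment in the parser notes that the reservation link is relative and still needs the host prepended. One way to do that is `java.net.URI.resolve`, which leaves absolute links untouched. This is a minimal sketch: the class `LinkResolver` is hypothetical, and it assumes the reservation links are relative to the same host that `serchBook` queries.

```java
import java.net.URI;

/** Hypothetical helper: turn a relative reserve link into an absolute URL. */
class LinkResolver {
    // Assumption: reserve links are relative to the OPAC host used in serchBook
    static final URI OPAC_BASE = URI.create("http://innopac.lib.xjtu.edu.cn/");

    /**
     * Resolves href against the OPAC host; absolute hrefs come back unchanged.
     * Throws IllegalArgumentException if href contains characters that are illegal in a URI.
     */
    static String toAbsolute(String href) {
        return OPAC_BASE.resolve(href).toString();
    }
}
```

Alternatively, jsoup can do this during parsing: pass the base URI as the second argument of `Jsoup.parse(html, baseUri)` and then read `element.absUrl("href")` instead of `attr("href")`.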
This yields the list of populated book records. A quick test:
```java
public static void main(String[] args) {
    String searchResultHtml = LibraryUtil.serchBook("Java");
    List<BookInfo> bookInfos = getSearchResult(searchResultHtml);
    // Print only the first five records
    for (int i = 0; i < Math.min(5, bookInfos.size()); i++) {
        System.out.println(bookInfos.get(i));
    }
}
```
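The output is:
And with that, book search is working...
Of course, there is still plenty left to handle, such as the total number of results, how many records to show per page, filtering out meaningless results, the reservation feature, and so on...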
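For the "records per page" item, the arithmetic is simple once a total is known: the page count is a ceiling division, and a page is a sublist. The sketch below pages over an already-parsed list on the client side. The class `Pager` is hypothetical, and whether the OPAC itself returns everything at once or serves results in pages that must be requested separately is an assumption this sketch does not settle; how to read the total from the page is also left open here.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical client-side paging helper for already-parsed results. */
class Pager {
    /** Pages needed for total items (total >= 0) at pageSize per page: ceiling division. */
    static int pageCount(int total, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        // Written this way to avoid overflow in (total + pageSize - 1)
        return total / pageSize + (total % pageSize == 0 ? 0 : 1);
    }

    /** Items on the 0-based page; empty if the page is out of range. */
    static <T> List<T> page(List<T> items, int page, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        if (page < 0) {
            return Collections.emptyList();
        }
        long from = (long) page * pageSize; // long avoids int overflow for large page numbers
        if (from >= items.size()) {
            return Collections.emptyList();
        }
        int start = (int) from;
        int end = (int) Math.min((long) start + pageSize, items.size());
        return items.subList(start, end); // a view backed by the original list
    }
}
```

For example, 45 results at 20 per page give `pageCount(45, 20) == 3`, and the last page holds the remaining 5.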
Time: 2024-10-12 07:26:04