Caching Strategy for the Total Row Count of Paginated Queries

2012-07-27

This article is a bit long...

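The pagination models I have seen are all much the same: a POJO combined with one view technology or another. Every query, however, has to compute the total number of pages, which means counting all matching rows. For systems with few records and low concurrency this is hardly a problem. Under high concurrency with tens of millions of rows, though, counting the rows on every request takes considerable time. Here I try to design a row-count cache plus a simple pagination POJO (not much different from the usual ones). Criticism, discussion, and suggestions are welcome. Sharing is how we improve!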

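1. Where to cache
The cache can live on the client (using cookies) or on the server (a purpose-built cache). A server-side cache allows flexible control over cache lifetime and refresh policy, so only server-side caching is discussed here. Feel free to weigh in on the pros and cons of client-side caching.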

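2. What to use as the cache key
A cache needs key-value pairs, so what makes a good key?
Query conditions differ from request to request. This is especially true of complex, multi-condition dynamic queries, where the same Service method may be called with different conditions, so the row count varies from call to call. What uniquely identifies a query is therefore the requested method (dispatched in the Controller) together with its query parameters, for example /listUsers.do?name=xxx&curPage=1. The server needs to obtain this URL and strip the &curPage=1 condition, because the total row count does not depend on which page is being viewed. Note that forms must be submitted with GET; only then do the parameters appear in the query string returned by request.getQueryString().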
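A minimal sketch of the key normalization follows. It assumes the request path and request.getQueryString() are passed in separately, and the class and method names are hypothetical. Matching whole parameter names avoids two pitfalls of a plain regex replace of `&?curPage=\d*`:

- When curPage comes first (`?curPage=3&name=xxx`), the replace leaves `?&name=xxx`, which is a different key from `?name=xxx`.
- A parameter such as `mycurPage=2` would be mangled to `my`.

Sorting the parameters is an additional assumption: it treats the same parameters given in a different order as the same query.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: builds the row-count cache key from the request path and
// query string, dropping the curPage parameter wherever it appears.
final class CacheKeys {
    private CacheKeys() {}

    static String rowCountKey(String path, String queryString) {
        if (queryString == null || queryString.isEmpty()) {
            return path;
        }
        List<String> kept = new ArrayList<>();
        for (String pair : queryString.split("&")) {
            String name = pair.split("=", 2)[0];
            // Compare whole parameter names, so e.g. "mycurPage" is kept
            if (!pair.isEmpty() && !name.equals("curPage")) {
                kept.add(pair);
            }
        }
        // Sorting makes "a=1&b=2" and "b=2&a=1" map to the same key
        Collections.sort(kept);
        return kept.isEmpty() ? path : path + "?" + String.join("&", kept);
    }
}
```

With this helper, both `name=xxx&curPage=1` and `curPage=3&name=xxx` on /listUsers.do yield the key /listUsers.do?name=xxx.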

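Once the URL is chosen as the cache key, the Page model and the cache model need to be designed. Typically the Controller calls the Service and passes in a Page object. The Dao then looks up the cache by URL to decide whether the rows need to be counted. Passing the URL directly as a parameter to Dao methods is quite inelegant, so I made the URL a property of Page; in the Dao it can be obtained conveniently with page.getUrl().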

3. Caching strategy
We cache one row-count record per query as a <url, RowCount> key-value pair in a HashMap. Many caching strategies are possible; let's analyze a few.

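1) An independent caching policy for each URL
Each URL can have its own cache lifetime and refresh policy. The lifetime can be chosen from the expected concurrency for that URL and how long its count takes.
There are two ways to refresh. The timer can be reset on every access, whether or not the cache already holds a row count for the URL. Alternatively, the entry can be refreshed only after it expires and left untouched while it is still valid.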
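A minimal sketch of the two refresh options follows. It assumes times in milliseconds are passed in by the caller so the behavior is easy to test, and the class, enum, and method names are hypothetical.

The modes behave differently under load:

- With SLIDING, a URL that is requested more often than its lifetime never expires, so its count is only recomputed once traffic pauses.
- FIXED bounds how stale a count can get.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: every url carries its own lifetime and refresh mode.
final class PerUrlRowCountCache {
    enum Mode {
        SLIDING, // every successful read restarts the timer
        FIXED    // the timer restarts only when a new count is stored
    }

    private static final class Entry {
        final int totalRow;
        final long ttlMillis;
        final Mode mode;
        long startMillis; // start of the current lifetime

        Entry(int totalRow, long ttlMillis, Mode mode, long now) {
            this.totalRow = totalRow;
            this.ttlMillis = ttlMillis;
            this.mode = mode;
            this.startMillis = now;
        }
    }

    private final Map<String, Entry> map = new HashMap<>();

    // Returns the cached count, or -1 if the url is absent or its entry has expired.
    synchronized int get(String url, long now) {
        Entry e = map.get(url);
        if (e == null || now - e.startMillis > e.ttlMillis) {
            return -1;
        }
        if (e.mode == Mode.SLIDING) {
            e.startMillis = now; // an access extends the lifetime
        }
        return e.totalRow;
    }

    synchronized void put(String url, int totalRow, long ttlMillis, Mode mode, long now) {
        map.put(url, new Entry(totalRow, ttlMillis, mode, now));
    }
}
```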

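2) One caching policy shared by all URLs
Here it is enough to reset the timers of all cached records at a fixed interval, which makes this the simplest option. The catch is that a good interval is hard to estimate.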
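One way to sketch the shared interval is to divide time into fixed windows and drop every cached count when a new window begins. The class name and the explicit `now` parameter are hypothetical. Because all counts expire at the same moment, the first requests after each boundary all go to the database together, which is a cost of this simplicity.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one interval for all urls. When 'now' enters a new window of
// intervalMillis, every cached count is discarded at once.
final class SharedWindowRowCountCache {
    private final long intervalMillis;
    private final Map<String, Integer> map = new HashMap<>();
    private long currentWindow = Long.MIN_VALUE;

    SharedWindowRowCountCache(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    // Clears the whole map when 'now' falls into a different window than the last call.
    private void roll(long now) {
        long window = now / intervalMillis;
        if (window != currentWindow) {
            map.clear();
            currentWindow = window;
        }
    }

    // Returns the cached count, or -1 if none was stored in the current window.
    synchronized int get(String url, long now) {
        roll(now);
        Integer v = map.get(url);
        return v == null ? -1 : v;
    }

    synchronized void put(String url, int totalRow, long now) {
        roll(now);
        map.put(url, totalRow);
    }
}
```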

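3) Whether entries need to be evicted from memory
Take a URL of 100 characters (about 200 bytes at 2 bytes per char) plus roughly 30 bytes for the RowCount object itself (two ints and one long). That is about 230 bytes per record. Suppose a system has 10,000 queries that need paging; the real number is far higher once different parameter values are counted separately. The cache then takes about 10,000 × 230 bytes ≈ 2.3 MB. That is still a fair amount of memory, so an eviction policy is needed.

Options include LRU (Least Recently Used) and LFU (Least Frequently Used). The former is simple, and the latter seems fairer. We use a simplified LRU: when the number of entries reaches the limit, the entry that has gone unvisited the longest is evicted. Recency is tracked with a last-visit timestamp per entry, and the victim is found by scanning all entries rather than by keeping them in usage order.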
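For comparison, the JDK already offers LRU eviction without a full scan. A LinkedHashMap constructed with accessOrder = true keeps entries in access order, and overriding removeEldestEntry drops the least recently used entry after a put.

This is a swapped-in technique, not the author's code, and it differs from the author's cache in two ways:

- It is not thread-safe by itself; wrapping it with Collections.synchronizedMap is one option.
- It stores a plain Integer instead of a RowCount, so expiry would still have to be added.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU map of url -> row count, bounded to maxSize entries.
final class LruRowCountMap extends LinkedHashMap<String, Integer> {
    private final int maxSize;

    LruRowCountMap(int maxSize) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the tail
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
        // Called after each put; removes the head, i.e. the least recently accessed entry
        return size() > maxSize;
    }
}
```

With a limit of 2, putting /a and /b, reading /a, then putting /c evicts /b.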

4. Sample code
The Page pagination model:

```java
/**
 * A pagination tool, default pageSize: 20
 * @author chen
 */
public class Page {
    private int totalRow;
    private int totalPage;
    private int curPage;
    private int pageSize;
    private String url;
    private static final int DEFAULT_PAGESIZE = 20;

    /**
     * Default page size is 20
     * @param url A string in the address field of the browser
     * @param curPage Current page index
     */
    public Page(String url, int curPage) {
        this(url, curPage, DEFAULT_PAGESIZE);
    }

    /**
     * @param url A string in the address field of the browser
     * @param curPage Current page index
     * @param pageSize Size of the page
     */
    public Page(String url, int curPage, int pageSize) {
        // Drop the curPage parameter: every page of one query shares a single cache key
        this.url = url.replaceAll("&?curPage=\\d*", "");
        this.curPage = curPage < 1 ? 1 : curPage;
        this.pageSize = pageSize;
    }

    /**
     * Set total row of the pager
     */
    public void setTotalRow(int totalRow) {
        this.totalRow = totalRow;
        this.totalPage = this.totalRow < 1 ? 0 : (this.totalRow - 1) / pageSize + 1;
        // Invalid page index: fall back to page 1 but keep the real row count.
        // Resetting totalRow to 0 here would make the Dao cache 0 for this url.
        if (curPage > this.totalPage || curPage < 1) {
            this.curPage = 1;
        }
    }

    // ......
}
```
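The paging arithmetic can be isolated for testing. Only the totalPage formula comes from setTotalRow. getFirstRow() is elided in the original, so the zero-based offset (curPage - 1) * pageSize is an assumption, chosen to match Hibernate's setFirstResult, which counts rows from 0. The class name PageMath is hypothetical.

```java
// Hypothetical helper isolating the paging arithmetic.
final class PageMath {
    private PageMath() {}

    // ceil(totalRow / pageSize) with integer arithmetic; 0 when there are no rows
    static int totalPage(int totalRow, int pageSize) {
        return totalRow < 1 ? 0 : (totalRow - 1) / pageSize + 1;
    }

    // Zero-based index of the first row on the 1-based page curPage (assumed formula)
    static int firstRow(int curPage, int pageSize) {
        return (curPage - 1) * pageSize;
    }
}
```

For example, 45 rows with a page size of 20 give 3 pages, and page 3 starts at row 40.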

The cache model:

```java
import java.util.HashMap;
import java.util.Map;

public class RowCountCache {
    private RowCountCache() {}

    // Shared by all request threads, so every public method below is synchronized
    private final Map<String, RowCount> m = new HashMap<String, RowCount>();
    private static final RowCountCache cache = new RowCountCache();
    private static final int MAXSIZE = 10000;

    /**
     * An object of this cache.
     * @return this
     */
    public static RowCountCache getInstance() {
        return cache;
    }

    /** Cache state of the object: in the cache and valid. */
    public static final int CACHESTATE_VALID = 1;
    /** Cache state of the object: cache is expired. */
    public static final int CACHESTATE_EXPIRED = -1;
    /** Cache state of the object: not in the cache. */
    public static final int CACHESTATE_UNCACHED = -2;

    /**
     * Get the row-count number from the cache of the given url.
     * @return A row-count number, -1 if not cached or expired.
     */
    public synchronized int get(String url) {
        RowCount r = m.get(url);
        return r == null ? -1 : r.getTotalRow();
    }

    /**
     * Put or refresh the cached row count for the given url with the default cache time.
     * @param totalRow A row-count number corresponding to the given url.
     */
    public synchronized void putOrRefresh(String url, int totalRow) {
        if (getCacheState(url) == CACHESTATE_UNCACHED) {
            put(url, new RowCount(totalRow));
        } else {
            refresh(url, totalRow);
        }
    }

    /**
     * Put or refresh the cached row count for the given url with a custom cache time.
     * A still-valid entry is left untouched.
     * @param totalRow A row-count number corresponding to the given url.
     * @param cacheTime Time the totalRow will be cached, in seconds.
     */
    public synchronized void putOrRefresh(String url, int totalRow, int cacheTime) {
        int cacheState = getCacheState(url);
        if (cacheState == CACHESTATE_UNCACHED) {
            put(url, new RowCount(totalRow, cacheTime));
        } else if (cacheState == CACHESTATE_EXPIRED) {
            refresh(url, totalRow);
        }
    }

    // Used by both putOrRefresh variants, so MAXSIZE is enforced either way
    private void put(String url, RowCount row) {
        if (m.size() >= MAXSIZE) {
            long now = System.currentTimeMillis();
            long maxInterval = -1;
            String key = null;
            // find the RowCount record that has gone unused the longest
            for (Map.Entry<String, RowCount> e : m.entrySet()) {
                long interval = now - e.getValue().getLastVisit();
                if (maxInterval < interval) {
                    maxInterval = interval;
                    key = e.getKey();
                }
            }
            m.remove(key);
        }
        m.put(url, row);
    }

    private void refresh(String url, int totalRow) {
        m.get(url).refresh(totalRow);
    }

    /**
     * Get the cache state of the RowCount for the given url.
     * @return cache state
     */
    public synchronized int getCacheState(String url) {
        RowCount r = m.get(url);
        if (r == null) {
            return CACHESTATE_UNCACHED;
        } else if (r.isExpired()) {
            return CACHESTATE_EXPIRED;
        } else {
            return CACHESTATE_VALID;
        }
    }
}

/**
 * An object in the cache corresponding to a specific url.
 * @author chen
 */
class RowCount {
    // Default cache time, 5 seconds
    private static final int DEFAULT_CACHE_TIME = 5;
    private int totalRow;
    private final int cacheTime;
    private long lastVisit;

    /**
     * Construct RowCount with the default cache time, 5 seconds.
     * @param totalRow A row count number corresponding to a specific url.
     */
    public RowCount(int totalRow) {
        this(totalRow, DEFAULT_CACHE_TIME);
    }

    /**
     * Construct RowCount with a custom cache time.
     * @param totalRow A row count number corresponding to a specific url.
     * @param cacheTime Time the object will be cached, in seconds.
     */
    public RowCount(int totalRow, int cacheTime) {
        this.totalRow = totalRow;
        this.cacheTime = cacheTime;
        // System.currentTimeMillis() instead of a shared static Calendar, which is not thread-safe
        this.lastVisit = System.currentTimeMillis();
    }

    /**
     * Get the value of the row count.
     * @return the row count, or -1 if the cache is expired.
     */
    public int getTotalRow() {
        return isExpired() ? -1 : totalRow;
    }

    /**
     * Refresh this RowCount object in the cache.
     * @param row A new row count number.
     */
    protected void refresh(int row) {
        this.totalRow = row;
        this.lastVisit = System.currentTimeMillis();
    }

    /**
     * Check whether the cache is expired.
     * @return true if expired, false otherwise
     */
    public boolean isExpired() {
        return System.currentTimeMillis() > lastVisit + cacheTime * 1000L;
    }

    /** Get the last visit time of the object in milliseconds. */
    protected long getLastVisit() {
        return lastVisit;
    }
}
```

Below is a demo of how it is used (partial code).
A request is sent to the Controller (the Service layer is omitted for simplicity):

```java
...
Page p = new Page(HttpUtil.getUrl(), HttpUtil.getInteger(request, "curPage"));
request.setAttribute("all", userDao.find(cond, p));
request.setAttribute("page", p);
return mapping.findForward("find.do");
```

Dao:

```java
...
RowCountCache cache = RowCountCache.getInstance();
// Read the cached count once. Checking the state first and reading afterwards can
// return -1 if the entry expires in between.
int cached = cache.get(p.getUrl());
if (cached >= 0) {
    p.setTotalRow(cached);
} else {
    p.setTotalRow(this.countRow(cond, p));
}
// Refresh the cache whether or not it was valid. You can also refresh only when it has expired.
cache.putOrRefresh(p.getUrl(), p.getTotalRow());
List<Users> all = ct.setFirstResult(p.getFirstRow()).setMaxResults(p.getPageSize()).list();
return all;
```
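In the Dao, checking the cache, counting, and storing the result are separate steps. As a result, several requests that arrive just after an entry expires can each run countRow(). The sketch below addresses both that and the synchronization question raised in the comments, using ConcurrentHashMap.compute (Java 8+).

The class name is hypothetical, and so is the IntSupplier parameter, which stands in for something like `() -> countRow(cond, p)`. Some caveats:

- The ConcurrentHashMap Javadoc asks for such computations to be short, because some updates by other threads may block while one runs. A slow COUNT(*) inside compute() is therefore a trade-off to measure.
- Expiry here is fixed rather than sliding.
- There is no size limit.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntSupplier;

// Sketch: read-through row-count cache on ConcurrentHashMap.
final class ConcurrentRowCountCache {
    private static final class Entry {
        final int totalRow;
        final long expiresAt;

        Entry(int totalRow, long expiresAt) {
            this.totalRow = totalRow;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentHashMap<String, Entry> map = new ConcurrentHashMap<>();
    private final long ttlMillis;

    ConcurrentRowCountCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // compute() is atomic per key: when the entry for url is missing or expired, one thread
    // runs the counter while other callers for that url wait and then reuse its result.
    int totalRow(String url, IntSupplier counter) {
        Entry e = map.compute(url, (key, old) -> {
            long now = System.currentTimeMillis();
            if (old != null && now < old.expiresAt) {
                return old; // still valid; reads do not extend the lifetime
            }
            return new Entry(counter.getAsInt(), now + ttlMillis);
        });
        return e.totalRow;
    }
}
```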

Finally finished. Please share your comments so we can all improve together. ~~

```java
public Page(String url, int curPage, int pageSize) {
    this.url = url.replaceAll("&?curPage=\\d*", ""); // right here
    this.curPage = curPage < 1 ? 1 : curPage;
    this.pageSize = pageSize;
}
```
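2. You certainly don't have to use Hibernate. The code is only an example, and the Dao layer can use another implementation. Hibernate's second-level cache is not suitable for caching the total row count, because the query conditions differ every time.
3. Sessions are best used sparingly in web applications, because they consume considerable server resources.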
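4. JavaScript would work too, of course; unfortunately I'm not very familiar with it...

Reply #19  yirentianran  2009-11-04
Since this is about high concurrency, shouldn't some synchronization mechanism be considered?

Reply #20  凤舞凰扬  2009-11-04
prowl wrote:
1. The original poster mentioned high concurrency, so you should look at ConcurrentHashMap in the com.java.util.concurrent package.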
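ConcurrentHashMap does not offer particularly high performance as such. If you want to study something, I'd suggest OSCache. By the way, the package name has an extra "com".
prowl wrote:
2. Evicting cache entries purely by checking time doesn't seem very sound to me. Could a counter be added, so that the decision weighs both usage frequency and time?
Nowadays the vast majority of cache algorithms use least-recently-used eviction. Some algorithms add a supplementary rule that also evicts entries unused for longer than some period, even when the cache still has space.
prowl wrote:
3. Could this be extended so the cache applies to other scenarios? For example, it could store actual objects, or simply persist evicted entries to a file and read key information back from that file next time. (Memory-heavy operations are not necessarily DAO operations; file parsing is one example.)
What you describe resembles the passivation and activation of beans in EJB. That behavior isn't caching; it is more like pooling. For a cache, once an entry is evicted, how would you read it back from a file? That said, many cache implementations today support both an in-memory cache and a temporary persistent cache (dumped to disk or a database to reduce memory requirements).
prowl wrote:
4. When maintaining several interrelated Maps, is transaction handling needed?
Transactions are a database concept. There is basically no simple way to implement transactions over in-memory objects. Anyone who can build such a transaction mechanism has done something impressive, far beyond the level of a cache and its applications.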
Reply #21  凤舞凰扬  2009-11-04
yirentianran wrote:
Since this is about high concurrency, shouldn't some synchronization mechanism be considered?
Effective and efficient synchronization control is an important criterion for judging how well a cache performs.

Reply #22  prowl  2009-11-05
(quoting 凤舞凰扬's reply #20 in full)

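Regarding the third point: with frequent, time-consuming database operations, or with parsing certain files, extracting the useful information often takes even longer than the operation itself. When the source is rarely updated, the extracted information can be persisted, and later accesses can read that information file directly. This can improve efficiency significantly. It is just an extension, and it is not quite the same as the concept of a pool.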

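As for the fourth point: sometimes caches are interrelated. For example, two Maps may be kept at the same time. If an exception occurs while updating or reading data in one of them, the related data is actually invalid. Some simple transaction control could be added.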
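A minimal sketch of such simple transaction control, for the case where both maps live in one JVM, follows. Both maps are changed under one lock, and if deriving the second value throws, the first map is restored. The class name and the deriveB parameter are hypothetical. This gives all-or-nothing updates within one object, not the durability or isolation of a database transaction.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: two related maps that are always updated together or not at all.
final class PairedMaps<K, A, B> {
    private final Map<K, A> first = new HashMap<>();
    private final Map<K, B> second = new HashMap<>();

    synchronized void put(K key, A a, Function<A, B> deriveB) {
        boolean hadA = first.containsKey(key);
        A oldA = first.put(key, a);
        try {
            second.put(key, deriveB.apply(a));
        } catch (RuntimeException ex) {
            // Roll the first map back so it stays consistent with the untouched second map
            if (hadA) {
                first.put(key, oldA);
            } else {
                first.remove(key);
            }
            throw ex;
        }
    }

    synchronized A getFirst(K key) {
        return first.get(key);
    }

    synchronized B getSecond(K key) {
        return second.get(key);
    }
}
```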

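This is simply a problem I ran into on a project and how I solved it. I'll definitely find time to look at OSCache; thanks for the recommendation.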
