Revving Up Your Hibernate Engine
1. Introduction
Hibernate is one of the most popular ORM engines providing data persistence and query services.
Introducing Hibernate into your project and getting it to work is quite easy. Getting it to work well, however, takes a lot of time and a great deal of experience.
Using examples from our energy project, which is built on Hibernate 3.3.1 and Oracle 9i, this article covers a number of Hibernate tuning techniques.
We assume a basic understanding of Hibernate. Where a topic is already well covered in the Hibernate Reference Documentation (HRD) or in other tuning articles, we only provide a reference to the documentation and a brief explanation from a different angle. We focus instead on tuning approaches that are effective but poorly documented.
Tuning is a continuous process that spans the whole software development lifecycle (SDLC). In a typical J2EE application that uses Hibernate for persistence, tuning covers the following areas:
- Business rules tuning
- Design tuning
- Hibernate tuning
- Java GC tuning
- Application container tuning
- Tuning of the underlying systems, such as the database and the operating system

Tuning all of these areas without a well-planned methodology is time consuming and probably ineffective. An important part of a good tuning methodology is prioritizing the tuning areas. As the Pareto Principle (also known as the 80/20 rule) puts it, roughly 80% of the application performance improvements come from the top 20% of all the performance problems [5].
Compared with network-based access, memory- and CPU-based access has lower latency and better throughput. Because of this, IO-oriented Hibernate tuning and the IO portion of the underlying systems tuning should take priority over memory- and CPU-oriented GC tuning and the memory and CPU portions of the underlying systems tuning.
Example 1
[The class diagram here includes a “CreditCardType” property, which corresponds to the “cc_type” column in the generated SQL.]
In fact only one table is needed; a polymorphic query generates a single SQL select against that one table.
Initially the project had only GasDeal and a small number of users, and it used the “table per class hierarchy” strategy. OilDeal and ElectricityDeal were added later as more business requirements came in, and the mapping strategy was left unchanged. However, ElectricityDeal has too many properties of its own, so a large number of electricity-specific, nullable columns were added to the Deal table. As the data volume grew, data changes became slower and slower. In the redesign we used two separate tables to hold the Gas/Oil-specific and electricity-specific properties. The new mapping is a hybrid of “table per class hierarchy” and “table per subclass”. We also redesigned the queries so that they select on the concrete deal classes, eliminating the unnecessary columns and joins.
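For illustration only, the sketch below shows one way such a hybrid can be expressed with Hibernate/JPA annotations: the shared and Gas/Oil-specific columns stay in the single DEAL table, while the electricity-specific columns move to a table of their own. The project itself used hbm.xml mappings (where a <join> element nested inside <subclass> gives the same effect), and every class, table and column name below is an assumption rather than the project's actual model:

import javax.persistence.Column;
import javax.persistence.DiscriminatorColumn;
import javax.persistence.DiscriminatorValue;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Inheritance;
import javax.persistence.InheritanceType;
import javax.persistence.PrimaryKeyJoinColumn;
import javax.persistence.SecondaryTable;
import javax.persistence.Table;

// the base class and the Gas/Oil subclasses share the single DEAL table
@Entity
@Table(name = "DEAL")
@Inheritance(strategy = InheritanceType.SINGLE_TABLE)
@DiscriminatorColumn(name = "DEAL_TYPE")
public abstract class Deal {
    @Id
    @GeneratedValue
    private Long id;

    private String counterparty; // a property shared by all deal types
}

@Entity
@DiscriminatorValue("GAS")
class GasDeal extends Deal {
    private Double gasVolume;    // Gas-specific columns stay in DEAL
}

// electricity-specific columns live in their own table, so DEAL is not
// cluttered with many nullable columns
@Entity
@DiscriminatorValue("ELEC")
@SecondaryTable(name = "ELECTRICITY_DEAL",
        pkJoinColumns = @PrimaryKeyJoinColumn(name = "DEAL_ID"))
class ElectricityDeal extends Deal {
    @Column(table = "ELECTRICITY_DEAL", name = "LOAD_PROFILE")
    private String loadProfile;
}

Queries can then target the concrete class (for example "from ElectricityDeal"), so only the relevant columns and a single join to the secondary table are involved.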
4.3 Tuning Domain Objects
Based on the business rules and design tuning described in Section 4.1, you come up with a class diagram of the domain objects represented by POJOs. Our recommendations are as follows:
4.3.1 Tuning POJOs

- Separate read-only data, such as reference data, and read-mostly data from read-write data (translator's note: similar to the read/write separation we often talk about). The second-level cache is the most efficient approach for read-only data, followed by nonstrict-read-write for read-mostly data. Marking read-only POJOs as immutable is another tuning point. If a service-layer method deals only with read-only data, you can mark its transaction as read-only, which is one more way to optimize Hibernate and the underlying JDBC driver.
- Fine-grained POJOs versus coarse-grained database tables: based on the data change frequency, concurrency and so on, break a large POJO into smaller ones. Although you can define a very fine-grained object model, overly fine-grained tables bring too many table joins, which is unacceptable for a data warehouse.
- Prefer non-final classes: Hibernate's lazy association fetching, implemented with CGLIB proxies, only works on non-final classes. If the associated class is final, Hibernate loads all of its data eagerly, which badly hurts performance.
- For detached instances, implement equals() and hashCode() based on your business key (see the sketch after this list). In a multi-tier system, people often use optimistic locking on detached objects to increase concurrency and achieve higher performance.
- Define a version or timestamp property: such a column is required for optimistic locking in long conversations (application-level transactions). (Translator's note: Hibernate supports the version feature natively, so you should not need to implement it yourself.)
- Prefer composite POJOs: the data used by the front-end UI usually comes from several different POJOs. Sending one composite POJO to the UI gives better network performance than sending several individual POJOs. There are two ways to build the composite POJO in the service layer: load all the required POJOs and copy the needed properties into the composite POJO, or query only the needed properties directly from the database with HQL. The first approach is recommended if the individual POJOs are also referenced by other POJOs and are held in the second-level cache; otherwise the second approach is recommended.
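As an illustration of the business-key equals()/hashCode() and version recommendations above, here is a minimal sketch. The Trader class, its properties and the mapping snippet in the comment are assumptions made for illustration, not the project's actual code:

import java.io.Serializable;

// A reference-data style POJO with a business key and a version property.
// The corresponding hbm.xml would map the version with something like
//   <version name="version" column="VERSION"/>
public class Trader implements Serializable {

    private Long id;            // surrogate database identifier
    private String traderCode;  // business key
    private String name;
    private int version;        // used by Hibernate for optimistic locking

    // equals()/hashCode() based on the business key, so a detached instance
    // and a persistent instance of the same trader compare as equal
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Trader)) return false;
        Trader that = (Trader) other;
        return traderCode != null && traderCode.equals(that.traderCode);
    }

    @Override
    public int hashCode() {
        return traderCode == null ? 0 : traderCode.hashCode();
    }

    // getters and setters omitted for brevity
}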
4.3.2 Tuning POJO Associations

- Don't use many-to-many if the association can be expressed as one-to-one, one-to-many or many-to-one. A many-to-many association needs an extra mapping table. Although your Java code only deals with the POJOs at the two ends, the database has to join the extra mapping table for queries and perform extra inserts and deletes for updates.
- Prefer unidirectional over bidirectional associations. Due to the many-to-many nature, loading from one side of a bidirectional association can trigger loading of the other side, which can further trigger extra data loading of the original side, and so on. You can make similar arguments for bidirectional one-to-many and many-to-one associations when you navigate from the many side (the child entities) to the one side (the parent entity). This back-and-forth loading takes time and may not be what you want.
- Don't define an association for the sake of having an association; do so only when you need to load the two sides together, which should be decided by your business rules and design (please see Example 5 for details). Otherwise, either don't define any association at all, or just define a value-typed property in the child POJO to hold the parent POJO's ID property (a similar argument applies in the other direction).
- Tuning collections:
  - Use the “order-by” attribute instead of “sort” if your collection sorting logic can be implemented by the underlying database, because the database usually does a better sorting job than you.
  - Collections can either model value types (element or composite-element) or entity reference types (one-to-many or many-to-many associations). Tuning a collection of reference types is mainly a matter of tuning its fetch strategy. For tuning collections of value types, Section 20.5 “Understanding Collection Performance” in the HRD [1] already has good coverage.
- Tuning the fetch strategy: please see Section 4.7.

Example 5

We have a core POJO called ElectricityDeal to capture electricity deals. From a business perspective, it has dozens of many-to-one associations with reference POJOs such as Portfolio, Strategy and Trader, just to name a few. Because the reference data is pretty stable, it is cached at the front end and can be quickly looked up based on its ID properties.
In order to have good loading performance, the ElectricityDeal mapping metadata only defines the value-typed ID properties of those reference POJOs because the front end can quickly look up the portfolio from cache based on a portfolioKey if needed:
<property name="portfolioKey" column="PORTFOLIO_ID" type="integer"/>

This implicit association avoids database table joins and extra selections, and cuts down the data transfer size.
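On the Java side this is just an ordinary value-typed property. A minimal sketch follows; the class body shown here is an assumption for illustration, not the project's actual code:

public class ElectricityDeal {

    // value-typed property holding the Portfolio's ID instead of a
    // many-to-one association to a Portfolio entity
    private Integer portfolioKey;

    public Integer getPortfolioKey() {
        return portfolioKey;
    }

    public void setPortfolioKey(Integer portfolioKey) {
        this.portfolioKey = portfolioKey;
    }

    // the front end looks up the cached Portfolio by this key whenever it
    // needs the full reference data
}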
4.4 Tuning the Connection Pool
Because making a physical database connection is time consuming, you should always use a connection pool. Furthermore, you should always use a production level connection pool instead of Hibernate’s internal rudimentary pooling algorithm.
You usually provide Hibernate with a datasource which provides the pooling function. A popular open source and production-level datasource is Apache DBCP's BasicDataSource [13]. Most database vendors also implement their own JDBC 3.0-compliant connection pools. For example, you can get connection load balancing and failover using the Oracle-provided JDBC connection pool [14] along with Oracle Real Application Clusters [15].
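As an illustration, a BasicDataSource can be configured programmatically as sketched below; the driver class, URL, credentials and all the numbers are placeholders rather than recommendations, and the same properties can just as well be set through Spring or JNDI configuration:

import org.apache.commons.dbcp.BasicDataSource;

public class DataSourceFactory {

    public static BasicDataSource createOracleDataSource() {
        BasicDataSource ds = new BasicDataSource();
        ds.setDriverClassName("oracle.jdbc.OracleDriver");
        ds.setUrl("jdbc:oracle:thin:@dbhost:1521:ORCL"); // host and SID are placeholders
        ds.setUsername("app_user");
        ds.setPassword("secret");

        // the common tuning parameters discussed below
        ds.setInitialSize(5);
        ds.setMinIdle(5);                           // min pool size
        ds.setMaxActive(20);                        // max pool size
        ds.setMaxWait(10000);                       // max wait time in ms for a free connection
        ds.setMinEvictableIdleTimeMillis(300000);   // max idle time before a connection is closed
        ds.setTimeBetweenEvictionRunsMillis(60000); // how often idle connections are checked
        ds.setValidationQuery("SELECT 1 FROM DUAL");
        ds.setTestWhileIdle(true);                  // validate connections while they sit idle
        return ds;
    }
}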
Needless to say you can find plenty of connection pool tuning techniques on the web. Accordingly we will only mention common tuning parameters that are shared by most pools:
- Min pool size: the minimum number of connections that can remain in the pool.
- Max pool size: the maximum number of connections that can be allocated from the pool. If your application has high concurrency and your maximum pool size is too small, your connection pool will often experience waiting. On the other hand, if your minimum pool size is too large, you may have allocated unnecessary connections.
- Max idle time: the maximum time a connection may sit idle in the pool before being physically closed.
- Max wait time: the maximum time the pool will wait for a connection to be returned. This can prevent runaway transactions.
- Validation query: the SQL query that is used to validate connections before returning them to the caller. This is needed because some databases are configured to kill long-idle connections, and a network or database related exception may also kill a connection. In order to reduce this overhead, a connection pool can run the validation while connections are idle.

4.5 Tuning Transactions and Concurrency
Short database transactions are essential for any high-performing, scalable application. You deal with transactions using a session, which represents a conversational request processing a single unit of work.
Regarding the scope of unit of work and transaction boundary demarcation, there are 3 patterns:
- Session-per-operation: each database call needs a new session and a new transaction. Because a real business transaction usually encompasses several such operations, and a large number of small transactions generally incurs more database activity (chiefly because the database has to flush changes to disk on every commit), application performance suffers. Accordingly it is an anti-pattern and shouldn't be used.
- Session-per-request-with-detached-objects: each client request gets a new session and a single transaction. You use Hibernate's “current session” feature to associate the two together (see the sketch after this list). In a multi-tier system, users usually initiate long conversations (or application transactions). Most of the time we use Hibernate's automatic versioning and detached objects to achieve optimistic concurrency control and high performance.
- Session-per-conversation-with-extended (or long) session: you keep the session open for a long conversation that may span several transactions. Although this saves you from reattachment, the session may grow until it runs out of memory, and it probably holds stale data in a highly concurrent system.
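As an illustration of the session-per-request pattern and Hibernate's “current session” feature, here is a minimal sketch. It assumes that current_session_context_class is configured (for example to "thread"), and the DAO class itself is hypothetical rather than code from our project:

import org.hibernate.Session;
import org.hibernate.SessionFactory;

public class RequestScopedDao {

    private final SessionFactory sessionFactory;

    public RequestScopedDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    // one session and one transaction per client request
    public void saveInRequest(Object entity) {
        Session session = sessionFactory.getCurrentSession(); // bound to the current request/thread
        session.beginTransaction();
        try {
            session.save(entity);
            session.getTransaction().commit(); // with the "thread" context this also closes the session
        } catch (RuntimeException e) {
            session.getTransaction().rollback();
            throw e;
        }
    }
}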
You should also be aware of the following points:

- Use local transactions if you don't need JTA, because JTA requires many more resources and is much slower than local transactions. Even when you have more than one datasource, you don't need JTA unless you have transactions spanning more than one datasource. In that last case you can consider using local transactions on each datasource with a technique similar to the “Last Resource Commit Optimization” [16] (see Example 6 below for details).
- Mark your transaction as read-only if it doesn't involve data changes, as mentioned in Section 4.3.1.
- Always set up a default transaction timeout. It ensures that no misbehaving transaction can tie up resources while returning no response to the user. It even works for local transactions (see the sketch at the end of this section).
- Optimistic locking will not work if Hibernate is not the sole database user, unless you create database triggers to increment the version column for the same data changes made by other applications.

Example 6

Our application has several service-layer methods which, in most cases, only deal with database “A”; occasionally, however, they also retrieve read-only data from database “B”. Because database “B” only provides read-only data, we still use local transactions on both databases for those methods.
The service layer does have one method involving data changes on both databases. Here is the pseudo-code:
//Make sure a local transaction on database A exists
@Transactional(readOnly = false, propagation = Propagation.REQUIRED)
public void saveIsoBids() {
    //participates in the local transaction declared by the annotation above
    insertBidsInDatabaseA();
    //runs in its own local transaction on database B;
    //must be the last operation in this method
    insertBidRequestsInDatabaseB();
}

Because insertBidRequestsInDatabaseB() is the last operation in saveIsoBids(), only the following scenario can cause data inconsistency:
The local transaction on database “A” fails to commit when the execution returns from saveIsoBids().
However, even if you use JTA for saveIsoBids(), you still get data inconsistency when the second commit phase fails in the two-phase commit (2PC) process. So if you can deal with the above data inconsistency and really don't want the complexity of JTA for just one or a few methods, you should use local transactions.
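To illustrate the read-only and default-timeout recommendations in the list above, here is a minimal Spring sketch; the service class, its method and the 30-second value are assumptions made for illustration, not settings taken from our project:

import java.util.ArrayList;
import java.util.List;

import org.springframework.transaction.annotation.Transactional;

public class ReferenceDataService {

    // readOnly=true lets Spring switch the Hibernate session to manual flushing
    // and pass a read-only hint to the JDBC driver; timeout=30 is a default
    // safety net so a misbehaving transaction cannot tie up resources forever
    @Transactional(readOnly = true, timeout = 30)
    public List<String> loadPortfolioNames() {
        // ... query read-only reference data here ...
        return new ArrayList<String>();
    }
}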