实现ORACLE与SQLSERVER数据库间的数据抽取与转换工具
以下是我从自己写的PDF说明书上COPY下来的,格式很乱,具体下载附件里的工程文件《抽取工具配置说明.pdf》有问题QQ联系:51003459 所有的包我都删除了,因为太大上传不了,要的QQ传。zExtractor数据抽取工具配置说明1.对每一套的抽取项目编写一个独立的XML文件放在classes/conf/文件夹下。例如抽取A库到B库的所有气象数据配置在一个a.xml里,把抽取B库到C库的N张电网数据表配置在另一个b.xml里,当多个xml配置文件里有相同的数据源(比如以上的B库)该工具只生成一个B库的连接池。2.XML配置文件中的属性名对大小写敏感。3.<property name="type" value="ORACLE"></property>要显式定义 因为对于不同的数据库来说:select * from table 与select * from (table), select * from (Select * from table) 与select * from (Select * from table) as temp是不一样的,要通过type来判断。4.以下为最简配置:easyDemo<?xml version="1.0" encoding="UTF-8"?><extract-case casename="案例"><!-- 源数据库 --><source-database><property name="type" value="SQLSERVER"></property><property name="driverClassName" value="com.microsoft.jdbc.sqlserver.SQLServerDriver"></property><property name="url" value="jdbc:microsoft:sqlserver://192.168.104.5:1433;databaseName=zy"></property><property name="username" value="sa"></property><property name="password" value="xx"></property><property name="minIdle" value="2"></property><property name="maxActive" value="50"></property><property name="maxWait" value="30000"></property></source-database><!-- 目标数据库 --><destination-database><property name="type" value="ORACLE"></property><property name="driverClassName" value="oracle.jdbc.driver.OracleDriver"></property><property name="url" value="jdbc:oracle:thin:@10.142.1.197:1521:sa"></property><property name="username" value="gdprs"></property><property name="password" value="xxxxx"></property></destination-database><task fromtable="text1" fromschema="dbo" totable="text2" taskinfo="两个表字段一样的最简配置" toschema="gdprs" cronexpress="”0" issamefieldname="true"><extractkeylist><extractkey fromfield="id" tofield="id"></extractkey></extractkeylist></task></extract-case>默认配置:?抽取器extractorClass ="com.epgis. extract.core. extractor. CommonExtractor" ;?优先级别priority = 10;?批量增加batchInsertSize = 50?一次性得到JDBC数据集RESULTSET的数量fetchSize = 10;?任务允许的最多X小时没有来数,如果超过X小时没来数则短信告知 maxNoneDataHours=null表示不用短信通知功能5.抽取工具通过配置的extractKey字段(可能是多个extractKey组成的复合主键)来唯一识别一条记录进行抽取,可以有两种配置:1). <field-mapping><field name="id" sqltype="int" isextractkey="true"><tofield name="to_id" sqltype="varchar2" isextractkey="true"></tofield></field></field-mapping>2). <extractkeylist><extractkey fromfield="id" tofield=" to_id "></extractkey></extractkeylist>6.源表与目标表的字段对应通过<field-mapping>来配置,在<field>的name写上源表的字段名,<tofield>写上目标表的字段名,最好显式写出sqlType。比如:<task fromtable="t_auto_rtsq" fromschema="gdprs" totable="text5" toschema="gdprs" repeatinterval="20000" startdelay="0" issamefieldname="false"><field-mapping><field name="id" sqltype="int" isextractkey="true"><tofield name="to_id" sqltype="varchar2" isextractkey="true"></tofield></field><field name="TIME" sqltype="varchar2"><tofield name="date" sqltype="varchar2"></tofield></field><field name="senid"><tofield name="sen_id"></tofield></field><field name="ifch"><tofield name="ifch"></tofield></field></field-mapping></task>注:一般情况下可不配置field里的javaType、sqlType、 precision、scale当此字段为特殊字段时才要显式配上比如:sqlType=”blob”。7.如果源与目标表字段一样,可不配<field-mapping>,只要在<task>里配isSameFieldName="true"并配上extractKey:<task fromtable="t_auto_rtsq" totable="text5" repeatinterval="20000" startdelay="0" issamefieldname="true"><extractkeylist><extractkey fromfield="real_time" tofield="real_time"></extractkey>...........</extractkeylist></task>说明:(1) 如果isSameFieldName="true",再显式的配上以下字段映射xml时,则该字段(max_wind_speed)以以下XML显式配置的为准<field-mapping><field name="max_wind_speed1" javatype="double" sqltype="varchar2" precision="10" scale="0"><tofield name="max_wind_speed2"></tofield></field></field-mapping>(2) 如果isSameFieldName="false",则只抽取在<field-mapping>中显式配置的字段。8.任务调度可用simpleTrigger与cronTrigger两种中的其中一种,两种都配上时以cronExpression为主。1)repeatInterval="10000000" startDelay="0"2) cronExpression="0/10 * * * * ?"9.<task>里的fromTable 可以是一个表名也可以是一条SQL语句,这样可以灵活抽取所要数据(条件限制,合并某列来抽取等)比如:1)条件限制的例子:<task fromtable="select * from t_weather where real_time>'20090527010101'"></task>2)合并列的例子:<task fromtable="select id,id||filename as unionField from text" fromschema="gdprs" totable="text2" toschema="gdprs" taskinfo="测试合并列" repeatinterval="200000" startdelay="0" priority="10" issamefieldname="true" batchinsertsize="1000"><field-mapping><field name="unionField" isextractkey="false"><tofield name="filename" scale="0" isextractkey="false"></tofield></field><field name="id" javatype="int" sqltype="int" precision="10" scale="0" isextractkey="true"><tofield name="toid" isextractkey="true"></tofield></field></field-mapping></task>3)抽取A表的a字段与B表的b字段对应到C表(只有两个Ca,Cb字段时只要isSameField Name ="true" 就可):<task fromtable="select A.a as Ca , B.b as Cb from A,B" fromschema="test1" totable="test2" issamefieldname="true"></task><?xml version="1.0" encoding="UTF-8"?><configs><!-- 对抽取工具的全局变量定义 --><send-msg-phones><!-- 对数据断时进行短信告警 --><phone name="阿黎明">13850184475</phone><phone name="林雨场">13960866467</phone></send-msg-phones></configs>10.在<task>中配置alarmHours="24" 代表当超过24时都没有来数据时则发短信报知,短信号配置在config.xml中。<?xml version="1.0" encoding="UTF-8"?><configs><!-- 对抽取工具的全局变量定义 --><send-msg-phones><!-- 对数据断时进行短信告警 --><phone name="张三">13945448484</phone><phone name="李四">13945448484</phone></send-msg-phones><msg-database><!-- 短信保存库 --><driverclass>oracle.jdbc.driver.OracleDriver</driverclass><url>jdbc:oracle:thin:@10.142.1.109:1521:gdprs</url><username>gdprs</username><password>xxxx</password></msg-database></configs>11.如果是抽取ORACLE中带有BLOB的字段(支持同时抽取多个BLOB字段),需要以下两个配置:1)extractorfromschema="gdprs" totable="text2" toschema="gdprs" repeatinterval="100000" startdelay="0" extractorpriority="10" issamefieldname="false"><field-mapping><field name="id" isprimarykey="true"><tofield name="toid" isprimarykey="true"></tofield></field><field name="picture" sqltype="blob"><tofield name="fileblob" sqltype="blob"></tofield></field><field name="picture" sqltype="blob"><tofield name="fileblob2" sqltype="blob"></tofield></field><field name="create_date"><tofield name="filename"></tofield></field></field-mapping><extractkeylist><extractkey fromfield="id" tofield="toid"></extractkey></extractkeylist></task>12.对字段可自定义拦截器进行转化,fieldInterceptor有两种配置方式:1)在<field-mapping>字段里配<field name="content" fieldinterceptor="com.epgis.extract.business.interceptor.TestFieldInterceptor"><tofield name="content2"></tofield></field>2)当isSameFieldName="true"时,因为省略了配<field-mapping>,则可以用以下一种方式:<fieldinterceptorlist><fieldinterceptor fromfield="real_time" fromfieldinterceptor="com.String2DateFieldInterceptor" tofield="time" tofieldinterceptor="com.Date2StringFieldInterceptor"></fieldinterceptor>...........</fieldinterceptorlist>说明:一般情况下只要配源字段的拦截器,但如果这个字段同时又为extractKey的话,经过转换后到目标表的字段无法与源字段比较,所以要配一个目标表字段toField的拦截器让目标字段转回成和源字段一样格式的才可比较,比如:以上配了一个real_time的拦截器把2009-5-28转成了20090528抽取目标表,而real_time同时又是extractKey,这时抽取时就要配一个toField的拦截器把real_time转回成2009-5-28的格式才可以进行增量(比较大小)抽取。例子:t_weather_cityweather_info里的real_time为字符串,而目标表text3里的time为Date类型,这时要加个拦截器把real_time转为date比如:public class String2DateFieldInterceptor implements IFieldInterceptor{public Object convertField(Object field){try {SimpleDateFormat df = new SimpleDateFormat("yyyyMMddHHmmss");Date d = df.parse((String)field);java.sql.Timestamp sqlDate = new java.sql.Timestamp(d.getTime());return sqlDate;} catch (ParseException e) {e.printStackTrace();}return null;}而这时extractKey又为real_time这个转换后的字段,所以要再配一个目标字段的拦截器(如果这个转换字段real_time不是extractKey就不要配)public class Date2StringFieldInterceptor implements IFieldInterceptor{public Object convertField(Object field){Date d = new Date(((java.sql.Timestamp)field).getTime());SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMddHHmmss");return sdf.format(d);}}相应的XML配置:<task fromtable="cityweather" fromschema="gdprs" totable="text3" toschema="gdprs" repeatinterval="1000" startdelay="0" issamefieldname="true"><extractkeylist><!-- 以real_time为主键来抽取 --><extractkey fromfield="real_time" tofield="time"></extractkey></extractkeylist><fieldinterceptorlist><fieldinterceptor fromfield="real_time">fromFieldInterceptor="com.epgis.extract.business.String2DateFieldInterceptor "<!-- 把time字段的Date类型转为字符串2009080401010为了与源表比较,因为real_time为extractKey --><tofield tofieldinterceptor="com.epgis.extract.business.Date2StringFieldInterceptor"></tofield></fieldinterceptor></fieldinterceptorlist><field-mapping><field name="real_time"><!-- real_time与time的对应因为字段名不同要显式声明 --><tofield name="time"></tofield></field></field-mapping></task>13.可自定义一个抽取器,要继承一个抽象类Extract实现execute方法,比如<task extractorencoding="UTF-8"?><log-database><!-- 抽取日志保存库 --><driverclass>oracle.jdbc.driver.OracleDriver</driverclass><url>jdbc:oracle:thin:@10.142.1.109:1521:gdprs</url><username>gdprs</username><password>xxxxx</password></log-database>2. 用extractDetail类保存实时日志任务明细:http://10.142.4.191:8080/zExtractor/extractDetailShow.jsp3.日志信息:http://10.142.4.191:8080/zExtractor/log/extractLog.html类说明MODEL层结构/*** 抽象出所有抽取器的公有职能* 提供一个抽取器的模版Template:* public abstract void execute();* 当本系统默认的抽取器无法满足需求时,可继承此方法,自已实现execute()方法* @Author :黄仕勇* 2009-5-19*/public abstract class Extractor implements IExtractor {/**具体的抽取实现*/public abstract void execute();}附一:实际案例<?xml version="1.0" encoding="UTF-8"?><extract-case casename="抽取城市天气预报数据案例"><!-- 源数据库 --><source-database><property name="type" value="SQLServer"></property><property name="driverClassName" value="com.microsoft.jdbc.sqlserver.SQLServerDriver"></property><property name="url" value="jdbc:microsoft:sqlserver://192.168.104.5:1433;databaseName=zyqx"></property><property name="username" value="sa"></property><property name="password" value="xxx"></property><property name="minIdle" value="2"></property><property name="maxActive" value="50"></property><property name="maxWait" value="30000"></property></source-database><!-- 目标数据库 --><destination-database><property name="type" value="ORACLE"></property><property name="driverClassName" value="oracle.jdbc.driver.OracleDriver"></property><property name="url" value="jdbc:oracle:thin:@10.142.13.178:1521:BIETL"></property><property name="username" value="ods"></property><property name="password" value="xxxx"></property></destination-database><task fromtable="select * from forecast where t_day>getdate()-365" totable="buf_qx_forecast" fromschema="dbo" repeatinterval="210000" startdelay="0" tastinfo="城市天气预报" priority="10" issamefieldname="true" batchinsertsize="30"><extractkeylist><extractkey fromfield="id" tofield="id"></extractkey></extractkeylist><field-mapping><field name="forecastTime"><tofield name="forecast_date"></tofield></field></field-mapping><fieldinterceptorlist><fieldinterceptor fromfield="forecastTime" fromfieldinterceptor="com.epgis.extract.business.interceptor.Date2StringFieldInterceptor"></fieldinterceptor><fieldinterceptor fromfield="t_day" fromfieldinterceptor="com.epgis.extract.business.interceptor.Date2StringFieldInterceptor"></fieldinterceptor></fieldinterceptorlist></task><task fromtable="TYPHOON_1" fromschema="dbo" totable="BUF_TF_TYPHOON_1" cronexpression="0 0/11 * * * ?" taskinfo="台风1" issamefieldname="true" batchinsertsize="30"><extractkeylist><extractkey fromfield="id" tofield="id"></extractkey></extractkeylist></task><task fromtable="TYPHOON_2" fromschema="dbo" totable="BUF_TF_TYPHOON_2" repeatinterval="210000" startdelay="0" taskinfo="台风2" issamefieldname="true" batchinsertsize="30"><extractkeylist><extractkey fromfield="id" tofield="id"></extractkey></extractkeylist><!-- 时间格式的拦截器 --><fieldinterceptorlist><fieldinterceptor fromfield="wdatatime" fromfieldinterceptor="com.epgis.extract.business.interceptor.Date2StringFieldInterceptor2"></fieldinterceptor><fieldinterceptor fromfield="nodetime" fromfieldinterceptor="com.epgis.extract.business.interceptor.Date2StringFieldInterceptor2"></fieldinterceptor></fieldinterceptorlist></task><task fromtable="TYPHOON_3" fromschema="dbo" totable="BUF_TF_TYPHOON_3" taskinfo="台风3" repeatinterval="210000" startdelay="0" issamefieldname="true" batchinsertsize="30"><extractkeylist><extractkey fromfield="id" tofield="id"></extractkey></extractkeylist><!-- 时间格式的拦截器 --><fieldinterceptorlist><fieldinterceptor fromfield="forecasttime" fromfieldinterceptor="com.epgis.extract.business.interceptor.Date2StringFieldInterceptor2"></fieldinterceptor><fieldinterceptor fromfield="comingtime" fromfieldinterceptor="com.epgis.extract.business.interceptor.Date2StringFieldInterceptor2"></fieldinterceptor><fieldinterceptor fromfield="wdatatime" fromfieldinterceptor="com.epgis.extract.business.interceptor.Date2StringFieldInterceptor2"></fieldinterceptor></fieldinterceptorlist></task><task fromtable="WEAFORECAST_2" fromschema="dbo" totable="BUF_QX_WEAFORECAST_2" cronexpression="0 0/29 * * * ?" taskinfo="三天九地市与四十小时九地市" issamefieldname="true" batchinsertsize="30"><extractkeylist><extractkey fromfield="id" tofield="id"></extractkey></extractkeylist></task><task fromtable="WEAFORECAST_1" fromschema="dbo" totable="BUF_QX_WEAFORECAST_1" cronexpression="0 0/29 * * * ?" taskinfo="三天九地市与四十小时九地市" issamefieldname="true" batchinsertsize="30"><extractkeylist><extractkey fromfield="id" tofield="id"></extractkey></extractkeylist></task><task fromtable="WEAINFO" fromschema="dbo" totable="BUF_QX_WEAINFO" cronexpression="0 0/3 * * * ?" taskinfo=" 地市气象站实时信息 每时刻12分运行一次,42分钟时也运行一次" issamefieldname="true" batchinsertsize="30"><extractkeylist><extractkey fromfield="id" tofield="id"></extractkey></extractkeylist><fieldinterceptorlist><fieldinterceptor fromfield="wdatatime" fromfieldinterceptor="com.epgis.extract.business.interceptor.Date2StringFieldInterceptor2"></fieldinterceptor><fieldinterceptor fromfield="wdate" fromfieldinterceptor="com.epgis.extract.business.interceptor.Date2StringFieldInterceptor2"></fieldinterceptor></fieldinterceptorlist></task></extract-case>修订记录日期修订版本修订内容作者0906050.1初稿拟制黄仕勇0906150.2初步完成编写</task></field-mapping></field-mapping></filed></task></task></field-mapping></task></field-mapping></tofield></field></field-mapping> 1 楼 Merrygrass 2011-05-27 楼主能把lib包一起发出来吗?