An Introduction to Java Regular Expressions and a Detailed Look at split

2012-10-08

Java regular expressions

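Java's regular expressions are implemented by the Pattern and Matcher classes in java.util.regex. A Pattern object represents a compiled regular expression. The static compile() method compiles a string containing a regular expression into a Pattern object. Passing a string to the Pattern's matcher() method returns a Matcher object. Pattern also offers static boolean matches(String regex, CharSequence input), a shortcut that tests whether the entire input matches regex (not merely whether regex occurs somewhere in it), and a split() method that returns a String array by splitting the input around matches of regex.

Once you have a Matcher, you can query the match result with its methods: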
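As a minimal sketch of the calls described above: the class name PatternBasics, its helper methods, and the sample strings are made up for illustration; only Pattern.compile, matcher, find, Pattern.matches and split come from java.util.regex.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class PatternBasics {
    // Compile once and reuse: a Pattern is immutable and safe to share.
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // True if the input contains at least one run of digits anywhere.
    static boolean containsDigits(String input) {
        Matcher m = DIGITS.matcher(input);
        return m.find();
    }

    // Splits the input around each run of digits.
    static String[] splitOnDigits(String input) {
        return DIGITS.split(input);
    }

    public static void main(String[] args) {
        System.out.println(containsDigits("abc123def"));        // true
        System.out.println(Pattern.matches("\\d+", "123"));     // true: the whole input is digits
        System.out.println(Pattern.matches("\\d+", "abc123"));  // false: not a full match
        System.out.println(Arrays.toString(splitOnDigits("a1b22c333d"))); // [a, b, c, d]
    }
}
```

Note that Pattern.matches("\\d+", "abc123") is false even though the input contains digits, because the static matches() requires the whole input to match.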

boolean matches()

boolean lookingAt()

boolean find()

boolean find(int start)

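matches() succeeds only if the pattern matches the entire string, whereas lookingAt() succeeds if the pattern matches the beginning of the string; both are anchored at the start of the input. find() works like an iterator that scans the string from beginning to end: each search starts where the previous match ended. The second find() takes an int argument that tells it the index at which to start searching. All of these methods update the matcher's internal state, including the position where the next search begins. Reading the source shows that Matcher has the following fields (among others; not all of them are reproduced here):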

/**
 * The storage used by groups. They may contain invalid values if
 * a group was skipped during the matching.
 */
int[] groups;

/**
 * The range within the sequence that is to be matched. Anchors
 * will match at these "hard" boundaries. Changing the region
 * changes these values.
 */
int from, to;

/**
 * Lookbehind uses this value to ensure that the subexpression
 * match ends at the point where the lookbehind was encountered.
 */
int lookbehindTo;

/**
 * The original string being matched.
 */
CharSequence text;

/**
 * Matcher state used by the last node. NOANCHOR is used when a
 * match does not have to consume all of the input. ENDANCHOR is
 * the mode used for matching all the input.
 */
static final int ENDANCHOR = 1;
static final int NOANCHOR = 0;
int acceptMode = NOANCHOR;

/**
 * The range of string that last matched the pattern. If the last
 * match failed then first is -1; last initially holds 0 then it
 * holds the index of the end of the last match (which is where the
 * next search starts).
 */
int first = -1, last = 0;

The methods above update fields such as first, last, from and to. Matcher's start() in effect returns first, and end() returns last. If first is -1 when you call start(), an exception is thrown: throw new IllegalStateException("No match available");
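The effect of first being -1 can be observed from the outside. The sketch below assumes a hypothetical helper class MatcherState; it calls start() before any match, after a successful find(), and after a failed find().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class MatcherState {
    // Returns "start:end" of the next match, or "no match" if find() fails.
    static String nextSpan(Matcher m) {
        return m.find() ? m.start() + ":" + m.end() : "no match";
    }

    // True if start() throws because no match is currently available.
    static boolean startThrows(Matcher m) {
        try {
            m.start();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("\\d+").matcher("ab12");
        System.out.println(startThrows(m)); // true: nothing has been matched yet
        System.out.println(nextSpan(m));    // 2:4
        System.out.println(startThrows(m)); // false: a match is available
        System.out.println(nextSpan(m));    // no match
        System.out.println(startThrows(m)); // true again: the last find() failed
    }
}
```

In practice this means start(), end() and group() should only be called after a match method has returned true.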

The following code is an example:

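import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherDemo {
    public static void main(String[] args) {
        Matcher m = Pattern.compile("\\w+").matcher("Evening is full of the linnet's wings");
        // lookingAt() matches the first word at the start of the input
        System.out.println(m.lookingAt() + "  " + m.start() + ":" + m.end());

        Matcher mm = Pattern.compile("\\w+").matcher("Evening");
        System.out.println(mm.matches());
        mm.reset();
        // after reset(), find() locates the same word again
        System.out.println(mm.find() + ":" + mm.start() + ":" + mm.end());
        // there is no second word, so this loop body never runs
        while (mm.find()) {
            System.out.println(mm.start() + ":" + mm.end());
        }
        // find() on m continues after the end of the lookingAt() match
        while (m.find()) {
            System.out.println(m.start() + ":" + m.end());
            System.out.println(m.group());
        }
    }
}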

The output is:

true  0:7
true
true:0:7
8:10
is
11:15
full
16:18
of
19:22
the
23:29
linnet
30:31
s
32:37
wings
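To isolate the difference between matches(), lookingAt() and find(int start), here is a small sketch on a shorter input. The class MatcherModes and its helpers are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class MatcherModes {
    static final Pattern WORD = Pattern.compile("\\w+");

    // True only if the whole string is a single word.
    static boolean wholeMatch(String s) {
        return WORD.matcher(s).matches();
    }

    // True if the string starts with a word.
    static boolean prefixMatch(String s) {
        return WORD.matcher(s).lookingAt();
    }

    // Returns the first match found at or after index 'from', or null if none.
    static String firstMatchFrom(String s, int from) {
        Matcher m = WORD.matcher(s);
        return m.find(from) ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("hello world"));        // false: the space is not matched
        System.out.println(prefixMatch("hello world"));       // true: "hello" matches at the start
        System.out.println(prefixMatch(" hello"));            // false: the input starts with a space
        System.out.println(firstMatchFrom("hello world", 5)); // world
        System.out.println(firstMatchFrom("hello world", 2)); // llo
    }
}
```

find(int start) searches from the given index without regard to word boundaries, which is why starting at index 2 yields "llo" rather than "hello".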

Now let's look at regular expressions in use. The String class, for example, has a split(regex, limit) method, which in effect delegates to Pattern.compile(regex).split(str, limit).
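The following sketch compares the two call paths and shows what the limit argument does. The class SplitLimit, its helper methods, and the input "a,b,c,," are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class SplitLimit {
    static String[] viaString(String s, String regex, int limit) {
        return s.split(regex, limit);
    }

    static String[] viaPattern(String s, String regex, int limit) {
        return Pattern.compile(regex).split(s, limit);
    }

    public static void main(String[] args) {
        String s = "a,b,c,,";
        // limit 0: split as often as possible, then drop trailing empty strings
        System.out.println(Arrays.toString(viaString(s, ",", 0)));   // [a, b, c]
        // limit 2: at most 2 elements, the last one holds the rest of the input
        System.out.println(Arrays.toString(viaString(s, ",", 2)));   // [a, b,c,,]
        // negative limit: split as often as possible and keep trailing empty strings
        System.out.println(Arrays.toString(viaString(s, ",", -1)));  // [a, b, c, , ]
        // both call paths give the same result
        System.out.println(Arrays.equals(viaString(s, ",", -1), viaPattern(s, ",", -1))); // true
    }
}
```

split(regex) without a limit behaves like limit 0, which is why trailing empty strings normally disappear from the result.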

Let's look at an example of split in use:

// needs java.util.Arrays and java.util.regex.Pattern
Pattern p = Pattern.compile("a*");
String newStr = "AaaaA";
String[] bb = p.split(newStr);
System.out.println(Arrays.toString(bb));

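This produces a result that is hard to make sense of: [, A, , A]. (This is what Java 7 and earlier print. Since Java 8, a zero-width match at the very beginning of the input no longer produces a leading empty string, so newer JDKs print [A, , A].)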

That looks odd. Most people would expect [A, A]. In fact, the call above is roughly equivalent to the following:

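import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitByHand {
    public static void main(String[] args) {
        String newStr = "AaaaA";
        Pattern p = Pattern.compile("a*");
        Matcher m = p.matcher(newStr);
        int index = 0; // end of the previous match
        List<String> list = new ArrayList<String>();
        while (m.find()) {
            System.out.println(m.start() + ":" + m.end());
            // the piece between the previous match and this one
            String str = newStr.substring(index, m.start());
            System.out.println(str);
            list.add(str);
            index = m.end();
        }
        System.out.println(Arrays.toString(list.toArray()));
    }
}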

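Reading the source shows that split is implemented in essentially the same way, only with more bookkeeping. Once you see that, the result is easy to understand: "a*" also matches the empty string, so it produces zero-width matches that cut the input at unexpected places. To get [A, A], simply change the regular expression to "a+":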

["A", "A"]
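To see why "a+" fixes the problem, it helps to list the spans each pattern matches. The sketch below uses a hypothetical helper, spans(), that collects start:end pairs from find(); the class name SplitOnRuns is also made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class SplitOnRuns {
    // Collects "start:end" for every match that find() reports.
    static List<String> spans(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + ":" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        // "a*" also matches the empty string, giving three zero-width matches
        System.out.println(spans("a*", "AaaaA")); // [0:0, 1:4, 4:4, 5:5]
        // "a+" needs at least one 'a', so only the real run is matched
        System.out.println(spans("a+", "AaaaA")); // [1:4]
        System.out.println(Arrays.toString("AaaaA".split("a+"))); // [A, A]
    }
}
```

With "a+" there is exactly one delimiter, the run at 1:4, so the input is cut into the two pieces on either side of it.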