A regular expression is specified as a string, and that string must first be compiled into an instance of the `Pattern` class. The resulting pattern can then be used to create a `Matcher` object that can match arbitrary character sequences against the regular expression. All of the state involved in performing a match resides in the matcher, so many matchers can share the same `Pattern` instance.

A typical invocation sequence looks like this:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("a*b");
Matcher m = p.matcher("aaaaaaab");
boolean b = m.matches(); // true: the whole input matches a*b
```

The `Pattern` class also defines a static `matches` method as a convenience for when a regular expression is used just once. It compiles the expression and matches an input string against it in a single call. The statement `boolean b = Pattern.matches("a*b", "aaaaaaab");` is equivalent to the three statements above, but it is less efficient for repeated matches because it does not allow the compiled `Pattern` to be reused.

Instances of `Pattern` are immutable and safe for use by multiple concurrent threads. Instances of `Matcher` are not thread-safe.

## Summary of regular-expression constructs

Characters

```text
Construct   Matches
x           The character x
\\          The backslash character
\0n         The character with octal value 0n (0 <= n <= 7)
\0nn        The character with octal value 0nn (0 <= n <= 7)
\0mnn       The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh        The character with hexadecimal value 0xhh
\uhhhh      The character with hexadecimal value 0xhhhh
\x{h...h}   The character with hexadecimal value 0xh...h (Character.MIN_CODE_POINT <= 0xh...h <= Character.MAX_CODE_POINT)
\t          The tab character ('\u0009')
\n          The newline (line feed) character ('\u000A')
\r          The carriage-return character ('\u000D')
\f          The form-feed character ('\u000C')
\a          The alert (bell) character ('\u0007')
\e          The escape character ('\u001B')
\cx         The control character corresponding to x
```

Character classes

```text
[abc]           a, b, or c (simple class)
[^abc]          Any character except a, b, or c (negation)
[a-zA-Z]        a through z or A through Z, inclusive (range)
[a-d[m-p]]      a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]    d, e, or f (intersection)
[a-z&&[^bc]]    a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]   a through z, and not m through p: [a-lq-z] (subtraction)
```

Predefined character classes

```text
.    Any character (may or may not match line terminators)
\d   A digit: [0-9]
\D   A non-digit: [^0-9]
\h   A horizontal whitespace character: [ \t\xA0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000]
\H   A non-horizontal whitespace character: [^\h]
\s   A whitespace character: [ \t\n\x0B\f\r]
\S   A non-whitespace character: [^\s]
\v   A vertical whitespace character: [\n\x0B\f\r\x85\u2028\u2029]
\V   A non-vertical whitespace character: [^\v]
\w   A word character: [a-zA-Z_0-9]
\W   A non-word character: [^\w]
```

Greedy quantifiers (match as much as possible)

```text
X?       X, once or not at all
X*       X, zero or more times
X+       X, one or more times
X{n}     X, exactly n times
X{n,}    X, at least n times
X{n,m}   X, at least n but not more than m times
```

For example, with the input `aabaaab` and the regular expression `a.*b`, the match is `aabaaab`.

Reluctant quantifiers (always match as little as possible)

```text
X??       X, once or not at all
X*?       X, zero or more times
X+?       X, one or more times
X{n}?     X, exactly n times
X{n,}?    X, at least n times
X{n,m}?   X, at least n but not more than m times
```

For example, with the input `aabaaab` and the regular expression `a.*?b`, the first match is `aab`.

The difference is clear: a reluctant quantifier stops consuming input as soon as the rest of the pattern can match. This is loosely similar to the short-circuit `&&` operator, which skips its right-hand operand once the left-hand one evaluates to false.

Possessive quantifiers

> **Explanation**
> A possessive quantifier does not backtrack: it takes as much input as it can and never gives any of it back, so if the rest of the pattern then fails, the match attempt at that position fails right away. This can make it more efficient than a greedy quantifier. See [Possessive Quantifiers](https://www.regular-expressions.info/possessive.html).

```text
X?+       X, once or not at all
X*+       X, zero or more times
X++       X, one or more times
X{n}+     X, exactly n times
X{n,}+    X, at least n times
X{n,m}+   X, at least n but not more than m times
```

Logical operators

```text
XY    X followed by Y
X|Y   Either X or Y
(X)   X, as a capturing group
```

Back references

```text
\n         Whatever the nth capturing group matched
\k<name>   Whatever the named-capturing group "name" matched
```

Quotation

```text
\    Nothing, but quotes the following character
\Q   Nothing, but quotes all characters until \E
\E   Nothing, but ends quoting started by \Q
```

Special constructs (named-capturing and non-capturing)

```text
(?<name>X)           X, as a named-capturing group
(?:X)                X, as a non-capturing group
(?idmsuxU-idmsuxU)   Nothing, but turns match flags i d m s u x U on - off
(?idmsux-idmsux:X)   X, as a non-capturing group with the given flags i d m s u x on - off
(?=X)                X, via zero-width positive lookahead
(?!X)                X, via zero-width negative lookahead
(?<=X)               X, via zero-width positive lookbehind
(?<!X)               X, via zero-width negative lookbehind
(?>X)                X, as an independent, non-capturing group
```

## Groups and capturing

Capturing groups are numbered by counting their opening parentheses from left to right. In the expression `((A)(B(C)))`, for example, there are four such groups:

1. `((A)(B(C)))`
2. `(A)`
3. `(B(C))`
4. `(C)`

> **WARNING**
> Group zero always stands for the entire expression.

Capturing groups are so named because, during a match, each subsequence of the input that matches such a group is saved. The captured subsequence may be used later in the expression via a back reference, and it may also be retrieved from the matcher once the match operation is complete.

Use the `group` method to retrieve a captured group, as in this example:

```java
String s = "abbbbaabbbbaaa";
String regex = "a(.*)b";
Matcher m = Pattern.compile(regex).matcher(s);
if (m.find()) { // a match must be attempted before group() may be called
    System.out.println(m.group(1)); // bbbbaabbb
}
```

Last modified: November 28, 2025 © Reposting permitted with proper attribution
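As a follow-up to the quantifier and back-reference tables above, here is a minimal sketch that runs the greedy, reluctant, and possessive forms of the same pattern against the input `aabaaab`, and then uses a named group with a back reference. The class name `QuantifierDemo`, the helper `firstMatch`, and the group name `ch` are made up for illustration; only `Pattern.compile`, `Matcher.find`, and `Matcher.group` come from `java.util.regex`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {

    // Hypothetical helper: returns the first substring of input that the
    // regex matches, or null when nothing in the input matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "aabaaab";

        // Greedy: .* grabs everything, then backs off until the final 'b' fits.
        System.out.println(firstMatch("a.*b", input));   // aabaaab

        // Reluctant: .*? grows one character at a time and stops at the first 'b'.
        System.out.println(firstMatch("a.*?b", input));  // aab

        // Possessive: .*+ consumes the rest of the input and never gives it
        // back, so the trailing 'b' in the pattern can never match.
        System.out.println(firstMatch("a.*+b", input));  // null

        // Named group plus back reference: \k<ch> must repeat what (?<ch>.) captured.
        Matcher m = Pattern.compile("(?<ch>.)\\k<ch>").matcher("abccd");
        if (m.find()) {
            System.out.println(m.group("ch"));           // c
        }
    }
}
```

The possessive case is the one that tends to surprise: it is not simply a faster greedy match, because refusing to backtrack can turn a match into a failure. It is probably best reserved for patterns where whatever follows the quantifier cannot also be matched by the quantified part.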