Problems with special characters when splitting strings with split

Splitting a string with split:

String[] a="aa|bb|cc".split("|");

output:
[a, a, |, b, b, |, c, c]
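A minimal sketch that reproduces the output above, assuming Java 8 or later (the class and method names are only for illustration). The unescaped "|" is the alternation operator with an empty alternative on each side, so it matches the empty string at every position and the string is cut between every pair of characters. On Java 7 and earlier the result also begins with an extra empty string, because the rule that a zero-width match at the beginning never produces a leading empty substring was added in Java 8.

```java
import java.util.Arrays;

public class SplitPipeDemo {

    // Splits exactly as in the example above: the "|" is NOT escaped.
    static String[] naiveSplit(String s) {
        // As a regex, "|" means "empty OR empty", which matches the empty
        // string at every position, so every character becomes its own piece.
        return s.split("|");
    }

    public static void main(String[] args) {
        // On Java 8+ this prints [a, a, |, b, b, |, c, c]
        System.out.println(Arrays.toString(naiveSplit("aa|bb|cc")));
    }
}
```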

First, let's look at how split is documented:

 String[] java.lang.String.split(String regex)

Splits this string around matches of the given regular expression. 

This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array. 

The string "boo:and:foo", for example, yields the following results with these expressions: 

Regex    Result
:        { "boo", "and", "foo" }
o        { "b", "", ":and:f" }

Parameters:
regex the delimiting regular expression
Returns:
the array of strings computed by splitting this string around matches of the given regular expression
Throws:
PatternSyntaxException - if the regular expression's syntax is invalid
Since:
1.4
See Also:
java.util.regex.Pattern
@spec
JSR-51
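The "boo:and:foo" examples from the Javadoc can be checked with a small program (the class name is illustrative). It also shows the limit argument mentioned above: the one-argument split behaves like split(regex, 0) and drops trailing empty strings, a negative limit keeps them, and a positive limit caps the number of pieces, with the last piece holding the rest of the input.

```java
import java.util.Arrays;

public class SplitLimitDemo {

    public static void main(String[] args) {
        String s = "boo:and:foo";
        // One-argument split == split(regex, 0): trailing empty strings are dropped
        System.out.println(Arrays.toString(s.split(":")));     // [boo, and, foo]
        System.out.println(Arrays.toString(s.split("o")));     // [b, , :and:f]
        // A negative limit keeps the trailing empty strings
        System.out.println(Arrays.toString(s.split("o", -1))); // [b, , :and:f, , ]
        // A positive limit caps the array length; the last element holds the rest
        System.out.println(Arrays.toString(s.split(":", 2)));  // [boo, and:foo]
    }
}
```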

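As you can see, the argument to split is a regular expression. Regular expressions contain special characters that have their own meanings, and these need care: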

http://www.fon.hum.uva.nl/praat/manual/Regular_expressions_1__Special_characters.html

The following characters are the meta characters that give special meaning to the regular expression search syntax:

\ the backslash escape character.
The backslash gives special meaning to the character following it. For example, the combination "\n" stands for the newline, one of the control characters. The combination "\w" stands for a "word" character, one of the convenience escape sequences, while "\1" is one of the substitution special characters.
    Example: The regex "aa\n" tries to match two consecutive "a"s at the end of a line, including the newline character itself.
    Example: "a\+" matches "a+" and not a series of one or more "a"s.
^ the caret is the start of line anchor or the negate symbol.
    Example: "^a" matches "a" at the start of a line.
    Example: "[^0-9]" matches any non digit.
$ the dollar is the end of line anchor.
    Example: "b$" matches a "b" at the end of a line.
    Example: "^b$" matches a line consisting of only "b"; "^$" matches the empty line.
{ } the open and close curly bracket are used as range quantifiers.
    Example: "a{2,3}" matches "aa" or "aaa".
[ ] the open and close square bracket define a character class to match a single character.
The "^" as the first character following the "[" negates and the match is for the characters not listed. The "-" denotes a range of characters. Inside a "[ ]" character class construction most special characters are interpreted as ordinary characters.
    Example: "[d-f]" is the same as "[def]" and matches "d", "e" or "f".
    Example: "[a-z]" matches any lowercase character in the alphabet.
    Example: "[^0-9]" matches any character that is not a digit.
    Example: A search for "[][()?<>.*?]" in the string "[]()?<>.*?" followed by a replace string "r" has the result "rrrrrrrrrr" (one "r" for each of the 10 characters). Here the search string is one character class and all the meta characters are interpreted as ordinary characters without the need to escape them.
( ) the open and close parenthesis are used for grouping characters (or other regex).
The groups can be referenced in both the search and the substitution phase. There also exist some special constructs with parenthesis.
    Example: "(ab)\1" matches "abab".
. the dot matches any character except the newline.
    Example: ".a" matches two consecutive characters where the last one is "a".
    Example: ".*\.txt$" matches all strings that end in ".txt".
* the star is the match-zero-or-more quantifier.
    Example: "^.*$" matches an entire line.
+ the plus is the match-one-or-more quantifier.
? the question mark is the match-zero-or-one quantifier. The question mark is also used in special constructs with parenthesis and in changing match behaviour.
| the vertical pipe separates a series of alternatives.
    Example: "(a|b|c)a" matches "aa" or "ba" or "ca".
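< > the smaller and greater signs are anchors that specify a left or right word boundary (a Praat extension; in Java regex, < and > are ordinary characters and a word boundary is written \b).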
- the minus indicates a range in a character class (when it is not at the first position after the "[" opening bracket or the last position before the "]" closing bracket).
    Example: "[A-Z]" matches any uppercase character.
    Example: "[A-Z-]" or "[-A-Z]" match any uppercase character or "-".
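& the and is the "substitute complete match" symbol (also Praat-specific; in the replacement string of Java's replaceAll, the whole match is referenced as $0).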
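The list above describes the Praat flavor of regular expressions, but Java's java.util.regex gives ".", "+", "*", "?", "^", "$", "|", "( )", "[ ]" and "{ }" the same kind of special meaning. A quick sketch of two of them passed to split unescaped (the class name is illustrative): an unescaped "." matches every character, so nothing survives, and a lone "+" is not even a valid pattern. In Java source code the regex backslash must itself be doubled, so the regex \. is written "\\.".

```java
import java.util.Arrays;
import java.util.regex.PatternSyntaxException;

public class MetaCharSplitDemo {

    public static void main(String[] args) {
        // "." matches any character, so every character is a delimiter and all
        // pieces are empty; split(regex) drops them all as trailing empties.
        System.out.println("a.b.c".split(".").length);              // 0
        // Escaping the dot (backslash doubled inside a Java string literal)
        System.out.println(Arrays.toString("a.b.c".split("\\.")));  // [a, b, c]

        // A quantifier with nothing to repeat is rejected when the regex is compiled
        try {
            "1+2+3".split("+");
        } catch (PatternSyntaxException e) {
            System.out.println("invalid regex: " + e.getDescription());
        }
        System.out.println(Arrays.toString("1+2+3".split("\\+")));  // [1, 2, 3]
    }
}
```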

So the fix for the problem above is to escape the pipe in the delimiter:

String[] a="aa|bb|cc".split("\\|");
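Backslash escaping is not the only option. A sketch comparing three ways to make the pipe literal, using only the JDK: the backslash escape shown above, a character class "[|]" (inside [ ] the pipe is an ordinary character), and Pattern.quote, which wraps the text in \Q...\E. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class EscapePipeDemo {

    static String[] withBackslash(String s) {
        // "\\|" in Java source is the regex \| : a literal vertical bar
        return s.split("\\|");
    }

    static String[] withCharClass(String s) {
        // Inside [ ] the pipe loses its alternation meaning
        return s.split("[|]");
    }

    static String[] withQuote(String s) {
        // Pattern.quote wraps the text in \Q...\E so every character is literal
        return s.split(Pattern.quote("|"));
    }

    public static void main(String[] args) {
        String s = "aa|bb|cc";
        System.out.println(Arrays.toString(withBackslash(s))); // [aa, bb, cc]
        System.out.println(Arrays.toString(withCharClass(s))); // [aa, bb, cc]
        System.out.println(Arrays.toString(withQuote(s)));     // [aa, bb, cc]
    }
}
```

Pattern.quote is handy when the delimiter comes from user input or configuration and you cannot know in advance which characters need escaping.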

Summary:

When applying regex operations to strings, remember to escape special characters.
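A sketch of applying this rule more generally; the helper names splitLiteral and replaceAllLiteral are hypothetical, not JDK methods. Pattern.quote escapes the pattern side. With replaceAll the replacement string is special as well ("$" introduces a group reference and "\" escapes), so Matcher.quoteReplacement is needed there too. When no regex features are needed at all, String.replace performs a plain literal replacement.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralRegexOps {

    // Hypothetical helper: split on a delimiter taken literally, whatever characters it contains.
    static String[] splitLiteral(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter));
    }

    // Hypothetical helper: replace every literal occurrence of target with a literal replacement.
    // Both the pattern and the replacement string have to be escaped.
    static String replaceAllLiteral(String s, String target, String replacement) {
        return s.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLiteral("1.5*2.5*3", "*"))); // [1.5, 2.5, 3]
        System.out.println(Arrays.toString(splitLiteral("a||b||c", "||"))); // [a, b, c]
        System.out.println(replaceAllLiteral("a.b.c", ".", "$"));           // a$b$c
        // For a plain literal replacement, String.replace avoids regex entirely
        System.out.println("a.b.c".replace(".", "$"));                      // a$b$c
    }
}
```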

Date: 2024-08-25 17:46:42
