1. The value a String stores is simply a char array (this is the Java 8 era source; Java 9 and later store a byte[] instead).
/** The value is used for character storage. */
private final char value[];
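2. A String can be built from ints; each int is the Unicode code point of a character (usually written in hexadecimal). A single char holds at most 65535 (0xFFFF), while valid code points range over: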
public static final int MIN_CODE_POINT = 0x000000;
public static final int MAX_CODE_POINT = 0X10FFFF;
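The basic unit of UTF-16 is a two-byte code unit with the range 0x0000 - 0xFFFF, while UTF-16 as a whole maps the characters U+0000 - U+10FFFF.
When a rare character lies above 0xFFFF, it has to be represented by two code units (4 bytes). The mapping rule is as follows:
First code unit (lead surrogate) range: 0xD800 - 0xDBFF
Second code unit (trail surrogate) range: 0xDC00 - 0xDFFF
Counting both sides: (0xDBFF - 0xD800 + 1) * (0xDFFF - 0xDC00 + 1) = 1024 * 1024 = 0x100000 = 0x10FFFF - 0xFFFF, so surrogate pairs map one-to-one (a bijection) onto the code points U+10000 - U+10FFFF.
Therefore a code unit in the range 0xD800 - 0xDBFF cannot represent a character on its own; it must be combined with a trail surrogate to form a complete character.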
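The mapping rule above can be sketched in a few lines of arithmetic. This is only an illustration: the class name SurrogateMath and its encode/decode helpers are made up for this note and are not JDK code. The result is compared against the public Character.toChars to check that the arithmetic agrees with the JDK.

```java
import java.util.Arrays;

class SurrogateMath {
    // Split a supplementary code point (U+10000 - U+10FFFF) into a lead and a trail surrogate.
    static char[] encode(int codePoint) {
        int offset = codePoint - 0x10000;                 // 20-bit value, 0x00000 - 0xFFFFF
        char lead = (char) (0xD800 + (offset >>> 10));    // upper 10 bits
        char trail = (char) (0xDC00 + (offset & 0x3FF));  // lower 10 bits
        return new char[] { lead, trail };
    }

    // Inverse mapping: combine a lead and a trail surrogate back into one code point.
    static int decode(char lead, char trail) {
        return 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00);
    }

    public static void main(String[] args) {
        int cp = 0x1F600; // a code point outside the BMP
        char[] pair = encode(cp);
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]); // D83D DE00
        System.out.println(decode(pair[0], pair[1]) == cp);             // true
        System.out.println(Arrays.equals(pair, Character.toChars(cp))); // true
    }
}
```

Because each surrogate carries exactly 10 bits, the 1024 * 1024 pairs cover the 0x100000 supplementary code points with nothing left over.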
Reference: https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
public String(int[] codePoints, int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count <= 0) {
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        if (offset <= codePoints.length) {
            this.value = "".value;
            return;
        }
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > codePoints.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }

    final int end = offset + count;

    // Pass 1: Compute precise size of char[]
    int n = count;
    for (int i = offset; i < end; i++) {
        int c = codePoints[i];
        if (Character.isBmpCodePoint(c))
            continue;
        else if (Character.isValidCodePoint(c))
            n++;
        else throw new IllegalArgumentException(Integer.toString(c));
    }

    // Pass 2: Allocate and fill in char[]
    final char[] v = new char[n];

    for (int i = offset, j = 0; i < end; i++, j++) {
        int c = codePoints[i];
        if (Character.isBmpCodePoint(c))
            v[j] = (char)c;
        else
            Character.toSurrogates(c, v, j++);
    }

    this.value = v;
}
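Character.isBmpCodePoint(c) checks whether the code point fits in a single code unit (that is, it lies in U+0000 - U+FFFF). Otherwise Character.isValidCodePoint(c) checks that it is within the valid code point range; in that case n++ runs, because this int needs two chars. Any other value throws IllegalArgumentException.
Character.toSurrogates(c, v, j++) splits the int into two chars and writes them to v[j] and v[j+1]; the extra j++ skips over the second slot.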
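A small usage sketch of this constructor follows. The class CodePointConstructorDemo and its fromCodePoints helper are hypothetical names introduced for this note; only new String(int[], int, int) and the Character/String methods come from the JDK.

```java
class CodePointConstructorDemo {
    // Hypothetical convenience wrapper around the constructor discussed above.
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        // U+4E2D is in the BMP (one char); U+1F600 is supplementary (two chars).
        String s = fromCodePoints(0x4E2D, 0x1F600);
        System.out.println(s.length());                              // 3
        System.out.println(Character.isHighSurrogate(s.charAt(1)));  // true
        System.out.println(Character.isLowSurrogate(s.charAt(2)));   // true

        try {
            fromCodePoints(0x110000); // above MAX_CODE_POINT, rejected in pass 1
        } catch (IllegalArgumentException e) {
            System.out.println("invalid code point: " + e.getMessage());
        }
    }
}
```

Note that Character.toSurrogates is not public, so code outside java.lang would typically use Character.toChars to get the same pair of chars.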
3. length() returns the number of char code units, not the number of characters; some characters occupy two chars.
public int length() {
    return value.length;
}
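To make the difference concrete, the sketch below compares length() with codePointCount, which counts characters (code points) instead of code units. The class LengthDemo and its helper names are made up for illustration.

```java
class LengthDemo {
    // Number of UTF-16 code units, which is what length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points, i.e. what a reader would call "characters" here.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // "a" followed by U+20BB7, a CJK ideograph outside the BMP.
        String s = "a" + new String(Character.toChars(0x20BB7));
        System.out.println(codeUnits(s));  // 3
        System.out.println(codePoints(s)); // 2
    }
}
```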
4. String.join saves you from concatenating with a StringBuilder yourself and then having to strip the last delimiter.
public static String join(CharSequence delimiter, CharSequence... elements)
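For comparison, here is a sketch of the manual StringBuilder approach next to String.join. JoinDemo, joinManually, and joinWithApi are illustrative names, not part of the JDK.

```java
class JoinDemo {
    // Manual approach: append the delimiter after every element, then trim the trailing one.
    static String joinManually(String delimiter, String... elements) {
        StringBuilder sb = new StringBuilder();
        for (String element : elements) {
            sb.append(element).append(delimiter);
        }
        if (elements.length > 0) {
            sb.setLength(sb.length() - delimiter.length()); // drop the last delimiter
        }
        return sb.toString();
    }

    // Same result with the library call.
    static String joinWithApi(String delimiter, String... elements) {
        return String.join(delimiter, elements);
    }

    public static void main(String[] args) {
        System.out.println(joinManually(", ", "a", "b", "c")); // a, b, c
        System.out.println(joinWithApi(", ", "a", "b", "c"));  // a, b, c
    }
}
```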
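5. The native keyword marks a method whose implementation is written in another language (typically C/C++, reached through JNI).
public native String intern();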
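The observable effect of intern() can be shown with a short sketch. InternDemo and buildAtRuntime are hypothetical names; the behavior relied on is that string literals are interned and that intern() returns the canonical instance from the string pool.

```java
class InternDemo {
    // Builds "hello" from a char array so the result is a fresh object, not the pooled literal.
    static String buildAtRuntime() {
        return new String(new char[] { 'h', 'e', 'l', 'l', 'o' });
    }

    public static void main(String[] args) {
        String literal = "hello";          // literals are interned automatically
        String built = buildAtRuntime();

        System.out.println(built == literal);          // false: different objects
        System.out.println(built.equals(literal));     // true: same contents
        System.out.println(built.intern() == literal); // true: intern() returns the pooled instance
    }
}
```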
An in-depth analysis of String#intern:
https://tech.meituan.com/in_depth_understanding_string_intern.html