Escape Characters Explained (Java and Python)

2019-09-16 20:19

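The escape character '\' combines with the one or more characters that follow it to represent a single special character. For example, the two characters "\n" together represent one newline character.

Java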

System.out.println("\n".length());  // 1

The Java language has the following escape sequences:

Escape Sequence	Description
`\t`	Insert a tab in the text at this point.
`\b`	Insert a backspace in the text at this point.
`\n`	Insert a newline in the text at this point.
`\r`	Insert a carriage return in the text at this point.
`\f`	Insert a formfeed in the text at this point.
`\'`	Insert a single quote character in the text at this point.
`\"`	Insert a double quote character in the text at this point.
`\\`	Insert a backslash character in the text at this point.

Other languages have similar escape sequences.
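As a quick illustration of the same idea in another language, the following minimal Python sketch shows that an escape sequence written with two source characters yields a single character in the resulting string. The variable names are only illustrative.

```python
newline = "\n"        # one character: the line feed
two_chars = "\\n"     # two characters: a backslash and the letter n

print(len(newline))    # 1
print(len(two_chars))  # 2
print(ord(newline))    # 10
```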

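Java also has two numeric escape forms, which identify a character by its code value rather than by a mnemonic letter. They are not handled at the same stage. Unicode escapes (`\uXXXX`) are translated by the compiler before the source is tokenized, so they may appear anywhere in a source file, not just inside literals. Octal escapes are ordinary escape sequences and are only valid inside character and string literals.

The two forms are: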

\u{0000-FFFF}  /* Unicode [Basic Multilingual Plane only, see below] hex value 
                  does not handle unicode values higher than 0xFFFF (65535),
                  the high surrogate has to be separate: \uD852\uDF62
                  Four hex characters only (no variable width) */
                  
\{0-377}       /* \u0000 to \u00ff: from octal value 
                  1 to 3 octal digits (variable width) */

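Here:

\u followed by exactly 4 characters (valid hexadecimal digits [0-9a-fA-F]) denotes a Unicode value written in hexadecimal;
\ followed by 1 to 3 characters (valid octal digits, in the range {0-377}) denotes a Unicode value written in octal.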

Let's look at a few examples:

public class EscapeDemo {
  public static void main(String[] args) {
    System.out.println("\u0031"); // 1
    System.out.println("\u4E2D"); // 中
    System.out.println("\100");   // @
    System.out.println("\376");   // þ
    System.out.println("\0771");  // ?1
    System.out.println("\779");   // ?9
    System.out.println("\12A");   // same as printing "\nA"
  }
}

Output:

1
中
@
þ
?1
?9

A
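The results for "\0771" and "\779" follow from how Java delimits an octal escape: when the first digit is 0 to 3, up to three octal digits are consumed; otherwise at most two. The Python sketch below simulates just that rule. The function name `decode_java_octal` is hypothetical, and the sketch deliberately ignores every other escape form (including `\\`), so it is an illustration rather than a full Java literal parser.

```python
def decode_java_octal(text):
    """Decode Java-style octal escapes in `text`; everything else is copied as-is."""
    octal_digits = "01234567"
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in octal_digits:
            # A first digit of 0-3 allows up to three digits; 4-7 allows at most two.
            max_digits = 3 if text[i + 1] in "0123" else 2
            j = i + 1
            while j < len(text) and j - (i + 1) < max_digits and text[j] in octal_digits:
                j += 1
            out.append(chr(int(text[i + 1:j], 8)))
            i = j
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

# Raw strings are used so that Python passes the backslashes through untouched.
print(decode_java_octal(r"\0771") == "?1")   # True
print(decode_java_octal(r"\779") == "?9")    # True
print(decode_java_octal(r"\12A") == "\nA")   # True
```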

Python

Python's table of escape sequences:

Escape Sequence	Meaning
`\newline`	Backslash and newline ignored
`\\`	Backslash (`\`)
`\'`	Single quote (`'`)
`\"`	Double quote (`"`)
`\a`	ASCII Bell (`BEL`)
`\b`	ASCII Backspace (`BS`)
`\f`	ASCII Formfeed (`FF`)
`\n`	ASCII Linefeed (`LF`)
`\r`	ASCII Carriage Return (`CR`)
`\t`	ASCII Horizontal Tab (`TAB`)
`\v`	ASCII Vertical Tab (`VT`)
`\ooo`	Character with octal value ooo
`\xhh`	Character with hex value hh

The following escape sequences only take effect in (unicode) string literals:

Escape Sequence	Description
`\N{name}`	Character named name in the Unicode database
`\uxxxx`	Character with 16-bit hex value xxxx. Exactly four hexadecimal digits are required.
`\Uxxxxxxxx`	Character with 32-bit hex value xxxxxxxx. Exactly eight hexadecimal digits are required.

Note: in a unicode string, both \ooo and \xhh denote a Unicode value, so u'\x10\10' is equivalent to u'\u0010\u0008'. To represent U+1008 (the character ဈ), use u'\u1008'.
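The note above can be checked directly in Python 3, where every `str` is a unicode string. This is a small sketch with illustrative variable names; it also shows why `\x` cannot be used to write U+1008, since it always takes exactly two hex digits.

```python
a = '\x10\10'          # hex escape for U+0010, octal escape for U+0008
b = '\u0010\u0008'
print(a == b)                       # True
print([hex(ord(ch)) for ch in a])   # ['0x10', '0x8']

c = '\x1008'           # U+0010 followed by the two characters "0" and "8"
d = '\u1008'           # the single character U+1008
print(len(c), len(d))               # 3 1
print(hex(ord(d)))                  # 0x1008
```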

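Python differs from Java in the following ways:

Python has a few additional escape forms, for example:

\x{00-FF}  # \x followed by exactly 2 characters (valid hexadecimal digits [0-9a-fA-F]) denotes a value written in hexadecimal.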

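Python's octal escape form \ooo can cover the range {0-777} in a unicode string, whereas Java stops at \377. Recent Python 3 releases emit a warning for octal escapes above \377.
Python and Java also differ in how they write Unicode values above 0xFFFF (the supplementary planes).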

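For example, to represent the Unicode value U+1303F, a Java string has to split it into two values, "\uD80C\uDC3F" (a UTF-16 surrogate pair), while Python writes it as "\U0001303F".
For the details of UTF-16 encoding, see this blog post: "Encoding: UTF-8 and UTF-16 encoding rules".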
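The surrogate pair in the Java form can be derived with a little arithmetic: subtract 0x10000 from the code point, then put the top 10 bits of the result into the high surrogate and the bottom 10 bits into the low surrogate. The following is a minimal sketch of that calculation; `to_surrogate_pair` is a hypothetical helper name, not a standard library function.

```python
def to_surrogate_pair(code_point):
    """Return the (high, low) UTF-16 surrogates for a code point above 0xFFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000        # a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low

high, low = to_surrogate_pair(0x1303F)
print("\\u%04X\\u%04X" % (high, low))    # \uD80C\uDC3F
```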

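On this point, Python 2 recognizes UTF-16 surrogate pairs by default, while Python 3 does not. In Python 3 they can still be handled by passing an error handler to the codec: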
```
# Suppose we want to represent the Chinese character '𧟌'
# Unicode: U+277CC
# UTF-16BE: D8 5D DF CC

# python2
print u'\U000277cc' # 𧟌
print u'\ud85d\udfcc' # 𧟌

# python3
print(u'\U000277cc') # 𧟌
print(u'\ud85d\udfcc') # raises UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed
print(u'\ud85d\udfcc'.encode('utf-16', 'surrogatepass').decode('utf-16')) # 𧟌
```
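The Python 3 round trip can be wrapped in a small helper to make the effect visible: before the round trip the string holds two surrogate code points, afterwards it holds one supplementary-plane character. This is a sketch for Python 3 only, and `join_surrogates` is a hypothetical name.

```python
def join_surrogates(text):
    """Combine UTF-16 surrogate pairs in `text` into single code points."""
    # 'surrogatepass' lets the encoder emit the surrogates as raw UTF-16 code units;
    # decoding those units then pairs them up in the normal way.
    return text.encode('utf-16', 'surrogatepass').decode('utf-16')

pair = '\ud85d\udfcc'
joined = join_surrogates(pair)
print(len(pair), len(joined))      # 2 1
print(joined == '\U000277cc')      # True
```

A lone surrogate with no partner still fails at the decoding step, so the helper only repairs well-formed pairs.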

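In Python, any escape sequence that is not recognized is left in the string unchanged (that is, the '\' character is kept), whereas Java reports an error. Recent Python 3 versions also emit a warning in this case. For example:

"\Z\X"  # does not compile in Java; in Python it is equivalent to "\\Z\\X"

# An escape sequence like this one, which is recognized but incomplete, is an error in Python as well.
"\xFG"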

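Both behaviours can be observed at run time by handing the literal's source text to `ast.literal_eval`, which parses it the same way the compiler would. This is a minimal sketch for Python 3; `eval_literal` is a hypothetical helper, and the warning about unknown escapes is silenced only to keep the output clean.

```python
import ast
import warnings

def eval_literal(source):
    """Evaluate the source text of a string literal, silencing escape warnings."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)

# Unknown escapes: the backslashes survive, giving four characters.
unknown = eval_literal(r'"\Z\X"')
print(unknown == "\\Z\\X")    # True
print(len(unknown))           # 4

# A truncated \x escape is rejected outright.
try:
    eval_literal(r'"\xFG"')
    truncated_ok = True
except SyntaxError:
    truncated_ok = False
print(truncated_ok)           # False
```
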
The \N{name} escape uses the character's name in the Unicode database:

print(u'\N{LATIN CAPITAL LETTER A}') # A
print(u'\N{CJK UNIFIED IDEOGRAPH-4E2D}') # 中

To get a character's name from the Unicode database:

import unicodedata
unicodedata.name('A') # LATIN CAPITAL LETTER A
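The `unicodedata` module also works in the other direction: `unicodedata.lookup` maps a name back to its character, which is the same mapping `\N{name}` uses. A short Python 3 sketch, with illustrative variable names:

```python
import unicodedata

zhong_name = unicodedata.name('中')
letter_a = unicodedata.lookup('LATIN CAPITAL LETTER A')
print(zhong_name)    # CJK UNIFIED IDEOGRAPH-4E2D
print(letter_a)      # A

# Characters without a name (such as control characters) raise ValueError
# unless a default is supplied as the second argument.
print(unicodedata.name('\x00', 'no name'))   # no name
```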

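Defining strings with `r''` in Python

In Python, b'' or B'' defines a bytes object (the two spellings are equivalent, and likewise below);
u'' or U'' defines a unicode string;
r'' or R'' defines a raw string literal.

Defining a string with r'' changes how the escape character behaves: the escape character '\' and the character after it are both kept in the string.
Examples: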

r'\xgg' # equivalent to '\\xgg'
r'\n'   # equivalent to '\\n'
r'\\'   # equivalent to '\\\\'
r'\''   # equivalent to '\\\'' or "\\'"

r'\' # raises SyntaxError: EOL while scanning string literal

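r'\' fails because the final single quote is escaped by the escape character and treated as an ordinary character, not as the delimiter that ends the string. As a result, the string literal is never properly terminated.

In other words, r'' changes what the escape character does; it does not make the escape character lose its effect altogether.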
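These equivalences can be verified with a few comparisons. The sketch below also feeds the source text `r'\'` to the built-in `compile` so that the syntax error can be caught at run time instead of stopping the script; the `"<example>"` filename is just a placeholder.

```python
print(len('\n'), len(r'\n'))    # 1 2
print(r'\\' == '\\\\')          # True
print(r'\'' == "\\'")           # True

# The string "r'\\'" holds the four source characters r'\' .
try:
    compile("r'\\'", "<example>", "eval")
    compiled = True
except SyntaxError:
    compiled = False
print(compiled)                 # False
```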

This raises a problem: a string defined with r'' cannot end in an odd number of '\' characters. It is a small wart in Python, and it can be worked around as follows:

path = r'd:\test' '\\'
print(path) # d:\test\
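The workaround relies on Python concatenating adjacent string literals at compile time. For comparison, here is a small sketch of two other spellings that should produce the same string; which one reads best is a matter of taste.

```python
base = r'd:\test'

a = r'd:\test' '\\'    # adjacent literals, joined at compile time
b = base + '\\'        # explicit concatenation at run time
c = 'd:\\test\\'       # ordinary literal with every backslash doubled

print(a == b == c)     # True
print(a)               # d:\test\
```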

Unicode values and names of the various emoji: click here to view.

Reposted from: https://blog.csdn.net/xuejianbest/article/details/100739503
