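Python Automation Programming: Regular Expressions (小言_互联网的博客)

    import re

    # Compile a pattern that matches four consecutive digits
    ageRegex = re.compile(r'\d\d\d\d')
    # "今年是2023年" means "This year is 2023"
    a = ageRegex.search("今年是2023年")
    print(a)          # <re.Match object; span=(3, 7), match='2023'>
    print(a.group())  # 2023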
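As a small sketch of what the Match object carries besides the matched text, the example below reads the match position with span() and uses it to slice the original string. The variable names and the sample sentence are assumptions made for illustration; they do not come from the original article.

```python
import re

year_regex = re.compile(r'\d\d\d\d')  # hypothetical name
text = 'Released in 2023, revised later'
match = year_regex.search(text)

year = match.group()       # the matched text
start, end = match.span()  # start index and end index (end is exclusive)

print(year)             # 2023
print(start, end)       # 12 16
print(text[start:end])  # 2023
```

The span is the same pair of numbers that appears in the printed form of the Match object.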

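II. Matching More Patterns with Regular Expressions

1. Grouping with parentheses

Parentheses in the pattern string passed to compile() split the matched text into groups. Call group(index) on the Match object to retrieve one group at a time, or call groups() to get all of the groups at once as a tuple.

    import re

    Regex = re.compile(r'(\d\d\d\d)-(\d)-(\d)')
    # "今年是" means "This year is"
    a = Regex.search('今年是2023-1-1')
    print(a)           # <re.Match object; span=(3, 11), match='2023-1-1'>
    print(a.group(1))  # 2023
    print(a.groups())  # ('2023', '1', '1')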
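The sketch below shows a few more ways to read groups from the same kind of pattern: group(0) for the whole match, tuple unpacking of groups(), and group() with several indexes. The variable names and the sample string are illustrative assumptions.

```python
import re

date_regex = re.compile(r'(\d\d\d\d)-(\d)-(\d)')
match = date_regex.search('Date: 2023-1-1')

whole = match.group(0)             # entire match, same as match.group()
year, month, day = match.groups()  # unpack every group at once
year_and_day = match.group(1, 3)   # several indexes return a tuple

print(whole)             # 2023-1-1
print(year, month, day)  # 2023 1 1
print(year_and_day)      # ('2023', '1')
```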

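2. Matching multiple alternatives with the pipe

The | character is called a pipe. Use it to join alternative patterns: A|B matches either A or B. If both A and B occur in the searched string, search() returns a Match object only for whichever one appears first.

    import re

    Regex = re.compile(r'hello|hi')
    a = Regex.search('hello world hi time')
    print(a.group())  # hello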
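To make "first" more concrete, here is a minimal sketch with made-up sample strings. It suggests that the position in the searched text decides which alternative wins, and that the order inside the pattern only matters when two alternatives could match at the same position.

```python
import re

text = 'hi there, hello'

# Both patterns return 'hi' because it occurs earlier in the text,
# regardless of where it is listed in the pattern.
first = re.compile(r'hello|hi').search(text).group()
second = re.compile(r'hi|hello').search(text).group()

# When two alternatives can match at the same position,
# the one listed first in the pattern is tried first.
short_first = re.compile(r'Bat|Batman').search('Batman').group()
long_first = re.compile(r'Batman|Bat').search('Batman').group()

print(first, second)            # hi hi
print(short_first, long_first)  # Bat Batman
```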

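Parentheses can also wrap the alternatives so that they share a common prefix. group() returns the whole match, and group(1) returns only the alternative that matched inside the parentheses.

    import re

    Regex = re.compile(r'Bat(man|mobile|bat)')
    a = Regex.search('Batmobile lost a wheel, I like Batman')
    print(a.group())   # Batmobile
    print(a.group(1))  # mobile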

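3. Optional matching with the question mark

If part of the pattern may or may not be present in the text, follow it with ? to make it optional.

    import re

    Regex = re.compile(r'hello (world)?')
    a = Regex.search('hello world hello time')
    b = Regex.search('hello time')
    print(a.group())  # 'hello world'
    print(b.group())  # 'hello ' (note the trailing space)

The optional part is matched when it is present in the string; when it is absent, the rest of the pattern still matches.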
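One way to tell the two cases apart afterwards is to look at the optional group itself. In the sketch below (variable names are illustrative), group(1) holds the text when the optional part was present and is None when it was skipped.

```python
import re

greeting_regex = re.compile(r'hello (world)?')
with_world = greeting_regex.search('hello world')
without_world = greeting_regex.search('hello time')

print(with_world.group(1))     # world
print(without_world.group(1))  # None
```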

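4. Matching zero or more times with the star

The star * means "match zero or more times": the group that precedes the star may be absent entirely or repeated any number of times.

    import re

    Regex = re.compile(r'(super)*man')
    a = Regex.search('I am superman')
    b = Regex.search('I am supersupersuperman')
    print(a.group())  # superman
    print(b.group())  # supersupersuperman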
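The "zero times" case can be sketched with an input that contains no "super" at all. The sample strings and variable names here are assumptions for illustration.

```python
import re

man_regex = re.compile(r'(super)*man')

zero = man_regex.search('I am man')
many = man_regex.search('I am supersupersuperman')

print(zero.group())   # man
print(zero.group(1))  # None (the group did not take part in the match)
print(many.group(1))  # super (a repeated group keeps its last repetition)
```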

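5. Matching one or more times with the plus

While the star matches zero or more times, the plus + matches one or more times. The group before the plus must occur at least once; otherwise search() returns None instead of a Match object.

    import re

    Regex = re.compile(r'(super)+man')
    c = Regex.search('I am man')
    a = Regex.search('I am superman')
    b = Regex.search('I am supersupersuperman')
    print(a.group())  # superman
    print(b.group())  # supersupersuperman
    print(c)          # None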
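Because search() can return None, calling group() on the result without checking raises an AttributeError. A minimal sketch of one way to guard against that follows; the helper name find_hero is hypothetical.

```python
import re

hero_regex = re.compile(r'(super)+man')

def find_hero(text):
    """Return the matched text, or None if the pattern is not found."""
    match = hero_regex.search(text)
    return match.group() if match is not None else None

print(find_hero('I am superman'))  # superman
print(find_hero('I am man'))       # None
```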

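6. Matching a specific number of times with braces

(a){3} matches the string 'aaa' but not 'aa'. Two numbers inside the braces give a range from a minimum to a maximum: (a){3,5} matches 'aaa', 'aaaa', and 'aaaaa'. Either bound may be omitted: (a){3,} matches three or more repetitions, and (a){,5} matches zero to five.

    import re

    Regex = re.compile(r'(hello){2}')
    a = Regex.search('hellohello time hello')
    print(a.group())  # hellohello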
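The brace forms described above can be compared side by side. This is only a sketch with illustrative variable names, run against a string of seven letters.

```python
import re

exactly_three = re.compile(r'(a){3}')
three_to_five = re.compile(r'(a){3,5}')
three_or_more = re.compile(r'(a){3,}')  # maximum omitted: no upper limit
up_to_five = re.compile(r'(a){,5}')     # minimum omitted: starts at zero

seven = 'aaaaaaa'  # seven letters

too_short = exactly_three.search('aa')            # None: only two letters
ranged = three_to_five.search(seven).group()      # 'aaaaa'
open_ended = three_or_more.search(seven).group()  # 'aaaaaaa'
capped = up_to_five.search(seven).group()         # 'aaaaa'
empty = up_to_five.search('bbb').group()          # '' (zero repetitions)

print(too_short, ranged, open_ended, capped, repr(empty))
```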

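III. Greedy and non-greedy matching

Python regular expressions are greedy by default: when a pattern is ambiguous, they match the longest string possible. To make a pattern non-greedy, so that it matches the shortest string possible, put a question mark after the closing brace.

    import re

    Regex = re.compile(r'(a){2,8}?')  # non-greedy
    Regex1 = re.compile(r'(a){2,8}')  # greedy
    string = 'aaaaaaaaaaaaaa'
    a = Regex.search(string)
    b = Regex1.search(string)
    print(a.group())  # aa
    print(b.group())  # aaaaaaaa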
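The same trailing question mark also works after the star and the plus, not only after braces. A brief sketch, with an assumed sample string:

```python
import re

text = 'aaaa'

greedy_plus = re.compile(r'a+').search(text).group()  # as many as possible
lazy_plus = re.compile(r'a+?').search(text).group()   # as few as possible (one)
lazy_star = re.compile(r'a*?').search(text).group()   # as few as possible (zero)

print(repr(greedy_plus))  # 'aaaa'
print(repr(lazy_plus))    # 'a'
print(repr(lazy_star))    # ''
```

Note that the question mark has two roles: after a group or character it means "optional", while after a quantifier it means "non-greedy".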

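The findall() method

search() returns a Match object for the first match only. findall() is similar, but it finds every non-overlapping match in the string and returns them in a list. When the pattern has no groups, the list contains the matched strings; when the pattern has more than one group, each item in the list is a tuple of the group strings.

    import re

    Regex = re.compile(r'hello')
    string = 'hello world! hello time!'
    a = Regex.search(string)
    b = Regex.findall(string)
    print(a.group())  # hello
    print(b)          # ['hello', 'hello']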
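How groups change the shape of the findall() result can be sketched as follows. The sample text and variable names are made up for illustration.

```python
import re

text = 'Dates: 2023-1-1 and 2024-2-9'

no_groups = re.compile(r'\d\d\d\d-\d-\d').findall(text)
one_group = re.compile(r'(\d\d\d\d)-\d-\d').findall(text)
many_groups = re.compile(r'(\d\d\d\d)-(\d)-(\d)').findall(text)

print(no_groups)    # ['2023-1-1', '2024-2-9']
print(one_group)    # ['2023', '2024']
print(many_groups)  # [('2023', '1', '1'), ('2024', '2', '9')]
```

In every case the return value is a list; only the items inside it change.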

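IV. Character classes

Shorthand character class	Represents
\d	Any digit from 0 to 9
\D	Any character that is not a digit from 0 to 9
\w	Any letter, digit, or underscore
\W	Any character that is not a letter, digit, or underscore
\s	Any space, tab, or newline character
\S	Any character that is not a space, tab, or newline

For example, \d+ matches one or more digits, \s matches a single space, tab, or newline, and \w+ matches one or more letters, digits, or underscores.

    import re

    Regex = re.compile(r'\d+\s\w+')
    a = Regex.findall('1 dog,2 pig,3 duck,4 cat,5 fish,6 col')
    print(a)  # ['1 dog', '2 pig', '3 duck', '4 cat', '5 fish', '6 col']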
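The uppercase classes in the table are the complements of the lowercase ones. A short sketch on an assumed sample string:

```python
import re

text = '1 dog,2 pig'

non_digits = re.compile(r'\D+').findall(text)  # runs of non-digit characters
non_word = re.compile(r'\W').findall(text)     # single non-word characters
non_space = re.compile(r'\S+').findall(text)   # runs of non-whitespace

print(non_digits)  # [' dog,', ' pig']
print(non_word)    # [' ', ',', ' ']
print(non_space)   # ['1', 'dog,2', 'pig']
```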

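V. Custom character classes

1. Creating a custom character class

Define your own character class inside square brackets, using a hyphen for ranges: [a-z] matches any lowercase letter from a to z, and [A-Z0-9] matches any uppercase letter or any digit. A caret (^) right after the opening bracket negates the class: [^a-z] matches any character that is not a lowercase letter. Characters can also be listed one by one: [aeiou] matches any one of the letters a, e, i, o, u.

    import re

    Regex = re.compile(r'[a-z]')
    Regex1 = re.compile(r'[aeiou]')  # vowels
    string = 'abcdefghijklmnopqrstuvwxyz'
    a = Regex.findall(string)
    b = Regex1.findall(string)
    print(b)  # ['a', 'e', 'i', 'o', 'u']
    print(a)
    # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
    #  'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']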
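The negated and combined forms mentioned above are not exercised by the example, so here is a minimal sketch of both. The sample strings are assumptions.

```python
import re

# A negated class matches every character that is NOT listed,
# which includes spaces and punctuation, not just other letters.
non_vowels = re.compile(r'[^aeiou]').findall('hello world')

# Ranges can be combined inside one pair of brackets.
upper_or_digit = re.compile(r'[A-Z0-9]').findall('Room B12, Floor 3')

print(non_vowels)      # ['h', 'l', 'l', ' ', 'w', 'r', 'l', 'd']
print(upper_or_digit)  # ['R', 'B', '1', '2', 'F', '3']
```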

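2. The caret and the dollar sign

A caret (^) at the start of a pattern means the match must begin at the start of the string, and a dollar sign ($) at the end of a pattern means the match must finish at the end of the string. For example, ^\d matches a string that begins with a digit from 0 to 9, and \d$ matches a string that ends with one.

    import re

    Regex = re.compile(r'^\d+\w+$')
    a = Regex.findall('1b32c23d')
    print(a)  # ['1b32c23d']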
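Used together, the two anchors force the whole string to fit the pattern. The following sketch, with illustrative names and inputs, shows each anchor alone and then both combined.

```python
import re

starts_with_digit = re.compile(r'^\d')
ends_with_digit = re.compile(r'\d$')
all_digits = re.compile(r'^\d+$')

print(starts_with_digit.search('1abc') is not None)  # True
print(starts_with_digit.search('abc1') is not None)  # False
print(ends_with_digit.search('abc1') is not None)    # True
print(all_digits.search('12345') is not None)        # True
print(all_digits.search('123a45') is not None)       # False
```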

3. The wildcard character

The dot (.) is a wildcard: it matches any single character except a newline.

    import re

    Regex = re.compile(r'.a.')
    a = Regex.findall('dfweascareaefefwa')
    print(a)  # ['eas', 'car', 'eae']

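4. Matching everything with (.*)

(.*) matches any run of characters: the dot matches any character except a newline, and the star repeats it zero or more times.

    import re

    Regex = re.compile(r'<.*>')
    a = Regex.findall('<adf>,<fedf>,<gre>,<fww2>')
    # Greedy: a single match that runs from the first < to the last >
    print(a)  # ['<adf>,<fedf>,<gre>,<fww2>']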
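If the goal is to pick out each bracketed item separately, the non-greedy form .*? is one option. A sketch on the same input string:

```python
import re

text = '<adf>,<fedf>,<gre>,<fww2>'

greedy = re.compile(r'<.*>').findall(text)  # longest possible match
lazy = re.compile(r'<.*?>').findall(text)   # shortest possible matches

print(greedy)  # ['<adf>,<fedf>,<gre>,<fww2>']
print(lazy)    # ['<adf>', '<fedf>', '<gre>', '<fww2>']
```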

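5. Matching newlines with re.DOTALL

Pass re.DOTALL as the second argument to compile() to make the dot match every character, including newlines.

    import re

    Regex = re.compile(r'.*', re.DOTALL)
    a = Regex.findall('hello world hello time')
    # The trailing '' is the empty match found at the end of the string
    print(a)  # ['hello world hello time', '']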
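The effect of the flag only shows up when the text actually contains a newline. A minimal sketch, with an assumed two-line string:

```python
import re

text = 'hello world\nhello time'

without_flag = re.compile(r'.*').search(text).group()
with_flag = re.compile(r'.*', re.DOTALL).search(text).group()

print(repr(without_flag))  # 'hello world'
print(repr(with_flag))     # 'hello world\nhello time'
```

Without the flag the match stops at the newline; with it the match continues to the end of the text.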

Reposted from: https://blog.csdn.net/weixin_63009369/article/details/128513465
