While matching an email address, after I match something like yasar@webmail, I want to capture one or more of (\.\w+) (what I am doing is a little more complicated; this is just an example). I tried adding (\.\w+)+, but it only captures the last match. For example, yasar@webmail.something.edu.tr matches, but the group only includes .tr after the yasar@webmail part, so I lose the .something and .edu groups. Can I do this with Python regular expressions, or would you suggest matching everything first and splitting the subpatterns later?
The re module doesn't support repeated captures (the third-party regex module does):
>>> import regex
>>> m = regex.match(r'([.\w]+)@((\w+)(\.\w+)+)', 'yasar@webmail.something.edu.tr')
>>> m.groups()
('yasar', 'webmail.something.edu.tr', 'webmail', '.tr')
>>> m.captures(4)
['.something', '.edu', '.tr']
In your case I'd go with splitting the repeated subpatterns later. It leads to simple, readable code; for an example, see the code in @Li-aung Yip's answer.
You can fix the problem of (\.\w+)+ only capturing the last match by wrapping a non-capturing group in a capturing one, so that the whole repeated run is captured as a single string: ((?:\.\w+)+)
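A minimal sketch of how that pattern behaves with the standard re module, using the address from the question. The variable names are only illustrative, and the final findall step is one possible way to recover the individual pieces from the captured run.

```python
import re

# Address taken from the question's example.
address = "yasar@webmail.something.edu.tr"

# The outer group captures the whole repeated run as one string;
# the inner (?:...) only groups and does not create a capture of its own.
pattern = re.compile(r"([.\w]+)@(\w+)((?:\.\w+)+)")
m = pattern.match(address)

local_part, host, suffix_run = m.groups()
print(suffix_run)

# The run is a single string, so split it afterwards to get each piece.
suffixes = re.findall(r"\.\w+", suffix_run)
print(suffixes)
```

Here suffix_run holds the full '.something.edu.tr' text, so nothing is lost, but the individual pieces still have to be separated in a second step.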
This will work:
>>> import re
>>> regexp = r"[\w.]+@(\w+)(\.\w+)?(\.\w+)?(\.\w+)?(\.\w+)?(\.\w+)?"
>>> email_address = "[email protected]"
>>> m = re.match(regexp, email_address)
>>> m.groups()
('galactica', '.caprica', '.fleet', '.mil', None, None)
But it’s limited to a maximum of six subgroups. A better way to do this would be:
>>> m = re.match(r"[\w.]+@(.+)", email_address)
>>> m.groups()
('galactica.caprica.fleet.mil',)
>>> m.group(1).split('.')
['galactica', 'caprica', 'fleet', 'mil']
Note that regexps are fine as long as the email addresses are simple, but there are all kinds of valid addresses that these patterns will break on. See this question for a detailed treatment of email address regexes.
This is what you are looking for:
>>> import re
>>> s = "yasar@webmail.something.edu.tr"
>>> r = re.compile(r"\.\w+")
>>> m=r.findall(s)
>>> m
['.something', '.edu', '.tr']
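One caveat with running findall over the whole string is that it also picks up dotted pieces in the part before the @. A hedged sketch of one way around that is to search only the text after the last @. The function name domain_suffixes and the sample address with a dotted local part are made up for illustration.

```python
import re

SUFFIX = re.compile(r"\.\w+")

def domain_suffixes(address):
    """Return the dotted pieces that follow the first label of the domain."""
    # rpartition splits on the last '@'; everything after it is the domain.
    _, _, domain = address.rpartition("@")
    return SUFFIX.findall(domain)

# Hypothetical address whose local part also contains a dot.
sample = "yasar.k@webmail.something.edu.tr"

whole_string = SUFFIX.findall(sample)   # includes '.k' from the local part
domain_only = domain_suffixes(sample)   # restricted to the domain

print(whole_string)
print(domain_only)
```

For the address in the question, which has no dot before the @, both approaches give the same list.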