Regular expressions (Regex) are sequences of characters that define search patterns. They are widely used for pattern matching and string manipulation tasks such as validation, searching, and substitution.
re Module
To use regex in Python, import the built-in re module:
import re
re.match()
Searches for a match only at the beginning of the string.
import re
pattern = r"^Hello"
result = re.match(pattern, "Hello, world!")
if result:
    print("Match found!")
re.search()
Searches for a match anywhere in the string.
result = re.search(r"world", "Hello, world!")
if result:
    print("Match found!")
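The difference between re.match() and re.search() is easiest to see when the pattern does not occur at the start of the string. The following is a minimal sketch; the variable names are illustrative only.

```python
import re

text = "Hello, world!"

# re.match() only tries the pattern at position 0, so "world" is not found.
at_start = re.match(r"world", text)

# re.search() scans the whole string and finds "world" further in.
anywhere = re.search(r"world", text)

print(at_start)          # None
print(anywhere.group())  # world
print(anywhere.start())  # 7
```

Both functions return None when nothing matches, which is why the examples test the result with if before using it.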
re.findall()
Finds all occurrences of a pattern in a string.
matches = re.findall(r"\d+", "The price is 45 dollars and 30 cents.")
print(matches) # Output: ['45', '30']
re.finditer()
Returns an iterator yielding match objects for all matches.
for match in re.finditer(r"\d+", "The price is 45 dollars and 30 cents."):
    print(match.group())
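Unlike re.findall(), which returns plain strings, each match object from re.finditer() also carries position information. A small sketch, assuming you want the start and end index of every number:

```python
import re

text = "The price is 45 dollars and 30 cents."

# Collect (matched text, start index, end index) for each number.
found = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]
print(found)  # [('45', 13, 15), ('30', 28, 30)]
```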
re.sub()
Replaces occurrences of a pattern with a specified string.
# \b keeps the digits inside "Item1" and "Item2" from being replaced
result = re.sub(r"\b\d+\b", "[NUMBER]", "Item1: 123, Item2: 456")
print(result) # Output: "Item1: [NUMBER], Item2: [NUMBER]"
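The replacement passed to re.sub() can also be a function that receives each match object and returns the replacement text. The sketch below uses a hypothetical helper named double, and relies on \b word boundaries so that the digits inside "Item1" and "Item2" are left alone.

```python
import re

def double(match):
    # Hypothetical helper: turn the matched digits into an int, double it,
    # and return the result as a string.
    return str(int(match.group()) * 2)

doubled = re.sub(r"\b\d+\b", double, "Item1: 123, Item2: 456")
print(doubled)  # Item1: 246, Item2: 912
```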
Use re.compile() to compile a regex pattern once and reuse it, which improves readability and avoids repeated pattern lookups when the same pattern is applied many times.
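As a rough illustration of re.compile(), the pattern object below is built once and then reused across several strings. The names number_re and lines are made up for this example.

```python
import re

# Compile once, then reuse the pattern object's methods.
number_re = re.compile(r"\d+")

lines = ["order 12", "no digits here", "items 3 and 44"]
counts = [number_re.findall(line) for line in lines]
print(counts)  # [['12'], [], ['3', '44']]
```

A compiled pattern offers the same methods as the module-level functions (match, search, findall, finditer, sub, split), without passing the pattern each time.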
Regex uses special characters to define patterns:
Character | Description |
---|---|
. | Matches any character except a newline. |
^ | Matches the beginning of a string. |
$ | Matches the end of a string. |
* | Matches 0 or more repetitions. |
+ | Matches 1 or more repetitions. |
? | Matches 0 or 1 repetition. |
{n} | Matches exactly n repetitions. |
{n,} | Matches n or more repetitions. |
{n,m} | Matches between n and m repetitions. |
[] | Matches any character inside the brackets. |
\| | Acts as an OR operator. |
() | Groups patterns. |
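A few of the special characters from the table, shown in action. This is only a sketch with made-up sample strings.

```python
import re

# ? makes the preceding "u" optional
optional_u = re.findall(r"colou?r", "color colour")

# {2,3} matches runs of two or three digits (greedy, so 4567 yields 456)
digit_runs = re.findall(r"\d{2,3}", "7 42 123 4567")

# | chooses between alternatives
pets = re.findall(r"cat|dog", "cat, dog, bird")

# [] matches any single character listed inside the brackets
vowels = re.findall(r"[aeiou]", "regex")

# () groups "ab" so that + repeats the whole group
repeated = re.search(r"(ab)+", "ababab").group()

print(optional_u)  # ['color', 'colour']
print(digit_runs)  # ['42', '123', '456']
print(pets)        # ['cat', 'dog']
print(vowels)      # ['e', 'e']
print(repeated)    # ababab
```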
pattern = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"
email = "example@example.com"
if re.match(pattern, email):
    print("Valid email!")
else:
    print("Invalid email!")
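One subtlety when validating with re.match() and $: the $ anchor also matches just before a trailing newline, so "example@example.com\n" would pass. A possible alternative is re.fullmatch(), which requires the entire string to match. The sketch below reuses the email pattern without anchors (the hyphen is moved to the end of the last character class, which matches the same characters). The names EMAIL_RE and looks_like_email are hypothetical, and the pattern remains a simplified check rather than a complete email validator.

```python
import re

# Same character classes as the email pattern above, without ^ and $.
EMAIL_RE = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+")

def looks_like_email(candidate):
    # fullmatch() succeeds only if the whole string matches the pattern.
    return EMAIL_RE.fullmatch(candidate) is not None

print(looks_like_email("example@example.com"))    # True
print(looks_like_email("example@example.com\n"))  # False
print(looks_like_email("not-an-email"))           # False
```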
text = "Visit https://example.com or http://example.org."
# Require a dot between host labels so the sentence-ending period is not captured
urls = re.findall(r"https?://[\w-]+(?:\.[\w-]+)+", text)
print(urls) # Output: ['https://example.com', 'http://example.org']
result = re.split(r",\s*", "apple, banana, cherry")
print(result) # Output: ['apple', 'banana', 'cherry']
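Where re.split() goes beyond str.split() is in splitting on several different delimiters at once. A short sketch with an invented input string:

```python
import re

# Split on a comma or a semicolon, each optionally followed by whitespace.
fruits = re.split(r"[,;]\s*", "apple, banana;cherry")
print(fruits)  # ['apple', 'banana', 'cherry']
```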
Regex is powerful but can become complex. Use tools like regex101.com to test and debug your patterns interactively.
Find All Words Starting with a Capital Letter:
\b[A-Z][a-z]*\b
Example:
text = "The quick Brown Fox jumps Over the lazy Dog."
words = re.findall(r"\b[A-Z][a-z]*\b", text)
print(words) # Output: ['The', 'Brown', 'Fox', 'Over', 'Dog']
Validate Phone Numbers:
^\+\d{1,3}-\d{3}-\d{3}-\d{4}$
Example:
phone = "+1-800-555-1234"
if re.match(r"^\+\d{1,3}-\d{3}-\d{3}-\d{4}$", phone):
    print("Valid phone number!")
else:
    print("Invalid phone number!")
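If you also need the individual parts of the number, the same pattern can be written with named groups. The sketch below additionally uses the re.VERBOSE flag, which lets a pattern span several lines with comments. The group names (country, area, prefix, line) are chosen for this example only.

```python
import re

PHONE_RE = re.compile(
    r"""
    ^\+(?P<country>\d{1,3})   # country code after a literal plus sign
    -(?P<area>\d{3})          # three-digit area code
    -(?P<prefix>\d{3})        # three-digit prefix
    -(?P<line>\d{4})$         # four-digit line number
    """,
    re.VERBOSE,
)

m = PHONE_RE.match("+1-800-555-1234")
parts = m.groupdict() if m else {}
print(parts)
# {'country': '1', 'area': '800', 'prefix': '555', 'line': '1234'}
```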
Extract Hashtags from a Tweet:
#\w+
Example:
tweet = "#Python is amazing! #coding #regex"
hashtags = re.findall(r"#\w+", tweet)
print(hashtags) # Output: ['#Python', '#coding', '#regex']
Regex is an essential tool for any Python developer dealing with text processing. With practice, you can unlock its full potential for various real-world applications.