Regular Expression Optimisation
May 30, 2026 | 5 min read
When the re Module Wins
The re module is C-accelerated and usually faster than pure Python for pattern matching:
import re
# text is the input string; the pattern is deliberately loose, not a full email validator.
emails = re.findall(r'[\w.]+@[\w.]+', text)
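When the same pattern is applied many times, it is common to compile it once and reuse the pattern object. The following is a minimal sketch of that idea; the helper name find_emails, the constant EMAIL_RE, and the sample text are illustrative assumptions, and the pattern is the same loose one used above.

```python
import re

# Compile once at import time and reuse the pattern object across calls.
EMAIL_RE = re.compile(r'[\w.]+@[\w.]+')

def find_emails(text: str) -> list[str]:
    """Return every substring that loosely resembles an email address."""
    return EMAIL_RE.findall(text)

# Hypothetical sample input, used only to exercise the helper.
sample = "Contact alice@example.com or bob.smith@mail.example.org today."
print(find_emails(sample))
```

Because the character class includes the dot, a match that is directly followed by a full stop would swallow it, so treat this as a scanner for candidates rather than a validator.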
When Pyvorin Wins
For simple character scanning or custom tokenisation, Python compiled with Pyvorin can match or beat a regex, particularly where the regex's startup overhead dominates:
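def extract_numbers(text: str) -> list[int]:
    numbers = []
    current = []  # digits of the number currently being read
    for ch in text:
        # isdecimal() only accepts characters that int() can parse;
        # isdigit() also accepts characters such as '²', which int() rejects.
        if ch.isdecimal():
            current.append(ch)
        else:
            if current:
                numbers.append(int(''.join(current)))
                current = []
    # Flush a number that runs to the end of the text.
    if current:
        numbers.append(int(''.join(current)))
    return numbers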
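Whether a hand-written scanner actually beats a regex depends on the input and on how the code is compiled, so it is worth measuring rather than assuming. The sketch below checks that a scanner and a regex baseline agree, then times both with timeit. The names scan_numbers, regex_numbers, DIGITS_RE, and the sample text are illustrative assumptions. As written it runs under the plain interpreter; compiling scan_numbers with Pyvorin is left out because that step depends on your setup.

```python
import re
import timeit

def scan_numbers(text: str) -> list[int]:
    """Hand-written scanner: collect runs of decimal digits."""
    numbers, current = [], []
    for ch in text:
        if ch.isdecimal():
            current.append(ch)
        elif current:
            numbers.append(int(''.join(current)))
            current = []
    if current:
        numbers.append(int(''.join(current)))
    return numbers

DIGITS_RE = re.compile(r'\d+')

def regex_numbers(text: str) -> list[int]:
    """Regex baseline for the same task."""
    return [int(m) for m in DIGITS_RE.findall(text)]

# Hypothetical sample input, repeated to give the timer something to measure.
sample = "order 66 shipped 3 items on day 214 " * 200

# Both implementations must agree before their timings are worth comparing.
assert scan_numbers(sample) == regex_numbers(sample)

for fn in (scan_numbers, regex_numbers):
    seconds = timeit.timeit(lambda: fn(sample), number=50)
    print(f"{fn.__name__}: {seconds:.4f}s for 50 runs")
```

Timings vary by machine and input shape, so run the comparison on data that resembles your real workload before choosing an approach.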
Hybrid Approach
Use regex for complex patterns and Pyvorin for the surrounding transformation logic:
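def process_document(text):
    # Regex handles the pattern matching: split on runs of whitespace.
    tokens = re.split(r'\s+', text)
    # compiled_transform is assumed to be a Pyvorin-compiled function defined elsewhere.
    return compiled_transform(tokens)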
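To make the split of responsibilities concrete, here is a self-contained sketch of the hybrid shape. The body of compiled_transform is a plain-Python stand-in invented for illustration (it drops empty tokens and lowercases the rest); in practice it would be whatever transformation you compile with Pyvorin. WHITESPACE_RE is likewise an assumed name.

```python
import re

# Precompiled pattern for the regex half of the pipeline.
WHITESPACE_RE = re.compile(r'\s+')

def compiled_transform(tokens: list[str]) -> list[str]:
    """Stand-in for the compiled step: drop empty tokens, lowercase the rest."""
    return [tok.lower() for tok in tokens if tok]

def process_document(text: str) -> list[str]:
    # re.split yields empty strings when the text starts or ends with
    # whitespace, so the transform filters them out.
    tokens = WHITESPACE_RE.split(text)
    return compiled_transform(tokens)

print(process_document("  Hello   World\n"))
```

Keeping the regex call outside the transform means the compiled function only ever sees a list of strings, which keeps its input simple and easy to test in isolation.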