PRegEx: Write Human-Readable Regular Expressions in Python

Khuyen Tran

Motivation

Imagine you are trying to find all URLs inside a text. Each of these URLs must:

Start with either http:// or https:// or the domain of the URL
Contain either .com or .org, followed by more characters such as a path

You might end up writing a complicated regular expression (RegEx) like the one below:

import re

text = """You can find me through my
website mathdatasimplified.com/ or
GitHub https://github.com/khuyentran1401"""

re.findall(
    r"(?:https?://)?[^\s]+(?:\.com|\.org)[^\s]+",
    text
)
"""
[
    'mathdatasimplified.com/',
    'https://github.com/khuyentran1401'
]
"""

This RegEx is difficult to read and create. Is there a way that you can write a more human-readable RegEx with ease?

That is when PRegEx comes in handy.

What is PRegEx?

PRegEx is a Python package that allows you to construct RegEx patterns in a more human-friendly way.

To install PRegEx, type:

pip install pregex

The version of PRegEx that will be used in this article is 2.0.1:

pip install pregex==2.0.1

To learn how to use PRegEx, let’s start with some examples.

Capture URLs

Get a Simple URL

First, let’s try to get a URL in a text using PRegEx.

from pregex.core.classes import AnyButWhitespace
from pregex.core.quantifiers import OneOrMore
from pregex.core.operators import Either

text = "You can find me through GitHub https://github.com/khuyentran1401"

pre = (
    "https://"
    + OneOrMore(AnyButWhitespace())
    + Either(".com", ".org")
    + OneOrMore(AnyButWhitespace())
)
pre.get_matches(text)

Output:

['https://github.com/khuyentran1401']

In the code above, we use:

AnyButWhitespace() to match any character except for whitespace characters
OneOrMore() to match the provided pattern one or more times.
Either() to match any one of the provided patterns.

Specifically,

OneOrMore(AnyButWhitespace()) matches one or more characters that are not whitespace characters.
Either(".com", ".org") matches either .com or .org.
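For comparison, here is a hand-written sketch of a roughly equivalent pattern using Python's built-in re module. This is our own translation, not output generated by PRegEx: OneOrMore(AnyButWhitespace()) becomes [^\s]+, and Either(".com", ".org") becomes the non-capturing alternation (?:\.com|\.org), with the dots escaped so they match a literal period.

```python
import re

text = "You can find me through GitHub https://github.com/khuyentran1401"

# Hand translation of:
# "https://" + OneOrMore(AnyButWhitespace()) + Either(".com", ".org") + OneOrMore(AnyButWhitespace())
# [^\s]+ means one or more non-whitespace characters; (?:...) is a non-capturing group
pattern = r"https://[^\s]+(?:\.com|\.org)[^\s]+"

matches = re.findall(pattern, text)
print(matches)  # ['https://github.com/khuyentran1401']
```

The group is non-capturing on purpose: with a capturing group such as (\.com|\.org), re.findall would return only the captured .com instead of the whole URL.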

HTTP or HTTPS

Sometimes, a URL might use the scheme http instead of https. Let’s make the character s optional by using Optional():


from pregex.core.quantifiers import Optional

text = "You can find me through GitHub http://github.com/khuyentran1401"

pre = (
    "http"
    + Optional("s")
    + "://"
    + OneOrMore(AnyButWhitespace())
    + Either(".com", ".org")
    + OneOrMore(AnyButWhitespace())
)
pre.get_matches(text)

Output:

['http://github.com/khuyentran1401']
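In plain re syntax, Optional("s") corresponds to the ? quantifier applied to s ("zero or one s"). The sketch below is our own hand translation, run on a made-up sentence containing both schemes, to check that a single pattern catches http and https URLs alike.

```python
import re

# Made-up sample text with one http URL and one https URL
text = "Old link: http://github.com/khuyentran1401 new link: https://github.com/khuyentran1401"

# "http" + Optional("s") + "://" + ... translated by hand: "s?" allows zero or one "s"
pattern = r"https?://[^\s]+(?:\.com|\.org)[^\s]+"

matches = re.findall(pattern, text)
print(matches)  # ['http://github.com/khuyentran1401', 'https://github.com/khuyentran1401']
```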

Match URL without a Scheme

Some URLs in a text might not include a scheme such as https or http. Let’s make the scheme optional with Optional().

To keep the code readable, we will assign each PRegEx pattern to a descriptive variable.

text = "You can find me through my website mathdatasimplified.com/ or GitHub https://github.com/khuyentran1401"

at_least_one_character_except_white_space = OneOrMore(AnyButWhitespace())
optional_scheme = Optional("http" + Optional("s") + "://")
domain_choice = Either(".com", ".org")

pre = (
    optional_scheme
    + at_least_one_character_except_white_space
    + domain_choice
    + at_least_one_character_except_white_space
)
pre.get_matches(text)

Output:

['mathdatasimplified.com/', 'https://github.com/khuyentran1401']
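The same variable-naming idea can be sketched with plain pattern strings and the re module. The variable names below mirror the ones above, and the regex fragments are our own hand translation. Note that making a multi-character piece optional needs a non-capturing group, (?:...)?, so that the ? applies to the whole scheme rather than only to its last character.

```python
import re

text = "You can find me through my website mathdatasimplified.com/ or GitHub https://github.com/khuyentran1401"

# Hand-written counterparts of the PRegEx building blocks
at_least_one_character_except_white_space = r"[^\s]+"
# Optional("http" + Optional("s") + "://"): the group makes "?" cover the whole scheme
optional_scheme = r"(?:https?://)?"
domain_choice = r"(?:\.com|\.org)"

pattern = (
    optional_scheme
    + at_least_one_character_except_white_space
    + domain_choice
    + at_least_one_character_except_white_space
)

matches = re.findall(pattern, text)
print(matches)  # ['mathdatasimplified.com/', 'https://github.com/khuyentran1401']
```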

Let’s take a look at the underlying RegEx pattern:

pre.get_pattern()

(?:https?\:\/\/)?[^\s]+(?:\.com|\.org)[^\s]+

We have just avoided creating a complicated pattern with some human-readable lines of code!
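Since get_pattern() gives back an ordinary regular-expression string, it should also work directly with Python's re module (the escaped \: and \/ simply match : and /). The sketch below reuses the printed pattern on two made-up sentences to highlight one property of it: the trailing OneOrMore(AnyButWhitespace()) needs at least one character after .com or .org, so a bare domain followed by a space is not matched.

```python
import re

# Pattern string copied from the get_pattern() output above
pattern = r"(?:https?\:\/\/)?[^\s]+(?:\.com|\.org)[^\s]+"

# The final [^\s]+ requires at least one character after ".com" or ".org"
with_path = re.findall(pattern, "Visit example.com/ today")
bare_domain = re.findall(pattern, "Visit example.com today")

print(with_path)    # ['example.com/']
print(bare_domain)  # []
```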

Capture Time

AnyDigit() matches any numeric character. Let’s use this to match a time in a text.

from pregex.core.classes import AnyDigit

text = "It is 6:00 pm now"
pre = AnyDigit() + ":" + AnyDigit()
pre.get_matches(text)

['6:0']

Right now, we only match one digit on either side of :. Let’s make this more general by wrapping OneOrMore() around AnyDigit():

pre = OneOrMore(AnyDigit()) + ":" + OneOrMore(AnyDigit())
pre.get_matches(text)

['6:00']
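In plain re syntax, a digit is \d and "one or more" is +, so a hand translation of the time pattern reads \d+:\d+. The sketch below compares it with the single-digit version and adds a made-up sentence to show two-digit hours.

```python
import re

# Hand translation of OneOrMore(AnyDigit()) + ":" + OneOrMore(AnyDigit())
pattern = r"\d+:\d+"

single = re.findall(r"\d:\d", "It is 6:00 pm now")  # only one digit per side
general = re.findall(pattern, "It is 6:00 pm now")
several = re.findall(pattern, "Meetings at 10:30 and 14:45")

print(single)   # ['6:0']
print(general)  # ['6:00']
print(several)  # ['10:30', '14:45']
```

Keep in mind that this pattern only checks the shape of the text: it would also accept strings such as 99:99.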

Capture Phone Numbers

Common formats for a phone number are:

##########
###-###-####
### ### ####
###.###.####

These formats have either punctuation or nothing between the groups of digits. We can use AnyFrom("-", " ", ".") to match a hyphen, a space, or a period.

We also use Optional() to make the punctuation optional.


from pregex.core.classes import AnyFrom

text = "My phone number is 3452352312 or 345-235-2312 or 345 235 2312 or 345.235.2312"

punctuation = AnyFrom("-", " ", ".")
optional_punctuation = Optional(punctuation)
at_least_one_digit = OneOrMore(AnyDigit())

pre = (
    at_least_one_digit
    + optional_punctuation
    + at_least_one_digit
    + optional_punctuation
    + at_least_one_digit
)
pre.get_matches(text)

['3452352312', '345-235-2312', '345 235 2312', '345.235.2312']
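Because OneOrMore(AnyDigit()) accepts any number of digits, the pattern above is looser than the four formats listed earlier. The sketch below, in plain re, compares a hand translation of that loose pattern with a stricter variant that pins the groups to 3, 3, and 4 digits using {n} quantifiers. The stricter variant and the version-number sentence are our own illustration, not part of the original example.

```python
import re

text = "My phone number is 3452352312 or 345-235-2312 or 345 235 2312 or 345.235.2312"

# AnyFrom("-", " ", ".") becomes the character class [-. ];
# Optional(...) becomes a trailing "?"
loose = r"\d+[-. ]?\d+[-. ]?\d+"
# Stricter variant: exactly 3, 3 and 4 digits
strict = r"\d{3}[-. ]?\d{3}[-. ]?\d{4}"

loose_phones = re.findall(loose, text)
strict_phones = re.findall(strict, text)
print(loose_phones)   # ['3452352312', '345-235-2312', '345 235 2312', '345.235.2312']
print(strict_phones)  # ['3452352312', '345-235-2312', '345 235 2312', '345.235.2312']

# A version string fools the loose pattern but not the strict one
version_loose = re.findall(loose, "Upgrade to version 1.2.3")
version_strict = re.findall(strict, "Upgrade to version 1.2.3")
print(version_loose)   # ['1.2.3']
print(version_strict)  # []
```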

Capture an Email Address

Now let’s use what we have learned so far to capture an email address from a text.

text = "My email is abcd@gmail.com"

pre = (
    OneOrMore(AnyButWhitespace())
    + "@"
    + OneOrMore(AnyButWhitespace())
    + Either(".com", ".org", ".io", ".net")
)

pre.get_matches(text)

Output:

['abcd@gmail.com']
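For comparison, here is a hand-translated re sketch of the email pattern. Like the URL patterns, it relies on [^\s]+, which also accepts punctuation; the second, made-up sentence shows that an opening parenthesis ends up inside the match, which is worth keeping in mind before running the pattern on real text.

```python
import re

# Hand translation of:
# OneOrMore(AnyButWhitespace()) + "@" + OneOrMore(AnyButWhitespace())
# + Either(".com", ".org", ".io", ".net")
pattern = r"[^\s]+@[^\s]+(?:\.com|\.org|\.io|\.net)"

plain = re.findall(pattern, "My email is abcd@gmail.com")
# [^\s]+ also accepts punctuation, so the leading "(" is captured too
in_parens = re.findall(pattern, "Contact me (abcd@gmail.com)")

print(plain)      # ['abcd@gmail.com']
print(in_parens)  # ['(abcd@gmail.com']
```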

Next Step

This article gives you an overview of how to use PRegEx to match complicated patterns without spending hours on them.

I encourage you to check out PRegEx’s documentation for other useful methods.

I love writing about data science concepts and playing with different data science tools. You can stay up-to-date with my latest posts by:

Subscribing to my newsletter on Codecut.
Connecting with me on LinkedIn and Twitter.
