Regular Expression Exercises

Here are some more exercises to give you practice working with regular expressions. All of these problems require regular expressions at some point, but more work might also be needed.

import re

Warm-up Exercise: Write three regular expressions that test whether a number is divisible by 2, 4, and 5. (Remember that we can check divisibility by 2 and 5 using only the last digit, and divisibility by 4 using only the last two digits.)

(solution below)

.

div2 = re.compile(r"\d*[02468]$")
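# group the alternation so \d* and $ apply to both cases; 0, 4, 8 cover single digits
div4 = re.compile(r"(?:\d*(?:[02468][048]|[13579][26])|[048])$")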
div5 = re.compile(r"\d*[05]$")

if div2.match('30'):
    print('30 divisible by 2')
if div4.match('30'):
    print('30 divisible by 4')
if div5.match('30'):
    print('30 divisible by 5')

30 divisible by 2
30 divisible by 5
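
One way to gain confidence in patterns like these is to compare them against ordinary arithmetic over a range of numbers. The sketch below is an illustrative check, not part of the exercise. The helper name `mismatches` is hypothetical, and the patterns are restated so the block runs on its own; the divisible-by-4 pattern is written with explicit grouping and a separate case for single digits.

```python
import re

# Patterns restated here so the block is self-contained.
patterns = {
    2: re.compile(r"\d*[02468]$"),
    4: re.compile(r"(?:\d*(?:[02468][048]|[13579][26])|[048])$"),
    5: re.compile(r"\d*[05]$"),
}

def mismatches(divisor, limit=10000):
    """Return the numbers below limit where the regex and n % divisor disagree."""
    pattern = patterns[divisor]
    return [n for n in range(limit)
            if bool(pattern.match(str(n))) != (n % divisor == 0)]

for d in patterns:
    # an empty list means the pattern agrees with arithmetic on every number tested
    print(d, mismatches(d))
```

An empty list for each divisor only shows agreement on the tested range, but it catches grouping and anchoring mistakes quickly.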

Exercise 1: Write a function subsequence that takes in two strings s1 and s2 and determines whether s1 is a subsequence of s2. This means that all characters in s1 appear in s2 in the same order, but there might be additional characters in between.

Examples:

"by" is a subsequence of "beauty"
"UCLA" is a subsequence of "The University of California, Los Angeles"

(Hint: Use s1 to build a regular expression. You may assume it doesn't contain any special characters for regular expressions.)

(solution below)

.

def subsequence(s1, s2):
    # first we add .* between every character in s1, an easy way is:
    regexp = '.*'.join(s1)
    return re.search(regexp, s2) is not None

print(subsequence("yes","no"))
print(subsequence("UCLA", "The University of California, Los Angeles"))

False
True
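
The hint allows us to assume s1 has no special regex characters. If we wanted to drop that assumption, one possible approach is to escape each character with `re.escape` before joining. The function name `subsequence_escaped` is hypothetical, and the use of `re.DOTALL` (so that `.*` can also skip newlines in s2) is an extra assumption beyond the exercise.

```python
import re

def subsequence_escaped(s1, s2):
    # re.escape makes each character of s1 match literally, so characters
    # such as '.' or '*' in s1 are no longer treated as regex syntax.
    # re.DOTALL lets '.*' skip over newlines in s2 as well.
    regexp = '.*'.join(re.escape(ch) for ch in s1)
    return re.search(regexp, s2, flags=re.DOTALL) is not None

print(subsequence_escaped("a.c", "abc"))    # False: s2 has no literal '.'
print(subsequence_escaped("a.c", "a-.-c"))  # True
```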

Exercise 2: Write a function split_sentences that takes in a string and returns a list of every sentence in it. An example is below.

(Hint: use a lookbehind assertion.)

test_string = "This is a sentence. This is the next!     This is the third?"
# split_sentences(test_string) should return
['This is a sentence.', 'This is the next!', 'This is the third?']

['This is a sentence.', 'This is the next!', 'This is the third?']

(solution below)

.

def split_sentences(s):
    return re.split(r'(?<=[.!?])\s+', s)

split_sentences("This is a sentence. This is the next!     This is the third?")

['This is a sentence.', 'This is the next!', 'This is the third?']
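
To see why the lookbehind matters, it can help to compare against a split that matches the punctuation directly. This is a small illustrative comparison; the variable names are made up for the example.

```python
import re

text = "This is a sentence. This is the next!     This is the third?"

# Splitting on the punctuation itself consumes it, so it is lost from the results.
without_lookbehind = re.split(r'[.!?]\s+', text)

# The lookbehind only asserts that punctuation precedes the whitespace,
# so just the whitespace is consumed by the split.
with_lookbehind = re.split(r'(?<=[.!?])\s+', text)

print(without_lookbehind)
print(with_lookbehind)
```

In the first result the final sentence keeps its "?" only because no whitespace follows it, so no split happens there.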

Exercise 3: Write a function unpack_dict that creates dictionaries from a string without using eval or exec. You may assume all the keys are strings, all the values are integers, and the dictionary format is {key => value}. Examples are given below. You may make any further assumptions about the format as long as your function works for all the examples given. (But try to make sure it works for more than just these, too!)

test_string = \
"""{'a' => 1, 'b' => 2, 'c' => -3}
{'key' => 10}
{'home' => 12, 'away' => 15}"""

#running unpack_dict(test_string) should return
[{'a': 1, 'b': 2, 'c': -3}, {'key': 10}, {'home': 12, 'away': 15}]

[{'a': 1, 'b': 2, 'c': -3}, {'key': 10}, {'home': 12, 'away': 15}]

(solution below)

.

def unpack_dict(s):
    dicts = []
    for line in s.splitlines():
        pairs = re.findall(r"'([^']*)' => (-?\d+)", line)
        dicts.append({p[0]: int(p[1]) for p in pairs})
    return dicts

unpack_dict(test_string)

[{'a': 1, 'b': 2, 'c': -3}, {'key': 10}, {'home': 12, 'away': 15}]
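
In the spirit of making it work for more than just the given examples, here is a sketch of a slightly more tolerant variant. The name `unpack_dict_flexible` is hypothetical, and the assumption that spacing around `=>` may vary is ours, not something the exercise states.

```python
import re

def unpack_dict_flexible(s):
    # Hypothetical variant: \s* allows any amount of whitespace around '=>'.
    pair = re.compile(r"'([^']*)'\s*=>\s*(-?\d+)")
    return [{key: int(value) for key, value in pair.findall(line)}
            for line in s.splitlines()]

print(unpack_dict_flexible("{'a'=>1,  'b'   =>  -2}\n{'key' => 10}"))
```

Like the solution above, this produces one dictionary per line, so a blank line would yield an empty dictionary.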

Exercise 4: Write a class Phonebook that allows us to make objects that extract and store US phone numbers from text. We want a find_numbers() method that takes in a string and finds all the phone numbers that appear anywhere in it. We also want a numbers() method that returns a list of every phone number found so far. Do not include duplicates.

You may assume whichever format you like for phone numbers, but try to make the code as broad as possible. As a suggestion, try to include these formats:

310-555-5555
310 555-5555
310 555 5555
(310)-555-5555
(310) 555-5555
(310) 555 5555

(solution below)

.

class Phonebook:
    def __init__(self):
        self.phone_numbers = set() # using a set means we do not have to worry about duplicates

    def numbers(self):
        return sorted(self.phone_numbers)  # sets are unordered, so sort for a reproducible result

    def find_numbers(self, s):
        # use capture groups so we can reformat the phone numbers to make sure
        # that we don't accidentally store a number in two different formats
        for phone_num in re.findall(r'(?:(\d{3})|\((\d{3})\))[ -](\d{3})[ -](\d{4})', s):
            self.phone_numbers.add(f'{phone_num[0]+phone_num[1]}-{phone_num[2]}-{phone_num[3]}')

        # an alternative is to use the regex '(\d{3}|\(\d{3}\))[ -](\d{3})[ -](\d{4})',
        # which is a little easier to read, but then you have to check if phone_num[0]
        # has () surrounding it and remove them

p = Phonebook()
p.find_numbers('310-555-5555')
p.find_numbers('My phone number is (310) 444 4444.  Yours is (310) 555 555, right?')
p.find_numbers('310 333 3333')
print(p.numbers())

['310-333-3333', '310-444-4444', '310-555-5555']
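
The expression `phone_num[0]+phone_num[1]` relies on how `re.findall` reports capture groups that did not take part in a match: they come back as empty strings. The short sketch below illustrates this with the same pattern; the variable names are only for the example.

```python
import re

phone = re.compile(r'(?:(\d{3})|\((\d{3})\))[ -](\d{3})[ -](\d{4})')

# Each match is a tuple with one entry per capture group; only one of the
# first two groups is filled, depending on whether parentheses were used.
matches = phone.findall('310-555-5555 and (310) 444 4444')
print(matches)

# Concatenating the first two groups recovers the area code either way.
normalized = [f'{a + b}-{c}-{d}' for a, b, c, d in matches]
print(normalized)
```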