Here are some more exercises to give you practice working with regular expressions. All of these problems require regular expressions at some point, but more work might also be needed.
import re
Warm-up Exercise: Write three regular expressions that test whether a number is divisible by 2, 4, and 5. (Remember that we can check whether a number is divisible by these using only its last few digits.)
(solution below)
.
.
.
.
.
.
.
.
.
.
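div2 = re.compile(r"\d*[02468]$")
# group the alternation so that \d* and $ apply to both two-digit branches;
# the final [048] branch covers the single-digit multiples of 4
div4 = re.compile(r"(?:\d*(?:[02468][048]|[13579][26])|[048])$")
div5 = re.compile(r"\d*[05]$")
if div2.match('30'):
    print('30 divisible by 2')
if div4.match('30'):
    print('30 divisible by 4')
if div5.match('30'):
    print('30 divisible by 5')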
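One way to sanity-check patterns like these is to compare them against the % operator over a range of numbers. The sketch below is only an illustration: the names DIV2, DIV4, DIV5, and agrees_with_modulo are made up for this check, the divisible-by-4 pattern is written with its alternation grouped and an extra single-digit branch, and fullmatch is used in place of a trailing $.

```python
import re

# Illustrative patterns (hypothetical names); each expects a string of decimal digits.
DIV2 = re.compile(r"\d*[02468]")
DIV4 = re.compile(r"\d*(?:[02468][048]|[13579][26])|[048]")
DIV5 = re.compile(r"\d*[05]")

def agrees_with_modulo(pattern, divisor, limit=10_000):
    """Return True if pattern fully matches str(n) exactly when n % divisor == 0."""
    return all(
        (pattern.fullmatch(str(n)) is not None) == (n % divisor == 0)
        for n in range(limit)
    )

print(agrees_with_modulo(DIV2, 2))
print(agrees_with_modulo(DIV4, 4))
print(agrees_with_modulo(DIV5, 5))
```

A check like this tends to catch precedence mistakes quickly, since | binds more loosely than the \d* prefix or the end-of-string anchor around it.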
Exercise 1: Write a function subsequence that takes in two strings s1 and s2 and determines if s1 is a subsequence of s2. This means that all characters in s1 appear in s2 in the same order, but there might be additional characters in between.
Examples:
(Hint: Use s1 to build a regular expression. You may assume it doesn't contain any special characters for regular expressions.)
(solution below)
.
.
.
.
.
.
.
.
.
.
def subsequence(s1, s2):
    # first we add .* between every character in s1, an easy way is:
    regexp = '.*'.join(s1)
    # re.search returns None when there is no match anywhere in s2
    return re.search(regexp, s2) is not None
print(subsequence("yes", "no"))
print(subsequence("UCLA", "The University of California, Los Angeles"))
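The exercise lets us assume s1 has no special regex characters. If that assumption is dropped, one possible variation is to escape each character before joining. The function name subsequence_escaped is hypothetical, and the use of re.DOTALL (so that the gaps may also span newlines) is an extra assumption, not part of the original exercise.

```python
import re

def subsequence_escaped(s1, s2):
    # Escape each character so that '.', '*', '?', etc. are matched literally.
    pattern = '.*'.join(re.escape(ch) for ch in s1)
    # re.DOTALL lets the '.*' gaps skip over newlines in s2 as well.
    return re.search(pattern, s2, flags=re.DOTALL) is not None

print(subsequence_escaped("a.c", "abc"))      # False: '.' must appear literally
print(subsequence_escaped("a.c", "a--.--c"))  # True
```

Without the escaping, "a.c" would be treated as a pattern in which the dot matches any character, so "abc" would be reported as containing it.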
Exercise 2: Write a function split_sentences that takes in a string and returns a list of every sentence in it. An example is below.
(Hint: use a lookbehind assertion.)
test_string = "This is a sentence. This is the next! This is the third?"
# split_sentences(test_string) should return
['This is a sentence.', 'This is the next!', 'This is the third?']
(solution below)
.
.
.
.
.
.
.
.
.
.
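def split_sentences(s):
    # split on whitespace that comes right after sentence-ending punctuation;
    # the lookbehind keeps the punctuation attached to its sentence
    return re.split(r'(?<=[.!?])\s+', s)
split_sentences("This is a sentence. This is the next! This is the third?")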
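A lookbehind split like this has a couple of edge cases worth seeing. The sketch below, using a hypothetical helper named split_sentences_stripped, trims surrounding whitespace and drops empty pieces; it assumes the same three sentence-ending characters and makes no attempt to handle abbreviations.

```python
import re

def split_sentences_stripped(s):
    # Strip first so trailing whitespace does not produce an empty final piece.
    parts = re.split(r'(?<=[.!?])\s+', s.strip())
    # An empty input splits into [''], so filter out empty strings.
    return [p for p in parts if p]

print(split_sentences_stripped("  Hi.  Bye!  "))
# ['Hi.', 'Bye!']
print(split_sentences_stripped("Dr. Smith arrived. He left!"))
# ['Dr.', 'Smith arrived.', 'He left!']
```

The second call shows the limitation: any period followed by whitespace counts as a sentence boundary, so "Dr." becomes its own sentence.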
Exercise 3: Write a function unpack_dict that creates dictionaries from a string without using eval or exec. You may assume all the keys are strings, all the values are integers, and the dictionary format is {key => value}. Examples are given below. You may make any further assumptions about the format as long as your function works for all the examples given. (But try to make sure it works for more than just these, too!)
test_string = \
"""{'a' => 1, 'b' => 2, 'c' => -3}
{'key' => 10}
{'home' => 12, 'away' => 15}"""
#running unpack_dict(test_string) should return
[{'a': 1, 'b': 2, 'c': -3}, {'key': 10}, {'home': 12, 'away': 15}]
(solution below)
.
.
.
.
.
.
.
.
.
.
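def unpack_dict(s):
    dicts = []
    # each line of the input holds one dictionary
    for line in s.splitlines():
        # capture the quoted key and the (possibly negative) integer value
        pairs = re.findall(r"'([^']*)' => (-?\d+)", line)
        dicts.append({p[0]: int(p[1]) for p in pairs})
    return dicts
unpack_dict(test_string)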
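To make the parser work for more than just the given examples, one option is to tolerate irregular spacing around => and to skip blank lines. This is only a sketch under those assumptions; PAIR and unpack_dict_flexible are hypothetical names, and keys are still assumed to be single-quoted strings without embedded quotes.

```python
import re

# Quoted key, optional whitespace around '=>', optionally negative integer value.
PAIR = re.compile(r"'([^']*)'\s*=>\s*(-?\d+)")

def unpack_dict_flexible(s):
    return [
        {key: int(value) for key, value in PAIR.findall(line)}
        for line in s.splitlines()
        if line.strip()  # skip blank lines instead of emitting empty dicts
    ]

print(unpack_dict_flexible("{'a'=>1,  'b'  =>  -2}\n\n{'key' => 10}"))
# [{'a': 1, 'b': -2}, {'key': 10}]
```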
Exercise 4: Write a class Phonebook that allows us to make objects that extract and store US phone numbers from text. We want to have a find_numbers() method that takes in a string and finds all the phone numbers that appear anywhere in the string. We also want a numbers() method that returns a list of every phone number found so far. Do not include duplicates.
You may assume whichever format you like for phone numbers, but try to make the code as broad as possible. As a suggestion, try to include these formats:
310-555-5555
310 555-5555
310 555 5555
(310)-555-5555
(310) 555-5555
(310) 555 5555
(solution below)
.
.
.
.
.
.
.
.
.
.
class Phonebook:
    def __init__(self):
        self.phone_numbers = set()  # using a set means we do not have to worry about duplicates
    def numbers(self):
        return list(self.phone_numbers)
    def find_numbers(self, s):
        # use capture groups so we can reformat the phone numbers to make sure
        # that we don't accidentally store a number in two different formats
        for phone_num in re.findall(r'(?:(\d{3})|\((\d{3})\))[ -](\d{3})[ -](\d{4})', s):
            # exactly one of the first two groups is non-empty, so concatenating them gives the area code
            self.phone_numbers.add(f'{phone_num[0]+phone_num[1]}-{phone_num[2]}-{phone_num[3]}')
        # an alternative is to use the regex '(\d{3}|\(\d{3}\))[ -](\d{3})[ -](\d{4})',
        # which is a little easier to read, but then you have to check if phone_num[0]
        # has () surrounding it and remove them
p = Phonebook()
p.find_numbers('310-555-5555')
p.find_numbers('My phone number is (310) 444 4444. Yours is (310) 555 555, right?')
p.find_numbers('310 333 3333')
print(p.numbers())
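A pattern like the one above can also match inside a longer run of digits, for example the tail of 1310-555-55556. One possible refinement, sketched here with the hypothetical names PHONE and extract_numbers, adds a negative lookbehind and lookahead so a match cannot begin or end in the middle of a digit run. The six formats listed in the exercise are assumed to be the only ones of interest.

```python
import re

# (?<!\d) and (?!\d) stop matches from starting or ending inside a longer digit run.
PHONE = re.compile(r'(?<!\d)(?:(\d{3})|\((\d{3})\))[ -](\d{3})[ -](\d{4})(?!\d)')

def extract_numbers(text):
    found = set()
    for plain, parenthesized, prefix, line in PHONE.findall(text):
        # Exactly one of the two area-code groups is non-empty.
        found.add(f'{plain or parenthesized}-{prefix}-{line}')
    # Sort so the output order is deterministic, unlike iterating a set directly.
    return sorted(found)

print(extract_numbers('Call (310) 555 5555 or 310-555-5555, not 1310-555-55556.'))
# ['310-555-5555']
```

Returning a sorted list is a design choice for this sketch: converting a set to a list gives no guaranteed order, which can make printed results differ between runs.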