Part 1
We have a compression algorithm: our input data contains markers such as (10x2) meaning take the next 10 characters and repeat them 2 times. "Compressed data" can also contain markers, but they should be skipped. We need to get the length of uncompressed file, skipping any whitespace.
Two things to consider: do we really decompress the data or just calculate the length? It's quite possible to calculate without reconstructing the string, eg. marker (15x3) adds 14 characters (3*15 minus the original 15 minus the length of the marker 6). It is however possible that part 2 will require actually decompressing the data. I decided to take my chances and don't decompress.
Second consideration: how to parse the data? Two solutions come to mind immediately. First way, use regular expressions to find possible markers (easy) and - based on their position in the input - decide whether they are real markers or they are inside the data block of another marker. Second way, a state machine approach: parse the input character by character and do a different thing depending on the current state: starting with the state "no special block", after opening bracket switch to state "length", after "x" - "repeat", after closing bracket - "data", and after specified number of characters back to "no special block". This will automatically deal with markers inside data blocks, but is generally more complicated. Let's try the regexp solution.
First, let's prepare two input files: 09-input.txt contains my puzzle input and 09-small.txt contains test strings from the instruction. Now let's read the file, call function to get uncompressed length of each line (stub for now) and sum up everything.
INPUTFILE="09-small.txt"
def get_uncompressed_length(string: str, debug: bool=False) -> int:
return 1
with open(INPUTFILE) as f:
totallength=0
for line in f:
line="".join(line.split())
print(line)
length=get_uncompressed_length(line)
print (f"Uncompressed line length: {length}")
totallength+=length
print(f"Part 1: {totallength}")
That works. Just one explanation, line="".join(line.split())
removes all whitespace from the line. Now to find the markers. Here's the regexp: rx1=re.compile( r"\((\d+)x(\d+)\)" )
. Completely clear, as usual with regular expressions, right? Well, maybe not.
- it starts with
\(
which is an opening bracket and ends with a closing bracket\)
- brackets need to be escaped by backslash because they have a special meaning - then there's
(\d+)
- \d is a digit, + means it can be repeated (but there's at least one), brackets mean to remember it - it will be available as group(1) - then x is just x
- and another group of digits, group(2)
We also need one special case: if the string checked doesn't contain any markers, just return its length, it won't change. If it does, first get the position of the beginning of the marker. Everything before that would be copied as-is, so again the length doesn't change. And then it's length*repeat. Then, we'll have to find the next marker, skipping the data block. But for now let's check if the regular expression is correct.
def get_uncompressed_length(string: str, debug: bool=False) -> int:
marker=rx1.search(string)
if not marker:
return len(string)
# we got first marker
retval=marker.start()
length=int(marker.group(1))
repeat=int(marker.group(2))
if (debug):
print(f"found marker: repeat {length} characters {repeat} times")
retval+=length*repeat
return retval
Great, how do we skip past the data block? With string slicing and iteration.
def get_uncompressed_length(string: str, debug: bool=False) -> int:
retval=0
while string:
marker=rx1.search(string)
if not marker:
retval+=len(string)
break
else:
# we got the marker
retval+=marker.start()
length=int(marker.group(1))
repeat=int(marker.group(2))
if (debug):
print(f"found marker: repeat {length} characters {repeat} times")
retval+=length*repeat
newposition=marker.end()+length
string=string[newposition:]
if (debug):
print(f" searching new string {string}")
return retval
It worked! Let's try with the full data. Also worked!
Part 2
So now we also have to consider the markers inside the data blocks. And the instruction hints that it's better to calculate the size than actually uncompress, as the computer probably doesn't have enough memory. Looks like I made the right decision.
So, all we have to do is a slight modification to the get_uncompressed_length function. We need to actually decompress the data block and call the get_uncompressed_length on it recursively. I'm not uncompressing the whole file at once, so hopefully I have enough RAM. As it turned out, my laptop took several minutes to calculate, but RAM usage was minimal.
Why is it so slow? Let's check: python3 -m cProfile ./09-explosives.py
shows that a lot of time is spent on various regexp functions.
The usual tradeoff: "state machine" approach would run much faster, if written in C - probably less than a second, but also take longer to code.
Source code
#!/usr/bin/python3
import re
INPUTFILE="09-input.txt"
rx1=re.compile( r"\((\d+)x(\d+)\)" )
def get_uncompressed_length(string: str, debug: bool=False, part2: bool=False) -> int:
retval=0
while string:
marker=rx1.search(string)
if not marker:
retval+=len(string)
break
else:
# we got the marker
retval+=marker.start()
length=int(marker.group(1))
repeat=int(marker.group(2))
if (debug):
print(f"Checking {string}")
print(f" found marker: repeat {length} characters {repeat} times")
if (part2):
data_begin=marker.end()
data_end=data_begin+length
newdata=string[data_begin:data_end]*repeat
retval+=get_uncompressed_length(newdata, debug=debug, part2=True)
else:
retval+=length*repeat
newposition=marker.end()+length
string=string[newposition:]
if (debug):
print(f" searching new string {string}")
return retval
with open(INPUTFILE) as f:
part1length=0
part2length=0
for line in f:
line="".join(line.split())
part1length+=get_uncompressed_length(line, part2=False)
part2length+=get_uncompressed_length(line, part2=True)
print(f"Part 1: {part1length}")
print(f"Part 2: {part2length}")