Algorithms Weekly by Petr Mitrichev: A higher-order week

The Mar 2 - Mar 8 week's contests started with the Ozon Tech Challenge 2020 hosted by Codeforces (problems, results, top 5 on the left, analysis). Only three contestants were able to solve G or H, but Um_nik's F failed due to checking all divisors instead of only prime divisors, and maroonrk's C failed due to trying to squeeze 10⁸ modulo operations in the time limit (replacing the modulo in the inner loop with an if makes it pass). Therefore only tourist was able to keep 7 problems, and he got a clear first place. Congratulations!

Codeforces Round 626 followed on Saturday (problems, results, top 5 on the left, analysis). Once again just 3 contestants were able to solve any of the last two problems, and once again only one of them solved all previous ones. Congratulations to zhouyuyang!

I was unable to solve problem E, but I found the reference solution idea very nice. You are given a sequence of at most 500000 numbers. In one step, you simultaneously replace every number except the first one and the last one with the median of the following set: this number itself, the number on the left, and the number on the right. For example, if you start with 3 1 4 2 5, then after one step you get 3 3 2 4 5, after two steps it becomes 3 3 3 4 5, and then it does not change anymore. What will be the stable state of the sequence, and how many steps are needed to reach it?

TopCoder SRM 780 wrapped up the week (problems, results, top 5 on the left, analysis). The problems were on the easy side this time, and still tourist was able to solve both 450 and 1000 significantly faster than everybody else. Well done!

In my previous summary, I have mentioned an Open Cup problem: you are given a string s of length up to 10, and a positive integer d also up to 10. How many strings consisting of letters A-Z are at the Levenshtein distance of exactly d from s? Compute the answer modulo 998244353.

First of all, how does one compute the Levenshtein distance between the string s and some other string t? This is one of the textbook examples of dynamic programming, and even the Wikipedia article has the formula. We compute the dynamic programming table dp[a,b] which contains the Levenshtein distances between the prefix of s of length a and the prefix of t of length b.

For example, for s="havka" and t="papstvo", we get the following table:
h a v k a
0 1 2 3 4 5
p 1 1 2 3 4 5
a 2 2 1 2 3 4
p 3 3 2 2 3 4
s 4 4 3 3 3 4
t 5 5 4 4 4 4
v 6 6 5 4 5 5
o 7 7 6 5 5 6

In order to compute a row of this dynamic programming, we only need to know the values in the previous row, the string s, and the next character of the string t.

This observation allows us to apply higher-order dynamic programming to count the number of strings t at the given distance from s! We will construct the string t character by character, and the state of our dynamic programming will be the row of the above matrix. For each possible row of values we will compute how many possible choices of a prefix of t yield that row.

How many states does this dynamic programming have? It's not hard to notice that the first value in a row is always equal to the length of our prefix of t, and each consecutive value differs from the previous value by -1, 0 or +1. This follows most directly from the fact that the Levenshtein distance satisfies the triangle inequality. Therefore when s has length n and the maximum length of t is m, the number of states does not exceed m*3ⁿ. In our case n=10, m=20, so we have at most 1.2 million states.

Thanks for reading, and stay safe!

Algorithms Weekly by Petr Mitrichev

Friday, April 10, 2020

A higher-order week

1 comment: