Algorithms Weekly by Petr Mitrichev

Monday, April 13, 2020

A cheese week

Codeforces hosted its Round 631 during the Mar 30 - Apr 5 week (problems, results, top 5 on the left, analysis). I had a relatively good start on the first four problems, but could not make progress in the last one and switched to challenging with about 20 minutes remaining, also without success. Um_nik, on the other hand, had an even better start and did not give up so easily on E, earning the ultimate reward of being the only contestant to solve all problems. Great job!

When solving problem C, I could not come up with a solution analytically for quite some time, so I've used the adage "so many people got it accepted, the solution must be simple!" and started thinking in terms of what a simple solution might look like. I've came up with the [ultimately] correct greedy very quickly, but could not prove that it works. After some more time, I've decided to just implement and submit it, relying on pretests and the adage to help :)

When this approach works, it is satisfying, but I feel that using it might be detrimental in the long run: it greatly reduces the motivation when solving the problems analytically, the opportunity to just start submitting good-looking solutions is too tempting. Or maybe it really is a part of the game?

Thanks for reading, and check back for more!

Saturday, April 11, 2020

A Geronimo week

The Mar 23 - Mar 29 week had two events on the weekend. TopCoder SRM 782 was the first one to go (problems, results, top 5 on the left, TCO stage 2 results, analysis). With about 30 seconds remaining in the challenge phase I've noticed that I need one more successful challenge for the first place, but I did not find any bugs. Therefore I decided to guess which solution is most likely to have a bug, and to challenge it with a large testcase that I prepared earlier. I was extremely lucky that this strategy worked, given that the solution I chose — Egor's 1000 — turned out to be the only remaining incorrect solution in the room at that point :)

Open Cup 2019-20 Grand Prix of Wroclaw took place on Sunday (results, top 5 on the left, analysis). Since Open Cup is a (multi-)onsite team contest, this required some big changes, as teams were asked to all participate online without even gathering a team together in one place, instead using audio/video conferencing and coordinating to use at most one computer at the same time. Team USA1 has been participating like this all season and therefore maybe had a slight edge over the competitors, winning by 11 problems to 10. Well done!

Team HDU-Coach in the second place is doing very well this season in general, however unlike other top teams their team members are not listed in the standings. Does anybody know their Codeforces handles, for example?

In my previous summary, I have mentioned an AtCoder problem: you are given a sequence of n<=10⁶ numbers, each number is 1, 2 or 3. Then, you create a new sequence of length n-1 by taking the absolute values of the differences between adjacent numbers. Then you create a new sequence of length n-2 from the sequence of length n-1 in the same way, and so on until there's just one number in the sequence. What will this number be?

First of all, it's convenient to subtract 1 from all numbers in the input so that we have 0, 1 and 2 — this will not change the differences.

Then, let's assume the input sequence contains just 0s and 1s. The absolute difference in this case is the same as addition modulo 2, so for example starting from five numbers a, b, c, d, e, we'll get: a+b, b+c, c+d, d+e; then a+2b+c, b+2c+d, c+2d+e; then a+3b+3c+d, b+3c+3d+e; and finally a+4b+6c+4d+e (everything modulo 2). We can now notice (and prove) that the coefficient next to the k-th input number is equal to C(k-1,n-1), and thanks to Lucas's theorem we know it's equal to 1 modulo 2 when (k-1)&(n-1)=k-1 (& is bitwise and), and to 0 modulo 2 otherwise, which allows us to solve this case by summing the input numbers multiplied by the corresponding coefficients.

What can we do if the input sequence also contains 2s? First of all, we can notice that if we declare that 0 and 2 are the same, then our operation is still equivalent to addition modulo 2! This allows us to determine if the final number is 1, or if it's 0/2. If it is 1, we're done, but how to decide between 0 and 2 in the other case?

It turns out that if the input sequence contains a 1, then the final number will never be 2. To see that, we can notice that if the sequence contains at least one 1, and at least one 0/2, then two of those will be adjacent, and therefore the next step will also contain at least one 1. So the only way to get rid of all 1s if there were any is to get a sequence with all numbers equal to 1, in which case all following steps will have all numbers equal to 0. In any case, if the input sequence contains at least one 1, the final number will always be a 0 or a 1, therefore the above approach solves this case.

The only remaining case is: the input sequence contains only 0s and 2s. In this case we can just divide all numbers by 2 to reduce the problem to the one with only 0s and 1s, solve it and then multiply its answer by 2.

I found it very enjoyable that such simple problem statement requires one to unwrap multiple layers to solve it, even if each layer by itself is not very hard.

Thanks for reading, and check back for more!

A topology week

The Mar 16 - Mar 22 week had competitions from the three main platforms. Codeforces was the first to go, with its Global Round 7 on Thursday (problems, results, top 5 on the left, analysis). Um_nik was the first to finish solving the problems, but tourist had beaten him to the first place because he solved the problem F1, which is a reduced-constraint version of problem F2, separately and much faster, while Um_nik just solved F2 directly and got F1 accepted at the same time with a lower score. Congratulations to both!

TopCoder SRM 781 followed in the middle of the night on Friday (problems, results, top 5 on the left, analysis). My lead in the TCO qualification before this round was considerable at 31 points to 24, so I've decided to not wake up at 2 in the morning, even though mathematically I could still lose my spot if I do really badly in SRM 782. Congratulations and big thanks to Egor who made sure the five points did not go to one of the 24-point pursuers :)

Finally, AtCoder Grand Contest 043 marked the start of a new season there (problems, results, top 5 on the left, analysis). apiad started with the hardest problem F and got it accepted with 47 minutes remaining in the contest, probably realizing that others are not likely to get it. There were therefore a few paths to the first place open for him. Solving B+C+D in 47 minutes was probably the most realistic by looking at the scoreboard, but he went for the very interesting and unusual problem E (D+E would be enough for the first place) and didn't manage to get it in time. tourist was the fastest of those on a more traditional problem solving order, and won the round with a 11-minute gap. Well done!

There were several nice problems in this round, so let me highlight an easier problem for a change, problem B: you are given a sequence of n<=10⁶ numbers, each number is 1, 2 or 3. Then, you create a new sequence of length n-1 by taking the absolute values of the differences between adjacent numbers. Then you create a new sequence of length n-2 from the sequence of length n-1 in the same way, and so on until there's just one number in the sequence. What will this number be?

Thanks for reading, and check back for more!

A monotonic week

The Mar 9 - Mar 15 week was light on contests, so let's come back to the Codeforces problem from the previous summary: you are given a sequence of at most 500000 numbers. In one step, you simultaneously replace every number except the first one and the last one with the median of the following set: this number itself, the number on the left, and the number on the right. For example, if you start with 3 1 4 2 5, then after one step you get 3 3 2 4 5, after two steps it becomes 3 3 3 4 5, and then it does not change anymore. What will be the stable state of the sequence, and how many steps are needed to reach it?

Since I did not solve the problem myself, I will describe the solution from the tutorial. First, let's assume that the input sequence has only numbers 1 and 2. Then the process is somewhat straightforward: if there are two or more adjacent numbers that are equal, they will never change; and for each maximal subsegment of alternating numbers, at every step all numbers in it except the first and the last one will flip from 1 to 2 and vice versa, thus reducing the length of the alternating sequence by 2, until it eventually disappears: 12121212 -> 11212122 -> 11121222 -> 11112222 -> 11112222 -> ...

Here comes the most beautiful part of the solution: we can notice that the process is in some sense monotonic. More specifically, if we take the input sequence, replace all numbers that are less than or equal to some boundary b by 1, and all numbers greater than b by 2, then run the process on this sequence of 1s and 2s in parallel to the process on the original sequence, then the relation between them will always stay the same: after any number of steps, all numbers less than or equal to b in the first sequence will correspond to 1s in the second sequence. Since we know how to find the stable state for the sequence of 1s and 2s, we know which numbers in the stable state of the original sequence are less than or equal to b, and which are greater than b!

Since we also know that all numbers in the stable state are equal to some number of the original sequence, doing the above process for all values of b equal to some number of the original sequence allows us to reconstruct the steady state: every number can be determined as the lowest value of b for which it corresponded to a 1.

When implemented naively, this solution would be quadratic which is not good enough. However, we can process the values of b in increasing order, and maintain a set of all alternating sequences of 1s and 2s in the second sequence in a balanced tree. We have n events, in each of those one of the 2s becomes a 1, and in each event at most two alternating sequences adjacent to the place of modification will be affected. The balanced tree allows to find, add and remove those in logarithmic time, therefore we can simulate the entire process in O(n*logn). And in order to construct the answer, we can maintain the set of positions which still have a 2 in the steady state in another balanced tree, and remove appropriate positions from it whenever a new alternating sequence appears, since each alternating sequence produces at most one segment of consecutive 1s in the steady state, which ends up being amortized O(n*logn) as well.

Thanks for reading, and check back for more!

Friday, April 10, 2020

A higher-order week

The Mar 2 - Mar 8 week's contests started with the Ozon Tech Challenge 2020 hosted by Codeforces (problems, results, top 5 on the left, analysis). Only three contestants were able to solve G or H, but Um_nik's F failed due to checking all divisors instead of only prime divisors, and maroonrk's C failed due to trying to squeeze 10⁸ modulo operations in the time limit (replacing the modulo in the inner loop with an if makes it pass). Therefore only tourist was able to keep 7 problems, and he got a clear first place. Congratulations!

Codeforces Round 626 followed on Saturday (problems, results, top 5 on the left, analysis). Once again just 3 contestants were able to solve any of the last two problems, and once again only one of them solved all previous ones. Congratulations to zhouyuyang!

I was unable to solve problem E, but I found the reference solution idea very nice. You are given a sequence of at most 500000 numbers. In one step, you simultaneously replace every number except the first one and the last one with the median of the following set: this number itself, the number on the left, and the number on the right. For example, if you start with 3 1 4 2 5, then after one step you get 3 3 2 4 5, after two steps it becomes 3 3 3 4 5, and then it does not change anymore. What will be the stable state of the sequence, and how many steps are needed to reach it?

TopCoder SRM 780 wrapped up the week (problems, results, top 5 on the left, analysis). The problems were on the easy side this time, and still tourist was able to solve both 450 and 1000 significantly faster than everybody else. Well done!

In my previous summary, I have mentioned an Open Cup problem: you are given a string s of length up to 10, and a positive integer d also up to 10. How many strings consisting of letters A-Z are at the Levenshtein distance of exactly d from s? Compute the answer modulo 998244353.

First of all, how does one compute the Levenshtein distance between the string s and some other string t? This is one of the textbook examples of dynamic programming, and even the Wikipedia article has the formula. We compute the dynamic programming table dp[a,b] which contains the Levenshtein distances between the prefix of s of length a and the prefix of t of length b.

For example, for s="havka" and t="papstvo", we get the following table:
h a v k a
0 1 2 3 4 5
p 1 1 2 3 4 5
a 2 2 1 2 3 4
p 3 3 2 2 3 4
s 4 4 3 3 3 4
t 5 5 4 4 4 4
v 6 6 5 4 5 5
o 7 7 6 5 5 6

In order to compute a row of this dynamic programming, we only need to know the values in the previous row, the string s, and the next character of the string t.

This observation allows us to apply higher-order dynamic programming to count the number of strings t at the given distance from s! We will construct the string t character by character, and the state of our dynamic programming will be the row of the above matrix. For each possible row of values we will compute how many possible choices of a prefix of t yield that row.

How many states does this dynamic programming have? It's not hard to notice that the first value in a row is always equal to the length of our prefix of t, and each consecutive value differs from the previous value by -1, 0 or +1. This follows most directly from the fact that the Levenshtein distance satisfies the triangle inequality. Therefore when s has length n and the maximum length of t is m, the number of states does not exceed m*3ⁿ. In our case n=10, m=20, so we have at most 1.2 million states.

Thanks for reading, and stay safe!

Saturday, April 4, 2020

A random week

The Feb 24 - Mar 1 week had two Codeforces rounds and an Open Cup. The first one was called Kotlin Heroes: Episode 3 and accepted solutions in Kotlin only (problems, results, top 5 on the left, analysis). Only Egor and tourist managed to solve all problems, with the former seemingly writing in Java and converting the code to Kotlin automatically right before the submission, and the latter using more idiomatic Kotlin style. The Java way was the better way this time, in large part thanks to tourist's 6 incorrect submissions. Congratulations to Egor!

The Open Cup round was called the Grand Prix of North America, and it was based on the problems from ICPC North American Championship (problems, results, top 5 on the left, onsite results, video analysis). Team Almost Retired really aced it this time, solving 12 problems from 12 attempts, congratulations! The performance of Gennady Korotkevich is also worth mentioning, as he solved all problems in order despite the fact that two most difficult problems were D and E, most likely because he did not know it would become an Open Cup round when he was solving the online mirror of NAC.

I found problem I educational: you are given a string s of length up to 10, and a positive integer d also up to 10. How many strings consisting of letters A-Z are at the Levenshtein distance of exactly d from s? Compute the answer modulo 998244353.

The second Codeforces round of the week was the Round 625 (problems, results, top 5 on the left, parallel round results with more problems, analysis). ksun48 was the only contestant to solve all problems, and 300iq achieved the same feat in the parallel round. If one excludes the problems A and C from the parallel round that did not appear in Round 625, ksun48 and 300iq would be neck and neck. Congratulations to both!

In my previous summary, I have mentioned two problems. The first one came from the Open Cup: find three points in three-dimensional space with integer coordinates up to 10⁶ that would present a hard testcase for floating-point geometry code. More specifically, let's project these three points to the unit sphere (x/sqrt(x²+y²+z²), y/sqrt(x²+y²+z²), z/sqrt(x²+y²+z²)). The plane passing through those projections must pass very close to the origin, but not exactly through it: the distance from the origin to this plane must not exceed 1.5*10^-19. Moreover, probably for the sake of the numeric stability of the checker, the three points have to be far apart on the unit sphere: at least 1.7 from each other (note that the diameter of the unit sphere is 2).

This problem was solved by Ilya Kornakov in our team, so I'm explaining his solution. First, let's find some way to generate random 3x3 matrices with determinant 1/-1 and values up to a 10⁶ by absolute value. We did it by starting with a random upper triangular matrix with ones or negative ones on the diagonal, and then trying 100 times to add or subtract one of the rows/columns to another row/column (which does not change the determinant), only doing the addition/subtraction if the values stay small enough.

Then, it turns out that such matrix has a reasonable probability of being our answer! To see why, note that the matrix having determinant 1 means that the corresponding pyramid has volume 1/6. Its height can be computed using the formula h=3*volume/base_area. Since volume is 1/6, we have h=0.5/base_area. base_area can be computed as one half of a determinant of a 2x2 matrix with values up to 2*10⁶, so it is up to 4*10¹², therefore its height could be as small as 0.125*10^-12.

Now, the problem asks for the height of a different pyramid: the one obtained by projecting the original three points onto the unit sphere. Let's split this projection into two parts:

Projecting all points into a sphere with radius equal to minimum distance l from origin to one of our three points (one of the points remains unchanged in this step).
Contracting that sphere l times.

The second part reduces the height of the pyramid l times, but what about the first part? We claim that it's very likely that the height of the pyramid decreases during this step! I don't have a strict proof here, but the picture on the left demonstrates two-dimensional analogy: the height AF of triangle ABD is bigger than the height AE of triangle ABC because it intersects the segment BC, and AE is the shortest path from A to BC. This argument breaks if F lands outside of the segment BD (more precisely, the line BD has to intersect the dotted circle, which is an even stronger requirement). Since the problem requires the three points to be at least 1.7 apart on the unit sphere, they form a pretty big triangle and therefore it's likely that the height still passes through it after the projection.

Since l can be up to sqrt(3)*10⁶, the final pyramid could have its height as small as 0.125*10^-12/(sqrt(3)*10⁶) which is approximately 7*10^-20. Our goal is 1.5*10^-19, so we have a ~2x gap, and after a sufficient number of attempts we will find a good example.

Note that the argument above also gives us a way to obtain a numerically stable upper estimate of the height: h=0.5/base_area/l. Computing the required height directly is hard to do in double or even long double, but this estimate is computed very precisely as there's no subtraction of close numbers at all. If this estimate is below 1.5*10^-19, the true height is even lower and therefore is also small enough.

Of course the fact that the first projection above does not increase the height is the most shaky part of this solution. To our defense, I can say that I've ran it with three different random seeds and it got accepted every time :)

What was your approach? Is there a way to avoid shaky arguments?

The second problem I mentioned came from Codeforces: you are given a complete directed graph on n vertices (n<=80), with each arc having its own non-negative cost. Your goal is to find a cyclic route of length exactly k (k<=10) from vertex 1 to itself with the lowest possible cost such that this route does not have sub-cycles of odd length (therefore k is always even).

The route not having sub-cycles of odd length is equivalent to saying that the vertices of the route can be colored with two colors such that the colors always alternate along the route. Since the length of the route is at most 10, it has at most 9 intermediate vertices. Therefore if we pick a random coloring of all vertices, with probability 1/2⁹=1/512 we will guess the right colors for the route from the optimal solution!

This leads us to the solution: let's repeatedly pick a random coloring of all vertices, and find the shortest cycle of length k where the colors alternate. We will do this until the time runs out, and print the shortest cycle that we found among all attempts. Finding the shortest cycle of length k can be done in O(n²*k) in a pretty standard fashion.

Since the probability of failure of each particular attempt is 511/512, if we manage to do t attempts our probability of failure will be (511/512)^t. For t=10000, this gives about 3*10^-9, which is good enough, and 10000 runs of a O(n²*k) algorithm should be fast enough for n<=80 and k<=10.

I find the way randomization helps us make 9 tough decisions pretty amusing :)

Thanks for reading, and check back for more!

Sunday, March 29, 2020

A repeated doubling week

The Feb 17 - Feb 23 week had quite a bit of contest activity. Codeforces Round 621 took place right on Monday (problems, results, top 5 on the left, analysis). MiFaFaOvO, 300iq, ohweonfire and yosupo got their last accepted problem around the same time, but MiFaFaOvO had resubmitted D and was done much faster otherwise, hence the higher score. Well done!

TopCoder SRM 779 followed on Friday (problems, results, top 5 on the left, analysis). I've tried to dig a hole for myself this time, first resubmitting the 450 while my original solution was correct as well, then earning -50 on challenges that put me out of the top 10, only to finally get a +50 and cling to the 10th place (which gives 4 points towards the TCO qualification) by less than one point :)

maroon_kuri on the other hand did everything better, including finding a challenge to jump into the first place. Congratulations!

Open Cup 2019-20 Grand Prix of Zhejiang, also known as Yuhao Du Contest 7, took place on Sunday (results, top 5 on the left). Only this comment made me notice the unusual numbering system for Yuhao Du contests, and it's totally justified :) Team Polish Mafia's 5 solved problems is a truly impressive achievement, well done!

Problem G (which we didn't get accepted during the contest because of a small bug) had no input and challenged one to find three points in three-dimensional space with integer coordinates up to 10⁶ that would present a hard testcase for floating-point geometry code. More specifically, let's project these three points to the unit sphere (x/sqrt(x²+y²+z²), y/sqrt(x²+y²+z²), z/sqrt(x²+y²+z²)). The plane passing through those projections must pass very close to the origin, but not exactly through it: the distance from the origin to this plane must not exceed 1.5*10^-19. Moreover, probably for the sake of the numeric stability of the checker, the three points have to be far apart on the unit sphere: at least 1.7 from each other (note that the diameter of the unit sphere is 2).

VK Cup 2019-20 Elimination Round wrapped up the week (problems, results, top 5 on the left, parallel round results, analysis). Nobody was able to solve all problems either in the official round or in the parallel one, so choosing the right set of problems to tackle was very important this time. tourist did the best job on that front, and won the round convincingly — congratulations!

Problem D looked quite puzzling, but had a very short solution. You are given a complete directed graph on n vertices (n<=80), with each arc having its own non-negative cost. Your goal is to find a cyclic route of length exactly k (k<=10) from vertex 1 to itself with the lowest possible cost such that this route does not have sub-cycles of odd length (therefore k is always even). Can you see the short solution?

In my previous summary, I have mentioned a TopCoder problem: you start in position 0 on a line, and can make jumps of integer size between 1 and k (k<=1000) to the right, increasing your position. You are given the costs of such jumps as c₁, c₂, ..., c_k. What is the minimum cost to reach position m (m<=10⁹)?

The key idea is that if we know the minimum cost to reach positions x-k, x-k+1, ..., x+k then we can determine the minimum cost to reach positions 2x-k, 2x-k+1, ..., 2x+k! The reason for this is: our maximum jump is k, therefore there is always an intermediate stopping point within k/2 of any point in any jump sequence. Consider the jumping sequence that reaches 2x+t, and find its intermediate stopping point that is the closest to its middle point, x+t/2: let this stopping point be x+t/2+u. Since both t/2 and u do not exceed k/2 by absolute value, this point is between x-k and x+k! And the distance from this point to 2x+t is also between x-k and x+k for symmetrical reasons. Therefore if we try to combine all pairs of optimal answers for x-k, x-k+1, ..., x+k, we will obtain all optimal answers for 2x-k, 2x-k+1, ..., 2x+k.

This step works in O(k²), and we can use the repeated doubling approach to solve the entire problem in O(k²*log(m)). It is also remarkable that this problem can in fact be solved in O(k²) if we're guaranteed that m>=k² (and therefore we can achieve O(k²*log(k)) in the general case) by running a shortest paths algorithm on remainders modulo the best step (the one with the smallest cost/length ratio).

Thanks for reading, and check back for more!