Algorithms Weekly by Petr Mitrichev: A 39 vs 17 week

Codeforces Round 959 sponsored by NEAR was the main event of last week (problems, results, top 5 on the left, analysis). AWTF participants have probably already returned home and were ready to fill the top five places in this round, but Egor has managed to solve everything and squeeze in the middle of their party. Congratulations to Egor on doing this and to tourist on winning the round!

In my previous summary, I have mentioned two problems. The first one came from AWTF: you are given n<=250000 slimes on a number line, each with weight 1. You need to choose k of them, all others will be removed. Then, they will start moving: at each moment in time, every slime moves with velocity r-l, where r is the total weight of all slimes to the right of it, and l is the total weight of all slimes to the left of it. When r-l is positive, it moves to the right, and when it is negative, it moves to the left. Since this rule pushes the slimes towards each other, sometimes they will meet. When two or more slimes meet, they merge into one slime with weight equal to their total weight, which continues to move according to the above rule. Eventually, all slimes will merge into one stationary slime, suppose this happens at time t. What is the maximum value of t, given that you can choose the k slimes to use freely?

Even though the problem does not ask for it, it is actually super helpful to think where will the final big slime be located. For me, this would have probably been the hardest part of solving this problem. Why should one think about the final position if the problem only asks about the final time?..

After studying a few examples, one can notice that the final position is always equal to the average of the starting positions of the slimes. Having noticed this, it is relatively easy to prove: consider the function sum(a_i*w_i) where a_i is the position of the i-th slime, and w_i is its weight, and consider two slimes (a_i,w_i) and (a_j,w_j). The first one contributes -w_i to the velocity of the second one, while the second one contributes w_j to the velocity of the first one. Therefore together they contribute -w_i*wj+w_j*w_i=0 to the velocity of value sum(a_i*w_i), therefore that sum stays constant. And it also does not change when two slimes merge, therefore it is always constant and has the same value for the final big slime.

Now is the time for the next cool observation, this one is a bit more logical though. Consider the last two slimes that merge. They split the original slimes into two parts. What happens to the weighted averages of the positions of those two parts sum(a_i*w_i)/sum(w_i)? By the same argument as above, influences within each of the parts on that average cancel out. The influences of the parts on each other do not cancel out though, but we can easily compute them and find out that those two averages are moving towards each other with constant velocity equal to the total weight, in other words k.

Therefore if we know which two parts form the two final slimes we can find the answer using a simple formula: (sum(a_i*w_i)/sum(w_i)-sum(a_j*w_j)/sum(w_j))/k, where j iterates over the slimes in the first part, and i iterates over the slimes in the second part.

And here comes the final leap of faith, also quite logical: the answer can actually be found by finding the maximum of that amount over all ways to choose a prefix and a suffix as the two parts. This can be proven by noticing that the difference between averages starts decreasing slower than k if some slimes merge across the part boundary, therefore the number we compute is both a lower bound and an upper bound on the actual answer.

We have not yet dealt to the fact that we have to choose k slimes out of n, but seeing the above formula it is pretty clear that we should simply take a prefix of all available slimes for the sum over j, and a suffix of all available slimes for the sum over i. Now all pieces of the puzzle fit together very well, and we have a full solution.

The second problem I mentioned came from Universal Cup: you are given a string of length n<=500000. We choose one its arbitrary non-empty substring and erase it from this string, in other words we concatenate a prefix and a suffix of this string with total length less than n. There are n*(n+1)/2 ways to do it, but some of them may lead to equal strings. How many distinct strings can we get?

A natural question to ask is: how can two such strings be the same? If we align two different ways to erase a substring of the same length that lead to the same result, we get the following two representations of the same string:

abcd=aebd

where in one case we delete the substring c, and in the other case we delete the substring e to obtain the result abd. We can notice that such pairs of equal prefix+suffix concatenations are in a 1:1 correspondence with pairs of equal substrings within the string (two occurrences of the substring b in the above example). It is not true though that our answer is simply the number of distinct substrings, as we have merely proven that the sum of c*(c-1)/2 over all repeat quantities c of equal substrings is the same as the sum of d*(d-1)/2 over all repeat quantities d of equal prefix+suffix, but that does not mean that the sum of c is the same as the sum of d.

However, this makes one think that probably the suffix data structures, such as the suffix tree, array or automaton, will help solve this problem in the same way they help count the number of distinct substrings. This turns out to be a false path, as the actual solution is much simpler!

Consider the middle part of the above equation (bc=eb), and let us write out an example with longer strings for clarity:

aba~~caba~~
~~abac~~aba

We can notice that the following are also valid ways to obtain the same string:

aba~~caba~~
ab~~acab~~a
a~~baca~~ba
~~abac~~aba

So the structure here is simpler than in counting distinct substrings. In order to count only distinct prefix+suffix strings, let us count only canonical respresentations, for example those where the prefix is the longest. The criteria for when we cannot make the prefix even longer is evident from the above example: the next character after the prefix (the one that is removed) must be different from the first character of the suffix, if any. Therefore the answer is simply equal to the number of pairs of characters in the given string that differ, plus n to account for the representations where the suffix is completely empty.

Thanks for reading, and check back next week!

Algorithms Weekly by Petr Mitrichev

Wednesday, July 24, 2024

A 39 vs 17 week

No comments:

Post a Comment