Sunday, August 27, 2023

An onsite week

Harbour.Space Scholarship Contest 2023-2024 on Codeforces was the main event of the week (problems, results, top 5 on the left, discussion, analysis). Radewoosh solved the first eight problems 20 minutes faster than everybody else, and therefore had all but guaranteed himself first place by the middle of the contest. Still, he kept his focus and managed to get problem I accepted as well with just 3 minutes to go. Huge congratulations!

I skipped this round and therefore cannot tell you much about its problems. So I'd like to use this post to talk about an upcoming onsite event that you might not be aware of :) I'm not referring to the IOI, which takes place next week in Szeged, Hungary, since I expect everybody interested in it to know where to find more information. Instead, I'd like to talk about the AtCoder WTF that takes place on September 8-9 in Tokyo, Japan.

When AtCoder announced its own onsite competition, the World Tour Finals, back in 2017, the world of onsite algorithmic competitions was quite packed. They managed to host one edition of the WTF before the pandemic hit. All onsite competitions were cancelled or moved online during the pandemic, but instead of returning after global travel became possible again, most were cancelled completely: Google Code Jam, TopCoder Open, likely Meta Hacker Cup. There were a few more Russia-based onsite competitions that are of course no longer a thing. ACM ICPC and high school olympiads such as the IOI are back in onsite mode, but they have age upper bounds for participation.

AtCoder did not move the WTF online; instead, they kept running the yearly qualification cycle, promising to invite the top 8 from each year to an onsite round in Japan whenever it became possible again. This is finally happening in two weeks! 18 finalists out of the maximum of 32 from the four qualification cycles will compete on the same problems over two competition days. This approach stands in contrast to the ACM ICPC one, where the finals for the 2022 and 2023 seasons are happening together in Egypt but the actual competitions will be kept separate. Which one do you think makes more sense?

I am really excited both for the competition (where, to be honest, I no longer stand a chance) and for the onsite event atmosphere. As an added twist, AtCoder have announced an online participation option for future WTFs; I can't predict how this will work out, but when TopCoder did the same for TCO22, so few people chose to come onsite that the onsite event was cancelled altogether. Therefore this might be the last ever onsite event without an age upper bound :)

Thanks for reading, and check back next week!

Sunday, August 20, 2023

An arborescence week

There was no contest that I'd like to mention this week, so let us come back to the AtCoder problem from the previous summary: you are given a connected graph where each edge is either red or blue, and each vertex is also either red or blue. You need to find a spanning tree of this graph such that each vertex has at least one adjacent tree edge of the same color as the vertex, or report that there isn't one.

The first step is to notice that every edge belongs to one of three types: type 2 edges have both endpoints of the same color as the edge, type 1 edges have exactly one endpoint of the same color as the edge, and type 0 edges have no endpoint of the same color as the edge. We will also direct the type 1 edges from the endpoint of the correct color to the endpoint of the wrong color.

Clearly type 2 edges are the most useful for reaching our goal. In fact, if every vertex has at least one type 2 edge adjacent to it, then we can simply find the spanning forest among the type 2 edges. If it is a spanning tree, then we have solved the problem. If it is not connected, since we have already satisfied the color constraints, we can augment the forest to a tree using the remaining edges in any way.

What if there are some vertices that do not have type 2 edges adjacent to them? If there is a vertex with no outgoing type 1 edges, or in other words a vertex where all adjacent edges have the wrong color, then there is clearly no solution. So we only have to consider the case where some vertices have type 2 edges adjacent to them, and all remaining vertices have at least 1 outgoing type 1 edge. We will call the vertices with type 2 edges adjacent to them type 2 vertices, and the ones without them type 1 vertices.

The key remaining observation is that we cannot solve the problem using only type 1 edges: each type 1 edge satisfies the color constraint for exactly one vertex, but we have n vertices and only n-1 tree edges. So a logical idea is to start with the spanning forest of the type 2 edges, which covers all type 2 vertices, and then repeatedly find a type 1 edge that goes from a yet uncovered vertex to an already covered vertex (building something similar to an arborescence). If we manage to cover all vertices this way, then we can once again augment the resulting forest to a tree using the remaining edges. If we get stuck at some point, then there is no solution.
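Here is a minimal Python sketch of this procedure (my own illustration, not the official solution), assuming vertices are numbered 0..n-1, colors[v] is the color of vertex v, and edges is a list of (u, v, c) triples where c is the edge color. The repeated scan in step 2 is quadratic; a queue of newly covered vertices would make it linear, but this version keeps the structure visible.

def find_spanning_tree(n, colors, edges):
    parent = list(range(n))              # union-find over the growing forest
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        parent[ra] = rb
        return True

    tree = []
    covered = [False] * n
    # step 1: spanning forest of the type 2 edges; both endpoints become covered
    for u, v, c in edges:
        if colors[u] == c and colors[v] == c:
            covered[u] = covered[v] = True
            if union(u, v):
                tree.append((u, v))
    # step 2: repeatedly take a type 1 edge directed from a yet uncovered vertex
    # (its correct-color endpoint) into an already covered vertex
    changed = True
    while changed:
        changed = False
        for u, v, c in edges:
            for a, b in ((u, v), (v, u)):
                if colors[a] == c and colors[b] != c and not covered[a] and covered[b]:
                    covered[a] = True
                    union(a, b)          # a was still isolated in the forest, so this merges
                    tree.append((a, b))
                    changed = True
    if not all(covered):
        return None                      # some vertex can never get an edge of its own color
    # step 3: augment the forest to a spanning tree using any remaining edges
    for u, v, c in edges:
        if union(u, v):
            tree.append((u, v))
    return tree                          # the graph is connected, so this has n-1 edges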

But why is this correct? We have proven that we need at least one type 2 edge, but why is it always correct to take (essentially) all of them? This is a very typical situation for problems involving spanning trees, as they usually end up relying on some kind of greedy argument, so one could just bet on intuition here in my opinion. Still, here is the formal proof: consider the situation where our solution gets stuck. At this moment, the set S of uncovered vertices contains only type 1 vertices, and there is no type 1 edge going from this set to the set of covered vertices. But then how could S be covered in any other solution? The vertices of S have no adjacent type 2 edges, and every type 1 edge leaving S covers a vertex outside S, so only edges within S can cover the vertices of S, each of them covering at most one. Since any tree contains at most |S|-1 edges within S, one of the vertices of S will have to remain uncovered in any spanning tree.

Thanks for reading, and check back next week!

Sunday, August 13, 2023

An apiad week

AtCoder Grand Contest 064 was the main event of the week (problems, results, top 5 on the left, analysis, my screencast, race standings). apiad's usual strategy worked spectacularly well this time: after spending two hours on E, he was able to solve just enough easier problems to win the round, while those who started with the four easier problems did not have two hours remaining for E and could not figure out all the details. Congratulations on the victory!

I was one of those starting with the four easier problems. Somewhat unexpectedly, I did not get stuck on them and did actually have a bit more than an hour remaining for E, and made some significant progress on paper: I realized that if the sum of all a_i is zero, the sum of all b_j is zero, and we can place the n^2 negated sums -(a_i+b_j) in the matrix in such a way that every column and row has exactly one of each a_i and b_j, then the sum of the 2n-1 cells in a cross, which is the sum of the row plus the sum of the column minus the middle cell, will be exactly 0+0-(-(a_i+b_j))=a_i+b_j. I then realized that if the sum of all a_i plus the sum of all b_j is divisible by 2n-1, then we can make both of those sums equal to zero with some constant shifts. Finally, I correctly hypothesized that in case it's not divisible by 2n-1, we can only achieve score n^2-1, and figured out how to do so with some more constant shifts for the first row and the first column.

Therefore the only remaining step was to place the n^2 negated sums -(a_i+b_j) in the matrix in such a way that every column and row has exactly one of each a_i and b_j. This does not depend on the values of a and b, so this simply needs to be done once for every n. It seemed quite doable as it felt like a more advanced derangement, and derangements are quite dense. I've tried some random placements and noticed that I can find such a placement for odd n, but not for even n. And this is pretty much how far I've got in an hour :)
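For what it's worth, for odd n there is a simple explicit placement (this is only my illustration of the placement step, not a full solution): put -(a_{(i+j) mod n} + b_{(i-j) mod n}) into cell (i, j). Every row and every column then contains each a-index and each b-index exactly once, and since 2 is invertible modulo an odd n, all n^2 index pairs are distinct.

def placement(n):
    # cell (i, j) holds the pair of indices (k, l), meaning the value -(a_k + b_l)
    assert n % 2 == 1, "this construction only works for odd n"
    return [[((i + j) % n, (i - j) % n) for j in range(n)] for i in range(n)]

for row in placement(3):
    print(row)
# [(0, 0), (1, 2), (2, 1)]
# [(1, 1), (2, 0), (0, 2)]
# [(2, 2), (0, 1), (1, 0)]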

Judging from the editorial (which I do not understand), it seems that for even n we need to use more degrees of freedom. So while I enjoyed coming up with my approach, I still needed another huge step to finish the solution, and therefore at least another hour.

I found problem B to be a cute test of how well one understands the spanning tree mechanics (it is amazing how many nice problems can be derived from a relatively simple concept of a spanning tree, and from the fact that it can be found greedily!): you are given a connected graph where each edge is either red or blue, and each vertex is also either red or blue. You need to find a spanning tree of this graph such that each vertex has at least one adjacent tree edge of the same color as the vertex, or report that there isn't one. Can you see a way to do it?

Thanks for reading, and check back next week!

Friday, August 4, 2023

A power of two week

There were no contests that I'd like to mention this week, but we still have two cool problems C from the previous summary to discuss. First, let us talk about a Codeforces problem: you have a set S of integers between 1 and M (M<=500). In one operation, you pick an element of the set uniformly at random, and increment it by 1. If the incremented element coincides with another element of the set, we keep only one copy, so the size of the set decreases by 1. If the incremented element exceeds M, we discard it, so the size of the set also decreases by 1 in that case. We are given the starting set S, and need to find the expected number of steps before it becomes empty.

When I was thinking about this problem, I convinced myself that any solution would need to keep track of the number of elements in the set S, since we divide the probabilities by that number on each step, and unlike multiplication, division does not lend itself to nice linearity of expectation arguments. Well, it turns out it does :)

Consider two adjacent numbers from the initial set S. How do they move? Well, either the lower number increases by 1, in which case it might actually merge with the higher number, or the higher number increases by 1, in which case it might disappear if it exceeds M, and then the lower number will also disappear after a few more steps. What is the expected number of steps that the lower number makes? Here comes the stunning observation: we do not actually need to know anything about the rest of the set S to compute this expected number of steps! No matter what happens to the rest of the set, every time one of those two numbers moves, it will be the lower number with probability 50%, and the higher number with probability 50% (if the higher number merges into its own upper neighbor, we simply keep tracking whichever element is currently immediately above the lower number; it is still a single element sitting at the same position). So we can implement a relatively standard dynamic programming solution to compute the expectation.

And then, as you have probably guessed by this point, thanks to the linearity of expectation the answer to the problem can be computed as the sum of those expected numbers of steps over all pairs of adjacent numbers in the input, plus the expected number of steps for the highest number (which is just M+1 minus that number).
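Here is a minimal Python sketch of the resulting computation, as I understand it (my own reconstruction, using exact fractions instead of arithmetic modulo 1000000007). E[p][q] is the expected number of further increments made by the lower element of a pair, given that it currently sits at position p and the element currently just above it sits at position q.

from fractions import Fraction

def expected_steps(S, M):
    # E[p][q]: expected further increments of the lower element at position p
    # while the element currently just above it sits at position q (p < q <= M+1)
    E = [[Fraction(0)] * (M + 2) for _ in range(M + 2)]
    for p in range(M, 0, -1):
        E[p][M + 1] = Fraction(M + 1 - p)   # neighbour discarded: walk alone up to M+1
        for q in range(M, p, -1):
            # conditioned on one of the two being picked, each moves with probability 1/2;
            # if the lower one reaches q, the pair merges and the lower one stops moving
            after_lower = Fraction(0) if p + 1 == q else E[p + 1][q]
            E[p][q] = Fraction(1, 2) * (1 + after_lower) + Fraction(1, 2) * E[p][q + 1]
    S = sorted(S)
    answer = Fraction(M + 1 - S[-1])        # the maximum walks straight past M
    for lo, hi in zip(S, S[1:]):            # each non-maximal element paired with its neighbour
        answer += E[lo][hi]
    return answer

print(expected_steps([1, 2], 2))   # 5/2, matching a direct case analysis of this tiny instance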

The really unexpected aspect of this solution for me is that we end up only ever dividing by 2, even though on the surface it looks like we need to divide by numbers up to |S|. Having learned this from the editorial, I immediately remembered one thing that I considered doing during the contest: can we try to reconstruct the answers to the sample cases in the fraction form from the values modulo 1000000007 that are given in the problem statement (750000009, 300277731 and 695648216)? Knowing the solution, I realized that those fractions would have a power of two in the denominator, and that might have been just the clue I needed to solve this problem.

I've tried doing this now, and it turns out this reconstruction is not so easy. Just trying to find a fraction with the smallest numerator and denominator yields:

750000009 = 15/4
300277731 = 26062/39247
695648216 = 35459/32906

Which is clearly incorrect for the last two cases, as the answer cannot be less than 1 or so close to 1. Adding some reasonable boundaries on the value of the fraction (it must be between 10 and 50 for the last two cases), we get:

750000009 = 15/4 = 3.75
300277731 = 133137/8642 = 15.40580884054617
695648216 = 133345/10938 = 12.19098555494606

Which actually looks plausible, but is still incorrect as we know from the above. However, we know from the problem statement that the denominator cannot have prime factors greater than |S|, which in our case is 5. Adding that constraint produces:

750000009 = 15/4 = 3.75
300277731 = 1241063/65536 = 18.937118530273438
695648216 = 582323/32768 = 17.771087646484375

Which looks even more plausible, to the point that it might be correct (I did not check), and in any case might have delivered the key hint about only ever dividing by 2.
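For the record, here is a small sketch of such a reconstruction restricted to power-of-two denominators, with hand-picked value bounds (this is a narrower constraint than the "no prime factors greater than 5" one above, so it is only useful once you already suspect the powers of two):

from math import gcd

MOD = 1_000_000_007

def power_of_two_candidates(r, lo, hi):
    # fractions num / 2^k with lo <= num / 2^k <= hi and num / 2^k = r modulo MOD,
    # reported in lowest terms; once hi * 2^k reaches MOD the check stops being
    # meaningful, so we cut off there
    out = set()
    k = 0
    while (hi << k) < MOD:
        den = 1 << k
        num = r * den % MOD       # if the true numerator is below MOD, this recovers it exactly
        if lo * den <= num <= hi * den:
            g = gcd(num, den)
            out.add((num // g, den // g))
        k += 1
    return sorted(out)

print(power_of_two_candidates(750000009, 1, 50))    # [(15, 4)], i.e. 3.75
print(power_of_two_candidates(300277731, 10, 50))   # should include (1241063, 65536) from the table above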

Now, some questions to the reader :) Do you routinely try to convert the sample answers from the "modulo prime" form to the fraction form? Did it ever help you solve a problem? Do you have some tricks on how to do this more reliably? If the problemsetters are reading this, did you consider this trick when choosing the samples for this problem?

Now, on to an AtCoder problem: you are given N<=1000 pairs of integers (A_i,B_i), each integer between 0 and 10^9. In one operation, you choose two integers x<y between 0 and 10^18, and replace each A_i with (A_i+x) mod y (note that you apply the same x and y to all A_i). You need to make it so that A_i=B_i for all i after at most N operations, or report that it is impossible.

During the round, I guessed correctly that this is almost always possible, except for cases like A_1=A_2 but B_1≠B_2. However, I could not find a nice way to reorder the numbers. I was thinking mostly about operations where y is bigger than the biggest number, to have at least some control over what is happening. In that case, by choosing the appropriate x, we can split the numbers into two parts and interleave those parts, but that does not give us enough control to achieve an arbitrary ordering.

One other thing that came to mind when thinking about this interleaving operation is that it can only make the numbers closer to each other, but not pull them apart. However, when no interleaving actually happens, in other words when we just do a cyclic shift of the numbers, then we are not reordering, but we can pull the two halves apart as much as we'd like.

This is how far I've got during the round, and it turns out that this is the right direction, but I missed one final trick. Here is a working plan: in the first N-1 operations, we do N-1 cyclic shifts, which allows us to create almost arbitrary gaps between pairs of adjacent numbers while still keeping the cyclic order intact. And then in the N-th operation we do a massive reordering and everything falls into place.
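As a toy illustration of the single operation (with made-up small numbers): when y is larger than every shifted value, nothing wraps around and we get a plain shift; when exactly a suffix wraps around, the cyclic order of the numbers stays intact while the gaps between them change, which is the freedom the first N-1 operations exploit.

def apply_op(A, x, y):
    # one operation from the problem statement: add x to everything, then reduce modulo y
    return [(a + x) % y for a in A]

A = [3, 10, 25]
print(apply_op(A, 5, 100))    # [8, 15, 30]: nothing wraps around, a plain shift
print(apply_op(A, 80, 100))   # [83, 90, 5]: 25 wrapped around to the front; the cyclic
                              # order 3 -> 10 -> 25 is preserved, but the gaps changed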

Knowing the plan, it is relatively easy to figure out the details. Assuming the last operation uses x=0 and some y=Y that is bigger than all B_i, we just need to make A_i=Y*k_i+B_i before that operation. And now we can choose the k_i in such a way that the cyclic order of those A_i coincides with the initial cyclic order of the A_i; moreover, we can choose any particular cyclic shift of this cyclic order, which then makes planning the first N-1 operations easy. You can find more details in the editorial.

However, it is still unclear to me how I could have arrived at this final trick without just "seeing" it out of nowhere. Even though the operation used in this problem is simple, the space of ways to use it is endless, and it was hard for me to narrow down the search. For many problems, it is more or less clear where the difficulty lies and which resource you must use optimally to arrive at the solution, and this helps cut less promising directions quickly. However, in problems like this one, one cannot be sure that a direction is correct until seeing a full solution.

To the 174 people who solved this problem during the round: how did you arrive at the solution? Is there some intermediate reasoning step that makes the last leap of faith smaller, or is it just small enough for you as it is?

Thanks for reading, and check back next week!