Friday, October 24, 2025

A major week

Meta Hacker Cup 2025 Round 1 was the main event of last week (problems, results, top 5 on the left, analysis). As usual in the early Hacker Cup rounds, one has to balance speed against the amount of effort spent testing one's own solutions before submitting them. There is just one attempt per problem, and the submission result is only revealed after the round ends (a pretty unique setup for today's contests, but one that I think makes the competition more exciting and emphasizes useful testing skills!). At the same time, you only need to be in the top n to advance, and your score does not matter for the future rounds. It does matter for the bragging rights though, so well done to Geothermal on being much faster than everybody else!

The Hacker Cup is now the "Major Individual Competition" (according to cphof) that has been held for the most years among those still happening: this one is the 15th edition! TopCoder Open/Invitational is still in the overall lead with 22 editions, but that number is not going to increase further. Huge thanks to the organizers for putting it together year after year, and I'm looking forward to participating in the next 15, and of course to the onsite finals coming back :)

Problem D was quite nice, and you can also solve it as a math problem without any computer help. You are given a string of 0s and 1s (check out the original problem statement for the more colorful story!) of length <=600000. Two players play a game, taking turns. On their move, the first player erases any prefix of the string that ends with a 0; on their move, the second player erases any suffix of the string that starts with a 1. The player who cannot make a move loses. Who will win if both play optimally?
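By the way, if you want to experiment before solving it on paper, here is a small brute force over the stated rules (a throwaway Python sketch of mine, assuming the first player moves first; it is only good for guessing the pattern on short strings):

from functools import lru_cache

def first_player_wins(s):
    # State (i, j) is the remaining substring s[i:j]; first=True means it is the
    # first player's turn.  The first player erases a prefix ending with '0'
    # (i.e. cuts right after some '0'), the second player erases a suffix
    # starting with '1'.  A player with no legal move loses.
    @lru_cache(maxsize=None)
    def win(i, j, first):
        if first:
            return any(not win(k + 1, j, False) for k in range(i, j) if s[k] == '0')
        return any(not win(i, k, True) for k in range(i, j) if s[k] == '1')
    return win(0, len(s), True)

# explore small cases to look for a pattern
for s in ["0", "1", "01", "10", "0110", "1100"]:
    print(s, first_player_wins(s))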

Thanks for reading, and check back next week!

Monday, October 20, 2025

A lifting week

The Universal Cup held its Stage 2: Paris during the Oct 6 - Oct 12 week (problems, results, top 5 on the left). This was one of the very rare occasions where the winner was seemingly not a veteran team but THU1, a team of university students. They were already ahead of the usual suspects on 12 problems by penalty time, and solving the last problem 10 minutes before the end was the icing on the cake. Congratulations!

Codeforces Round 1058 wrapped up the week on Sunday (problems, results, top 5 on the left, analysis, my screencast). As the custom goes these days, I started well on the easier problems but then got stuck, failing to make progress on any of the three remaining problems; I was actually close to solving D2 at the end, but not quite there.

Nobody was able to solve everything this time, but the top 5 shows that many different strategies were competitive. In the end solving the problems fast from left to right was the winning recipe for Benq. Congratulations!

In my previous summary, I mentioned another Codeforces problem: you are given n<=250000 numbers in sorted order. You need to process q<=250000 queries, each query being a segment [l, r] of the numbers. For each query, you need to find the maximum number of elements within the segment [l, r] that can be taken in such a way that for any three taken numbers, the difference between the largest and the smallest of them is more than z. The number z is the same for all queries.

The first step is solving the problem for a single [l, r] query. It is not hard to convince oneself (if we want to have a formal proof, we can use the exchange argument) that we can do it greedily: take the first two numbers in the segment, a_l and a_{l+1}, then take the first number at position l+2 or higher that is bigger than a_l+z, then take the first number to the right of that that is bigger than a_{l+1}+z, and so on.
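For reference, here is a minimal sketch of this greedy for a single query (Python, 0-indexed positions, names are mine). It also doubles as a brute force to stress-test the faster solution against:

import bisect

def solve_one_query(a, l, r, z):
    # Greedy for one [l, r] query (inclusive) on a sorted list a.  After taking
    # a[l] and a[l+1], each new number must lie to the right of the previously
    # taken one and exceed the number taken two steps earlier by more than z.
    # Runs in O(answer * log n) per query.
    if r - l + 1 <= 2:
        return r - l + 1          # with at most two numbers the condition is vacuous
    taken = [a[l], a[l + 1]]
    last = l + 1
    while True:
        # first position after `last` whose value is > taken[-2] + z
        nxt = bisect.bisect_right(a, taken[-2] + z, last + 1, r + 1)
        if nxt > r:
            return len(taken)
        taken.append(a[nxt])
        last = nxt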

Now we need to learn to simulate this process fast so that we can process many queries with different values of l within the time limit. The key observation is that the greedy solution can be split into two almost independent chains: the odd-numbered values are a_l, then the first number a_k that is bigger than a_l+z, then the first number that is bigger than a_k+z, and so on; and the even-numbered values are a_{l+1}, the first number that is bigger than a_{l+1}+z, and so on. So a solution could count the numbers in those two chains independently and add the results up.

How do we count the numbers in one of those chains, starting with a given position a_l and going until we pass a_r? This can be done using a standard algorithm called binary lifting: we precompute where we end up after 2^i jumps from position l, for each l and i. Then, to answer an [l, r] query, we first try to make 2^i jumps with the highest possible i: if we overshoot a_r, we skip this step, otherwise we take it; then we try the 2^{i-1} jump, and so on. Since the true answer has a binary representation, we will find it in just O(log n) steps per query, and the precomputation takes O(n log n).
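A sketch of that precomputation and of counting one chain (Python, 0-indexed, with position n used as a "past the end" sentinel; names are mine, and I assume z>=0 so that every jump moves strictly to the right):

import bisect

def build_jumps(a, z):
    # jump[0][p] = first index whose value exceeds a[p] + z (or n if none);
    # jump[i][p] = position after 2^i such jumps.  Precomputation is O(n log n).
    n = len(a)
    levels = max(1, n.bit_length())
    jump = [[n] * (n + 1) for _ in range(levels)]   # position n = sentinel
    for p in range(n):
        jump[0][p] = bisect.bisect_right(a, a[p] + z)
    for i in range(1, levels):
        for p in range(n + 1):
            jump[i][p] = jump[i - 1][jump[i - 1][p]]
    return jump

def chain_count(jump, l, r):
    # How many elements the single chain starting at position l picks up
    # before it would jump past position r.  O(log n) per query.
    count, pos = 1, l
    for i in range(len(jump) - 1, -1, -1):
        if jump[i][pos] <= r:
            pos = jump[i][pos]
            count += 1 << i
    return count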

However, this approach is not the full solution, since the two chains are only "almost" independent. More specifically, it can happen that they arrive at the same position, and then one of the chains will instead take the next consecutive number, as described above.

It turns out that for given starting positions l and l+1, we can again use binary lifting to find out when this happens! If two chains are at the same position at a certain moment, they will also be at the same position after any number of additional jumps. It means that we can still use the binary lifting idea of trying a large 2^i jump first for both chains, and if we end up in the same position then we roll it back and try 2^{i-1}, and so on.

Note that it is not necessarily true that the two chains will reach the same number at the same step, though. For example, it might happen that a_{l+1}>a_l+z, and then the first chain will have a_{l+1} as its second element. However, because the jump function is monotonic, at the same step the second chain is always at the same position as the first chain or to the right of it, and when the first chain is one step ahead, it is always at the same position as the second chain or to the right of it. So we just need to check those two possibilities (a collision at the same step, or a collision where the first chain is one step ahead) in the binary lifting condition.

This way we can precompute, for all n options for the starting position l, where and after how many jumps the chains starting from a_l and a_{l+1} collide. This precomputation takes O(n log n). Note that after the two chains collide, we have the collision position k, and the two chains continue from a_k and a_{k+1}, so we have exactly the same situation as in the beginning.
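Here is how that collision search could look on top of the jump tables above (my own sketch, not the reference solution; the sentinel position n also counts as a collision, which corresponds to both chains having run off the end of the array):

def find_collision(jump, l):
    # Binary lifting: the smallest step t at which the chains started at
    # positions l and l+1 first "collide", i.e. either both chains stand at
    # the same position, or the first chain is exactly one step ahead of the
    # second.  Returns (t, pos1, pos2): the step and the positions of the two
    # chains after t steps.  O(log n) per call.
    def collided(x, y):
        # same-step collision, or first chain one step ahead of the second
        return x == y or jump[0][x] == y
    p1, p2, t = l, l + 1, 0
    if collided(p1, p2):
        return t, p1, p2
    for i in range(len(jump) - 1, -1, -1):
        q1, q2 = jump[i][p1], jump[i][p2]
        if not collided(q1, q2):       # still no collision after 2^i more steps
            p1, p2, t = q1, q2, t + (1 << i)
    # after t steps there is no collision yet; after one more step there is
    return t + 1, jump[0][p1], jump[0][p2]

The full solution would additionally record how many elements each chain picks up before the restart at (k, k+1), and then build the second layer of lifting over these big jumps, as described next.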

Now we can consider a new type of jump, from a_l to a_k as described above, and do binary lifting on those big jumps! This time we need to precompute two things for each l and i: where we end up after 2^i big jumps, and how many small jumps that entails.

And finally, to correctly answer an [l, r] query, we first do the binary lifting steps using big jumps from a_l, and as soon as the next big jump would overshoot a_r, we do the binary lifting steps using small jumps for the remainder.

The solution was ultimately about applying a standard algorithm three times, not about coming up with a novel approach. However, I still find it quite beautiful how the same idea just keeps on giving.

Thanks for reading, and check back for last week's summary!

Thursday, October 9, 2025

An enough week

Squarepoint Challenge, also known as Codeforces Round 1055, was the first event of last week (problems, results, top 5 on the left, analysis). This round followed a different problemsetting philosophy, so for me it was mostly about figuring out the best way to implement a solution and actually implementing it, as I got the general idea for solving each problem, including H2, fairly quickly. Unfortunately, I got bogged down with the details of F for almost a whole hour, which meant I could not really compete for the top spots.

Nachia, on the other hand, was one of the fastest on the first 8 problems and managed to figure out the implementation of H2 with almost half an hour to spare. Congratulations on the victory!

Here is that problem F that delayed me: you are given n<=250000 numbers in sorted order. You need to process q<=250000 queries, each query being a segment [l, r] of the numbers. For each query, you need to find the maximum number of elements within the segment [l, r] that can be taken in such a way that for any three taken numbers, the difference between the largest and the smallest of them is more than z. The number z is the same for all queries. Can you see how to combine the standard building blocks to obtain a solution here?

The Universal Cup opened its 4th season with Stage 1: Korolyov (problems, results, top 5 on the left, analysis). The usual suspects USA1 were only the third team to solve everything, but they were much faster on the easy problems and won with much lower total penalty time. Congratulations to them and to the teams THU1, PKU1 and HoMaMaOvO who also solved 11!

In my previous summary, I mentioned two problems. The first one (E1, E2) came from Codeforces: there is a hidden array of size 2n-1, where each number between 1 and n appears twice, except one which appears only once, and you need to find out which one. To do that, you can only ask queries of the following form: choose a collection of indices in the array and a number between 1 and n, and you will be told whether this number appears at at least one of those indices in the array.

In E1, we have n<=300 and you need to find the answer using at most 4n+2*ceil(log_2 n) queries. In E2, we have n=300 exactly and you need to find the answer using at most 925 queries.

Having n=300 instead of n<=300 in E2 strongly hints that the expected solution is randomized, while the 4n+2*ceil(log_2 n) formula in E1 most likely promises a deterministic one. So the two problems, despite sharing the same story, would seem almost independent in terms of their solutions.

There is a building block that is shared between the solutions, and it is somewhat natural: let us split the entire array into two parts, and ask if a given number appears in each of those parts.

It is easier to see how to make a randomized solution out of this building block: if we know that a number appears in both parts, then it cannot be the answer, as it must have two occurrences. Moreover, if we choose the parts randomly, then with probability around 1/2 the two occurrences will end up in different parts and we will exclude this number from the candidates for the answer. And this, in turn, means that after 1/(1/2)=2 attempts on average we will exclude each number that is not the answer, so after around 2n total attempts we will have only one number remaining which we can return (this is not exactly the case as we will also spend attempts on the number that is in fact the answer, so effectively the least lucky sequence gets multiplied by 2, but this change only adds a logarithmic term to the expected total). Given that each attempt takes 2 queries, we will spend around 4n queries on average, which is too much as 4*300=1200>925.

On a more careful examination, we can notice that we don't always need 2 queries per attempt. If we got the answer that the first part does not have the number, we do not need to ask about the second part, as we know it is there. So on failed attempts (those that have both occurrences in one part) we will spend only 1.5 queries on average, while on successful attempts we will spend 2. The number of failed attempts is 2-1=1 per number on average, so we will spend around 3.5n queries in total, which is still a bit too much as 3.5*300=1050>925.
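Here is a toy version of this randomized elimination running against a local judge instead of the real interactor (my own illustration; the names and the exact splitting scheme are mine, and as computed above it needs about 3.5n queries, so it does not fit into 925 by itself):

import random

def solve_randomized(hidden, n, rng=random.Random(1)):
    # hidden has length 2n-1: every value in 1..n appears twice except one.
    # Each round: pick a random split into two parts; a value that shows up in
    # both parts has two occurrences and is eliminated from the candidates.
    queries = 0

    def ask(indices, value):          # stands in for the real interactor
        nonlocal queries
        queries += 1
        return any(hidden[i] == value for i in indices)

    candidates = set(range(1, n + 1))
    positions = list(range(2 * n - 1))
    while len(candidates) > 1:
        rng.shuffle(positions)
        first, second = positions[:n - 1], positions[n - 1:]
        for v in list(candidates):
            if len(candidates) == 1:
                break
            # short-circuit: if v is absent from the first part, skip the second query
            if ask(first, v) and ask(second, v):
                candidates.discard(v)
    return candidates.pop(), queries

# quick self-check on a random instance with n=300
rng = random.Random(7)
n, special = 300, rng.randrange(1, 301)
arr = [v for v in range(1, n + 1) for _ in (0, 1)]
arr.remove(special)
rng.shuffle(arr)
print(special, solve_randomized(arr, n))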

Let us put E2 aside for a moment and return to E1. How can we make a deterministic solution out of the same building block? Well, let us fix the split into two parts, and ask for all numbers whether they appear in each of the two parts. We spend 2n queries on this, and all numbers are split into three categories: appearing only in the first part, only in the second part, or in both. The key observation here is that by comparing the parity of the size of a part with the parity of the number of values that appear in both parts (and therefore have 1 occurrence in each part) we can tell whether the answer, the only value that appears just once, is in this part or not: each value occurring in both parts contributes one index to the part, each other value fully inside it contributes two, and the answer contributes just one if it is there, so the parities differ exactly when the answer is in this part. This way we have reduced the problem to one of half the size, but now we also have some numbers that are known to appear only once.
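A sketch of this first step with the parity check (Python; ask is the same kind of interactor stub as in the previous snippet, and the helper name is mine):

def first_split(ask, n):
    # Fix a split of the 2n-1 positions into two halves and ask every value
    # about each half: 2n queries total.
    m = 2 * n - 1
    left, right = list(range(m // 2)), list(range(m // 2, m))
    only_left, only_right, in_both = [], [], []
    for v in range(1, n + 1):
        a, b = ask(left, v), ask(right, v)
        if a and b:
            in_both.append(v)
        elif a:
            only_left.append(v)
        else:
            only_right.append(v)
    # Every value in in_both contributes exactly one index to `left`, every
    # other value seen there contributes two, except the answer, which
    # contributes one if and only if it lies in `left`.
    answer_in_left = (len(left) - len(in_both)) % 2 == 1
    if answer_in_left:
        # recurse into `left`: only_left are the remaining candidates, and the
        # in_both values are known to have exactly one occurrence there
        return left, only_left, in_both
    return right, only_right, in_both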

We can now apply the same idea recursively to the smaller problem, with the small optimization that for numbers that are known to appear only once, we need only one query to figure out which of the two parts they belong to. Since we spend two queries for a number that appears twice (or for the answer) and one query for a number that is known to appear once, the total number of queries to split a part in two is always the size of the part plus one. So the total number of queries we need is roughly 2n+n+n/2+... ~= 4n, and we have solved E1.

Returning to E2, we need to notice that the first step of our solution for E1 is quite efficient: we spend 2n queries, and on average we will have only n/4 candidates left after it, so we eliminate 3n/4 candidates in 2n queries, 8/3 queries per candidate on average, and 8/3 is much less than 3.5 which was our average before. We cannot be as efficient in subsequent steps (because we also have to deal with numbers that are known to appear once), but we do not need to: applying the naive randomized solution which spends 3.5 queries per candidate for the remaining n/4 candidates yields 7n/8 queries, so the total number of queries is 2n+7n/8=23n/8, which for n=300 gives 862.5, well below the allowed amount. This solves E2.

I find the different yet interconnected solutions for the two variants of the problem quite beautiful.

The second one came from AtCoder: you are given a tree with n<=5000 vertices. We choose a real weight a_i for each vertex i from the range [-n+1,1] independently and uniformly at random. Now, for each vertex i we compute x_i as the highest sum of weights in a connected subtree containing vertex i. Then we define the score of the tree as follows: if for all i we have 0<=x_i<=1, then the score is equal to the product of all x_i, otherwise it is 0. You need to find the expected value of the score.

First, let us look at the connected subtree B with the maximum sum of weights among all connected subtrees. All vertices in this subtree have the same value of x_i=y. Let us choose one of those vertices and root the tree at it. Now suppose we run the standard dynamic programming that computes for each subtree the highest sum of weights in a connected subtree containing the root of that subtree, with the very straightforward transition dp_i=a_i+sum(max(0, dp_j)) over all children j of i.
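In code, that standard DP could look like this (an iterative Python sketch with names of my choosing; adj is an adjacency list of the tree):

def max_rooted_subtree_sums(adj, a, root=0):
    # dp[v] = maximum total weight of a connected subtree that contains v and
    # stays within v's subtree, i.e. dp[v] = a[v] + sum(max(0, dp[child])).
    # Computed iteratively in post-order to avoid recursion limits.
    n = len(a)
    parent = [-1] * n
    order = [root]
    for v in order:                      # BFS order: parents before children
        for u in adj[v]:
            if u != parent[v]:
                parent[u] = v
                order.append(u)
    dp = [0.0] * n
    for v in reversed(order):            # children are processed before parents
        dp[v] = a[v] + sum(max(0.0, dp[u]) for u in adj[v] if u != parent[v])
    return dp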

For each vertex i in our best subtree B we have the following constraints: dp_i<=y (since otherwise we have a connected subtree with value more than y, which contradicts the choice of y), and dp_i>=0 (since otherwise we could remove vertex i together with its subtree from B to obtain another connected subtree with a higher value). Additionally, for the root of B we must have dp_i=y.

It turns out that those conditions are not just necessary, but sufficient! If for all vertices i in B we have 0<=dp_i<=y, and for the root dp_i=y, then for all vertices i in B we have x_i=y. We can see this by looking at how an arbitrary connected subtree intersects with B, and realizing that since dp_i>=0, we can always add more vertices of B to the subtree without decreasing its value until the intersection becomes a subtree of B, and then its value is bounded by dp_i<=y, so we can further augment the intersection to the full B without decreasing the value.

Now suppose we choose the values of a_i as we run this dynamic programming. Then, no matter what the values of dp_j are for vertices j not in B, for each vertex i that is in B but is not its root we will have a range of values of a_i of length exactly y (out of the overall range [-n+1,1] of length n) that yields 0<=dp_i<=y, and for the root there will be exactly one value of a_i that yields dp_i=y. This means that we can compute the probability density for x_i=y in B as a function of y that is independent from what happens outside of B: it is simply y^{b-1}/n^b, where b is the number of vertices in B.

Now let us look at the connected subtree C with the highest value that contains at least one vertex outside of B. All vertices of C that are not in B have the same value of x_i=z, and we can assume that those vertices form a connected subtree themselves. Indeed, suppose removing B splits C into multiple independent parts. Since B is the subtree with the highest sum, each of those parts must have a non-positive sum, otherwise we could improve B. But then we can remove all but one of those parts and obtain a new C with at least as high a value of z that is in one piece. So there are essentially two choices to consider: either C is disjoint from B, or C contains the entire B plus some other connected subtree.

Then we can apply exactly the same approach as we did for B and prove that the probability density for each vertex in C-B having x_i=z is simply z^{c-1}/n^c, where c is the number of vertices in C-B, and we additionally know that z<=y.

We can keep applying the same step until the entire tree is split into connected components with the same values of x_i, with the probability density of that happening being a product of terms of the form y^{b-1}/n^b and some constraints of the form z<=y. And here comes another crucial observation: we can forget about the constraints of the form z<=y! When z>y, we can still obtain this structure, just by taking the component with z before the component with y. So in effect, we just need to split the tree into connected parts, and then we can handle the parts completely independently, with the only constraint being 0<=y<=1.

The product of x_i for all i in B is equal to y^b, and its expected value is therefore equal to the integral of y^b*y^{b-1}/n^b over 0<=y<=1, which is equal to 1/(2*b*n^b). The expected value of a product is equal to the product of expected values for independent random variables, and we already established that we can treat y, z, ... independently. So we just need to find the sum of products of 1/(2*b*n^b) over all possible ways to split the tree into connected subtrees, which is solved with a standard dynamic programming on the tree that remembers in its state the size of the "running" part containing the root of the current subtree, and runs in O(n^2) since the sum of products of sizes of children being merged in such a DP is O(n^2) for any tree, in turn because every pair of vertices is merged into one subtree only once.
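Here is a sketch of that final counting DP (Python with exact rational arithmetic for clarity; the actual problem presumably asks for the answer modulo a prime, but the structure is the same, and all names are mine):

from fractions import Fraction

def expected_score(adj):
    # Sum, over all ways to split the tree into connected parts, of the product
    # over parts of size b of 1/(2*b*n^b).  Standard O(n^2) tree knapsack that
    # keeps the size of the still-open part containing the current subtree root.
    n = len(adj)

    def part_weight(b):
        # contribution of a finished (closed) part of size b
        return Fraction(1, 2 * b * n ** b)

    parent = [-1] * n
    order = [0]
    for v in order:
        for u in adj[v]:
            if u != parent[v]:
                parent[u] = v
                order.append(u)

    # dp[v][s] = sum of products of weights of already-closed parts, over all
    # partitions of v's subtree in which the part containing v is still open
    # and currently has s vertices
    dp = [None] * n
    for v in reversed(order):
        cur = [Fraction(0), Fraction(1)]          # just v itself, open, size 1
        for u in adj[v]:
            if u == parent[v]:
                continue
            closed_u = sum(dp[u][t] * part_weight(t) for t in range(1, len(dp[u])))
            merged = [Fraction(0)] * (len(cur) + len(dp[u]) - 1)
            for s in range(1, len(cur)):
                if cur[s] == 0:
                    continue
                merged[s] += cur[s] * closed_u    # close off the child's open part
                for t in range(1, len(dp[u])):
                    merged[s + t] += cur[s] * dp[u][t]   # merge it into v's part
            cur = merged
        dp[v] = cur

    return sum(dp[0][s] * part_weight(s) for s in range(1, len(dp[0])))

# a single vertex: x_1 is uniform on [0, 1], so the expected score is 1/2
print(expected_score([[]]))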

From another angle, we have "reordered the summation" in this solution two times: first, we decide on the values of x_i, and then we decide on the values of dp_i, and only then we obtain the values of a_i which are the original random variables.

I find this solution quite beautiful as it requires creating a deep understanding of the structure induced by the process described in the problem statement, and this structure ends up being non-trivial yet tractable. Well done to the problemsetters!

Thanks for reading, and check back next week.

Sunday, October 5, 2025

A grand week

After a bit of a summer lull, the contests are starting to gain steam again. Last week started with Codeforces Round 1053 on Wednesday (problems, results, top 5 on the left, analysis). I got stuck in the implementation details of problem D for almost a full hour, then spent some time implementing a local tester for E2 to see how far my solution for E1 was from solving E2 (quite far) and trying to improve it by replacing binary search with ternary (which only made it worse). Therefore, when I opened problem F and got the right idea almost immediately (use length 2 in the first step, consider the diameter in the second step to reduce the problem to one on the line), I could not finish the implementation in time, even though 15 minutes were added to the contest duration. Quite the usual scenes for me these days :)

It was the usual scenes for Kevin as well, only for him that meant solving F much faster than everybody else and taking first place by a clear margin. Congratulations!

I would like to highlight the interactive problem E (E1, E2) from this round: there is a hidden array of size 2n-1, where each number between 1 and n appears twice, except one which appears only once, and you need to find out which one. To do that, you can only ask queries of the following form: choose a collection of indices in the array and a number between 1 and n, and you will be told whether this number appears at at least one of those indices in the array. In E1, we have n<=300 and you need to find the answer using at most 4n+2*ceil(log_2 n) queries. In E2, we have n=300 exactly and you need to find the answer using at most 925 queries. What do those differences in constraints tell you?

AtCoder Grand Contest 073 on Sunday brought another AI story (problems, results, top 5 on the left, analysis). I got off to a good start by solving A faster than everyone in the screenshot on the left, but the remaining 2.5 hours of the round were not very productive. I felt that I was quite close on B and I kept updating my solution to handle yet another counterexample found by the stress test, but I failed to generalize and to notice that one of the corner cases in my wrong solution could be made into a correct solution.

tour1st solved that same B in just over 8 minutes, and got the three solvable problems done almost half an hour before everybody else. Congratulations on the win!

The big story of this round was of course this announcement (some reactions), which was followed up by this rule change. AtCoder Grand Contest was the last place where using AI was explicitly allowed and even encouraged, in the hope of discovering new creative ways to use AI to improve humans' problem solving ability. However, in this round it turned out that AI can solve problem C without human help, and this was the last straw that made the AtCoder admins disallow the use of AI in AGC going forward, to avoid the contest turning into a prompting competition, and into a "who has money to pay for access to a better model" competition.

Here is that fateful problem C: you are given a tree with n<=5000 vertices. We choose a real weight a_i for each vertex i from the range [-n+1,1] independently and uniformly at random. Now, for each vertex i we compute x_i as the highest sum of weights in a connected subtree containing vertex i. Then we define the score of the tree as follows: if for all i we have 0<=x_i<=1, then the score is equal to the product of all x_i, otherwise it is 0. You need to find the expected value of the score. Can you keep up with AI here?

Thanks for reading, and check back later for this week's summary!