Algorithms Weekly by Petr Mitrichev: 2025

Monday, December 1, 2025

A 29 week

Meta Hacker Cup 2025 Round 3 on Saturday narrowed the field down to the 25 finalists (problems, results, top 5 on the left, my screencast). After solving B relatively quickly, I got stuck on A for about 50 minutes. The overall solution plan was relatively clear — k=2 is a special case, in all other cases we can achieve the minimum amount derived from the area by solving greedily with some small backtracking or heuristic to deal with the diagonal. However, I could not figure out that heuristic, neither on paper nor by trying things on the computer. When I decided to give up and switch to other problems, a lot of time had already been wasted.

I've then solved D and C relatively quickly again, and also figured out the overall plan for E on paper: we just need to put two 1s at the end and solve recursively, and to achieve that we can first move two 1s into one of the last 6 positions by big swaps that essentially swap the first and second halves of some suffix, and then do some casework to move them to exactly the last 2 positions. However, I got bogged down in that casework, and only a few minutes before the end I realized that I can just implement any bfs, dfs, or bruteforce to solve the problem for n=3 without manual casework, and it was a bit too late, I have finished debugging only by using 5-10 additional minutes after the end of the round.

It was therefore only place 29 for me, and top 25 qualify for the final round :( It is an improvement over place 42 last year, and unlike the AtCoder WTF qualification, here I felt that the qualification was well within reach, and I had only myself to blame for wasting too much time on A. So, I have big hopes to qualify next year!

What is more disappointing (but maybe expected?) is that I did well on quite standard problems B, C and D that were mostly about quick implementation for me, but poorly on the heuristic/ad-hoc problems A and E. Those problems were set nicely in such a way that many approaches work, without squeezing the constraints so much that only a specific heuristic is required, and I enjoy both solving and setting such problems a lot. I think such problems reward the will and ability to experiment using a computer and practical problem solving skills, which are quite general qualities useful outside of the programming competition world. So please do try to solve problems A and E for practice if the above did not spoil them too much, I think you will find it quite enjoyable!

As usual, the strategy does not matter as much if you can just solve everything :) Congratulations to Yui, Gennady and Kevin on the great performance!

The 4th Universal Cup Stage 7: Grand Prix of Zhengzhou also took place on Saturday (problems, results, top 5 on the left). Gennady and Kevin showed up again to claim the first place together with Andrew, even though Yuhao, Lingyu and Qiwen (with an average Codeforces rating even higher than USA1!) put up a good fight and were trying to solve L and overtake them right until the end. Well done to both teams!

Thanks for reading, and check back next week!

Thursday, November 27, 2025

A double prix week

Before I cover the last week, I need to mention the 4th Universal Cup Stage 5: Nanjing, which happened the week before and which I forgot to cover (problems, results, top 5 on the left, analysis in Chinese). Even having only 2 out of 3 teammates did not stop team USA1 from winning by a whole problem. Congratulations!

The Universal Cup can't stick to the plan of having rounds only every other week, so Stage 6: Shenyang took place last week (problems, results, top 5 on the left, analysis in Chinese). The first two places look suspiciously similar to the previous round, but for the veteran teams Polish Mafia and Almost Retired Dandelion who also solved everything this was the best result of the season. Well done!

Codeforces Round 1066 wrapped up the week (problems, results, top 5 on the left, analysis). Some of the Universal Cup stars were also strong this time, but they were complemented in the top 5 by jiangbowen and PEIMUDA who were not at the very top in the team rounds. Congratulations on the nice individual performance!

Thanks for reading, and check back next week for a more meaningful summary!

An array week

Meta Hacker Cup 2025 Round 2 was the first event of the Nov 10 — Nov 16 week (problems, results, top 5 on the left, analysis, my screencast). Once again getting the problems right was more important than solving them fast, but it is still more impressive to do both at the same time. Congratulations to Benq and Um_nik who got quite a margin in penalty time compared to the rest of the full scorers!

In problem D, my solution (like all other solutions) was exponential in k. When I ran it on the validation input, it turned out that one case with k=25 (the maximum) takes about 30 seconds, so I was not sure if 80 such cases would run in time, even with the multithreaded template. But then I realized that the exponential part of the solution depends only on k, so we can precompute it once for all possible values of k, and then solve the 80 testcases in polynomial time. The precomputation only took about 40 seconds for all values of k (since the running time is exponential, k=24 is already much faster than k=25 and so on), and I was able to comfortably submit within the 6 minute window.

What I did not realize is that even if the precomputation took 10 minutes, I would still be able to submit, because one can run the precomputation before downloading the input! And unlike regular contests where the result of the precomputation in such cases must fit into the source limit, here we can just keep it in memory (in case we trust that the program will not crash on the first run) or save it to disk. I never had to do something like this in the Hacker Cup, but will definitely keep this trick in mind!

Codeforces Round 1064 wrapped up the week on Sunday (problems, results, top 5 on the left, analysis). maroonrk was done with 42 minutes to go, but only JDScript0117 was able to join him with 5 problems, and only with 5 minutes remaining in the round. Congratulations to both on solving everything!

In my previous summary, I have mentioned my search for a nice and simple templated treap. I have decided to try building one myself, and you can see the result in action in this submission to problem F2 from Global Round 30 (treap class itself at the top, its usage at the bottom).

The API is modeled on lazysegtree, and it is parametrized with exactly the same types S and F and operations on them, with the same requirements on the composition of operations. A treap contains zero or more blocks, where each block represents an array of values of type S. We support the following operations:

t.prod(block) -> S: computes op(...(op(op(e(), a), b), ...) if the given block has elements (a, b, ...). Runs in O(1).
t.apply(f, block): replaces each element x of the given block with f(x). Runs in O(1).
t.single(s) -> block: creates a new block with a single element with the given value. Runs in O(1).
t.destroy(block): destroys a block so that its memory can be reused. Runs in O(block_size).
t.concat(block1, block2) -> block: concatenates two blocks into one. The old blocks are consumed by this operation and are no longer valid after it. Runs in O(log(block_size)).
t.split_left(block, g) -> (block, block): splits a block into a prefix and a suffix, given a predicate g. The split point is chosen in such a way that both of the following are true: either prefix is empty or g(prod(prefix)) is true; either suffix is empty or g(prod(concat(prefix, first_element(suffix)))) is false. If g is monotonic on prod(prefix), then this will find the point where it changes from true to false. Runs in O(log(block_size)).
t.interleave_left(block1, block2, g) -> block: interleaves two blocks into one. The predicate g tells, given the products of a prefix of block1 and a prefix of block2, whether the last element of that prefix of block1 should come before the last element of that prefix of block2. Runs in O(smaller_block_size*log^2(block_size)) maybe?

This API tries to unify and generalize the two common cases of splitting by index and splitting by key. Splitting by index can be achieved by adding a 1 as one of the elements of struct S, implementing op as addition for that element, and using a predicate on that sum for splitting. Splitting by key can be achieved by storing the key as one of the elements of struct S, and implementing op(k1, k2)=k2 (this is what the submission linked above does).

I've also tried to make the API harder to misuse by making the block type moveable but non-copyable, so that the compiler helps make sure that we don't use a block after it has been consumed by an operation. This requires some std::move's in user code, but hopefully that is not too big of a burden.

We can also add more operations that can be expressed using the above ones, but allow a dedicated implementation with a better constant factor, such as insert (instead of single+interleave) or a product on a range (instead of two splits, product, then two concats back). However, I wanted to start with a small API to be able to iterate faster.

It is certainly not as powerful as a fully custom treap, but I think it can express most of the operations from BYOT, with the exception of range reverse and segment tree beats. And for me it is much more convenient to think in terms of concatenating and splitting arrays and applying associative operations on them, rather than directly in terms of treap implementation. What do you think, and do you have improvement suggestions?

Thanks for reading, and check back soon for last week's summary!

Saturday, November 15, 2025

A treap week

Codeforces Global Round 30 was the main event of last week (problems, results, top 5 on the left, analysis, my screencast). I was doing quite well on the first six problems, but got bogged down with implementing treaps with some additional accounting for problem F2, spending more than an hour on that. Right after the contest ended, I had two thoughts:

Wow, the people at the top of the scoreboard are so much better than me at implementation, how did they dig themselves out of this treap mess so fast?
I should implement some kind of templated treap and have it available for the next rounds. Currently I do not have a library of my own, and rely on the AtCoder library which has very nice templated segment trees and other algorithms, but not treaps.

Well, it turns out that the first thought was wrong. Out of the top 9 contestants (those that solved 8 problems), exactly the first three did not have treaps or splay trees in their solution for F2, but only segment trees plus standard STL data structures (while the next 6 had treaps or splay trees)! So these results clearly show that thinking some more to simplify the implementation beats just jumping into the first tedious simplementation approach one comes up with :) Well done to Otomachi_Una, Kevin114514 and dXqwq!

The second thought still stands though. However, it is not obvious to me what is the best templated abstraction for a treap. From the 9 contestants mentioned above, it seems that only hos.lyric has a templated splay tree, but even in her case the abstraction seems to leak the implementation details (push/pull), and the function splitAt might have been added specifically for this problem since it is not aligned with the rest of the abstraction, relying on the custom field "size" of the node that no other methods use. Compare this to AtCoder's lazysegtree, which is defined in abstract mathematical terms of applying two operations on ranges of data.

So, does somebody know a templated treap/splay tree abstraction that expresses the operations in abstract mathematical terms, and yet is as fast and powerful as tinkering with treap's implementation details?

In my previous summary, I have mentioned another Codeforces problem: you are given a tree with n<=5000 vertices. Consider the 2ⁿ ways to choose a subset S of its vertices, and for each subset S build a complete graph where S is the vertex set, and every two vertices are connected by an edge with a weight equal to the length of the tree path connecting them. What is the sum of weights of minimum spanning trees of those 2ⁿ graphs?

A natural question to ask is: given a particular pair of vertices a and b, in how many of the 2ⁿ subsets will they be directly connected by an edge of the minimum spanning tree? If we can answer that question, then we can just add up those answers multiplied by the distance between a and b to obtain the answer to the problem.

The first difficulty on that path is that "the minimum spanning tree" is not well defined, there might be multiple spanning trees of the same minimum weight. To deal with that, we can choose a particular minimum spanning tree, say, the one found by the Kruskal's algorithm that processes the edges in the order of weights, breaking ties in the lexicographical order of the edge ends.

In order for an edge between a and b to be chosen by the algorithm, it is necessary and sufficient that there is no path connecting a and b using the previous edges in the aforementioned order, namely using all edges of a smaller weight, and edges of the same weight that come before in the lexicographical order. Let us check how this path might go along the tree. Let us draw the tree in the following manner: the path from a to b is at the top, and a subtree grows from each of its vertices. We need to get from a to b, stopping only in vertices in the chosen subset S, such that each jump satisfies the above criteria.

Suppose there is a vertex c in S, hanging somewhere from the middle of the path, such that d(a,c)<d(a,b) and d(b,c)<d(a,b), then we can just go from a to c to b, so in those cases the edge between a and b will not be chosen. Suppose c hangs from the vertex x on the path (x might not be in S, so we cannot stop there). Then d(a,c)=d(a,x)+d(x,c), d(a,b)=d(a,x)+d(b,x), d(b,c)=d(b,x)+d(x,c), so the above inequalities translate to d(x,c)<d(b,x) and d(x,c)<d(a,x), in other words d(x,c)<min(d(a,x),d(b,x)). So in order for the edge between a and b to be chosen, we must have d(x,c)>=min(d(a,x),d(b,x)) for all c in S.

Now, suppose that inequality is strict: d(x,c)>min(d(a,x),d(b,x)). Then it's not hard to see that instead of going to/from c when passing through x, we can go directly to/from a or b, therefore such vertices are not useful to construct the path we are looking for. So the existence or nonexistence of the path depends only on the vertices c with d(x,c)<=min(d(a,x),d(b,x)). Since we already know that the path will exist when there is c such that d(x,c)<min(d(a,x),d(b,x)), the only interesting remaining case is where there is one or more vertices c such that d(x,c)=min(d(a,x),d(b,x)).

The distances between such vertices can be equal to d(a,b), so this is where the lexicographical ordering of the edges comes into play. It is possible to carefully consider it and solve the problem, but we can also make another small observation instead: vertices c such that d(x,c)=min(d(a,x),d(b,x)) are exactly those vertices that are at the distance of d(a,b)/2 from point z that is the middle of the path between a and b (note that z is either a vertex or a midpoint of an edge)!

This raises a natural question: what if, instead of focusing on the edge between a and b, we focus on all edges of the spanning tree with the given middle point z? If d is the smallest distance between z and a vertex in S, then from the above argument we can see that only edges of length 2d can exist in the spanning tree. If we root the original tree at z, and count the number m of subtrees of the root that contain a vertex at depth d (none of them will have a vertex at depth less than d by the definition of d), then it's not hard to observe that we will have exactly m-1 edges with length 2d and middle point z in the spanning tree. Note that in case z is a midpoint of an edge, then m<=2.

So we just need to count for each subtree how many ways are there to choose S in this subtree such that the smallest depth is d, and then we can combine those values to obtain the answer using the above observation, and then sum up those answers over all choices of z.

It was not strictly necessary to go through the Kruskal's algorithm and the lexicographical order part. Instead, we can immediately start with the "consider all edges of the spanning tree with the given middle point z" argument, and the solution will be short and beautiful. However, I also wanted to demonstrate how one can arrive at this idea analytically.

Thanks for reading, and check back next week!

Wednesday, November 5, 2025

A QOJ week

Pinely Round 5 started the competitive part of the last week on Thursday (problems, results, top 5 on the left, analysis). This was my first individual appearance in a screenshot on this blog since April 2024, so I think it went pretty well :) I was figuring out the solutions pretty quickly, and did not get stuck in implementation like I often do. However, it was not nearly good enough for the first place, since tourist was both the fastest, and with zero incorrect attempts. Congratulations on the convincing win!

Problem F gave me a nice warm feeling when I figured out the solution, so maybe it will be enjoyable for you as well: you are given a tree with n<=5000 vertices. Consider the 2ⁿ ways to choose a subset S of its vertices, and for each subset S build a complete graph where S is the vertex set, and every two vertices are connected by an edge with a weight equal to the length of the tree path connecting them. What is the sum of weights of minimum spanning trees of those 2ⁿ graphs?

The 4th Universal Cup Stage 4: Chengdu took place on Saturday, as usual (problems, results, top 5 on the left, analysis in Chinese). Team USA1 won the third stage out of four this year, defying the parity rule. The second place went to a team that is not in the season rating, so I assume this was a team that was participating in the original onsite contest that this stage was a mirror of. Congratulations to both teams on solving all problems!

Yandex Cup 2025 Qualification Round wrapped up the week on Sunday (results, top 5 on the left). A few people were in contention on the easier problems, but starting from problem J ecnerwala was significantly faster than the competition, so it is not a surprise that only he was able to solve everything this time. Well done!

After the 3rd Universal Cup Semifinals, this was the second big contest that required proctoring for those who want to qualify to the onsite round, and it was certainly not the last one to do so.

In my previous summary, I have mentioned an AtCoder problem: you are given a single integer n<=2¹⁰, and you need to produce a sequence p_i of n integers such that 0<=p_i<2³⁰, such that for every value of k between 1 and n there exists a number a such that the longest strictly increasing subsequence of the sequence q_i=p_i&a has length k. Here & denotes bitwise and, in other words we consider a certain subset of bits of p_i, the same subset for all i.

Many, if not most, constructive problems that have a single integer n in the input are solved by figuring out a way to produce solutions for 2n and 2n+1 from a solution for n, and this problem was also one of those. The constraints strongly hinted that we need to learn to double n using at most 3 additional bits.

Suppose we have a sequence p_i that solves the problem for some n. There are actually two ways to easily double the LIS length: we can either add a new bit at the end, replacing each number p_i with two consecutive numbers 2p_i and 2p_i+1, or add a new bit in the beginning, appending a second copy of the sequence sequence p_ito itself, but with all numbers increased by the same value 2^t chosen in such a way that all p_i<2^t. In both cases we set the corresponding bit of a to 1. This way by adding just one more bit we are now able to get all even LIS lengths from 2 to 2n. But what to do about the odd lengths?

Let us use one more bit, and append a number 2^rthat is bigger than all existing numbers to the end of the (doubled) sequence. Now if we use the same a as before, the LIS length will still be 2k since the new number becomes 0 if the r-th bit of a is not set. However, if we set the r-th bit of a to 1, then the LIS length will be 2k+1, since the new big number can be appended to any increasing sequence. This means that we now have a sequence of length 2n+1 for which we can get any LIS length between 2 and 2n+1. The only missing length is 1, but that is also easy to achieve: just set a=0.

So we can go from n to 2n+1 using 2 additional bits. What about going from n to 2n? Well, actually we can use the "append a big power of two" trick once again, and go from n to 2n+2 using 3 additional bits. Being able to go from n to 2n+1 and 2n+2 means that we can get any number if we also have solutions for the base cases n=1 and n=2.

As is often the case with AtCoder problems, after seeing the solution one cannot fathom struggling for several hours to come up with it, it looks so straightforward :)

I was discussing an old Open Cup gem with my friends recently, motivated by a lecture on binary search for beginner Swiss high school students. I was almost certain there is no way to upsolve that problem now. However, it turns out that this problem (and probably many other old Open Cup contests?) are now available for upsolving on QOJ.ac, the same website that hosts Universal Cup. Huge thanks to the admins of QOJ for preserving the old contests!

Thanks for reading, and check back next week!

Sunday, October 26, 2025

A moonlit week

The Universal Cup Stage 3: Polar was the first event of this week (problems, results, top 5 on the left, still not updated cup standings). In all three stages of this season the first two places went to the same two teams, and in odd-numbered rounds team USA1 comes out on top, if only by less than one wrong submission this time. Congratulations to both season leaders!

AtCoder Grand Contest 074 wrapped up the week just a few hours ago (problems, results, top 5 on the left, analysis, race standings). Just one minute before the end of the round it seemed like Nachia executed a perfect strategy, solving C and B first, then deciding to go for D instead of A since it was clear at that point that A+B+C would not be enough to win the round, and getting D with 15 minutes remaining to go on top with a unique set of problems solved.

However, ksun48 made not one but two submissions on C during the last minute, one of which turned out to be correct. How does one submit, find and fix a bug, and submit again, all within the last minute of the round? Well done, Kevin!

I have spent most of the round on the same problem C. It was clear from the constraints that one should dig in the direction of going from ~n to ~2n using 3 additional bits, and in the end this is exactly how my solution works, but somehow this digging took me more than 2 hours, mostly on paper.

I think it took me so long since I have made a reduction that made the solution more difficult, and did not consider rolling it back (instead, I have persevered and solved the more difficult reduced problem): I was looking for a way to configure the 3 additional bits in such a way that we can go from a LIS of length k for n/2 either to lengths k, 2k and 2k+1 or to lengths k, 2k+1 and 2k+2, depending on the parity of n.

The reason I wanted three options instead of the usual two was that the 2k+x options would not cover small values of LIS, so I thought it would make sense to make sure we also can keep k unchanged; however, on further thought after the round it became clear that the small values not covered are only 1 and possibly 2; and it is always very easy to obtain a LIS of 1 by just making everything equal, and also not hard to obtain a LIS of 2 by making almost everything equal.

Despite a few spoilers above, I hope you can still enjoy solving that problem if you have not already: you are given a single integer n<=2¹⁰, and you need to produce a sequence p_i of n integers such that 0<=p_i<2³⁰, such that for every value of k between 1 and n there exists a number a such that the longest strictly increasing subsequence of the sequence q_i=p_i&a has length k. Here & denotes bitwise and, in other words we consider a certain subset of bits of p_i, the same subset for all i.

In my previous summary, I have mentioned a Hacker Cup problem: you are given a string of 0s and 1s of length <=600000. Two players are playing a game, making moves in turns. On each move of the first player, they erase any prefix of the string that ends with a 0. On each move of the second player, they erase any suffix of the string that starts with a 1. The player who cannot make a move loses. Who will win if both play optimally?

First of all, the first player always takes a 0, and the second player always takes a 1. Naively, one might think that we should just count if there are more 0s or 1s in the string. However, it is very easy to come up with a counter-example: string 110 has more 1s than 0s, but the first player still wins because they can erase it in one go. The 1s in the beginning of the string don't really matter, as long as there is at least one 0.

However, since having more 0s still seems useful for the first player, a somewhat natural idea is to check: what if the first player tries to make a move that maximizes the balance of the remaining string — the difference between the number of 0s and 1s? Here a counterexample is also not that hard: from the string 0100 we can go to two different strings 100 and 0 with the maximum balance of 1, however if we go to 100 then the second player can take the remaining string and win, while if we go to 0, the second player loses.

Guided by this counterexample, what if we always break ties by choosing the shortest possible string with the maximum balance? After spending some time finding yet another counterexample to this approach and failing, I've decided to look for a proof that it works instead.

Suppose we have used that strategy for the first move, and the remaining string has some non-negative balance (it feels intuitive that the balance has to be non-negative for us to win). What happens to the balance after the move of the second player? After some experimentation, we can notice that it always becomes strictly positive. And indeed, it is not hard to prove by contradiction: suppose it is non-positive after the move of the second player. Suppose the string remaining after the move of the first player is s, and the second player has removed then removed the suffix u, and what remained is t, so s=tu.

The balance of s is non-negative, the balance of t is non-positive, and u starts with a 1. Since the balance of u is the balance of s minus the balance of t, is is also non-negative, so it must have a 0. Consider its first 0, which splits the string u as u=v0w, where v consists only of 1s. But then the first player could have additionally erased the string tv0 in their first move. Since the balance of t is non-positive, and the balance of v0 is non-positive, the balance of tv0 is non-positive, which means that the balance of w is the same or higher than the balance of s=tv0w. But this contradicts the choice of s.

So, after the move of the second player the balance will become strictly positive. But then we can apply the same approach (make the move that maximizes the balance of the remaining string, breaking ties to shorter remaining strings), and the balance of the remaining string will be non-negative (since at least if we take the first 0 it is non-negative, and we maximize it), so we can repeat this until we win.

We can already submit our solution at this point, but formally speaking we also need to prove that if the first player cannot make a move that leaves a string with a non-negative balance, then the second player wins. And indeed, we can apply the same argument as above: if we have left a string with a strictly negative balance, then from the point of view of the second player it has a strictly positive balance, and they can apply exactly the same strategy as above to win.

This solution is longer than the one in the editorial, but it is how I solved the problem during the round, and I hope it shows how to arrive at a solution without having to pull an idea out of thin air, for example an idea to consider balanced parentheses sequences as the editorial does.

Thanks for reading, and check back next week!

Friday, October 24, 2025

A major week

Meta Hacker Cup 2025 Round 1 was the main event of last week (problems, results, top 5 on the left, analysis). As usual in the early Hacker Cup rounds, one has to balance the speed against the amount of effort you spend to test your own solutions before attempting to submit them. There is just one attempt per problem, and the submission result is only revealed after the round ends (a pretty unique setup for today's contests, but one that I think makes the competition more exciting and emphasizes the useful testing skills!), and at the same time you only need to be in top n to advance, your score does not matter for the future rounds. It does matter for the bragging rights though, so well done to Geothermal on being much faster than everybody else!

The Hacker Cup is now the "Major Individual Competition" (according to cphof) that was held the most years among those still happening, this one is the 15th edition! TopCoder Open/Invitational is still in the overall lead with 22 editions, but that number is not going to increase further. Huge thanks to the organizers for putting it together year after year, and I'm looking forward to participating in the next 15, and of course to you bringing back the onsite finals :)

Problem D was quite nice, and you can also solve it as a math problem without any computer help. You are given a string of 0s and 1s (check out the original problem statement for the more colorful story!) of length <=600000. Two players are playing a game, making moves in turns. On each move of the first player, they erase any prefix of the string that ends with a 0. On each move of the second player, they erase any suffix of the string that starts with a 1. The player who cannot make a move loses. Who will win if both play optimally?

Thanks for reading, and check back next week!

Monday, October 20, 2025

A lifting week

The Universal Cup held its Stage 2: Paris during the Oct 6 - Oct 12 week (problems, results, top 5 on the left). This was one of the very rare occassions where the winner seemingly was not a veteran team, but a team THU1 consisting of university students. They were already ahead of the usual suspects on 12 problems by penalty time, and solving the last problem 10 minutes before the end was the icing on the cake. Congratulations!

Codeforces Round 1058 wrapped up the week on Sunday (problems, results, top 5 on the left, analysis, my screencast). As the custom goes these days, I have started well on the easier problems but then got stuck, trying to make progress in either of the three remaining problems, and actually being close to solving D2 at the end, but not quite.

Nobody was able to solve everything this time, but the top 5 shows that many different strategies were competitive. In the end solving the problems fast from left to right was the winning recipe for Benq. Congratulations!

In my previous summary, I have mentioned another Codeforces problem: you are given n<=250000 numbers which are written in sorted order. You need to process q<=250000 queries, each query is a segment [l, r] of the numbers. For each query, you need to find the maximum amount of numbers within the segment [l, r] that can be taken in such a way that for each three taken numbers the difference between the largest and the smallest of them is more than z. The number z is the same for all queries.

The first step is solving the problem for a single [l, r] query. It is not hard to convince oneself (if we want to have a formal proof, we can use the exchange argument) that we can do it greedily: take the first two numbers in the segment a_l and a_l+1, then take the first number at position l+2 or higher that is bigger than a_l+z, then take the first number to the right of that that is bigger than a_l+1+z, and so on.

Now we need to learn to simulate this process fast so that we can process many queries with different values of l within the time limit. The key observation is that the greedy solution can be split into two almost independent chains: the odd-numbered values are a_l, then the first number a_k that is bigger than a_l+z, then the first number that is bigger than a_k+z, and so on; and the even-numbered values are a_l+1, the first number that is bigger that a_l+1+z, and so on. So a solution could count the numbers in those two chains independently and add the results up.

How do we count the numbers in one of those chains, starting with a given position a_l and until we pass a_r? This can be done using a standard algorithm called binary lifting: we can precompute where do we end up after we do 2ⁱ jumps from position l for each l and i. Then to answer an [l, r] query we first try to make 2ⁱ jumps with the highest possible i, if we overshoot a_r then we skip this step otherwise we take it, and then do the 2^i-1 step, and so on. Since the true answer has a binary representation, we will find it in just O(log n) steps per query, and the precomputation takes O(n*log n).

However, this approach is not the full solution since the two chains are only "almost" independent. More specifically, it can happen that the arrive at the same position, and then one of the chains will instead take the next consecutive number as described above.

It turns out that for a given two starting positions l and l+1, we can again use binary lifting to find out when this happens! If two chains are at the same position at a certain moment, they will also be at the same position after any number of additional jumps. It means that we can still use the binary lifting idea of trying a large 2ⁱ jump first for both chains, and if we are in the same position then we roll it back and try 2^i-1, and so on.

Note that it is not necessarily true that the two chains will reach the same number at the same step, though. For example, it might happen that a_l+1>a_l+z, and then the first chain will have a_l+1 at the second position. However, because the jump function is monotonic, the second chain will always be at the same position or to the right of it for the same step, and the first chain will always be at the same position or to the right of it when one step ahead of the second chain. So we just need to check those two possibilities (a collision at the same step, or a collision where the first chain is one step ahead) in the binary lifting condition.

This way we can precompute for all n options for the starting position l, where, and after how many jumps, do the chains starting from a_l and a_l+1 collide. This precomputation takes O(n*log n). Note that after the two chains collide, we now have the collision position k, and the two chains continue from a_k and a_k+1, so we have exactly the same situation as in the beginning.

Now we can consider a new type of jump, from a_l to a_k as described above, and do binary lifting on those big jumps! This time we need to precompute two things for each l and i: where do we end up after 2ⁱ big jumps, and how many small jumps does that entail.

And finally to correctly answer an [l, r] query, we first do the binary lifting steps using big jumps from a_l, and as soon as the next big jump would overshoot a_r, we do the binary lifting steps using small jumps for the remainder.

The solution was ultimately about applying a standard algorithm three times, not about coming up with a novel approach. However, I still find it quite beautiful how the same idea just keeps on giving.

Thanks for reading, and check back for last week's summary!

Thursday, October 9, 2025

An enough week

Squarepoint Challenge, also known as Codeforces Round 1055, was the first event of last week (problems, results, top 5 on the left, analysis). This round followed a different problemsetting philosophy, so for me it was mostly about figuring out the best way to implement a solution and actually implementing it, as I got the general idea for solving each problem, including H2, fairly quickly. Unfortunately, I got bogged down with the details of F for almost a whole hour, which meant I could not really compete for the top spots.

Nachia, on the other hand, was one of the fastest on the first 8 problems and managed to figure out the implementation of H2 with almost half an hour to spare. Congratulations on the victory!

Here is that problem F that delayed me: you are given n<=250000 numbers which are written in sorted order. You need to process q<=250000 queries, each query is a segment [l, r] of the numbers. For each query, you need to find the maximum amount of numbers within the segment [l, r] that can be taken in such a way that for each three taken numbers the difference between the largest and the smallest of them is more than z. The number z is the same for all queries. Can you see how to combine the standard building blocks to obtain a solution here?

The Universal Cup opened its 4th season with Stage 1: Korolyov (problems, results, top 5 on the left, analysis). The usual suspects USA1 were only the third team to solve everything, but they were much faster on the easy problems and won with much lower total penalty time. Congratulations to them and to the teams THU1, PKU1 and HoMaMaOvO who also solved 11!

In my previous summary, I have mentioned two problems. The first one (E1, E2) came from Codeforces: there is a hidden array of size 2n-1, where each number between 1 and n appears twice, except one which only appears once, and you need to find out which one. To do that, you can only ask queries of the following form: choose a collection of indices in the array, and a number between 1 and n, and you will be told if this number appears at at least one of those indices in the array.

In E1, we have n<=300 and you need to find the answer using at most 4n+2*ceil(log₂n) queries. In E2, we have n=300 exactly and you need to find the answer using at most 925 queries.

Having n=300 instead of n<=300 in E2 strongly hints that the expected solution is randomized, while the 4n+2*ceil(log₂n) formula in E1 most likely promises a deterministic one. So the two problems with the same story would seem almost independent in terms of their solutions.

There is a building block that is shared between the solutions, and it is somewhat natural: let us split the entire array into two parts, and ask if a given number appears in each of those parts.

It is easier to see how to make a randomized solution out of this building block: if we know that a number appears in both parts, then it cannot be the answer, as it must have two occurrences. Moreover, if we choose the parts randomly, then with probability around 1/2 the two occurrences will end up in different parts and we will exclude this number from the candidates for the answer. And this, in turn, means that after 1/(1/2)=2 attempts on average we will exclude each number that is not the answer, so after around 2n total attempts we will have only one number remaining which we can return (this is not exactly the case as we will also spend attempts on the number that is in fact the answer, so effectively the least lucky sequence gets multiplied by 2, but this change only adds a logarithmic term to the expected total). Given that each attempt takes 2 queries, we will spend around 4n queries on average, which is too much as 4*300=1200>925.

On a more careful examination, we can notice that we don't always need 2 queries per attempt. If we got the answer that the first part does not have the number, we do not need to ask about the second part as we know it is there. So on failed attempts (that have both occurrences in one part) we will spend only 1.5 queries on avereage, while on successful attempts we will spend 2. The number of failed attempts is 2-1=1 per number on average, so we will spend around 3.5n queries in total, which is still a bit too much as 3.5*300=1050>925.

Let us put E2 aside for a moment and return to E1. How can we make a deterministic solution out of the same building block? Well, let us fix the split into two parts, and ask for all numbers if they appear in each of the two parts. We spent 2n queries on this, and all numbers are split into three categories: appearing only in the first part, only in the second part, or in both. The key observation here is that by comparing the parity of the size of a part with the parity of the amount of numbers that appear in both parts (and therefore have 1 occurrence in each part) we can tell if the answer, the only number that appears in one part but only once, is in this part or not. This way we have reduced the problem to one with twice smaller size, but now we also have some numbers that are known to appear only once.

We can now apply the same idea recursively to the smaller problem, with the small optimization that for numbers that are known to appear only once, we need only one query to figure out which of the two parts does it belong to. Since we spend two queries for a number that appears twice or for the answer and one query for a number that is known to appear once, the total number of queries to split a part in two is always the size of the part plus one. So the total number of queries we need is roughly 2n+n+n/2+... ~= 4n, so we have solved E1.

Returning to E2, we need to notice that the first step of our solution for E1 is quite efficient: we spend 2n queries, and on average we will have only n/4 candidates left after it, so we eliminate 3n/4 candidates in 2n queries, 8/3 queries per candidate on average, and 8/3 is much less than 3.5 which was our average before. We cannot be as efficient in subsequent steps (because we also have to deal with numbers that are known to appear once), but we do not need to: applying the naive randomized solution which spends 3.5 queries per candidate for the remaining n/4 candidates yields 7n/8 queries, so the total number of queries is 2n+7n/8=23n/8, which for n=300 gives 862.5, well below the allowed amount. This solves E2.

I find the different yet interconnected solutions for the two variants of the problem quite beautiful.

The second one came from AtCoder: you are given a tree with n<=5000 vertices. We choose a real weight a_i for each vertex i from the range [-n+1,1] independently and unformly at random. Now, for each vertex i we compute x_i as the highest sum of weights in a connected subtree containing vertex i. Now we define the score of the tree as follows: if for all i we have 0<=x_i<=1, then the score is equal to the product of all x_i, otherwise it is 0. You need to find the expected value of the score.

First, let us look at the connected subtree B with the maximum sum of weights among all connected subtrees. All vertices in this subtree have the same value of x_i=y. Let us choose one of those vertices and root the tree from that vertex. Now suppose we run the standard dynamic programming that computes for each subtree the highest sum of weights in a connected subtree containing the root of that subtree, with the very straightforward transition of dp_i=a_i+sum(max(0, dp_j)) over all children j of i.

For each vertex i in our best subtree B we have the following constraints: dp_i<=y (since otherwise we have a connected subtree with value more than y, which is a contradiction to the choice of y), and dp_i>=0 (since otherwise we can remove y together with its subtree from B to obtain another connected subtree with a higher value). Additionally, for the root of B we must have dp_i=y.

It turns out that those conditions are not just necessary, but sufficient! If for all vertices i in B we have 0<=dp_i<=y, and for the root dp_i=y, then for all vertices i in B we have x_i=y. We can see this by looking at how an arbitrary connected subtree intersects with B, and realizing that since dp_i>=0, we can always add more vertices of B to the subtree without decreasing its value until the intersection becomes a subtree of B, and then its value is bounded by dp_i<=y so we can further augment the intersection to the full B without decreasing the value.

Now suppose we choose the values of a_i as we run this dynamic programming. Then no matter what are the values of dp_j for vertices j not in B, for each vertex i that is in B but is not its root we will have a range of values a_iof length exactly y (out of the overall range [-n+1,1] of length n) that will yield 0<=dp_i<=y, and for the root there will be exactly one value of a_i that yields dp_i=y. This means that we can compute the probability density for x_i=y in B as a function of y that is independent from what happens outside of B: it is simply y^b-1/n^b, where b is the number of vertices in B.

Now let us look at the connected subtree C with the highest value that contains at least one vertex outside of B. All vertices of C that are not in B have the same value of x_i=z, and we can assume that those vertices form a connecteded subtree themselves. Indeed, suppose removing B splits C into multiple independent parts. Since B is the subtree with the highest sum, each of those parts must have a non-positive sum, otherwise we could improve B. But then we can remove all but one of those parts and obtain a new C with at least as high value of z that is in one piece. So there are essentially two choices to consider: either C is disjoint with B, or C contains the entire B plus some other connected subtree.

Then we can apply exactly the same approach as we did for B and prove that the probability density for each vertex in C-B having x_i=z is simply z^c-1/n^c, and we additionally know that z<=y.

We can keep applying the same step until the entire tree is split into connected components with the same values of x_i, with the probability density of that happenning being a product of terms of the form y^b-1/n^b and some constraints of the form z<=y. And here comes another crucial observation: we can forget about the constraints of the form z<=y! Since when z>y, we can still obtain this structure, just taking the component with z before he component with y. So in effect, we just need to split the tree into connected parts, and then we can handle the parts completely independently, with the only constraint being 0<=y<=1.

The product of x_i for all i in B is equal to y^b, and its expected value is therefore equal to the integral of y^b*y^b-1/n^b over 0<=y<=1, which is equal to 1/(2*b*n^b). The expected value of a product is equal to the product of expected values for independent random variables, and we already established that we can treat y, z, ... independently. So we just need to find the sum of products of 1/(2*b*n^b) for all possible ways to split a tree into connected subtrees, which is solved with a standard dynamic programming on the tree that remembers the size of "running" part containing the root of the subtree in its state, and runs in O(n²) since the sum of products of sizes of children being merged in such DP for any tree is O(n²), in turn since every pair of vertices merge into one subtree only once.

From another angle, we have "reordered the summation" in this solution two times: first, we decide on the values of x_i, and then we decide on the values of dp_i, and only then we obtain the values of a_i which are the original random variables.

I find this solution quite beautiful as it requires creating a deep understanding of the structure induced by the process described in the problem statement, and this structure ends up being non-trivial yet tractable. Well done to the problemsetters!

Thanks for reading, and check back next week.

Sunday, October 5, 2025

A grand week

After a bit of a summer lull the contests are starting to gain steam again. Last week started with the Codeforces Round 1053 on Wednesday (problems, results, top 5 on the left, analysis). I got stuck in the implementation details of problem D for almost a full hour, then spent some time implementing a local tester for E2 to see how far my solution for E1 was from solving E2 (quite far) and trying to improve it by replacing binary search with ternary (only made it worse), and therefore when I opened problem F and got the right idea almost immediately (use length 2 in the first step, consider the diameter in the second step to reduce the problem to one on the line) I could not finish the implementation in time, even though 15 minutes were added to the contest duration. Some quite usual scenes for me these days :)

It was usual scenes for Kevin as well, only for him it meant solving F much faster than everybody else and getting the first place by a margin. Congratulations!

I would like to highlight the interactive problem E (E1, E2) from this round: there is a hidden array of size 2n-1, where each number between 1 and n appears twice, except one which only appears once, and you need to find out which one. To do that, you can only ask queries of the following form: choose a collection of indices in the array, and a number between 1 and n, and you will be told if this number appears at at least one of those indices in the array. In E1, we have n<=300 and you need to find the answer using at most 4n+2*ceil(log₂n) queries. In E2, we have n=300 exactly and you need to find the answer using at most 925 queries. What do those differences in constraints tell you?

AtCoder Grand Contest 073 on Sunday brought another AI story (problems, results, top 5 on the left, analysis). I got off to a good start by solving A faster than everyone in the screenshot on the left, but the remaining 2.5 hours of the round were not very productive. I felt that I was quite close on B and I kept updating my solution to handle yet another counterexample found by the stress test, but I failed to generalize and to notice that one of the corner cases in my wrong solution could be made into a correct solution.

tour1st has solved that same B in just over 8 minutes, and got the three solvable problems done almost half an hour before everybody else. Congratulations on the win!

The big story of this round was of course this announcement (some reactions), which was followed up by this rule change. AtCoder Grand Contest was the last place where using AI was explicitly allowed and even encouraged, hoping to discover new creative ways to use AI to improve the humans' problem solving ability. However, in this round it turned out that AI can solve problem C without human help, and this was the last straw that made AtCoder admins disallow the use of AI in AGC going forward, to avoid the contest turning into a prompting competition, and into a "who has money to pay for access to a better model" competition.

Here is that fateful problem C: you are given a tree with n<=5000 vertices. We choose a real weight a_i for each vertex i from the range [-n+1,1] independently and unformly at random. Now, for each vertex i we compute x_i as the highest sum of weights in a connected subtree containing vertex i. Now we define the score of the tree as follows: if for all i we have 0<=x_i<=1, then the score is equal to the product of all x_i, otherwise it is 0. You need to find the expected value of the score. Can you keep up with AI here?

Thanks for reading, and check back later for this week's summary!

Monday, September 22, 2025

A skipped week

Codeforces Global Round 29 was the main event of last week (problems, results, top 5 on the left, analysis). Only 3 participants managed to solve the 5500-point problem H, and they occupied the first three spots. Well done to ecnerwala, jiangly and ksun48!

The big story of the round though was the solve count for problem G. According to the screenshot in this post the solve count was at 184+ when the round ended, while the final standings had just 48, so 136+ solutions were removed because of cheating. The two most likely types of cheating would be using AI to solve the problem and getting help from other people. Of course, one could also use both :) I saw speculations about both types happening in this case, but I am not sure if there is any public analysis on which type dominated this time. Have you seen one? The raw data on the solutions deemed to be cheating seems available in this search (it has 200+ solutions though, so maybe it overcounts?).

Of course, huge thanks to the Codeforces admins for trawling through the submissions and taking action to preserve the integrity of the competition! Before this case I somehow expected the number of cheating disqualifications to be much lower, even though I have seen numerous discussions of the issue. Would this push the competitive programming world to have more onsite contests with no internet access?

Thanks for reading, and check back next week!

Sunday, September 14, 2025

A Kevin week

Codeforces Round 1048 on Monday was the first competition of this week (problems, results, top 5 on the left, analysis). The round was unrated, which has delayed my inevitable downfall: I solved A+B(in the same wrong way as the problemsetters)+C1+C2 in 32 minutes, but could not solve anything else in the remaining 2.5 hours, even though I tried my hand at all remaining problems.

In D, I came up with a cubic solution first, then optimized it to quadratic, and then tried to optimize it to linear for a long time with no success, even though it always felt so close. Imagine my surprise when I reread the problem statement after the contest and learned that n<=5*10³, so I could have just submitted my quadratic solution :) I did submit the solution in upsolving and it passed after fixing two small bugs.

In E, I was trying some randomized approaches, but they were not sophisticated enough to pass. Somehow it was hard to think constructively about determinants, although in retrospect choosing a tridiagonal matrix so that the determinant can be computed in one pass makes perfect sense.

In F, I only opened it in the last hour, and it turns out I almost managed to solve it in time — the code was almost ready during the round, and my solution passed in upsolving after writing just a few more lines and some debugging. I have then uphacked my solution with the case "141 9940 19882", but that won't have mattered during the round :) I am surprised nevertheless that such an extreme case (the case where there is no rounding in the computation of the number of allowed moves and that has the highest polar angle) was not in the system tests.

Kevin, on the other hand, has managed to solve all of those problems (except the proper B :)) during the round. Well done and congratulations on the clear first place!

Wincent DragonByte 2025 Onsite Finals in Bratislava wrapped up the week (problems and analysis, results, top 5 on the left). The top 2 places did not change from the Codeforces round :) Congratulations to all finalists and prize winners!

In my previous summary, I have mentioned an ICPC problem: you are given a graph with 2n vertices and m edges (n<=2*10⁵, m<=4*10⁵), the vertices are split into n pairs that we will call blocks, let us call one vertex in a block red and another blue. Every edge connects a red vertex with a blue vertex. The m edges are added to the initially empty graph one by one. After each edge is added, you need to print the number of pairs of blocks (out of n*(n-1)/2 possibilities) that are connected (there exists a path connecting one of the four possiblities: between the red or the blue vertex in the first block and the red or the blue vertex in the second block).

First, consider the following naive approach. Let us maintain the connected components in the graph, and for each component remember the set of blocks represented in it, which we can update via small-to-large merging. To compute the answer after adding every edge, we will just add up b*(b-1)/2 for each component, where b is the number of different blocks represented in it.

Why is this approach naive? It will overcount in case two blocks are connected in two different connected components. But since each block has only two vertices, the only way this can really happen is that there are two components, each containing one vertex from each of the blocks. And moreover, we will overcount by exactly 1 for each pair of blocks with this property.

So let us also store the inverse mapping: for every block, we store the set of components it has representatives in. We can maintain those together with the above sets. Those sets will always have 1 or 2 elements. For those that have 2 elements, for each group of c equal such sets we need to just subtract c*(c-1)/2 from the answer. We can maintain the current answer and update it when those sets change.

This solution, by the way, is also by Kevin. Thanks for reading, and check back next week!