Saturday, November 15, 2025

A treap week

Codeforces Global Round 30 was the main event of last week (problems, results, top 5 on the left, analysis, my screencast). I was doing quite well on the first six problems, but got bogged down with implementing treaps with some additional accounting for problem F2, spending more than an hour on that. Right after the contest ended, I had two thoughts:
  1. Wow, the people at the top of the scoreboard are so much better than me at implementation, how did they dig themselves out of this treap mess so fast?
  2. I should implement some kind of templated treap and have it available for the next rounds. Currently I do not have a library of my own, and rely on the AtCoder library which has very nice templated segment trees and other algorithms, but not treaps.
Well, it turns out that the first thought was wrong. Out of the top 9 contestants (those that solved 8 problems), exactly the first three did not have treaps or splay trees in their solution for F2, but only segment trees plus standard STL data structures (while the next 6 had treaps or splay trees)! So these results clearly show that thinking some more to simplify the implementation beats just jumping into the first tedious simplementation approach one comes up with :) Well done to Otomachi_Una, Kevin114514 and dXqwq!

The second thought still stands though. However, it is not obvious to me what is the best templated abstraction for a treap. From the 9 contestants mentioned above, it seems that only hos.lyric has a templated splay tree, but even in her case the abstraction seems to leak the implementation details (push/pull), and the function splitAt might have been added specifically for this problem since it is not aligned with the rest of the abstraction, relying on the custom field "size" of the node that no other methods use. Compare this to AtCoder's lazysegtree, which is defined in abstract mathematical terms of applying two operations on ranges of data.

So, does somebody know a templated treap/splay tree abstraction that expresses the operations in abstract mathematical terms, and yet is as fast and powerful as tinkering with treap's implementation details?

In my previous summary, I have mentioned another Codeforces problem: you are given a tree with n<=5000 vertices. Consider the 2n ways to choose a subset S of its vertices, and for each subset S build a complete graph where S is the vertex set, and every two vertices are connected by an edge with a weight equal to the length of the tree path connecting them. What is the sum of weights of minimum spanning trees of those 2n graphs?

A natural question to ask is: given a particular pair of vertices a and b, in how many of the 2n subsets will they be directly connected by an edge of the minimum spanning tree? If we can answer that question, then we can just add up those answers multiplied by the distance between a and b to obtain the answer to the problem.

The first difficulty on that path is that "the minimum spanning tree" is not well defined, there might be multiple spanning trees of the same minimum weight. To deal with that, we can choose a particular minimum spanning tree, say, the one found by the Kruskal's algorithm that processes the edges in the order of weights, breaking ties in the lexicographical order of the edge ends.

In order for an edge between a and b to be chosen by the algorithm, it is necessary and sufficient that there is no path connecting a and b using the previous edges in the aforementioned order, namely using all edges of a smaller weight, and edges of the same weight that come before in the lexicographical order. Let us check how this path might go along the tree. Let us draw the tree in the following manner: the path from a to b is at the top, and a subtree grows from each of its vertices. We need to get from a to b, stopping only in vertices in the chosen subset S, such that each jump satisfies the above criteria.

Suppose there is a vertex c in S, hanging somewhere from the middle of the path, such that d(a,c)<d(a,b) and d(b,c)<d(a,b), then we can just go from a to c to b, so in those cases the edge between a and b will not be chosen. Suppose c hangs from the vertex x on the path (x might not be in S, so we cannot stop there). Then d(a,c)=d(a,x)+d(x,c), d(a,b)=d(a,x)+d(b,x), d(b,c)=d(b,x)+d(x,c), so the above inequalities translate to d(x,c)<d(b,x) and d(x,c)<d(a,x), in other words d(x,c)<min(d(a,x),d(b,x)). So in order for the edge between a and b to be chosen, we must have d(x,c)>=min(d(a,x),d(b,x)) for all c in S.

Now, suppose that inequality is strict: d(x,c)>min(d(a,x),d(b,x)). Then it's not hard to see that instead of going to/from c when passing through x, we can go directly to/from a or b, therefore such vertices are not useful to construct the path we are looking for. So the existence or nonexistence of the path depends only on the vertices c with d(x,c)<=min(d(a,x),d(b,x)). Since we already know that the path will exist when there is c such that d(x,c)<min(d(a,x),d(b,x)), the only interesting remaining case is where there is one or more vertices c such that d(x,c)=min(d(a,x),d(b,x)).

The distances between such vertices can be equal to d(a,b), so this is where the lexicographical ordering of the edges comes into play. It is possible to carefully consider it and solve the problem, but we can also make another small observation instead: vertices c such that d(x,c)=min(d(a,x),d(b,x)) are exactly those vertices that are at the distance of d(a,b)/2 from point z that is the middle of the path between a and b (note that z is either a vertex or a midpoint of an edge)!

This raises a natural question: what if, instead of focusing on the edge between a and b, we focus on all edges of the spanning tree with the given middle point z? If d is the smallest distance between z and a vertex in S, then from the above argument we can see that only edges of length 2d can exist in the spanning tree. If we root the original tree at z, and count the number m of subtrees of the root that contain a vertex at depth d (none of them will have a vertex at depth less than d by the definition of d), then it's not hard to observe that we will have exactly m-1 edges with length 2d and middle point z in the spanning tree. Note that in case z is a midpoint of an edge, then m<=2.

So we just need to count for each subtree how many ways are there to choose S in this subtree such that the smallest depth is d, and then we can combine those values to obtain the answer using the above observation, and then sum up those answers over all choices of z.

It was not strictly necessary to go through the Kruskal's algorithm and the lexicographical order part. Instead, we can immediately start with the "consider all edges of the spanning tree with the given middle point z" argument, and the solution will be short and beautiful. However, I also wanted to demonstrate how one can arrive at this idea analytically.

Thanks for reading, and check back next week!

Wednesday, November 5, 2025

A QOJ week

Pinely Round 5 started the competitive part of the last week on Thursday (problems, results, top 5 on the left, analysis). This was my first individual appearance in a screenshot on this blog since April 2024, so I think it went pretty well :) I was figuring out the solutions pretty quickly, and did not get stuck in implementation like I often do. However, it was not nearly good enough for the first place, since tourist was both the fastest, and with zero incorrect attempts. Congratulations on the convincing win!

Problem F gave me a nice warm feeling when I figured out the solution, so maybe it will be enjoyable for you as well: you are given a tree with n<=5000 vertices. Consider the 2n ways to choose a subset S of its vertices, and for each subset S build a complete graph where S is the vertex set, and every two vertices are connected by an edge with a weight equal to the length of the tree path connecting them. What is the sum of weights of minimum spanning trees of those 2n graphs?

The 4th Universal Cup Stage 4: Chengdu took place on Saturday, as usual (problems, results, top 5 on the left, analysis in Chinese). Team USA1 won the third stage out of four this year, defying the parity rule. The second place went to a team that is not in the season rating, so I assume this was a team that was participating in the original onsite contest that this stage was a mirror of. Congratulations to both teams on solving all problems!

Yandex Cup 2025 Qualification Round  wrapped up the week on Sunday (results, top 5 on the left). A few people were in contention on the easier problems, but starting from problem J ecnerwala was significantly faster than the competition, so it is not a surprise that only he was able to solve everything this time. Well done!

After the 3rd Universal Cup Semifinals, this was the second big contest that required proctoring for those who want to qualify to the onsite round, and it was certainly not the last one to do so.

In my previous summary, I have mentioned an AtCoder problem: you are given a single integer n<=210, and you need to produce a sequence pi of n integers such that 0<=pi<230, such that for every value of k between 1 and n there exists a number a such that the longest strictly increasing subsequence of the sequence qi=pi&a has length k. Here & denotes bitwise and, in other words we consider a certain subset of bits of pi, the same subset for all i.

Many, if not most, constructive problems that have a single integer n in the input are solved by figuring out a way to produce solutions for 2n and 2n+1 from a solution for n, and this problem was also one of those. The constraints strongly hinted that we need to learn to double n using at most 3 additional bits.

Suppose we have a sequence pi that solves the problem for some n. There are actually two ways to easily double the LIS length: we can either add a new bit at the end, replacing each number pi with two consecutive numbers 2pi and 2pi+1, or add a new bit in the beginning, appending a second copy of the sequence sequence pto itself, but with all numbers increased by the same value 2t chosen in such a way that all pi<2t. In both cases we set the corresponding bit of a to 1. This way by adding just one more bit we are now able to get all even LIS lengths from 2 to 2n. But what to do about the odd lengths?

Let us use one more bit, and append a number 2that is bigger than all existing numbers to the end of the (doubled) sequence. Now if we use the same a as before, the LIS length will still be 2k since the new number becomes 0 if the r-th bit of a is not set. However, if we set the r-th bit of a to 1, then the LIS length will be 2k+1, since the new big number can be appended to any increasing sequence. This means that we now have a sequence of length 2n+1 for which we can get any LIS length between 2 and 2n+1. The only missing length is 1, but that is also easy to achieve: just set a=0.

So we can go from n to 2n+1 using 2 additional bits. What about going from n to 2n? Well, actually we can use the "append a big power of two" trick once again, and go from n to 2n+2 using 3 additional bits. Being able to go from n to 2n+1 and 2n+2 means that we can get any number if we also have solutions for the base cases n=1 and n=2.

As is often the case with AtCoder problems, after seeing the solution one cannot fathom struggling for several hours to come up with it, it looks so straightforward :)

I was discussing an old Open Cup gem with my friends recently, motivated by a lecture on binary search for beginner Swiss high school students. I was almost certain there is no way to upsolve that problem now. However, it turns out that this problem (and probably many other old Open Cup contests?) are now available for upsolving on QOJ.ac, the same website that hosts Universal Cup. Huge thanks to the admins of QOJ for preserving the old contests!

Thanks for reading, and check back next week!