Algorithms Weekly by Petr Mitrichev: A no-regret week

TopCoder SRM 803 last week wrapped up the third stage of the race for TCO21 qualification (problems, results, top 5 on the left, TCO21 race results, analysis). Even though tourist had already guaranteed himself the first place and the direct ticket (or a direct login, in case it takes place online again :)) to TCO21, he has won the round with a big margin once again. This was his fourth SRM win in a row, which has happened before, but nobody else could win five in a row in the past. tourist has also claimed the all-time high rating during this streak. Congratulations Gennady, and no pressure at all for SRM 804 ;)

Codeforces Round 712 followed on Saturday (problems, results, top 5 on the left, analysis). The last problem kind of screamed that some sort of a greedy approach for embedding each cycle should work, but figuring out all details of the approach, as well as implementing it, was still extremely hard. Very well done to ksun48 on doing that and getting a clear first place!

Finally, the Northern Eurasia region of the ICPC held its onsite finals for the 2020-21 season on Sunday (problems, results, top 12 on the left, online mirror results, analysis). The top 50 teams from the online finals were invited to this round (how does one actually call the round between a semifinal and a final? A 2/3-final? A 1/sqrt(2)-final? :)). In the end, solving the 6 relatively easier problems was the bar for getting a medal, while getting a gold medal required also solving either the traditionally cactus-themed problem C, or the also traditionally interactive problem I. Congratulations to all medalists and especially to the team "Insert your name" from ITMO on the victory!

I set that interactive problem I for this round. Given that the contestants were onsite and did not have internet access during the round, it was a good opportunity to give a problem involving a beautiful but googleable algorithm: the randomized weighted majority algorithm. Preparing the test cases for this problem was extremely tricky, as the space of possible approaches is virtually unlimited, and there seems to be no single way to fail all of them. It turned out that the testcases were good enough for the onsite round, with the only two passing solutions using the provably correct approach, and with the top two teams probably accumulating quite some love for me with their -35 and -29 on this problem.

In the online mirror, some more questionable, but at the same time really creative, approaches passed all tests. In fact, the ML approach pwns the test set, making several times less mistakes than b (the smallest the number of mistakes of experts) in most test cases. It only has difficulties on test cases with a really small b (say, 7 out of 10000), where the allowed leeway of 100 extra mistakes is barely enough for the neural network training to converge.

It looks like I need to add some simple gradient descent and boosted decision forest algorithms to my prewritten code library for the future :)

In my previous summary, I have mentioned a Codeforces problem: there is a hidden array of n b-bit integers (in other words, each element is between 0 and 2^b-1). You do not know the elements of the array, but you can ask questions about them: in one question, you can ask "is it true that the i-th element of the array is greater than j?" for some value of i between 1 and n and j between 0 and 2^b-1. Your goal is to find the value of the maximum element in the array, and you need to do it in O(n+b) queries.

The first question is kind of expected: let's compare the first element of the array with 2^b-1-1, in other words let's learn the highest bit of the first element. If that bit is 1, we're all good: we then know that the highest bit of the answer is also 1, therefore we have effectively reduced b by 1 using just one query, so we're on track for O(n+b) queries overall.

The case where that bit is 0 is more interesting. We can't really afford to keep asking about the first bit of all numbers, since in case they are all 0, we would've spent n queries to reduce b by 1, which does not lead to a linear number of queries. This issue kind of points us to a better approach: in case the answers are always "less than", we want to be left with a really small range after asking the n queries. Therefore let's compare the second number with 2^b-2-1, in other words let's ask "is it true that the two highest bits of the second number are 0"?

In case we keep getting "less than" answers, after n queries we will know that the first number starts with 0, the second with 00, the third with 000, and so on. Now let's go from right to left, and ask "does the n-1-th number start with n zeros?". If not, then we know that the n-th number is smaller than the (n-1)-th number, and can be discarded for the rest of the solution, and we continue by asking "does the (n-2)-th number start with n-1 zeros?" If the (n-1)-th number does start with n zeros, we continue by asking "does the (n-2)-th number start with n zeros"? After we complete this process going from right to left, we will have discarded some amount k of all numbers, and will know that the remaining numbers all start with n-k zeros. Therefore we have reduced n by k, and b by n-k, so n+b was reduced by n using 2n queries, which is good enough for a linear solution!

Finally, we need to figure out what to do in case we get some "greater" answer after a few "less than" answers, for example when we learn that the first number starts with 0, the second with 00, the third with 000, but the fourth does not start with 0000. We will then ask: does the fourth number start with 0001? If the answer is also no, then we know that the fourth number starts with at least 001, therefore it's greater than the third number which can be discarded, and we continue by asking if the fourth number really starts with 001, potentially discarding the second number if not, and so on. If the fourth number does start with 0001, then we continue with the fifth number, but instead of asking if it starts with 00000, we ask if its prefix is at least 00010 (since we're not really interested in numbers smaller than 0001, given that we already have evidence of a number that starts with 0001).

At any moment, our algorithm therefore maintains a stack of numbers, where for each number we know a prefix that is equal to the prefix we know for the previous number in the stack plus one more bit. When we run out of bits, we do the backwards pass as described above, and obtain a problem of reduced size. Just like in the all-zeros case, we spend O(n) queries to reduce n+b by n, therefore achieving a linear solution.

This is the main idea of the solution. There are still some small details to be figured out, which you can find in the official editorial.

Thanks for reading, and check back next week!

4 comments:

Ishraaq Parvez05 April, 2021 16:58
Hey Petr, I really like your blog and the fact that you being a full-time professional manage to get a lot of time to actually take part in (read:win) contests.
A question - what should be the frequency of the nummber of contests that a beginner is to take part in (as per you experience)?
Anonymous06 April, 2021 08:31
Why is the blog's title "no-regret"?

Monday, April 5, 2021

A no-regret week

4 comments: