Algorithms Weekly by Petr Mitrichev: A MEX week

Codeforces CodeTON Round 6 was the main event of the last two weeks (problems, results, top 5 on the left, analysis, discussion). orzdevinwang solved 8 problems while everybody else got at most 7 and I barely got 6, and earned a well-deserved first place. Congratulations!

While I could more or less keep up with the leaders on the first 4 easier problems, I was not able to solve E at all and spent a lot of time implementing, speeding up and debugging F, even though the algorithmic solution was clear to me reasonably quickly. On the other hand, I could solve G in 22 minutes, which seems to be the fastest among the top scorers, but it was already too late to catch up :) I guess that's one more lesson to read all problems, at least when one is stuck trying to solve the next problem by difficulty.

Here is problem F that has caused me so much implementation pain: we write down all integers between 1 and n as strings in base k (assuming we have characters for all digits between 0 and k-1). Now we sort those strings lexicographically, for example the first few would typically be 1, 10 (=k), 100 (=k²), ... How many numbers are in the same position in this sorted order as in the order where we just sort by number? n and k are up to 10¹⁸, and you have to solve 1000 testcases in 3 seconds.

In my previous summary, I've highlighted one of the AWTF problems: n people are standing in a row, each with either a red or a blue hat on. Each person can see the hat colors of the people in front of them (with smaller position numbers), but not their own hat color or the hat colors of the people behind them. Each person also knows the total number of red and blue hats. Then, the following process happens in rounds: in one round, every person who can already deduce their hat color, declare that they have done so (they do not declare the color itself). If multiple people can deduce it in the same round, they declare it simultaneously without hearing each other first. In the next round, on the other hand, people can already use the information of who has declared in the previous round to potentally make additional deductions themselves. Which people will eventually declare, and which will still be in the dark about their hat color even after an large number of rounds?

The key step in such problems about logic is to figure out the correct formalization. What exactly does it mean to be able to deduce one's hat color using the information of who has declared in prevoius rounds? Or in other words, we can start by finding a solution that runs with any time complexity, but that is correct. When I was solving this problem, I've only thought and implemented such solution for a stress test after my approach did not pass the samples, which in hindsight was way too late.

Here is the formalization: there are C(n,r) possible sequences of hat colors, where r is the globally known number of red hats. After the i-th round, from the point of view of the first person who does not see any hats, some of those sequences are still possible, and some are not (in other words, if the hats did in fact correspond to this sequence, would everybody say what they have said?). This set of possible sequences S_i also clearly defines what all other people are thinking: for each person, the set of possible sequences that is possible from their point of view is equal to the intersection of S_i with the set of sequences that have the correct hat colors for those hats that they see. This sounds trivial when spelled out, but it was actually not that easy to state it for me during the round.

Now, what will each person do during the i-th round? They will look at the set of sequences that are still possible from their point of view (given by the intersection mentioned above), and check if their hat color is the same in all of them. If yes, they will declare, otherwise they won't.

How to compute S_i given S_i-1 and the declarations? We need to check the declarations that would have happened for each sequence in S_i-1 (assuming each person sees some prefix of that sequence), and remove those sequences where this set of declarations does not match one for the real sequence of hats. Once again, this looks very simple, almost trivial, but it was actually far from easy to state concisely.

This is how far I've got during the round: I've implemented a slow solution based on the above, and was trying to find some fast heuristic solution that would match it on random inputs. It turns out that this did not lead to a correct solution. Instead, one simply had to speed up the slow solution!

One had to notice that after each step, the set S_ican be described by the lists of possible amounts of red hats in each of the prefixes of the sequence. For example, suppose there are 4 red and 3 blue hats in total. The initial set S₀can then be described as: empty prefix has 0 red hats; the prefix of length 1 has 0 or 1 red hats; 2 has 0, 1, 2; 3 has 0, 1, 2, 3; 4 has 1, 2, 3, 4; 5 has 2, 3, 4; 6 has 3, 4; and the whole sequence has 4. Every sequence that has 4 red and 3 blue hats satisfies those constraints, and every sequence that satisfies those constraints has 4 red and 3 blue hats.

Then suppose during the first round only the last two people have declared that they know the color of their hats. It turns out that the resulting set S₁can then be described as: empty prefix has 0 red; 1 has 0, 1; 2 has 0, 1, 2; 3 has 1, 2, 3; 4 has 2, 3; 5 has 2, 4; 6 has 3, 4; 7 has 4.

More generally, if we describe the prefix of length i having j red hats as (i,j), then the (i+1)-th person will declare if exactly one of (i+1,j) and (i+1,j+1) is still possible, and not declare if both are possible. This criterion allows us to compute the declarations that actually happen, and the same criterion then allows us to eliminate states which contradict those declarations. However, we also need to pay attention to also eliminate states that do not have a possible predecessor (if both (i-1,j-1) and (i-1,j) are eliminated, then (i,j) should be eliminated as well), or a possible successor (if both (i+1,j) and (i+1,j+1) are eliminated, then (i,j) should be eliminated as well). This criterion is also the basis of the inductive proof that S_i can always be represented in the above form.

Therefore we can maintain the set of states (i,j) that are still possible after each round, and stop when this set of states stops changing.

Thanks for reading, and check back next week!

Algorithms Weekly by Petr Mitrichev

Saturday, September 23, 2023

A MEX week

No comments:

Post a Comment