## Saturday, November 19, 2016

### An unshuffle week

During the Oct 10 - Oct 16 week, TopCoder has hit its next milestone: SRM 700 (problems, results, top 5 on the left, analysis). Despite the starting time of 4am, the first three spots were occupied by contestants from St Petersburg and Moscow. The first place went to _aid, who has recently joined the ACM ICPC World Champion team SPbSU Base - so all other teams will not have an easy time at 2017 World Finals :) Congratulations!

In my previous summary, I have mentioned a problem in the trademark style of Ivan Kazmenko: You're given the following random number generator, an implementation of the Fischer-Yates shuffle, and a program using them:

Function random (range):
state := (state * 1664525 + 1013904223) mod 232;
return (state * range) div 232;

Function shuffle (array):
n := length of array;
for i := 0, 1, 2, ..., n - 1:
j := random (i + 1);
swap(array[i], array[j]);
return array;

state := seed;
n := 10000;
array := [1, 2, ..., n];
array := shuffle (array);
array := shuffle (array);

In other words, we shuffle the numbers from 1 to n twice. The particular generator used here is well-known, so no particular weaknesses could be expected except of course the fact that it's a linear congruential generator. The task is: given the final state of the array, find what the seed was.

There are two main ideas necessary to solve this problem. The first idea is: if we know the numbers that the generator returned on two concrete calls (i.e., on p-th step and on q-th step), there are only so many (on the order of 100) possibilities for the seed that we can try them all, and moreover, we can enumerate all of them efficiently.

In order to enumerate them, we can notice that the value of the state variable on p-th step is a linear function of the initial seed. Because of this, having two return values fixed leads to two inequalities that look like l <= seed * a <= r (mod 232). We can solve this system of inequalities using an algorithm similar to the Euclidean one.

The second idea is that it's not actually that hard to find two concrete return values of the generator. During each shuffle, each number participates in at least one swap: when the variable i passes through the cell containing it. It's also not hard to see that on average quite a few numbers will participate in exactly one swap. And because this set of numbers is fairly random for each of the two shuffles, there will be a significant amount of numbers that have participated in exactly two swaps overall. And since we know the starting and ending position for each number, we just need to guess which was its "middle" position after the first swap.

So here's the overall solution structure: we iterate over all numbers from n to 1 that have moved to the left after two shuffles. For each of those numbers, we iterate over the "middle" position between the final positions and the starting position. We find all seeds that would make this number do the corresponding jumps, and then check if each of those seeds leads to the overall result we see. We expect to find the seed after trying just a few initial numbers.

Thanks for reading, and wish me luck in today's TopCoder Open semifinal (starting time, broadcast link)!