## Wednesday, July 29, 2020

### A rescue week

The Jun 8 - Jun 14 week did not have contests that I want to mention, so let's discuss the question from the previous summary.

I have found that the following code solves an AtCoder problem:

for (int n = 1; n <= N; ++n) {
for (int a = 1; a <= A && a <= n; ++a) {
if (a == n) {
w0[n][a] = fact[n];
w1[n][a] = fact[n];
} else {
w0[n][a] = ((n - a) * (int64) w1[n - 1][a] + (a - 1) * (int64) w1[n - 1][a - 1]) % MODULO;
w1[n][a] = (w0[n - 1][a - 1] + w0[n][a]) % MODULO;
}
}
}

where w0[N][A] would contain the final answer. This solution runs in O(N*A), which is too slow to get the problem accepted since N<=107 and A<=5000. How to make it faster?

To quote TLE, it's Berlekamp-Massey Algorithm to the rescue! The situation here matches the application from that blog post almost exactly. There are two differences, though. The first difference is that our transition has an extra "if (a == n)" branch. However, that branch never triggers when n>A, so we can just treat the n=A row as our initial values and imagine we don't have that branch.

This is not enough: the answer to the last sample still does not match if we compute w0[A,A], w0[A+1,A],..., w0[3*A,A] and apply Berlekamp-Massey to find w0[N][A]. The reason is the following: in our transition function, we multiply by (n-a), while Berlekamp-Massey expects the transition matrix to be independent of n.

However, we can notice a peculiar property of our transition: we only multiply by (n-a) when going from (n-1,a) to (n,a). In all other cases, the value of n-a stays constant, and we multiply by something independent of n (we go from either (n-1,a-1) or from (n,a) to (n,a)). Therefore, in aggregate, we will multiply by (n-a)! to achieve state (n,a) starting from a state (x,x). And this in turn means that if we define w2[n][a]=w0[n][a]/fact[n-a], then the transition function for these values will be independent of n, and we can apply Berlekamp-Massey to the sequence w2[A,A],w2[A+1,A],..., w2[3*A,A] to find w2[N][A]. Note that we don't need to write down the actual transition function for w2, we just need to believe it has the correct form, and we can use the code above to find w2 through w0, and thus solve our problem in O(N2*log(N)).

Thanks for reading, and check back for more!

1. 1. 