Algorithms Weekly by Petr Mitrichev: October 2012

Saturday, October 27, 2012

Petr Mitrichev Contest 10 solution ideas

Petr Mitrichev Contest 10 is over, congratulations to WJMZBMR, to an unnamed team from China, to tmt514, to UPC-1 team and to scottai1 for solving 4 problems, to flashmt for being the only one to solve problem D, and to hirosegolf for being the only one to solve problem F!

You can still solve the problems for practice at http://codeforces.com/gym/100110, and you can even start a virtual contest, in which you'll see the standings of other teams as they were at the corresponding time into the contest.

When making this contest, I've tried to add a flavor of exploration (as opposed to just using standard algorithms), since that actually resembles my work quite a bit. Please let me know whether you liked that or not!

In particular, I like problem C the most, and I'm pretty sure it can be solved for 8x8, 10x10 or even 20x20 fields - but I don't know how. We can probably set up a challenge to solve it for 10x10. My feeling is that some statistical methods should be the most powerful here.

Here are the solution ideas. They are not complete solutions, but I hope they can guide you in the right direction. Problem statements can be found here. You can also find a video of me analyzing this contest (without problem G) in Russian at http://youtu.be/FN6FRXiQ7JE. Please don't read further if you still want to try solving these problems yourself :)

Problem A. Asymmetric Art.
This problem can be solved with backtracking. The straightforward backtracking tries all possible subsets, but that's obviously too slow. We can do several optimizations: first, let's consider the numbers in increasing order, and when taking each number we can 'mark' all numbers that we can't take anymore because of this new number. This is not fast enough, but we can additionally truncate our search by saying 'if we've reached number x, then we'll take not more than answer[n-x] more numbers', where answer[y] is the number of items in the largest quasi-symmetric-triple-free subset for n=y. Then we can find all answer[] values in under one minute, and then send a program that has all answers hardcoded for judging.
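Here is a minimal Python sketch of that pruned backtracking. The quasi-symmetric-triple condition from the statement is not restated in this post, so the sketch uses a hypothetical stand-in: no three chosen numbers may form an arithmetic progression (s, x, 2x - s). The answer[] bound relies on the forbidden pattern being translation-invariant, which holds for this stand-in. Each taken number marks the numbers it would complete a triple with, and the search stops once size + 1 + answer[n - x] cannot beat the best size found so far.

```python
def largest_ap_free_sizes(n_max):
    """answer[n] = size of the largest subset of {1..n} with no 3-term arithmetic progression.

    The 3-AP rule is a HYPOTHETICAL stand-in for the contest's triple condition.
    """
    answer = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        best = answer[n - 1]        # any valid set for n-1 is also valid for n
        chosen = []
        blocked = [0] * (n + 1)     # blocked[t] > 0: taking t would complete a triple

        def search(x, size):
            nonlocal best
            best = max(best, size)
            # Stop at the end, or when x plus at most answer[n - x] of x+1..n cannot beat best.
            if x > n or size + 1 + answer[n - x] <= best:
                return
            if blocked[x] == 0:
                marked = []
                for s in chosen:            # s < x, so s, x, 2x - s would be a 3-AP
                    t = 2 * x - s
                    if t <= n:
                        blocked[t] += 1
                        marked.append(t)
                chosen.append(x)
                search(x + 1, size + 1)     # take x
                chosen.pop()
                for t in marked:
                    blocked[t] -= 1
            search(x + 1, size)             # skip x

        search(1, 0)
        answer[n] = best
    return answer
```

Running this once offline and pasting the resulting list into the submitted program corresponds to the hardcoding step described above.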

Problem B. Lots of Combinations.
We need to do two things: find (n,k) modulo 10**10, and check whether (n,k) is greater than or equal to 10**10. The second part is relatively easy: assuming k <= n-k (otherwise replace k with n-k), we can incrementally compute (n,1), (n,2), ... and stop as soon as we exceed 10**10. For the first part, we notice that it's enough to compute the answer modulo 2**10 and modulo 5**10, and then use the Chinese remainder theorem. To compute the answer modulo 5**10, we first calculate the power of 5 in (n,k) separately, and then calculate (n,k) with all factors of 5 removed from all numbers. Without factors of 5 we can use division (modular inverses), so we just need to calculate factorials skipping the factors of 5 modulo 5**10, and those are easy because the product of numbers not divisible by 5, taken modulo 5**10, repeats with period 5**10. The computation modulo 2**10 is the same with 2 in place of 5.
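A Python sketch of both parts follows; the function names are illustrative. The modulus is parameterized by the number of digits d so the code can be checked on small cases. With d=10, the loop over a full period of 5**10 numbers on every call is slow in Python, and a real submission would more likely precompute that prefix-product table once (for example in C++). `pow(x, -1, m)` needs Python 3.8 or later.

```python
from math import comb


def legendre(n, p):
    """Exponent of the prime p in n!."""
    v = 0
    while n:
        n //= p
        v += n
    return v


def p_free_factorial(n, p, pe):
    """n! with every factor p removed, modulo pe = p**e."""
    if n == 0:
        return 1
    r = n % pe
    block, tail = 1, 1
    for i in range(1, pe + 1):
        if i % p:
            block = block * i % pe  # product over one full period of p-free numbers
        if i == r:
            tail = block            # product of the p-free numbers in 1..r
    # Full periods, the leftover tail, then the multiples p, 2p, ... which contribute (n // p)!.
    return pow(block, n // pe, pe) * tail % pe * p_free_factorial(n // p, p, pe) % pe


def binom_mod_prime_power(n, k, p, e):
    pe = p ** e
    v = legendre(n, p) - legendre(k, p) - legendre(n - k, p)
    if v >= e:
        return 0
    den = p_free_factorial(k, p, pe) * p_free_factorial(n - k, p, pe) % pe
    return p_free_factorial(n, p, pe) * pow(den, -1, pe) * p ** v % pe


def binom_mod_10pow(n, k, d=10):
    """(n, k) modulo 10**d from the residues mod 2**d and 5**d (Chinese remainder theorem)."""
    m2, m5 = 2 ** d, 5 ** d
    r2 = binom_mod_prime_power(n, k, 2, d)
    r5 = binom_mod_prime_power(n, k, 5, d)
    t = (r2 - r5) * pow(m5, -1, m2) % m2
    return r5 + m5 * t


def binom_at_least(n, k, limit=10 ** 10):
    """Is (n, k) >= limit? Walk (n, 1), (n, 2), ... and stop early."""
    k = min(k, n - k)
    c = 1
    for i in range(1, k + 1):
        c = c * (n - i + 1) // i    # exact division: c is now (n, i)
        if c >= limit:
            return True
    return c >= limit


# Sanity check against exact arithmetic on small inputs.
assert all(binom_mod_10pow(n, k, 3) == comb(n, k) % 1000 for n in range(40) for k in range(n + 1))
```

Dividing only p-free numbers is what makes the modular inverse legal, since everything left in the denominator is coprime to p**d.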

Problem C. Curiosity.
I'm pretty sure there are lots of different approaches that work in this problem. The two main directions I'm aware of are finding large blocks of data without unknowns and combining them together, or using various dynamic programming approaches that consider all possibilities. My solution is of the second kind: assume we've chosen the 6x6 field. We can then use dynamic programming to find the minimum possible number of contradictions in the input data (places where we know the color of some cell but it's different from the actual cell we're on). Now I use simulated annealing to find the 6x6 field that has zero contradictions. Of course, since all input files are given to you, you can do this without any time or memory limits and then just submit the discovered answers.

Problem D. Domination.
Since one of the dimensions is small (up to 10), we can do dynamic programming using a cut in that dimension as our state. More specifically, suppose we have filled the first k columns and the first p cells in the (k+1)-th column. Then we're only interested in what happens on the first p cells in the (k+1)-th column and the last (n-p) cells in the k-th column. Each cell can be in four states (has '1', has '2', has '0' and doesn't have an adjacent '2', has '0' and already has an adjacent '2' (let's call this state '3')), yielding 4**10=2**20 states, but it turns out we don't need all of them - some pairs of adjacent cell states are not useful: '21' and '11' can always be replaced by '23', and '20' is plain impossible. That brings the number of states down significantly, but the program is probably still too slow. The final touch is to notice that things become periodic if we fix n and increase m.

Problem E. Easy Learning.
This one probably has even more different solutions than C. There is a lot of theory for this kind of problem that I don't want to recite here; my solution used gradient boosting.

Problem F. Hash.
There are two main approaches here. One is called Pollard's rho algorithm, and does not depend on the hash function much; the other actually uses the specific hash function we use. Consider values x_i = hash('1000 zeroes but i-th character is 1') - hash('1000 zeroes'). Then we need to find two disjoint subsets of x_i with the same sum. Let's do the following trick: sort all x_i, and take y_1=x_2-x_1, y_2=x_4-x_3, y_3=x_6-x_5, and so on. It's not hard to see that now we have 500 numbers (half as many), but the numbers themselves are about 1000 times smaller on average, and we still need to find two disjoint subsets with equal sums (two such subsets of the y's expand back into two disjoint subsets of the x's with equal sums). Repeating this several times gets us numbers that are so small that two of them are bound to be equal. This problem has some very tricky cases when b=2 - you should watch out for those!

Problem G. RLE Size.
Here, you just need to carefully consider all cases. Each block of consecutive '?' signs can be solved on paper, based on what's to the left of it and what's to the right, and then you can just implement the formulas you've discovered on paper in your program.

Problem H. Good Students and Bad Students.
This problem can be solved greedily. Let's go from the highest numbers to the lowest numbers, and gradually fill all groups. Whenever we encounter a student that wants to be in the upper half, we place it in the first upper half that still has empty slots, if any. Whenever we encounter a student that wants to be in the lower half, we place it in the first lower half that has the corresponding upper half completely filled, if any, and in the first free upper half slot otherwise. The proof is a bit tricky but doable.

Problem I. Tennis Scores.
This was the classical 'long statement, straightforward but long code' problem. You just had to carefully calculate the probabilities of game outcomes for each player's serve, then use those to calculate the probabilities of set outcomes, and then use those to calculate the probabilities of match outcomes. It's important to notice that set outcomes depend on who's serving first in the set.
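As an illustration of the first step, here is a Python sketch of the probability that the server wins a single game. It assumes standard tennis game scoring (first to 4 points with a 2-point margin) and a fixed probability p that the server wins each point; the contest's exact rules aren't restated in this post. Set and match probabilities can be built in the same way on top of these values, with the set state also tracking who serves next.

```python
from functools import lru_cache


def game_win_prob(p):
    """Probability that the server wins one game, winning each point with probability p."""
    q = 1 - p
    deuce = p * p / (p * p + q * q)   # from deuce: P = p*p + 2*p*q*P

    @lru_cache(maxsize=None)
    def win_from(a, b):               # a = server's points, b = receiver's points
        if a >= 4 and a - b >= 2:
            return 1.0
        if b >= 4 and b - a >= 2:
            return 0.0
        if a >= 3 and a == b:         # deuce (3-3, 4-4, ...)
            return deuce
        return p * win_from(a + 1, b) + q * win_from(a, b + 1)

    return win_from(0, 0)
```

Collapsing every tied score from 3-3 onward into one closed-form deuce value keeps the recursion finite.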

Problem J. Three Squares.
This problem was supposed to be solved numerically. One step that you need to make to avoid falling into a trap is to realize that we're not always rotating all three squares by the same angle, however improbable that might look. Here's an example:

Then you could either just iterate over possible rotation angles with a reasonable step and check whether there's an intersection (since all coordinates are integers, it's not possible to create a particularly nasty case for that solution), or, if you want a more robust solution, do a search in the 3-D space of angles that repeatedly splits the search space in two, throwing away branches where we can see that every triple of angles in that branch still produces an intersection.

Thursday, October 25, 2012

Petr Mitrichev Contest 10 - this Saturday!

I will be hosting Petr Mitrichev Contest 10 this Saturday (the day after tomorrow) between 15:30 and 20:30 Moscow time (other timezones: http://timeanddate.com/worldclock/fixedtime.html?msg=Petr+Mitrichev+Contest+10&iso=20121027T1530&p1=166&ah=5).

The contest will be held at http://codeforces.com/gyms. In order to take part, you need to have an account on codeforces.com. Both teams and individual participants can join. The contest itself will be at http://codeforces.com/gym/100110, but this link will be accessible only after the start of the contest.

There will be 10 problems of varying difficulty (most are quite difficult) for 5 hours. Your solution needs to be correct on all test cases to be accepted (the standard ACM ICPC rules). People will be ranked by the number of solved problems, and by total penalty time in case they're tied on solved problems, so don't be late! :) Note that in each problem you need to read the input from a file and write the output to another file - so don't read from stdin and don't write to stdout!

Feel free to ask any questions that you might have, and also please let me know if there are any issues with the contest system - it's the first time I'm using codeforces.com/gyms.

Monday, October 22, 2012

Time and venue for Petr Mitrichev Contest 10

There will be an online contest called Petr Mitrichev Contest 10. Problems are all mine, previously used in Petrozavodsk trainings for top Russian teams this September but not published elsewhere. The problems are not easy, but they are of different types and thus I hope everyone will find something interesting to solve. Both teams and individual participants can join.

I'm trying to choose the best time and place for the contest. My current proposition is: 15:30 to 20:30 Moscow time this Saturday, October 27 (in other timezones: http://timeanddate.com/worldclock/fixedtime.html?msg=Petr+Mitrichev+Contest+10&iso=20121027T1530&p1=166&ah=5). Looking at the contest list at http://clist.by/ and the new Yandex contest site not mentioned there (http://contest.yandex.ru/contest/ContestList.html), all weekends are very busy, but it looks like IFMO trainings are not attended by many teams this fall, and thus overlapping with them is OK.

I understand that the contest ending time in my proposition is 1:30AM in Japan and Korea and 0:30AM in China. People from Japan, Korea and China (and from Asia in general): is that too late? I'd host it earlier but there's a contest on acm.timus.ru that ends at 15:00 Moscow time that was very popular last year (http://acm.timus.ru/monitor.aspx?id=100), so I fear many contestants from Asia will take part there.

For the venue, I propose http://codeforces.com/gyms.

Please share your suggestions.

Wednesday, October 3, 2012

TopCoder Open 2012 Finals Commentary

16:09 - That's it about this TCO. See you next year! Feel free to ask me questions in comments, I'll do my best to answer.

15:57 - Marathon results! ainu7 is the champion, Psyho is second!

15:54 - Algorithm results! Both 500s and ACRush’s 250 fail. Egor, meret, RAVEman are the only non-zero scorers, in that order. Egor is the champion!

15:48 - They’ve announced design results (which were already known in advance), now development.

15:31 - I've taken a look at RAVEman's and andrew's 500s, and they both only consider moves by small_constant*first_vector+other_small_constant*second_vector. I fear that doesn't work for the following reason: when the vectors are very long, close in length to the size of the board, there can be two cells that can only be reached from one another via a linear combination with larger coefficients. So I'd bet on both 500s failing.

15:29 - They are starting announcements, but they usually do other tracks first and algorithm last.

15:21 - Waiting for systest. Currently RAVEman with just 500 first, andrewzta with just 500 second, Egor, meret and ACRush with just 250s from third to fifth. Let's hope at least the champion has one working problem :)

15:18 - ACRush gets one more -25 on RAVEman's 500, then kills RAVEman's 250 and gets -25 on meret's 250. Meret has got -25 on both standing 500s.

15:18 - Egor has got -25 in the meantime as well. Still above meret.

15:17 - And gets -25. Meret is also preparing the testcase for RAVEman’s 500.

15:15 - ACRush is preparing a testcase for RAVEman’s 500.

15:13 - assuming both remaining 500s will also fail, it's Egor vs meret challenge battle. Both have made their moves, still less than 50 separates them. ACRush gets -25 on Egor's 250.

15:10 - meret brings down marek’s 250, RAVEman brings down iwi’s 500, Egor brings down Andrew’s 250.

15:08 - admins are giving out a prize for another contest during intermission, using loudspeakers. What an AWFUL decision!

15:07 - RAVEman preparing a large and presumably tricky case for 500.

15:05 - iwi submits 500 but I'm not sure he has at least tested for TLE.

15:03 - marek.cygan doing nothing, meret writing his 1000, his solution has “layers” in it - we had an idea that involved layers, too, so that can be a correct solution.

15:02 - most people fight with not-passing-examples in 500.

15:00 - apparently RAVEman's 250 is wrong, and he's fixing it... And he resubmits!

14:59 - five minutes before the end. Marek has submitted and resubmitted his 250. I’m betting on many last-second submissions.

14:52 - actually nika’s hypothesis seems to be true, and it’s not even hard to construct that ordering: let’s try all possibilities for the first vertex. Then, at every step the next vertex can be constructed by doing the following: if a vertex has more edges to the already constructed vertices, then it’s closer to the last marked vertex. Also, if a vertex has fewer edges to the vertices yet to be constructed, then it’s also closer to the last marked vertex. And the only way both those numbers can be the same for two vertices is when they’re isomorphic. So we just reconstruct the ordering greedily.

14:45 - meanwhile, andrewzta and RAVEman have submitted the 500. I have a bad feeling about andrewzta's one, though, as he has probably patched his solution quickly to pass the sample case; he couldn't have rewritten a large part of his solution since I last saw it.

14:44 - if that's true, then the solution is to handle groups of isomorphic vertices at once, and then the number of states will be really small - just linear.

14:42 - nika has the following conjecture for 1000: omitting isomorphic vertices, the ordering of vertices is uniquely determined up to a complete reversal.

14:41 - andrewzta's medium seems far from completion - he iterates over pairs of vectors but his check if the given pair of vectors gives a symmetry is too simple.

14:38 - shangjingbo’s 1000 is not working only on the last example case, which is huge. Poor guy :(

14:34 - RAVEman optimizing his 500, it’s working in 0.9s on what seems to be close to worst case. Will he submit soon?

14:29 - ACRush coding 500, meret coding 1000.

14:27 - Rustyoldman reports that iwi and andrewzta are coding their 500s.

14:18 - Meanwhile, ACRush resubmits 250. Current standings: Egor (with resubmit), RAVEman and meret (no resubmit), andrewzta and ACRush (with resubmit). ACRush has also opened the 500. RAVEman is coding on 500 and shangjingbo is coding on 1000.

14:15 - About 1000: maybe something like this can work - let’s start putting numbers from 1 to N. After we’ve put numbers from 1 to K, all that matters is which vertices are assigned the last D numbers, and which vertices have an assigned number at all. And because of the restrictions, I’d hope that the number of states will be quite small. But I’m not sure how to estimate that.

14:08 - Rustyoldman reports that nobody is coding the 500 or 1000 yet. meret submits 250.

14:06 - By the way, bmerry was test-solving 250 and 1000 in this round, and it took him 12 and 44 minutes respectively.

14:05 - RAVEman submits 250 as well. I’m trying to think about 1000...

14:02 - andrewzta submits 250, iwi gave up on 1000 and opened 500. ACRush went to 1000 after 250. Marek.cygan gave up on 500 and went to 1000. It’s 25 minutes into the contest and it’s weird we have just 3 submissions on the 250.

13:58 - about 250: First, we can forget about E by XORing S and E, and just making player A’s goal to reach all zeroes. How can player A win? The only way is that, right before A's move, the entire string has just one segment of consecutive ones. But how could B allow that to happen? If the string has length of at least 5, it looks like B can always prevent that on his move, so when the length is at least 5 we should just check if A can win in one turn, and when the length is at most 4, there are at most 16 states and we can analyze the entire game using the usual means.
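A small Python sketch of this reasoning, with illustrative function names. It assumes player A moves first and that the answer is -1 when A can never win (the statement as summarized here doesn't say what to output in that case), and it takes the length >= 5 rule on trust, exactly as hedged above. For length at most 4 it runs a simple retrograde analysis over the 2**n XOR masks.

```python
def a_moves(x, n):
    """Masks A can produce from x: pass, or flip any substring (a run of consecutive bits)."""
    result = {x}
    for i in range(n):
        for j in range(i, n):
            result.add(x ^ (((1 << (j - i + 1)) - 1) << i))
    return result


def b_moves(y, n):
    """Masks B can produce from y: pass, or flip a single bit."""
    return {y} | {y ^ (1 << i) for i in range(n)}


def is_one_block(x):
    """True if x is non-zero and its one bits are consecutive."""
    if x == 0:
        return False
    while x & 1 == 0:
        x >>= 1
    return x & (x + 1) == 0


def min_turns(s, e):
    n = len(s)
    x = int(s, 2) ^ int(e, 2)      # A now wants to reach all zeroes
    if x == 0:
        return 0
    if n >= 5:                      # hypothesis from the text: only an immediate win works
        return 1 if is_one_block(x) else -1
    best = {0: 0}                   # mask with A to move -> fewest A turns to win
    turns = 0
    while True:
        turns += 1
        new = {}
        for state in range(1 << n):
            if state in best:
                continue
            for y in a_moves(state, n):
                # A wins now, or every B reply lands in a state already won in fewer turns.
                if y == 0 or all(z in best for z in b_moves(y, n)):
                    new[state] = turns
                    break
        if not new:
            return best.get(x, -1)
        best.update(new)
```

For example, from "101" A cannot win at once, but flipping the whole string gives "010", after which every reply of B leaves a single block of ones (or nothing), so A needs two turns.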

13:55 - meanwhile, Egor submits and resubmits the easy, adding one more small “if” clause, ACRush submits the easy a bit later but has higher score because of Egor’s resubmit. Everybody who’s opened 500 and 1000 is working on paper.

13:51 - I’ve test-solved 500 before this contest. The basic idea is - any good tiling is defined by two “basic” shift vectors. In fact, if we fix the two basic shift vectors, then all vertices are split into groups of equal vertices, and we just need to check that all vertices in each group have the same color (it won’t be a problem to form a connected piece afterwards). Now all that remains is to iterate over all possible pairs of shift vectors and carefully check everything. This can be simplified further by noticing that we can always choose one of the basic vectors to be horizontal (but it might be up to size^2 in length then).

13:50 - formal problem statements: http://apps.topcoder.com/forums/?module=Thread&threadID=763920&start=0&mc=2.

13:47 - 500: The plane is split into grid cells that are colored black or white. A good tiling of it is one where each piece is a connected block of cells, all pieces are equal (they coincide after some translation), including colors, and the tiling itself is periodic (for each three tiles X, Y, Z there’s a tile at Z+(Y-X)). What is the smallest possible piece size among good tilings that contain a given rectangle as their part?

13:45 - 1000: you are given a connected undirected graph with M vertices, and must assign distinct numbers from 1 to N (N >= M) to its vertices so that adjacent vertices have numbers differing by at most D, and non-adjacent vertices have numbers differing by at least D+1. How many ways are there to do that? N is up to 100, M is up to 50, D is up to 8.

13:43 - marek goes for 500, iwi and shangjingbo go for 1000, everybody else for 250.

13:41 - 250: you are given two strings S and E of the same length, consisting of ‘0’ and ‘1’ characters. Player A can (but is not forced to) choose a substring and change all ‘0’s in it to ‘1’s and vice versa. Player B can (but is not forced to) do the same for just one character. Player A’s goal is to obtain string E, and player B’s goal is to prevent him from doing so. What is the minimum number of turns A needs (assuming both play optimally)?

13:37 - everything looks fine this time. Two minutes before start!

13:35 - based on this, spectators propose a new strategy: if meret and ACRush submit a problem very quickly, code maximum flow _before_ opening it :)

13:34 - ACRush has apparently been coding most of this time, implementing Dinic algorithm for maximum flow.

13:28 - not much happening now, people are relaxing, admins say everything will be fine this time :)

13:17 - Egor and andrewzta playing air hockey. Meret, [[iwi]] walking around the room. marek.cygan talking with friends. ACRush looked at something on a laptop and went walking with a surprised face. RAVEman talking with friends as well. Can’t find shangjingbo in the arena.

13:14 - the problems turned out to be more serious than expected. The new start time is 13:40. Hysterical laughing from contestants :)

13:12 - it looks like the contest won’t start at 13:15, too. The technical issues still unresolved. This must be a very nervous time for all contestants.

13:05 - admin announcement: The match will start at 13:15.

13:02 - Egor and andrewzta are playing "Cities": each player should name a city that has not been named before and that starts with the last letter of the previously named city. So far: Moscow, Warsaw, Wroclaw, Voronezh, Hurghada, Athens, Sarajevo, Osaka, Astana, Amsterdam, Malmo, Orel, Ljubljana, Astrakhan, Novosibirsk.

13:00 - marek.cygan: "The winner will be the one who can find the problemset".

13:00 - 250, 500, 1000. No news on contest start time.

12:59 - it looks like the contest start is postponed; there are some technical issues.

12:49 - 10 minutes before the start. Andrewzta showed up, now almost everybody is coding the easy problem from the wildcard round for practice, except meret who’s writing max flow again.

12:44 - Everybody is preparing except Egor and andrewzta - the two Java coders. One doesn’t need much preparation when there’s no necessary boilerplate :) Egor is already ready, andrewzta not yet here.

12:35 - Hi! This is live commentary for TopCoder Open 2012 Algorithm Finals. The round will start in about 25 minutes. I will update this post with new comments.

TopCoder Open 2012 Finals preview

This year's TopCoder Open finalists are a very diverse group of people. The most telling statistic, in my view, is the time when they first acquired 3000+ rating on TopCoder (not surprisingly, all of them did at some point :)):

andrewzta - May 2006
ACRush - September 2006
Egor - January 2007
marek.cygan - September 2007
RAVEman - June 2010
meret - July 2010
shangjingbo - December 2011
[[iwi]] - July 2012

So we have four people that I'd call veterans, playing the TopCoder algorithm games at the top level for at least five years, and that group is clearly separated from the other four, who have only reached the top recently. None have been the TopCoder Open champion yet, though!

Here are the micromatch scores between today's finalists (the first number is how many times the person on the left has been ranked higher in rated TopCoder rounds, the second number is how many times the person on the top has been ranked higher):

Handle	ACRush	marek.cygan	andrewzta	Egor	meret	RAVEman	shangjingbo	[[iwi]]	Average
ACRush	0/0	68/23	60/27	98/37	15/2	85/14	30/2	70/9	82%
marek.cygan	23/68	0/0	51/51	73/53	6/6	57/18	13/11	41/13	56%
andrewzta	27/60	51/51	0/0	69/53	6/6	43/20	7/9	36/14	53%
Egor	37/98	53/73	53/69	0/0	16/18	90/67	41/14	71/36	51%
meret	2/15	6/6	6/6	18/16	0/0	15/14	15/9	18/11	49%
RAVEman	14/85	18/57	20/43	67/90	14/15	0/0	39/17	70/41	42%
shangjingbo	2/30	11/13	9/7	14/41	9/15	17/39	0/0	14/25	34%
[[iwi]]	9/70	13/41	14/36	36/71	11/18	41/70	25/14	0/0	34%

A funny aspect of yesterday's Wildcard round was that the fourth-placed people from both Semifinals advanced, meaning that we'd get exactly the same set of finalists if four advanced from each Semifinal and there was no Wildcard. Historically, Wildcard round advancers have won TopCoder tournaments once (tomek in TCO 2008) and got second place twice (ZorbaTHut in TCCC 2004, JongMan in TCO 2007).

What strategy would people use at today's finals? Judging from the semifinals, [[iwi]] will go Hard-Easy-Medium while everybody else will use the usual Easy-Medium-Hard order. I'm pretty sure the TopCoder admins want to make sure at most one person will solve all three problems, which might well in practice mean nobody will solve all three, so starting with Hard (or doing Easy-Hard-Medium) does look like a viable strategy. I'd propose Easy-Hard-Medium switching to Medium if Hard is not solved (and it's not clear how much is left) about 30 minutes before the end.

Also take a look at vexorian's finals preview at http://community.topcoder.com/tco12/our-algorithm-finalists/.

What other stats on finalists would you like to see? :)

Tuesday, October 2, 2012

TopCoder Open 2012 Wildcard Commentary

19:53 - That’s it for today! Join us tomorrow for the coverage of the ultimate round - the finals! http://www.timeanddate.com/worldclock/fixedtime.html?iso=20121003T13&p1=867.

19:52 - It’s easier to tell which solutions passed than which failed :) iwi’s 1000, meret’s 500, SergeiFedorov’s and dzulgakov’s 250s. That’s it. iwi and meret to the finals!

19:49 - it looks like most 250s were indeed brought down by d=2. The results are being announced now!

19:40 - Kankuro also loses 25 in the dying minutes, on Romka's 500.

19:39 - both meret and Romka got -25 from Dmitry_Egorov's 500.

19:36 - and meret brings down SergeiFedorov's 500. Will we have two non-zero scorers to advance to the finals? :)

19:35 - Romka brings down sdya’s 500.

19:34 - dzhulgakov brings down Dmitry_Egorov's 250.

19:34 - sdya is not reading solutions, seems to have given up.

19:31 - Kankuro brings down sdya’s 250. What about 500s?

19:30 - Challenge is on! Surprisingly, no blind challenges. SergeiFedorov brings down _Romka_’s 250 and meret’s 250, but it seems that he reads them.

19:26 - Apparently iwi’s easy tries to approach (0,0) from (x,y) somewhat greedily. Is it an obvious failure or is that greedy actually some kind of gcd?.. We’ll find out soon. His code contained return “I hate math”; at some point :)

19:25 - Last seconds of coding phase were quite eventful - [[iwi]] submitted easy and SergeiFedorov resubmitted medium. [[iwi]] is the current leader.

19:24 - _Romka_ resubmits the medium, and drops to the last place. Yeah, I’d guess there’s plenty of space for bugs there as well. Looking forward to challenge phase.

19:19 - not much more action, just 4 minutes left. Everybody except iwi has 250+500, iwi has just 1000.

19:14 - SergeiFedorov submits the medium after giving up on the hard and it’s the fastest by far! He’s now in first place.

19:13 - apparently the sample cases for 250 don’t include the “all coordinates odd” case (for example d=2). I predict a lot of challenge fun :)

19:11 - iwi’s 250 is not working, he’s debugging.

19:06 - we’ve missed that dzhulgakov has resubmitted the 250. Promising for challenge phase?.. It looks like challenges will be very important if there’s nobody with 3 tasks.

19:04 - 20 minutes left. I’d say iwi is actually in a great position, as most competitors can’t get their 1000 working - many already coded up some DPs.

19:03 - SergeiFedorov gave up on hard and moved to medium.

18:59 - I guess we can speed things up by considering where will the first letter of large string go. It’s either removed at some point, in which case we get a suffix of large string and the same small string, or mapped to the first letter of the small string. So this can get us a DP on (suffix of large, suffix of small), although it’s not clear to me if we can still manage not to overcount solutions where one of consecutive letters is removed several times. Maybe we should add something like “last removed letter” to the state?.. Anyway, it’s too late to think about this.

18:57 - I can’t see iwi’s solution now, but wata tells its complexity is 50^3, not 50^6 as my proposition below.

18:54 - Currently _Romka_, sdya, Kankuro, meret, Dmitry_Egorov and dzhulgakov submitted easy and medium (and all moved on to the hard), [[iwi]] submitted hard and moved to the easy, SergeiFedorov submitted easy and moved to hard.

18:49 - So the rough approach for the 1000 is: let’s count the number of ways to obtain a substring of the small string from a substring of the large string. We do that by checking which character will be removed last, which leaves us with two substrings of the large string that should be matched somehow to the small substring, so we also have to check all ways to split the small string into two. One challenge is not to count things twice with consecutive equal letters, but that should be doable although I’m too lazy now to figure out how. Another challenge is running time: currently we have something like (50^3/6)*(50^3/6) - (the number of ways to choose a substring and a letter in it)*(the number of ways to choose a substring and a split point in it), which is about 500 million which might be too slow. But in reality we have more restrictions like length of large substring should be at least length of small substring which divides everything by 2 more, and the same for both halves which should divide by something like 2 further, bringing the total running time under control?..

18:43 - About 1000: let’s go from the long string to obtain the short string. The only way we can confuse two removals is when there’s a string of consecutive equal letters in the long string. In that case, removing each of those letters produces the same string, so we should always remove the first one of them. Still don’t know what to do next...

18:39 - forgot to post the link to statements: http://apps.topcoder.com/forums/?module=Thread&threadID=763830&start=0&mc=2

18:36 - 250 and 500 solved by Romka.

18:35 - the complexity of that solution is 50 * 2500 (maximum number of steps) * 50 (number of equations to check on each step). We can check each equation in O(1) time using hashing.

18:30 - 500 does look straightforward-ish. We will construct all strings from the end. We take any equation, and if one of its parts has more letters known in the end than the other, we reconstruct the missing letters. We repeat this until we can’t reconstruct anything, or until one of the strings is of length more than 2500. In the latter case, there’s no solution (we’ve entered an infinite cycle), and in the former case, our current suffixes are a valid solution, and the smallest one.
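A Python sketch of that fixed-point iteration, assuming each equation is given as a triple (a, b, c) meaning a = b + c, with a and b variable names and c a constant string; the input format is an assumption. known[v] holds the currently known suffix of variable v. The function returns None on a contradiction or once some suffix exceeds the 2500-character limit from the text, and otherwise the suffixes, whose total length is the answer. This is the plain version without the O(1) hashing comparison.

```python
def solve_string_equations(equations, limit=2500):
    """equations: list of (a, b, c) meaning variable a equals variable b followed by constant c."""
    known = {}
    for a, b, _ in equations:
        known.setdefault(a, "")
        known.setdefault(b, "")
    changed = True
    while changed:
        changed = False
        for a, b, c in equations:
            lhs, rhs = known[a], known[b] + c        # known suffixes of both sides
            if len(lhs) == len(rhs):
                if lhs != rhs:
                    return None                      # contradiction
                continue
            if len(lhs) < len(rhs):
                if not rhs.endswith(lhs):
                    return None
                known[a] = rhs                       # extend a's suffix
            else:
                if not lhs.endswith(rhs):
                    return None
                known[b] = lhs[:len(lhs) - len(c)]   # extend b's suffix
            changed = True
            if len(known[a]) > limit or len(known[b]) > limit:
                return None                          # growing forever: no solution
    return known
```

For the example s = t + "a", t = u + "b", it settles on s = "ba", t = "b", u = "", with total length 3.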

18:28 - Meanwhile 7 people submitted easy and [[iwi]] works on hard. Also SergeiFedorov switched to hard after easy.

18:25 - Here’s why the solution for the easy works. If we have a jump by (a,b), then we have jumps by (2a,0) and (0,2b), and also jump by (b,a). Thus, we can get jumps (2g,0) and (0,2g). Then if there’s a jump (2kg, (2m+1)g) then we can get to (0,g) using those two jumps. Otherwise, all jumps are ((2k+1)g,(2m+1)g) and we can get to (g,g) but can’t get to (0,g).

18:22 - 500: We have a system of equations a_i = b_i + c_i, where a_i and b_i are variables, c_i are constants, + is string concatenation (like s = t + “a” and t = u + “b”). We need to find the minimal sum of lengths of its solution.

18:19 - So together with Egor we kind of figured out the 250. Suppose the gcd of all coordinates of all possible jumps is g. Then there are two cases: if at least one jump has an even coordinate (after division by g), then we can get to any point where both coordinates are divisible by g. If all jumps have odd coordinates after division by g, then we can get to any point where both coordinates are divisible by g and their sum is divisible by 2g.
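A Python sketch of this criterion, assuming d >= 1. It enumerates the non-negative vectors (a, b) with a*a + b*b == d (sign changes and swaps don't change the gcd or the parities), then applies the two cases above.

```python
from math import gcd, isqrt


def reachable(d, x, y):
    """Can we get from (0, 0) to (x, y) with integer jumps of length exactly sqrt(d)? Assumes d >= 1."""
    g = 0
    vectors = []
    for a in range(isqrt(d) + 1):
        b = isqrt(d - a * a)
        if a * a + b * b == d:
            vectors.append((a, b))
            g = gcd(g, gcd(a, b))
    if not vectors:                      # no jump of that length exists
        return x == 0 and y == 0
    if x % g or y % g:
        return False
    if any((a // g + b // g) % 2 == 1 for a, b in vectors):
        return True                      # some jump has an even coordinate after dividing by g
    return (x // g + y // g) % 2 == 0    # all coordinates odd: the sum must be divisible by 2g
```

For instance, d=2 only allows diagonal jumps, so (1,0) is unreachable while (1,1) is reachable, whereas d=25 mixes (3,4) and (5,0) and reaches every point.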

18:09 - Apparently my thinking gets slower as the day comes to an end. It’s kind of obvious that you have to generate all integer vectors of length sqrt(d), but where to go from there?..

18:07 - Romka has just submitted 250! I still have no clue how to solve it.

18:02 - 1000: let’s call a sequence of strings good if for each applicable i, s_i is s_{i + 1} with one letter erased. You are given two strings; how many different good sequences exist with these two strings as the first and last elements? String lengths are up to 50.

18:00 - 250: one starts at (0, 0) and can jump to other integer points. Each jump should be exactly sqrt(d) in length. Is it possible to get to (x,y)?

17:56 - meret has coded max flow to warm up and in case it shows up at the contest. Three minutes before start.

17:44 - One more comment from Gennady Korotkevich (made before the semifinals) - he said that almost anyone could win, but if forced to bet on somebody, he’d bet on meret. Well, now meret has to qualify from the Wildcard to live up to the expectations :)

17:43 - 250, 500 and 1000.

17:41 - Hi! This is live commentary for TopCoder Open 2012 Algorithm Wildcard Round. The round will start in about 20 minutes. I will update this post with new comments, and so will Egor who’s joining me again.

TopCoder Open 2012 Semifinal 2 Commentary

15:03 - that’s it for Semifinal 2. Join me later today for the coverage of the Wildcard round at http://www.timeanddate.com/worldclock/fixedtime.html?msg=TCO+2012+Wildcard&iso=20121002T18&p1=867!

15:01 - results: dzhulgakov’s 275 and 950 fail, kalinov’s 500 and 950 fail, and marcina007’s and wata’s 500s fail too. shangjingbo, marek.cygan, andrewzta advance to the finals; meret, Kankuro, Dmitry_Egorov, dzhulgakov go to the wildcard.

14:48 - kalinov’s 500 will also fail.

14:42 - dzhulgakov says his hard solution fails.

14:40 - waiting for systests.

14:38 - nika says that several mediums will time out, and contestants themselves know that. Meanwhile andrewzta’s 500 goes down, and there are two more -25s.

14:37 - Right! Now everybody can see the submissions and it’s actually non-competitors who look at leaders’ solutions. (14:37:34) qiuiuu> spectators are interested in top submissions actually.

14:35 - no more action yet, we’re in the middle of the challenge phase.

14:32 - lots of people are reading shangjingbo’s and kalinov’s solutions. As in the first semifinal, solutions of leaders are attracting more interest, which looks illogical to me.

14:31 - (14:31:03) System> marcina007 unsuccessfully challenged shangjingbo's 500-point problem.

14:25 - dzhulgakov also submits the 950 with 30 seconds to go. The current situation: shangjingbo 1375, kalinov 1285, marek.cygan 1145, andrewzta 1084, dzhulgakov 1047. The only less-than-50 gap is between andrewzta and dzhulgakov. Waiting for challenge phase!

14:20 - andrewzta submits the 950. As things stand, he’s still fourth and needs two challenges to overtake the 3rd place, but solutions may fail at systest, you know :)

14:19 - Marek’s solution is the same as ours and kalinov’s. Kankuro has his solution timing out (and it has a weird “for (int t = 24; t >= 5; --t)” loop :)), andrewzta gets wrong answer and is trying to add more and more debug output.

14:14 - Marek submits the hard, 3 people with all problems now. Lining up nicely for the finals, but I’d guess there will be more submissions as the end approaches.

14:06 - pieguy opened the hard, abandoning his medium. wata has not opened the easy, so he is either still working on the medium or trying to finish the hard.

14:04 - So his solution has complexity of 47 (for k) times 47*4 (for number of bad events happening) times 4 (for number of bad events on this team). The problem is solvable for much larger constraints!

13:59 - Here's how shangjingbo's solution works. We'll do inclusion-exclusion, with the basic event being “rabbit X gives a carrot to someone from his team”. Then how do we count the number of cases with T such events happening? We’ll do a DP over “first k teams, T events”. For a new team, we iterate over how many rabbits in that team give a carrot to someone in his own team, and multiply that by 4*3*.. to account for giving that carrot to different people on his team, and by c[rabbits][num] to account for different rabbits on the team taking part.
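Here's a minimal Python sketch of that inclusion-exclusion. It rests on an assumption the statement above doesn't spell out: as the 4*3*.. factor suggests, every rabbit receives at most one carrot, and all carrots are distinct. `count_distributions` is a hypothetical name. It returns the exact count with Python integers, while the contest may well have asked for it modulo some number.

```python
from math import comb, perm


def count_distributions(carrots, team_size=4):
    """carrots[i] = number of carrots team i brings, each held by a different rabbit."""
    people = team_size * len(carrots)
    total = sum(carrots)
    # bad[t] = ways to pick t carrot holders and send each of their carrots
    # to a distinct member of their own team (t "bad events" forced to happen)
    bad = [1]
    for c in carrots:
        nxt = [0] * (len(bad) + c)
        for t, ways in enumerate(bad):
            for j in range(c + 1):  # j rabbits of this team give to their own team
                nxt[t + j] += ways * comb(c, j) * perm(team_size, j)
        bad = nxt
    # the remaining total - t carrots go to distinct people among the other
    # people - t recipients, with alternating signs
    return sum((-1) ** t * ways * perm(people - t, total - t)
               for t, ways in enumerate(bad))
```

With two teams bringing one carrot each, every carrot has 4 possible recipients on the other team, and the sketch returns 16.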

13:59 - Meanwhile we have shangjingbo and kalinov with all 3 problems, meret, andrewzta, marek.cygan, dzhulgakov and Kankuro with easy and medium, wata with medium only, pieguy, marcina007 and Dmitry_Egorov with just easy. wata is the only one who hasn’t followed the easy-medium-hard order.

13:49 - Studio finalists are being introduced quite loudly. Not sure how the algorithm semifinalists can think about their problems right now.

13:48 - shangjingbo submits the hard. His solution is even simpler than what we wrote below, he has just 47*(47*4) states in his DP. kalinov is writing something similar to our solution, but his solution doesn’t work on examples yet. meret is writing some solution which has a DP state of “5 numbers with sum up to 47”. Nobody else is writing code for 950.

13:45 - dzhulgakov also submitted medium.

13:44 - andrewzta is sitting with his notebook and pencil, trying to figure out 950.

13:43 - Marek has submitted the 500.

13:42 - Nothing changed during recent minutes, we are now into the second half of the contest.

13:36 - wata abandoned hard and switched to the medium instead.

13:34 - Easy and hard are DPs, unless there are issues in our solutions; medium does not involve any standard approach.

13:32 - Medium submissions have started to flow in: kalinov, shangjingbo, meret and andrewzta have submitted it and moved on to the hard.

13:31 - actually, 16 can be replaced by 4 since we’re just interested in how many team members are receiving the carrots, not which ones. We just have to be careful to multiply by appropriate coefficients to make sure we’re counting all possibilities.

13:28 - the complexity seems to be: 47 for the boundary position, 47*4 (actually 47*4/2) for the left-to-right number, 47*2 for the right-to-left number, 2^4=16 for the subset of this team that will receive carrots, 4 for the number of carrots from this team that go to the right (all remaining go to the left), and 4 for the number of carrots for this team arriving from the right. All in all, that’s 47*47*2*47*2*16*4*4=106314752, so that should work.

13:27 - here’s our idea for 950: let’s do a DP over “how many carrots cross the boundary between team x and team x+1 from left to right, and how many cross that boundary from right to left”.

13:26 - kalinov and shangjingbo submitted medium.

13:25 - All competitors except wata have submitted the easy and opened the medium; wata is still working on the hard. This commentary is brought to you by Egor, who’ll be helping me during the rest of this round.

13:25 - 950: There are n teams with 4 players each, and each team brought 0<=a_i<=4 carrots. Find the number of ways to distribute the carrots in such a way that each carrot is given from a team to a different team. All carrots are considered distinct.

13:22 - actually, that solution was wrong, as ACRush has pointed out. The correct solution is: iterate over the set of “important” bits, and then, for every important bit, check whether there’s a number that has this important bit set and all other important bits cleared.
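Here's a Python sketch of the corrected idea, written under my own assumptions: “good” is read as “every chosen number has a private bit”, the number of bits is small enough to enumerate every mask of important bits, and for each important bit we take the largest number whose only important bit is that one. `max_good_subset_sum` is a hypothetical name. Any good set fits this scheme if you take its private bits as the important ones, so the maximum over all masks is the answer.

```python
def max_good_subset_sum(nums):
    bits = max(nums, default=0).bit_length()
    best = 0
    for important in range(1 << bits):
        # largest number whose only important bit is m (m is a power of two)
        best_for_bit = {}
        for x in nums:
            m = x & important
            if m != 0 and m & (m - 1) == 0 and x > best_for_bit.get(m, 0):
                best_for_bit[m] = x
        best = max(best, sum(best_for_bit.values()))
    return best
```

For [1, 2, 3] the best good subsets are {1, 2} and {3}, both with sum 3. {1, 3} is not good, since {3} and {1, 3} have the same OR.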

13:13 - 500: a set of numbers is called good when all bitwise ORs of its subsets are different. Find the good subset of the given set of numbers with the maximum sum of elements. Egor solved it in about 30 seconds: the “good” condition is equivalent to “each number has at least one bit that is zero in all other numbers”, so we can do a DP over “maximum sum of subset of first k numbers that have the given mask of bits as their chosen bits”.

13:11 - Three submissions for 275: shangjingbo, pieguy, meret.

13:05 - Problem statements as contestants open them: http://apps.topcoder.com/forums/?module=Thread&threadID=763801&start=0&mc=2.

13:03 - it looks to be a straightforward DP. For each segment of balls, we determine whether it's possible to remove that segment completely. To check that, we iterate over which two balls will be removed last; they split that segment into subproblems.

13:01 - The 250 problem: you are given n balls (n is odd), each marked with either “left” or “right”. In one turn, you take a ball that is not at either end of the row and remove it together with its left or right neighbor, according to its mark; this continues until there’s one ball left. Which balls could be the last one left in the end?

12:56 - announcements are done, 3 minutes before start. There are several in-form competitors in this round: meret won Google Code Jam just a few months ago, and Kankuro won the Russian Code Cup a month ago.

12:49 - There are two Java coders in this round as well: andrewzta and wata. In the first round the Java coders took the first and last place :)

12:48 - There are 11 contestants in this round. Burunduk1 couldn’t come, and he told the organizers too late for them to rearrange everything.

12:48 - 275, 500, 950.

12:39 - Hi! This is live commentary for TopCoder Open 2012 Algorithm Semifinal 2. The round will start in about 20 minutes. I will update this post with new comments.

TopCoder Open 2012 Semifinal 1 Commentary

11:07 - that’s all for Semifinal 1. Join me later today for Semifinal 2, and then for the Wildcard! Semifinal 2 starting time: http://www.timeanddate.com/worldclock/fixedtime.html?msg=TCO+2012+Semifinal+2&iso=20121002T13&p1=867

11:06 - and the results are in! Tom’s 250 and Dlougach’s 1000 did fail indeed, so did Tom’s 500. All other solutions passed. Egor, ACRush, RAVEman to the finals, iwi, SergeiFedorov, Romka and sdya to the wildcard round.

10:56 - not much news from contestants. It looks like Tom’s 250 and Dlougach’s 1000 will fail, no news about other solutions.

10:49 - PaulJefferys got -25 on Tom’s 500 (not the 250, which we know has a bug) and on iwi’s 500; sdya got -25 on dolphinigle’s 500 in the last seconds. Waiting for systest!

10:47 - sdya killed exod40's 500. Apparently his resubmit was not for nothing.

10:45 - Since the difference between 7th and 8th place is 60 points now, it looks like it’s almost riskless for 7th or 6th place to challenge blindly.

10:43 - No more challenges yet, most people are reading Egor’s and ACRush’s solutions.

10:40 - Romka has killed Dlougach’s 250.

10:38 - dolphinigle has resubmitted both his 250 and his 500, sdya has resubmitted his 500. It would seem that dolphinigle has the best chance at challenging.

10:35 - And Dlougach submits 250 just 7 seconds before the end! Now it’s Egor, ACRush, Dlougach at the top. MikhailOK suggests the best challenge opportunity is integer overflow in 250.

10:33 - Some other contestants from China are recording ACRush's duplicate screen using a camera on their laptop. And he does submit his 1000 just 1 minute before the end! Egor has given up on his stress testing since his stupid solution doesn't work on the first sample.

10:24 - Egor is writing a stupid solution for 1000 to compare with his solution on random testcases - great thinking! I think that's the best thing he can spend his time on now. Meanwhile, Dlougach submits 1000; not sure if he has changed his matrix to be of size k or has optimized his solution in some other way.

10:15 - And Egor submits 1000! I was right that his solution looks good.

10:10 - ACRush, Egor and Dlougach all seem on the right track in the 1000 - they have matrix power, and they have some formulas for the matrix. Actually, Egor seems to be the closest to our solution as his matrix has explicit formulas and has size k, which is the right thing to do. ACRush computes the matrix in some complicated way which doesn’t work. Dlougach seems to have the examples passing, but his matrix is not of size k but of size k^2, which presumably causes timeouts.

10:03 - ACRush, Egor and SergeiFedorov solved 250+500, iwi solved 500, everybody else except Dlougach solved 250, Dlougach still working on 1000. His projected score on 1000 is approaching scores everybody else has on 500, though.

9:59 - most people are implementing something with disjoint-set-union structure for the 500, so it seems that everybody is on the right track and we’ll see many 500 solutions, so many people will try to solve 1000 and thus it’s possible all three finalists will solve everything.

9:51 - ACRush and Egor have 250+500, everyone but the two people who started with 1000 have 250.

9:47 - 1000 solution - it was right there, but we were missing it for some time :) We should do a DP with “how many last numbers are linearly independent” as the state. It’s straightforward to count the number of ways to transition from one state to another, since all situations are essentially the same: if we know that the last x numbers are linearly independent, it doesn’t actually matter what the numbers themselves are. And of course we use fast matrix exponentiation to handle n up to 10^9.
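Here's a Python sketch of this DP under my own assumptions. Here m = 2^bits - 1, the state is the length of the longest linearly independent suffix (it must stay below k), and the answer is taken modulo a caller-supplied `mod`, since the statement above doesn't say which modulus the contest used. `count_sequences` and `mat_mul` are hypothetical names. The transitions work as follows. If the new number lies outside the span of the s-long independent suffix (2^bits - 2^s choices), the state grows to s + 1. Otherwise its unique representation uses some oldest suffix element at depth p, and the state drops to p. That gives 2^(p-1) choices for p >= 1 and one choice (zero) for p = 0.

```python
def mat_mul(a, b, mod):
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) % mod for j in range(n)]
            for i in range(n)]


def count_sequences(n, bits, k, mod):
    size = min(k - 1, bits) + 1          # states 0 .. min(k - 1, bits)
    trans = [[0] * size for _ in range(size)]
    for s in range(size):
        trans[s][0] += 1                 # appending 0 leaves an empty independent suffix
        for p in range(1, s + 1):        # oldest suffix element used is at depth p
            trans[s][p] += 2 ** (p - 1)
        if s + 1 < size:                 # new number outside the span, suffix grows
            trans[s][s + 1] += 2 ** bits - 2 ** s
    # raise the transition matrix to the n-th power by repeated squaring
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    power = [[x % mod for x in row] for row in trans]
    while n:
        if n & 1:
            result = mat_mul(result, power, mod)
        power = mat_mul(power, power, mod)
        n >>= 1
    return sum(result[0]) % mod          # start from the empty sequence, state 0
```

For example, with bits = 2, k = 2 and n = 2, a pair is bad only when both numbers are nonzero and distinct (3 * 2 = 6 of the 16 pairs), and `count_sequences(2, 2, 2, 10**9 + 7)` gives the remaining 10.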

9:39 - Meanwhile, ACRush has submitted 250 and 500 and 7 other people have submitted 250.

9:37 - The problem statement looks like Burrows-Wheeler Transform, but it doesn’t seem relevant to the solution.

9:34 - The 500: given a prefix of a permutation, find the lexicographically smallest permutation with that prefix that is one big cycle. I’ve test-solved this problem before the TCO so I already know the solution, but it’s actually quite straightforward anyway. When placing each unknown number in the permutation, we place it to the smallest possible number unless that number would form a cycle, in which case we place the second smallest possible number - it will not form a cycle in that case.
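Here's a Python sketch of this greedy, assuming 0-based values, a prefix of distinct values, and n small enough for a quadratic scan over unused values. `smallest_cyclic_completion` is a hypothetical name. A disjoint-set union tracks which positions already lie on the same chain, so giving position i the value v closes a cycle exactly when i and v are in the same set. Only one unused value can do that for a given i, so the second smallest always works. The sketch returns None when the given prefix already contains a cycle shorter than n.

```python
def smallest_cyclic_completion(prefix, n):
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def closes_short_cycle(i, v, edges):
        # only the n-th (final) assignment may close the cycle
        return find(i) == find(v) and edges < n - 1

    result = list(prefix)
    used = [False] * n
    for i, v in enumerate(prefix):
        if closes_short_cycle(i, v, i):
            return None
        used[v] = True
        parent[find(i)] = find(v)
    for i in range(len(prefix), n):
        # smallest unused value that doesn't close a cycle too early
        v = next(cand for cand in range(n)
                 if not used[cand] and not closes_short_cycle(i, cand, i))
        result.append(v)
        used[v] = True
        parent[find(i)] = find(v)
    return result
```

For n = 3 and an empty prefix, 0 is rejected at position 0 (it would be a fixed point), then 0 is rejected at position 1 (it would close the cycle 0 → 1 → 0), giving [1, 2, 0].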

9:31 - it seems that Tom’s solution in 250 has an overflow bug.

9:26 - Essentially, it means that each k consecutive numbers must be linearly dependent as vectors in Z_2^b, where m = 2^b - 1. This is always true when k > b. When k <= b, it looks like when choosing the next number, the only thing that matters is the dimension of the space spanned by the previous k-1 numbers.
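Checking the condition for a single window is a rank computation over Z_2. Here's a minimal Python sketch (the name `has_zero_xor_subset` is hypothetical). It inserts the numbers one by one into an XOR basis, Gaussian-elimination style, and reports a dependency as soon as some number reduces to zero, which means it is the XOR of some earlier numbers in the window.

```python
def has_zero_xor_subset(window):
    basis = {}  # leading (highest set) bit -> basis vector with that leading bit
    for x in window:
        while x:
            top = x.bit_length() - 1
            if top not in basis:
                basis[top] = x
                break
            x ^= basis[top]
        else:
            return True  # x reduced to 0: a non-empty subset XORs to 0
    return False
```

Since the basis can hold at most b vectors, any window of more than b numbers is always dependent, which is the k > b case above.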

9:24 - The 1000 problem: count the number of ways to choose n numbers, each up to m (where m is a power of two minus one), so that for each consecutive segment of k of those numbers (segments [1st, 2nd, …, kth], [2nd, 3rd, …, (k+1)th], and so on) some non-empty subset of that segment xors to 0.

9:19 - Problem statements as contestants open them: http://apps.topcoder.com/forums/;jsessionid=0CE09A8F5F29689F1259D2D102B38B34?module=Thread&threadID=763756&start=0&mc=2.

9:18 - And ACRush readily submits the easy. It looks like there’s no catch, it’s indeed straightforward.

9:14 - This problem looks straightforward. The white connected components actually correspond to triples of runs of equal characters (one maximal run in each string), and the ones adjacent to the boundary are those whose run touches the boundary in at least one string. So it looks like the answer is something like total_length_of_white_in_A*total_length_of_white_in_B*total_length_of_white_in_C+(same for black)-total_length_of_white_in_A_not_adjacent_to_boundary*(same for B)*(same for C)-(same for black).
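Here's a Python sketch of that formula, under my own reading of the statement rather than a verified solution. A “run” is a maximal block of equal characters, and a run touches the boundary if it contains the first or last character of its string. Cells are grouped by their common character, and the product formula is summed over every character that occurs. `run_totals` and `boundary_white_cells` are hypothetical names. The sketch runs in linear time, which is fine for strings of 2500 characters.

```python
from collections import Counter


def run_totals(s):
    """Per character: total length of its runs, and length of runs not touching an end."""
    total, inner = Counter(), Counter()
    start = 0
    for i in range(1, len(s) + 1):
        if i == len(s) or s[i] != s[start]:
            total[s[start]] += i - start
            if start > 0 and i < len(s):
                inner[s[start]] += i - start
            start = i
    return total, inner


def boundary_white_cells(a, b, c):
    (ta, ia), (tb, ib), (tc, ic) = run_totals(a), run_totals(b), run_totals(c)
    # each triple of same-character runs is one component; it reaches the
    # boundary unless all three of its runs are interior
    return sum(ta[ch] * tb[ch] * tc[ch] - ia[ch] * ib[ch] * ic[ch] for ch in ta)
```

Summing over run triples factorizes, so the per-character totals are enough and no component is ever enumerated explicitly.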

9:11 - Dlougach and iwi start with 1000, everybody else with 250. 250 problem statement: you are given three strings A, B, C. Let’s color 3D cell (i,j,k) white if and only if A[i], B[j], C[k] are all the same. Now we need to count the number of white cells that are in the connected components adjacent to the boundary. A, B, C are up to 2500 characters.

9:08 - One minute before start.

9:05 - Spectators can log into the arena, but the contestants can’t see their messages. However, they can see who’s in the room. It looks possible to pass some information that way (as in “I will enter the room if your solution looks wrong”).

9:02 - Jessie tells everybody to line up for introductions.

9:02 - 250, 500, 1000.

8:58 - Almost nobody is preparing, people are just walking around and chatting.

8:54 - Jessie says ten minutes before start.

8:53 - There seems to be some admin activity onstage. Apparently one machine is not working?..

8:50 - Exclusive commentary from Gennady Korotkevich - it turns out Gena is closely following the TCO and is watching this semifinal. He says that it’s a pity that rng_58 doesn’t get to defend his title - couldn’t agree more.

8:48 - And here are the current standings of (one of) the prediction contests: http://snarknews.info/showvote.cgi?data=tco2012.

8:45 - The overview of all semifinalists: http://snarknews.info/index.cgi?data=tco/2012/finalists&head=index&menu=index&year=2012&contest=tco&class=tco2012. This round includes those with “1” in the last column.

8:43 - It looks like we have two Java coders - Egor and theycallhimtom. Everybody else is in the ugly world of C++.

8:38 - The contestants started preparations. Not much to do since Arena and plugins are already set up, and there’s no Internet access to set up additional software. As usual, C++ coders are writing #includes.

8:33 - Hi! This is live commentary for TopCoder Open 2012 Algorithm Semifinal 1. They’ve just opened the arena but haven’t let the contestants start preparations yet, so 12 anxious people are walking around :) I will post commentary by updating this post.

TopCoder Open 2012 - more arena flyovers

Here are some more videos that illustrate how TopCoder Open works.

Monday, October 1, 2012

TopCoder Open 2012 - arena video overview

Here's a less serious look at TopCoder Open 2012 - a standard "overview of arena" video, this time from a flying camera though :) Unfortunately the arena is quite dark and the picture quality is quite bad. I'm trying to explain what's going on in the audio.

TopCoder Open: problem difficulty

Tomorrow is the first day of TopCoder Open 2012, and I'll try to blog on its algorithm track.

Here's a small analysis of TopCoder Open problem difficulty.

Let's consider the time taken to solve the problem as the ultimate measure of problem difficulty. Then, we can estimate the difficulty of, say, a medium problem in a TopCoder Open semifinal by comparing the solving time for this problem for each competitor with the solving time for other medium problems for the same competitor. More precisely, for each competitor in each TopCoder Open onsite round, I've compared the solving time for each problem with the solving time for this competitor in all rounds that took place in the same year, and the percentage of problems that were easier plus half the percentage of problems that were of the same difficulty is the declared measure of difficulty of the onsite problem for this competitor. The median of those numbers over all competitors is the difficulty of the onsite problem.
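The measure can be written down in a few lines of Python. This is a minimal sketch with hypothetical names, not the script I actually used. Solving times are plain numbers, an unsolved problem is represented as `math.inf` (so two unsolved problems compare as equal), and the comparison list is whatever same-year problems of the same level you pass in.

```python
import math
from statistics import median


def difficulty_for_competitor(onsite_time, other_times):
    """Share of other problems solved faster, counting ties as half."""
    easier = sum(t < onsite_time for t in other_times)
    same = sum(t == onsite_time for t in other_times)
    return (easier + same / 2) / len(other_times)


def problem_difficulty(per_competitor):
    """per_competitor: list of (onsite_time, other_times) pairs, one per competitor."""
    return median(difficulty_for_competitor(t, others) for t, others in per_competitor)
```

For example, `difficulty_for_competitor(math.inf, [10, 20, 30, math.inf, math.inf])` gives 0.8: someone who solved 60% of the comparison problems can never rate an unsolved problem above 80%.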

Here's the result (each line lists the difficulty of the round's easy, medium and hard problem, in that order):

TCO '03 Semifinals 1 98% 69% 79%
TCO '03 Semifinals 2 89% 81% 73%
TCO '03 Semifinals 3 92% 87% 68%
TCO '03 Semifinals 4 82% 78% 82%
TCO '03 Finals 91% 90% 81%
TCO04 Semifinal 1 87% 80% 67%
TCO04 Semifinal 2 92% 79% 67%
TCO04 Semifinal 3 84% 51% 64%
TCO04 Wildcard 66% 88% 72%
TCO04 Finals 86% 79% 87%
TCO05 Semi 1 91% 76% 64%
TCO05 Semi 2 90% 71% 64%
TCO05 Semi 3 78% 81% 64%
TCO05 Wildcard 88% 81% 64%
TCO05 Finals 91% 77% 74%
TCO06 Semi 1 93% 77% 69%
TCO06 Semi 2 75% 77% 68%
TCO06 Semi 3 88% 77% 70%
TCO06 Wildcard 66% 84% 71%
TCO06 Finals 66% 85% 77%
TCO07 Semi 1 65% 79% 68%
TCO07 Semi 2 89% 77% 66%
TCO07 Semi 3 86% 83% 65%
TCO08 Semifinal 1 64% 81% 62%
TCO08 Semifinal 2 85% 79% 63%
TCO08 Semifinal 3 76% 80% 63%
TCO08 Wildcard 94% 44% 66%
TCO08 Championship 96% 86% 64%
TCO09 Semifinal 84% 59% 60%
TCO09 Championship 95% 87% 56%
TCO10 Semi 1 89% 84% 61%
TCO10 Semi 2 87% 75% 53%
TCO10 Wildcard 81% 86% 57%
TCO10 Final 90% 54% 61%
TCO11 Semifinal 1 74% 73% 55%
TCO11 Semifinal 2 91% 84% 57%
TCO11 Wildcard Round 87% 90% 56%
TCO11 Championship Round 83% 78% 62%

You can see that the numbers for the hard problem are lower than those for easy and medium; the reason is that I consider two unsolved problems to be of equal difficulty, and thus when a person solves just 60% of all hards, the perceived difficulty of any problem will not exceed 80% (60% easier plus half of the remaining 40%).

With that effect in mind, the above list reveals that 'easy' problems hover around the 90% mark (more difficult than 90% of SRM 'easy' problems), while 'medium' problems are sometimes around 80-85% difficulty but sometimes drop sharply to just average difficulty, around 50%.

Another interesting slice of the same data is the list of competitors ordered by decreasing average difficulty of onsite problems. Here's the list, limited to only those competitors with at least 5 onsite rounds:

PaulJefferys - 6 rounds - 58%
reid - 5 rounds - 59%
tomekkulczynski - 5 rounds - 64%
Eryx - 6 rounds - 65%
gawry - 7 rounds - 66%
Yarin - 5 rounds - 67%
bmerry - 7 rounds - 67%
nicka81 - 5 rounds - 67%
antimatter - 5 rounds - 70%
liympanda - 6 rounds - 70%
ACRush - 8 rounds - 71%
Im2Good - 5 rounds - 71%
Ying - 5 rounds - 71%
ploh - 6 rounds - 71%
grotmol - 6 rounds - 72%
marek.cygan - 10 rounds - 72%
John Dethridge - 9 rounds - 73%
cyfra - 5 rounds - 73%
misof - 5 rounds - 74%
andrewzta - 7 rounds - 75%
tomek - 10 rounds - 75%
SnapDragon - 8 rounds - 77%
Petr - 11 rounds - 81%

So for PaulJefferys and reid, the onsite rounds are actually not much more difficult than normal rounds (I guess partially because they don't do many SRMs, and thus "normal" rounds are the TCO qualification rounds for them), while I'm actually the one who struggles the most at the onsites :)