> My (limited) understanding of LambdaRank is that their model empirically minimizes NDCG, but does not have a strong theoretical backing for why it should work.
That's akin to saying that minimizing cross entropy empirically maximizes accuracy, but there's no strong theoretical backing for that either.
LambdaRank is one way of getting a smooth differentiable approximation to NDCG by slapping a sigmoid somewhere. The paper we're discussing now offers another way. Hard to say which way would turn out to be empirically better on problems of practical significance without actually experimenting.
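To make "slapping a sigmoid somewhere" concrete, here's a rough sketch of one generic way to do it: replace each item's hard rank with a sigmoid-smoothed "soft rank" and plug that into the usual NDCG formula. To be clear, this is not LambdaRank's actual procedure, and not the method from the paper under discussion either; it's closer to the soft-rank trick usually associated with ApproxNDCG. The names `soft_ndcg`, `exact_ndcg` and `temperature` are mine, and I'm assuming the common `2^rel - 1` gain with a `log2` discount.

```python
import numpy as np

def _dcg(gains, ranks):
    # Standard DCG with a log2 position discount (rank 1 = top).
    return float(np.sum(gains / np.log2(1.0 + ranks)))

def _ideal_dcg(gains):
    # DCG of the best possible ordering: largest gains at the top.
    n = len(gains)
    return _dcg(np.sort(gains)[::-1], np.arange(1, n + 1, dtype=float))

def exact_ndcg(scores, rels):
    # Ordinary, non-differentiable NDCG computed from hard ranks.
    scores = np.asarray(scores, dtype=float)
    gains = 2.0 ** np.asarray(rels, dtype=float) - 1.0
    order = np.argsort(-scores)
    ranks = np.empty_like(scores)
    ranks[order] = np.arange(1, len(scores) + 1)
    ideal = _ideal_dcg(gains)
    return _dcg(gains, ranks) / ideal if ideal > 0 else 0.0

def soft_ndcg(scores, rels, temperature=1.0):
    # Smooth surrogate: the hard rank of item i is
    #   1 + #{j != i : s_j > s_i}
    # and the indicator [s_j > s_i] is replaced by
    #   sigmoid((s_j - s_i) / temperature).
    scores = np.asarray(scores, dtype=float)
    gains = 2.0 ** np.asarray(rels, dtype=float) - 1.0
    diff = scores[None, :] - scores[:, None]      # diff[i, j] = s_j - s_i
    sig = 1.0 / (1.0 + np.exp(-diff / temperature))
    # The diagonal (j == i) contributes sigmoid(0) = 0.5, so subtract it.
    soft_ranks = 1.0 + sig.sum(axis=1) - 0.5
    ideal = _ideal_dcg(gains)
    return _dcg(gains, soft_ranks) / ideal if ideal > 0 else 0.0
```

As `temperature` shrinks, the soft ranks approach the hard ranks and `soft_ndcg` approaches `exact_ndcg`, but the gradients get spikier, which is the usual trade-off with this kind of smoothing. I haven't guarded against overflow in `np.exp` for very small temperatures or very large score gaps.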