Reviewer 1:
The authors describe algorithms for fast "funny matrix" multiplication, meaning the usual max-product operation in graphical models. They show that by reordering the search using sorted vectors, they can speed up the operation (on average over independent orderings) by at least a small exponent factor, and sometimes much more. This seems like a very interesting and exciting idea, is novel to the best of my knowledge, and I certainly plan to give it a try.
The paper is well written and clear, with intuitive examples to pull the reader along. I particularly liked that the authors took the time to explain when their algorithm would work well, and when not; and to include examples in which their algorithm did not help or had only modest improvement. I wish I had more useful suggestions, but I have only some minor issues to address.
1. I do not like the style of figures interrupting the text (latex "here"), and would prefer floating
2. The bulleted list in 1.1 moves between "list form" and "conversational form" (e.g., one bullet is giving an example)
3. Eqn. (17) appears to be wrong (perhaps an old notation?); it should maximize over x_j rather than y_j.
Reviewer 2:
The authors present an approach for speeding the message-passing operation
either when the factors of the model decompose and/or when parts that are
not evidence dependent can be precomputed. The core idea is that given
pre-sorting of vectors, some maximization computations can be carried
out efficiently. The authors show that their method results in expected run-time
theoretical benefits as well as empirical gains in efficiency.
The authors tackle an important problem and develop an interesting
tool that has the potential of being useful in practice. The authors
actually due a good job of covering a lot of ground. However,
the central high-level concern is the state of this journal submission relative to 2+ works
(the plus being assigned to a workshop presentation) that have appeared
in the past by the same authors (and are properly cited). While I am not of those who
think that a journal paper needs to be dramatically more significant
than its conference predecessors, I believe that in this case the
submission does not contain sufficient content to warrant a JMLR
publication. On the one hand this paper is not really a theoretical
one so that the proof appendix does not constitue a significant
addition. On the other hand, except from cosmetic points, the
main experiment added to the paper (Protein design) is inconclusive.
Overall, I would have expected something more substantial such
as an empirical comparison to other approaches, possibly including
approximate methods (see relevant comments below)
Additional comments:
Though not properly exposed, the method proposed by
the authors is beneficial mostly when the state space is large.
Aside from making the scope more limited, this also brings to mind
alternative MAP approaches that typically benefit from large state
spaces. To name just a couple of the most recent one I am aware
of: Alphabet SOUP method by Gould et al. and the clamping method
by Eaton and Ghahramani. More generally, the idea of clamping
which typically benefits from large state spaces appears widely in the
CSP community.
The second central concern is that while the benefit for exact inference
has merits, many of the domains suggested by the authors actually benefit
from approximate inference. Thus, from a practical point of view, which
is the core contribution of the paper (despite that some theorists might
be excited by the sqrt(N) factor), the performance of the method within
approximate propagation method is as, if not more, important than its
performance in a junction tree. The non-impressive results within
Sontag's algorithm which is similar to many recent propagation based
developments, further highlights the need for such a comparison.
Presentation-wise, the claim that the improvements ``shall increase the class
of problems for which inference via max-product belief propagation
is tractable'' is a gross over-statement that is not justified by
the experiments. More importantly, the authors do not properly
discuss the main limitation of their method in my view which
is that it is effective only when the state-space is quite large.