Editor Comments
Editor
Comments to the Author:
Thank you for your submission to IEEE PAMI. The reviewers and I all found the contribution valuable and enthusiastically accept your article with minor revisions. The additional accelerations for inference in graphical models are indeed of great value to the field, and the reviewers were only concerned about presentation issues. In particular, some aspects of the paper were not sufficiently self-contained and contained some confusing statements. Reviewer 2 also requested some additional clarifications as well as more citations to various all-pairs shortest paths algorithms and their performance. Finally, reviewer 3 makes an excellent recommendation to compare the fast algorithm, when applied to MAP estimation for image segmentation, against the graph-cuts algorithms of Boykov or Kolmogorov, which are known to be extremely fast, to see if these accelerations are indeed state-of-the-art in that setting. Please address the comments of the reviewers in the final version of the paper and, once again, congratulations. Since these were overall minor revisions, your paper may not need to go through a full re-review if the editors find you have sufficiently addressed all the reviewers' comments.
********************
Reviewer Comments
Reviewer: 1
Recommendation: Author Should Prepare A Minor Revision
Comments:
This is a nice report of compelling research. My primary suggestion for improvement is that the paper is not self-contained in the experiments section, where a lot of prior knowledge and even notation is assumed to be known by the reader (specifics on that later). It's not too hard to look up the references, but readers will appreciate some more explanation. In general, I believe the algorithm described here is important and is a useful tool with wide-reaching applicability and thus should be published. That said, here is a list of finer-grained suggestions and comments, mostly minor about writing and style:
Introduction: "[13] showed ... " I prefer you don't use reference numbers as nouns, especially a noun that starts a sentence. (You also use reference numbers as nouns in 1.1, three times on page 4, 4.2, and 4.3.)
p. 2, the claim that tree-width 2 inference can be solved in O(mf(n)) time needs either a reference or a short explanation or both.
Top of page 5, "In such cases" is ambiguous. I believe you mean in text denoising with a skip chain model, but that's an educated guess.
Section 3. Formally define all data structures in English in the text. I'm guessing S is a set.
Throughout this section, be clear about indexing. For example, when you say C_ik = \infty, you should, at some point, point out that the algorithm runs for each i and k. It was unclear at first whether the full C matrix is computed with one run of the algorithm or one entry at a time.
I believe English text describing the algorithm would be helpful before simply pointing to Algorithm 1 and discussing the details. Something like "the algorithm exploits a priority queue to avoid unnecessary computation of sums that cannot be the minimums".
I prefer to have Lemma 1's proof in an appendix, since that's the order I believe most readers will read in.
Algorithm 1. Redefine Q, and all variables and data structures at least in the caption, if not in the algorithm itself. These Algorithms should be as stand-alone as possible.
"item" and "t-min" appear to be typeset in math notation, making them typographically indistinguishable from $i$ times $t$ times $e$ times $m$. Obviously this is not what you mean, but variable names based on English words should be typeset in Roman. It's worst for "t-min" because I don't know if that's supposed to be a minus sign with t and min as two different quantities, or if t-min is supposed to be a variable name.
Proof of Lemma 1. I'd prefer actual English rather than the mathematical shorthand "iff", especially for a proof in the main text. It's marginally more acceptable in the appendix, but in general, a published paper should not look like math notes.
Be consistent about indentation after displayed equations.
I'm not sure how a probability P(s=x) can be a "convolution of two boxes" or "equal a triangle". Further, since s is a continuous value sampled from a continuous distribution, P(s = anything) is zero. So, I'm not sure why P(s = x) = x, or what you mean by it. Nevertheless, the next claim of P(s \le x) is clear. I'm just confused about that intermediate claim.
Section 3.1 The noun "runtime" is not the same thing as "running time". (It is also used at the end of 3.2.)
Section 3.2. Again, an English description of the algorithm should precede pointing the reader to Algorithm 2.
3rd paragraph. There should be a comma after "In Algorithm 2".
Last line of page 9. "(eq. 3)" should be "Eq. (3)".
Section 3.3. Stylistically, putting numbers at the beginning of paragraphs to make a list looks pretty messy. Perhaps use indentation (or the enumerate environment in LaTeX).
How bad is the error in practice when switching to the integer queue? It's not clear how that additive error translates to practical applications.
Page 12. The function "grad" should not be in italics.
Page 13. Define E. I believe it is the edge set. In general, more explanation about each of the applications will help.
In the last paragraph of 4.2, "For inference" should have a comma after it.
Additional Questions:
1. Which category describes this manuscript?: Research/Technology
2. How relevant is this manuscript to the readers of this periodical? If you answer Not very relevant or Irrelevant please explain your rating under Public Comments below.: Very Relevant
1. Please evaluate the significance of the manuscript’s research contribution.: Excellent
2. Please explain how this manuscript advances this field of research and/or contributes something new to the literature. : The manuscript describes a new algorithm that allows a significant improvement in computation time for inference in graphical models.
3. Is the manuscript technically sound? In the Public Comments section, please provide detailed explanations to support your assessment: Yes
4. How thorough is the experimental validation (where appropriate)? Please discuss any shortcomings in the Public Comments section.: Compelling experiments; clearly state of the art
1. Are the title, abstract, and keywords appropriate? If not, please comment in the Public Comments section.: Yes
2. Does the manuscript contain sufficient and appropriate references? Please comment and include additional suggested references in the Public Comments section.: References are sufficient and appropriate
3. Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on? If not, please explain your answer in the Public Comments section.: Yes
4. How would you rate the organization of the manuscript? Is it focused? Please elaborate with suggestions for reorganization in the Public Comments section.: Could be improved
5. Please rate the readability of the manuscript. Explain your rating under Public Comments below. : Readable - but requires some effort to understand
6. How is the length of the manuscript? If changes are suggested, please make explicit recommendations in the Public Comments section.: Too short – would benefit from additional material, details, or description.
Please rate the manuscript overall. Explain your choice.: Good
Reviewer: 2
Recommendation: Author Should Prepare A Minor Revision
Comments:
This paper introduces a new algorithm for efficient computation of min-sum products, an operation that is the bottleneck to performing MAP inference in a variety of common real-world problems. The boost in speed is sufficient to make previously impractical problems practical. I believe the contribution will have a significant impact on its field and clearly merits publication.
I found the paper to be very clear and the proofs easy to follow. I followed the arguments carefully but found no technical errors.
The proposed method is exact (not approximate), and does not depend on the form of the pairwise potentials for correctness. The fast O(n^2 log n) running time, however, does require that the elements of the pairwise potential matrices be uniformly distributed and iid. The authors argue that if the matrix elements are not uniformly distributed, a preprocessing and postprocessing scaling operation could remedy this problem. I would have expected that this argument could be absorbed into the proof, so that the authors would not need to claim that the method requires uniformly distributed potential elements.
The requirement that elements be iid is more concerning, since there is no obvious way to work around it, and I would expect real-world problems to violate this assumption strongly. The experimental evidence in section 4 mostly resolves my concerns, but it would strengthen the paper to see either theoretical or empirical analysis of the impact of violating this assumption on performance.
In the introduction, the authors mention previous algorithms used to solve all pairs shortest paths that can be seen as equivalent to solving a min-sum products problem. While these algorithms achieve good expected performance for "average" all pairs shortest paths problems, the authors claim that the assumptions used to achieve good performance would break down if used to solve min-sum products. This sounds plausible, given my limited background on all pairs shortest paths, but I would like to see this claim supported more carefully. Is there a citation that documents this? Is there a theoretical argument that could be made? Could such a method be included in the empirical comparison?
In the experimental section, runtimes are given for the algorithm. I wasn't certain if these were for the entire execution of the algorithm, or for each min-sum product operation. Figures 4 and 6 have runtimes similar to figure 3 (single operations), even though each would require several min-sum product operations. However, figure 5 has much longer runtimes.
In the experimental section in general, more details about how the method was evaluated should be given. For example, for the active contours example, I would like to know if the algorithm was tested under a variety of real-world scenarios, or just the avocado image. This is relevant because I would expect certain images to violate the iid assumption strongly (certain rows or columns of the A or B matrix much lower than others if they correspond to pixels near an edge), perhaps more so than the simple clutterless windows of the avocado image.
Similarly, for the pattern matching: was this tested only on the fish pattern, or on many patterns? For the text denoising, some examples of algorithm performance would also be helpful.
For the pattern matching example, it might be helpful to point out that this graph can be solved using pairwise belief propagation, but that computing messages between cliques leads to (I would assume) better performance. Some measure of the performance advantage might be illuminating.
In section 3.3, I found speed-up technique #3 to be vague and difficult to follow. More explanation here would be helpful.
On page 8, it would be conventional to use lowercase p for probability density, i.e. p(s=x) instead of P(s=x).
Additional Questions:
1. Which category describes this manuscript?: Research/Technology
2. How relevant is this manuscript to the readers of this periodical? If you answer Not very relevant or Irrelevant please explain your rating under Public Comments below.: Very Relevant
1. Please evaluate the significance of the manuscript’s research contribution.: Excellent
2. Please explain how this manuscript advances this field of research and/or contributes something new to the literature. : This paper introduces a new algorithm for the efficient computation of min-sum matrix products. The min-sum product comes up in the MAP inference of certain graphical models. In particular, the min-sum product operation is the bottleneck for exact inference (via junction trees) for pairwise-connected factor graphs of treewidth 2. Similarly, pairwise connected graphs that are "nearly" of treewidth 2, or have large subgraphs of treewidth 2, can be solved approximately using belief propagation or a similar method. If messages are computed between cliques, min-sum product is again the computational bottleneck. The techniques introduced here make inference in these models substantially faster.
3. Is the manuscript technically sound? In the Public Comments section, please provide detailed explanations to support your assessment: Yes
4. How thorough is the experimental validation (where appropriate)? Please discuss any shortcomings in the Public Comments section.: Lacking in some respects; some cases of interest not tested
1. Are the title, abstract, and keywords appropriate? If not, please comment in the Public Comments section.: Yes
2. Does the manuscript contain sufficient and appropriate references? Please comment and include additional suggested references in the Public Comments section.: References are sufficient and appropriate
3. Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on? If not, please explain your answer in the Public Comments section.: Yes
4. How would you rate the organization of the manuscript? Is it focused? Please elaborate with suggestions for reorganization in the Public Comments section.: Satisfactory
5. Please rate the readability of the manuscript. Explain your rating under Public Comments below. : Easy to read
6. How is the length of the manuscript? If changes are suggested, please make explicit recommendations in the Public Comments section.: About right
Please rate the manuscript overall. Explain your choice.: Excellent
Reviewer: 3
Recommendation: Author Should Prepare A Minor Revision
Comments:
This paper presents an interesting new speed-up of the min-sum matrix product (applicable to a variety of MAP inference tasks), with expected computational complexity results given a certain distribution of the input matrices. While the expected computational complexity is only a modest improvement over recent work by McCauley and Caetano (cited as [13]), experimental results show that in practical terms, the speed-up is substantial, even for real problems where the required distribution of the input matrices is not fulfilled. Overall, the method is not as significant as some of Felzenszwalb's past work in a similar vein (e.g. his paper "Efficient Belief Propagation for Early Vision"), but it is still worth publishing.
The basic idea behind the technique is to formulate the min-sum matrix product operation, denoted C = A \otimes B (i.e. C_{ik} = \min_j A_{ij} + B_{jk}), in the following way: store all entries of A, B and C in two priority queues (implemented with a Fibonacci heap or similar data structure) Q and S; an entry A_{ij} is removed from Q and combined with an entry B_{jk} in S, computing the result C_{ik} = \min(C_{ik}, A_{ij} + B_{jk}) and placing it in S (and similarly when B_{jk} is removed from Q, it is combined with entry A_{ij} in S, etc.); the algorithm is repeated until all entries of C are in S.
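For concreteness, the min-sum product operation being accelerated can be sketched as follows. This is only the naive O(n^3) baseline that defines C = A \otimes B, not the authors' priority-queue algorithm; the point of the paper is to avoid evaluating most of these sums.

```python
import math

def min_sum_product(A, B):
    """Naive min-sum matrix product: C[i][k] = min_j A[i][j] + B[j][k].

    The accelerated algorithm described above computes the same C, but
    uses priority queues to skip sums that cannot be the minimum.
    """
    n = len(A)
    C = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                s = A[i][j] + B[j][k]
                if s < C[i][k]:
                    C[i][k] = s
    return C

A = [[0, 2], [3, 1]]
B = [[1, 5], [0, 4]]
print(min_sum_product(A, B))  # [[1, 5], [1, 5]]
```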
It is easy to see (Theorem 1) that the algorithm performs the min-sum matrix product; it is more work to understand exactly why (Theorem 2 and Lemma 1) the expected complexity of the algorithm is O(n^2 log n), which I didn't have time to verify completely (but the overall logic appears correct).
Simpler alternatives to this algorithm are provided, including a scaling method that avoids the use of priority queues altogether, which is fast in practice but may depend unpredictably on the initial value of a scaling parameter, T, chosen heuristically.
The experimental results include simulations of the min-sum matrix product for matrices satisfying the distribution specified in Theorem 2, as well as two small (but realistic) computer vision problems (interactive image segmentation and point pattern matching) and a text denoising problem (using text from a standard corpus). They demonstrate the soundness of the proposed algorithm, and show its applicability even when the input matrices don't satisfy the distribution specified in Theorem 2.
Overall:
Positives:
1) The algorithm, and the proof of expected complexity, is non-trivial, and the scope of applicability is not too narrow.
2) Convincing experimental results.
Negatives:
1) Modest improvement over recent work by McCauley and Caetano.
2) Given that the paper focuses on MAP estimation (rather than marginal estimation), it would be useful to place the current approach in the context of other techniques such as graph cuts (e.g. sometimes used in image segmentation problems, see for example "Interactive Graph Cuts for Optimal Boundary & Region Segmentation of Objects in N-D Images" by Boykov and Jolly in ICCV 2001), which (when applicable) are generally faster than message-passing algorithms.
Additional Questions:
1. Which category describes this manuscript?: Research/Technology
2. How relevant is this manuscript to the readers of this periodical? If you answer Not very relevant or Irrelevant please explain your rating under Public Comments below.: Very Relevant
1. Please evaluate the significance of the manuscript’s research contribution.: Good
2. Please explain how this manuscript advances this field of research and/or contributes something new to the literature. : The paper introduces a new technique for speeding up the min-sum matrix product of two n x n matrices, which is a key step in performing MAP inference on a large class of graphical models with message passing algorithms such as min-sum belief propagation. Specifically, the technique applies to graphical models which have pairwise potentials (factors) but third-order cliques. Whereas the worst-case complexity for such models is O(n^3) per node in the graph, and a recently developed algorithm has expected complexity O(n^2.5), the algorithm introduced in the paper has expected complexity O(n^2 log n), assuming some constraints on the input matrices. This is not a radical advance, but as the experiments in the paper demonstrate it still confers significant speedups on real problems (even though the constraints required for the theoretical result are not strictly satisfied).
3. Is the manuscript technically sound? In the Public Comments section, please provide detailed explanations to support your assessment: Appears to be - but didn't check completely
4. How thorough is the experimental validation (where appropriate)? Please discuss any shortcomings in the Public Comments section.: Compelling experiments; clearly state of the art
1. Are the title, abstract, and keywords appropriate? If not, please comment in the Public Comments section.: Yes
2. Does the manuscript contain sufficient and appropriate references? Please comment and include additional suggested references in the Public Comments section.: References are sufficient and appropriate
3. Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on? If not, please explain your answer in the Public Comments section.: Yes
4. How would you rate the organization of the manuscript? Is it focused? Please elaborate with suggestions for reorganization in the Public Comments section.: Satisfactory
5. Please rate the readability of the manuscript. Explain your rating under Public Comments below. : Readable - but requires some effort to understand
6. How is the length of the manuscript? If changes are suggested, please make explicit recommendations in the Public Comments section.: About right
Please rate the manuscript overall. Explain your choice.: Good