Metareview: This paper gives a literature review of dynamic neural networks for NLP, categorizing previous work into three classes: skimming, MoE, and early exit. The categorization is natural and gives a big picture to anyone who is not familiar with these topics. The review is comprehensive and covers many cutting-edge papers in this sub-field. The paper is also well written and easy to read. Some of the reviewers lean positive because such a review is new in NLP, though similar studies can be found in other communities, such as ML and CV. On the other hand, other reviewers think that this work does not offer much beyond the literature review and that some (possibly important) references are missing. I think all of these points can be addressed in an update of this submission.

To me, the work is interesting in itself and such a survey paper is new. However, there are some concerns. It seems that the work engages little with NLP-specific topics; Reviewer ui82 raises a similar point in their comments. This makes the work less informative, given that most readers are NLP researchers. Transferring ML techniques to NLP problems is common, but the value of such work is limited if there are no insights into why applying these techniques to NLP is hard and how they are adapted to specific problems.

I would categorize dynamic NNs as efficient methods. Note that many of the studies along this line of research are based on a similar idea: some problems merely need a light model and a small amount of compute. But there is no discussion of how these methods improve efficiency. As a matter of fact, many previous papers did not address efficiency improvement at all, yet this is an important problem that should be discussed when working in this research area. Also, most dynamic methods require additional machinery to dynamically activate parts of the model and deactivate the rest. This in turn makes the system much more complicated and the results harder to reproduce. I do not mean this as criticism, but want to make the authors aware of this point.

==============================

Thanks for your response to the comments. As mentioned above, the topic here is interesting and the paper itself is well written, but it still needs refinement to become stronger work. All the issues raised in the reviews can be addressed in an update of this work.

===== Official Review of Paper96 by Reviewer 8Z7y (ACL ARR 2022 April, 09 Jun 2022)

Paper Summary: The survey discusses various dynamic neural network models in NLP and categorizes them according to a new taxonomy that the authors propose in this contribution. The review of existing approaches is comprehensive and the taxonomy is natural, but also rather straightforward. Based on their taxonomy, the authors classify the existing approaches and provide a great overview of them (with some focus on text comprehension) for each class of the taxonomy. Naturally, the authors explain the general reasoning they used in this classification and in doing so also provide a high-level explanation of the different existing approaches themselves.
Summary Of Strengths:
- (reasonably) comprehensive literature review
- reasonable divisions and subdivisions
- excellent high-level exposition of existing dynamic network architectures
- excellently written

Summary Of Weaknesses:
- straightforward classification and taxonomy
- thin contribution besides the literature review
- would be better suited as a short paper

Comments, Suggestions And Typos: The write-up is exceptionally good and deserves commendation. Thank you.

Overall Assessment: 2.5
Confidence: 4 = Quite sure. I tried to check the important points carefully. It's unlikely, though conceivable, that I missed something that should affect my ratings.
Best Paper: No
Reproducibility: 5 = They could easily reproduce the results.
Datasets: 1 = No usable datasets submitted.
Software: 1 = No usable software released.
Author Identity Guess: 1 = I do not have even an educated guess about author identity.

===== Official Review of Paper96 by Reviewer 9ZE7 (ACL ARR 2022 April, 09 Jun 2022)

Paper Summary: The paper is a survey on dynamic neural networks for NLP. It groups dynamic neural networks into three major categories: skimming, mixture of experts, and early exit. The related works in these three categories are discussed separately and in great detail. The paper further discusses the challenges and future directions of dynamic neural networks, including their evaluation, speedup, theoretical support, and explainability.

Summary Of Strengths: The paper provides a comprehensive survey of many dynamic neural networks and an in-depth discussion of the future of this line of work. The different works are well structured and the paper is written clearly. This is an interesting direction given that SOTA PLMs are computationally very expensive.

Summary Of Weaknesses: However, the survey is still missing some important works in this domain, for example:
- Adaptive computation time (Graves, Alex. "Adaptive computation time for recurrent neural networks." arXiv preprint arXiv:1603.08983 (2016)) and its follow-ups are a good example of dynamic RNN models.
- Universal Transformer (Dehghani, Mostafa, et al. "Universal transformers." arXiv preprint arXiv:1807.03819 (2018)) also has an early exit strategy.
- Ordered Memory (Shen, Yikang, et al. "Ordered memory." Advances in Neural Information Processing Systems 32 (2019)) and other neural stack models are good examples of hierarchical RNNs.
Having some quantitative comparison between cited works would also improve the paper.

Comments, Suggestions And Typos: Missing references are mentioned in the section above.

Overall Assessment: 2 = Revisions Needed: This paper has some merit, but also significant flaws, and needs work before it would be of interest to the community.
Confidence: 3 = Pretty sure, but there's a chance I missed something. Although I have a good feel for this area in general, I did not carefully check the paper's details, e.g., the math or experimental design.
Best Paper: No
Reproducibility: 5 = They could easily reproduce the results.
Datasets: 1 = No usable datasets submitted.
Software: 1 = No usable software released.
Author Identity Guess: 1 = I do not have even an educated guess about author identity.
===== Official Review of Paper96 by Reviewer rfDQ (ACL ARR 2022 April, 07 Jun 2022)

Paper Summary: This paper is a literature review of dynamic neural networks, i.e., systems that use varying amounts of computation depending on the context of a token in a sequence. The authors propose a taxonomy of these systems, give an overview of existing methods and where they fit into that taxonomy, and then discuss open research questions.

Summary Of Strengths: The discussion of open research questions seems thorough to me. One possible future direction that isn't discussed is using dynamic networks to implement contextual trade-offs between accuracy and properties other than efficiency, such as interpretability or uncertainty calibration. The taxonomy seems well constructed, although I do not work in this area, so I don't know how novel it is or how useful it would be to people in the field. The authors synthesize performance results from different papers well.

Summary Of Weaknesses: I found myself in need of examples of how each model might process a sequence. I would like to have seen some kind of running example that contrasts the methods by describing, step by step, how each would process the same input. There was limited discussion of the relative advantages of each of these broad categories of methods beyond the performance reported in papers: what types of situations might call for each category? Please explain the "overthinking problem".

Comments, Suggestions And Typos: How does token dropping fit into this? I would have liked to see some detailed discussion of how these methods perform during transfer learning and fine-tuning. "We believe this doubt is highly debatable and warrants further investigation" (line 223): this is very strange phrasing. "Learned routing and unlearnable routing": you should actually define these terms the first time you present them. "Getting rid of the expert capacity and auxiliary loss in previous works" (lines 342-343): can you explain this better?

Overall Assessment: 3.5
Confidence: 2 = Willing to defend my evaluation, but it is fairly likely that I missed some details, didn't understand some central points, or can't be sure about the novelty of the work.
Best Paper: No
Reproducibility: 5 = They could easily reproduce the results.
Datasets: 1 = No usable datasets submitted.
Software: 1 = No usable software released.
Author Identity Guess: 1 = I do not have even an educated guess about author identity.

===== Official Review of Paper96 by Reviewer ui82 (ACL ARR 2022 April, 31 May 2022)

Paper Summary: The paper presents a review of dynamic neural networks for NLP. The review discusses three approaches to dynamic NNs: skimming, early exit, and mixture of experts. It presents an overview of recent NLP studies leveraging dynamic NNs as well as the associated advantages and challenges.

Summary Of Strengths: There is no similar review available for this topic. The review is well structured and written clearly. Open challenges in dynamic NNs for NLP are discussed objectively.
Summary Of Weaknesses: I expected a stronger link to specific NLP tasks in the application sections: for which NLU/NLP tasks have dynamic NNs shown good results, or for which should they be applied in the future? For example, the subsection starting on line 567 ff. focuses on reviewing existing approaches with respect to their methods but does not describe the specific NLP tasks that were involved. What were the performance gains of applying dynamic NNs to these tasks (both in terms of relevant evaluation metrics and reduction of training time)? Some concepts and methods could be described in more detail (e.g., Pareto front, CAT (Schuster et al., 2021)).

Comments, Suggestions And Typos: The tables should be improved: add to the captions how they are grouped and sorted. This is somewhat explained in the text for Table 3, but not for Tables 1 and 2.

Overall Assessment: 4 = This paper represents solid work, and is of significant interest for the (broad or narrow) sub-communities that might build on it.
Confidence: 2 = Willing to defend my evaluation, but it is fairly likely that I missed some details, didn't understand some central points, or can't be sure about the novelty of the work.
Best Paper: No
Reproducibility: 5 = They could easily reproduce the results.
Datasets: 1 = No usable datasets submitted.
Software: 1 = No usable software released.
Author Identity Guess: 4 = From an allowed pre-existing preprint or workshop paper, I know/can guess at least one author's name.