(prior to submitting revision)

BEGINNING OF COMMENTS TO THE AUTHOR(S)
+++++++++++++++++++++++++++++++++++++++

Recommended Decision by Associate Editor: Recommendation #1: Minor Revision

Comments to Author(s) by Associate Editor:
Associate Editor Comments to the Author:
(There are no comments. Please check to see if comments were included as a file attachment with this e-mail or as an attachment in your Author Center.)

+++++++++++++++++++++
Individual Reviews:

Reviewer(s)' Comments to Author(s):

Reviewer: 1

Comments to the Author
This paper proposes a two-stage zero-shot learning (ZSL) method (RSR) based on reinforcement learning and spiral learning. RSR consists of a preview stage and a review stage: it first previews the images to construct a characterized learning path for the model to follow. By iteratively rethinking, revisiting, and revising, RSR learns instance-specific information to ease the difficulty of learning the complex semantic relationships among attributes for ZSL. It also includes a generative extension, which demonstrates the feasibility of applying RSR to generative ZSL methods. In the experiments, the RSR-based methods outperform the state-of-the-art methods, and extensive analysis of the learned semantics and the learning process shows the good explainability of the model.

Strengths:
1) Motivation: The paper is well motivated in that it focuses on problems that are interesting and rarely studied.
2) Novelty: This paper proposes a reinforced self-revised network for spiral ZSL. The authors implement RSR in an end-to-end manner, combining a self-directed grouping function with a reinforced selection module. RSR dynamically selects the target attribute group to learn and revises its prediction, which can effectively ease the learning difficulty of ZSL tasks. The authors further implement an adversarial version, which further improves the learning ability of the model. The motivations and solutions are novel enough to justify a publication in TNNLS.
3) Experiment: The experimental results are comprehensive and prove the effectiveness of the proposed RSR network. The model outperforms the state of the art in both the ZSL and GZSL settings on four benchmark datasets. Extensive analysis, e.g., visualization and quantitative analysis at the attribute, group, and decision levels, illustrates the insightful semantics that the model learns. Besides, the authors provide a detailed hyper-parameter study and implementation details that show the robustness and reproducibility of the method.
4) Writing: The network is complex, but the paper is well written and easy to follow. It clearly illustrates the key ideas and pipeline of the method.

Weaknesses:
1. Some references are still unpublished, e.g., references [11] and [31].
2. In Section II, the authors cover only a limited set of related works. They should include more related works to provide further analysis of, and comparison with, current methods.
3. In Section III-D, the training strategy of this two-stage network is complex. The authors should give more details on how the models are trained.
4. In the decision-level analysis of Section IV-B, where does `not of concrete' come from? Do the authors define these items? The authors should give more details about these descriptions.
5. The authors should give more information and details about Figure 1, which would help readers better understand the motivation.
6. In Section IV-B, the authors should give more details about the calculation of the composition ratios and the Top-10 shot accuracy. The current description is confusing and difficult to understand.
7. In Section IV-C, Figure 6 (in Section IV-C 1) is mentioned before Figure 5 (in Section IV-C 2). The authors should check the figure order in this paper.

Reviewer: 2

Comments to the Author
This paper proposes a spiral learning scheme and an end-to-end reinforced self-revised framework for zero-shot learning. The paper is well organized and clearly written.
The proposed idea is attractive, and its efficacy and superiority are well demonstrated. I have only one suggestion that might help improve the paper quality: it would be better to include a short discussion of some possible future directions.

Reviewer: 3

Comments to the Author
The authors propose a spiral Reinforced Self-Revised (RSR) network for Zero-Shot Learning (ZSL), which utilizes reinforcement learning to spirally learn the semantic information in images. Most conventional ZSL methods directly learn the semantic relationship between attributes and classes; however, such models may not be able to learn some difficult tasks directly. Inspired by spiral learning, the authors split the task into several small tasks and use reinforcement learning to dynamically select suitable learning goals that spirally enhance the learned information. The authors introduce the algorithm flow and analyze the theory in detail. The model is validated on four widely used datasets in both the ZSL and generalized ZSL settings, and the experimental results are sufficient and impressive. Detailed analysis is provided, e.g., learned-semantics analysis at three different levels and an explainable decision process. Overall, the presentation is easy to follow, the motivation is solid, and this work is novel, interesting, and beneficial to the ZSL community.

Strengths:
1) Motivation: The motivation of this paper is sufficient and solid, because the problems it studies are representative and universal and have high research value.
2) Novelty: This is a novel framework. While most ZSL research has focused on visual-semantic transfer, this paper addresses the more important problem of improving knowledge representation in a self-correcting manner. The framework intuitively treats spiral curriculum learning, reinforcement, etc., as a unified model. A series of new modules and operators with solid theoretical contributions are introduced.
3) Experiment: This paper conducts detailed and comprehensive experiments that fully demonstrate the superiority of the spiral learning approach in the proposed RSR network. The model achieves better performance than the SOTA in the ZSL and GZSL settings on the four benchmark datasets. The authors provide in-depth and detailed analysis, such as visualization and quantitative analysis at the attribute, group, and decision levels, illustrating that the model can effectively learn complex semantics. The authors also provide meticulous hyper-parameter studies and model details, demonstrating good robustness and reproducibility.
4) Writing: The paper is well organized, and the writing is easy to follow.

Weaknesses:
1. Equation 6 looks messy. It could be broken down into inline equations for better clarity.
2. The proposed method employs reinforcement optimization to train the model, which may be affected by training instability; the results in Figure 4a-b illustrate this concern. The paper also proposes a random version of SN (Random-SN), in which random selection is used to group attributes, but the authors do not show how this randomness affects performance.
3. The adversarial loss brings more improvement on Random-SN than on RSN. What is the reason behind this?
4. Figure 3 confuses me. Which dataset are these examples from?
5. How does the threshold \eta_T relate to the number of steps (e.g., how many steps are required on average)?
6. In Eq. 1, does h_ex have three dimensions? If so, how is the fully connected layer f_c(h_ex) computed?
7. In the rethinking section, is the output of the policy \Pi a number? Does it select an attribute vector in g? Since there is no semantic supervision, it is difficult to understand why g is called an attribute group. What does a member of g mean?

+++++++++++++++++++++++++++++++++++++++
END OF COMMENTS TO THE AUTHOR(S)