Reviewer #1 Questions

1. Overall Recommendation
Borderline

2. Summary
The authors present an unsupervised method to improve the robustness of convolutional neural networks (CNNs). The method converts a CNN to a spiking neural network (SNN) via activation-function replacement and uses local (Hebbian) learning to induce synaptic modifications that make the network robust against a given set of perturbation mechanisms.

3. Strengths
Overall, the paper is well written and the problem is well defined.

4. Weaknesses
- The selection of perturbation methods is not clear or justified with references.
- The method may not be generalizable.
- Details of hyperparameter selection and computational cost are missing.

5. Detailed Comments
The authors report the method, as well as its evaluation, for a fixed set of perturbation mechanisms (Gaussian blur, additive Gaussian noise, salt and pepper, and speckle). Why these particular mechanisms were selected is not clear, and references to comparable robustness methods are missing. The authors do not compare their method against previously reported robustness attacks or defense methods. This casts doubt on the applicability and generality of the described method. For example, see:

1. Liang, Hongshuo, et al. "Adversarial attack and defense: A survey." Electronics 11.8 (2022): 1283.

The details of (or a reference for) the genetic algorithm used to determine the hyperparameters of the SRC algorithm are missing. Similarly, details of the extra computational cost (on top of regular network training) are missing. This again could be a point of comparison with existing robustness methods.

There are spelling/formatting errors. For example, the axis caption in the top-most panel in columns 1, 4, 7 and 12 of Fig. 4 is incorrect.

Reviewer #2 Questions

1. Overall Recommendation
Accept

2. Summary
The authors seek to improve robustness to Gaussian-based and salt-and-pepper noise through "sleep-like replay". Their method (SRC) is implemented by converting the CNN to a spiking neural network, replacing ReLU with a Heaviside function and using noise to induce activations. They use Hebbian-style learning to modify the network weights under noise so as to meet certain conditions, such as mean activations that reflect the training data and rules for co-occurrent firing of pre- and post-synaptic neurons (a schematic sketch of this pipeline appears at the end of this document). After the weight-modification steps are complete, they return to the original CNN layers for testing. Their models performed better than the baseline on noisy data, but not as well as standard fine-tuning. They also propose that their method is more computationally efficient than alternative methods. They use Grad-CAM, among other techniques, to visualize how SRC affects the model; model attention showed improved alignment with class features in the image space, especially on the perturbed images.

3. Strengths
I enjoyed seeing the Grad-CAM analysis near the end and am curious how the technique would perform on adversarial examples. Overall, well written.

4. Weaknesses
- I was hoping to see a visualization of the method/pipeline through the model.
- How did removing the bias term affect performance? It seems to me that removing it is required, but that doing so would also affect performance on noisy samples. I think the standard network with a bias term would be a good comparison as well.
- In Fig. 3, the black text on the dark red values is a little hard to read, though still legible.
- On page 4, "Ten trials differing in randomness" needs to be clarified. What kind of randomness? Weight initialization? Something else?
- MNIST and CIFAR are somewhat overused, but models are easy to train on them, so I understand the choice.
- Table 2 and two other occurrences have "Hyperparamters" instead of "Hyperparameters".
- On page 4, I want to know more about the hyperparameters of your method; could you at least list the symbols used in the algorithm in Table 2?

5. Detailed Comments
I enjoyed the overall concept. More time could be spent on the details of the implementation and visualization of the SRC model, including the hyperparameters required. I liked the analysis using Grad-CAM. This method appears to be heuristically driven; I would enjoy more discussion of the heuristic for why strengthening these connections improves performance on noise. How does this differ from a bias term, or from only using the Heaviside activation? In addition, is there a closed-form way to achieve something similar as a regularization term in the loss function, so that we do not have to go through the manual process described here? I am curious about that. Does ReLU perform better than Heaviside during testing? Is the only benefit of Heaviside during this voltage-adaptation stage? Those are some of my questions for you.

Reviewer #3 Questions

1. Overall Recommendation
Accept

2. Summary
This paper proposes an optimization scheme to implement sleep-like replay in CNN learning.

3. Strengths
- Implementation of the mathematical optimization on convolutional layers.
- Minimal computational overhead with improved performance.
- Experimentation with various common noise types in images.

4. Weaknesses
In Fig. 3, the word "undistorted" appears cut off at the edge.

5. Detailed Comments
Implementation of the mathematical optimization on convolutional layers. Minimal computational overhead with improved performance. Experimentation with various common noise types in images.

Reviewer #4 Questions

1. Overall Recommendation
Weak Accept

2. Summary
The paper addresses the issue of degrading performance in an image-classification task when inputs are perturbed by distortions. It proposes an unsupervised optimization algorithm for CNNs, called Sleep Replay Consolidation (SRC), to improve robustness and generalization to noisy inputs. In detail, convolutional filters are improved by selective modification of spatial gradients. The algorithm was tested on the MNIST and CIFAR-10 datasets with the following types of distortion: additive Gaussian noise, Gaussian blur, salt and pepper, and speckle (these distortion types are sketched at the end of this document).

3. Strengths
- Good choice of distortion types, with explanations of where they could occur in real applications.
- A good set of experiments, with clear explanations of the results and the thought processes behind them.
- The weaknesses of the results are acknowledged.

4. Weaknesses
- The datasets used are very basic and contain low-resolution images. The trained models are also very small and basic. It would be very interesting to see how this approach fares in settings closer to real-world scenarios.
- The biologically inspired computations (i.e., spiking) are not very well explained; mathematical formulas are needed.

5. Detailed Comments
The figure descriptions need to be clearer; it is sometimes not possible to understand them without reading the whole section. The figures and tables could be placed a bit closer to the text where they are mentioned/explained. In Fig. 3, the abbreviation FT (standing for "fine-tuned") is never established.
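For concreteness, below is a minimal sketch of the generic pipeline shape Reviewer #2 summarizes: Heaviside activations standing in for ReLU, a local Hebbian update driven by noise, and a return to the original activations for testing. The specific update rule, thresholds, learning rates, and function names here are illustrative assumptions; the paper's actual SRC rule is not specified in these reviews.

```python
import numpy as np

def heaviside(v, threshold=0.5):
    # Spiking stand-in for ReLU during the consolidation phase.
    return (v > threshold).astype(float)

def src_like_update(W, x, lr_up=0.01, lr_down=0.005, threshold=0.5):
    # One Hebbian-style step for a single fully connected layer:
    # strengthen weights where pre- and post-synaptic units fire
    # together; weaken weights into units that fire without
    # presynaptic support. (Assumed rule, for illustration only.)
    pre = heaviside(x, threshold)        # presynaptic spikes
    post = heaviside(W @ x, threshold)   # postsynaptic spikes
    W += lr_up * np.outer(post, pre) - lr_down * np.outer(post, 1.0 - pre)
    return W

# Consolidation loop: drive the layer with noise-like inputs,
# apply the local updates, then simply stop using heaviside()
# and test with the original (ReLU) activations.
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.1, size=(32, 64))
for _ in range(100):
    x = rng.random(64)                   # noise-driven input
    W = src_like_update(W, x)
```

This fully connected form is only for illustration; per Reviewers #3 and #4, the paper applies the idea to convolutional layers, where the corresponding update would act on the filter weights.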
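Similarly, the four distortion types discussed by Reviewers #1 and #4 are standard image perturbations. A minimal sketch of how they are commonly applied follows, assuming grayscale images as float arrays in [0, 1]; the parameter values are illustrative, not the paper's.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

def gaussian_blur(img, sigma=1.0):
    # Convolve with an isotropic Gaussian kernel.
    return gaussian_filter(img, sigma=sigma)

def additive_gaussian(img, sigma=0.1):
    # x + n with n ~ N(0, sigma^2), clipped back to [0, 1].
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def salt_and_pepper(img, amount=0.05):
    # Set a random fraction of pixels to 0 (pepper) or 1 (salt).
    out = img.copy()
    u = rng.random(img.shape)
    out[u < amount / 2] = 0.0
    out[u > 1.0 - amount / 2] = 1.0
    return out

def speckle(img, sigma=0.1):
    # Multiplicative noise: x * (1 + n) with n ~ N(0, sigma^2).
    return np.clip(img * (1.0 + rng.normal(0.0, sigma, img.shape)), 0.0, 1.0)
```

Salt and pepper corrupts individual pixels to the extremes of the range, whereas speckle scales each pixel by its own noise factor, which is why the two stress a classifier in different ways.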