Reviewer #1: This is an interesting article and would be of interest to readers. However, the manuscript would be improved by reorganizing it. The background section should focus on the problem (digital harassment/threats) globally; why the focus is on LMICs, and specifically India (significant use of Twitter vs. other LMICs, English speaking, social and gender norms that reinforce inequality, etc.); the consequences of digital harassment/threats for women/girls; and specific details on COVID-19 in context and why it is important to examine the association. It should then move to the theory/framework and categories that guide the research, and to the study purpose and question.

The methods section is dense; perhaps divide it into phases/sections related to: 1) geotagging the locations of 30 million tweets (also, how did you get access to the tweets for analysis?); 2) qualitative analysis of tweets for misogyny and non-misogyny (I am curious how the students were trained); 3) training and testing models to determine the best fit; and 4) analysis of misogyny and COVID in the digital space using the model.

What the paper lacks most is a discussion of the implications of these findings. What should or could be done to prevent or reduce digital abuse/harassment during a national or global crisis such as a pandemic? What are the policy and future research implications?

Reviewer #2: In general the authors show competence, for this is an interesting and well-developed study. However, I was left with some questions and recommendations that I'd like to see addressed in the paper itself. That said, I believe this study should be published once these concerns are addressed.
- The hypothesis and conclusions are worded as universal, but the study is restricted to tweets from India.
- Why didn't the authors explore modern NLP classifiers/techniques? We see TF-IDF unigrams used with SVMs, Bayes, logistic regression and a multilayer perceptron. Nothing wrong with those, but there are plenty of newer architectures with better results.
- My major concern is a missing connection: to me, this work reads as two papers. First, a taxonomy for misogyny is proposed and used for the classifiers; then this classifier is used on tweets to measure said misogyny. But I missed the connecting idea between the two. If the goal is to defend the proposed taxonomy, I'd expect a comparison against previous taxonomies. If the goal is the time-series analysis, I don't know why the previous taxonomies weren't enough to make the measurements. Thus I read this work as having two separate goals, each with its well-earned merits, but it lacks a paragraph somewhere connecting the two ideas.

Small notes on form:
1. Line [147]: the figure is missing.
2. Line [311]: this conclusion is not constrained to India, whereas the study was.
3. Line [319]: can we justify this when most are less than 1% of the ground truth tweets?