============================================================================
META-REVIEW
============================================================================

Comments:

Thank you for your submission! I agree with all reviewers that this is a very interesting new dataset, and with R1 and R3 that you have provided reasonable baselines and experiments.

R2 questions whether a gamified setting can enable the collection of actual cant; I have no doubt on this point. There is ample evidence that gamified settings improve data collection---the authors might include a citation or two on this point. The authors' response to R2's point about using off-the-shelf models is correct from the viewpoint of the resources and evaluation track: modeling is not the focus here. In resources and evaluation, models are used to demonstrate the quality of a dataset, which I believe was adequately done.

To direct your efforts for the camera-ready version, I would suggest you be sure to clarify the game description in Sec. 3 (as R3 suggests) and include a screenshot (as you mention in your author response). If you can find evidence of cant frequency in Chinese, by all means include that as well.

One note from me: there is a good amount of recent work on dogwhistles in semantics and pragmatics from McCready and Henderson.
This work might be interesting to you and could be cited (Elin McCready is a computational linguist, and I think she spoke at the last ACL in a SIG; I only know her theory work, but it's possible she's also working on something in NLP):

@inproceedings{henderson2017dogwhistles,
  title={How dogwhistles work},
  author={Henderson, Robert and McCready, Eric},
  booktitle={JSAI International Symposium on Artificial Intelligence},
  pages={231--240},
  year={2017},
  organization={Springer}
}

============================================================================
REVIEWER #1
============================================================================

What is this paper about, what contributions does it make, and what are the main strengths and weaknesses?
---------------------------------------------------------------------------
The paper presents a Chinese dataset for both creating and understanding cant (i.e., jargon shared by a group of speakers, called "insiders"). The paper is clearly written, and previous work is sufficiently discussed. The paper is organized into two main sections, dedicated respectively to data collection and to a number of experiments that show how challenging the computational processing of cant creation/understanding is.

Although the piece of research presented is quite limited (it is a short paper) and the results are not particularly surprising (it is obvious that such a task requires much world knowledge), the paper addresses a topic not yet widely covered in computational linguistics, providing both a dataset to support empirical research and a number of (simple, but well-done) experiments to show its usage. Adding some words about enhancing the dataset with (linguistic? descriptive?) annotation would be good.
---------------------------------------------------------------------------
Reasons to accept
---------------------------------------------------------------------------
The paper presents an innovative dataset to support the computational analysis of cant, which is a still under-studied linguistic phenomenon. The method presented is sound.
---------------------------------------------------------------------------
Reasons to reject
---------------------------------------------------------------------------
The paper does not provide sufficient evidence in support of the motivation of the research work presented. It would be helpful if the authors provided some quantitative data about the frequency of cant in corpora (Chinese and beyond).
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Overall Recommendation: 3.5

============================================================================
REVIEWER #2
============================================================================

What is this paper about, what contributions does it make, and what are the main strengths and weaknesses?
---------------------------------------------------------------------------
This paper presents a new dataset for the recognition of cant and some basic experiments on this dataset. My principal concern with this paper is that the data presented does not represent real cant in usage; instead, the authors collect data from the online game Decrypto. I am extremely doubtful that this bears much similarity to real-world cant. There is also a lack of external validation of the cant, so we only get to see whether it works in the context of this game. As such, I am not sure that the results presented here are of much more use than as an AI for Decrypto.
The authors claim that this dataset is potentially valuable for evaluating language models, but I would like to see more evidence supporting this.
---------------------------------------------------------------------------
Reasons to accept
---------------------------------------------------------------------------
- This is quite an interesting dataset.
- Could be useful for evaluating language models.
- Well-written and easy to understand.
---------------------------------------------------------------------------
Reasons to reject
---------------------------------------------------------------------------
- No evaluation that this dataset represents real-world cant.
- The game setting leads to unrealistic data.
- No independent evaluation of the quality of the dataset.
- The experiments use off-the-shelf models; nothing new or interesting here.
---------------------------------------------------------------------------
Questions for the Author(s)
---------------------------------------------------------------------------
n/a
---------------------------------------------------------------------------
Missing References
---------------------------------------------------------------------------
n/a
---------------------------------------------------------------------------
Typos, Grammar, Style, and Presentation Improvements
---------------------------------------------------------------------------
n/a
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Overall Recommendation: 2.5

============================================================================
REVIEWER #3
============================================================================

What is this paper about, what contributions does it make, and what are the main strengths and weaknesses?
---------------------------------------------------------------------------
The authors propose a Chinese cant dataset that requires a deep understanding of language and world knowledge. They formulate the task as encoding and decoding a message using cant. The data is collected from online game records in which players are divided into teams: each team tries to correctly interpret the cant presented by a teammate (insider task), and then tries to crack the codes of the opposing team (outsider task).

Given the task's closeness to word similarity, the authors propose baselines based on (contextual and non-contextual) word embeddings. They use 4 non-contextual models, averaging the word vectors to represent the cant and selecting the hidden word with the smallest cosine distance for the insider task; for the outsider task, they use the average of the history. For the contextual word embeddings, they employ 4 models, concatenating the context, cant, and candidate hidden words with a [MASK] token as a separator for the insider subtask; for the outsider subtask, they replace the hidden words with the cant history.

Strengths:
- A new methodology for data gathering for word association is presented.
- Appropriate baselines based on word embeddings are presented.
- The dataset includes 2 variations of the task: an easier version, the insider task, where the model has access to more information, and a more challenging one, the outsider task, where the model uses less information.

Weaknesses:
- Semantic encoding/decoding of language is deeply influenced by personal experience. A description of the information providers (players) is therefore critical (e.g., average age and education level), but this information is missing.
- The game description (Section 3) is hard to understand for those who do not know the game.
- Cant creation involves strategies based on different kinds of knowledge (e.g., semantics, common sense, as in example #3, and world knowledge, as in example #2).
However, the dataset does not contain annotation of the applied strategy, so an in-depth model comparison requires a lengthy qualitative analysis.
- The input to the contextual models is poorly described.
---------------------------------------------------------------------------
Reasons to accept
---------------------------------------------------------------------------
- A new methodology for data gathering for word association is presented.
- Appropriate baselines based on word embeddings are presented.
- The dataset includes variations of the task.
---------------------------------------------------------------------------
Reasons to reject
---------------------------------------------------------------------------
- The paper lacks a description of the information providers (players).
- The game description (Section 3) is hard to understand for those who do not know the game.
- The dataset does not contain annotation of the applied strategy, so an in-depth model comparison requires a lengthy qualitative analysis.
---------------------------------------------------------------------------
Questions for the Author(s)
---------------------------------------------------------------------------
What is the motivation for using precisely AFQMC and LCQMC?
---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------
Overall Recommendation: 4