Metareview: The paper proposes a model for entity typing. The authors construct a dataset consisting of Wikipedia articles and their corresponding Wikidata entries. Using this new dataset, they propose a text-to-text pre-training scheme that instills type knowledge in language models via QA. More specifically, the task includes four types of knowledge-based questions: Entity/Type Discovery, Entity Typing, Entity Recognition, and Slot Filling. The experimental results show the proposed method achieves state-of-the-art performance in zero-shot dialog state tracking.

Summary Of Reasons To Publish: The proposed pre-training scheme using typing-related questions is interesting and could be useful. The newly created dataset might be a useful resource for the community.

Summary Of Suggested Revisions: While the proposed pre-training scheme for an encoder-decoder LM is interesting, a common question raised by most reviewers is how it compares with other knowledge-enhanced LMs such as ERNIE, RoBERTa, and LUKE. The proposed model is essentially another KB-enhanced LM, so it would be better to add comparisons with these kinds of pre-trained LMs. The contribution of the new dataset needs to be further clarified: many datasets also contain Wikipedia pages with the corresponding Wikidata entries, and it would be good to differentiate WikiWiki from the existing ones. Several key parts need more explanation, for example the WikiWiki corpus and the dialogue state tracking task.

Overall Assessment: 3 = There are major points that may be revised

1. Official Review of Paper204 by January Reviewer

Paper Summary: This paper investigates how to instill type knowledge in language models. It first links entities in Wikipedia to Wikidata to form the WikiWiki dataset, which augments Wikipedia with rich type knowledge from Wikidata. Four types of questions with regard to type knowledge are posed on WikiWiki for an encoder-decoder language model (here, BART) to answer.
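As a rough illustration of how the four question types could be cast into a unified text-to-text QA format, here is a minimal sketch; the prompt templates and the `make_examples` helper are assumptions for clarity, not the paper's exact wording or pipeline:

```python
# Hypothetical sketch of the four type-centric QA pre-training tasks
# (Entity/Type Discovery, Entity Typing, Entity Recognition, Slot Filling).
# Prompt wording is illustrative, not the authors' actual templates.

def make_examples(context, entity, etype, span_types):
    """Build text-to-text (question, answer) pairs for one passage.

    context    -- a Wikipedia-style passage
    entity     -- an entity mentioned in the passage
    etype      -- that entity's type label (e.g., from Wikidata)
    span_types -- list of (mention, type) pairs found in the passage
    """
    examples = []
    # 1. Entity/Type Discovery: list all mentions with their types.
    discovery = "; ".join(f"{m} is a {t}" for m, t in span_types)
    examples.append((f"{context} What entities and types are mentioned?",
                     discovery))
    # 2. Entity Typing: predict the type of a given entity.
    examples.append((f"{context} What type of entity is {entity}?", etype))
    # 3. Entity Recognition: recover the entity of a given type.
    examples.append((f"{context} Which {etype} is mentioned?", entity))
    # 4. Slot Filling: mask the mention with its type and ask for it back.
    masked = context.replace(entity, f"[{etype}]")
    examples.append((f"{masked} What does [{etype}] refer to?", entity))
    return examples
```

Each pair could then be fed to an encoder-decoder model such as BART as a standard sequence-to-sequence training instance.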
Evaluation results on dialog state tracking and entity typing demonstrate that such a language model is more effective at capturing type knowledge.

Summary Of Strengths: The proposed WikiWiki dataset is potentially useful for future entity typing research in the field. The authors achieve new state-of-the-art results on zero-shot domain adaptation for dialog state tracking on MultiWOZ, and show promising results on entity typing tasks. The proposed pre-training tasks are simple and shown to be effective.

Comments, Suggestions And Typos
Technically, the authors propose to use an encoder-decoder language model along with QA tasks to capture type knowledge. However, this is not justified as the optimal pre-training architecture or pre-training task for type knowledge. They should consider using an encoder-only model (e.g., BERT) and a decoder-only model (e.g., GPT) with different pre-training tasks (e.g., masked language modeling) and report those results in the ablation study. They also did not include other knowledge-enhanced language models as baselines (especially since RoBERTa is considered as a baseline method).

Overall Assessment: 2 = Revisions Needed: This paper has some merit, but also significant flaws, and needs work before it would be of interest to the community.

Datasets: 1 = No usable datasets submitted.
Software: 1 = No usable software released.
Author Identity Guess: 1 = I do not have even an educated guess about author identity.
Confidence: 3 = Pretty sure, but there's a chance I missed something. Although I have a good feel for this area in general, I did not carefully check the paper's details, e.g., the math or experimental design.

2. Official Review of Paper204 by January Reviewer

Paper Summary: This paper aims to inject more type knowledge into pretrained language models (in this case, BART).
The approach is straightforward: the authors first construct a large-scale type knowledge dataset called WikiWiki based on existing entity type information from Wikidata. They then design several pre-training tasks in a unified text-to-text question-answering format to train the model to predict relevant type knowledge. Evaluation results on several related benchmarks show that the approach is effective. Overall, I think there are solid contributions in compiling the data and achieving good empirical results, but some important details are lacking (maybe due to the page limits?). I'd encourage the authors to fill in the missing details using the additional page in the final draft if they have the chance. Also, I assume the relevant data will be openly released to facilitate future work; if not, I'll have to reconsider my evaluation.

Summary Of Strengths: The proposed pipeline is straightforward and can be applied to other models as well (e.g., T5). The constructed data can be useful for future work to further study type knowledge modeling, and I encourage the authors to release a human-verified version of their test sets to serve as evaluation benchmarks for future work (I especially like the unseen-entity setting).

Comments, Suggestions And Typos
I think the details about the WikiWiki corpus are lacking. For example, I suspect such auto-constructed datasets are rather noisy (personal experience tells me that spaCy's named entity recognition can be inaccurate at times, and the Wikidata information isn't always accurate either, right?). Do you have a measure of how accurate the typing labels are? Did you perform human validation on the test sets to make sure the labels are accurate? I think this is important given that you have a section comparing results on the WikiWiki test sets.

Overall Assessment: 4 = This paper represents solid work, and is of significant interest for the (broad or narrow) sub-communities that might build on it.
Datasets: 4 = Useful: I would recommend the new datasets to other researchers or developers for their ongoing work.
Software: 1 = No usable software released.
Author Identity Guess: 1 = I do not have even an educated guess about author identity.
Confidence: 3 = Pretty sure, but there's a chance I missed something. Although I have a good feel for this area in general, I did not carefully check the paper's details, e.g., the math or experimental design.

3. Official Review of Paper204 by January Reviewer

Paper Summary: This paper proposes a text-to-text pre-training scheme to instill type knowledge in language models via QA, which includes four types of knowledge-based questions (Entity/Type Discovery, Entity Typing, Entity Recognition, and Slot Filling). The authors also provide the WikiWiki dataset, built from Wikipedia articles and the Wikidata KG. Models trained on WikiWiki achieve SOTA zero-shot DST results and are able to infer novel types.

Summary Of Strengths: This paper provides a new WikiWiki dataset comprising 10M Wikipedia articles linked to the Wikidata knowledge graph with 41K types. The authors propose a pre-training scheme for generative language models using type-centric question answering based on WikiWiki. Models trained on WikiWiki achieve SOTA performance in zero-shot domain adaptation for dialog state tracking and can precisely infer types for seen and unseen entities.

Comments, Suggestions And Typos
This paper instills type knowledge in language models via QA, but lacks knowledge-enhanced pre-trained language models as baselines (for the entity typing task), such as LUKE, EAE, CoLAKE, ERICA, and DKPLM.

[1] Yamada I, Asai A, Shindo H, et al. LUKE: Deep contextualized entity representations with entity-aware self-attention. arXiv preprint arXiv:2010.01057, 2020.
[2] Févry T, Soares L B, FitzGerald N, et al. Entities as Experts: Sparse memory access with entity supervision. arXiv preprint arXiv:2004.07202, 2020.
[3] Sun T, Shao Y, Qiu X, et al.
CoLAKE: Contextualized language and knowledge embedding. arXiv preprint arXiv:2010.00309, 2020.
[4] Qin Y, Lin Y, Takanobu R, et al. ERICA: Improving entity and relation understanding for pre-trained language models via contrastive learning. arXiv preprint arXiv:2012.15022, 2020.
[5] Zhang T, Wang C, Hu N, et al. DKPLM: Decomposable knowledge-enhanced pre-trained language model for natural language understanding. arXiv preprint arXiv:2112.01047, 2021.

Overall Assessment: 3 = Good: This paper makes a reasonable contribution, and might be of interest for some (broad or narrow) sub-communities, possibly with minor revisions.

Datasets: 1 = No usable datasets submitted.
Software: 1 = No usable software released.
Author Identity Guess: 1 = I do not have even an educated guess about author identity.
Confidence: 3 = Pretty sure, but there's a chance I missed something. Although I have a good feel for this area in general, I did not carefully check the paper's details, e.g., the math or experimental design.

4. Official Review of Paper204 by January Reviewer

Paper Summary: The paper claims to build a new dataset containing Wikipedia articles and their entity types, and a novel way to pre-train large language models by answering questions about entity types. In my opinion, the paper has the following deficiencies: 1. the contribution of the dataset is limited; 2. the novelty of the proposed model seems marginal; 3. the evaluation needs to be stronger.

Summary Of Strengths: The paper proposes a model which uses entity type information to pre-train language models.

Comments, Suggestions And Typos
The paper claims to propose the WikiWiki corpus, which contains Wikipedia articles and their corresponding Wikidata links. This part of the contribution is hard to judge, as it seems to me the paper is just reusing existing data with a bit of additional data processing. I doubt it is proper to claim the dataset as a major contribution.
The proposed model benefits from gaining extra information from data in knowledge bases. This idea seems to have been explored before (e.g., ERNIE: Enhanced Language Representation with Informative Entities, ACL 2019); while the paper includes some of these works in the related work section, none of them is compared against in the experiments, which makes it hard to demonstrate the advantage of the proposed model. To demonstrate the performance of large pre-trained language models, the paper could also benefit from evaluating on large benchmarks (e.g., GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, EMNLP 2018). The paper could further be improved by giving more explanation of key parts; for example, it does not explain what dialogue state tracking is.

Overall Assessment: 2 = Revisions Needed: This paper has some merit, but also significant flaws, and needs work before it would be of interest to the community.

Datasets: 2 = Documentary: The new datasets will be useful to study or replicate the reported research, although for other purposes they may have limited interest or limited usability. (Still a positive rating)
Software: 2 = Documentary: The new software will be useful to study or replicate the reported research, although for other purposes it may have limited interest or limited usability. (Still a positive rating)
Author Identity Guess: 1 = I do not have even an educated guess about author identity.
Confidence: 3 = Pretty sure, but there's a chance I missed something. Although I have a good feel for this area in general, I did not carefully check the paper's details, e.g., the math or experimental design.