------------------------- METAREVIEW ------------------------
The reviewers are generally positive about the paper: it is well-motivated and easy to follow, the experiments are extensive, and the results improve upon the state of the art. There are, however, major weaknesses highlighted by the reviewers, including limited technical contribution, proposing three kinds of span encoder but using only two of them, missing analysis of the contribution of each of the three losses, and only small improvements in the experimental results across benchmarks (less than expected, given that the proposed approach considers both "entity representation" and "relation representation" while the other approaches consider only one). The weaknesses are significant enough that the paper should be improved before it can be accepted.

----------------------- REVIEW 1 ---------------------
SUBMISSION: 9681
TITLE: SPOT: Knowledge-Enhanced Language Representations for Information Extraction
AUTHORS: Jiacheng Li, Yannis Katsis, Tyler Baldwin, Ho-Cheol Kim, Andrew Bartko, Julian McAuley and Chun-Nan Hsu

----------- Overall evaluation -----------
SCORE: 1 (weak accept)
----- TEXT:
This paper presents SPOT, a SPan-based knOwledge Transformer, which learns representations of entities from words and representations of relationships from entities, using English Wikipedia and the aligned Wikidata as the pre-training corpus. Experimental results show that SPOT outperforms other knowledge-enhanced language models on a variety of information extraction tasks and generates superior knowledge representations.

----------- Strengths and reasons to accept -----------
1. The paper is well-motivated and easy to follow. Considering the relations between entities in the pre-training framework is an interesting topic, and it could benefit many knowledge-intensive tasks. The authors propose to abandon the heavy knowledge encoder, which is a meaningful exploration.
2. The proposed model achieves significant improvements over baseline models on various information extraction tasks. Ablation studies and detailed analytic experiments are conducted to explore the influence of different span encoder designs, which is valuable for further study.

----------- Weaknesses and limitations -----------
1. The technical contribution of the paper is relatively limited. It is not the first work to incorporate relation knowledge into entity representations. The methods used in the span encoder are straightforward, and the span pair encoder is also somewhat trivial.
2. The paper employs RoBERTa-large as the text encoder, and it seems that the text encoder is also pre-trained with the proposed tasks. However, the authors count only the parameters of the span module and span pair module and then argue that SPOT contains only 21M parameters, which is not appropriate.
3. The paper proposes three kinds of span encoder but utilizes only two of them; an in-depth study is needed (why not utilize all three?). In addition, Table 7 reports only a few results with the pair encoder. Does the pair encoder not help the entity-centered tasks?
----------------------- REVIEW 2 ---------------------
SUBMISSION: 9681
TITLE: SPOT: Knowledge-Enhanced Language Representations for Information Extraction
AUTHORS: Jiacheng Li, Yannis Katsis, Tyler Baldwin, Ho-Cheol Kim, Andrew Bartko, Julian McAuley and Chun-Nan Hsu

----------- Overall evaluation -----------
SCORE: 1 (weak accept)
----- TEXT:
The authors propose a knowledge-enhanced pre-training framework for transformer-based language models based on span encoding and pairwise representations. Extensive experiments conducted on multiple datasets demonstrate the effectiveness of the proposed model.

----------- Strengths and reasons to accept -----------
1. The paper is well-written.
2. The experiments include all major IE tasks and the most recent baseline models.
3. The authors also conduct a case study on the visualization of entity and relation embeddings, which illustrates the underlying reason why the model works.

----------- Weaknesses and limitations -----------
It might have been better to submit the review version of the paper, which includes line numbers, but that is definitely a minor comment.

----------------------- REVIEW 3 ---------------------
SUBMISSION: 9681
TITLE: SPOT: Knowledge-Enhanced Language Representations for Information Extraction
AUTHORS: Jiacheng Li, Yannis Katsis, Tyler Baldwin, Ho-Cheol Kim, Andrew Bartko, Julian McAuley and Chun-Nan Hsu

----------- Overall evaluation -----------
SCORE: 1 (weak accept)
----- TEXT:
This paper presents a new pre-trained model, SPOT, for knowledge-base construction tasks, i.e., relation extraction. SPOT learns representations of entities and relationships from token spans and span pairs in the text, respectively. Experiments on BC5CDR, NYT24, and FIGER demonstrate that SPOT outperforms other state-of-the-art models on information extraction tasks. The representation visualization shows that the combination of different encoders separates representations on some tasks without fine-tuning. However, the overall experiments are weak, with results only around 1-2% higher than previous models.

----------- Strengths and reasons to accept -----------
- A very important problem relevant to the community.
- Overall, the paper is well-written and clear.
- The design of the encoder architecture and the corresponding losses seems reasonable.
- SPOT uses fewer parameters without fine-tuning to specific tasks and explores three pairing settings.

----------- Weaknesses and limitations -----------
In this paper, the authors investigate knowledge-enhanced pre-trained models by considering both informative entity and relation representations in a relatively small model. I think the topic itself is interesting, and I have the following concerns.
1) The first concern is about the structure and losses of the pre-trained model. The authors introduce a hierarchical structure to build the three different encoders, whose objective is to find the corresponding pairs at the entity level. Though this makes sense in most cases, the authors should consider counterexamples or boundary cases. For example, the model does not constrain the part of speech during pairing, which might produce incorrect entities. The loss seems to simply add the three losses together (shown in Equation 13). If the authors could add an analysis of the weights and contributions of each loss in the ablation study (see the sketch after this review for one possible weighted formulation), that might improve the performance of SPOT and make the model more interpretable.
2) The experimental results might be weak because the performance improvement over the different benchmarks is minor, less than 1-2%. I am not sure whether the authors should include the "entity representation" and "relation representation" in the comparison. As SPOT considers both of them, the final performance should be much better than that of previous methods (which consider only one of them). The authors should point out under what conditions SPOT is much better than other models. For now, the improvement is minor.
3) The representation visualization is also not clear. In Figure 2, the performance of SPOT and LUKE is similar. In Figure 3, the example might be coincidental. The authors might need to present more examples.
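As a minimal sketch of the loss-weighting suggestion in point 1 above: the term names L_entity, L_relation, and L_MLM below are placeholders and may not match the notation used in Equation 13 of the paper; the point is only that a weighted combination, rather than an unweighted sum, would let an ablation quantify each term's contribution:

    \mathcal{L} = \lambda_{1}\,\mathcal{L}_{\text{entity}} + \lambda_{2}\,\mathcal{L}_{\text{relation}} + \lambda_{3}\,\mathcal{L}_{\text{MLM}}

Reporting results for several settings of (\lambda_{1}, \lambda_{2}, \lambda_{3}), e.g., zeroing out one weight at a time, would show how much each loss contributes to the final performance.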