--======== Review Reports ========--

The review report from reviewer #1:

*1: Is the paper relevant to Bigdata?
  [_] No
  [X] Yes

*2: How innovative is the paper?
  [_] 5 (Very innovative)
  [_] 4 (Innovative)
  [X] 3 (Marginally)
  [_] 2 (Not very much)
  [_] 1 (Not)
  [_] 0 (Not at all)

*3: How would you rate the technical quality of the paper?
  [_] 5 (Very high)
  [_] 4 (High)
  [X] 3 (Good)
  [_] 2 (Needs improvement)
  [_] 1 (Low)
  [_] 0 (Very low)

*4: How is the presentation?
  [_] 5 (Excellent)
  [_] 4 (Good)
  [X] 3 (Above average)
  [_] 2 (Below average)
  [_] 1 (Fair)
  [_] 0 (Poor)

*5: Is the paper of interest to Bigdata users and practitioners?
  [_] 3 (Yes)
  [X] 2 (May be)
  [_] 1 (No)
  [_] 0 (Not applicable)

*6: What is your confidence in your review of this paper?
  [_] 2 (High)
  [X] 1 (Medium)
  [_] 0 (Low)

*7: Overall recommendation
  [_] 5 (Strong Accept: top quality)
  [X] 4 (Accept: a regular paper)
  [_] 3 (Weak Accept: could be a poster or a short paper)
  [_] 2 (Weak Reject: don't like it, but won't argue to reject it)
  [_] 1 (Reject: will argue to reject it)
  [_] 0 (Strong Reject: hopeless)

*8: Detailed comments for the authors

The authors propose self-supervised representation learning for detecting sentence incoherence in documents. They use sentence embeddings from MPNet, which is based on transformers. For self-supervised learning, they use contrastive losses. The first loss uses the previous and next sentences as positive samples, and a sentence from another document as a negative sample. For the second loss, they build a graph in which each node is a sentence, with edges to other sentences within a window of size k. A GNN is used to learn an embedding. The second loss uses neighboring nodes/sentences as positive samples, and sentences in other documents as negative samples. The third loss tries to match the two embeddings (MPNet and GNN): embeddings of the same sentence form a positive pair, and embeddings from different documents form negative pairs.
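For illustration only, the window-based sentence graph described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `build_window_edges`, the use of undirected edges, and the reading of k as a maximum positional distance between sentences are all hypothetical choices made here.

```python
from typing import List, Tuple


def build_window_edges(num_sentences: int, k: int) -> List[Tuple[int, int]]:
    """Return undirected edges (i, j) with i < j for sentences at most k positions apart.

    Hypothetical helper: sentences are identified by their index in the document,
    and k is assumed to be the maximum index distance between connected sentences.
    """
    if k < 1:
        raise ValueError("window size k must be at least 1")
    edges: List[Tuple[int, int]] = []
    for i in range(num_sentences):
        # Link sentence i to the following k sentences; links to earlier
        # sentences were already added when those sentences were visited.
        last = min(i + k, num_sentences - 1)
        for j in range(i + 1, last + 1):
            edges.append((i, j))
    return edges


# A 4-sentence document with window size 2.
print(build_window_edges(4, 2))
# [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
```

Under this reading, the neighbors of a node in the resulting graph are the candidate positive samples for the second loss, while negatives would be drawn from sentences of other documents.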
The incoherence score is based on how similar the two embeddings (MPNet and GNN) are. For evaluation, they compare with 11 existing methods over 4 datasets. Empirical results indicate that the proposed method generally outperforms the others. Parameter sensitivity analyses were performed. The experiments could have included more neural-based methods.

========================================================

The review report from reviewer #2:

*1: Is the paper relevant to Bigdata?
  [X] No
  [_] Yes

*2: How innovative is the paper?
  [_] 5 (Very innovative)
  [_] 4 (Innovative)
  [_] 3 (Marginally)
  [X] 2 (Not very much)
  [_] 1 (Not)
  [_] 0 (Not at all)

*3: How would you rate the technical quality of the paper?
  [_] 5 (Very high)
  [_] 4 (High)
  [_] 3 (Good)
  [X] 2 (Needs improvement)
  [_] 1 (Low)
  [_] 0 (Very low)

*4: How is the presentation?
  [_] 5 (Excellent)
  [_] 4 (Good)
  [_] 3 (Above average)
  [X] 2 (Below average)
  [_] 1 (Fair)
  [_] 0 (Poor)

*5: Is the paper of interest to Bigdata users and practitioners?
  [_] 3 (Yes)
  [X] 2 (May be)
  [_] 1 (No)
  [_] 0 (Not applicable)

*6: What is your confidence in your review of this paper?
  [X] 2 (High)
  [_] 1 (Medium)
  [_] 0 (Low)

*7: Overall recommendation
  [_] 5 (Strong Accept: top quality)
  [_] 4 (Accept: a regular paper)
  [_] 3 (Weak Accept: could be a poster or a short paper)
  [_] 2 (Weak Reject: don't like it, but won't argue to reject it)
  [X] 1 (Reject: will argue to reject it)
  [_] 0 (Strong Reject: hopeless)

*8: Detailed comments for the authors

Summary

This paper introduces IText, a contrastive self-supervised model for detecting incoherent sentences in documents. The model uses two views of each sentence: (1) semantic content via a sentence transformer (MPNet), and (2) neighborhood context via a Graph Attention Network (GAT). Three contrastive loss functions are used to align these views and detect incoherence.
The model is evaluated on four datasets (CS Abstracts, Wikinews, Simple Wikipedia, CS Introductions), outperforming various baselines, including anomaly detection, graph-based, and multi-view models. Ablation studies validate the importance of each component.

Strong points

- Addresses a real-world, high-impact industrial problem.
- Extensive evaluation on a large-scale dataset.
- The paper is well organized and easy to follow.

Weak points

- The paper constructs a graph where nodes are sentences and edges are based on local context. However, it is not clearly justified why this graph structure is appropriate for modeling coherence. Coherence is a discourse-level phenomenon, and modeling it as a local graph of sentence proximity is a weak assumption. The paper does not explain why GNNs are a better fit than simpler sequential or transformer-based models.
- The introduction fails to clearly articulate what prior work has done on incoherence detection and what specific limitations this paper addresses. There is no discussion of existing methods that attempt similar tasks (e.g., sentence ordering, coherence scoring) and how this work improves upon them.
- The model architecture is largely a combination of off-the-shelf components: MPNet for sentence embeddings, Graph Attention Networks (GAT), and SimCSE-style contrastive loss. There is little novelty in the modeling approach beyond combining these components, and no ablation to show whether the GAT is necessary or beneficial.
- The assumption that sentences within a small context window should have similar embeddings is not well-supported. In many real-world texts, adjacent sentences can be topically or stylistically distinct. The link between contrastive loss and coherence is not clearly established. The paper assumes that alignment between content and context embeddings implies coherence, but this is not theoretically or empirically justified.
- The paper defines an “incoherence score” based on the cosine similarity between two projected embeddings (Eq. 9), but it is unclear why this metric captures coherence. There is no validation (e.g., human annotation or qualitative analysis) to show that low similarity indeed corresponds to incoherence.
- Treating incoherent sentences as anomalies is conceptually problematic. Incoherence is not necessarily rare or out-of-distribution; it can be subtle, stylistic, or context-dependent. This framing leads to questionable evaluation choices and undermines the validity of the experimental setup.
- The evaluation relies on artificially injecting incoherent sentences into documents. This setup may not reflect real-world incoherence and could bias the model toward detecting out-of-place semantics rather than true discourse-level inconsistency. No human evaluation or real-world incoherent data is used, which limits the credibility of the results.

========================================================

The review report from reviewer #3:

*1: Is the paper relevant to Bigdata?
  [X] No
  [_] Yes

*2: How innovative is the paper?
  [_] 5 (Very innovative)
  [X] 4 (Innovative)
  [_] 3 (Marginally)
  [_] 2 (Not very much)
  [_] 1 (Not)
  [_] 0 (Not at all)

*3: How would you rate the technical quality of the paper?
  [_] 5 (Very high)
  [X] 4 (High)
  [_] 3 (Good)
  [_] 2 (Needs improvement)
  [_] 1 (Low)
  [_] 0 (Very low)

*4: How is the presentation?
  [_] 5 (Excellent)
  [X] 4 (Good)
  [_] 3 (Above average)
  [_] 2 (Below average)
  [_] 1 (Fair)
  [_] 0 (Poor)

*5: Is the paper of interest to Bigdata users and practitioners?
  [_] 3 (Yes)
  [X] 2 (May be)
  [_] 1 (No)
  [_] 0 (Not applicable)

*6: What is your confidence in your review of this paper?
  [_] 2 (High)
  [X] 1 (Medium)
  [_] 0 (Low)

*7: Overall recommendation
  [_] 5 (Strong Accept: top quality)
  [_] 4 (Accept: a regular paper)
  [X] 3 (Weak Accept: could be a poster or a short paper)
  [_] 2 (Weak Reject: don't like it, but won't argue to reject it)
  [_] 1 (Reject: will argue to reject it)
  [_] 0 (Strong Reject: hopeless)

*8: Detailed comments for the authors

Summary:

- For the problem of understanding and detecting incoherence, this paper focuses on the task of detecting sentences that are inconsistent with the overall document context.

Questions

- Can the authors provide more details on model training and evaluation?
- Most of the experimental results use a small model setup. Are there results for larger models or datasets?

========================================================

The review report from reviewer #4:

*1: Is the paper relevant to Bigdata?
  [X] No
  [_] Yes

*2: How innovative is the paper?
  [_] 5 (Very innovative)
  [X] 4 (Innovative)
  [_] 3 (Marginally)
  [_] 2 (Not very much)
  [_] 1 (Not)
  [_] 0 (Not at all)

*3: How would you rate the technical quality of the paper?
  [_] 5 (Very high)
  [_] 4 (High)
  [X] 3 (Good)
  [_] 2 (Needs improvement)
  [_] 1 (Low)
  [_] 0 (Very low)

*4: How is the presentation?
  [_] 5 (Excellent)
  [X] 4 (Good)
  [_] 3 (Above average)
  [_] 2 (Below average)
  [_] 1 (Fair)
  [_] 0 (Poor)

*5: Is the paper of interest to Bigdata users and practitioners?
  [_] 3 (Yes)
  [X] 2 (May be)
  [_] 1 (No)
  [_] 0 (Not applicable)

*6: What is your confidence in your review of this paper?
  [X] 2 (High)
  [_] 1 (Medium)
  [_] 0 (Low)

*7: Overall recommendation
  [_] 5 (Strong Accept: top quality)
  [X] 4 (Accept: a regular paper)
  [_] 3 (Weak Accept: could be a poster or a short paper)
  [_] 2 (Weak Reject: don't like it, but won't argue to reject it)
  [_] 1 (Reject: will argue to reject it)
  [_] 0 (Strong Reject: hopeless)

*8: Detailed comments for the authors

The paper deals with detecting the coherence of sentences with respect to the entire document.
The idea is simple -- to compare the learned representation of a sentence with the representations of its context (neighboring sentences). The authors have proposed a "complex model" based on graph neural networks and language models, and demonstrate its effectiveness on real-world datasets. The proposed self-supervised approach contrasts with traditional supervised approaches, with the graph neural network and the language model supporting multi-view learning in a self-supervised manner.

Overall, the paper is well organised and presented. The work appears novel to me.

The paper may need to clarify further how self-supervised learning from sentence contexts is compatible with the incoherence detection task. During contrastive learning, models are trained on the assumption that adjacent sentences should be similar, while in the detection task we are trying to identify sentences that are dissimilar to their neighboring sentences. How do we know that dissimilar sentences are not forced to have similar representations during contrastive learning?

========================================================