Reviewers' comments:

Editor's Note: You have addressed some of Reviewer #1's previous comments through changes in the paper. In two cases, however, you only provided a response in your cover letter (see below). Please modify the paper so that readers (who will not see your cover letter) will not raise the same issue.

Comment #4 (Classification): There are three different classes of notes in the paper. The most important type is the third one, "similar notes" (I assume that contains the copied and pasted paragraphs for the same patient, e.g., progress notes). Explain more how the JS threshold has been selected, as that threshold determines the coverage of such cases.

Your response: "Similar notes" is a broad classification category that includes any pair of notes with a JS greater than the given threshold that was not considered an exact note or a common output note. In other words, if a pair meets the selected JS threshold but is not an exact copy or a common output note (i.e., JS == 1.0 but not the same patient and/or time), it defaults to being classified as a "similar note". We used various JS thresholds (from 0.4 to 1.0) in our experiments to define "similar notes"; the manuscript does not prescribe a single threshold that should be considered similar.

Comment #6: The authors claim "Among all JS thresholds - no clusters contained pairs of notes that were incorrectly clustered (false positive) with JS below the 5% allowable level". For exact copies I agree with the comment, but for overlaps (highly similar documents) please explain how this was measured, as it sounds too optimistic to be true.

Your response: Based on our algorithm, by definition, no two documents in a set can have a JS less than the defined threshold. Per our methods, during the clustering step, candidate pairs are compared and their JS calculated; if they do not meet the criteria, they are not placed in the same set.
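To make the classification logic described in these responses concrete, the following is a minimal illustrative sketch (not the authors' actual implementation; function names, the token-set representation, and the default threshold of 0.7 are assumptions for illustration, chosen from the 0.4-1.0 range mentioned above):

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity (JS) of two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify_pair(tokens_a, tokens_b, same_patient, same_time, threshold=0.7):
    """Classify a candidate note pair, or return None if below threshold.

    Pairs below the threshold are never placed in the same cluster,
    which is why no cluster can contain a pair with JS below it.
    """
    js = jaccard(tokens_a, tokens_b)
    if js < threshold:
        return None  # not clustered together
    if js == 1.0 and same_patient and same_time:
        return "exact"
    if js == 1.0:
        return "common_output"  # identical text, different patient and/or time
    return "similar"  # default for any other above-threshold pair
```

Under this sketch, any above-threshold pair that is neither an exact copy nor a common output note falls through to the "similar" default, matching the classification described in the response to Comment #4.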
Therefore, we would expect a very low false positive rate, as we found for all similarity indices.

Reviewer #2: The authors claimed that "This work attempts to address note-level duplication versus event-level duplication. Though the presence of note-level duplication is well-known, few studies provide quantitative and large-scale measures of extent of this issue. This manuscript represents a step toward understanding its scale, the potential consequences, and scalable solutions to the problem." However, there have been many publications in the past decade reporting the scale of note duplication, its causes, possible solutions, and impact. One recent example, as stated in the initial review, is the UCSF Medical Center study "Clinicians copy and paste nearly half of EHR progress notes": https://www.fiercehealthcare.com/ehr/clinicians-copy-and-paste-nearly-half-ehr-progress-notes?utm_medium=nl&utm_source=internal&mrkid=715016&mkt_tok=eyJpIjoiWXpJMU5XRmlOVEZpTjJWayIsInQiOiJheEdKbXo4bmt4U1loY1lUYmhcL3kzVHJVTHV5cE9PNTd0ZUJ4NUVPYVJJZTlZZFR1THRRWFg5NEdCS1JjMzNjMmhkOWhzNmhvNWJNOXpuOFB4VFJ3ckI3d1F6Zk03MmswcDVKRjVrcHdYS3Z4dHJPMGE2YjAxdlhcL1wvTFwvajJpTDMifQ%3D%3D