===== REVIEWS =====

Reviewer #1

Questions

1. Where does your core expertise lie?
Clinical

2. Please provide a SHORT SUMMARY of the paper, in which you briefly describe its main contributions and its context to relevant work.
In this article, the authors describe the labor-intensive task of monitoring fertilized embryos for viable transfer and a new method for automatic embryo staging using time-lapse EmbryoScope videos. They go on to describe how they utilized convolutional neural networks to 1) crop the images, and 2) effectively label each embryo with its developmental stage.

3. Please comment on the METHODOLOGICAL SIGNIFICANCE of this paper. All MLHC papers should exhibit some level of machine learning sophistication. If the paper is more methodologically focused, are the methods sound? If there are claims of novelty, are these claims substantiated? If the paper utilizes complex models, are they non-gratuitous and warranted? If the paper is more clinically focused, does it at least extend an existing approach or apply machine learning in a new way or relevant clinical setting? Are there adequate comparisons to existing work?
The authors exhibit sound methodology that takes advantage of existing methods while introducing novel elements that make sense for the problem they are trying to solve.

4. Please comment on the HEALTHCARE SIGNIFICANCE of the paper. All MLHC papers should address a real, clinically relevant problem in a thoughtful way. If the paper is more methodologically focused, what potential does it have to address scientific questions in healthcare? If the paper is more clinically focused, does it demonstrate impact on a real question regarding patient health or our scientific understanding of health? Is the experimental design appropriate?
This work clearly addresses the first part of the clinical problem they hope to eventually address.
They demonstrate that their work can effectively stage the development of an embryo, but it doesn't assess the viability of a given embryo, which is the end goal of the clinician and patient: to maximize the chances of a baby.

5. If possible, list at least two of the paper's STRENGTHS and two of the paper's WEAKNESSES (but do not feel obliged to fill a quota).
The paper is well written and clearly frames a problem, approach, and results. They also clearly identify the limitations of their own approach. However, I'm afraid the paper itself should emphasize more that this represents the first of a two-stage approach to addressing the clinical problem.

6. Please comment on OTHER aspects of the paper, including clarity, presentation, quality of writing, etc.
The quality of the writing is the best among the submissions this reviewer has received. The authors strike an effective balance between an appeal to data scientists and clinicians, which I believe MLHC seeks to achieve.

7. How would you best describe the TECHNICAL DEPTH of this paper?
Simple and interesting/useful

Reviewer #2

Questions

1. Where does your core expertise lie?
Computational

2. Please provide a SHORT SUMMARY of the paper, in which you briefly describe its main contributions and its context to relevant work.
This paper performs automated embryo staging on time-lapse Embryoscope video, taken at roughly fifteen-minute intervals, beginning about 18 hours after IVF fertilization. Embryo staging is important to improve IVF implantation rate, thus reducing costs and patient discomfort. For the purposes of this study, only the first six embryo stages are focused on, due to selection heuristics generally depending only on these stages, and also due to incomplete data for later stages. Various extensions to a simple ResNet classifier are applied, beginning with weakly-supervised segmentation, and then LSTM and dynamic programming (DP) processing of frame sequences.
Results from Table 2 suggest that the greatest benefit arose from proper segmentation of the embryo from the background, with slight improvements from LSTM/DP.

3. Please comment on the METHODOLOGICAL SIGNIFICANCE of this paper. All MLHC papers should exhibit some level of machine learning sophistication. If the paper is more methodologically focused, are the methods sound? If there are claims of novelty, are these claims substantiated? If the paper utilizes complex models, are they non-gratuitous and warranted? If the paper is more clinically focused, does it at least extend an existing approach or apply machine learning in a new way or relevant clinical setting? Are there adequate comparisons to existing work?
This paper begins with a standard ResNet-50 classifier as a baseline, before a logical progression through segmentation and sequence analysis in an effort to further improve performance, thereby justifying the increased complexity. Patient-level stratification was properly implemented, with a 93/10/10 split for the training, validation and test sets respectively. In general, each additional extension was well-motivated and empirically tested; for example, the performance of the embryo detection and segmentation model was tested on a small manually-annotated dataset of embryo centres. The discovery that most of the final performance gain came from segmentation, however, unavoidably raises the question of whether a reinforcement learning solution for segmentation is truly required, when embryo examples (e.g. in Figure 3) suggest that classical computer vision methods might also work. In any case, it is not clear whether well-known augmentation techniques (e.g. rotation, minor scaling/distortion) were employed here, especially given the relatively low number of independent cases (510 embryos in the training set).
For instance, while it is stated that only "the best" region is chosen at test time, general practice to further improve classification performance has often been to average multiple "good enough" crops. While adding an LSTM to include some temporal context provides a roughly 2% boost without DP, it appears that the monotonicity enforcement of DP is relatively more important, as it directly exploits this developmental property of embryos. The importance of DP is reflected in Figure 5, especially in the leftmost chart, in which the initial classifier exhibits much confusion between the final three stages. Taken together, this constitutes a well-motivated pipeline for embryo staging. We might, however, also suggest investigating the effect of boosting the initial classification performance (e.g. input augmentation, ensembling, using newer architectures, including time from fertilization as an input feature), and visualizing attributions with saliency methods, especially for misclassifications.

4. Please comment on the HEALTHCARE SIGNIFICANCE of the paper. All MLHC papers should address a real, clinically relevant problem in a thoughtful way. If the paper is more methodologically focused, what potential does it have to address scientific questions in healthcare? If the paper is more clinically focused, does it demonstrate impact on a real question regarding patient health or our scientific understanding of health? Is the experimental design appropriate?
Given the development of automated video microscopy for embryos, automated staging is a natural next step. The authors further recognize that staging is only an intermediate objective, and that the ultimate goal would be to assess embryo implantation viability directly. An angle that has been less explored is that of the human labels, which appear to have been obtained from a single embryologist for each patient/embryo. Does there exist any data/literature on expected variability between embryologist annotations?
Another issue that may be clarified is the mention of t4+ on page 4, which does not appear in the charts (the stage after t4 is t5 instead).

5. If possible, list at least two of the paper's STRENGTHS and two of the paper's WEAKNESSES (but do not feel obliged to fill a quota).
Strengths
- Rigorous experimental design
- Well-justified rationale for each modelling stage
- Comprehensive review both of the relevant computational and medical literature
Weaknesses
- Complexity of initial segmentation stage possibly not strictly necessary
- Relatively shallow optimization of base classifier

6. Please comment on OTHER aspects of the paper, including clarity, presentation, quality of writing, etc.
The presentation is generally clear.

7. How would you best describe the TECHNICAL DEPTH of this paper?
Technically complex and interesting/useful

Reviewer #3

Questions

1. Where does your core expertise lie?
Computational

2. Please provide a SHORT SUMMARY of the paper, in which you briefly describe its main contributions and its context to relevant work.
The paper proposes a novel method for automated embryo staging that exploits prior knowledge of the location of the embryo in the videos without using ground truth ROI outlines, and a dynamic programming based approach to post-process predictions. The method performs better than existing methods in terms of accuracy and transition predictions while using a smaller amount of hand-labeled data.

3. Please comment on the METHODOLOGICAL SIGNIFICANCE of this paper. All MLHC papers should exhibit some level of machine learning sophistication. If the paper is more methodologically focused, are the methods sound? If there are claims of novelty, are these claims substantiated? If the paper utilizes complex models, are they non-gratuitous and warranted? If the paper is more clinically focused, does it at least extend an existing approach or apply machine learning in a new way or relevant clinical setting?
Are there adequate comparisons to existing work?
To solve the proposed problem, the method presented contains the following components: (1) learning the ROI from weak supervision, (2) incorporating temporal context of the video into the model, and (3) post-processing using dynamic programming to generate the most likely sequence of embryo stages. The stages are clearly described and fairly novel: for example, it would have been easier to use existing annotated embryo ROIs to train a vision model, but the authors propose a weakly supervised approach. Next, the temporal structure of the video is taken into consideration when predicting stages as well as for post-processing. The method itself consists of two machine learning algorithms and one stage that utilizes dynamic programming to improve the results of the ML model, which is very interesting and can be applied to other time series models as well. The results support the success of the overall method as well as the individual components themselves.

4. Please comment on the HEALTHCARE SIGNIFICANCE of the paper. All MLHC papers should address a real, clinically relevant problem in a thoughtful way. If the paper is more methodologically focused, what potential does it have to address scientific questions in healthcare? If the paper is more clinically focused, does it demonstrate impact on a real question regarding patient health or our scientific understanding of health? Is the experimental design appropriate?
Currently, embryologists analyze the time-lapse video of embryo development manually, annotating time stamps for different developmental milestones. These annotations are then used to rank the embryos for transfer viability. The success of the method presented can help automatically detect the transitions instead of requiring human annotation, thus speeding up this process significantly.
To the best of my knowledge, this seems like a clinically relevant problem that can be solved by the presented method.

5. If possible, list at least two of the paper's STRENGTHS and two of the paper's WEAKNESSES (but do not feel obliged to fill a quota).
+ The paper introduces a clinically relevant problem and uses a sophisticated, three-stage method to automate a manual labeling process
+ The method achieves a 90.23% frame accuracy, which outperforms existing methods in the literature
- The analysis could have been stronger by describing how this approach could be generalized to other video datasets and time series data in general with priors on the labels

6. Please comment on OTHER aspects of the paper, including clarity, presentation, quality of writing, etc.
The paper is very well written and presented.

7. How would you best describe the TECHNICAL DEPTH of this paper?
Technically complex and interesting/useful
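
Illustrative note on the DP post-processing discussed by Reviewers #2 and #3: the reviews describe a dynamic programming step that enforces monotonicity of the predicted stages and returns the most likely stage sequence. The sketch below shows one way such a step could look. It is not the authors' implementation. It assumes that a classifier supplies per-frame stage probabilities and that the objective is the summed log-probability over frames; the function name monotonic_decode, the input layout, and the eps smoothing constant are all hypothetical.

```python
import math


def monotonic_decode(probs):
    """Return the non-decreasing stage sequence with the highest summed log-probability.

    probs[t][s] is the (assumed) classifier probability of stage s at frame t.
    """
    if not probs:
        return []
    n_frames, n_stages = len(probs), len(probs[0])
    eps = 1e-12  # hypothetical smoothing constant, avoids log(0)
    logp = [[math.log(p + eps) for p in row] for row in probs]

    # best[t][s]: best score of a monotone labeling of frames 0..t that ends in stage s
    best = [logp[0][:]]
    back = [[0] * n_stages]
    for t in range(1, n_frames):
        row, ptr = [], []
        run_best, run_arg = -math.inf, 0
        for s in range(n_stages):
            # Running maximum over previous stages <= s enforces monotonicity.
            if best[t - 1][s] > run_best:
                run_best, run_arg = best[t - 1][s], s
            row.append(run_best + logp[t][s])
            ptr.append(run_arg)
        best.append(row)
        back.append(ptr)

    # Backtrack from the best final stage.
    s = max(range(n_stages), key=lambda k: best[-1][k])
    path = [s]
    for t in range(n_frames - 1, 0, -1):
        s = back[t][s]
        path.append(s)
    path.reverse()
    return path
```

For example, with per-frame probabilities [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.6, 0.3, 0.1], [0.1, 0.2, 0.7]], the per-frame argmax gives the non-monotone sequence 0, 1, 0, 2, whereas this sketch returns [0, 1, 1, 2]. The running maximum keeps the cost at O(frames x stages) rather than O(frames x stages^2).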