Paper ID: 26
Paper Title: Predicting Surgery Duration with Neural Heteroscedastic Regression

REVIEWER #1

REVIEW QUESTIONS

1. Where does your core expertise lie?
Computational

2. What type of paper is this?
Full Paper

3. Please provide a SHORT SUMMARY of the paper, in which you briefly describe its main contributions and its context to relevant work.
The paper studies the problem of surgery planning and proposes the use of MLP-based heteroscedastic models to predict surgery duration based on factors such as procedure and patient characteristics, location and performing surgeon. The baselines used for comparison are the mean duration (traditional), logistic regression and Laplace/MLPs without heteroscedasticity.

4. Please comment on the METHODOLOGICAL SIGNIFICANCE of this paper. All MLHC papers should exhibit some level of machine learning sophistication. If the paper is more methodologically focused, are the methods sound? If there are claims of novelty, are these claims substantiated? If the paper utilizes complex models, are they non-gratuitous and warranted? If the paper is more clinically focused, does it at least extend an existing approach or apply machine learning in a new way or relevant clinical setting? Are there adequate comparisons to existing work?
The methods introduced in the paper differ slightly from the HS neural regression of Lakshminarayanan, as the authors themselves point out. It would be useful if, in Section 5, the authors explicitly listed the differences. The main novelty comes from the appropriate use of such models in the context of the application. The use of the models is warranted for the application and the changes are not overly complex. The selected contenders are also reasonable.

5. Please comment on the HEALTHCARE SIGNIFICANCE of the paper. All MLHC papers should address a real, clinically relevant problem in a thoughtful way.
If the paper is more methodologically focused, what potential does it have to address scientific questions in healthcare? If the paper is more clinically focused, does it demonstrate impact on a real question regarding patient health or our scientific understanding of health? Is the experimental design appropriate?
The authors provide a solid motivation for their work: accurate prediction of surgery time is important for the allocation of hospital resources. The currently used method falls short of providing an accurate estimate, leading to either underuse of equipment or delays in patient care. The reported results show that the HS models outperform the other baselines. Further analysis could be performed to determine how the models perform for different procedures; for instance, some of the models may be better suited to shorter surgeries. Also, is there a functional difference between the 3 proposed HS methods?

6. If possible, list at least two of the paper's STRENGTHS and two of the paper's WEAKNESSES (but do not feel obliged to fill a quota).
Strengths: the work is well motivated by a shortcoming in the healthcare domain; heteroscedasticity is an appropriate setting in this case, due to the variations in procedure duration; improvement over the standard way of predicting surgery time.
Weaknesses: it is not clear which of the introduced methods should actually be used.

7. Please comment on OTHER aspects of the paper, including clarity, presentation, quality of writing, etc.
Overall, the paper is clearly written and relatively easy to follow. Typo: page 2 ("the a parameters")

8. How would you best describe the TECHNICAL DEPTH of this paper?
Simple and interesting/useful

9. What is your OVERALL RECOMMENDATION for this paper?
strong accept

REVIEWER #2

REVIEW QUESTIONS

1. Where does your core expertise lie?
Clinical, Computational

2. What type of paper is this?
Full Paper

3.
Please provide a SHORT SUMMARY of the paper, in which you briefly describe its main contributions and its context to relevant work.
This paper describes an analysis of neural network architectures for predicting surgery duration. In particular, this work aims to evaluate the effect of heteroscedasticity and suggests how the results might apply to a decision analysis for operating room scheduling.

4. Please comment on the METHODOLOGICAL SIGNIFICANCE of this paper. All MLHC papers should exhibit some level of machine learning sophistication. If the paper is more methodologically focused, are the methods sound? If there are claims of novelty, are these claims substantiated? If the paper utilizes complex models, are they non-gratuitous and warranted? If the paper is more clinically focused, does it at least extend an existing approach or apply machine learning in a new way or relevant clinical setting? Are there adequate comparisons to existing work?
The methodology of the paper is sound but not novel. The approach taken in this work is relatively standard with the explicit modeling of both parameters of two-parameter distributions. Being more clinically focused, this work is applied in a relevant clinical setting.

5. Please comment on the HEALTHCARE SIGNIFICANCE of the paper. All MLHC papers should address a real, clinically relevant problem in a thoughtful way. If the paper is more methodologically focused, what potential does it have to address scientific questions in healthcare? If the paper is more clinically focused, does it demonstrate impact on a real question regarding patient health or our scientific understanding of health? Is the experimental design appropriate?
The healthcare significance of this work is real, but the impact of such a model on patients is not discussed. Improved operating room utilization could allow a facility to serve more people, decreasing the general wait time in the population, but this aspect was not discussed.
The motivation for this work is discussed in a more abstract sense where costs of over-utilization and under-utilization are not specified, nor is it specified who would incur those costs. Also, the clinical significance of this work is difficult to evaluate given that the units of the evaluation are unclear. It is impossible that the current method has a mean absolute error of 28.87 hours, even though the experiment description describes the labels as having units of hours. If this is expressed in minutes (or hundredths of an hour), then it is unclear whether a difference of a few minutes between the best and worst methods is clinically relevant, as it will not necessarily change scheduling.

6. If possible, list at least two of the paper's STRENGTHS and two of the paper's WEAKNESSES (but do not feel obliged to fill a quota).
Two of the paper's strengths include a clear presentation of the work and an application of a heteroscedastic model where there is clearly heteroscedasticity. Another strength of the model is the post-hoc analysis of the drivers of procedure duration. This information could be used to identify targeted strategies for improving operating room efficiency. Two of the paper's weaknesses include an unclear presentation of the quantitative evaluation and an abstract discussion of the economic analysis that could have been grounded with more concrete motivations.

7. Please comment on OTHER aspects of the paper, including clarity, presentation, quality of writing, etc.
The authors use the term long-tailed to describe the support of distributions, but this could be confusing as it is often used to describe the heaviness of tails (Gamma, Laplace, and Gaussian are not examples of heavy-tailed distributions). The authors should use the term doctor consistently instead of switching between surgeon and doctor, or at least indicate that they will be used interchangeably.
There is a lengthy introduction to loss functions that correspond to probabilistic output functions. This could be much more concise for the audience. Figure 2 could benefit from further discussion. The description of the first baseline as a linear regression with a single feature per procedure is not clear, as it does not specify a loss function (which is important in this case); that it is based on a pre-computed average is also unclear. For this and for other things (such as feature normalization) it should be specified that the normalization parameters and averages were computed based on the training set. The related work section could include a discussion of the following papers as well:
ShahabiKargar, Zahra, et al. "Predicting Procedure Duration to Improve Scheduling of Elective Surgery." Pacific Rim International Conference on Artificial Intelligence. Springer International Publishing, 2014.
Stepaniak, Pieter S., et al. "Modeling procedure and surgical times for current procedural terminology-anesthesia-surgeon combinations and evaluation in terms of case-duration prediction and operating room efficiency: a multicenter study." Anesthesia & Analgesia 109.4 (2009): 1232-1245.
Master, Neal, et al. "Improving predictions of pediatric surgical durations with supervised learning." International Journal of Data Science and Analytics (2017): 1-18.

8. How would you best describe the TECHNICAL DEPTH of this paper?
Simple and interesting/useful

9. What is your OVERALL RECOMMENDATION for this paper?
weak accept

REVIEWER #3

REVIEW QUESTIONS

1. Where does your core expertise lie?
Computational

2. What type of paper is this?
Full Paper

3. Please provide a SHORT SUMMARY of the paper, in which you briefly describe its main contributions and its context to relevant work.
The paper proposes a regression method using an MLP with different heteroscedastic loss functions for predicting the duration of a surgical procedure.
The prediction is based on patient-based, doctor-based, procedure-based and other contextual features. The different loss functions are derived from Gaussian, Laplacian, and Gamma observation noise models.

4. Please comment on the METHODOLOGICAL SIGNIFICANCE of this paper. All MLHC papers should exhibit some level of machine learning sophistication. If the paper is more methodologically focused, are the methods sound? If there are claims of novelty, are these claims substantiated? If the paper utilizes complex models, are they non-gratuitous and warranted? If the paper is more clinically focused, does it at least extend an existing approach or apply machine learning in a new way or relevant clinical setting? Are there adequate comparisons to existing work?
The contribution of the paper is to provide both the predicted output and the uncertainty using a (single hidden layer) MLP.

5. Please comment on the HEALTHCARE SIGNIFICANCE of the paper. All MLHC papers should address a real, clinically relevant problem in a thoughtful way. If the paper is more methodologically focused, what potential does it have to address scientific questions in healthcare? If the paper is more clinically focused, does it demonstrate impact on a real question regarding patient health or our scientific understanding of health? Is the experimental design appropriate?
The evaluation of the healthcare significance is beyond my expertise, but the experimental design seems to be appropriate.

6. If possible, list at least two of the paper's STRENGTHS and two of the paper's WEAKNESSES (but do not feel obliged to fill a quota).
Strengths:
- The heteroscedastic models and the estimation of the uncertainty of the estimate
Weaknesses:
- Except for the heteroscedasticity, the method is not novel and the approach is very simple

7. Please comment on OTHER aspects of the paper, including clarity, presentation, quality of writing, etc.
The paper is well organized and clearly written.

8.
How would you best describe the TECHNICAL DEPTH of this paper?
Simple and interesting/useful

9. What is your OVERALL RECOMMENDATION for this paper?
weak accept
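
Note on the loss functions summarized by the reviewers: the reviews describe models that output both parameters of a two-parameter distribution (Gaussian, Laplace, or Gamma) for each case. The following is a minimal sketch of what the corresponding per-example negative log-likelihoods might look like. It is not the authors' implementation; the function names and the log-variance / log-scale parameterization are assumptions made here for illustration only.

```python
import math


def gaussian_nll(y, mean, log_var):
    """Negative log-likelihood of y under a Gaussian N(mean, exp(log_var)).

    Predicting log_var (an assumed parameterization) keeps the variance positive.
    """
    return 0.5 * (math.log(2 * math.pi) + log_var
                  + (y - mean) ** 2 / math.exp(log_var))


def laplace_nll(y, loc, log_scale):
    """Negative log-likelihood of y under a Laplace(loc, exp(log_scale))."""
    return math.log(2) + log_scale + abs(y - loc) / math.exp(log_scale)


def gamma_nll(y, shape, scale):
    """Negative log-likelihood of y > 0 under a Gamma(shape, scale)."""
    return (math.lgamma(shape) + shape * math.log(scale)
            - (shape - 1) * math.log(y) + y / scale)


# Illustration: the same error of 2 units, scored under two predicted variances.
error_confident = gaussian_nll(2.0, 0.0, 0.0)            # predicted variance 1
error_uncertain = gaussian_nll(2.0, 0.0, math.log(4.0))  # predicted variance 4
```

In a homoscedastic model the variance (or scale) term is a single constant shared across all cases. In the heteroscedastic setting it is predicted per case, so a large error is penalized less when the model also predicted high uncertainty, which is what allows the model to report an uncertainty estimate alongside the point prediction.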