Congratulations! We would like to inform you that your paper "The NES Music Database: A symbolic music dataset with expressive performance attributes" (Paper #265) has been scheduled for presentation at ISMIR 2018. Each paper was reviewed by at least 4 reviewers, including a member of the Program Committee. The selection process was very competitive, and we could only choose a limited number of the submitted papers to appear on the program.

Unlike previous ISMIR editions, all accepted papers will have *BOTH AN ORAL AND A POSTER* presentation. The aim is to give all accepted papers visibility (oral presentations) while keeping ample room for discussion between authors and the audience (poster presentations). Detailed information for presenters can be found at: http://ismir2018.ircam.fr/pages/call-information-presenters.html

IMPORTANT: ISMIR 2018 will run in parallel with Paris Fashion Week. The Organizing Committee strongly recommends that participants book accommodation in advance.

In this email you will find the following information:

1) Camera-Ready Submission Instructions (Due June 15th)
2) Author Registration Deadline (July 6th)
3) Visa Application Information
4) Registration and Accommodation Information (Early Registration Deadline July 6th)
5) Travel Grants
6) Additional Ways to Participate (LBD, Music, HAMR, DLfM, WoRMS, WiMIR Workshop)
7) Reviews for your Paper

All the best,
Emmanouil Benetos
Emilia Gómez
Xiao Hu
Eric Humphrey
Program Co-Chairs, ISMIR 2018

===== 1) Camera-Ready Submission =====

Your next step is to prepare a camera-ready manuscript by the following deadline:

*** Camera-Ready Deadline: Friday, June 15th ***

You should de-anonymize your paper, add acknowledgements where appropriate, and make updates based on the feedback from the reviewers (see below). Reviewers have worked hard to provide feedback that will improve your paper and maximize the impact of your work! Please take some time to incorporate their comments into your camera-ready version. When revising your paper, you may change the paper title if recommended by the reviewers. The author order cannot change and must be the same as the one entered in the submission system for the original submission.

Please make sure that all fonts are embedded in the PDF document and that it is not password-protected. Papers that are not formatted according to the ISMIR 2018 templates (including through excessive use of \baselinestretch) or that exceed the 6+n page length requirement will not be accepted for publication in the proceedings.

To submit your camera-ready manuscript, click on the following URL, which will take you directly to a form for submitting your final paper: https://www.softconf.com/h/ismir2018/cgi-bin/scmd.cgi?scmd=aLogin&passcode=265X-B9J7D6B7B6

We will notify you about the specific date and time of your presentation at a later date.

===== 2) Author Registration Deadline =====

Accepted papers must be presented at the conference by one of the authors, and at least one of the authors must register by the Author Registration deadline:

*** Author Registration Deadline: Friday, July 6th ***

Registration will open on 30th May. Failure to register before the deadline will result in the automatic withdrawal of your paper from the conference proceedings and program.
More information on registration is available at: http://ismir2018.ircam.fr/pages/participants-registration.html

===== 3) Visa Application Information =====

The conference website has a page with information about visa applications: http://ismir2018.ircam.fr/pages/participants-travel.html

Additional visa information will be posted at the above URL on 30th May.

===== 4) Registration and Accommodation =====

The ISMIR 2018 website has recently been updated with information about registration: http://ismir2018.ircam.fr/pages/participants-registration.html

*** Early Registration Deadline (for lower rates): Friday, July 6th ***

and with information about accommodation: http://ismir2018.ircam.fr/pages/participants-accommodations.html

===== 5) Financial Support =====

ISMIR, the Women in MIR (WiMIR) initiative, and the local organizers are proud to announce a number of financial support opportunities for students and other members of the community wishing to attend the conference. Awards are granted based on the quality of the accepted submission, the degree of financial need, the applicant's newness to ISMIR, and geographical diversity. The types of awards are:

Student Author Grants - available to first or supporting authors of an accepted full paper who were students at the time of paper submission.

Women in MIR (WiMIR) Grants - offered, thanks to the generous support of industry partners, to female first or supporting authors of accepted full papers, as well as female first authors of accepted late-breaking demo (LBD) submissions. Applicants do NOT need to be students to apply for the WiMIR Award.

Unaffiliated Author Grants - considered on a case-by-case basis. If you have an accepted ISMIR paper, were not a student when you submitted the paper, and will not receive conference support from your employer, you may indicate so in the comments section of the application.

*** Financial Support Application Deadline: Friday, June 29th ***

The application form can be found at: https://goo.gl/forms/jyZ3iecMF2TdFRYX2

Notifications will be sent out on Thursday, July 5th.
===== 6) Additional Ways to Participate =====

You may be interested in contributing to one of our related events:

* 5th International Conference on Digital Libraries for Musicology (DLfM) https://dlfm.web.ox.ac.uk/ Submission deadline: 15th June
* 1st International Workshop on Reading Music Systems (WoRMS 2018) https://sites.google.com/view/worms2018 Submission deadline: 15th July
* HAMR (Hacking Audio and Music Research) Hackathon: 21-22 September @ Deezer
* The New Shape of Audio Branding Workshop: 20th September @ IRCAM (information will be posted on the ISMIR 2018 website)
* Interactive Machine Learning for Music Exhibition http://ismir2018.ircam.fr/pages/important-dates.html Submission deadline: 29th June
* Women in MIR (WiMIR) 1st Annual Workshop https://wimir.wordpress.com/2018/05/21/wimir-1st-annual-workshop/ Submission deadline: 15th August
* Late-Breaking and Demo Session Submissions Open: 16th July to 22nd September

More information about all of these events can be found on the conference website: http://ismir2018.ircam.fr

===== 7) Reviews for your Paper =====

============================================================================
ISMIR 2018 Reviews for Submission #265
============================================================================

Title: The NES Music Database: A symbolic music dataset with expressive performance attributes

Authors: Chris Donahue, Huanru Henry Mao and Julian McAuley

============================================================================
REVIEWER #1
============================================================================

---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------

Scholarly/scientific quality: medium low
Novelty: medium low
Relevance of topic: high
Importance: high
Readability and paper organisation: medium high
Title and abstract: no
Bibliography: no
Make Review Publicly Accessible: Yes

---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------

There is an important place in ISMIR for these kinds of papers on datasets. A proper review of the literature before writing this kind of paper would have aided the submission greatly. This is more library science than MIR, and the paper should reflect this. Papers on a dataset's structure and access mechanisms, along with how the dataset was collected and how its metadata was assigned, date back to 2005, with one such paper winning best student paper in 2006. Papers in more recent years have focused on special topics in ethnomusicology and other domain-specific datasets with particular cultural or ethnic challenges. How it's built, how it's organized, how it's presented, how it's available: all of these are critical. Also, how are the legal challenges of copyright handled? NES games are among the most legally contentious intellectual property on earth, with lawsuits over ownership stretching across decades. I'm not seeing any of this in the paper.
============================================================================
REVIEWER #2
============================================================================

---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------

Scholarly/scientific quality: medium low
Novelty: high
Relevance of topic: high
Importance: medium high
Readability and paper organisation: medium low
Title and abstract: no
Bibliography: no
Make Review Publicly Accessible: Indifferent

---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------

This paper addresses something I think is both fascinating and of interest to the community: the possibility of using the constrained compositional parameters of the Nintendo Entertainment System's audio processor as a test bed for automated composition, including performance qualities. This is an inventive project and is very much relevant to ISMIR's milieu. However, I had serious difficulties understanding what was being done to create and evaluate the baselines in Section 4, which comprises much of the heart of the paper, and I believe this needs to be addressed before the paper is ready for publication.

As I understand the abstract and conclusion, the contributions of the paper are (1) a novel database consisting of symbolic-score representations of NES soundtrack pieces that can be used for machine-learning purposes, (2) a conversion tool that allows machine-generated music to be played through the NES audio processor, and (3) a set of baselines for evaluating compositional algorithms (and compositional/expressive algorithms) that use the database. (2) does not appear in the abstract, but it is in the introduction and conclusion. One thing I wondered was whether the baselines are themselves useful as compositional/performance models, and whether this was an additional contribution. You cite the recent DeepBach project, a generative model for automatic composition of Bach chorales, and I agree that its compositional constraints make it a good fit for training on this dataset. You compare the DeepBach model against your own baselines, which carry out the tasks of separated composition and expressive performance, and which are occasionally referred to as models. So I'm confused as to whether the baselines are just serving as starter algorithms against which composer-coders can evaluate the products of their own compositional algorithms, or whether some of the baselines were actually adept at composing somewhat convincing (if imperfect, as you note in Section 4.1) NES music themselves.

The section on background and task description is clear and easy to follow. The dataset description is also reasonably clear. I was curious to know how much musical material is being left out by the MIDI frequency range constraint: does the triangle generator often sound below MIDI note 21 in actual NES compositions, and do the three melodic voices sound above MIDI note 108? How would this affect evaluation? If a piece in the dataset has notes above MIDI note 108, is that piece eliminated from the dataset? Are all the notes above 108 shunted down to sound at 108? Are they silenced at 0?
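To make the distinction concrete, the three treatments I have in mind would look roughly like this (my own hypothetical sketch - the function, range, and mode names are illustrative, not taken from the paper):

    def constrain_pitch(pitch, lo=21, hi=108, mode="clamp"):
        """Hypothetical handling of a pitch outside the supported MIDI range."""
        if lo <= pitch <= hi:
            return pitch
        if mode == "drop":      # option 1: eliminate the offending piece/event
            return None
        if mode == "clamp":     # option 2: shunt the note to the range boundary
            return max(lo, min(pitch, hi))
        if mode == "silence":   # option 3: replace the note with a rest (0)
            return 0

Each choice would affect evaluation differently, so it would help to state which (if any) the dataset construction uses.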
I have a lot of trouble parsing your evaluation criteria. Does Table 3 indicate that DeepBach consistently outperforms your baselines, and so validate DeepBach as a more sophisticated compositional algorithm than the baselines? I think it would be helpful to clarify what Tables 3 and 4 are telling us for a wider readership.

The beginning of Section 4 defines negative log-likelihood, and it appears that a lower NLL is better, but I'm missing an explanation of what better performance is telling us. At first, I assumed this was a check to ensure that the model isn't just reproducing existing music in the dataset (why else would a low score be better, given the definition on p. 3?), but after spending some time looking into how NLLs are used, I see that a low NLL is an indication of high confidence that the thing being examined is a good fit within the dataset. I don't know that it's fair to presume your readership understands this without explanation.

In the beginning of Section 4, accuracy is defined as "the proportion of timesteps where a model's prediction is equal to the groundtruth label," but I can't tell how this ground truth was established; it's not mentioned anywhere else in the paper. How was this statistic actually compiled?
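For a wider readership, spelling out the standard definitions would help. Assuming the usual conventions (my reading of what Section 4 intends, not a quotation from it), for a model assigning probability P_theta to a voice's events y_1, ..., y_T:

    \mathrm{NLL} = -\sum_{t=1}^{T} \log P_\theta\left(y_t \mid y_{<t}\right),
    \qquad
    \mathrm{Accuracy} = \frac{1}{T} \sum_{t=1}^{T} \mathbb{1}\left[\hat{y}_t = y_t\right],

where \hat{y}_t is the model's most probable prediction at timestep t, and the NLL is often normalized per timestep. Under this reading, a lower NLL means the model assigns higher probability to the held-out music, and the "groundtruth label" at each timestep is presumably just the event that actually occurs next in the dataset - though the paper should say so explicitly.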
The beginning of Section 4 states that you will report NLL and accuracy for each voice of a model, as well as a set of aggregate statistics across all voices: summing for NLL and averaging for accuracy. Tables 3 and 4, however, each report *two* sets of aggregate statistics - at POIs and globally. The "Aggregate - POI" statistics correspond to summing for NLL and averaging for accuracy, yet POIs aren't mentioned until the next paragraph of Section 4. From the language of the first paragraph of Section 4, I expected that the "Aggregate - all" columns in Tables 3 and 4 would hold the summed-NLL and averaged-accuracy scores, but "Aggregate - all" is some different statistic, and I'm not sure how the "Aggregate - all" scores were created. In general, I found these results very difficult to interpret.

I find the explanation of the recurrent neural network training in Section 4.1 hard to follow (though I admit I am not an expert in this area). There is some jargon (LSTM, softmax, unrolling, backpropagation) that goes unexplained, and while I can see that some of these are standard techniques in machine learning (perhaps terms so common in this area that they require no additional explanation), the second paragraph of Section 4.1 is well-nigh inscrutable to the uninitiated.
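For what it's worth, here is the generic shape I eventually pieced together for such a setup - my own sketch under assumed conventions (vocabulary size, dimensions, and framework are illustrative, not the authors' stated configuration):

    import torch
    import torch.nn as nn

    # A language-model-style LSTM over one voice's note events. The network is
    # "unrolled" across timesteps, a softmax over the note vocabulary gives the
    # next-event distribution, and training minimizes the NLL of the true next
    # event via backpropagation (through time).
    VOCAB = 89  # e.g., 88 pitches (MIDI 21-108) plus a rest symbol - my assumption

    class NoteLSTM(nn.Module):
        def __init__(self, hidden=256):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, hidden)
            self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
            self.proj = nn.Linear(hidden, VOCAB)  # logits; softmax normalizes them

        def forward(self, tokens):
            states, _ = self.lstm(self.embed(tokens))  # unroll over the sequence
            return self.proj(states)

    model = NoteLSTM()
    seq = torch.randint(0, VOCAB, (1, 33))             # a dummy event sequence
    logits = model(seq[:, :-1])                        # predict each next event
    loss = nn.functional.cross_entropy(                # = NLL under the softmax
        logits.reshape(-1, VOCAB), seq[:, 1:].reshape(-1))
    loss.backward()                                    # backpropagation

Even a compact description along these lines in Section 4.1 would make the jargon self-contained.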
Portions of Section 4.2 are similarly impenetrable.

Table 5 is not referenced in the text at all - I imagine it is intended to be anchored to Section 4.3, where the performance evaluation of different models is discussed, or to Section 5, where some of the compositional datasets are introduced. The structure of the last page or so of the paper seems quite confusing to me. Table 5 compares results against different separated-score/blended-score datasets, but the only one explained by that point in the text is the Bach dataset, and Section 4.3 does not say anything about them specifically; that's left until the next section. Section 5 is titled "Related Work" and discusses prior datasets as well as prior literature, but it's at the end of the paper, which is an unusual choice. I suppose there's nothing requiring that reviews of prior work appear early in papers, but I don't see why this format is advantageous over moving the literature review toward the beginning of the paper and discussing the other datasets just prior to the presentation of Table 5 and the prose evaluation in Section 4.3.

The references are not formatted according to ISMIR guidelines and are incomplete - I would not be able to find many of these via browsing rather than searching, since page numbers are omitted and some journal/conference names are over-abbreviated (e.g., I don't know what NC is in reference 15).

A few minor copyediting suggestions: on p. 1, "stylistically-cohesive" does not need to be hyphenated; on p. 2, "more-closely" does not need to be hyphenated. In the section on Points of Interest, the last sentence should read "This evaluation criterion...", not "criteria". On p. 4, in the sentence containing the phrase "however the four-voice structure", the word "however" feels a little out of place - either "and" or "but" would probably work, depending on whether you expect your reader to be surprised by the correspondence. Also, I think the last sentence of the abstract would be better placed before "We establish baselines".

============================================================================
REVIEWER #3
============================================================================

---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------

Scholarly/scientific quality: medium high
Novelty: medium high
Relevance of topic: medium high
Importance: medium high
Readability and paper organisation: medium low
Title and abstract: yes
Bibliography: yes
Make Review Publicly Accessible: Indifferent

---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------

The paper presents a new symbolic music dataset of video game music from the Nintendo Entertainment System corpus. It consists of the scores of multi-instrumental pieces along with expressive attributes, such as dynamics and timbre changes, for each voice. The description of the dataset's format is detailed, and the dataset will be publicly available (together with supporting scripts) after publication. The experiments presented in the paper are well explained and provide many insights into further directions for using the dataset. More detailed comments are provided below:

a) The novelty of providing such a dataset is well supported, and the arguments for its usefulness, especially in music generation tasks, are strong. However, more details and analysis of the content of the dataset from a musicological perspective are missing; these would strengthen its impact. For example, it is mentioned that "as the result of the limited time period during which the music was composed, NES-MDB exhibits more stylistic consistency than other large datasets of multi-instrumental music"; it would be useful to report and analyse the stylistic characteristics of the corpus and how this consistency can be formalised in terms of similarity metrics.

b) I would expect Section 5 (related work) to appear before going deeper into the dataset details and the experiments, as what is stated there is closer in concept to the Introduction.

c) In the Introduction, reference(s) are missing for the statement: "The datasets that do contain expressive performance characteristics predominantly focus on solo piano rather than multi-instrumental music". Possibly this can be fixed by incorporating text from Related Work into this section.

d) The list of datasets that contain expressive performance information should also include the following references:

(Ensemble Expressive Performance Dataset) [1] Marco Marchini, Rafael Ramirez, Panos Papiotis, and Esteban Maestre. The Sense of Ensemble: A Machine Learning Approach to Expressive Performance Modelling in String Quartets. Journal of New Music Research, 43(3):303-317, 2014.

(Mazurka Project) [2] Craig Sapp. Comparative analysis of multiple musical performances. In Proceedings of the 8th International Conference on Music Information Retrieval, pages 497-500, 2007.

If there is no space for these additions, I would eliminate a large part of Section 5.1, as these references to related datasets, along with a more detailed presentation of them, are more relevant to the topic of the paper.

e) Sections 4.3 and 5.1 include the phrases "four specific datasets" and "four datasets", respectively, when presenting related work; more details about these datasets need to be reported.

Minor corrections:
1. It may seem obvious, but it would be good to state in the text what 'n' means in Eq. 1, for clarity.
2. Fig. 1 should be changed so that it is readable in black and white.
3. There should be a "." after "Negative log-likelihood and Accuracy" in Section 4.
4. "Note+Auto" should be introduced in the text before the legend of Fig. 2.
5. In the third paragraph of Section 4.2, it should be "T_NO", not "N_NO".
6. Reference [32] is missing the publisher's information (in this case, a link to the blog post).

============================================================================
REVIEWER #4
============================================================================

---------------------------------------------------------------------------
Reviewer's Scores
---------------------------------------------------------------------------

Scholarly/scientific quality: high
Novelty: high
Relevance of topic: high
Importance: medium high
Readability and paper organisation: high
Title and abstract: yes
Bibliography: yes
Make Review Publicly Accessible: Yes

---------------------------------------------------------------------------
Comments
---------------------------------------------------------------------------

I am the meta-reviewer for this paper.

Initial Review
--------------

This paper describes a new dataset for expressive musical performance modeling using 5,300 songs from 400 games developed for the Nintendo Entertainment System (NES). Because of the constraints of the NES audio synthesis system, the musical works consist of four monophonic synthesizers working in parallel, making them well suited to modeling and prediction with deep learning. They also contain meaningful variations in velocity and timbre, and predicting these features could be useful in modeling expressive musical performance. Experiments with several existing music composition systems show good results in terms of modeling a held-out test set.

Meta-review
-----------

This was a contentious paper, with one reviewer strongly opposed to its publication while others were more supportive. While we did not reach a consensus, I believe it should be accepted for publication.
The negative review cited a lack of metadata about the songs themselves and a lack of reproducibility. Other reviewers argued that the promised release of the code and dataset would make the work quite reproducible, and that while richer metadata might be necessary when proposing a dataset for general MIR use, the tasks proposed here are well defined and the (meta)data included in the dataset is sufficient to train and evaluate algorithms on them. The borderline reviewer was less familiar with the tasks involved and how best to evaluate them. The positive reviewers were satisfied that the evaluation shows the dataset can be used to successfully train and evaluate models and that it shows potential for use in automatic music generation.