Reviewer 1:

Summary Of Contributions: The paper surveyed methods, data modalities, and applications of a newly emerging machine learning area called data distillation. The goal of data distillation is to synthesize small-scale data summaries from a large-scale dataset, which can act as drop-in replacements for the original dataset in training or inference scenarios. The author first proposed the concept of an \epsilon-approximate data summary and defined data distillation as an optimization problem (a minimal formal sketch of this objective is given after the Requested Changes below). Existing data distillation algorithms can be encapsulated within this framework as different ways of solving the optimization problem. The author grouped data distillation algorithms into five categories in the survey: 1) model matching, 2) gradient matching, 3) trajectory matching, 4) distribution matching, and 5) factorization. Afterwards, the author discussed the modalities that existing algorithms have been applied to, including images, text, graphs, and recommender systems. The author also introduced the applications that can benefit from high-fidelity data summaries, including differential privacy, neural architecture search, and federated learning. In the end, the author briefly discussed the challenges in this area. The paper is well written, the unified framework is neat, and the survey includes most of the recent publications.

Strengths And Weaknesses:

Strengths:
- The author proposed a unified framework for data distillation and surveyed recent algorithms. In addition, the data distillation methods are compared in Table 1.
- The author not only discussed the algorithms, but also listed the applications that can benefit from data distillation.

Weaknesses:
- The author has not cited other surveys about data distillation, including "A Survey on Dataset Distillation: Approaches, Applications and Future Directions", which has been accepted at IJCAI 2023, and "A Comprehensive Survey of Dataset Distillation" (https://arxiv.org/pdf/2301.05603.pdf). In fact, the paper is quite similar to the IJCAI 2023 survey. For example, the IJCAI 2023 survey also discussed model matching, gradient matching, trajectory matching, and distribution matching algorithms. One exception is that the previous survey did not discuss factorization-based data distillation methods. In addition, the IJCAI 2023 paper surveyed different data modalities and included the audio modality, which was not covered in the paper under review. The IJCAI 2023 paper also discussed different applications.
- The author compared data distillation with data compression in Section 2.5. The author could also compare data distillation with data pruning, and in particular discuss the relationship with recent papers such as "Beyond neural scaling laws: beating power law scaling via data pruning".
- Looking at the data modality section, most of the problems considered are still classification problems. Applying data distillation to other tasks such as language modeling or representation learning is also a reasonable direction, and the author may want to include it in the survey.

Requested Changes:
- The author needs to clarify the relationship and differences between this paper and "A Survey on Dataset Distillation: Approaches, Applications and Future Directions".
- The author should discuss the relationship between data distillation and data pruning.
- Most data distillation algorithms are still compared on classification datasets. The author should clarify this limitation and point out potential directions for extending data distillation to more tasks.
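For reference, a minimal sketch of the distillation objective and \epsilon-approximate summary that Reviewer 1 summarizes above; the notation here is assumed for illustration rather than taken verbatim from the paper under review. Given a large dataset $\mathcal{D}$, a training loss $\mathcal{L}$, and a budget $n \ll |\mathcal{D}|$, data distillation can be written as the bi-level problem

$$
\mathcal{D}_{\mathrm{syn}}^{*} \;=\; \operatorname*{arg\,min}_{|\mathcal{D}_{\mathrm{syn}}| \le n} \; \mathcal{L}\big(\theta^{*}(\mathcal{D}_{\mathrm{syn}}),\, \mathcal{D}\big)
\qquad \text{where} \qquad
\theta^{*}(\mathcal{D}_{\mathrm{syn}}) \;=\; \operatorname*{arg\,min}_{\theta} \; \mathcal{L}(\theta,\, \mathcal{D}_{\mathrm{syn}}),
$$

and $\mathcal{D}_{\mathrm{syn}}$ is an $\epsilon$-approximate data summary if $\big|\mathcal{L}(\theta^{*}(\mathcal{D}),\, \mathcal{D}) - \mathcal{L}(\theta^{*}(\mathcal{D}_{\mathrm{syn}}),\, \mathcal{D})\big| \le \epsilon$. Under this reading, the five method families listed above differ mainly in which tractable surrogate of this bi-level problem they optimize (matching model parameters, gradients, training trajectories, or feature distributions), while factorization additionally reparameterizes $\mathcal{D}_{\mathrm{syn}}$ in a compressed form.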
Broader Impact Concerns: No concerns.

Claims And Evidence: Yes

Audience: Yes

Reviewer 2:

Summary Of Contributions: This paper is a review of data distillation, which aims to reduce the size of the datasets of various learning tasks by synthesizing terse data summaries that can be used as alternatives for model training/inference. The paper begins by presenting the concept and rationale behind data distillation. Subsequently, it delves into a thorough examination of the five primary categories of data distillation methods. Furthermore, the paper explores the modalities and applications of data distillation, providing valuable insights. Lastly, it outlines several prospective directions for future research in this field.

Strengths And Weaknesses:

Strengths:
- The topic discussed in the paper is highly significant, especially given the prevalence of large-scale models today. With the escalating costs associated with model training, using data distillation to expedite both training and inference is very important, and this review has the potential to advance that objective.
- The paper provides a concise and accurate summary of the five existing families of data distillation methods. It includes a comprehensive summary of the primary references and offers method descriptions that are not only mathematically rigorous but also sufficiently detailed.
- The future directions outlined in Section 5 are valuable references for subsequent research in the field.
- The paper is well organized and well written.

Weaknesses:
- The paper seems to focus only on labeled datasets. For tasks without annotations, such as unconditional image synthesis, are there any works related to data distillation?
- Large-scale models and datasets are currently highly popular. Could the author summarize whether there are any data distillation methods specifically designed for large datasets?
- On Page 3, below Definition 3, the paper highlights three key criteria for evaluating data distillation methods: performance, efficiency, and transferability. I am curious whether robustness should be considered a fourth criterion. For instance, when training with a distilled dataset, is the model more vulnerable to adversarial attacks? (A minimal sketch of such a robustness check is appended at the end of this document.)
- An important role of data distillation is to accelerate training. Could the author provide a summary of the existing methods and their impact on training speed?

Requested Changes: Please attempt to address the concerns within the "Weaknesses" section.

Broader Impact Concerns: I have no concerns about the broader impact.

Claims And Evidence: Yes

Audience: Yes

Reviewer 3:

Summary Of Contributions: The contribution of this survey paper is to present a formal framework for data distillation and provide a detailed taxonomy of existing approaches. It also covers data distillation approaches for various data modalities, identifies current challenges, and proposes future research directions.

Strengths And Weaknesses:

Strengths:
- The paper's classification of different methods based on definitions, algorithms, and data modalities is clear and systematic, making it a valuable survey.
- The paper's categorization of applications, challenges, and future directions is also clear and insightful.

Weaknesses:
- The paper could benefit from a taxonomy figure summarizing existing methods, which would make it easier for readers to locate and compare different studies.
- It would be helpful if the paper indicated whether there are related surveys of data distillation.
- The results presented are limited to the image modality, and it would be beneficial if the paper discussed results for other modalities.
- The paper would benefit from a more detailed algorithm explaining the overall flow of the most basic method.
- In the applications section, it would be helpful to define and quantify the results in each field. For example, for NAS, it would be useful to state the scale of the small and large NAS test-beds, and whether "scale" refers to the number of models in the search space, the model size, or the exploration time.

Requested Changes:
- A taxonomy figure summarizing existing methods would be useful for readers to locate and compare different studies.
- The paper could indicate whether there is related work surveying data distillation.
- The results presented are limited to the image modality, and it would be beneficial if the paper discussed results for other modalities.
- The paper could benefit from a more detailed algorithm explaining the overall flow of the most basic method.
- In the applications section, it would be helpful to define and quantify the results in each field. For example, for NAS, it would be useful to state the scale of the small and large NAS test-beds, and whether "scale" refers to the number of models in the search space, the model size, or the exploration time.

Broader Impact Concerns: This paper does not discuss any broader impact concerns.

Claims And Evidence: Yes

Audience: Yes
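Appended for illustration of the robustness criterion raised by Reviewer 2 (see the forward reference in that review): a minimal, hedged sketch of how one might check whether a model trained on a distilled summary is more vulnerable to adversarial attacks than one trained on the full data, using a single-step FGSM evaluation. PyTorch is assumed; `make_model`, `train`, `full_loader`, `distilled_loader`, and `test_loader` are hypothetical placeholders, not artifacts of the paper under review.

```python
import torch
import torch.nn.functional as F

def fgsm_accuracy(model, loader, epsilon=8 / 255, device="cpu"):
    """Accuracy under a single-step, untargeted FGSM attack.

    Assumes inputs are images scaled to [0, 1].
    """
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x.requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        (grad,) = torch.autograd.grad(loss, x)           # gradient of the loss w.r.t. the inputs
        x_adv = (x + epsilon * grad.sign()).clamp(0, 1)  # perturb along the gradient sign
        with torch.no_grad():
            correct += (model(x_adv).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

# Hypothetical comparison: a model trained on the full dataset vs. one trained
# on the distilled summary, evaluated on the same adversarially perturbed test set.
# model_full = train(make_model(), full_loader)
# model_syn  = train(make_model(), distilled_loader)
# print(fgsm_accuracy(model_full, test_loader), fgsm_accuracy(model_syn, test_loader))
```

For the criterion the reviewer proposes, the relative drop in accuracy under attack (distilled-trained vs. full-trained), rather than the absolute numbers, would be the quantity of interest.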