I recently graduated from UCSD and am currently affiliated with the Department of Biostatistics at the University of Michigan, Ann Arbor, as a Research Assistant Professor. This page is no longer maintained; more recent information about me can be found HERE

Welcome to Hyun Min Kang's Home!

Hello, I am a graduate student in the Department of Computer Science and Engineering at the University of California, San Diego. Currently, I am working at UCLA with my research advisor, Eleazar Eskin, who has been jointly affiliated with the Computer Science and Human Genetics departments at UCLA since 2007. Previously, I was a research fellow at the Genome Research Center for Diabetes and Endocrine Disease, Seoul National University Hospital. Prior to that, I completed my Bachelor's and Master's degrees at the School of Electrical Engineering, Seoul National University, specializing in databases under the guidance of Sang K. Cha.

Research Interests

Currently, my primary research topic is "Effective design and analysis of systems genetics studies", which is also the title of my dissertation. One of the major challenges in systems genetics research is that many unmodeled (and typically undocumented) factors in these data sets confound the discovery of true biological signals, resulting in excessive false positives that do not replicate between independent studies. It is statistically important both to detect and to correct for such unmodeled confounding factors in order to infer causal relationships among the elements of a biological system more accurately. I have observed that the vast majority of currently available high-throughput biological data sets for systems genetics research are susceptible to such confounding effects. For example, population structure and expression heterogeneity are widely known confounding factors in association mapping and expression data analysis, respectively, yet no comprehensive solution for them has been proposed. I have contributed to this area by proposing novel methods that robustly resolve such confounding factors through "confounding-aware" statistical analysis in various contexts of systems genetics studies. I still enjoy working on them and hope to make a breakthrough in systems genetics.

I am also interested in the many computational and statistical challenges posed by sequence-based high-throughput biological data. In particular, I am interested in a comprehensive understanding of the structure of genetic variation in humans and many model organisms. I have analyzed the haplotype structure among inbred mouse strains from both the NIEHS/Perlegen resequencing project and the Mouse HapMap project. Through these projects, I was able to uncover many interesting patterns of genomic variation among laboratory mouse strains. I have also been working on a method for imputing unobserved genotypes and haplotypes by leveraging the structure of genetic variation. I believe that a comprehensive understanding of genetic variation will facilitate dissecting the genetic basis of complex traits, which is the ultimate goal of my research.


Resources and Software

  • EMMA (Efficient Mixed Model Association)
    Correcting for complex population structure and genetic relatedness in association mapping.
  • ICE (Inter-sample Correlation Emended) association mapping
    Resolving expression heterogeneity in the expression analysis of high-throughput data sets.
  • EMINIM (Expectation-Maximized INtegrative IMputation)
    Adaptive and memory-efficient imputation of unobserved genotypes
  • Mouse HapMap Resource
    A high-density haplotype resource of 94 inbred mouse strains
  • NIEHS/Perlegen mouse resequencing resource
    Genotype and haplotype resource of 8.27 million sequence-based SNPs
  • Mouse Phenome Association Database (MPAD)
    Association mapping results between Mouse HapMap SNPs and the Mouse Phenome Database (MPD)

