"One day you will wake up and there won't be any more time to do the things you've always wanted. Do it now."
- Paulo Coelho
In June 2015, I graduated with an M.S. in Computer Science. As a student, I got the opportunity to explore applications of machine learning to problems in computer systems. I was part of the AI and SysNet labs, co-advised by Prof. Lawrence Saul and Prof. Geoffrey Voelker.
Over the summer, I was part of the relevance algorithms team, working on the algorithm used by Groupon to rank deals on a per-user basis. The goal of the project was to find anomalies in the different stages of the algorithm by computing metrics such as category distributions and histogram statistics on the logged output for a 1% random sample of users. I also built a Random Forest model for predicting user clicks on deals, with a feature set consisting of deal features as well as user-related attributes.
Previously, I worked at Qualcomm, India, from August 2012 to July 2013 as a Software Developer on the Wireless Connectivity team, implementing features and optimizing the wireless driver for Android Jelly Bean.
I completed my Bachelor's degree at Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), India, and was awarded the President's Gold Medal.
Apart from all this, I have varying interests and hobbies (sketching, reading fiction, hiking, running, listening to electronica, watching Friends for the nth time, watching any TV series, puzzles, travelling, etc.) depending on the current urge, mood and time.
MaLCoN: Machine Learning analysis on Copyright Notices
This is my Master's thesis. I take a new look at the Digital Millennium Copyright Act (DMCA) notices filed under Section 512. This act allows copyright holders to take down online content by filing notices. I mine the rich data available in those notices by applying different machine learning techniques to gain more insight into hidden patterns.
A performance investigation of the Next Generation Sequencing (NGS) software tool chain used by the Bioinformatics Dept. at UCSD for detecting variants in the genomes of individuals and related family members. The current tool chain takes 12 - 14 days to complete for a single genome. Based on small datasets, the goal was to build a fairly accurate model for predicting the behavior of the tool chain on the actual dataset. The model achieves ~98% accuracy on incremental datasets. The term paper ranked first in a class of 80 students. Link
Recursive Auto-Encoder for Sentiment Classification on Movie Review Polarity Dataset
We implemented a semi-supervised Recursive Auto-Encoder (RAE) for sentiment prediction on the Movie Reviews dataset, which consists of reviews originally taken from the website Rotten Tomatoes. We verified our implementation by reproducing the results reported in the original RAE paper on this dataset. We achieve 75.4% accuracy using this model. Link
Latent Dirichlet Allocation
Latent Dirichlet Allocation (LDA) is a probabilistic, generative model designed to discover latent topics in text corpora. In this paper, we trained an LDA model on two datasets, Classic400 and BBC News. We used collapsed Gibbs sampling to train the model and discussed issues related to Gibbs sampling, goodness-of-fit criteria, parameter tuning, and convergence. We analyzed the experimental results and also tested the effectiveness of LDA in modeling and discovering latent topics in the corpus using the VI distance measure. Link
Punctuation prediction using Conditional Random Field (CRF)
Conditional Random Fields (CRFs) provide a flexible and powerful model for assigning labels to input sequences in applications such as part-of-speech tagging, text-to-speech mapping, and hyphenation. In this article, we presented a punctuation prediction system using a linear-chain CRF. We experimented with two methods, Collins perceptron and contrastive divergence, for learning a linear-chain CRF model on the Enron email dataset. The model obtained a word-level accuracy of 93.05% with Collins perceptron and 93.43% with contrastive divergence. Link
Study of Gradient based optimizations for logistic regression
In this paper, we trained a logistic regression model on a gender recognition dataset. We trained the model with two optimization algorithms, stochastic gradient descent (SGD) with L2 regularization and L-BFGS, and compared their error rates. Both reach the same error rate: 0.092 with SGD and 0.092 with L-BFGS. Link
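As a rough illustration of the SGD half of this comparison, here is a minimal sketch of logistic regression trained with per-example SGD and an L2 penalty. It is not the code from the paper: the toy two-blob dataset, the function names (`sgd_logistic`, `error_rate`, `make_toy_data`), and the learning rate and regularization strength are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    # Probability of class 1 under the current weights.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sgd_logistic(X, y, lr=0.1, l2=0.01, epochs=50, seed=0):
    """Train logistic regression with per-example SGD and an L2 penalty on w.

    lr, l2 and epochs are illustrative defaults, not values from the paper.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            # Gradient of the log loss with respect to the logit.
            err = predict_proba(w, b, X[i]) - y[i]
            # Weight update: loss gradient plus L2 shrinkage term.
            w = [wj - lr * (err * xj + l2 * wj) for wj, xj in zip(w, X[i])]
            b -= lr * err  # the bias is left unregularized
    return w, b


def error_rate(w, b, X, y):
    # Fraction of examples whose thresholded prediction disagrees with the label.
    wrong = sum((predict_proba(w, b, x) >= 0.5) != (t == 1) for x, t in zip(X, y))
    return wrong / len(X)


def make_toy_data(n=200, seed=1):
    # Hypothetical data: two Gaussian blobs centered at (-2, -2) and (2, 2).
    rng = random.Random(seed)
    X, y = [], []
    for k in range(n):
        label = k % 2
        c = 2.0 if label else -2.0
        X.append([rng.gauss(c, 1.0), rng.gauss(c, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    w, b = sgd_logistic(X, y)
    print("training error rate:", error_rate(w, b, X, y))
```

L-BFGS would minimize the same regularized log loss, but with full-batch gradients and a quasi-Newton update instead of per-example steps.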
Estimation of Smaug: Benchmarking SnapDragon and Firefox OS
This report is a study of the performance of Firefox OS on the Geeksphone Keon mobile device. The study was done on a Firefox OS Developer Preview Phone, which has an ARM-based Qualcomm chipset. In this report, we define the base hardware characteristics, predict the performance of certain operations, and present measurements of the actual performance in terms of the overhead added by the operating system. (Link available upon request)
On the use of LSH for privacy preserving personalization
Armen Aghasaryan, Makram Bouzid, Dimitre Kostadinov, Mohit Kothari, Animesh Nandi.
IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom 2013)
Beacon Frame Data Transmission Rate Adjustment.
Jagatiya Vikas, Kothari Mohit, Jammula Rahul. 2013.
U.S. Patent 9088982, granted July 21, 2015.
Systems and methods for privacy protected clustering of user interest profiles.
Aghasaryan Armen, Bouzid Makram, Kothari Mohit, Nandi Animesh. 2012.
European Patent Application 12290234.9 – 1244 filed July, 2012.