==================================================

Q: I was wondering if you could give me some information about the Social Circles app. How does it construct datasets from information submitted by Facebook users?

A: We just used Facebook's Graph API. You can start looking into it here: https://developers.facebook.com/tools/explorer However, I'm not sure that this type of app can be developed any more -- Facebook has substantially changed the available permissions for applications since I wrote this app (back in 2011!). In particular, I'm not sure whether "mutual friends" can be collected any more.

==================================================

Q: How should we interpret the data in the files named .featnames? We have interpreted it as all features that nodes connected to the ego node have. For example, there are 11 different hometowns and 20 different employers among those nodes. But this would also mean that there are only 21 last names and 4 first names, which seems improbable.

A: The files only include features that are shared by at least two people in the network, so features like first names tend to disappear: there are four first names that at least two of this person's friends share. In any case, I suggest you check out the Kaggle competition we ran (linked from my webpage), which is a substantially larger dataset of the same type and is somewhat better documented.

Q: Why do you supply the code with the number of circles? Isn't the code supposed to find the hidden circles? How do you determine the number of circles K to use as input?

A: You have to run the code for every value of K in order to determine which value of K is best. The code outputs the log-likelihood, so the information criterion should be easy to compute.

==================================================

Q: Thanks for publishing your dataset for the paper "Learning to discover social circles in ego networks", published in NIPS 2012.
I am thinking of using the Twitter dataset for further research (and will cite your paper, of course!). I'd just like to ask two quick questions:

1. Regarding the 973 ego networks: for each ego, is the ID number the Twitter ID that can be used to retrieve the user profile? For example, for the ego with ID "12831" (12831.circles, 12831.edges, 12831.egofeat, ...), can the user profile be retrieved with the Twitter API: http://api.twitter.com/1/users/show.xml?user_id=12831

2. How are the egos selected? Can I assume these social circles are for genuine social-networking purposes? In other words, these egos are not businesses, celebrities, sportsmen, or other professionals using Twitter to market themselves?

A: 1) Yes, indeed that should be the case. 2) The egos were selected by a breadth-first search of the social graph. I can dig up the seed user if you're really curious, not that you'd be able to crawl the same data anyway, since the network will have changed in the 2 years since I crawled it. I didn't do anything to ensure the veracity of the ego profiles, though I did ignore users that followed a huge number of people, which hopefully cuts out a few spambots.

==================================================

Q: I am writing because I have a quick clarification question regarding your paper "Discovering Social Circles in Ego Networks". In the evaluation section you say that you calculate the F1 score to compare the ground truth to the discovered communities. Can you elaborate on how this computation was actually done? Did you align the circles the same way you did when computing the BER? Thanks in advance for the clarification.

A: Exactly, we used the same alignment method for both evaluation measures. You should be careful using this evaluation measure, though: it only makes sense if all of the methods predict the same number of circles; generally, a method that makes many predictions will result in a smaller loss than one that makes few.
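[Editor's note: a minimal, stdlib-only sketch of one way to align predicted circles to ground-truth circles and average per-pair F1. The handling of unmatched circles here -- padding the shorter side with empty circles -- is an assumption for illustration, not necessarily the paper's exact treatment.]

```python
from itertools import permutations

def f1(pred, truth):
    """F1 overlap between two circles (sets of node ids)."""
    if not pred or not truth:
        return 0.0
    tp = len(pred & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)

def aligned_f1(predicted, ground_truth):
    """Best average F1 over one-to-one alignments of predicted circles
    to ground-truth circles (brute force over permutations, which is
    fine for the handful of circles in an ego network)."""
    k = max(len(predicted), len(ground_truth))
    # Pad the shorter list with empty circles so every circle in the
    # longer list is matched to something (illustrative assumption).
    preds = list(predicted) + [set()] * (k - len(predicted))
    truths = list(ground_truth) + [set()] * (k - len(ground_truth))
    best = 0.0
    for perm in permutations(range(k)):
        score = sum(f1(preds[i], truths[j]) for i, j in enumerate(perm)) / k
        best = max(best, score)
    return best

predicted = [{1, 2, 3}, {4, 5}]
ground_truth = [{1, 2}, {4, 5, 6}, {7, 8}]
print(round(aligned_f1(predicted, ground_truth), 3))  # → 0.533
```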
The issue we faced is that (in the case of Google+ and Twitter) we only have partially labeled ground truth (i.e., we only observe *some* circles), so we didn't want to penalize methods that make many predictions, in case those predictions are actually reasonable and just didn't appear in our ground-truth labels.

==================================================

Q: There was an undeniable gap between the reported errors and the ones we obtained from your proposed framework after a correct implementation of BIC. I couldn't find the problem (the BIC implementation was not provided in the code). I think the reported results are just for one circle (based on our runs), and the optimal matching recommended in your paper for aligning predicted circles with ground-truth circles works well for small numbers of predictions -- especially one circle, since that exactly matches the best one.

A: No, I never ran the algorithms with just one circle. The algorithms were run for 2, 4, 6, and 8 circles. For Facebook, with the *uncompressed* representation, yes, the BIC will tend to select very few circles, since there are so many parameters per circle. For the compressed representation this is not the case: larger numbers of circles are selected by the BIC, at least on larger graphs. This is also true for Google+ and Twitter, where the number of model parameters is very small and the BIC will select larger numbers of circles. The reported results in the paper are absolutely not from running the algorithm with just one circle -- this is not something I ever tried. You're right that for Facebook, on small networks, with the naive parameterization, few circles are generally going to be selected by the BIC.
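[Editor's note: the model-selection loop described in these answers -- fit once per candidate K and keep the K with the best information criterion -- can be sketched as below. Since the BIC implementation was not released, the BIC form k*ln(n) - 2*log_lik over the number of observations, the `fit` signature, and the `toy_fit` numbers are all illustrative assumptions.]

```python
import math

def select_K(fit, graph, n_obs, candidate_K=(2, 4, 6, 8)):
    """Pick the number of circles by BIC: run the model once per K and
    keep the K with the lowest BIC. `fit` is assumed (hypothetically)
    to return (log_likelihood, n_parameters) for a given K."""
    best_K, best_bic = None, float("inf")
    for K in candidate_K:
        log_lik, n_params = fit(graph, K)
        bic = n_params * math.log(n_obs) - 2 * log_lik
        if bic < best_bic:
            best_K, best_bic = K, bic
    return best_K, best_bic

# Toy stand-in for the real model fit (made-up numbers): more circles
# raise the likelihood a little but cost many extra parameters, which
# is the regime where BIC selects few circles.
def toy_fit(graph, K):
    return {2: (-120.0, 10), 4: (-100.0, 20), 6: (-95.0, 30), 8: (-93.0, 40)}[K]

best_K, _ = select_K(toy_fit, None, n_obs=200)
print(best_K)  # → 2
```

With few parameters per circle (as in the compressed representation), the parameter penalty shrinks and the likelihood gains dominate, so larger K wins, matching the behavior described above.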