A growing proportion of human activities, such as social interactions, entertainment, shopping, and gathering information, are now mediated by digital services and devices. We distinguish between data that are actually recorded and information that can be statistically predicted from such records. People may choose not to reveal certain pieces of information about their lives, such as their sexual orientation or age, and yet this information might be predicted in a statistical sense from other aspects of their lives that they do reveal. This study demonstrates the degree to which relatively basic digital records of human behavior can be used to automatically and accurately estimate a wide range of personal attributes that people would typically assume to be private. The design of the study is presented in Fig. As a consequence, their values can only be measured approximately, for example, by evaluating responses to questionnaires. The transparent bars presented in Fig.

Thus, although the SWL score includes variability attributable to mood, users’ Likes accrue over a longer period and so may be suitable only for predicting long-term happiness.

Amount of Data Available and Prediction Accuracy. The results presented so far rely on individuals for whom between one and 700 Likes were available. Therefore, what is the expected accuracy for a randomly chosen individual, and how does prediction accuracy change with the number of observed Likes?

Accuracy of selected predictions as a function of the number of available Likes.

Individual traits and attributes can be predicted to a high degree of accuracy based on records of users’ Likes.
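One way to explore how accuracy depends on the number of observed Likes is to show a predictor only a random subset of each test user's Likes and score it at several subset sizes. The sketch below illustrates that idea under stated assumptions: the names `accuracy_by_like_count`, `predict`, `LIKE_HINT`, and `TEST_USERS` are hypothetical, the data are toy values rather than study data, and the majority-vote predictor is a stand-in for whatever trained model is being evaluated. It is not presented as the procedure used in the study.

```python
import random
from collections import Counter

# Hypothetical toy data: each Like "votes" for one class of a binary attribute.
LIKE_HINT = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "b3": "B"}

# Each test user is a (Likes, true class) pair; values are made up.
TEST_USERS = [
    (["a1", "a2", "b1"], "A"),
    (["b1", "b2", "a3"], "B"),
]


def predict(likes):
    """Toy stand-in for a trained model: majority vote over observed Likes."""
    votes = Counter(LIKE_HINT[like] for like in likes)
    return votes.most_common(1)[0][0]


def accuracy_by_like_count(test_users, predict_fn, counts, trials=20, seed=0):
    """Estimate accuracy when only n randomly chosen Likes per user are seen.

    Returns a dict mapping each n in `counts` to the fraction of correct
    predictions, or None if no user has at least n Likes.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    curve = {}
    for n in counts:
        hits = 0
        total = 0
        for likes, label in test_users:
            pool = sorted(likes)
            if len(pool) < n:
                continue  # this user has too few Likes for this n
            for _ in range(trials):
                sample = rng.sample(pool, n)  # n Likes without replacement
                hits += int(predict_fn(sample) == label)
                total += 1
        curve[n] = hits / total if total else None
    return curve


# Odd subset sizes avoid ties in the toy majority vote.
curve = accuracy_by_like_count(TEST_USERS, predict, counts=[1, 3])
```

In this toy setup every user has two Likes pointing to the correct class and one pointing to the wrong one, so seeing all three Likes always yields a correct vote, whereas a single randomly drawn Like can mislead the predictor.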

Table S1 presents a sample of highly predictive Likes related to each of the attributes. Moreover, note that few users were associated with Likes explicitly revealing their attributes. We Didn’t Choose To Be Gay We Were Chosen. This is further illustrated in Fig. S1, which shows the average levels of personality traits and age for several popular Likes. Each Like attracts users with a different average personality and demographic profile and, thus, can be used to predict those attributes. Predicting users’ individual attributes and preferences can be used to improve numerous products and services.
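The observation that each Like attracts users with a different average profile suggests a very simple baseline: compute the average attribute value of the users associated with each Like, then estimate a new user's attribute by averaging over the Likes that user has. The sketch below shows this with made-up data. The function names, the Like identifiers, and the ages are hypothetical, and this averaging baseline is an illustration of the idea only, not the model used in the study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: users, the Likes they hold, and their ages.
USER_LIKES = {
    "u1": {"like_a", "like_b"},
    "u2": {"like_a"},
    "u3": {"like_c"},
    "u4": {"like_b", "like_c"},
}
USER_AGE = {"u1": 20, "u2": 30, "u3": 50, "u4": 40}


def like_profiles(user_likes, user_trait):
    """Average trait value of the users associated with each Like."""
    values = defaultdict(list)
    for user, likes in user_likes.items():
        if user not in user_trait:
            continue  # skip users whose trait value is unknown
        for like in likes:
            values[like].append(user_trait[user])
    return {like: mean(vals) for like, vals in values.items()}


def predict_trait(likes, profiles, default):
    """Estimate a trait as the mean profile of the user's known Likes.

    Falls back to `default` when none of the Likes has a profile.
    """
    known = [profiles[like] for like in likes if like in profiles]
    return mean(known) if known else default


profiles = like_profiles(USER_LIKES, USER_AGE)
population_mean = mean(USER_AGE.values())

# A new user who holds "like_a" and "like_c".
estimate = predict_trait(["like_a", "like_c"], profiles, population_mean)
```

With these toy values, `like_a` has an average age of 25 and `like_c` an average of 45, so the estimate for the new user is 35. A user with no recognized Likes falls back to the population mean.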

Also, the relevance of marketing and product recommendations could be improved by adding psychological dimensions to current user models. On the other hand, the predictability of individual attributes from digital records of behavior may have considerable negative implications, because it can easily be applied to large numbers of people without obtaining their individual consent and without them noticing. There is a risk that the growing awareness of digital exposure may negatively affect people’s experience of digital technologies, decrease their trust in online services, or even completely deter them from using digital technology.

This article is a PNAS Direct Submission. This article contains supporting information online at www. Freely available online through the PNAS open access option.