Journal of Data and Information Science  2018, Vol. 3 Issue (1): 40-53    DOI: 10.2478/jdis-2018-0003
Research Paper     
Twitter Users’ Privacy Concerns: What do Their Accounts’ First Names Tell Us?
Fernandez Espinosa Daniela, Xiao Lu
School of Information Studies, Syracuse University, Syracuse, New York 13210


Purpose: In this paper, we describe how gender recognition on Twitter can be used as an intelligent business tool to determine the privacy concerns among users, and ultimately offer a more personalized service for customers who are more likely to respond positively to targeted advertisements.

Design/methodology/approach: We worked with two different data sets to examine whether Twitter users’ gender, inferred from the first name of the account and the profile description, correlates with the privacy setting of the account. We also used a set of features including the inferred gender of Twitter users to develop classifiers that predict user privacy settings.
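The name-based part of this approach can be sketched as a dictionary lookup. This is a minimal illustration, not the authors' implementation: the `NAME_GENDER` dictionary, the `infer_gender` function, and the tokenization rule are assumptions made for the example, and the use of the profile description is omitted because the abstract does not say how it was combined with the name. Accounts whose first token is not in the dictionary are labeled "undefined".

```python
import re

# Hypothetical toy dictionary for illustration only; the name dictionary
# used in the study is not reproduced here.
NAME_GENDER = {
    "daniela": "female",
    "maria": "female",
    "john": "male",
    "carlos": "male",
}


def infer_gender(display_name, name_gender=NAME_GENDER):
    """Label an account by the first alphabetic token of its display name."""
    # Keep runs of letters only, so digits, underscores and punctuation
    # act as separators.
    tokens = re.findall(r"[^\W\d_]+", display_name.lower())
    if not tokens:
        return "undefined"
    # Names missing from the dictionary fall into the "undefined" group.
    return name_gender.get(tokens[0], "undefined")


for name in ["Daniela F.", "John Smith", "xX_c00l_Xx", ""]:
    print(repr(name), "->", infer_gender(name))
```

A real dictionary would need a policy for names used by more than one gender; the sketch sidesteps this by mapping each name to a single label.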

Findings: We found that the inferred gender of Twitter users correlates with the account’s privacy setting. Specifically, females tend to be more privacy concerned than males. Users whose gender cannot be inferred from their provided first names tend to be more privacy concerned. In addition, our classification performance suggests that inferred gender can be used as an indicator of the user’s privacy preference.

Research limitations: Not all Twitter accounts belong to real users; social bots tweet as well. A major limitation of our study is that social bots were not accounted for in the data. This implies that at least some of the undefined accounts, that is, accounts whose names do not appear in the name dictionary, are social bots. It would be interesting to explore the privacy settings of social bots on Twitter.

Practical implications: Companies are investing large amounts of money in business intelligence tools that reveal the preferences of their consumers. Because of the large number of consumers around the world, it is very difficult for companies to communicate directly with each customer to anticipate market changes. For this reason, the social network Twitter has gained relevance as an ideal tool for information extraction. On the other hand, users’ privacy preferences need to be considered when companies leverage their publicly available data. This paper suggests that gender recognition of Twitter users, based on their provided first names and profile descriptions, can be used to infer the users’ privacy preferences.

Originality/value: This study explored a new way of inferring Twitter users’ gender, that is, recognizing a user’s gender from the provided first name and the profile description. It also explored the potential of this information for predicting the user’s privacy preference.

Keywords: Social media; Twitter; Gender recognition; Privacy preferences
Published: 19 March 2018
Corresponding Author: Xiao Lu
Cite this article:

Fernandez Espinosa Daniela, Xiao Lu. Twitter Users’ Privacy Concerns: What do Their Accounts’ First Names Tell Us? Journal of Data and Information Science, 2018, 3(1): 40-53.


Algorithm               Precision  Recall  F-score
Decision Tree           0.78       0.77    0.78
Support Vector Machine  0.78       0.76    0.77
Neural Networks         0.83       0.82    0.83
Naïve Bayes             0.81       0.82    0.81
Table 1 Evaluation of classification results.
                    Female            Male              Undefined         Total
Protected Accounts  227,238 (55.34%)  288,235 (44.87%)  268,915 (45.38%)  784,388 (47.66%)
Public Accounts     183,358 (44.65%)  354,166 (55.13%)  323,594 (54.61%)  861,118 (52.33%)
Total               410,596 (100%)    642,401 (100%)    592,509 (100%)    1,645,506 (100%)
Table 2 User accounts’ privacy setting (protected vs public) and their gender inferred from their names and profile descriptions.
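The column percentages in Table 2 and the strength of the gender/privacy association can be checked directly from the counts. The sketch below recomputes the per-gender shares and then a Pearson chi-square statistic with Cramér's V as an effect size. The choice of test is an assumption made for illustration; this page does not state which statistical test the authors applied, and the function names are hypothetical.

```python
import math

# Counts copied from Table 2 (rows: privacy setting; columns: inferred gender).
observed = {
    "protected": {"female": 227_238, "male": 288_235, "undefined": 268_915},
    "public": {"female": 183_358, "male": 354_166, "undefined": 323_594},
}


def column_shares(table):
    """For each gender column, the share of accounts in each privacy row."""
    genders = list(next(iter(table.values())).keys())
    shares = {}
    for g in genders:
        col_total = sum(row[g] for row in table.values())
        shares[g] = {name: row[g] / col_total for name, row in table.items()}
    return shares


def chi_square(table):
    """Pearson chi-square statistic for row/column independence, plus N."""
    genders = list(next(iter(table.values())).keys())
    row_totals = {name: sum(row.values()) for name, row in table.items()}
    col_totals = {g: sum(row[g] for row in table.values()) for g in genders}
    n = sum(row_totals.values())
    stat = 0.0
    for name, row in table.items():
        for g in genders:
            expected = row_totals[name] * col_totals[g] / n
            stat += (row[g] - expected) ** 2 / expected
    return stat, n


def cramers_v(stat, n, n_rows, n_cols):
    """Effect size for a contingency table, between 0 and 1."""
    return math.sqrt(stat / (n * (min(n_rows, n_cols) - 1)))


shares = column_shares(observed)
stat, n = chi_square(observed)
v = cramers_v(stat, n, n_rows=2, n_cols=3)

for gender, split in shares.items():
    print(f"{gender}: {split['protected']:.2%} protected, {split['public']:.2%} public")
print(f"chi-square = {stat:.1f}, N = {n}, Cramér's V = {v:.3f}")
```

With a sample this large almost any difference is statistically significant, so the effect size is the more informative quantity when judging how strongly inferred gender relates to the privacy setting.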
           HasARealName (true)  NotHasARealName   Total
Protected  106,053 (55.37%)     217,541 (54.24%)  323,594 (54.61%)
Public     85,451 (44.62%)      183,464 (45.75%)  268,915 (45.38%)
Total      191,504 (100%)       401,005 (100%)    592,509 (100%)
Table 3 Users whose genders cannot be inferred from the names and the privacy setting of their accounts.
             Results obtained by Khazaei et al. (2016a) classifiers  Results obtained by adding gender
Algorithm    Precision  Recall  F-Score                              Precision  Recall  F-Score
Naïve Bayes  0.66       0.67    0.66                                 0.67       0.68    0.66
Regression   0.71       0.70    0.71                                 0.73       0.72    0.73
Logistic     0.69       0.70    0.69                                 0.69       0.71    0.69
J48          0.68       0.66    0.67                                 0.67       0.68    0.67
KNN          0.67       0.59    0.63                                 0.67       0.59    0.63
Table 4 Comparison of our classification results with Khazaei et al. (2016a).
[1]   Adam A. (2000). Gender and computer ethics. ACM SIGCAS Computers and Society, 30(4), 17-24.
[2]   Ale F.L. (2015). What is market segmentation? (Merca 2.0) Retrieved from .
[3]   Argamon S., Koppel M., Fine J., & Shimoni A.R. (2006). Gender, genre, and writing style in formal written texts. Text-Interdisciplinary Journal for the Study of Discourse, 23(3), 321-346.
doi: 10.1515/text.2003.014
[4]   Burger J.D., Henderson J., Kim G., & Zarrella G. (2011). Discriminating gender on Twitter. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 1301-1309). Stroudsburg, PA: Association for Computational Linguistics.
[5]   Campbell M. (2016). About our site. (Behind the Name: the Etymology and History of First Names). Retrieved from .
[6]   Chu Z., Gianvecchio S., Wang H., & Jajodia S. (2010). Who is tweeting on Twitter: Human, bot, or cyborg? In Proceedings of the 26th Annual Computer Security Applications Conference (pp. 21-30). New York: ACM.
[7]   Cohen J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
[8]   Entrepreneur Magazine. (2012). How to know your market. Retrieved from
[9]   Fernandez D., Moctezuma D., & Sordia O. (2016). Features combination for gender recognition on Twitter Users. IEEE International Autumn Meeting on Power, Electronics and Computing, 17(17), 400-408.
[10]   Gaucher D., Friesen J., & Kay A.C. (2011). Evidence that gendered wording in job advertisements exists and sustains gender inequality. Journal of Personality and Social Psychology, 101(1), 109-128.
doi: 10.1037/a0022530 pmid: 21381851
[11]   Gonzalez F. (2015). New Twitter tool promising better targeting. (Merca 2.0). Retrieved from .
[12]   Greenfield C. (2012). Don’t be creepy: Using robust user data for ad targeting while respecting privacy. (Target Marketing). Retrieved from .
[13]   Gross R., & Acquisti A. (2005). Information revelation and privacy in online social networks. In Proceedings of the 2005 ACM Workshop on Privacy in the Electronic Society (pp. 71-80). Alexandria: ACM.
doi: 10.1145/1102199.1102214
[14]   Herdağdelen A. (2013). Twitter n-gram corpus with demographic metadata. Language Resources and Evaluation, 47(4), 1127-1147.
doi: 10.1007/s10579-013-9227-2
[15]   Howland S. (2014). Anticipate trends ensure business success. ( Retrieved from .
[16]   Irani D., Webb S., Li K., & Pu C. (2009). Large Online Social Footprints-An Emerging Threat. In CSE '09 Proceedings of the 2009 International Conference on Computational Science and Engineering (pp. 271-276). Washington, D.C.: IEEE Computer Society.
[17]   Katell M.A., Mishra S.R., & Scaff L. (2016). A fair exchange: Exploring how online privacy is valued. In the 49th Hawaii International Conference on System Sciences (HICSS).
[18]   Khazaei T., Xiao L., Mercer R.E., & Khan A. (2016a). Privacy Preference Inference via Collaborative Filtering. In Proceedings of the Tenth International AAAI Conference on Web and Social Media (ICWSM), 611-614.
[19]   Khazaei T., Xiao L., Mercer R.E., & Khan A. (2016b). Privacy behaviour and profile configuration in Twitter. In Proceedings of the 25th International Conference Companion on World Wide Web (pp. 575-580). Montreal.
doi: 10.1145/2872518.2890088
[20]   Krempeaux C.I. (2013). Predicting gender on Twitter. (Charles Iliya Krempeaux Personal Site) Retrieved from .
[21]   Kwasny M., Caine K., Rogers W.A., & Fisk A.D. (2008). Privacy and technology: Folk definitions and perspectives. In CHI '08 Extended Abstracts on Human Factors in Computing Systems (pp. 3291-3296). New York: ACM.
doi: 10.1145/1358628.1358846 pmid: 29057397
[22]   Liu W., & Ruths D. (2013). What’s in a name? Using first names as features for gender inference in Twitter. In Analyzing Microtext: Papers from the 2013 AAAI Spring Symposium (pp. 10-16). Palo Alto, CA: AAAI Press.
[23]   Lopez N. (2014). Twitter advertisers can now target ads based on the apps a user has installed. (The next web). Retrieved from .
[24]   Morey T., Forbath T., & Schoop A. (2015). Customer data: Designing for transparency and trust. (Harvard Business Review).
[25]   Nazir S., Tayyab A., Sajid A., Rashid H.u., & Javed I. (2012). How online shopping is affecting consumers buying behavior in Pakistan? IJCSI International Journal of Computer Science Issues, 9(3), 486-495.
[26]   Parker R.B. (1974). A definition of privacy. Rutgers Law Review, 27(1), 275-296.
[27]   Riquelme I.P., & Román S. (2014). Is the influence of privacy and security on online trust the same for all type of consumers? Electronic Markets, 24(2), 135-149.
doi: 10.1007/s12525-013-0145-3
[28]   Sieger H., & Moller S. (2012). Gender differences in the perception of security of mobile phones. In Proceedings of the 14th International Conference on Human-computer Interaction with Mobile Devices and Services Companion (pp. 107-112). New York: ACM.
[29]   Statista. (2016). Global social networks ranked by number of users. Retrieved from
[30]   Twitter. (2017). Users. Retrieved from
[31]   van Aswegen A. (2015). Women vs. men: Gender differences in purchase decision making (Guided Selling). Retrieved from .