Contract type : Public service fixed-term contract
Level of qualifications required : Graduate degree or equivalent
Other valued qualifications : Master in Computer Science
Fonction : PhD Position
Motivations and context
According to the 2017 International Migration Report, the number of international migrants worldwide has grown rapidly in recent years, reaching 258 million in 2017, among whom 78 million in Europe. A key reason for the difficulty of EU leaders to take a decisive and coherent approach to the refugee crisis has been the high level of public anxiety about immigration and asylum across Europe. There are at least three social factors underlying this attitude (Berri et al, 2015): the increase in the number and visibility of migrants; the economic crisis that has fed feelings of insecurity; the role of mass media. The last factor has a major influence on the political attitudes of the general public and the elite. Refugees and migrants tend to be framed negatively as a problem. This translates into a significant increase of hate speech towards migrants and minorities. The Internet seems to be a fertile ground for hate speech (Knobel, 2012).
The goal of this PhD Thesis is to develop a methodology to automatically detect hate speech in social network data (Twitter, YouTube, Facebook).
Our methodology in the hate speech classification will be related on the recent approaches for text classification with Neural Networks and word embeddings. In this context, fully connected feed forward networks (Iyyer et al., 2015; Nam et al., 2014), Convolutional Neural Networks (CNN) (Kim, 2014; Johnson and Zhang, 2015) and also Recurrent/Recursive Neural Networks (RNN) (Dong et al., 2014) have been applied. On the one hand, the approaches based on CNN and RNN capture rich compositional information, and have outperformed the state-of-the-art results in text classification; on the other hand they are computationally intensive and require careful hyperparameter selection and/or regularization (Dai and Le, 2015).
Berri M, Garcia-Blanco I, Moore K (2015), Press coverage of the Refugee and Migrant Crisis in the EU: A Content Analysis of five European Countries, Report prepared for the United Nations High Commission for Refugees, Cardiff School of Journalism, Media and Cultural Studies.
Dai, A. M. and Le, Q. V. (2015). “Semi-supervised sequence Learning”. In Cortes, C., Lawrence, N. D., Lee, D. D., Sugiyama, M., and Garnett, R., editors, Advances in Neural Information Processing Systems 28, pages 3061-3069. Curran Associates, Inc
Dong, L., Wei, F., Tan, C., Tang, D., Zhou, M., and Xu, K. (2014). “Adaptive recursive neural network for target-dependent twitter sentiment classification”. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL, Baltimore, MD, USA, Volume 2: pages 49-54.
Iyyer, M., Manjunatha, V., Boyd-Graber, J., and Daumé, H. (2015). “Deep unordered composition rivals syntactic methods for text classification”. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics, volume 1, pages 1681-1691.
Johnson, R. and Zhang, T. (2015). “Effective use of word order for text categorization with convolutional neural networks”. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 103-112.
Knobel M. (2012). L’Internet de la haine. Racistes, antisémites, néonazis, intégristes, islamistes, terroristes et homophobes à l’assaut du web. Paris: Berg International
Schmidt A., Wiegand M.(2017). A Survey on Hate Speech Detection using Natural Language Processing, Workshop on Natural Language Processing for Social Media
Zhang, Z., Luo, L (2018). Hate speech detection: a solved problem? The Challenging Case of Long Tail on Twitter. arxiv.org/pdf/1803.03662
The goal of this PhD Thesis is to develop a new methodology to automatically detect hate speech, based on machine learning and Neural Networks. Human detection of this material is infeasible since the contents to be analyzed are huge. In recent years, research has been conducted to develop automatic methods for hate speech detection in the social media domain. These typically employ semantic content analysis techniques built on Natural Language Processing (NLP) and Machine Learning (ML) methods (Schmidt et al. 2017). Although current methods have reported promising results, their evaluations are largely biased towards detecting content that is non-hate, as opposed to detecting and classifying real hateful content (Zhang et al., 2018). Current machine learning methods use only certain task-specific features to model hate speech. We propose to develop an innovative approach to combine these pieces of information into a multi-feature approach so that the weaknesses of the individual features are compensated by the strengths of other features (explicit hate speech, implicit hate speech, contextual conditions affecting the prevalence of hate speech, etc.).
The student will work in the framework of French-German project (ANR project).
Required skills: background in statistics, natural language processing and computer program skills (Perl, Python), neural networks tools.
Candidates should email a detailed CV with diploma
- Subsidised catering service
- Partially-reimbursed public transport
- Social security
- Paid leave
- Flexible working hours
- Sports facilities
Monthly gross salary : 1st and 2nd year, 1982 euros
Monthly gross salary : 3rd year, 2085 euros
- Town/city : Villers-lès-Nancy
- Inria Center : CRI Nancy - Grand Est
- Starting date : 2019-06-01
- Duration of contract : 3 years
- Deadline to apply : 2019-03-31
- Inria Team : MULTISPEECH
PhD Supervisor :
Illina Irina / firstname.lastname@example.org
Inria, the French national research institute for the digital sciences, promotes scientific excellence and technology transfer to maximise its impact. It employs 2,400 people. Its 200 agile project teams, generally with academic partners, involve more than 3,000 scientists in meeting the challenges of computer science and mathematics, often at the interface of other disciplines. Inria works with many companies and has assisted in the creation of over 160 startups. It strives to meet the challenges of the digital transformation of science, society and the economy.
Instruction to apply
Defence Security :
This position is likely to be situated in a restricted area (ZRR), as defined in Decree No. 2011-1425 relating to the protection of national scientific and technical potential (PPST).Authorisation to enter an area is granted by the director of the unit, following a favourable Ministerial decision, as defined in the decree of 3 July 2012 relating to the PPST. An unfavourable Ministerial decision in respect of a position situated in a ZRR would result in the cancellation of the appointment.
Recruitment Policy :
As part of its diversity policy, all Inria positions are accessible to people with disabilities.
Warning : you must enter your e-mail address in order to save your application to Inria. Applications must be submitted online on the Inria website. Processing of applications sent from other channels is not guaranteed.