PhD Position F/M Users' trust and legitimacy in contextual collaborative writing in Wikipedia

Contract type: Fixed-term contract

Level of qualifications required: Graduate degree or equivalent

Function: PhD Position

Context

This PhD thesis will be supervised by Claudia-Lavinia Ignat, a researcher at the Inria center of Lorraine University, and co-supervised by Léo Joubert, an assistant professor at Université de Rouen Normandie.

Assignment

Large-scale collaborative systems, in which a large number of users collaborate on a shared task, are attracting much attention from industry and academia. CSCW studies [1,2] have shown that awareness of other team members' behavior is an important way to compensate for the lack of direct communication. When each member is aware of what the others are doing, trust can be built within the team [3]. Trust is defined as an individual’s willingness to become vulnerable to the actions of others, with the expectation that others will follow through on their commitments [4]. Trust is even more crucial in open collaborative systems such as Wikipedia, in which members usually do not know each other personally. However, it is difficult for end users to manually assess the level of trust in each partner, that is, the credibility that a user can attribute to another user based on their past interactions. This thesis aims to study the problem of trust evaluation and to design a computational trust model dedicated to collaborative systems.

We are particularly interested in the case of Wikipedia, a collaborative online encyclopedia, because it provides us with a huge database produced by a large number of contributors.

On the platform, users can submit revisions of articles to improve their content. The objective of Wikipedia is to ensure the quality and neutrality of the platform's documents.

We have already studied how one user's collaborative interactions affect the trust assessed by other users, both in the trust game [5] and in contract-based multi-synchronous collaboration [6]. In the trust game [7], the interaction consists of a money transaction between two users, while in contract-based multi-synchronous collaboration the computation of trust is based on adherence to, or violation of, contracts shared between two users. In the context of the trust game, we also showed (i) that presenting a trust score to users encourages collaboration between them in a meaningful way, at a level similar to displaying participants' nicknames; and (ii) that users conform to the trust score in their decisions regarding monetary exchange [8]. These results suggest that a trust model can be deployed in collaborative systems to assist users. However, in Wikipedia, users do not interact directly but through the articles to which they contribute, so it is difficult to determine how one user’s edits might influence another user’s edits.

The scientific literature usually assesses the quality of a contribution by its lifetime on a page: the longer the content of the contribution remains present, the higher its quality is assumed to be. The problem with this measure is that it ignores both the mutual trust that contributors may have in each other and the fact that the Wikipedia rules invoked to justify deleting contributions may be applied differently from one page to another.

To address this issue, we want to compute a Wikipedia user's trust level from their past contributions, such that this trust level can predict the quality of the user's future contributions. The trust metric proposed in [5, 6], which predicts user behavior from past interactions and takes fluctuations in user behavior into account, could be applied by treating users' contributions to revisions of Wikipedia articles as the interactions between users. The main challenge is to define the quality of a user's contributions. For this, we plan to study existing metrics based on the length of contributions (for example, the number of characters added) and on the longevity of contributions (edit longevity, for example how long a contribution persists in the article).
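As a purely illustrative sketch of the idea of deriving a trust level from past contributions, the following Python snippet updates a score with an exponentially weighted moving average of per-contribution quality values. This is not the metric of [5, 6]; the function names, the quality scale in [0, 1], the initial value and the smoothing factor `alpha` are all assumptions chosen for the example.

```python
from typing import Iterable


def update_trust(trust: float, quality: float, alpha: float = 0.3) -> float:
    """Blend the previous trust level with the quality of the latest contribution.

    Hypothetical update rule: an exponentially weighted moving average, so
    recent contributions weigh more than old ones. `alpha` is an assumed
    smoothing factor in (0, 1].
    """
    if not 0.0 <= quality <= 1.0:
        raise ValueError("quality is assumed to lie in [0, 1]")
    return (1.0 - alpha) * trust + alpha * quality


def trust_from_history(qualities: Iterable[float],
                       initial: float = 0.5,
                       alpha: float = 0.3) -> float:
    """Replay a user's past contribution qualities in chronological order."""
    trust = initial  # assumed neutral starting point for a new contributor
    for quality in qualities:
        trust = update_trust(trust, quality, alpha)
    return trust


if __name__ == "__main__":
    # A user whose recent contributions were of lower quality than earlier ones.
    print(trust_from_history([0.9, 0.8, 0.2, 0.1]))
```

A recency-weighted average is only one simple way to make a score react to fluctuations in behavior; the quality values themselves would have to come from a contribution-quality measure such as those discussed in this section.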

Our approach relies on a distance (for example, the Levenshtein distance) between the different versions of the document. We would like to compute a longevity measure based on a semantic distance, using BERT [9, 11] and SMART [10] models, and compare it with existing measures. Wikipedia provides a dataset of articles whose quality has been manually assessed by experts [12, 13]. We therefore wish to validate our algorithms for measuring the quality of user contributions on this data.
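To make the distance-based idea concrete, here is a minimal Python sketch. It implements the standard Levenshtein distance and, on top of it, one possible longevity score for an edit, chosen here for illustration only: the share of the change that moved the text closer to a later revision. The function `edit_longevity`, its normalization and its parameter names are assumptions, not the measure that the thesis will define.

```python
from typing import Callable


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def edit_longevity(parent: str, edited: str, later: str,
                   distance: Callable[[str, str], int] = levenshtein) -> float:
    """Hypothetical longevity score in [-1, 1] for the edit parent -> edited,
    judged against a later revision: 1 if the change fully survives,
    -1 if it was fully reverted, 0 if the edit changed nothing."""
    change = distance(parent, edited)
    if change == 0:
        return 0.0
    return (distance(parent, later) - distance(edited, later)) / change


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(edit_longevity("abc", "abcd", "abcd"))  # change kept
    print(edit_longevity("abc", "abcd", "abc"))   # change reverted
```

Because `distance` is a parameter, a semantic distance (for instance one computed from sentence embeddings) could in principle be substituted for the syntactic one, provided it behaves like a metric; that substitution is not shown here.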

In addition to analysing the quality of user edits, we plan to analyse user interactions on talk pages, which will provide an additional measure of trust between users.

One of the gaps this project will fill is to consider the legitimacy of Wikipedia rules when measuring the level of trust that contributors place in a peer in the context of a page. Indeed, contributors widely use Wikipedia rules to settle disagreements over the collaborative writing of a page [14]. Moreover, the legitimacy of the rules influences whether individual contributors' trajectories develop [15]. To take this into account when specifying the trust game, we might introduce parameters linked to the global state of the wiki and of a page.

We also aim to quantify the number of edits needed to profile a contributor. Whereas common statistical wisdom may recommend a large number of contributions to stabilize a profile, some research has shown that contributors' early edits can already be meaningful for establishing their profiles [16, 17].

Bibliography

[1] Jeremy P. Birnholtz and Steven Ibara. Tracking changes in collaborative writing: edits, visibility and group maintenance. In CSCW 2012. ACM, 809–818.

[2] Chyng-Yang Jang, Charles Steinfield, and Ben Pfaff. Virtual team awareness and groupware support: an evaluation of the TeamSCOPE system. Int. J. Hum.-Comput. Stud. 56, 1 (2002), 109–126.

[3] C. Brad Crisp and Sirkka L. Jarvenpaa. 2013. Swift trust in global virtual teams. Journal of Personnel Psychology (2013).

[4] Roger C. Mayer and Mark B. Gavin. 2005. Trust in management and performance: Who minds the shop while the employees watch the boss? Acad Manage J 48, 5: 874–888.

[5] Quang-Vinh Dang and Claudia-Lavinia Ignat. Computational trust model for repeated trust games. In Proceedings of the IEEE Trustcom/BigDataSE/ISPA, Tianjin, China, pages 34–41, August 2016.

[6] Claudia-Lavinia Ignat and Quang-Vinh Dang. “Users trust assessment based on their past behavior in large scale collaboration”. In: The IEEE International Conference on Intelligent Computer Communication and Processing (ICCP 2021). Cluj-Napoca, Romania, Oct. 2021, 19:1–19:8. doi: 10.1109/ICCP53602.2021.9733490. hal: hal-03469344.

[7] Joyce Berg, John Dickhaut, and Kevin McCabe. Trust, reciprocity, and social history. Games and Economic Behavior, 10(1):122–142, 1995.

[8] Claudia-Lavinia Ignat, Quang-Vinh Dang, and Valerie L. Shalin. The influence of trust score on cooperative behavior. ACM Transactions on Internet Technology, 19(4), 22 pages, November 2019.

[9] Liu Zhuang, Lin Wayne, Shi Ya, and Zhao Jun. “A Robustly Optimized BERT Pre-training Approach with Post-training”. In: Proceedings of the 20th Chinese National Conference on Computational Linguistics. CCL 2021. Huhhot, China: Springer, Aug. 2021, pp. 471–484. doi: 10.1007/978-3-030-84186-7_31.

[10] Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Tuo Zhao. “SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization”. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, July 2020, pp. 2177–2190. doi: 10.18653/v1/2020.acl-main.197.

[11] Wei Wang et al. “StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding”. In: Proceedings of the 8th International Conference on Learning Representations. ICLR 2020. Addis Ababa, Ethiopia: OpenReview.net, Apr. 2020. url: https://openreview.net/forum?id=BJgQ4lSFPH

[12] Morten Warncke-Wang, Dan Cosley, and John Riedl. Tell me more: an actionable quality model for Wikipedia. In Proceedings of OpenSym, 10 pages, August 2013.

[13] Morten Warncke-Wang, English Wikipedia Quality Assessment Dataset. Figshare, Dataset. https://doi.org/10.6084/m9.figshare.1375406.v2

[14] Beschastnikh, I., Kriplean, T., & McDonald, D. (2021). Wikipedian Self-Governance in Action: Motivating the Policy Lens. Proceedings of the International AAAI Conference on Web and Social Media, 2(1), 27-35. https://doi.org/10.1609/icwsm.v2i1.18611

[15] JOUBERT Léo, « Le parfait wikipédien. Réglementation de l’écriture et engagement des novices dans un commun de la connaissance (2000-2018) », Le Mouvement Social, 2019/3 (n° 268), p. 45-60. DOI : 10.3917/lms.268.0045. URL: https://www.cairn.info/revue-le-mouvement-social1-2019-3-page-45.htm

[16] Katherine Panciera, Aaron Halfaker, and Loren Terveen. 2009. Wikipedians are born, not made: a study of power editors on Wikipedia. In Proceedings of the 2009 ACM International Conference on Supporting Group Work (GROUP '09). Association for Computing Machinery, New York, NY, USA, 51–60. https://doi.org/10.1145/1531674.1531682

[17] Sylvain Dejean and Nicolas Jullien. Big From the Beginning. Assessing Online Contributors' Behavior by Their First Contribution. (April 27, 2015). Research Policy, Volume 44, Issue 6, July 2015, Pages 1226–1239, http://dx.doi.org/10.2139/ssrn.1980806

 

Main activities

  • Study the existing trust metrics in collaborative systems
  • Study existing work on article quality in Wikipedia and on the legitimacy of its rules
  • Propose a metric for the quality of user contributions based on the length and longevity of contributions (using both syntactic and semantic distances)
  • Adapt the trust metric proposed in [5] to Wikipedia, treating users' contributions to article revisions as the counterpart of their interactions in the trust game
  • Perform measurements using the Wikipedia dataset

Skills

  • Engineering degree and/or Master 2 degree in computer science, applied mathematics, or cognitive science
  • Theoretical expertise: collaborative systems
  • Good collaborative and networking skills, excellent written and oral communication in English
  • Good programming skills
  • Strong analytical skills

Benefits package

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage

Remuneration

2100€ gross/month in the 1st year