2019-01476 - PhD Position F/M Distributed Query Analytics on Property Graphs [PhD Campaign 2019 - Campagne Doctorants Grenoble Rhône-Alpes]

Contract type : Public service fixed-term contract

Level of qualifications required : Graduate degree or equivalent

Function : PhD Position

About the research centre or Inria department

The Grenoble Rhône-Alpes Research Center brings together just under 800 people in 35 research teams and 9 research support departments.

Staff are located on 5 campuses in Grenoble and Lyon, working in close collaboration with labs, research and higher education institutions in Grenoble and Lyon, as well as with the economic players in these areas.

Active in the fields of software, high-performance computing, Internet of Things, image and data, as well as simulation in oceanography and biology, the center takes part at the highest level in international scientific achievements and collaborations, both in Europe and in the rest of the world.

Context

The candidate will pursue a PhD thesis under the supervision of Angela Bonifati and Pierre Genevès.

Scientific Context.  Graphs and property graphs [BFV18] are becoming ubiquitous in many settings such as social and professional networks, collaborative networks for governmental agencies, health and energy consumption monitoring, scientific networks and knowledge graphs, alongside recommendation and fraud detection systems.

Property graphs are the newest graph data model; they enhance the existing RDF and graph database models with lists of properties attached to nodes and edges. In property graphs and their query languages (including those emerging from ongoing standardization activities, such as GQL [GQL] and G-Core [AAB18]), paths become first-class citizens in querying and analytical tasks, while key-value pairs are queried together with recursive paths in the underlying graphs.
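As an informal illustration of the data model described above, the following minimal Python sketch builds a toy property graph and evaluates a recursive path query restricted by a key-value condition on edges. All names in it (the sample nodes, the `KNOWS` label, the `since` property, the `reachable` helper) are hypothetical and chosen only for illustration; it does not reflect the syntax or semantics of GQL, G-Core or Cypher.

```python
from collections import deque

# Hypothetical toy property graph: nodes and edges both carry key-value properties.
nodes = {
    "a": {"label": "Person", "name": "Ada"},
    "b": {"label": "Person", "name": "Bob"},
    "c": {"label": "Person", "name": "Cy"},
    "d": {"label": "Company", "name": "Acme"},
}
edges = [
    ("a", "b", {"label": "KNOWS", "since": 2015}),
    ("b", "c", {"label": "KNOWS", "since": 2018}),
    ("c", "d", {"label": "WORKS_AT", "since": 2019}),
]


def reachable(start, edge_label, min_since):
    """Return the nodes reachable from `start` through one or more edges
    that carry `edge_label` and whose `since` property is >= `min_since`."""
    seen, queue = set(), deque([start])
    while queue:
        current = queue.popleft()
        for src, dst, props in edges:
            matches = (
                src == current
                and props["label"] == edge_label
                and props["since"] >= min_since
            )
            if matches and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen


# Recursive path (KNOWS, one or more hops) combined with a property filter on edges.
names = sorted(nodes[n]["name"] for n in reachable("a", "KNOWS", 2015))
print(names)
```

The breadth-first traversal plays the role of the recursive path, and the test on `since` plays the role of the key-value condition; a distributed engine would evaluate the same kind of query very differently.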

As query languages for such graphs are still under development, the corresponding modification operations are also newly defined [CAP, cypher]. Combining queries and updates gives rise to new analytical operations on such graphs, whose execution requires scalable platforms.

Assignment

Scientific Objectives. We envision the study of scalable batch processing of graph queries and updates in a distributed setting. The current state of the art is Cypher for Apache Spark [CAP], which already supports, for instance, named queries and updates, along with an initial graph schema specification for property graphs. We believe that this direction includes several milestones, among which: (i) plugging in a static analysis approach in order to capture the interference between batches of queries and updates prior to compilation [GJG16, GGL15]; (ii) proving the equivalence and bidirectionality of the operations in declarative and procedural batches of graph operations in the presence of a graph schema [BBF05, BFG19]; (iii) extending graph query workloads to the recent, more expressive query languages and considering the case of mixed query/update workloads [BBC17, CEG13, gmark]. Especially with massive graph data, static verification is desirable in order to avoid inconsistent results. The ongoing definition of schema languages for graphs also has a huge impact on the above objectives.

Scientific, Societal and Economic Impact.  We believe that this topic has many scientific, societal and economic outcomes in France. Many businesses are in fact collecting their data in the form of knowledge graphs, but they either do not know how to analyze them or do not know how to do so efficiently. Both cases are addressed by this PhD topic.

External Collaborations.  We would like to pursue our ongoing collaborations with Eindhoven University of Technology, Netherlands (Prof. George Fletcher and his team), as well as with the most successful European graph database company, Neo4j [neo4j] (Dr. Hannes Voigt and Dr. Petra Selmer). During the course of the thesis, the student may have exchanges with the above colleagues, and scientific stays at this university and this company are possible.

Publications of the Inria Tyrex team (related to the PhD topic).

[BMT17] Angela Bonifati, Wim Martens, Thomas Timm: An Analytical Study of Large SPARQL Query Logs. PVLDB 11(2): 149-161 (2017)

[BBC17] Guillaume Bagan, Angela Bonifati, Radu Ciucanu, George H. L. Fletcher, Aurélien Lemay, Nicky Advokaat: gMark: Schema-Driven Generation of Graphs and Queries. IEEE Trans. Knowl. Data Eng. 29(4): 856-869 (2017)

[BBF05] Michael Benedikt, Angela Bonifati, Sergio Flesca, Avinash Vyas: Verification of Tree Updates for Optimization. CAV 2005: 379-393

[DB18] Angela Bonifati, Stefania Dumbrava: Graph Queries: From Theory to Practice. ACM SIGMOD Record 47(4), 2018 (Invited Paper, to appear in the DB Principles column).

[BFG19] Angela Bonifati, Peter Furniss, Alastair Green, Russ Harmer, Eugenia Oshurko, Hannes Voigt: Schema Validation and Evolution for Graph Databases. CoRR abs/1902.06427 (2019)

[BFV18] Angela Bonifati, George H. L. Fletcher, Hannes Voigt, Nikolay Yakovets: Querying Graphs. Synthesis Lectures in Data Management, Morgan & Claypool (2018).


[JGGL18] Louis Jachiet, Nils Gesbert, Pierre Genevès, Nabil Layaïda: On the Optimization of Recursive Relational Queries. BDA 2018 - 34ème Conférence sur la Gestion de Données - Principes, Technologies et Applications, Oct 2018, Bucharest, Romania. pp. 1-22

[GJG16] Damien Graux, Louis Jachiet, Pierre Genevès, Nabil Layaïda: SPARQLGX: Efficient Distributed Evaluation of SPARQL with Apache Spark. International Semantic Web Conference (2) 2016: 80-87

[GGL15] Nicola Guido, Pierre Genevès, Nabil Layaïda, Cécile Roisin: On Query-Update Independence for SPARQL. CIKM 2015: 1675-1678

[CEG13] Melisachew Wudage Chekol, Jérôme Euzenat, Pierre Genevès, Nabil Layaïda: Evaluating and Benchmarking SPARQL Query Containment Solvers. International Semantic Web Conference (2) 2013: 408-423

[gmark] gMark https://github.com/graphMark/gmark (accessed on: 2019-03-17)

Other references.

[AAB18] Renzo Angles, Marcelo Arenas, Pablo Barceló, Peter A. Boncz, George H. L. Fletcher, Claudio Gutierrez, Tobias Lindaaker, Marcus Paradies, Stefan Plantikow, Juan F. Sequeda, Oskar van Rest, Hannes Voigt: G-CORE: A Core for Future Graph Query Languages. CoRR abs/1712.01550 (2017) (to appear in ACM Sigmod 2018)

[AAB17] Renzo Angles, Marcelo Arenas, Pablo Barceló, Aidan Hogan, Juan L. Reutter, Domagoj Vrgoc: Foundations of Modern Query Languages for Graph Databases. ACM Comput. Surv. 50(5): 68:1-68:40 (2017)

[FGG18] Nadime Francis, Alastair Green, Paolo Guagliardo, Leonid Libkin et al.: Cypher: An Evolving Query Language for Property Graphs. In ACM Sigmod 2018.

[SMS17] Siddhartha Sahu, Amine Mhedhbi, Semih Salihoglu, Jimmy Lin, M. Tamer Özsu: The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing. PVLDB 11(4): 420-431 (2017)

[MW95] Alberto O. Mendelzon, Peter T. Wood: Finding Regular Simple Paths in Graph Databases. SIAM J. Comput. 24(6): 1235-1258 (1995)                  

[gcore] G-Core. https://github.com/ldbc/ldbc_gcore_parser (accessed on: 2019-03-17).   

[cypher] Cypher. https://www.opencypher.org/ (accessed on: 2019-03-17).

[graphql] GraphQL. http://graphql.org/ (accessed on: 2019-03-17).

[GQL] The Graph Query Language Manifesto. https://gql.today/ (accessed on: 2019-03-17).

[gremlin] Gremlin. http://tinkerpop.apache.org/ (accessed on: 2019-03-17).

[neo4j] Neo4j. https://neo4j.com/ (accessed on: 2019-03-17).

[pgx] Oracle PGX. http://www.oracle.com/technetwork/oracle-labs/parallel-graph-analytix (accessed on: 2019-03-17).

[sparql] SPARQL. https://www.w3.org/TR/sparql11-query/ (accessed on: 2019-03-17).

[CAP] Cypher for Apache Spark. https://github.com/opencypher/cypher-for-apache-spark (accessed on: 2019-03-17).

 

Main activities

 

Benefits package

  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage

Remuneration

Salary (before taxes) : 1982€ gross/month for 1st and 2nd year. 2085€ gross/month for 3rd year.