George Papastefanatos

Researcher
Information Management Systems Institute
ATHENA Research Center
Artemidos 6 & Epidavrou Athens Greece

Find me on LinkedIn, Google Scholar, DBLP, and ORCID.

About

I am a Principal Researcher at the Information Management Systems Institute (IMSI) of the ATHENA Research and Innovation Centre.

My research interests are in the area of big data management and analytics, working on problems related to scalable visual analytics, data integration, knowledge graphs and data evolution. I obtained my Diploma in Electrical and Computer Engineering and my PhD in Computer Science from the Department of Electrical and Computer Engineering of the National Technical University of Athens (NTUA). Before joining ATHENA R.C., I was an adjunct researcher at NTUA, the University of Athens, the National Centre for Social Research and the University of Ioannina, and worked as an external IT expert for various private and public organizations on the design and implementation of large-scale IT projects. I have been an adjunct/visiting lecturer at the University of Peloponnese, Athens University of Economics and Business, University of Aegean, University of Piraeus and the National School of Public Administration. I have co-edited 1 book and authored 3 book chapters and more than 80 publications in international conferences and journals in the areas of big data management and analytics. Three of my articles have been selected as Best Papers at international conferences.

Research Interests & Projects

I am coordinating the following projects. Please contact me for more details.

  • Jan 2023 - Dec 2025: ExtremeXP: EXPeriment driven and user eXPerience oriented analytics for eXtremely Precise outcomes and decisions. ExtremeXP proposes a new paradigm for data analytics. This paradigm consists of experimentation-driven analytics, to provide accurate, precise, fit-for-purpose, and trustworthy data-driven insights via evaluating different complex analytics variants, considering end users’ preferences and feedback in an automated way. The ambition is to provide capabilities for learning from experimentation to predict user requirements, profile the user, and proactively generate the appropriate analytics workflow, leading to more precise outcomes and personalized insights for decision making, while focusing on the user's experience, requirements, and needs and placing the user at the center of the decision-making process. ExtremeXP will integrate cutting-edge research results from the domains of data integration, machine learning, visual analytics, explainable AI, decentralized trust, knowledge engineering, and model-driven engineering into a common framework (Co-funded by HORIZON-CL4-2022-DATA-01-01, GA:101093164).
  • Jul 2022 - Jun 2024: Arcadia: Autonomous Resource Allocation for Edge Infrastructures. The optimization of resource allocation in cloud computing environments is a crucial problem with particular research interest and direct application to a multitude of commercial applications. The main objective of ARCADIA is to investigate, design, and evaluate ML methods for optimized resource allocation in cloud computing environments focusing on a) systems exhibiting dynamic workload characteristics, and b) environments with high energy consumption requirements due to simultaneous and continuous operation of computer clusters and equipment. Both of these features can be found in edge systems, and specifically in edge data centers, which have become a pivotal computing part of next-generation networks. (Funded by: Greece 2.0 - National Recovery and Resiliency Plan)

My research interests and some past projects include:

  • Data science and Visual Analytics. An active direction in my research concerns the areas of Data visualization, Exploration and Visual Analytics.
    • Self-service scalable visual analytics: A recent national project that I am coordinating, called VisualFacts, is related to self-service scalable visual analytics over big data. Self-service visual analytics is a new paradigm, widely promoted in modern corporate environments, in which business users are enabled and encouraged to directly manipulate (explore, blend, analyze) underlying data in rich visual ways, in order to derive insights from business information as quickly and efficiently as possible. The aim of VisualFacts is to develop a cloud-based scalable platform that provides self-service visual analytic capabilities to a wide range of non-corporate users, allowing them to access, explore, and analyze open and privately-held data and to collaborate on the analytic results of their work by sharing, annotating and reusing them in the form of open facts.
    • In-Situ Visual Data exploration: We have worked on methods that enable efficient and interactive visual analysis of very large raw data files (e.g., CSV, JSON). Our system employs an in-memory data structure that is tailored to visual exploration needs and enables users to perform several visual exploration scenarios.
    • Graph Visualization: We have developed GraphVizDB, a tool that enables the visualization and exploration of very large graphs.
  • Big Data Management. My research focus is on data management techniques and more specifically on data integration, scalable query processing and visual analytics over big data. Most of these techniques have been applied to various big data scenarios such as:
    • Telco Data: My research has recently focused on end-to-end big data solutions for managing massive streams from IoT devices. I am the project coordinator of an industry-funded project between IMSI, Intracom Telecom and Ericsson. IMSI has been contracted to design and develop an end-to-end big data solution and machine learning methods for stream analytics on network quality data coming from IoT devices, such as drones and autonomous cars.
    • Scholarly Data: A main area of interest concerns Entity resolution in Big Data Integration settings, such as duplicate detection and entity interlinking. I have worked on Blocking / Meta-blocking techniques, Parallelization techniques and Machine Learning techniques in Entity resolution flows for improving the performance and quality of the process. I have been involved in the OpenAire project, where we have developed a scalable framework over Apache Spark for interlinking scholarly data.
    • Data from Connected Vehicles: I participate in an EU-funded COST Action, WISE-ACT, studying the wider implications of the deployment of autonomous and connected vehicles on existing road infrastructure in the EU. My interest is in the adoption of novel cloud and data management technologies for online analytics and reaction to events at the edge and the cloud.
    • Knowledge Graphs: An active line of ongoing work is in the area of RDF Indexing and Query Processing in Big Knowledge Graphs. We have developed a scalable approach for storing RDF knowledge graphs in relational databases and processing queries over them, based on a novel indexing technique called Extended Characteristic Sets. I have also worked on OLAP analytics on the Web. We have developed an approach that employs data mining techniques to analyze and detect relationships in OLAP data published on the web in the form of multidimensional data.
    • Social Data: I have technically coordinated Socioscope and YouWho, two projects that created a visual analysis tool and a chat-based social survey tool, targeting primarily social scientists, for the collection, visualization and exploration of social and political data.
  • Web Data Management
    • Linked Data: I have coordinated www.linked-statistics.gr, a project that makes socio-economic and socio-demographic data from the Hellenic Statistical Authority available in the form of Linked Data. Using data web technologies for creating and managing Personal dataspaces is an active line of research. We have developed www.linkzoo.gr, a web-based, linked data enabled tool that supports collaborative management of information resources, enabling users to create and manage diverse types of resources, such as files, web documents, people, datasets and calendar events, in common spaces.
    • Web Data dynamics: This has been a primary focus of my research, involving problems related to Linked Data Evolution & Archiving, Temporal & Change Modelling, Change Propagation and Synchronization, Proactive design, and Benchmarking. I was actively involved in the FP7 DIACHRON project, which addressed many of the above issues.
    • Legal Informatics: A recent project I have coordinated deals with the Semantic Representation of Legal Documents. We have developed a framework for automatic Structuring and Semantic Indexing of Legal Documents, used in the electronic library of the Greek General Secretariat of Public Revenue (in Greek).
  • European Open Science Cloud and European Research Infrastructures
    • EOSC Core Development: For the last 4 years, I have been the technical manager of the catalogue services behind the European Open Science Cloud. It offers a single catalogue of the research services and providers offered by e-Infrastructures and research infrastructures in the EU.
    • EOSC Service Development: I am the principal investigator for ATHENA RC of the Neanias project, which develops novel research services for emerging Atmosphere, Underwater & Space Research Communities in the context of the European Open Science Cloud.
    • EOSC Monitor Services: I have been involved in the design of the Open Science observatory, a framework that monitors Open Science trends and their impact on research and provides insights and KPIs to researchers, funders and academia.
  • Data-Centric Ecosystems, Database Quality Metrics
    • A long-standing research interest is in Data-Centric Information Systems and issues related to their schema evolution, the representation of dependencies, and the automatic repair of syntactic and semantic inconsistencies caused by maintenance operations. I am also interested in the evaluation of design quality metrics and the construction of design patterns for these environments. Hecataeus, a UoI-IMIS joint project, combines the representation and management of evolution processes into a single tool.

Here is a list of my publications and my bio.


Education
  • 1995 - 2000: School of Electrical and Computer Engineering, National Technical University of Athens, Greece
    • Diploma Thesis: A Quality Assurance Framework for Educational Software
  • 2001 - 2009: PhD in Computer Science, National Technical University of Athens, Greece
    • Ph.D. Thesis: Policy Regulated Management of Schema Evolution in Database-centric Environments