The Plegma dataset: Domestic appliance-level and aggregate electricity demand with metadata from Greece

The Plegma dataset includes smart-meter data from 13 Greek households, featuring electric measurements, environmental data, and sociodemographic and building characteristics. With both quantitative and qualitative components, it is a valuable source for creating energy-saving services.

The increasing availability of smart meter data has catalyzed innovative energy-saving solutions, including demand response, personalized energy feedback, and non-intrusive load monitoring. These technologies rely on machine learning models trained on robust energy consumption datasets, so their accuracy and reliability hinge on data collected under real-world conditions.

Our recent article in Nature's Scientific Data presents the Plegma dataset, a significant resource in this research field. The dataset provides household electricity consumption data sampled at 10-second intervals from 13 households over a year. It also includes environmental measurements (temperature and humidity), sociodemographic information, and data on user behavior, supporting both quantitative and qualitative research.

Plegma Dataset Overview

As a pioneering dataset in Greece, Plegma offers high-frequency electricity measurements that capture typical consumption patterns of Mediterranean households, including the use of devices like air conditioners and electric water boilers that are not commonly recorded in similar datasets. With over 218 million readings collected from 88 meters and sensors, this dataset serves as a valuable resource for researchers and developers.
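
Because the electrical data are sampled every 10 seconds, a gap-free channel accumulates a predictable number of rows, which makes a quick completeness check easy when working with the files. The Python sketch below is illustrative only: the function names are our own, and the 10 s default applies to the electrical measurements, not to the 15-minute indoor sensors.

```python
# Expected row counts for a channel sampled at a fixed interval (default 10 s).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def expected_samples(days, interval_s=10):
    """Number of samples a gap-free channel would contain over `days` days."""
    return int(days * SECONDS_PER_DAY // interval_s)


def coverage(n_rows, days, interval_s=10):
    """Fraction of the expected samples actually present (1.0 means no gaps)."""
    return n_rows / expected_samples(days, interval_s)


print(expected_samples(1))    # 8640 samples per day
print(expected_samples(365))  # 3153600 samples per year
```

For example, a `coverage(n_rows, days=30)` value close to 1.0 suggests a month of data with few gaps, while a markedly lower value points to outages worth inspecting.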

From a technical standpoint, building an end-to-end data collection pipeline posed significant challenges, particularly in engaging users and incentivizing their participation. The participating houses belong to the Athenian energy community, a newly established non-profit energy community in Attica, Greece. To encourage engagement, the energy monitoring system was extended with an intuitive graphical user interface that lets participants monitor and visualize their energy consumption patterns in real time. Our solution was developed in partnership with Plegma Labs, which advised on data collection practices and shared its infrastructure to support our efforts.

The figure below gives an overview of the framework, showing how the monitored devices communicate with the IoT gateway and transmit data every 10 seconds. To keep the data collection and monitoring system reliable, scalable, and performant, we used commercially available hardware from established home automation vendors such as Aeotec and Qubino. A Raspberry Pi serves as the IoT gateway, given its proven suitability and versatility in home automation scenarios.

Overview of the developed energy monitoring and collection framework.

The data acquisition process includes the following stages:

  • Monitored devices communicate with the IoT gateway using the Z-Wave protocol.
  • The gateway forwards the data through a secure RESTful API to a central database server (AMD 2nd-generation EPYC CPUs, 8 vCPUs, 16 GB RAM), where it is stored in a PostgreSQL database.
  • The collected data is accessible through a custom GUI application, which queries the database server via RESTful API calls.
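
The gateway-to-server step can be pictured as a loop that batches readings and posts them as JSON over HTTPS. The sketch below uses only Python's standard library; the endpoint URL, the payload fields, and the `read_devices` callable (standing in for whatever Z-Wave controller software reads the meters) are hypothetical and not the project's actual API, and authentication is omitted.

```python
import json
import time
import urllib.request

# Hypothetical endpoint; the real server API is not documented in this post.
API_URL = "https://example.org/api/readings"
INTERVAL_S = 10  # the gateway forwards data every 10 seconds


def build_payload(house_id, readings, ts=None):
    """Package one batch of readings as JSON bytes.

    `readings` maps a (hypothetical) device name to its power in watts.
    """
    body = {
        "house": house_id,
        "timestamp": int(ts if ts is not None else time.time()),
        "readings": [{"device": d, "power_w": p} for d, p in sorted(readings.items())],
    }
    return json.dumps(body).encode("utf-8")


def post_batch(payload, url=API_URL, timeout=5.0):
    """POST one batch to the server and return the HTTP status code."""
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def run_gateway(house_id, read_devices, url=API_URL):
    """Poll local devices and forward one batch every INTERVAL_S seconds (runs forever)."""
    while True:
        started = time.monotonic()
        post_batch(build_payload(house_id, read_devices()), url)
        # Sleep for the remainder of the interval so batches stay ~10 s apart.
        time.sleep(max(0.0, INTERVAL_S - (time.monotonic() - started)))
```

As written, a network error raises and stops the loop; a production gateway would also need to buffer batches locally while the server is unreachable, which this sketch does not attempt.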

We chose the Z-Wave communication protocol because it is designed for connected-home technology and is recommended for the SMETS2 ecosystem. Compared with other wireless technologies such as Bluetooth and Zigbee, it offers better reliability and coverage. Z-Wave is also more power-efficient than Wi-Fi, which is crucial for battery-powered devices such as environmental sensors. A significant advantage of Z-Wave is its mesh topology, in which mains-powered devices act as repeaters: this extends the network's range and improves reliability, because traffic can take an alternative path when one link or node fails.
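
To see why a mesh tolerates a failed node, consider a toy model: devices are graph nodes, radio links are edges, and a message only needs some path to the gateway. The breadth-first search below finds the shortest hop path while skipping failed nodes. This is a conceptual illustration only; Z-Wave's own routing is handled by its protocol stack and differs in detail, and the device names are hypothetical.

```python
from collections import deque


def find_route(links, src, dst, failed=frozenset()):
    """Shortest hop path from src to dst avoiding failed nodes (BFS), or None."""
    if src in failed or dst in failed:
        return None
    prev = {src: None}  # predecessor of each visited node
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk predecessors back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in links.get(node, ()):
            if nxt not in prev and nxt not in failed:
                prev[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical layout: a sensor reaches the gateway via mains-powered plugs.
links = {
    "sensor": ["plug_a", "plug_b"],
    "plug_a": ["sensor", "gateway"],
    "plug_b": ["sensor", "plug_c"],
    "plug_c": ["plug_b", "gateway"],
    "gateway": ["plug_a", "plug_c"],
}
print(find_route(links, "sensor", "gateway"))                     # ['sensor', 'plug_a', 'gateway']
print(find_route(links, "sensor", "gateway", failed={"plug_a"}))  # ['sensor', 'plug_b', 'plug_c', 'gateway']
```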

The Plegma dataset is available in both raw and processed form as CSV files from the University of Strathclyde data repository. Providing the raw data lets researchers apply their own pre-processing methods, for example to handle the anomalies and gaps caused by equipment issues. The data-processing methodology is also open source and available on our GitHub page.
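
As an example of bespoke pre-processing on the raw files, the sketch below flags gaps in a 10-second series, i.e. places where consecutive timestamps are further apart than expected. The column names (`timestamp`, `P_agg`), the ISO-style timestamp format, and the 5 s tolerance are assumptions for illustration; check the actual headers in the repository before adapting it.

```python
import csv
import io
from datetime import datetime, timedelta


def find_gaps(rows, ts_col="timestamp",
              expected=timedelta(seconds=10), tolerance=timedelta(seconds=5)):
    """Return (start, end) pairs where consecutive timestamps exceed expected + tolerance."""
    gaps = []
    prev = None
    for row in rows:
        ts = datetime.fromisoformat(row[ts_col])
        if prev is not None and ts - prev > expected + tolerance:
            gaps.append((prev, ts))
        prev = ts
    return gaps


# Hypothetical excerpt of a raw file; real column names may differ.
raw = io.StringIO(
    "timestamp,P_agg\n"
    "2023-08-22 10:00:00,412.0\n"
    "2023-08-22 10:00:10,415.5\n"
    "2023-08-22 10:02:40,398.2\n"
    "2023-08-22 10:02:50,401.0\n"
)
for start, end in find_gaps(csv.DictReader(raw)):
    print(f"gap from {start} to {end} ({(end - start).total_seconds():.0f} s)")
```

Whether a detected gap is then left empty, interpolated, or used to exclude a whole day is a separate modelling decision.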

Overview of the dataset’s folder structure

The figure below shows an example of the processed electrical measurements: the power usage of House_1 on 22 August 2023. The data are well synchronized, as the moments when appliances switch on or off are clearly reflected in the aggregate readings.

The power usage for House 1 on 22 August 2023. The gap between the aggregate_appliance consumption curve and the total power consumption curve (P_agg) corresponds to the consumption of appliances that are not monitored.
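
The gap in that figure is simply the aggregate power minus the sum of the monitored appliances. A minimal Python sketch, assuming each sample is given as the `P_agg` value plus a list of appliance readings in watts; the appliance names and values are invented for illustration, and clipping negative residuals to zero is our own choice.

```python
def unmonitored_power(p_agg, appliance_powers):
    """Power (W) at one timestamp not explained by the monitored appliances.

    Negative residuals are clipped to zero (an assumption: small negatives
    usually reflect measurement noise or timing offsets).
    """
    return max(0.0, p_agg - sum(appliance_powers))


def unmonitored_energy_wh(samples, interval_s=10):
    """Energy (Wh) of unmonitored loads over equally spaced (p_agg, appliances) samples."""
    return sum(unmonitored_power(p, apps) for p, apps in samples) * interval_s / 3600


# Hypothetical 10 s sample from House_1; values invented for illustration.
sample = {"P_agg": 1850.0, "fridge": 90.0, "boiler": 1500.0, "tv": 110.0}
appliances = [v for k, v in sample.items() if k != "P_agg"]
print(unmonitored_power(sample["P_agg"], appliances))  # 150.0
```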

Beyond detailed electrical measurements, the dataset covers diverse types of data, from environmental factors to behavioral insights and building characteristics. This enables a wide array of applications, ranging from non-intrusive load monitoring (NILM) and demand forecasting to user behavior analysis.
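
Many event-based NILM approaches start by detecting step changes in the aggregate signal, the same on/off transitions visible in the House_1 example. The threshold detector below is a deliberately simple sketch, not the method used by the authors; the 100 W threshold and the sample trace are arbitrary assumptions.

```python
def detect_events(power, threshold_w=100.0):
    """Return (index, delta_w) for step changes of at least threshold_w in magnitude.

    Positive deltas suggest an appliance switching on, negative ones switching off.
    """
    events = []
    for i in range(1, len(power)):
        delta = power[i] - power[i - 1]
        if abs(delta) >= threshold_w:
            events.append((i, delta))
    return events


# Hypothetical aggregate trace at 10 s resolution: a ~2 kW boiler turns on, then off.
p_agg = [300.0, 305.0, 2310.0, 2305.0, 2300.0, 310.0, 300.0]
print(detect_events(p_agg))  # [(2, 2005.0), (5, -1990.0)]
```

Real detectors typically add smoothing or require the new level to persist for a few samples, so that short spikes are not mistaken for switching events.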

The environmental data in the Plegma dataset include indoor and outdoor temperature (°C) and humidity (%). Indoor measurements were taken every 15 minutes and sent to the central database through the IoT gateway. Outdoor measurements, at hourly resolution, were obtained from MET Norway's open API (https://api.met.no/weatherapi/), which provides historical weather data under a Creative Commons license.
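
Since the electrical data (10 s), indoor sensors (15 min), and outdoor weather (hourly) use different resolutions, combining them usually requires an alignment step. One common option, sketched below, is an "as-of" lookup that pairs each timestamp with the most recent observation at or before it; interpolation would be an alternative. The temperatures are invented for illustration.

```python
from bisect import bisect_right
from datetime import datetime


def asof_lookup(obs_times, obs_values, t):
    """Latest observation at or before t, or None if t precedes all observations.

    obs_times must be sorted in ascending order.
    """
    i = bisect_right(obs_times, t)
    return obs_values[i - 1] if i > 0 else None


# Hypothetical hourly outdoor temperatures (°C); values invented for illustration.
hours = [datetime(2023, 8, 22, h) for h in range(10, 13)]
temps = [31.0, 32.5, 33.1]
print(asof_lookup(hours, temps, datetime(2023, 8, 22, 11, 20, 30)))  # 32.5
```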

The sociodemographic data cover the occupants' gender, occupation, educational level, and age, together with the number of occupants and the household income. Building characteristics include the type of house, the number of rooms, and the year of construction. The dataset also records appliance usage patterns, i.e. how often and at what times appliances are used according to the occupants' own observations; these are used as soft labels for further analysis. The following figure presents a detailed overview of the sociodemographic and building characteristics contained in the Plegma dataset.
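
One way to turn such self-reported usage patterns into soft labels is to map each reported usage window to a prior probability per hour of day. The sketch below is hypothetical: the window format and the 0.8/0.05 weights are arbitrary assumptions, not values from the dataset.

```python
def soft_label(hour, reported_windows, inside=0.8, outside=0.05):
    """Weak prior that an appliance is in use at a given hour of day (0-23).

    reported_windows: list of (start_hour, end_hour) pairs, end exclusive;
    a window may wrap past midnight, e.g. (22, 2).
    """
    for start, end in reported_windows:
        if start <= end:
            if start <= hour < end:
                return inside
        elif hour >= start or hour < end:  # window wraps past midnight
            return inside
    return outside


# Hypothetical answer: "the washing machine usually runs between 18:00 and 21:00".
weights = [soft_label(h, [(18, 21)]) for h in range(24)]
print(weights[17], weights[18], weights[20], weights[21])  # 0.05 0.8 0.8 0.05
```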

Overview of the sociodemographic and building characteristic data.

As a pivotal outcome of the GECKO project, the Plegma dataset embraces an interdisciplinary approach, combining machine learning with the social sciences to address critical sustainability challenges.

We invite you to explore the Plegma dataset and contribute to our collective journey towards a sustainable future 🌱💼.
