The increasing availability of smart meter data has catalyzed the creation of innovative energy-saving solutions, including demand response, personalized energy feedback, and non-intrusive load monitoring applications. These technologies depend heavily on sophisticated machine learning models that are trained on robust datasets detailing energy consumption. The accuracy and reliability of these applications hinge on real-world data collection.
Our recent publication in Scientific Data from Nature (check out our article here) showcases the Plegma dataset, a significant resource in this research field. This dataset provides comprehensive household energy consumption data collected at 10-second intervals from 13 distinct households over the span of a year. It also integrates environmental conditions, such as humidity and temperature, alongside demographic insights and user behaviors, facilitating both quantitative and qualitative research.
As a pioneering dataset in Greece, Plegma offers high-frequency electricity measurements that capture typical consumption patterns of Mediterranean households, including the use of devices like air conditioners and electric water boilers that are not commonly recorded in similar datasets. With over 218 million readings collected from 88 meters and sensors, this dataset serves as a valuable resource for researchers and developers.
From a technical standpoint, building an end-to-end data collection pipeline presented significant challenges, particularly in engaging users and incentivizing their participation. The participating houses belong to the Athenian energy community, a newly established non-profit energy community in the municipality of Attika in Greece. To strengthen user engagement, we extended the energy monitoring system with an intuitive graphical user interface that lets participants monitor and visualize their energy consumption patterns in real time. Our solution was developed in partnership with Plegma Labs (check Plegma Labs website), which provided valuable input on optimal data collection practices and shared their infrastructure to support our efforts.
An illustrative overview of our framework shows how monitored devices communicate with the IoT gateway and transmit data every 10 seconds. To keep the data collection and monitoring system reliable, scalable, and high-performing, we used commercially available hardware from established home automation companies such as Aeotec and Qubino. The Raspberry Pi was selected as the IoT gateway because of its suitability and versatility in home automation scenarios.
The data acquisition process includes the following stages:
- Monitored devices initially communicate with the IoT gateway using the Z-Wave communication protocol.
- The gateway then forwards the data to a central database server (AMD 2nd generation EPYC CPUs, 8 vCPUs, 16 GB RAM), which stores it in a PostgreSQL database; a RESTful API handles secure data management.
- The collected data is accessible through a custom GUI application, which interacts with the database server via RESTful API calls.
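The gateway-to-server step above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Reading` schema, its field names, the `boiler` device label, and the JSON layout are hypothetical and are not taken from the Plegma code base. Only the 10-second reporting interval and the `House_1` label come from this post. The sketch snaps a reception time to the 10-second grid and serializes a batch of readings into a body that could be sent in a REST call; no network request is made.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

SAMPLING_INTERVAL_S = 10  # from the text: devices report every 10 seconds


@dataclass
class Reading:
    """One measurement as a gateway might forward it (hypothetical schema)."""
    house_id: str
    device_id: str
    timestamp: str  # ISO 8601, UTC
    power_w: float


def align_to_interval(ts, interval_s=SAMPLING_INTERVAL_S):
    """Round a timezone-aware timestamp down to the sampling grid."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % interval_s, tz=timezone.utc)


def build_payload(readings):
    """Serialize a batch of readings into a JSON body for a REST call."""
    return json.dumps({"readings": [asdict(r) for r in readings]})


# Made-up reception time and power value, for illustration only.
received = datetime(2023, 1, 1, 12, 0, 17, tzinfo=timezone.utc)
reading = Reading("House_1", "boiler", align_to_interval(received).isoformat(), 2000.0)
payload = build_payload([reading])
```

Snapping timestamps to a shared grid at the gateway is one possible way to keep per-device series comparable later; the actual pipeline may handle synchronization differently.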
We chose the Z-Wave communication protocol because it is designed for connected home technology and is recommended for the SMETS2 ecosystem. It provides better reliability and coverage than other wireless technologies such as Bluetooth and Zigbee. Z-Wave also stands out for its power efficiency, especially compared to WiFi, which is crucial for battery-powered devices like environmental sensors. A significant advantage of Z-Wave is its mesh network topology, where every device can act as a repeater, extending the network’s range and enhancing its reliability by finding alternative communication paths if one is compromised.
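To make the rerouting idea concrete, here is a toy model of a mesh. It is not the Z-Wave routing algorithm: it is a plain breadth-first search over a made-up topology, and the device names and links are hypothetical. It only illustrates how a message can still reach the gateway through other repeaters when one node drops out.

```python
from collections import deque


def find_route(links, source, target, failed=frozenset()):
    """Breadth-first search for a shortest hop path that avoids failed nodes."""
    if source in failed or target in failed:
        return None
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbour in links.get(node, ()):
            if neighbour not in visited and neighbour not in failed:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no route left


# Hypothetical home network: every device can relay for its neighbours.
links = {
    "sensor": ["plug_a", "plug_b"],
    "plug_a": ["sensor", "gateway"],
    "plug_b": ["sensor", "plug_c"],
    "plug_c": ["plug_b", "gateway"],
    "gateway": ["plug_a", "plug_c"],
}

direct = find_route(links, "sensor", "gateway")
rerouted = find_route(links, "sensor", "gateway", failed={"plug_a"})
```

With all nodes up, the route goes through `plug_a`; when `plug_a` is marked as failed, the search falls back to the longer path through `plug_b` and `plug_c`.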
The Plegma dataset is available in both raw and processed formats as CSV files from the University of Strathclyde data repository (dataset link). Offering the raw data allows researchers the flexibility to apply bespoke pre-processing methods, accommodating the presence of anomalies and gaps due to equipment issues. The methodology for data processing is also open-source and available on our GitHub page (check GitHub page).
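When working with the raw files, a first step might be to locate the gaps mentioned above. The sketch below assumes timestamps have already been parsed into `datetime` objects and sorted; the function name, the one-second tolerance, and the sample timestamps are illustrative choices, not part of the published processing code. Only the 10-second sampling interval comes from the dataset description.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected=timedelta(seconds=10),
              tolerance=timedelta(seconds=1)):
    """Return (last_seen, next_seen) pairs where samples are missing."""
    gaps = []
    for previous, current in zip(timestamps, timestamps[1:]):
        if current - previous > expected + tolerance:
            gaps.append((previous, current))
    return gaps


# Synthetic timestamps: readings at 0, 10, 20, 60 and 70 seconds.
start = datetime(2023, 1, 1, 0, 0, 0)
timestamps = [start + timedelta(seconds=s) for s in (0, 10, 20, 60, 70)]
gaps = find_gaps(timestamps)
```

Here the only gap reported is the one between the readings at 20 and 60 seconds. How to treat such gaps (interpolate, forward-fill, or drop) depends on the downstream task.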
The next figure illustrates an example of the processed electrical measurements, showcasing the power usage from House_1 on August 22nd. The synchronization of the data is evident, as the moments when appliances are turned on or off are clearly reflected in the aggregate readings.
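The same synchronization check can be expressed numerically. In this sketch the power values are synthetic, and the 50 W on/off threshold is an arbitrary assumption. The code detects switch events in one appliance's series and reports the step in the aggregate series at the same sample index.

```python
def switch_events(power, threshold_w=50.0):
    """Return (index, 'on' or 'off') wherever power crosses the threshold."""
    events = []
    for i in range(1, len(power)):
        was_on = power[i - 1] >= threshold_w
        is_on = power[i] >= threshold_w
        if is_on and not was_on:
            events.append((i, "on"))
        elif was_on and not is_on:
            events.append((i, "off"))
    return events


def aggregate_steps(aggregate, events):
    """Change in aggregate power at each appliance switch event."""
    return [(i, kind, aggregate[i] - aggregate[i - 1]) for i, kind in events]


# Synthetic 10-second samples: a boiler cycling on top of a base load.
boiler = [0.0, 0.0, 2000.0, 2000.0, 0.0, 0.0]
base_load = [150.0, 155.0, 150.0, 152.0, 151.0, 150.0]
aggregate = [b + k for b, k in zip(base_load, boiler)]

events = switch_events(boiler)
steps = aggregate_steps(aggregate, events)
```

If the appliance and aggregate channels are well aligned, each "on" event should coincide with a positive aggregate step of roughly the appliance's power, and each "off" event with a matching negative step.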
This dataset not only provides detailed electrical measurements but also encompasses diverse types of data, from environmental factors to behavioral insights and building characteristics, enabling a wide array of applications—from NILM and demand forecasting to user behavior analysis.
The environmental data included in the Plegma dataset features both indoor and outdoor temperature (°C) and humidity (%) measurements. Indoor data were collected every 15 minutes and sent to the central database through the IoT gateway. Outdoor data were sourced from MET Norway's open API (https://api.met.no/weatherapi/), which provides historical weather information under a Creative Commons license, documenting temperature and humidity at an hourly resolution.
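Because indoor readings arrive every 15 minutes and outdoor readings hourly, combining them requires putting both on one timeline. One simple option, shown here as an assumption rather than the method used for the published processed files, is to attach to each indoor timestamp the most recent outdoor value at or before it. The temperature values are made up.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def latest_at_or_before(times, values, query):
    """Most recent value whose timestamp is <= query (times must be sorted)."""
    idx = bisect_right(times, query) - 1
    return values[idx] if idx >= 0 else None


start = datetime(2023, 1, 1, 12, 0)

# Synthetic hourly outdoor temperatures in degrees Celsius.
outdoor_times = [start, start + timedelta(hours=1)]
outdoor_temp_c = [12.0, 12.5]

# Indoor sensor timestamps every 15 minutes over the same hour.
indoor_times = [start + timedelta(minutes=15 * i) for i in range(5)]

aligned_outdoor = [
    latest_at_or_before(outdoor_times, outdoor_temp_c, t) for t in indoor_times
]
```

The first four indoor timestamps pick up the 12:00 outdoor value and the last one picks up the 13:00 value. Linear interpolation between hourly points would be an equally reasonable alternative.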
The sociodemographic section of the dataset provides insights into the gender, occupation, educational level, age, number of occupants, and income of the households. Building characteristics data include details like the type of house, the number of rooms, and the year of construction. The dataset also records usage patterns for appliances, capturing how often and at what times appliances are used based on the occupants' observations, which are used as soft labels for further analysis. The following figure presents a detailed overview of the socio-demographic and building characteristics contained within the Plegma dataset.
As a pivotal outcome of the GECKO project (check out GECKO project), the Plegma dataset embraces an interdisciplinary approach, melding machine learning with social sciences to address critical sustainability challenges.
We invite you to explore the Plegma dataset and contribute to our collective journey towards a sustainable future 🌱💼.