Load Disaggregation Challenge: Energy use in buildings

Background

Buildings account for a significant portion of global energy consumption, and reducing energy use in buildings is critical to meeting global emissions reduction targets.


Building energy load disaggregation involves identifying the individual energy services that consume energy within a building, such as heating, cooling, ventilation, lighting, and plug loads. This is essential for implementing targeted energy efficiency measures and for enabling building owners and operators to reduce energy consumption and related costs.

 

Explicit knowledge, or “ground truth”, of the energy consumed by individual energy services would require a building to be equipped with a detailed and accurate sub-metering system, in which all major appliances are metered and monitored individually. However, this approach is costly and/or intrusive and is not conventionally practised in the building industry. In most existing buildings the main energy meter data are the only data available, yet the development of data services for buildings depends on the availability of additional information, such as the energy use of specific services like heating and cooling.

 

The accurate disaggregation of individual energy services’ consumption from the main energy metering is a complex problem and has important implications for building owners and operators, energy providers, and consumers.

 

The load disaggregation problem has been the focus of research in the building and energy sectors for many years. It remains a challenging problem due to the complex and diverse nature of appliances and their energy usage patterns, and due to the low resolution of the main meter’s data logging. Indeed, while smart meters are laying the groundwork for digitalised monitoring of energy consumption in buildings (moving away from manual monthly, or sporadic, readings of analogue meters), their resolution is often limited to hourly values, or to 15 minutes at best.

 

Problem statement

The Load Disaggregation Challenge aims to develop accurate and scalable algorithms for disaggregating the energy use for building heating and/or cooling from the main energy meter data, while taking the following key challenges into account:


  • Complexity: The challenge focuses on developing unsupervised algorithms that can accurately disaggregate heating and/or cooling loads without requiring prior knowledge of specific device features, while maintaining learning ability and avoiding typical drawbacks such as slow training, high computational cost, and an excessive number of parameters.
  • Generalization: Buildings can differ considerably from one another in terms of size, design, construction materials, appliances, and occupants’ behaviour. Many existing disaggregation algorithms are trained and tested on the same building, or on a set of similar buildings, which leads to poor generalization across buildings. The datasets for this competition cover a varied set of buildings and promote the creation of algorithms with good inter-building generalization.
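
For orientation only, the following Python sketch shows one very simple unsupervised approach in the spirit of the challenges above: an “energy signature” regression that attributes the temperature-dependent share of the main meter reading to heating. It is not a reference solution; the function name, the 15 °C balance point, and the assumption that aligned main-meter and outdoor-temperature series are available are all illustrative.

# Minimal illustrative baseline (not the required or recommended method):
# an "energy signature" fit that attributes the temperature-dependent part
# of the main meter reading to heating. All names and values are assumptions.
import numpy as np

def estimate_heating_load(main_kwh, outdoor_temp, balance_temp=15.0):
    """Estimate the heating share of each main-meter reading.

    main_kwh     : 1-D array of main meter consumption per interval
    outdoor_temp : 1-D array of outdoor temperature for the same intervals
    balance_temp : assumed balance-point temperature in degrees Celsius
    """
    main_kwh = np.asarray(main_kwh, dtype=float)
    outdoor_temp = np.asarray(outdoor_temp, dtype=float)

    # Heating degrees per interval: only temperatures below the balance point drive heating.
    heating_degrees = np.clip(balance_temp - outdoor_temp, 0.0, None)

    # Least-squares fit: consumption ~ base_load + slope * heating_degrees
    X = np.column_stack([np.ones_like(heating_degrees), heating_degrees])
    (base_load, slope), *_ = np.linalg.lstsq(X, main_kwh, rcond=None)

    # Temperature-dependent part, clipped to the physically plausible range.
    return np.clip(slope * heating_degrees, 0.0, main_kwh)

A cooling estimate could be sketched analogously, using cooling degrees above the balance point; more refined methods would of course be needed to meet the accuracy and generalization goals of the challenge.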

    Data Sets

    The companies sponsoring this competition (see logos at the bottom of this page) have provided datasets from buildings in which they have also installed a comprehensive sub-metering system, i.e. from buildings that are the exception rather than the rule in the building stock. The “ground truth” from the sub-metering of heating and cooling loads will not be disclosed to the participants but will be used to evaluate the quality of the submitted solutions, according to the evaluation criteria described below. Only the main meter data, and weather data from the same location, will be provided in this challenge. From these data the participants will have to disaggregate the heating and/or cooling loads.

     

    The main characteristics (or metadata) for these datasets are summarised in the table below. Considering the different time resolutions, there are 14 datasets in total.

    The participants should bear in mind that both the buildings themselves and their occupants’ behaviour are complex and varied, and that even the best sub-metering system therefore has some limitations. For example, some non-heating energy use may show a certain seasonality (e.g. lighting), and an individual plug-in heating device may occasionally be in use without its consumption being captured by the sub-metering of boilers, heat pumps, district heating heat exchangers, etc. Such features are part of real-life conditions, and the challenge for the disaggregation algorithm is to match the “ground truth” known from the sub-metering system as closely as possible.


    The use of external data is allowed in this competition, as long as the data are free and publicly available. Competitors must ensure that all external data they use are freely available to all participants, and must post details of how to access the data on the competition forum before the end of the training phase.
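
    Purely as an illustration of working with such files, the Python sketch below loads a main meter series and a weather series and aligns them on the meter’s time index. The file names, column names, and resampling choices are hypothetical; the actual names and time resolutions are those defined in the provided datasets.

# Hypothetical loading/alignment sketch; the real file and column names are
# defined by the provided datasets, not by this example.
import pandas as pd

# Main meter readings and local weather for one building at one time resolution.
main = pd.read_csv("building_A_1h_main.csv", parse_dates=["timestamp"], index_col="timestamp")
weather = pd.read_csv("building_A_weather.csv", parse_dates=["timestamp"], index_col="timestamp")

# Bring the weather onto the meter's timestamps (hourly means, gaps forward-filled).
weather_on_meter_index = weather.resample("1h").mean().reindex(main.index).ffill()

# One table per building/resolution, ready for disaggregation experiments.
data = main.join(weather_on_meter_index)
print(data.head())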


    Participation and Submission

    I     Register at the Adrenalin competition dashboard (Codalab)

     

    1. To register a Codalab account, go to https://codalab.lisn.upsaclay.fr/accounts/signup
    2. Go to the competition’s page.
    3. Navigate to the “Participate” tab to accept the terms and conditions and register for the competition.

     

    II    Access the dataset

     

    1. The dataset can be accessed through Codalab, under “Participate”.
    2. The available datasets can be found under the Files tab.
    3. Download the “Public Data”, a zip file containing the data described in the Data Sets section.
    4. When Stage 2 starts, its dataset will be available under the same name. The stage it belongs to is indicated on the left.
    5. The starting kit is also available here, containing a Documentation template and an example submission file.

     

    III   Model development and result calculation

     

    1. Model development takes place locally.
    2. Only the results are required to get a score on the leaderboard.

     

    IV   Submission

               

    1. To submit your results, navigate to the “Participate” tab on the competition’s Codalab website.
    2. Click the “submit” button and select the file you want to upload.
    3. The uploaded file must be a zip file containing a solution file for every building. The file names should be the building identifiers and the data frequencies, as used in the public dataset. Each file should be a CSV with two columns: the time stamps and the disaggregation results (see the packaging sketch after this list).
    4. During Stage 2, the submissions must also include detailed documentation of the approach, following the documentation template.
    5. Once submitted, the results will be evaluated. Submissions can be expanded with the “+” button to access the output files. During the first stage, your best score will always be visible on the Leaderboard, accessible via the “Results” tab.
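
    As a hedged illustration of how such a zip file could be assembled, the Python sketch below packs one CSV per building and data frequency with the two required columns. The file names and column labels shown are placeholders; the authoritative naming is the one used in the public dataset.

# Illustrative packaging of a submission zip: one CSV per building and data
# frequency, each with two columns (time stamp, disaggregation result).
# The names used here are placeholders, not the official naming convention.
import zipfile

import pandas as pd

def write_submission(results, zip_path="submission.zip"):
    """results: dict mapping a solution file name (e.g. "building_A_1h.csv",
    hypothetical) to a DataFrame with columns ["timestamp", "prediction"]."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for file_name, df in results.items():
            # Each solution file is written as CSV text directly into the zip.
            zf.writestr(file_name, df.to_csv(index=False))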


    NOTICE: By submitting your results, you agree to make your algorithm available as open source under the BSD-3 license (https://opensource.org/license/bsd-3-clause) in the event that you are selected as one of the winners.


    Evaluation Criteria

    The evaluation will be based on both quantitative and qualitative criteria.

    Quantitative evaluation

    The quantitative evaluation will use the Normalised Mean Absolute Error (NMAE) as the metric for ranking the submissions, where the lower the NMAE the better. This metric is calculated as follows for each individual dataset (j), where datasets at different time resolutions from the same building are considered separate datasets:

    $$\mathrm{NMAE}_j = \frac{1}{n\,\bar{y}} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$

    Where:   yᵢ = measurement values (the undisclosed “ground truth”)
             ŷᵢ = predicted values (the results of your disaggregation)
             ȳ  = average of the measurement values
             n  = number of datapoints

     

    All the individual dataset values NMAEⱼ are then averaged to give the total NMAE as follows:

    $$\mathrm{NMAE} = \frac{1}{m} \sum_{j=1}^{m} \mathrm{NMAE}_j$$

    Where:   m = number of datasets (where datasets at different time resolutions from the same building are considered separate datasets)
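
    For local self-checking during development, the metric can be reproduced along the lines of the Python sketch below (the sub-metered “ground truth” is of course not available to participants, so this is only useful against your own held-out or synthetic reference data):

# NMAE as defined above: mean absolute error normalised by the mean of the
# measurement values, then averaged over all datasets.
import numpy as np

def nmae(y_true, y_pred):
    """NMAE_j for a single dataset (one building at one time resolution)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)) / np.mean(y_true))

def total_nmae(per_dataset_scores):
    """Total NMAE: the plain average of the individual NMAE_j values."""
    return float(np.mean(per_dataset_scores))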


    For a submission to be eligible for the prize money, a solution must be submitted for each dataset and the total NMAE must be < 0.5.


    Qualitative evaluation

    The qualitative evaluation will consider the practical applicability of the load disaggregation algorithm, including data requirements, understandability, and computational efficiency.

     

    Practical applicability is defined as follows:

    • Data requirements: The algorithm must only depend on the data provided by the competition.
    • Understandability: The algorithm’s decision-making process should be understandable by domain experts.
    • Computational efficiency: The algorithm should require reasonable computational resources, making it feasible for large-scale deployment.

     

    The qualitative evaluation is based on the documentation that the participants are required to deliver together with their submissions. The documentation must be detailed enough for the solution to be reproduced by third parties. Please see the report template in the starting kit. If the evaluation committee considers the quality of the delivered documentation to be insufficient, the submission will be disqualified.


    Timeline

    • Training stage (Phase I)

      The training period will last for 2 months. In this period, the participants will be given a chance to familiarize themselves with the competition environment, training data, and documentation. This phase also allows participants to ask questions and raise issues using the forum on the Codalab page. There will be a public dashboard showing the scoring of each submission (a submission must be complete, with a solution for each and every dataset). There is no limit on how many times a participant may submit their solutions, and their best score will always be visible on the dashboard.

     

    • Competition stage (Phase II)

      The competition stage will last 5 days. For this stage, new datasets will be released for the same buildings but for different periods of time. The quantitative evaluation will be based solely on these new datasets. In this phase each participant may submit a maximum of 5 solutions. The best score will be retained for ranking and evaluation and will be visible on the public dashboard.


    Sponsors

    The prizes of this competition are sponsored by the Danish Energy Technology Development and Demonstration Programme (EUDP) fund.


    ReMoni