Behind the Paper

Revealing the picture behind the construction of China’s low-carbon policy intensity dataset

Based on a phrase-oriented NLP algorithm and text-based prompt learning, our new article in Scientific Data presents China’s low-carbon policy intensity dataset. It covers the national level, 31 provinces, and 334 prefecture-level cities, with sub-intensities for 3 policy levels, 4 objectives, and 3 instruments.

Published in Research Data

Feb 23, 2024

Xinyang Dong and Can Wang

Why did we build the dataset?

Low-carbon policies are essential for facilitating the low-carbon transformation of manufacturing industries and achieving carbon neutrality in China. However, recent studies usually rely on proxy variables to quantify low-carbon policies, while composite indices of policy intensity measured by objectives and instruments focus mainly on the national level. Direct and comprehensive quantification of low-carbon policies is therefore lacking. Moreover, for a dataset with few samples and high manual labelling costs, relying on traditional fully manual policy scoring or on the pre-train and fine-tune paradigm could result in large human bias and overfitting. A new paradigm is thus needed for low-carbon policy evaluation and quantification.

The contribution of this dataset

Using the phrase-oriented NLP algorithm and text-based prompt learning, our latest paper in Scientific Data builds a low-carbon policy inventory of 7,282 policies and constructs the low-carbon policy intensity for China’s manufacturing industries from 2007 to 2022. The contributions are threefold.

From the methodological perspective, we provide a new paradigm with transparent and repeatable policy evaluation processes. Fully manual scoring is commonly used for policy evaluation, with the scoring criteria explained in the articles. Even so, the path from policy text to policy score remains a “black box” for readers, since most of the evaluation takes place in the researchers’ own minds. This study therefore opens the “black box” with Python and Stata code that implements the phrase-oriented NLP algorithm and text-based prompt learning. Because the process is broken into steps, each policy evaluation stage can be quantified and repeated.

From the data perspective, this paper constructs a comprehensive dataset that reflects the heterogeneity of policy intensity across three policy levels, four policy objectives, and three policy instruments. Previous studies concentrate on a single policy level or on specific regions, so they cannot fully reflect overall policy changes. This is why the article and dataset include 7,282 low-carbon policies promulgated at the national level, by 31 provinces, and by 334 prefecture-level cities.

From the theoretical perspective, the paper expands the meaning of low-carbon policy intensity by adding policy level as a factor to the “policy objective–policy instrument” pattern. Hence, the low-carbon policy intensity in this paper is quantified by multiplying each policy’s objective, instrument, and level.
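The multiplication can be illustrated with a minimal Python sketch. It is not the code released with the paper: the `PolicyScore` fields, the toy score values, and the choice to aggregate by summing over region and year are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyScore:
    """Scores for one policy. Field names and scales are hypothetical."""
    region: str
    year: int
    objective: float   # assumed score for the policy objective
    instrument: float  # assumed score for the policy instrument
    level: float       # assumed score for the policy level


def policy_intensity(p: PolicyScore) -> float:
    """Intensity of one policy as the product of its three scores."""
    return p.objective * p.instrument * p.level


def aggregate_by_region_year(policies):
    """Sum policy intensities per (region, year).

    Summation is an assumption of this sketch, not a rule stated in the post.
    """
    totals = defaultdict(float)
    for p in policies:
        totals[(p.region, p.year)] += policy_intensity(p)
    return dict(totals)


# Toy records with invented values, used only to exercise the functions.
toy = [
    PolicyScore("Region A", 2020, objective=3, instrument=2, level=4),
    PolicyScore("Region A", 2020, objective=1, instrument=1, level=2),
    PolicyScore("Region B", 2021, objective=2, instrument=5, level=1),
]
totals = aggregate_by_region_year(toy)
```

Because the three scores are multiplied, a low score on any one of them lowers the intensity of the whole policy.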

Stories behind the dataset construction

Preparing the low-carbon policy texts as model input took longer than training the model. Because this paper applies a supervised prompt learning model, the quality of the training dataset directly influenced the accuracy of the predictions. Hence, more than 60% of the time was spent on constructing the low-carbon policy inventory, structuring and classifying the policies, and drafting the labelling criteria. At this stage, the most important task was to make explicit, through code, the preparation steps that would otherwise be carried out in the researcher’s head. For example, when classifying and scoring policy objectives, we had to decide which subsections of the structured policy texts to focus on (i.e., selecting the key subsection in policy texts) and which sentences contained keywords that can measure the intensity of low-carbon policy texts (i.e., identifying essential scoring and classifying phrases).
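The two decisions in that example, selecting a key subsection and identifying scoring phrases, can be sketched in Python as follows. This is a simplified illustration rather than the phrase-oriented algorithm from the paper: the subsection names, the phrase list, and the English toy text are all hypothetical.

```python
import re

# Hypothetical choices for this sketch; the paper's actual criteria differ.
KEY_SUBSECTIONS = ("objectives",)
SCORING_PHRASES = ("reduce", "by at least", "mandatory")


def split_sentences(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_scoring_sentences(policy, key_subsections=KEY_SUBSECTIONS,
                           phrases=SCORING_PHRASES):
    """Return (subsection, sentence, matched phrases) for sentences in the
    key subsections that contain at least one scoring phrase."""
    hits = []
    for name in key_subsections:
        for sentence in split_sentences(policy.get(name, "")):
            lowered = sentence.lower()
            matched = [p for p in phrases if p in lowered]
            if matched:
                hits.append((name, sentence, matched))
    return hits


# A structured policy text as a mapping from subsection name to content.
toy_policy = {
    "background": "This notice follows earlier guidance. "
                  "It applies to all manufacturers.",
    "objectives": "Enterprises shall reduce energy use per unit of output "
                  "by at least 10 percent. "
                  "Local bureaus are encouraged to share experience.",
}
hits = find_scoring_sentences(toy_policy)
```

Here only the first sentence of the `objectives` subsection is returned, and the `background` subsection is never scanned. The matched sentences and phrases would then be passed on to the scoring and classification steps.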

Furthermore, small- and medium-scale “pilot work” was conducted before model training. Because one application of China’s low-carbon policy intensity is in quantitative studies, we needed to verify closely whether the constructed index is reliable and can be used as a policy variable. Therefore, in addition to validating the whole dataset, we ran the data validation process and a correlation analysis with carbon performance after finishing the manual labelling for 4, 8, and 16 provinces, respectively. These checks on the small- and medium-scale “pilot work” further safeguarded the quality of the policy index.

The data record and access to this dataset

Our dataset is intended for researchers concerned with low-carbon policies and is provided in two formats (.dta and .xlsx). Besides the inventory and the intensity of each policy, policy intensity is also aggregated to the national, provincial, and prefecture levels, with sub-intensities for four policy objectives (i.e., carbon reduction, energy conservation, capacity utilization, and technology) and three policy instruments (i.e., command-and-control, market-based, and composite instruments). All data and code are publicly available in the Figshare repository, so data users can merge the dataset with macro- and micro-data for extended analysis and examine the impact of low-carbon policies from different perspectives.
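As a rough illustration of such a merge, the sketch below left-joins toy intensity rows with toy macro rows on province and year, using only the Python standard library. The column names (`province`, `year`, `intensity`, `industrial_output`) and all values are assumptions, not the variable names or figures in the released files. With the real data, the .dta or .xlsx tables would first be loaded with a suitable reader.

```python
def left_join(left_rows, right_rows, keys):
    """Left-join two lists of dict rows on the given key columns.

    Every left row is kept; columns from the right table are filled with
    None when no matching row exists.
    """
    index = {tuple(r[k] for k in keys): r for r in right_rows}
    extra_cols = [c for c in right_rows[0] if c not in keys] if right_rows else []
    merged = []
    for row in left_rows:
        match = index.get(tuple(row[k] for k in keys))
        out = dict(row)
        for col in extra_cols:
            out[col] = match[col] if match else None
        merged.append(out)
    return merged


# Toy rows with invented values and hypothetical column names.
intensity_rows = [
    {"province": "P1", "year": 2020, "intensity": 12.5},
    {"province": "P2", "year": 2020, "intensity": 8.0},
]
macro_rows = [
    {"province": "P1", "year": 2020, "industrial_output": 100.0},
]
merged = left_join(intensity_rows, macro_rows, keys=("province", "year"))
```

A left join keeps every province-year in the intensity table, so the gaps in the macro data stay visible as `None` instead of silently dropping observations.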
