Obtaining fisheries datasets can be quite challenging, and this is even more the case for fisheries inspection datasets. The complexity arises because these datasets are collected by governmental institutions that should protect the identities of the inspected vessels, as well as the vessels conducting the inspection. Confidentiality is necessary to protect sensitive information, which unfortunately makes access to such data for research particularly difficult.
In an effort that involved overseeing two master's theses at the Portuguese Naval Academy, it was possible not only to aggregate, pre-process, and cross-reference data to create a comprehensive database from 2015 to 2023, but also to make the necessary modifications for its public release.
The collection of data for the "Fisheries Inspection in Portuguese Waters from 2015 to 2023" dataset¹ was conducted by the Portuguese Navy through standard Fiscalization Reports (FISCREP). These reports include the identification and type of the vessel, the fishing gear in use at the time of inspection, and compliance with fishing regulations, among other variables. Pre-processing involved extensive data validation and integration with existing datasets from the Directorate-General for Natural Resources, Safety, and Maritime Services, the United Nations Code for Trade and Transport Locations, and the European Union Fleet Register. This cross-referencing aimed to create the most comprehensive database possible, so that analysis can be carried out without consulting external databases.
The data protection strategies detailed in the paper included anonymizing the dataset to protect the privacy of those involved in fisheries inspections, which is crucial for meeting legal standards and safeguarding sensitive data. Techniques such as rounding values and adding random noise helped anonymize data points. Confidentiality was verified using the Sample Frequency Count and the Population Frequency Count, looking specifically for unique records (both counts equal to one), which pose the highest disclosure risk.
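To make the idea concrete, here is a minimal Python sketch of the two steps described above: perturbing values with noise and rounding, then counting how often each combination of identifying attributes occurs. It is an illustration under assumptions, not the procedure used in the paper. The field names (`lat`, `lon`, `gear`, `port`), the noise scale, the rounding precision, and the idea of comparing against a separate population list are all hypothetical choices made for this example.

```python
import random
from collections import Counter


def anonymize(records, noise_scale=0.05, decimals=1, seed=0):
    """Return copies of the records with noisy, rounded coordinates.

    'lat' and 'lon' are hypothetical field names chosen for this sketch.
    """
    rng = random.Random(seed)
    result = []
    for rec in records:
        masked = dict(rec)  # copy, so the original record is left untouched
        for field in ("lat", "lon"):
            noise = rng.uniform(-noise_scale, noise_scale)
            masked[field] = round(rec[field] + noise, decimals)
        result.append(masked)
    return result


def frequency_counts(records, keys):
    """For each record, count the records sharing its values on `keys`."""
    combos = [tuple(rec[k] for k in keys) for rec in records]
    counts = Counter(combos)
    return [counts[combo] for combo in combos]


def unique_records(sample, population, keys):
    """Return sample records whose sample AND population counts both equal one."""
    sample_counts = frequency_counts(sample, keys)
    population_counts = Counter(tuple(rec[k] for k in keys) for rec in population)
    risky = []
    for rec, sample_count in zip(sample, sample_counts):
        population_count = population_counts[tuple(rec[k] for k in keys)]
        if sample_count == 1 and population_count == 1:
            risky.append(rec)
    return risky
```

A record returned by `unique_records` is unique both in the released sample and in the reference population, which is the highest-risk case described above. In practice such records would need further generalization or suppression before release.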
While protecting data privacy, the authors also assessed the data quality. This involved evaluating the dataset's integrity by examining variable distributions before and after anonymization to ensure they retained similar statistical characteristics. Correlation metrics were also used to evaluate how transformations affected variable relationships.
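A check of this kind can be sketched in a few lines of Python. The example below compares simple summary statistics and the Pearson correlation of two variables before and after masking. It is only an illustration: the function names, the choice of statistics, and the synthetic data in the usage lines are assumptions made here, and the paper may rely on different metrics.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return covariance / (spread_x * spread_y)


def utility_report(before_x, before_y, after_x, after_y):
    """Summarize how much masking changed one variable and its relationship to another."""
    return {
        "mean_shift": abs(statistics.fmean(after_x) - statistics.fmean(before_x)),
        "stdev_ratio": statistics.stdev(after_x) / statistics.stdev(before_x),
        "corr_before": pearson(before_x, before_y),
        "corr_after": pearson(after_x, after_y),
    }


# Usage with synthetic data (hypothetical values, not taken from the dataset).
rng = random.Random(42)
x = [rng.gauss(10, 2) for _ in range(500)]
y = [2 * value + rng.gauss(0, 1) for value in x]
x_masked = [round(value + rng.uniform(-0.1, 0.1), 1) for value in x]
report = utility_report(x, y, x_masked, y)
print(report)
```

A mean shift near zero, a standard deviation ratio near one, and similar correlations before and after masking suggest that the anonymized variable retains its usefulness for analysis.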
In summary, the process of collecting and protecting fisheries inspection data, particularly in Portuguese waters from 2015 to 2023, posed significant challenges due to the need for confidentiality and data accuracy. However, through careful aggregation, pre-processing, and cross-referencing efforts overseen by the authors, a comprehensive database was successfully compiled. This involved rigorous validation and anonymization techniques to safeguard privacy while ensuring the dataset's integrity for analysis. The resulting dataset not only facilitates robust research but also underscores the importance of balancing data protection with scientific inquiry in fisheries management and related fields.
Note: The dataset contributes to the Mar-IA² project, dedicated to maritime Artificial Intelligence. This platform emphasizes data governance and value extraction through data science and AI techniques. Its goal is to establish a national data governance model and maximize value with the help of stakeholders' collective intelligence in the maritime sector.
1 Moura, R., Pessanha Santos, N., Vala, A. et al. Fisheries Inspection in Portuguese Waters from 2015 to 2023. Sci Data 11, 362 (2024). https://doi.org/10.1038/s41597-024-03088-4
2 For more information and to access the dataset, you can visit the Mar-IA project website: https://mar-ia.pt/