Intended audience

This document is aimed at everyone working in the Gov4Nano, NanoRIGO and RiskGONE projects.

DOI

The Joint glossary of key terms related to data management and knowledge infrastructure / portal / platform summarises the activities of the Core Group on Data Management, organised jointly by the three projects funded under the H2020 NMBP-13 call: Gov4Nano, NANORIGO and RiskGONE. The activities reported here took place during the period January-June 2020 as part of the ongoing work of the data management core group, and form one of the two parallel activities on which this core group focussed over this period (the other will be reported separately in Joint Milestone (JM) 2 on prioritisation of databases to be made interoperable).

Introduction

A key stumbling block to collaboration within and across projects is unintentional or unacknowledged differences in the understanding of key terminology, which can arise from language barriers or from slight disciplinary differences. Formal ontologies, such as the eNanoMapper ontology, can help here, as they are directed towards harmonisation and organisation of datasets as well as supporting day-to-day use of terms. Similarly, standardisation organisations such as ISO have definitions for much of the terminology related to nanotechnologies and nanomaterials; consistent use of these definitions is essential, but they are not always applied in practice.

To overcome the challenge of consistent use of terminology, several projects have each established a glossary of terms for use within the project. For example, the FP7 NANoREG project developed and published the 'NANoREG Harmonised Terminology', aligned as far as possible with already existing definition text(s), thus avoiding the creation of new and unwelcome information. The list includes terms with international regulatory relevance, such as those defined at OECD level, as well as terms that have a specific meaning and use under REACH legislation.

In order to support collaboration and consensus across the three NMBP-13 projects, and to facilitate the work of the four “Core Groups” spanning the Risk Governance Council and Framework, the Nano-Risk Governance Portal – Tools and Instruments, Stakeholder Involvement, and ourselves as the Data Management Core Group, it was agreed that we would establish a cross-project Glossary of terms to provide clarity on several important issues.

Methodology

Based on experience from previous projects, the data management core group agreed that an Excel-based system with one sheet per Core Group would be a good starting point. This spreadsheet has been set up in the Teams folder for the Core Groups, and each Core Group is free to add terms to it.

As part of the Data Management Core group activity, we identified existing ontologies or relevant glossaries that had defined key terms already, in order to avoid unnecessary duplication of effort. For example, the EDAM ontology for Bioinformatics operations, types of data, data identifiers, data formats, and topics was identified as a potential source of existing definitions for data management terms such as data warehousing, data stewardship and governance etc.

In addition to the aforementioned NANoREG Harmonised Terminology, we have also aligned with the efforts of the H2020 GRACIOUS project, which has established a Wiki approach to support harmonisation of terminology for Integrated Approaches to Testing and Assessment (IATA), grouping and read-across. This effort builds upon activities from the H2020 project NanoCommons and others.

In June 2020, amid growing awareness that a multitude of ontology activities are going on independently across the NanoSafety Cluster, NanoCommons suggested setting up a Task Force to integrate and align these efforts, and to address the issues with the structure of the eNanoMapper ontology that had driven GRACIOUS, the US NIKC and other projects to develop their own glossaries and approaches separately. The Core Data Management Group will play a central role in this activity and will support the integration of the harmonised terminology from the Core Groups into the revised eNanoMapper ontology.

Outcomes

The NMBP-13 projects' Glossary of Terms is a living document that will continue to be updated and extended by the Core Groups, the WPs and the partners, with the final goal of integrating the terms into the revised and extended eNanoMapper-NanoCommons ontology.

The Excel-based Glossary is available and was shared with the other Core Groups at the joint meeting on 18th May 2020. To start the process of filling it, the Data Management Core Group pre-filled examples of general terms relating to each of the Core Groups, together with key terms identified by NanoRIGO’s work on database harmonisation, which found toxicology, hazard and several other terms to be potentially misused or misinterpreted. Links to existing ontology terms and definitions are also included, as well as links to key references and resources, such as the Harmonised Glossary of Terms for Toxicology and the JRC Harmonised terms.

This spreadsheet has been converted into the section "The glossary" below.
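A conversion of this kind could be scripted. The following is only a minimal sketch, assuming that one sheet of the spreadsheet has been exported to CSV with the columns "Term", "Definition" and "Superclass IRI"; the sample rows are copied from the glossary, while the function names and the export step are hypothetical and do not describe the procedure actually used by the Core Group.

```python
import csv
import io

# Hypothetical CSV export of one sheet of the Excel glossary. The column
# names are an assumption based on the table headers in this document.
SHEET_CSV = """Term,Definition,Superclass IRI
Databank,A flat-file (textual) data archive.,data_2831
Cloud,"Server or network of servers, accessed remotely.",
Platform,,
"""


def sheet_to_rows(csv_text):
    """Parse one exported sheet into sorted (term, definition, iri) tuples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for record in reader:
        term = (record.get("Term") or "").strip()
        if not term:
            continue  # skip spreadsheet rows without a term
        definition = (record.get("Definition") or "").strip()
        iri = (record.get("Superclass IRI") or "").strip()
        rows.append((term, definition, iri))
    # Sort alphabetically by term, ignoring case.
    return sorted(rows, key=lambda row: row[0].lower())


def render_rows(rows):
    """Render the rows as plain-text lines, one glossary entry per line."""
    header = "Term | Definition | Superclass IRI"
    return [header] + [" | ".join(row) for row in rows]


glossary_lines = render_rows(sheet_to_rows(SHEET_CSV))
print("\n".join(glossary_lines))
```

Entries without a definition or identifier (such as "Platform") are kept with empty cells, so that gaps in the spreadsheet remain visible in the rendered text.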

Figure 1 shows a screenshot of the Data Management Core Group's initial terms, their definitions and the links to the existing ontology terms.


Figure 1: Screenshot of the initial Data management core group Glossary of terms, including the link to the existing Ontology term.

How to extend the Glossary

Core groups are initially invited to add their terms, and then the data management core group will identify the relevant ontology term, and confirm that the associated definition meets the needs of the Core group.

WP leaders from across the three projects will then also be invited to access the Glossary, to check for terms used in their day-to-day activities and to add further terms as needed. The goal is not to add every single term, but to focus on those terms where there is potential for confusion, misuse or misunderstanding as to the specific meaning of the term in the specific project context.

How to use the Glossary

The Glossary supports harmonisation of terminology to aid discussions within and between the Core Groups. In addition, harmonised terminology leads to a better understanding of the content under discussion. This better understanding of the semantics within the projects carries through to consistent and harmonised deliverable reports and project outputs, and improves their overall quality. A key step in the internal review of all deliverables will be cross-checking them for consistency with the terminology from the glossary. This can easily be achieved by adding the key terms and their definitions alongside the list of abbreviations in all deliverable reports.
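As an illustration of how such a cross-check could be supported, the sketch below lists which glossary terms occur in a piece of deliverable text, so that their definitions can be placed next to the list of abbreviations. It is a minimal example only: the term list is a small subset of the glossary, the sample sentence is invented, and the function name is hypothetical.

```python
import re

# Illustrative subset of glossary terms; in practice the list would be
# taken from the Excel glossary.
GLOSSARY_TERMS = ["Database", "Databank", "Ontology", "Key Event", "Portal"]


def terms_used(text, terms):
    """Return the terms that occur in text as whole words or phrases.

    Matching ignores case. Plurals and other inflected forms are not
    matched, which is a known limitation of this simple approach.
    """
    found = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


# Invented sample sentence standing in for the text of a deliverable.
sample = (
    "Each key event is annotated with an ontology term "
    "and stored in the project database."
)
used = terms_used(sample, GLOSSARY_TERMS)
print(used)
```

The reviewer still has to judge whether each term found is used in the sense given by the glossary; the script only narrows down where to look.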

While it was decided that the Glossary would be Excel-based, the associated ontologies will be used for harmonised annotation of project datasets, both those curated from the literature and those generated within the projects.
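What such an annotation might look like for a single dataset can be sketched as follows. The identifiers are those listed for the corresponding Data Management terms in this glossary; the dataset field names, the mapping chosen by the curator and the function name are hypothetical and serve only to illustrate the idea.

```python
# Short identifiers as listed in the "Superclass IRI" column of the
# Data Management terms in this glossary.
GLOSSARY_IDS = {
    "Database": "SIO_000750",
    "Format": "format_1915",
    "Identifier": "data_0842",
}

# Hypothetical dataset fields, each mapped by a curator to a glossary term.
FIELD_TO_TERM = {
    "source_db": "Database",
    "file_type": "Format",
    "sample_id": "Identifier",
    "operator": "Person",  # deliberately not a glossary term
}


def annotate_fields(field_to_term, glossary_ids):
    """Attach glossary identifiers to dataset fields.

    Returns the annotated fields and the list of fields whose term is
    not (yet) in the glossary, so that these can be reviewed by hand.
    """
    annotated = {}
    unresolved = []
    for field, term in field_to_term.items():
        if term in glossary_ids:
            annotated[field] = {"term": term, "id": glossary_ids[term]}
        else:
            unresolved.append(field)
    return annotated, unresolved


annotated, unresolved = annotate_fields(FIELD_TO_TERM, GLOSSARY_IDS)
print(annotated)
print("Needs review:", unresolved)
```

Fields that cannot be resolved are reported rather than silently dropped, since they are candidates for new glossary entries.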

Furthermore, semantics, ontologies and glossaries will be used consistently by the three projects, as these tools aid individuals, teams and the projects as a whole. This approach will be useful outside the field of toxicology as well. Although the use of semantics is surging, these methodologies should be applied more often so that more people become familiar with the concepts. In addition, use of the Glossary has proved beneficial in supporting collaboration and consensus across the three NMBP-13 projects.

The glossary

This document does not replace the Excel spreadsheet glossary; it is only a rendering of it. The Excel spreadsheet is normative and this document is informative.

Data Management terms

Term | Definition | Superclass IRI
Annotation | This is a broad data type and is used as a placeholder for other, more specific types. A human-readable collection of information which (typically) is generated or collated by hand and which describes a biological entity, phenomena or associated primary (e.g. sequence or structural) data, as distinct from the primary data itself and computer-generated reports derived from it. |
Anonymisation | Process data in such a way that makes it hard to trace to the person which the data concerns. | operation_3283
Classification | Topic focused on identifying, grouping, or naming things in a structured way according to some schema based on observable relationships. | topic_2230
Cloud | Server or network of servers, accessed remotely. |
Cloud computing | Storing and processing data on multiple servers that can be accessed through the Internet. | D000067917
Core data | A type of data that (typically) corresponds to entries from the primary biological databases and which is (typically) the primary input or output of a tool, i.e. the data the tool processes or generates, as distinct from metadata and identifiers which describe and identify such core data, parameters that control the behaviour of tools, reports of derivative data generated by tools and annotation. Core data entities typically have a format and may be identified by an accession number. | data_3031
Data handling | Basic (non-analytical) operations of some data, either a file or equivalent entity in memory, such that the same basic type of data is consumed as input and generated as output. |
Data reference | Reference to a dataset (or a cross-reference between two datasets), typically one or more entries in a biological database or ontology. A list of database accessions or identifiers are usually included. | data_2093
Database | A digital data archive typically based around a relational model but sometimes using an object-oriented, key-value, tree or graph-based model. | SIO_000750 ("database")
Databank | A flat-file (textual) data archive. | data_2831
Database cross-mapping | The cross-mapping is typically a table where each row is an accession number and each column is a database being cross-referenced. The cells give the accession number or identifier of the corresponding entry in a database. If a cell in the table is not filled then no mapping could be found for the database. Additional information might be given on version, date etc. A mapping of the accession numbers (or other database identifier) of entries between (typically) two biological or biomedical databases. | data_0954
Format | A defined way or layout of representing and structuring data in a computer file, blob, string, message, or elsewhere. The main focus in EDAM lies on formats as means of structuring data exchanged between different tools or resources. The serialisation, compression, or encoding of concrete data formats/models is not in scope of EDAM. Format 'is format of' Data. | format_1915
Framework | A set of elements (e.g. ideas, best practices, regulatory provisions) organised in a conceptual manner, which constitute a frame of reference for a certain topic or issue. |
Harmonisation | The term 'harmonisation' can be defined as the establishment of a common and coherent basis in a certain field/activity or for a certain scope. |
ID list | A simple list of data identifiers (such as database accessions), possibly with additional basic information on the addressed data. | data_2872
Identifier | A text token, number or something else which identifies an entity, but which may not be persistent (stable) or unique (the same identifier may identify multiple things). | data_0842
Identifier with metadata | Basic information concerning an identifier of data (typically including the identifier itself). For example, a gene symbol with information concerning its provenance. | data_2767
Model | A mathematical model is the use of mathematical language to describe the behaviour of a system. A mathematical model usually describes a system by a set of variables and a set of equations that establish relationships between the variables. The variables represent some properties of the system, for example, measured system outputs often in the form of signals, timing data, counters, event occurrence (yes/no). The actual model is the set of functions that describe the relations between the different variables. [source: WordIQ online dictionary] | SBO_0000004
Ontology | An ontology of biological or bioinformatics concepts and relations, a controlled vocabulary, structured glossary etc. | data_0582
Parsing | Parse, prepare or load a user-specified data file so that it is available for use. | operation_1812
Platform | |
Portal | A Resource that provides a point of access to information on the World Wide Web, presenting information from diverse sources in a unified way. | Portal
Software | A set of coded instructions, which a computer follows in processing data, performing an operation, or solving a logical problem, upon execution of the program. | C17146
Tool | A bioinformatics package or tool, e.g. a standalone application or web service. | data_0007

Council terms

Term | Definition | Superclass IRI
Risk governance | |
Portal | A Resource that provides a point of access to information on the World Wide Web, presenting information from diverse sources in a unified way. |

Modelling terms

Term | Definition | Superclass IRI
Atomistic model | |
Computational model | |
Continuum model | Continuum theories or models explain variation as involving a gradual quantitative transition without abrupt changes or discontinuities. It can be contrasted with 'categorical' models which propose qualitatively different states. |
Electronic model | |
Meta-model | |
Model | A mathematical model is the use of mathematical language to describe the behaviour of a system. A mathematical model usually describes a system by a set of variables and a set of equations that establish relationships between the variables. The variables represent some properties of the system, for example, measured system outputs often in the form of signals, timing data, counters, event occurrence (yes/no). The actual model is the set of functions that describe the relations between the different variables. [source: WordIQ online dictionary] | SBO_0000004

Hazard terms

These terms originate from https://www.ecetoc.org/sr_19/preface/definitions/.

Term | Definition | Superclass IRI
Adverse Outcome | A specialised type of key event (KE), measured at a level of organisation that corresponds with an established protection goal and/or is functionally equivalent to an apical endpoint measured as part of an accepted guideline test. |
Adverse Outcome Pathway | A conceptual framework that organises existing knowledge concerning biologically plausible, and empirically supported, links between molecular-level perturbation of a biological system and an adverse outcome at a level of biological organisation of regulatory relevance. |
Apical Endpoint | Traditional, directly measured, adverse whole-organism outcomes of exposure in in vivo tests. |
Integrated Approaches to Testing and Assessment | A structured approach that strategically integrates and weights all relevant data to inform regulatory decisions regarding potential hazard and/or risk and/or the need for further targeted testing and therefore optimising and potentially reducing the number of tests that need to be conducted. |
Key Event | A measurable change in biological state that is essential, but not necessarily sufficient, for the progression from a defined biological perturbation toward a specific adverse outcome. |
Key Event Relationship | A scientifically-based relationship between a pair of KEs, identifying one as upstream and the other as downstream. |
Molecular Initiating Event | A specialised type of KE, defined as the point where a chemical directly interacts with a biomolecule within an organism to create a perturbation that starts the AOP – as such, by definition, it occurs at the molecular level. |
Mode Of Action | A biologically plausible sequence of key events leading to an observed effect supported by robust experimental observations and mechanistic data. |
Toxicity pathway | Perturbation of a normal biochemical pathway from the molecular initiating event to the cellular effect. |

Exposure terms

Term | Definition | Superclass IRI

Acknowledgments

We thank the reviewers Janeck James Scott-Fordsmand, Monique Groenewold, and Maria Dusinska.

Funding

This work received funding from the European Union’s Horizon 2020 research and innovation programme via NanoRIGO Project under grant agreement No 814530, RiskGONE Project under grant agreement No 814425, and via Gov4Nano under grant agreement No 814401.

List of abbreviations

FP7
Framework Programme 7 (funding programme of the European Commission, 2007-2013)
H2020
Horizon 2020 (funding programme of the European Commission, 2014-2020)
ISO
International Organization for Standardization
JM
Joint Milestone
JRC
Joint Research Centre (of the European Commission)
OECD
Organisation for Economic Co-operation and Development
REACH
Registration, Evaluation, Authorisation and Restriction of Chemicals