Open, Reliable and Transparent Data

Iain R. Moodie

Stockholm Mini-symposium

2024-02-28

A brief anecdote

Sexual selection in plants

Pollen tubes interacting with pistil tissues - Jeanne Tonnabel
  • Bateman gradients in angiosperms
    • N = 2 (in 2021)
  • Project goal
    • Conduct a meta-analysis đŸ€”
  • Find datasets that could be re-analysed in this new context
  • Combine into a meta-analysis to test predictions

Sexual selection in plants

  • Initial search
    • N=2167 😊
  • After sorting
    • N=30 đŸ„č
  • After trying to source data
    • N=9 😐

Photo by Annie Spratt

Datasets we couldn’t use

  • Data not archived
    • No way to contact author
    • No response to contact
    • Data had been lost
    • Not willing to share data
  • Data archived
    • Inaccessible
    • Incomplete
    • Incomprehensible

Lost from science

Exxon Valdez oil spill 1989

  • 40.8 million litres of crude oil spilled
  • Settlement funds from Exxon used for research and monitoring the impacts of the spill
  • Between 1989 and 2010, 419 projects were funded
  • In 2012, NCEAS tried to compile all historic datasets
  • 70% were unrecoverable

Lost from science

Transparency in research

Opaque research

  • Publication bias
    • Not all research is published
  • Incomplete or insufficiently detailed methods
  • Selective reporting in results
    • Confirmation bias
    • “HARKing”
    • “P-hacking”
  ‱ Inaccessible underlying data

Photo by Clem Onojeghuo

Opaque research limits science

  • Harder to replicate or re-use methods
  • Harder to build upon to progress the field
  • Harder to interpret results
  • Harder to trust the conclusions

Photo by Karl Hedin

Open, Reliable and Transparent Science

Open, Reliable and Transparent Data

And why you should care about it.

Reproducible and reliable results

  • Promotes accountability and trust
  ‱ Mistakes can be corrected
  • Analytical decisions can be justified
  • Scientific misconduct can’t hide

New questions & new methods

Photo by Monika Manenti
  • Built upon more effectively
    • Deeper understanding of data & analysis
    • Used to develop new tools/methods/protocols
    • E.g. Bumpus 1899
  • Viewed in a new light
    • Beyond the original paper
    • Paradigm shifts
  • Analysed using the latest methods
  ‱ Meta-analysis

More accurate meta-analysis

  • Easy extraction of accurate data
    • No need to extract from figure
    • Reduces ambiguity and error
  • Go beyond the results section
    • Helps reduce bias from selective reporting
    • Capture the full picture of the study
  ‱ Extends the life of the dataset
    • Can always be accessed

Learning and teaching

  • Teaching students using example datasets
    • Real biological “quirks”
    • Real scenarios
    ‱ Can teach good practices from the start
  • Learning and understanding new methods
    • Complexity can be broken down
    • Walkthrough when code also available

Benefits for the data archiver

  • Increased exposure, reach, and trustworthiness
  ‱ Citation advantage (+25%)
  • Your own best collaborator
    • Data is clean and ready to use
    • Well annotated
    • Cannot be lost

Photo by Anton

Reducing research loss & waste

  • Removes need for duplicated data collection effort
    ‱ Time/location/event-dependent data
    • Research animal use
  • Reduces cost of research

How are things going?

Transparency and Openness Promotion (TOP) guidelines

  • “A set of standards applied to journals to measure their alignment with open scientific principles”
    • Specific guidance on data transparency:
      • Level 3: open data + peer review of dataset and analysis
      • Level 2: open data in trusted repository
      • Level 1: mandatory data statement
  • >5000 journals are signatories
  • Field specific advice for ecology and evolution

Top down pressure

  • Journals
    • Mandated archiving has become “the norm”
  • Funding sources
    • Open access requirements extending to datasets
  • Institutions
    • To help staff meet requirements of the above

Community driven approaches

  • Positive attitudes towards data transparency are common
  • Lack of data transparency is seen as a problem
    ‱ 67% of scientists think that lack of access to data is a major impediment to progress in science (Tenopir et al. 2011)

How well are we doing?

Tenopir et al. 2011

How well are we doing?

Published without sufficient data to replicate:

Photo by Steven Wright

How do we improve things?

  ‱ Why don’t we share data?
    • Knowledge barriers
    • Re-use concerns
    • Disincentives
  • How to work towards data transparency

Knowledge barriers

What’s the process?

  • Do not know how to share data effectively
    • Which online data repository to use?
    • What format to share data in?

What’s the process?

  • Online guides and primers
    • British Ecological Society “Guides to Better Science”
    • UKRN Primers
    • SORTEE (coming soon)

British Ecological Society Primer Series

What’s the process?

  • Institutional libraries
    • Often under-utilised advice and guidance
  • FAIR templates and guides
  • Any data is better than no data!
    • Learn by doing

The FAIR Principles

Insecurities

  • Early career researchers can feel especially vulnerable
  ‱ Fear, insecurity and embarrassment are powerful emotions

Insecurities

  • Share before publication
    • Lab meetings or data review sessions
    • Pre-print (private or open)
  ‱ Data being hard to understand is a bigger issue
  ‱ A culture that prioritises learning over criticism

Don’t see value in their data

  • Too niche
  • Too small
  • Why would someone be interested?

Photo by Diego PH

Don’t see value in their data

  • Highly subjective
  • Hard to predict future use
  • + all other benefits

Photo by Diego PH

Re-use concerns

Misinterpretation

  • Fear of inappropriate use
    • Lack of familiarity with particular dataset
    • Miss crucial details and draw misleading conclusions

Misinterpretation

  • High quality metadata
    • Peer review
  • Contactable
  ‱ Not a problem unique to data

Sensitive information

  • Dual use problem
  • Weigh up benefits and costs
  • Ethical (and legal) implications
  • Sharing limited subset

Disincentives

Scooping

Fear of:

  ‱ A researcher performing an analysis on publicly shared data that the original collectors have not yet done
    • Being “scooped”
  • Reduced collaborations
  • Loss of future publications
    • Metric used to assess performance

Photo by Saher Suthriwala

Scooping

Less likely than you would imagine:

  • Ideas are plentiful
  • Original collectors in best position to act
  ‱ Most analyses by the original authors on published data happen within 2 years
  ‱ Analyses by other researchers peak around 5 years

Photo by Saher Suthriwala

Scooping

  • Pre-print to “claim”
  • If major concern:
    ‱ Restrictions can be placed on data use
    ‱ Embargo periods can be set
  • Change in mindset to see data as a valuable contribution

Photo by Saher Suthriwala

How to work towards data transparency

1. Plan to publish your data!

  • What data needs to be recorded?
  • What metadata might be needed?
  • How raw/cleaned should my data be?
  • Talk with collaborators early about plans

2. Identify an appropriate repository

  • Field specific
  • Data type specific
  • Journal preferences
  • Good starting place: re3data.org

Subjects covered by re3data.org

3. Make a nice README file

  • One or more plain text files that describe the data in detail
  • Write early!
  • Check repository guidelines
  • Document your data
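As a sketch, a minimal plain-text README for a small tidy dataset might look like the following. Every file name, column, and value here is a hypothetical example, not a template mandated by any repository; check your chosen repository’s own guidelines for required fields.

```
README — accompanies a hypothetical pollen competition dataset

DESCRIPTION
  Pollen tube growth measurements from a greenhouse experiment.
  Collected 2023; contact: [author email].

FILES
  data/pollen_raw.csv     Raw measurements, one row per flower
  data/pollen_clean.csv   Cleaned data used in the analysis
  code/analysis.R         Script reproducing all figures and tables

COLUMNS (pollen_clean.csv)
  plant_id      Unique plant identifier (factor)
  treatment     Pollination treatment: "self" or "outcross"
  tube_length   Pollen tube length, in mm (numeric)
  date          Measurement date, ISO 8601 (YYYY-MM-DD)

NOTES
  Missing values are coded as NA. Units are metric throughout.
```

Writing this file early, while collecting the data, is far easier than reconstructing the details at publication time.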

4. Pre-peer-review peer-review

  • Ask a colleague to look through your README and dataset
    • Data/code review sessions
    • Can they make sense of it?

Photo by Jason Goodman

5. Publish your data

  • Make sure it has a citable DOI
  • Cite your data in your publication!
  • Talk about it with your colleagues

Thank you for listening