A single-cell view of pancreatic cancer diversity

Pancreatic cancer is one of the most lethal cancers, with a five-year survival rate of only 11%. Tumour heterogeneity is a key reason: cells within the same tumour behave differently. Here we present a freely available single-cell dataset of roughly 31,000 cells from 41 patient-derived cultures.

Why we finally decided to look more closely at models we had been working with for years

Why this mattered to us

We have been working with patient-derived pancreatic cancer cells for a long time. These are living cells, taken from real patients, that we grow in the laboratory and use to study the disease and test treatments. Over the years we have published a lot using these models: drug response data, transcriptomic signatures, metabolic profiles. We thought we knew them well.

But there was always something uncomfortable sitting in the back of our minds. When we exposed these cells to a drug and some survived, we did not really know why. Were those survivors a distinct population that was always there? Or did they adapt? We were averaging signals across millions of cells and calling it a day. That felt increasingly inadequate.

Single-cell sequencing changed what was possible. And we had 41 well-characterised models sitting in our freezers. So we did it.

What we actually did and what nearly went wrong

The technical side was not straightforward. Processing 41 different cell cultures, each with its own growth characteristics and culture conditions, across two separate experimental batches introduced variability we had to be very careful about. There were moments where we were not sure the batch correction would work well enough to make the data usable across samples.

We chose SPLiT-seq over droplet-based technologies deliberately. It is not perfect and has real limitations, particularly around sequencing saturation, but it was the only approach that made profiling this many samples actually feasible. Science is full of trade-offs, and this was one we made with open eyes.

The data processing took longer than expected. Quality control decisions that seem straightforward on paper become genuinely difficult when you are looking at 34,000 cells across two batches with different characteristics. Every threshold you set excludes real cells. That tension never fully goes away.
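To make that tension concrete, here is a minimal sketch of how moving a single quality-control cut-off changes the number of cells retained. The thresholds, field names and toy cells are all hypothetical and chosen for illustration; they are not the values or the pipeline behind the dataset.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    barcode: str
    n_genes: int      # genes detected in the cell
    n_counts: int     # total transcripts counted in the cell
    pct_mito: float   # percent of counts from mitochondrial genes


def passes_qc(cell, min_genes=200, min_counts=500, max_pct_mito=20.0):
    """Return True if the cell clears every (hypothetical) threshold."""
    return (
        cell.n_genes >= min_genes
        and cell.n_counts >= min_counts
        and cell.pct_mito <= max_pct_mito
    )


def retention_by_threshold(cells, min_genes_options):
    """Count the cells that survive for each candidate min_genes cut-off."""
    return {
        m: sum(passes_qc(c, min_genes=m) for c in cells)
        for m in min_genes_options
    }


# Toy cells, invented for this example.
cells = [
    CellQC("c1", 150, 600, 5.0),
    CellQC("c2", 250, 800, 8.0),
    CellQC("c3", 900, 3000, 25.0),
    CellQC("c4", 1200, 5000, 3.0),
    CellQC("c5", 400, 450, 10.0),
]

kept = retention_by_threshold(cells, [100, 200, 300])
print(kept)
```

Even in this toy case, each stricter cut-off removes a cell that may well have been real, which is why no threshold ever feels entirely comfortable.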

What we found and what surprised us

The heterogeneity was bigger than we expected. Not just between patients, which we anticipated, but within individual cultures from the same patient. Cells that should be relatively uniform, grown from a single source in controlled conditions, were showing dramatically different transcriptional states.

Some looked classical and well-differentiated. Others were clearly basal-like, the more aggressive phenotype associated with treatment resistance. And they were sitting side by side in the same flask.
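One simple way to put a number on "side by side in the same flask" is to score each cell against two short marker lists and report the basal-like fraction per culture. The sketch below assumes illustrative marker genes and invented expression values; it is not the signature or the classification method used for the dataset.

```python
# Illustrative marker lists (an assumption for this sketch only).
CLASSICAL = ["TFF1", "AGR2", "LGALS4"]
BASAL_LIKE = ["KRT17", "S100A2", "KRT6A"]


def mean_expr(cell, genes):
    """Average expression over a gene list; genes absent from the cell count as 0."""
    return sum(cell.get(g, 0.0) for g in genes) / len(genes)


def call_state(cell):
    """Label a cell by whichever marker set has the higher mean expression.

    Ties (including cells with no marker expression) default to 'classical'.
    """
    diff = mean_expr(cell, BASAL_LIKE) - mean_expr(cell, CLASSICAL)
    return "basal-like" if diff > 0 else "classical"


def basal_fraction(cells_by_culture):
    """Fraction of cells called basal-like within each culture."""
    return {
        culture: sum(call_state(c) == "basal-like" for c in cells) / len(cells)
        for culture, cells in cells_by_culture.items()
    }


# Toy expression values (gene -> normalised expression), invented for this example.
cultures = {
    "culture_A": [
        {"TFF1": 3.0, "AGR2": 2.0, "KRT17": 0.5},
        {"TFF1": 2.5, "LGALS4": 1.5},
        {"KRT17": 4.0, "S100A2": 2.0, "TFF1": 0.5},
        {"AGR2": 1.0, "LGALS4": 2.0},
    ],
    "culture_B": [
        {"KRT6A": 3.0, "KRT17": 1.0},
        {"S100A2": 2.0, "AGR2": 0.5},
    ],
}

fractions = basal_fraction(cultures)
print(fractions)
```

A culture with a fraction strictly between 0 and 1 is exactly the mixed case described above: both states present in the same flask, invisible to any bulk average.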

That was the moment that shifted something for me. We had been treating these cultures as relatively homogeneous units in our drug response experiments. They are not. They never were. We just could not see it before.

What this is really for

We are making the full dataset publicly available: 30,593 cells, all the metadata, all the code. Not as an afterthought but as the main point.

There are very few large, well-characterised single-cell datasets from patient-derived pancreatic cancer models that cover the full disease spectrum, from localised to metastatic. Most single-cell work focuses on surgically resected tumours, which represent only around 15% of patients. We wanted to offer something broader.

We hope others will use this to benchmark computational tools, explore cell states we have not looked at yet, and ask questions we have not thought of. That is not a polite thing to say at the end of a paper. We genuinely mean it.

What comes next and what we still do not know

The honest answer is that generating this dataset raised more questions than it answered.

We now want to know which of those cell states are associated with drug resistance. If a culture contains a subpopulation with a basal-like signature before treatment starts, does that predict poor response? Can we use that information prospectively?

Those experiments are underway. But they take time and the answers are not guaranteed.

What I can say is that after years of working with these models, I feel like we are finally seeing them properly. That alone felt worth doing.

Dataset: ArrayExpress E-MTAB-16856 https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-16856

Code: zenodo.18273699 https://doi.org/10.5281/zenodo.18273699

References

  1. Chocoloff V. et al. Single-cell transcriptomic profiling of patient-derived pancreatic ductal adenocarcinoma primary cell cultures. Scientific Data (2026). https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-16856
  2. Fraunhoffer N. et al. Development and validation of AI-assisted transcriptomic signatures to personalize adjuvant chemotherapy in patients with pancreatic ductal adenocarcinoma. Annals of Oncology, 35(9), 780-791 (2024). https://doi.org/10.1016/j.annonc.2024.06.010
  3. Fraunhoffer N. et al. Multi-omics data integration and modeling unravels new mechanisms for pancreatic cancer and improves prognostic prediction. NPJ Precision Oncology, 6, 57 (2022). https://doi.org/10.1038/s41698-022-00299-z
  4. Hwang W.L. et al. Single-nucleus and spatial transcriptome profiling of pancreatic cancer identifies multicellular dynamics associated with neoadjuvant treatment. Nature Genetics, 54, 1178-1191 (2022). https://doi.org/10.1038/s41588-022-01134-8
