Podcast: How to get big projects to soar

Anna Barker, chief strategy officer at the Ellison Institute and former principal deputy director of the US National Cancer Institute, knows about large-scale projects as one of the co-initiators of The Cancer Genome Atlas.

You are perhaps ready to dream big...how do you get a large-scale project to come to be and to soar?

Dr. Anna Barker has some answers about that from the past, the present and the future. She is chief strategy officer at the Ellison Institute, a think tank and research institute. Before that, she was the principal deputy director of the US National Cancer Institute and deputy director for strategic scientific initiatives there. One of her projects was The Cancer Genome Atlas, which she co-directed with Dr. Francis Collins. 

Here is a podcast with Dr. Barker and a transcript of the podcast is pasted below. 

Transcript of the podcast

Note: These podcasts are produced to be heard. If you can, please tune in. Transcripts are generated using speech recognition software and there’s a human editor. But a transcript may contain errors. Please check the corresponding audio before quoting.

Anna Barker

You have to have an advocate, you have to have an advocate who’s willing to really go to bat for you and believe in the science and believe that it’s important enough that you find ways to do things.


Vivien That’s Dr. Anna Barker, Chief Strategy Officer at the Ellison Institute, which is a think tank and research institute. Before that she was the principal deputy director of the US National Cancer Institute and deputy director for strategic scientific initiatives there. Hi and welcome to Conversations with scientists, I’m Vivien Marx.

I do these podcasts as a way to share more of what I find out in my journalism travels. Today’s episode is based on a conversation with Dr. Anna Barker, someone I have wanted to speak with for a long time. Yes, if you know her, you know this episode will be about cancer. It will also be about academia, physics, information theory, big data, history and science policy. It’s about finding advocates and supporters for projects, building alliances and consortia. Big ones. Dr Barker takes you on an intellectual and intriguing journey through all these topics. Of course cancer is at its core upsetting but I don’t think this episode will necessarily feel upsetting, it might even feel empowering. You can let me know how you find what she says.

She is perhaps best known for having co-founded, at the National Cancer Institute, The Cancer Genome Atlas, a project that began in 2006, ran for 12 years and conducted cancer research and genomics on a big scale indeed. The teams, and there were many, many scientists and clinicians involved, characterized in a molecular way over 20,000 samples from tumors in 11,000 patients and they did the same with samples of healthy tissue from the same people. And the teams did this analysis for 33 cancer types in total.

In cancer research, The Cancer Genome Atlas is a rather well-known and highly accessed data resource. It was a collaboration between The National Cancer Institute and the National Human Genome Research Institute (NHGRI). It has generated around 2.5 petabytes of data. https://www.cancer.gov/ccg/research/genome-sequencing/tcga

Dr Barker is currently the Chief Strategy Officer at the Ellison Institute, which is a think tank and research institute. Before that she was at the NCI. The programs Dr Barker led while she was principal deputy director of the US National Cancer Institute and deputy director for strategic scientific initiatives there make up a long list. It includes The Cancer Genome Atlas, which I just mentioned and which she developed along with colleagues at NHGRI. She also led many other projects, including the Physical-Sciences Oncology Centers program that connects physicists, mathematicians, engineers and cancer scientists.

Dr. Barker has many tales to tell about the past, present and future of cancer research and cancer treatment, about research and policy more generally, about doing big things and getting those off the ground, about data-sharing and why people don’t share data and what to do to get them to share. We jumped right in talking about epigenomics. And about how invisible that field was for a long while. Genomics involves the study of DNA and RNA and epigenomics involves chemical modifications of DNA and RNA.

One of those types of modification is methylation. These modifications are turning out to be a kind of real-time tuning and they indicate changes happening to the genome right now. You will hear more about this from Anna Barker. First, just to explain the people she mentions. There’s for example Peter Laird. For a story, I had spoken with Dr Peter Laird at Van Andel Institute and he is someone who has long worked on epigenomics. It turns out he is also someone Anna Barker knows well. And the Dr. Collins she mentions here, just adding that as well, is Dr. Francis Collins the former director of the US National Institutes of Health and before that he led NHGRI and is a co-founder of TCGA. Here’s Dr Barker on epigenomics.

Anna Barker

It's a very interesting story in its own right. When we started The Cancer Genome Atlas epigenomics was really nowhere. I mean, you know, it was kind of a glint in people's eyes. And actually, my good friend, Dr. Collins, and I didn't agree on including it. And so I said, Okay, well. So what I did was I, you know, I had, as the principal deputy director of NCI, I had my own money, of course, and of course, I had our budget our NCI budget to work from. And so I set up a little pilot, and that pilot was essentially part of it was Peter Laird and Peter Jones and a few other people working very early on epigenomics.

And as it turned out, I mean, it was one of the wisest things I did. Because I mean, I, if you think about it, just briefly, in retrospect, of course, everything in retrospect looks much clearer. If you really think about, and I will digress here, if you think about the epigenome, especially the methylome, which is the way nature has decided to sort of, totally can reconfigure as it needs to, for an individual in real time. That's the way that's the way it happens. And so if you look at these very early detection assays that we have in oncology, I'm not surprised nor anyone should be I think that the methylation approach is the first one out of the gate, you know, the Roche tests to say that we can detect cancer very early, because you likely will have a signal there. And I don't yet know what we're going to do about that.

I mean, I think it's a very interesting question, one that you should write about, but, you know, what do we do with that? And if the data turns out to be real, and I suspect it will, that, you know, it does open up I think it's kind of I use this term only guardedly, sometimes. I think it's an inflection point in cancer and the whole of cancer research. You know, we are to be really candid, Vivien, you know, we didn't sequence all the genes, do all this work over these decades just to treat cancer, right? So we want to use that information now and that data to prevent cancer or downstage cancer, or you know, so is it possible?

I don't know, a lot of people are naysayers. You know, they say, Well, no, because what do we do with that we don't know if they're going to live any longer. Well living longer isn't the only thing that you can do for patients, you know, and personally, I think they could live longer if we make better drugs. And so and better biologics and better interventions, which will I mean. That's what happens, people underestimate.

When you have when you have something like this, how it drives the technology development community, how it drives, people who are really thinking deeper about, you know, in this case, how do you actually prevent something if you know, it's coming, we, you know, we've done this with other diseases over the years as well. That's why we have vaccines. And so, you know, I'm very enthusiastic about Peter Laird and everything that he contributed. He was one of the first people actually, to come up with a way to measure these things. You know, I mean, in terms of the

Vivien The assay that he developed this

Anna Barker Something light I’ve forgotten the name of it, was really powerful. And so and then my colleague from the Hutch, Bob Day and I really saw a lot of potential in those assays. And so it was, you know, we were always I was always very interested when he did and, and the Johns Hopkins group as well. So, you know and they collaborated on this. I just think it was, he's a man who was way ahead of his time. And sometimes you know, people challenge you when you’re way ahead of your time.

Vivien Indeed some are way ahead of their time. I wondered about what kind of strategic advice she has for people cooking up potentially big ideas that are ahead of their time.

Anna Barker [8:55] You have to have an advocate, you have to have an advocate who's willing to really go to bat for you and believe in the science and believe in, you know, that it's important enough that you find ways to do things. And I never buy the fact that, you know, it's people ask me: How do you do big things, and I say: Well, if your idea is good, or even great, then you know, that's where you, that's the whole sort of nucleus of what you want to accomplish. But then you have to start. And most people don't start, that's where they fail. And then you have to do the work. And a lot of people don't want to do the work. And so, you know, it's three key steps.

And if you don't, if your idea isn't a great idea, you'll find out pretty quickly if you start. But you know, I think most great ideas, people don't have the confidence to start. So what I do is I help people start and then you know, once we find a way forward then you know, things work. And TCGA was one of those that it was obvious, it was very obvious.

Vivien I wondered how she set about setting up big ideas and big science in biology. The Cancer Genome Atlas came about when big science and big data in biology were not at all typical. At the time, as she explains, scientists were decoding cancer genes one at a time. But, of course that would have meant it would take a long time to complete the cancer genome. So they discussed a different approach to the cancer genome, an approach that didn’t exactly come together in just one meeting or two.

Anna Barker [10:35]

Oh, no, there were lots of long days and long nights. Yes. Well, I'll get back to my, you know, my three my three steps, and that is essentially, the ideas are, the critical step is the idea. I mean, you know, no, you, you have creative, great creative people who have great ideas. I mean, like, Elon Musk is a great idea guy, and he's one of those people who starts and he does the work. There’s no secret to what he has done, actually, I mean, you know, he's an amazing thinker. And, but the one thing I think that you need to have in the in something like TCGA, or anything else that where you're starting here, but you have to see out far enough to see where this field is going to go, and has to go.

So if you start thinking back to 2003, and four, we had just finished the sequencing of the human genome. Now, it's interesting when you think about that, and one of the things that encouraged me is that most of the genes that have been discovered to that point actually were cancer genes. I mean, you know, so in cancer, going all the way back to the 1930s, we, you know, with, with, I mean, like, I don't know how many Nobel Prizes, but many that focused on the discovery of the oncogenes, finally, the suppressor genes, etc.

So we had a very big footprint in genomics already. And we at NCI had really followed up on that with Dr. Collins and his groups at NHGRI. So we had several other projects ongoing at the same time that were dealing with germline mutations, etc, etc. So, when you and when we started talking about this at NCI with NHGRI Dr. Collins and I agreed 100% on you know, if you think about The Cancer Genome Atlas that would have, potentially, if we had left this to discovering, and actually sequencing and characterizing one genomic change at a time, over time, it's hard to estimate how many decades it would have taken.

But it was probably it would have probably taken several hundred cancer genome projects or genome projects overall. So it made perfect sense. I mean, when you start thinking about it, there was no secret there in terms of is this something that should be done? The answer was, could it be done? And so we were cautious in the early days about this, because there was so much anti big science, whether you love the Human Genome Project, or you didn't.

Most people that, you know, are R01 scientists, the current science, we support mostly at NCI, which is, you know, an individual investigator’s idea that's about, that's always been more than 50% of the science we supported NCI still is on the NCAP, we still have, you know, if you look at our budget, that's where most of them, but a big part of money goes into that individual scientists. But what they didn't get, and many of them were very vocal about the fact this is the end of science as we knew it. And I think they published, they didn't publish in Nature, by the way they published in Science, they don't publish in Nature, but and by the way, Nature was one of, I think, a significant advantage for this project, because we early on, decided that we would like to publish all these papers, as a compendium, you know, as an atlas, and as something that people could follow, know where to look for it, etc.

So we did, we did that we did two things that I think were really important in terms of it becoming sort of a living kind of atlas that would continue to grow. And the data would continue to expand and be more and more user friendly. And that was to sit down with Nature and say, Would you be willing to publish all these papers? I mean, and they said, Yes. And I think that made a huge difference. The second thing is, we had, you know, hundreds of people working on this project. But you know, we had the leadership. And so what we decided is, let's just call this, you know, let's call it what it is.

It's essentially The Cancer Genome Atlas Consortium. And that team, essentially, we gave in the early days, we got credit to the people who actually took the writing on, the four or five people, but essentially, we just credited, you know, we had in that first Nature paper, the list of authors was longer than the paper. I think.


Vivien The list of authors is longer than the paper itself. Wow. There’s a link to that paper in the show notes. http://nature.com/articles/nature07385 Long author lists on papers are still an ongoing issue of course.

Anna Barker [15:35]

At some point in time, you just don't have to put everybody's name in there. I think you just say, you know, you give them if the, I think the publisher can associate that. And we can use whatever acronym we want, or whatever. But I think the time has come to sort of say, yes, yes, that if there's a question about that person being part of the TCGA team, and you know, where you can go as Nature or you can. But it's an issue, because now we're talking, you know, even more people involved in some of these sequencing projects. But getting back to 2004. For that discussion, though, was let's start small, Let's figure this out. And we had a lot of things to figure out. So my plan for this from NCI’s standpoint, and fortunately, Dr. Collins and NHGRI agreed, let's organize this as a project that is focused on quality of data, quality of samples, quality of DNA.

So let's make this more than just another project, you know, we'll have all kinds of questions about the quality of the data. Let's do the best we can to set this up in a way that it becomes a flywheel. And we can put more tumors through it as we actually understand the technology and how to collect the data, how to manage the data, but mostly how to do this in a way that actually sets up a data set that's going to have longevity and actually be of high value for a very long time.


Vivien And healthy normal and the paired, I guess, all of these things so you were thinking about usage down the line. Right. So that's probably guidance for others.

Anna Barker

Okay. Yeah, because you don't want to I mean, you can't just be as good as you are in current day, you've got to be much better than that. So, so we tried to set the stage in terms of quality of everything in the project. And it was essentially a very well run machine. And we had a strong team at the NCI that met every single week. I chaired that, personally, for the first seven years of it. And we, you know, we agreed we disagreed we, you know, we disagreed, we were very careful with this project, because it was a lot of money.

And it was an incredibly important project for cancer. And, and I think, for the Genome Institute, as well. So, you know, we, I think we have a lot, we had lots of debates during those years. But I mean, even starting out, we decided to do three tumors in a pilot. So to get this project approved, you know, I had to go through the boards at the NCI and even though I was the principal deputy director. It was a big change from, you know, the way that NCI operated, we had contributed to projects like this, but we had never led a project like this. So, you know, we had to, we had to convince the boards.

And one thing that made sense to the boards was, let's do this as a pilot, let's see if this can be see if this can work, especially something organized this tightly, you know, with the data collection center, defined, essentially, the data, specimen collection, the quality of the samples defined, how the DNA is going to be extracted, and ultimately the RNA etc, and ultimately, the kind of work that Peter Laird and his group did. So how was that all going to be controlled?

And then one of the standard operating procedures for something like this that everybody can use, and everybody can recognize. I mean, something is trivial. It may sound trivial, but you know, how do you ship the samples? And be sure the quality of the sample would be set so so all of that, and then in we had in the early days, we had these genome characterization centers, which NCI supported, and then we had in NHGRI sequencing centers.

And so, so if you take all those pieces and put them together, then in a management structure, and that's what it was, we managed it, I mean, it was a project and involve literally 10s of on. At its height, I think we had more than probably 100 plus institutions involved. And at any given analytics team for any of these cancers, there was well over 100 scientists working in the analysis of this data. So what we saw, though, and what we anticipated, and I think reasonably well, was that TCGA would drive technology development, and it did. So, you know, we started out with doing what, in retrospect, may have been fairly simple Sanger sequencing, you know, short reads sequencing, it's as good as it was, I mean, it was it was the tumor we did first was GBM. That's the first paper that we published. And that was a learning curve.

One of the things we learned is our samples that we had collected in the country weren't very good. We had not paid attention to things like you know, the time after surgery that you leave the sample at room temperature or, you know, 1000 other issues that can cause the sample to not be high quality, mostly decay versus live cancer cells.

Vivien The TCGA, lobbying for it internally at NIH and externally, organizing it, running pilots, evaluating and then scaling it up. This all seems to hold such invaluable lessons.

Anna Barker [21:30]

So let me just take a side road there. Maybe we should write this up for next year, but we should write up TCGA history. I think would be a great learning curve. For a lot of people. I've been asked to do it, but I frankly, have never had the time. And Francis is also kind of quasi working out of The White House, maybe he and I could put our heads together.


Vivien I mentioned to her that Helen Berman, who co-founded the Protein Data Bank, recalled that colleagues were concerned when the War on Cancer was declared during the Nixon administration. Scientists wondered what they would work on if cancer were solved, so to speak. She said that people were having meltdowns privately, saying: what do we do when they've solved cancer? You probably hear this, too. And she told me the story. She said: ‘Hurray, we'll find other things to do, don’t worry about it.’

Anna Barker [22:15]

We'll open restaurants. we will bake bread and people won't be dying from this horrible, horrible disease.


Vivien But the lessons are so powerful that you have to tell. So yes, please.

Anna Barker

Just a side trip, though. So one of the reasons that prompted me actually to go to the NCI with my good friend, Andy von Eschenbach is just this, I thought that, at that time, you could see on the horizon that somewhere out here, as we begin to sequence the genome, that we would be able to actually precisely predict what people should get for their disease, as we learn more about the causative genes, etc,


Vivien Predict what drugs?

Anna Barker

The drugs, and also build better diagnostics, as we're doing now, earlier and earlier detection. And so the reason really, the reason I went into NCI was to set the stage for precision oncology. That's what I wanted to do. And that's, effectively if I have any contribution that I think I've made, it was there, because I also set up the Proteome Project, which is now really changing the proteomics landscape, and that's going to lead to a whole bunch of new drugs and we had to, had to tie that back to TCGA. So we can determine, you know, what proportion of the genome was really cancer genome was really ultimately translated. And then I set up something called the Cancer Bioinformatics Grid, which we, which was quite successful, it was a very big grid. Unfortunately, when you change leadership at NCI, you change focus. And so that was not popular with, you know, with an incoming director. So, but that was a real step forward. It taught people how to work together how to share data. And it was we had, I don't know, probably 80 or so institutions on the grid when they


Vivien Yeah, I remember reporting on that, caBIG.

Anna Barker Yeah, yeah, that was very, it was actually way ahead of its time, but quite successful for what it did. And ultimately, I wish we still had it. Because it would have grown, I think quite significantly, and probably would have brought in the private sector at some point, too. Somebody needs to take this over at some point and scale it. But it was a it was a good, it was a great step forward. I also set up a Nanotechnology Initiative, which was probably also quite controversial.

But the point in all of this when you're working at the molecular level, you got to be able to deliver this stuff. And so we're just getting to that now in the pharma industry, there are lots of nano-products now that are coming out. And it'll be I mean, the nano-constructs will be critical as we move into AI and start to use these data because you're going to, you're going to have to capture diagnosis and therapy in the same nano-constructs. And so, and you're going to have to, some of this is going to it's already happening, but this is the future. You can see this coming well.

We're already having people swallow pills that are that are actually being able to track things through their digestive system, this kind of stuff.


Vivien What I liked about this conversation was the way Anna Barker’s past set the stage for the future in cancer research. And it was great to hear about her path at NCI, for example the connections she built because she trained in multiple disciplines, including biophysics.

Anna Barker [22:45]

Then I set up something late in my tenure at NCI because I trained in biophysics, in chemistry. And I believe in understanding that we live, we are three dimensional beings, our cells are also three-dimensional, etc. And so nothing is going on in two dimensions anywhere, yes, you can see literally in the result of doing your thing.

So I set up something called the physical sciences oncology centers, which were, and the thing that made them really different is a physicist had to be the principal investigator along with an oncologist. And I think they have brought a lot of change to oncology, we're starting to think about the temporal aspects, as well as the spatial aspects of cancer. And, and it may very well turn out that if you think about what cancer is, and how we still diagnose it, a pathologist looks through a microscope, they see changes in the way this cell is actually constructed in terms of especially changes in its membrane, etc.

That's the way we diagnose cancer, a pathologist looks through a microscope and says, ‘Oh, the shapes have changed’ et cetera. That's all spatial. And that's information. And so I think we're getting to a point now, and I'll come back, to the point where we're starting and you see this in digital pathology, now we're starting to now define these things, in terms of changes in the genomics and what's being translated in the proteome.

Vivien [27:15]

I wanted to circle back also to the data-sharing behavior. I know that everybody's noble, and virtuous, and all of that.

Anna Barker

But no, everybody isn't noble and virtuous.

Vivien Most people are, most people. So how have you seen, I guess it would be okay to say that the physicists sort of also, the mathematicians also, I guess, led to a culture change. But how have you seen the culture change? And, you know, you've talked to the junior clinicians or the junior PhDs who are saying, ‘Wait, my data, I need to still build my career with it, let me hug it tight, and not share it just yet.’ I mean, how has that kind of, TCGA changed that obviously, but there's still sort of hesitancy in some quarters to share so early. So I'd be interested to hear as you talk about the TCGA evolving, how that behavior has changed?

Anna Barker

I'll answer your question. But isn't it interesting to think about the fact that we had a huge team at TCGA, all of whom voluntarily shared their data? I mean, well, yeah, I mandated it, but you know, it's the government, you can push back on the government. And then I'm sure in the early days, there were people who probably held their data back. We went after them. We said, No, if you want to play with us, you've got to play by our rules. That's hard for the government to do.

And it's very interesting, you think you step back now and see that, you know, it's the individual investigators that are forced now to protect their data, because they're on their own, or they view themselves as kind of being out there looking for tenure, looking for promotion, looking for whatever they're looking for. So this is a big issue in academic, I think, I think in academics in general, but especially academic medicine,


Vivien Not the Cedars Sinai and not the Broad Institutes they're there. They know, pretty

Anna Barker

The big labs get it, I think the big labs got it and, and wrote, especially of course, Broad was extraordinarily involved in TCGA. And people like Eric Lander, and with a great colleague, and Gaddy Getz, I mean, these are people who helped us to sort of formulate these policies and were very forthcoming with their data. And the big labs are like that, they know the value of sharing data. But there is this thing, and I've met with university presidents who came over to NCI. If you continue to hold these people accountable in ways that demand that they keep their data in terms of getting tenure, publishing in Nature, publishing in Science, publishing in Cell, they're gonna, you know, you tend to, you tend to get the behavior you reward. And so, you know, for us to move forward and this is just escapes me. We're dealing with the human genome here. Now in cancer, it's a dysfunctional human genome.

And I say I view cancer as a disease of communication. Very, very unregulated communication, but communication nevertheless is dysregulated. And at some point in our lives, and maybe in generations to come, but we're going to have to know all that data. I mean, what's really interesting about this, it's the genome is finite. I mean, you know, yes, it changes.

It's infinite in the sense that, you know, you can change it with somatic mutations, and there, maybe there's no end of those. I don't know. Is there an infinite number of changes that can occur in the genome, we don’t know the answer to that. I doubt it, there are probably only so many ways you can affect DNA. But you’re going to have to know pretty much everything, you are going to have to have that whole book written.

This cancer genome is a cancer genome, we’ve started to see this over the last few years. You’re starting to treat stomach cancer with the same drugs that you’re treating CML. Because the mutations are going to be by and large overlapping. We’ve come to call those things drivers, I always challenge people to tell me what they mean by that because no one actually knows but they know it’s an important gene. It’s a signal and that’s another word we use all the time there’s only so many ways what we know of, Vivien. Until you’ve figured out what that means in all of the genome, you’re going to be still guessing.


Vivien I wondered about this: given the wealth of data the genome has yielded, with plenty of data still to come, it seems that data need to be sort of forever data. Data needs longevity because data collected today might yield insight soon or in five years or even beyond those five years.

Anna Barker [32:40]

Yes, that's what, that's what we tried to build into TCGA. And, you know, a lot of people would say, Well, I would have done it this way or that way. But you know, at the time, we didn't know even how to organize the data. And the one thing that I think the big labs contributed to a lot is helping us to organize the data and start to build the kind of analytics that would allow you to functionalize the data. So the one thing that I think is getting back to your question on, you know, why people don't share data, that, that's not going to go away until we change the reward system in universities, and you reward people for sharing their data, I mean, you if you if you do that, and they can demonstrate that their data has value, and they've shared it out.

That will make all the difference in the world, I mean, just turn that little knob, you know, and say, instead of hanging on to your data, let's reward you for sharing it, because that's going to change the world. And if enough people do that, you can start to build repositories that we could only dream up. I mean, you know, especially people who are doing very, very specific aspects of the genome. And we could have specialized data lakes, or you know, for to ask certain questions.

I like this discussion, the discussion that is very important in terms of where we go, because we're at an inflection point in data collection and in cancer. And I use the word data, not information. You have to bring context to it. And if you can't bring context, it's just entropy. The point is that we are at an inflection point and secondly, we have to bring the theoreticians to the data. We do not have them in oncology, we do not have them in biology. Except for evolution, we don't have any theories.


Beyond the need for theories, Anna Barker mentioned cultural aspects about science. Of course there is a difference between locating cancer patients, finding ways to obtain consent so when they have surgery some samples can be donated to research. One needs to do that but also one needs to make sure the sample is well prepared for research. Then there are those who download the data of others to use in their analysis. There has been name-calling, awful name-calling, there has been talk of these people as data parasites.

Anna Barker [35:15]

No, that's a bad term. But again, we have to change the culture. If we don't change the culture, in terms of data sharing, and you will get, you'll get these people calling the people who use that data parasites. But the truth is, data is for sharing, and whatever data you have, you can only do a limited amount with it unless you can bring in a lot more data.

I mean, it's like artificial intelligence today, you know, we're making decisions on that are completely underpowered. And so, you know, you have to have enough data, you have to have the right data, it has to have metadata. You know, it is the whole thing needs to be rethought.


At this year’s AACR meeting, the American Association for Cancer Research, I attended a session on NIH’s new rules about data-sharing with scientists from the NIH. Data does not flow from a sequencer right into a repository where others can use the data. Data needs to be prepared, it needs to have metadata.

Anna Barker

It's expensive, right, so somebody has to pay for it. And frankly, the biggest issue we have is getting data from patients clinical data is extraordinarily hard to get not because it's not there. It's just not easily accessible. And we started TCGA, there were lots of people who said the electronic medical records in this would be very easy. But of course, they had to do it by hand. And so the first couple three tumors that we did in the pilot, that was mostly people going in for hours on end, and actually finding the data by


By hand is kind of, that's mind blowing. And of course, you need the permissions. And I understand, I mean, as a patient, I understand that you don't want just somebody from wherever taking your personal encounter with breast cancer or pancreatic cancer. I understand that patients have a word to say on this, but by hand.

Anna Barker [36:25]

Now that was the early days. So getting back to the way we did the pilot, then it was just that we learned for everything we learned about everything, what worked, what didn't work. And so the boards were our boards at NCI were reasonably impressed with the first couple tumors that we were working on. The first one was GBM, and then ovarian cancer. And then you asked a very smart question, you know, did we have in mind doing 33 tumors when we started, we had in mind doing 20. And we thought that was a huge reach. But you know, as it turned out, the technology matured and came online. So we were able to, you know, after I left in 2010, then they were able to carry on the project into 33, and pick up some of the rare cancers, which was really a great thing.

But what happened in those years was, you know, the process worked. Everybody collaborated. And so if you asked me why that worked, essentially, people want to be part of something that's going to change the world. And so careers were made there. I mean, you know that that's how people got a lot of the tickets punched, because they were willing to do something that they knew, was much, much more important than any small piece of science they would be doing, not to trivialize any small pieces of science.

I'm a very big supporter of individual investigator initiated research. But without the TCGAs, you can't do the kind of science that you could do if those projects were available to you.

And we've shown that and so I think we've shown now that the two really reinforce each other, you know, big science is very important to actually doing the work that will allow individual scientists to ask better questions and answer big problems as opposed to doing one more gene. And that doesn't help a lot. So over the course of the project, there were lots of learning curves. And so when I discovered that 30% of the samples in the country didn't qualify for TCGA, I mean, sorry, only 30% qualified, sorry, got that upside down, only 30% of the early samples.

And then, so we had to go to a process. So we had very high standards for our samples, you know, they had to be 80% tumor nuclei, etcetera, etcetera, they had, you know, they have to have been handled correctly by the surgeon and by the pathologist. And then we had, you know, a group of outstanding pathologists nationally who looked at the sample guaranteed us it was what they what the originator said it was, and then we had, you know, then we had, we purchased samples as well from so, people got the message that, oh, there's, you know, this is something that you can probably, you know, make money on.

And so there are to this day, now, there are people who provide very high quality samples for people. So, and collecting a good sample is expensive. And it takes people it takes time, it takes very careful attention to detail. So I mean, all the way back then collecting a good sample cost about $4,000. I mean, so it's a lot of money.

So you're investing when you collect some, you start collecting samples now that's a surgical sample blood is obviously much less expensive. But you know, that's turned out to be very important. So let me just tell you a quick story. One of the reasons I wanted to go to NCI to set up these precision oncology sort of early projects, so we can move the field was because I knew that the power of the sample was going to be very important to that the quality of the sample would turn out to be very important.

And I discovered that from some work that I did before I went to NCI working with the private sector, when I asked them what their biggest problem was, they said, get quality samples. That's the industry. I mean, that's the pharma industry.

So I said, Whoa, that's, that's a big deal. So we did a report at NCI, we set up guidelines for how you collect a great sample, they're all still there, I set up even an Office of Biorepositories and Biospecimen Research under Dr. Carolyn Compton. And at that time, the UK was attempting to set up a national biobank, which was our intent as well, a national biobank and a national bio repository and a national dataset. Would that be nice? Think about that for a second. So and that's exactly what the UK did.

Vivien

Now everybody's using the UK Biobank. And so they've done that huge sample, they also have all kinds of other data, MRI data, and it's like the electronic health records.

Anna Barker

I could not convince our boards to do that. Because essentially, the argument was left to their own devices, everybody should be able to collect their own samples, and it'll all work out. I mean, that's kind of where we were.

And where we still are, to be honest. I mean, people there, you know, there are pockets of people who collect great samples, and they work with those, and that produces some of the data you're talking about that people don't like sharing, because it's they know, it's high quality, and they know, it's probably they can probably convert it into revenue in some way. And that's what the industry has learned and others have learned. So, you know, it's, but it's, yeah, if we had set up that national database, that national repository, then I can't even imagine where we'd be.

Vivien

That and also all populations would have been included, also those who are having a harder time getting cancer treatment and minorities and people of color.

Anna Barker

Yeah, everybody had their own reasons for not doing it. But we've done a good job of actually defending it, doing a business plan for it. So it could be set up. We kind of used as a model, the transplant program in the country then, you know, organ transplantation as an example of something like this. But anyway, that's one of my failures was not getting that done.


And so I guess the lesson also for me as I write is, you don't have good data, if you don't have good samples and a good process in place, right? You can't just sequence whatever from whoever. And then think that you can fill repositories meaningfully.

Anna Barker

I've just written about big data recently. And how, how much is it really of value in terms of informing what we're doing for patients. And that's really not the issue, the issue is the bigger one, and that is, going back to square one, what we've just talked about is having enough good data and with enough power to actually say that you've been, you really couldn't define inflammation. Most biologists do not understand that data is not information. Information is a term that it's a real thing in physics. And so data without context is just entropy. It's just noise.

And so to go from where we are to have, and just terabytes of data on a patient, or you know, in this case, we're building petabytes of data into all kinds of data resources, you have to understand how you're going to convert that to information. And to do that, you're going to have to have theoreticians come to help us.

And I'm a great fan of Shannon information theory, Shannon, Claude Shannon, created everything and allows us to do everything with these phones, you know, so he figured out in with sound, how do you separate signal from noise, and he did it in the 1940s. And he's a genius. And he loved by the way, he loved biology. And he had some very strong thoughts about biology, even on paper. In 1949, he published his paper on, you know, on Shannon information theory. And there are other theories that not just that one, but that's my favorite, and because it makes perfect sense.

But the truth is the information that is the digital part of the genome, okay, that's the DNA that gets translated across both space and time when you're going from DNA to RNA to the proteome, and then how the methylation groups actually then contort and change the DNA itself. So all of that is in theory predictable. But it's theoretically predictable. But you have to have people who know how to look in the genome differently than we do today. And I think the Googles of the world get this. And I think there are, at least in the private sector, there are people likely doing this already, I think I see evidence that that's the case. But for us to move, I think the whole sort of establishment that's going to depend on using big data in effective ways for patients, we've got to get more of these people who are trained into our areas and into the teams that actually are trying to create information.

So I mean, it's a real brain change for people I people think if they have data, they have information, but they don't. So I think that's a that's it's really is we have a couple of inflection points, or maybe more than two right now in oncology. And what we do at this point is going to is going to in many ways derive how quickly we can change something like cancer in terms of preventing it downstaging it and ultimately treating it with much more effective therapies. So when we need much more effective therapies, for sure so and that's my game.

I am very much about using the information to make the lives of people better. And if we can do that, for cancer, that, you know, I'll be happy. So I will, I've met one of my goals, I lost my whole family to cancer, but it was single one. Now, it's you know, it's just it tells you the story of cancer is that you can know a lot about cancer, you can know everyone. But if you don't know enough, then you know, my sister died, for example of breast cancer, about two to three years before Herceptin came on the market. So that's, you know, that look at that three-year window made all that would have made all the difference in her life. So and that's, that's why research is so critical. It changes. I mean, an advance in research can make can make a whole change for a generation.


At this point in the conversation, and cancer has taken the lives of people I care about, too, I felt empowered, which Anna Barker’s bright mind and her wit enabled.

Anna Barker [49:05]

I think humor is important in cancer research you have to have where we fail a lot. So you know, you have to be you have to be witty, at least a bit. So don't take yourself too seriously.


That was Conversations with scientists. Today’s episode was with Dr. Anna Barker, Chief Strategy Officer at the Ellison Institute, which is a think tank and research institute. Before that she was the principal deputy director of the US National Cancer Institute and deputy director for strategic scientific initiatives there. The music used in the podcast is Solstice by Michael Drake, licensed from artlist.io. And I just wanted to say, because there’s confusion about these things sometimes: nobody paid to be in this podcast and nobody paid for this podcast. This is independent journalism that I produce in my living-room.

