There Is No Certainty

(audience applauding) – And for those of you
who are interested in art and science, this is not the only thing. If you go to the Museum of Art, there’s an exhibit on art and science, which is really spectacular, so I urge you all to go and visit it in the next few months. Now, universities like ours, I think, are the core of having a functional democracy, because we educate not only our students, but we’re supposed to educate our community, and these days, more than ever, educating everybody gets complicated because there’s an enormous amount of factoids and facts and misinformation out there because of the advent of digital information. This lecture series is all about that, but before we get into the lecture itself, I’m honored to introduce to all of you somebody who really
doesn’t need introduction, President Robbins, who is here to welcome all of you as well. (audience applauds) – Good evening everyone. I got to say a few words last year. I’m thankful that Joaquin
allowed me to come back tonight. Thank you for all being here. Now, 14 seasons, is that, 14 years. Incredible. We’re gonna get to hear an incredible lecture from Dr. Masel tonight. I understand she was in the
first lecture 14 years ago. She spoke at this event. Probably many of you were here. Thank you for continuing to come back. Joaquin puts together an incredible lineup of people every year, and I’m impressed. I really enjoyed it last year, and I’m looking forward to tonight’s lecture. The only thing I can say about whether there is certainty is that the only certain thing is that it’s uncertain whether there’s certainty, right? (audience laughs)
So you can never predict, but we
think the sun will come up over there tomorrow
and go down over there. Joanna Russell, who, I think, is third on the list of lectures, and I did check my calendar and I’ll be here, so
I’ll be tweeting, yes, incredible, incredible
individuals that we have at this university,
and thank you for being so supportive here and
supporting this lecture series. This last year, a few months
ago, Joaquin won a very well-deserved award from
the country of Mexico as being a treasure to that country, (audience applauding)
and, as I said a few minutes ago, yep.
(audience applauding) My friend, you have
served this institution for so long and so well,
and you’re certainly a University of Arizona
and an Arizona treasure, and I’m proud to call you my friend, and congratulations on the continued success of this lecture series. Thank you. (audience cheering and applauding) – Thank you President Robbins. And now, I just want
to briefly go through, you know that this
lecture series is actually quite expensive; the reason none of you have to pay at the entrance is that there are underwriters that support us, and today, I just want to highlight two of them,
these are the title sponsors. TEP has been with us since day one, supporting about probably
50% of the expenses of this lecture series,
and the Office of Research, Discovery, and Innovation,
which, for many of you would also be understood as
the Office of the Vice President for Research, has been a
big partner with us here. The next slide just shows
that there’s many more than the title sponsors,
and I want to point out just two folks or two entities tonight. One of them is the Galileo Circle members. And the Galileo Circle
members, who are all sitting here in the front, because they pay for it in many ways, support the College of Science with their time and with their funds. It’s sort of like the Rotary Club of the College of Science, and I thank you all for being friends and
for supporting the college. For your advice and for
your financial support. Thank you so much. (audience applauds) The second entity I’d like to point out tonight is the Honors College. And the Honors College is a newcomer here in supporting this, and what they’ve done is they’ve given us the funds so that we can actually have a
course for high school and middle school teachers to come here, get continuing education,
listen to the lecture, get credit for the
lecture, and their tuition is funded by the Honors
College, so thank you so much for that really important (audience applauding and cheering) We will be discussing the rest of these in the next lectures. Now, you may want to
ask what the dickens is, what are we talking about, uncertainty or certainty, and I think the next video will describe to you the problem. (upbeat instrumental music) – Science is constantly putting out new studies, as you would know if you’ve ever watched TV. – A new study shows how sugar might fuel the growth of cancer.
– A new study shows late night snacking could damage the part of your brain that creates and stores memories. – A new study finds pizza
is the most addictive food in America.
– A new study suggests hugging your
dog is bad for your dog. – A new study showing
that drinking a glass of red wine is just as good as
spending an hour at the gym. (audience laughing)
– What? (audience laughs) – That, my friends, is the problem. Science in itself is complex,
and there is terminology that we use that makes
it even more complicated, what’s a hypothesis, what’s a theory, what’s a law, how do
all these things relate, and it becomes sort of fuzzy as arguments about scientific issues come about. Sometimes those arguments are not real arguments, they’re really political arguments, but in some
cases, the arguments truly have to do with the
state of affairs in science. I like to joke with my friends that the media makes us look like bad
engineers because we don’t actually know what we’re doing, but in fact, there’s a very big difference between science and
engineering which is important for everybody to keep in mind. Science tries to understand
the world as it is, with all its complexities,
and the more we learn about it, the more complex it is. Engineering is creating
a world that never was. And, in fact, that’s very important, but it’s a completely
different set of ideas, it’s a completely different
way of looking at stuff. Now this non-science
business is nothing new. I mean, now we can watch
it on TV or we can see it on the web or listen on the radio, and sadly, it creates an enormous amount of anxiety in everybody, because nobody really knows what is good
science and what is bad science, but it’s been around forever, and the next slide shows just an example from the past. In 1860, a general, Pleasonton, decided that what made plants grow was blue light. You can actually see how one would think that, right? He didn’t know anything
about photosynthesis, he didn’t know anything about anything, but when you have blue light, things grow, and when you don’t have blue light, in the darkness, nothing much happens. So, based on that sort
of theory, he went back and created a whole bunch of articles that described how blue light actually operated. So he created these books, and just like in anything else, marketing is important. And after he wrote all these things, which are described in this slide here, in which there’s all kinds of things about electromagnetism and how the atmosphere is stuck to the earth, anyway, the guy had big ideas even though
they were all wrong, but it was fake news, it was fake science, because it did not follow the scientific method in any kind of way. He actually then created, at least he had the sense to create a greenhouse, although it was a blue house,
where he had blue panes, and he was able to show,
and I’m not exactly sure how the experiments were done, that this blue house would
create bigger grapes, it would create bigger pigs, and anything you put in blue was better. The newspapers picked all of this up and they said, holy smoke, this guy actually has this thing about blue theory, so then everybody started
wearing blue glasses because it was for your
health, and it became viral. But of course that only lasted until science caught up with this, and there was an article later on, about 10 years later, published in Scientific American, which actually showed that all
of this was nonsense. The problem these days
is that a lot of what we consider nonsense
because we don’t understand is not nonsense, and a lot of it is nonsense when it shouldn’t be. There are huge debates
about vaccination of kids. There are huge debates
about global climate change. 50% of this country does
not believe in evolution. About 50% of this country does not believe in global climate change. And it’s not because
of the state of affairs of science, it’s the way that the state of affairs of science is portrayed. I hope that at least this crowd will get to recognize
how we go about science, and that we have a very robust way of understanding what we know and what we don’t know. Today’s lecturer, Joanna Masel, is gonna talk about,
more or less, the advent of statistics and how experiments in which you can compare groups and
do a careful statistical analysis make a huge difference
in how you deal with stuff. The lecture after this one, which is about black holes, and Feryal Ozel
is gonna talk about that, is more the sort of
traditional physics kind of way in which you can actually create laws without actually even understanding
what drives those laws, but they’re laws because they always happen. So understanding why something happens versus knowing that it always happens is very different. Laws versus hypotheses versus theories. And we’ll go down the
list in which each one of the speakers will
be describing a way in which we understand science to be. Big data. Data driven by dumping an enormous amount, sorry, science being created by dumping an enormous amount of data into a problem, and then from that, coming up
with a hypothesis and so on. So, I think it’s gonna be extraordinary. It’s a very different kind of lecture series than we’ve had in the past, in the sense that we don’t have a theme, but the theme itself is science. I hope you enjoy it. And let me now introduce
the first speaker. Let me just get this out of here. Joanna, come up here. What did I forget? – [Woman] Six scientists
explore the debates and discoveries shaping science today. – [Woman] In searching for certainty, scientists focus on truth. – [Woman] I’m very interested in how the mind and the body interact. – [Woman] Microbes
literally shape our lives. – [Woman] Understanding
statistics is the only way we can evaluate messy data. – [Woman] With super-computers
and robot floats, we can see the future. – [Woman] There is an unseen universe. – [Woman] Searching for certainty begins at Centennial Hall
Tuesday, January 15th, at 7 p.m. – Okay. Now, another thing that
you may have all noticed is all of our speakers are women. (audience applauding and cheering) And let me assure you
that they’re not here just because they’re women, they’re here because they’re the best. (audience applauding) Many of them have actually spoken in the lecture series before, so I know how good they are
from first-hand experience. Joanna is a theoretical biologist who researches foundational questions about how evolution works. She also dabbles in many other fields, from biochemistry to education economics. She did her doctorate at Oxford as a Rhodes Scholar, did a post-doc at Stanford, and joined the University of Arizona Department of Ecology and Evolutionary Biology in 2004. She’s fascinated by the different forms that competition can take. In particular, the tension between zero-sum races in which the number of winners or losers is set in advance versus competitions in which the number of winners can go up as
part of real progress. She spent her last
sabbatical writing a book, Bypass Wall Street: A Biologist’s Guide to the Rat Race, bringing insights about competition from biology into economics and finance. This talk also draws from her experience developing and teaching
an undergraduate course in evidence-based medicine
with support from the Howard Hughes Medical Institute. Teaching this course has intersected with her interests in the psychology and philosophy of chance events and driven much of what she
will explore in tonight’s talk. Please welcome Joanna Masel. (audience applauding) – So thank you so much. So you guys, you all came here, presumably because you are searching for certainty, and us scientists are
gonna help you get that. And here am I, always the contrarian, straight off the bat, telling you too bad. You can’t get what you want. There is no certainty. But I think where we
agree, right off the bat, is that science is amazing, and we do know more and more all the
time, with ever-increasing confidence whether or not you can say that adds up to certainty. And I think that’s good
enough to get going with, but the question is how,
and this is the theme for this year, how is it
that we know what we know? So I’m gonna start by
actually asking you that. Like what is it that you think makes the good scientific evidence? So, imagine you have some disease, pick your own nasty choice, and I’m gonna give you a choice between two options. Oh, yeah, sorry, you’ve got your nasty disease and you’ve got two possible treatments you can have. And either I’m going to give you the results from a thousand
patients who aren’t you. They’ve got the same disease but there’s a gazillion ways where
they’re different to you. And those thousand
patients have been randomly divided into two treatments
and been given those treatments and you can
tell how well they did with one treatment versus the other. And in the other option,
it’s gonna be all about you. I’m gonna tell you your
genome and we’re gonna do an enormous amount of
molecular and cellular biology and physiology
and everything that we can in order to understand what the different mutations that we find in
that genome might mean for the biochemical and
physiological pathways and so on. So I want a show of hands, how many people would rather have evidence A? Put your hand up. And how many people would
rather have evidence B? It’s about 50-50 is my sense. Okay, we’re at about 50-50. Okay, now I’m gonna give
you another problem. So, here’s a knuckle bone. So it’s an irregular object and you can use it a bit like a dice. And you can roll it, and when you roll it, you can score which
side came up on top. And I’m gonna give you this knuckle bone and I’m gonna tell you
to roll it exactly once, and before you roll it,
you’re gonna predict which side comes out on top. And again, I’m gonna give you the choice between two
different kinds of evidence. So for one kind of evidence, I’m gonna say, okay, we already rolled it 1,000 times, or maybe 1,000 different people rolled it, or something, and here’s how many times each side came out on top. And in the other kind of evidence,
I’m gonna give you a complete technical diagram that shows the center of mass and
you can take that further. We can build a machine that standardizes as much as possible the angle it’s gonna be thrown from and its starting position and so on. So now I’ll have you vote. How many of you would
rather have evidence A? How many would have evidence B? Okay, so it’s still
split, but there are a lot more evidence A people on this question than there were on the last question. Alright, so now I’m going
to ask you a new question. What would you do with the
data if I gave it to you? So if I gave you A, I’m pretty sure you know what to do with it, right? You’re gonna pick the one
that came out the most often. And if I give you B, like I certainly wouldn’t know what to do with it. I could give you some physics books and engineering books and whatever. Maybe you could figure it out, but personally, I don’t think so. Okay. So let’s ask the same
question about the other. If I gave you evidence A, would
you know what to do with it? Sure you would. You’d take the treatment
that led to 500 people doing better than when they
got the other treatment. You’d know exactly what
to do with evidence A. If I gave you evidence B, I think it’s really interesting that more of you said B for this question than for the knuckle bone question. So you’re telling me that
it’s easier to figure out how human beings are
gonna react in the real world to all sorts of
complex things than it is to figure out how a
knuckle bone is gonna roll? So I think you can tell
that I’m an evidence A person and that’s what
this talk’s gonna be about. And I think the comparison
between the knuckle bone and the randomized trial of
patients is really interesting. So I came across this knuckle bone example when I was reading the
literature on education research, and this tendency to pick B, which you saw with the knuckle bone,
too, there are people who vote for that, and there are students who give that answer, it’s known as the outcome orientation, and what it means is that people are focused just on one outcome. They’re not thinking about groups. They’re only thinking
about a single outcome, and thinking about it
in a way that rejects the whole idea that the universe might be random. They think that everything is knowable. And when you interview people who have the outcome orientation, they’ll say that statistical evidence
is not real evidence, even though evidence A is the one we actually know what to do with; to them, of course, evidence B is the real evidence, even though we wouldn’t know what to do with it. And this is common, and it comes up, and I try and address it in my teaching and so on, but I think it’s really important, and the reason we hold this outcome orientation is because, yeah, we don’t want
the world to be random. We don’t want to say,
okay, this one came up on top 40% of the time, so
that’s the best I can do. It was better than the other three. You know, when you do that, you quantify your uncertainty and then
you accept that uncertainty. But when you go with option B, you think I want a world in which
there is no uncertainty. I want to really know that. Keep giving me more and
more and more information that I can’t make any
sense out of, because I believe that somebody is gonna get certainty from that information. And evidence A, okay,
we accept uncertainty, we accept that not everything happens for a reason, we accept that
the world might be just random. So, people have called
me courageous for giving this talk, because I’m gonna go directly against human psychology, the outcome orientation and a whole lot more. If people hold the outcome orientation even for a knuckle bone, this outcome orientation is a whole lot stronger when you’re talking about medicine, because of course you only care about the next outcome, that’s your life. The whole scenario is set up to encourage you to have this outcome orientation that far fewer people have when it comes to a knuckle bone, but it’s still there. So I’m gonna ask you, especially those who voted for the technical diagram, to bear with me, give me some forbearance. I’m trying to do something hard here. I’m gonna give you a talk that pits my arguments against everything I know about human psychology, and I’m stupid enough to think that I’d win. So, of course you’re
free to disagree with me, but try and suspend disbelief, disagree with me now and then go home
and think about it afterwards. That’s what I’m asking you all. Alright. So now, I’m gonna change
tack a bit and go back to the beginning of the story,
as far as I’m concerned, the beginning of data
science, and this particular way of knowing what we
know, with a man who I think was the greatest data
scientist, not just of the 20th century, but of all
time: Ronald Fisher. And, exactly 100 years ago, he took a job in a not particularly
impressive agricultural laboratory, that’s what he could find, and they handed him an enormous data set and said go figure it out. So the big data of the
era was leather-bound books, but they had 70 years of it. For 70 years, they’d been planting different crop varieties, watering in different ways, using different fertilizer, recording everything that happened with the weather, everything you can think of, they wrote it down, and then, of course, they said what yields they got from the crops. And so they had a massive data set. So big that it was hard
for him to work with despite inventing modern statistics. The most sophisticated machine of his day was a hand-cranked mechanical calculator, so it was quite a lot of work. But the really fascinating thing is, despite having, you would think, all the data that you could possibly want, at the end of it, he
couldn’t figure it out, and he said, you know what,
I can’t tell you anything. And the reason for that
was because all of those different variables, many
of them changed together rather than randomly
and separately, so you couldn’t tell which one
was important to know, the technical term for
that is they’re confounded. The data hadn’t been gathered in the right way, and so it was useless. And Fisher described this as
raking over the muck heap, and basically he said
that his job, not just coming in as a statistician, but his job, literally coming in
and inventing the field of statistics in order
to do this, was simply to tell them why 70 years of experiments was a complete and abject failure. He did get one paper out of it. But going forward, what he did is he said alright, enough of that. Let’s do it differently now. And this is what he recommended, and this is what we still
do in agriculture today. So we divide land into squares here, and then we set up a scheme
of the things we want to vary. We can vary the crop variety,
we can vary the irrigation, we can vary the fertilizer,
we can vary the blue light if we like, too.
(audience laughs) And then we randomly,
using dice, using chance, we randomly assign each
treatment we want to study to squares, and we leave it to chance, and that’s the way that we get the best possible information
out, so he invented what we call the randomized
experiment and put it into practice in England in the 1920’s. Okay, so I’m going to fast forward to 1996 and change to medicine. So, a large study was
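Fisher’s randomized assignment, leaving the allocation of treatments to plots entirely to chance, can be sketched in a few lines of code. This is a minimal, hypothetical illustration, not the actual 1920s design; the plot and treatment names are invented:

```python
import random

# Minimal sketch of a randomized experiment: treatments are assigned to
# field plots by chance alone, so confounding factors (soil, drainage,
# anything else, known or unknown) tend to average out across groups.
def randomize_plots(plots, treatments, seed=None):
    """Assign each treatment to an equal share of plots, at random."""
    rng = random.Random(seed)
    shuffled = list(plots)
    rng.shuffle(shuffled)  # the dice decide, not the experimenter
    share = len(shuffled) // len(treatments)
    return {t: shuffled[i * share:(i + 1) * share]
            for i, t in enumerate(treatments)}

plots = [f"plot-{n}" for n in range(12)]
assignment = randomize_plots(plots, ["fertilizer A", "fertilizer B", "control"])
for treatment, assigned in assignment.items():
    print(treatment, "->", assigned)
```

Because chance alone decides which plot gets which treatment, a difference between groups substantially bigger than what probability theory predicts can be credited to the treatment itself.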
done on a bunch of nurses, and it asked what are you taking, hormone replacement therapy or not, around the time of menopause, and then it tracked what happened to them next. And what was clear from the study is that those who did
take hormone replacement therapy had less heart disease. And there were some doctors
who used this evidence to tell women whatever your problem is, you should be on hormones, anyway, even if you’re doing fine with menopause, because hey, it lowers your heart disease. So some people obviously
got over enthusiastic, and other people weren’t convinced and they said uh-uh, that’s not how it should be done according to Fisher. And in 2004, they organized an even larger study that followed
Fisher’s recommendation, that took a bunch of women and took away their ability to decide for themselves whether to take hormone replacement therapy or not and gave that decision to random chance. Of course, the women all consented
to leave it to random chance, and then the dice made
decisions, and the dice decided, rather than the women deciding, who was gonna get it and who wasn’t. And what was really
interesting is the study got exactly the opposite result. Those taking hormones
had more heart disease. So the point of this is not to tell you whether to take hormone
replacement therapy or not, because the reasons for taking it usually don’t revolve around heart disease, and the effect is small and
there’s many other effects. But the point of this is to say that these two facts can’t both be right. Either it causes heart disease
or it prevents heart disease. Which one is it? We have two studies that both look good. Which one is correct? And the answer is that the
randomized trial is correct. When there’s a conflict in this case, there’s no question which one is correct. And the particular thing that’s wrong with the other study is what’s
called confounding factors. So the main confounding
factor that’s common in studies like this is,
some women, of course, have more money, some have less. They’re more educated,
they’re less educated. Generally better off. And women who are better
off are obviously healthier. And women who are better off might also, back in the 90’s, be more likely to take this cutting edge innovation
of hormone replacement therapy. So if you have women who were, anyway, more likely to be
healthy, doing something, then you’re gonna say,
oh, look, the causal thing is the thing that those
healthy women do, but that’s not the reason that
those women are healthy. Those women are healthy because they’re rich or more educated, whatever. And the problem here is the
lack of a control group. What we know with observational data is whether the women taking it or not taking it do better or worse. But what we don’t know is that if we took that set of women and forced them to do the other thing, how would they do? And that’s the question
when we’re choosing a treatment that we want to answer. We’re not interested in correlation. We’re interested in causation here. We want to know how taking this hormone or any other treatment will cause us to do differently than we would
have done otherwise, which is a counter-factual, which is a statement of causality. So I’m sure you’ve all
heard the warning somewhere that it’s very important not to confuse correlation and causation, and that just because you have correlation, that doesn’t mean you have causation. What I see when I teach is that students get so enthused about this that they start thinking that
it’s an impassable problem. This sort of nihilistic view that you can never tell correlation and causation apart. But the brilliance of Fisher is that
he found a way of doing that. If you take a bunch of
people, and this one’s not hormone replacement therapy
’cause there’s men in there, and you randomly assign
them to two groups. If you control the cause,
it’s called an intervention study ’cause you intervene, take charge of the cause, and then, of all things, give that control of cause to dice. If you do that, then you
can tell the difference. So you may say, but hey, what happened to all those confounding factors? Why are they suddenly not a problem? And of course the confounding
factors are still there. These different individuals, for example, are wealthier or poorer,
but what we do know is that we can calculate the mathematics of how differences in how wealthy they are translate to differences
in the two groups. We can use the mathematics of probability to do that, and specifically to calculate, if I have a study this
size, and the patients vary this much, how different do I expect the two groups to be anyway, just because of the confounding factors? And the beauty is that’s not just the confounding factors that we know, certainly not just the confounding
factors we can measure. All confounding factors, known and unknown, obey the same laws of probability when we make them obey those laws by using dice. And then we know how big a difference between the two groups we should expect, and then we can look
for differences between the groups that are
substantially bigger than that. And that’s how we reach
reliable conclusions. So because of the randomness, the random assigning to groups, and you probably don’t have to understand all the statistics, the main point here is that I can say that any particular value on the list is 50% likely to be in either the top or the bottom group, and all of the fancy math follows from that, and it’s 50% ’cause we made it 50%. So this talk is kind of
ironic for me personally. I’m standing up here
and I’m saying the only way to do science is
randomized assignment, random assignment, which is kind of weird, because I’m actually
primarily an evolutionary biologist, and I study a lot of questions in really deep time,
and you certainly can’t go and randomly assign
things that happened billions of years ago and wait and see in another billion years what happens. So, luckily for me, as
you’ll see in the rest of the series, there are
other ways of getting at answers in cases where we can’t do this amazing innovation of Fisher’s. I’m just gonna give you a simple example that’s not from my field, but it’s from probably most of our daily lives. So here’s Google Translate. It blows me away how good it is, right, that I can just type something and get comprehensible language out. I mean, that’s just fantastic. So I’ll give you a quick, this is all machine learning and
it’s all observational data, and I’ll give you a quick tutorial on how it works. So it starts, of course, with data, lots of data, preferably, and that data is basically just correct translations. Here it is in English, and here’s what a human translator would make of it in Spanish. And then you divide that
data into two groups we call training and testing and we keep them carefully apart. And then you take the training data and the computer scientists will get very excited about telling you the details of the black box, and I’m
not gonna do any of that, I’m just gonna say there’s a black box that does predictions,
and it uses the data to train itself to predict, okay, here’s some English, what would
a translator put in Spanish? And then you take the
testing dataset and you use the black box on it,
and then the final step that seems to be pretty
much inevitable in any study like this is the authors
always declare victory. So, a question to ask is: should they have declared victory? The results look good, right, but was it correct of them? What are the circumstances under which, even if the results look really impressive, it is appropriate to declare victory? And so when we discuss that, we talk about the validity of the study, and I’ll make a distinction between internal validity and external validity. So, internal validity means: are the claims sound in their own context? So does the claim follow
from the evidence? And the interesting thing
about internal validity is it doesn’t just depend on what sort of study or evidence you have. It also depends on what sort
of claim you’re trying to make. So distinguish between two
different kinds of claims. So in one claim, all you
need to do is prediction, and prediction’s a kind of correlation. You’re trying to predict what
would this be in Spanish? Will this woman have a heart attack? You’re making a prediction. And in the other, you’re
trying to infer causality, so the causal question is what would have happened if we had done things differently? So if you’re just trying to do prediction, the bar is much lower. So really, all you need to know is that the translations
are actually correct. If I translated, if I put
nonsense in and trained it to do nonsense and then
tested it on nonsense, there’s a technical term
for that in the field. It’s called garbage in, garbage out. But that’s enough. That’s the main question I need to ask if all I care about is prediction. But when I care about causation, then I also need to ask
about confounding factors. If all I care about is prediction, confounding factors are fine. So what, they’re confounded. I don’t care why the woman did or didn’t have a heart attack. I just want to predict whether she will. But if you wanna know what
would have happened if she had done things differently, then you need to worry
about confounding factors because you need to isolate the cause. Okay, so the other sort of
validity is external validity, and that means what happens when you go outside the context of the study? Obviously, say, hormone replacement therapy is for menopausal women, and there’s not a lot of external validity in trying it out on men, right? Similarly, with Google Translate, I had fun trying to break it. I assumed it had been trained on prose and had fun trying poetry out on it, and here’s Jabberwocky
by Lewis Carroll, and for those who speak French, the last line here, it’s a little bit misspelled, but it looks a little bit like the mother’s becoming Egyptian, so I broke Google Translate
because I exceeded, it’s wonderful, but I exceeded
its external validity. Okay. So let’s go to another machine learning example that’s really exciting today. A lot of people are excited
of what use we can make of machine learning for cancer. So by now, you hopefully have this idea, alright, the first question we should ask is what is the internal
validity of the study and, in order to even ask that question, you have to say well what is the claim, not just what is the study,
but what is the claim that is being made, what is the purpose? Why would you do machine
learning on cancer? So there are several different answers to that. The easy one, the one where you’ve got lots of internal validity, is to predict how long somebody will live. Remember, all we need
to do there is make sure the data is correct for prediction. Well, it’s pretty easy to measure how long somebody lives and then you have the data you need to train a
machine learning algorithm. So that one’s fine. But I don’t think, I
mean, it’s good to know, but I don’t think it’s what
people are most interested in. Another thing that a lot of people do is they use machine learning to classify cancers into different types. So this one, eh, where’s
the internal validity here? We don’t know what the different types of cancers are at the
beginning of the study, right? So we have a problem here. If we don’t know that the data we’re feeding in to train with is correct, we don’t know what we’re getting out. So the best we can normally do is train it to give whatever answer a human expert would guess, but then, okay, why
is it that we’re doing that? And there’s a few different reasons. If it was like Google
Translate, the reason would be that it’s faster and cheaper, but I’m pretty sure that’s not the answer most patients want here. But there are much more
useful things to do. So for example, you can train it on what a human would say, and then you can notice: huh, even though I trained it on what humans said, it keeps stubbornly disagreeing on this category. Let’s do some more research into that. So it can be used not necessarily to immediately help patients but to identify interesting
cases for future research. And another thing that’s
sort of more directly useful is if you can
take something upstream and use it to predict
downstream, so for example, if you have a radiology
result and there’s a high rate of false positives, you could train a machine radiologist not on what a human radiologist would say, but on the subsequent data that comes from a biopsy. And maybe, if you got that trained well, you could avoid a bunch of biopsies, because you could rule cases out with more confidence, perhaps, than a human, if the machine training happened to work better than the human training. So there are uses for
classifying cancer types, but I don’t think they’re really what people most care about. What people really most care about is they want that original evidence B. They want their cancer genome
to go into a prediction machine and change what
treatment they’re gonna get, so that it’s the best possible treatment for them, with no uncertainty. And it just can’t do that, because “should I take treatment A or should I take treatment B?” is not a prediction question. It’s a causal question. And in cases like this
where you’re asking a causal question, there is only one method that we have to reliably answer it, and that one method is Fisher’s method. So we can ask: well, Fisher’s method is the statistics of groups, right? How are we gonna use Fisher’s method to personalize treatments? And there are a couple of
different ways of doing it. One is to use these types. You have some reason to believe that different types are different, so gather a bunch of patients that are more similar to each other, that are all of one type, and do a randomized trial just on them. Then take another type, do a randomized trial that’s just on them, and see: maybe drug one works better for the first type, and drug two works better for the second type. So the machine learning here can be used to generate hypotheses
that these different cancer groups might be
different, but to answer the question that the
patient really wants to know, how should I be treated,
hypotheses aren’t enough. You have to follow up
and do the randomized experiment to truly get the answer. There’s another method, actually. You can randomize the
whole prediction machine. If somebody claims to be able to build a prediction machine that
will take your genome or your knucklebone and tell you with total certainty what you should do: great. Let’s test the whole machine. So we can take our group of patients, and one group can be
assigned to the machine and the other group can
be assigned to whatever the current standard of care is. And many of the patients assigned to the machine are gonna get the standard care anyway, but a few of them won’t, and this study will answer the question that the person at the start should be asking, which is: statistically, on average, am I better off doing this personalized medicine thing, or am I better off with the standard of care? And it can do that even
though the personalized medicine thing does different
things for different patients. So randomized trials still work. They can do all sorts of things. And they’re known as the gold
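standard for good reason. That last point, that a randomized trial can evaluate a whole personalized-treatment policy even though the policy treats different patients differently, can be sketched in a few lines. Everything here is a toy of my own invention, not anything from the talk: the two “types,” the two drugs, and the recovery probabilities are all made up.

```python
import random

random.seed(1)

# Made-up world: two hidden cancer "types"; by construction, drug A works
# better for type 1 and drug B works better for type 2.
recovery_prob = {("type1", "A"): 0.6, ("type1", "B"): 0.3,
                 ("type2", "A"): 0.3, ("type2", "B"): 0.6}

def standard_care(ptype):
    return "A"                                   # everyone gets drug A

def prediction_machine(ptype):
    return "A" if ptype == "type1" else "B"      # the "personalized" policy

def run_trial(n=10_000):
    # Randomize whole patients between the two policies, then compare arms.
    outcomes = {"machine": [], "standard": []}
    for _ in range(n):
        ptype = random.choice(["type1", "type2"])
        arm = random.choice(["machine", "standard"])      # the coin flip
        policy = prediction_machine if arm == "machine" else standard_care
        drug = policy(ptype)
        outcomes[arm].append(random.random() < recovery_prob[(ptype, drug)])
    return {arm: sum(v) / len(v) for arm, v in outcomes.items()}

print(run_trial())   # machine arm recovers about 0.60, standard about 0.45
```

The comparison is between arms, not between individual treatments, which is exactly why it still answers the patient’s on-average question. And they’re known as the gold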
standard because of this. And “the gold standard” makes it sound sort of venerable and stuffy, which makes it much more fun to attack them than to defend them; you don’t want to defend something that old and stuffy, right? But I think what people don’t get about them is how recent they are. It’s less than a hundred years since the first one was done. So think about this in the context of the history of science: we understood the fundamentals of quantum mechanics before we understood that if you wanted to know which medical treatment was better, you could try randomly dividing up people and seeing which group did better. Like that concept was harder than quantum mechanics? But that’s what history tells us. It’s such a simple idea and yet so hard. So it took a while. It wasn’t instant. Fisher did this in the 1920s, but it wasn’t like doctors started doing it
unusual series of events. So the first published randomized trial was on treating pulmonary tuberculosis with streptomycin, and it was in 1948, so there was a bit of a delay. And it’s easy to sort of say, well, of course it would’ve happened eventually, and sure, things happened between the 1920s and 1948, but it isn’t inevitable in history that this would’ve happened at all. People thought that it
was thoroughly unethical to withhold a treatment from people, and of course, people still do think if you’ve just discovered this wonderful new antibiotic, how could you possibly deny people with a potentially
deadly illness this drug? How could you justify
having a control group? So the history of this is interesting. The reason this worked is that streptomycin was discovered in the U.S. during World War II, and in the U.K. after the war, when times were pretty tough there, they had received exactly one shipment of streptomycin from the U.S. It wasn’t nearly enough to treat everyone, and it had been shipped to the Medical Research Council, which was completely controlled by statisticians, all of whom were fans of Fisher. Those were the historical circumstances: they were in physical control of it. So they argued: alright, fine, we get it that you don’t want to withhold treatment, but unfortunately,
obviously, there’s no choice. We don’t have enough. We can’t give treatment to everyone. So we think as statisticians that the fairest thing to do would
be to draw straws for it. And it just so happens that if we draw straws for it, we can use Fisher’s method and learn all sorts of science that, under normal circumstances, it wouldn’t be ethical to do. And most of the doctors’ response was: you can’t do that. That’s still unethical, because I know that this kind of patient is more likely to do well, and this other kind of patient is less likely to do well, so you should prioritize who gets it on that basis. And then of course a different doctor might make a similar argument but argue for a completely
different kind of patient. And then the statisticians said too bad. We’re in physical control,
so if you want any, you gotta take part in our study. You don’t have any choice here. So the study went ahead, and so, okay, now we can ask: would it have been unethical? Was it a wonder drug? Everyone assumed it would be fantastic. Was that true? And the answer is kind of a mixed bag. The patients did improve
substantially at the beginning, and then they evolved
resistance to the antibiotic, and this was the first
time anyone in science had seen that phenomenon
of evolution of resistance, and you’ll learn much
more about microbes from, wherever she is, Dr.
Vercelli’s talk in a few weeks. And then on the flip side, there were substantial side effects. The patients didn’t live longer or recover more often in the end. They had some temporary recovery, but they also had serious side effects, so in 20/20 hindsight, the trial would not have been unethical. The two groups actually kind of came out a wash. The real winner of this trial was not instantly treating tuberculosis. It was the randomized trial itself. ’Cause after these unexpected results were found, the tuberculosis doctors said: ah, okay, you persuaded us. We can’t predict what’s gonna happen. We’re gonna need to
keep doing these trials. So, for example, the standard of care at the time was enforced bed rest. Pretty Draconian. People weren’t even allowed to
read and write in many cases. And the people who were randomly assigned to go home did equally well. No difference. The standard of care was useless. And so, massively, the sanatoria were all closed, and people were essentially released from prison. You know, we couldn’t cure them yet, but they got to go home. And then, later, of course, we kept playing with the protocol for the antibiotics and found a way to get it to work. But it wasn’t the first trial that did it, and the method was needed
in order to get there. So this brings us to the question of double-blind experiments, which means neither the patient nor the doctor knows who is getting which. Most of you have probably heard that the reason we do that is ’cause the placebo effect is so powerful. (audience laughing) Like, that’s funny and all, but it’s actually not the reason. Not historically. So I could give a whole talk on placebos, but I don’t have time to
give two talks right now, but what I’m gonna say
briefly is that they’re not nearly as powerful for most things as popular discussion
would make them sound. They do have some effects for subjective outcomes, for behavioral
outcomes, but nobody’s ever seen a placebo effect
for whether or not you live or die or get over infectious illness or anything like that. You can just assume that there aren’t sort of mind over matter type placebo effects for those kind of conditions. The real reason that we
need to do double blind experiments lies in the
history of several failed trials before the streptomycin one. And what happened during those trials is the doctors cheated. ’Cause they were so sure they knew which patients should be getting the drug and which ones shouldn’t that they would manipulate the experiment and get patients into the group they thought was right for them, and so the only way to be 100% sure that doctors aren’t doing that is for them to not know what the groups are. So that’s the dirty secret
of the history of medicine, the real reason we do randomized trials in a double-blind format. And that’s because, ultimately, even though we do them, we don’t
like them ’cause we don’t like uncertainty, we
don’t like probability, we don’t like leaving human
fate to the roll of dice. And we want there to
be a reason for things. We want there to be causes. And you can hear that in the language we use around medicine. So, for example, a question people often ask is well, some people respond to this drug and some people don’t, so you want to do research into what it was about those people who responded and what it was about those
people who didn’t respond. Let me write that language down: “respond.” That’s a causal statement. But we don’t know causality; we’ve got two observations: this patient took the treatment, and this patient did well. We don’t know whether they would’ve done well anyway, which is the counterfactual that you’d need for a causal claim. We don’t know that. And then we go further, and we say, okay, not
just that the treatment was a cause, but that there were more causes, namely the particulars of that patient. Not the particulars of that patient in doing well or badly, but the particulars of that patient in responding to the other cause. So then you can see what’s wrong with this language if I flip it and ask what happens if you gave, not even a
placebo, but no treatment at all. We don’t talk about responding to no treatment. But we have the same
information, that they weren’t treated and that they did well. So we’re slipping causation in in a place it really doesn’t belong, and we do that because we want everything
to have a cause. That’s part of our psychology. And so we see causation in remarkably little evidence, like one patient, even as we deny it from large randomized trials, because those large randomized trials force us to accept the residual, inherent uncertainty of the universe. And that’s how it is. That is the universe. I can’t change it,
right, by doing science. I can reduce the uncertainty, but I can’t get rid of it. And when you accept evidence that comes from randomized trials, your brain can’t process it unless you accept uncertainty. Evidence A requires you to accept uncertainty, and evidence B doesn’t. So evidence B sounds good. But the irony is that if
you go for evidence B, you’re turning your
back on the best method that science has to reduce uncertainty. But yeah, the world is
complex, and that’s how it is. But people don’t want to say that that’s why they don’t like randomized trials. If our actual reason is “randomized trials, uncertainty, ick,” you don’t want to say that that’s what your problem with them is, right? And humans, especially intelligent humans, are remarkable at the process of rationalization, at coming up with really intelligent-sounding alternative objections and saying no, no, no, my reason for disliking it isn’t “ick,” this is my real reason, and then comes something very fancy. So I’m gonna go through some of those fancy things that I hear a lot. And I’m gonna go through
what one of those common criticisms is, and then I’m gonna tell you what I think the obvious solution to it is and then I’m gonna tell
you what I hear instead, and, spoiler, what I hear instead always involves abandoning randomized trials. So a common criticism is that the patients in randomized trials in
medicine aren’t representative. They’re younger than normal patients, they’re healthier than normal patients, they have fewer other
conditions at the same time, and that’s all true, absolutely true, and I agree with them
that this is a problem, and there’s an obvious solution. Enroll more representative patients. Problem solved. And it’s really interesting, that’s not what these people tend to suggest. They tend to suggest using
non-randomized forms of evidence, and I know that it’s a crazy rationalization because sometimes I even see them suggest forms of evidence that don’t have better external validity than the trials they’re criticizing. So an objection we’ve already
seen, in the streptomycin case, is the idea that it’s unethical. How could you possibly deny somebody this treatment and have a
control or a placebo group? That’s really unfair to
the people in that group. So, obviously, in the streptomycin case, that wasn’t true, but we could ask: we’ve been doing trials for a while now. How often is it the case? And one example is in pediatric oncology. People looked at every randomized trial that had been done over a period of time, and they asked: in how many trials was a kid better off getting the new treatment that everyone was excited about and thought would be better,
and how many trials were they better off getting the
former standard of care? And the answer came out
pretty much exactly 50-50, so there isn’t an ethical problem. And some of this has been
done in some other fields and areas of medicine, and probably we should do some more, but as best we can tell, there is no ethical problem, and the solution to this so-called ethical problem is to embrace the truth that it’s not there. Because even if the doctor thinks they know, well, the
there, it doesn’t matter how smart you are, what
your background is, what you’ve studied, if the evidence doesn’t exist, then you don’t know. But what gets proposed
instead, for example, are things called right-to-try laws. I won’t talk about them in any detail except to say that they have the effect of undermining the FDA’s ability to enforce randomized trials. So another common criticism is that we can’t make drug approval depend on randomized trials because they’re
so slow and so expensive. So the first thing I want to say is: okay, how slow and how expensive depends on what you’re asking. If you want to look at, say, how people on antidepressants do over six weeks, which is what most of the studies are, six to eight weeks, then all you need to do is wait six to eight weeks after enrolling them. It’s quite quick. If you want to know what we actually probably care about, the effects of taking them for two years, there is no human way to get a result in less than two years. You have to wait for that much time to pass for the patients, and that’s not about the randomized part of the trial. If you want to know a two-year outcome, you have to wait two years, okay? So it’s not about randomization. It’s about what you’re interested in, and it’s true for all forms of evidence. As for the expense, one of the components of the expense is how big
the trial needs to be, and one of the things that comes out of the mathematics of statistics is that if you want to see an effect, a difference between the groups, that’s half the size, you need four times as many patients. And that relationship has two sides. Firstly, it means that if you want to look at really small effects, like small preventative actions, you need an enormous study. But the other side is that if you want to look at really big effects, a tiny, tiny, tiny study is actually enough. So I’ll give you an
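example of that in a moment. First, though, the four-times rule she just stated falls out of the fact that the standard error of a comparison shrinks like one over the square root of the sample size. Here is a quick check using the standard normal-approximation sample-size formula; the significance level and power are my own conventional choices, not numbers from the talk.

```python
# Normal-approximation sample size for comparing two group means, with a
# two-sided alpha of 0.05 (z = 1.96) and 80% power (z = 0.84); delta is
# the effect size measured in standard-deviation units.
def n_per_group(delta, z_alpha=1.96, z_beta=0.84):
    return 2 * (z_alpha + z_beta) ** 2 / delta ** 2

n_big = n_per_group(0.5)     # detect a half-standard-deviation effect
n_half = n_per_group(0.25)   # the same comparison at half the effect size

print(round(n_big), round(n_half))   # about 63 and about 251 per group
print(n_half / n_big)                # 4.0: half the effect, four times the patients
```

The exact constants depend on the chosen error rates, but the factor of four does not; it comes purely from the squared effect size in the denominator. So I’ll give you an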
example that I wrote about during the West African Ebola outbreak a few years ago. Just say you have Ebola and it’s killing 70% of patients, and you have a new wonder drug that reduces that to 10%. I did the math of statistics on it and found that seven doses of the drug, randomized among 14 patients, would be enough. And you’d probably get the
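answer very quickly. I don’t have her published calculation in front of me, so here is my own rough version of the arithmetic, using a one-sided Fisher exact test implemented directly and applied to the most likely outcome of such a trial.

```python
import math

def fisher_one_sided(d_ctrl, d_trt, n=7):
    # One-sided Fisher exact test: if the drug did nothing, how likely is a
    # split at least this lopsided, holding the total number of deaths fixed?
    total = d_ctrl + d_trt
    denom = math.comb(2 * n, total)
    return sum(math.comb(n, k) * math.comb(n, total - k)
               for k in range(d_ctrl, min(n, total) + 1)) / denom

# With 70% mortality untreated and 10% treated, the most likely outcome of
# a 7-versus-7 trial is about 5 control deaths and about 1 treated death.
print(round(fisher_one_sided(5, 1), 3))   # 0.051: borderline already
print(round(fisher_one_sided(6, 0), 4))   # 0.0023: a slightly stronger split is decisive
```

So the expected result sits right at the conventional threshold, and any result a notch stronger is overwhelming; that is what an enormous effect size buys you. And you’d probably get the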
answer in a couple of weeks. And you don’t need to do anything fancy. You just need to make some placebos, make some drugs, hand
’em out in the field, and have the doctors report back whether the person died or not. That’s it. No big change to their
operating procedure. Of course, when the effect is that big, you will see it even
without a randomized trial. When effect sizes are enormous, you don’t need fancy statistics or methods to see them. But it would take more time, more expense, and leave more uncertainty at the end, ’cause you’d say: aw, yeah, but all the ones who got the drug were treated at a better hospital than the ones who didn’t, and maybe that’s the reason, and you’ve got all these confounding factors. So, in fact, in a case like this, the fastest way to get the evidence we need is a randomized
trial, and that’s generally true. If you give it out from
the very first dose in a randomized trial,
from the first patient, it’s gonna be faster than any other method of getting the answer. So yes, we’ve talked about how being slow and expensive is not about the randomization. Being slow is about what sort of time scale the outcome is on, and being expensive is partly about how many patients you need, but it has other factors, and to solve those other factors that are specific to the randomizing part, my answer would be to make it easier to randomize. And the U.S. is not very well set up for this, but the U.K. is. So in the U.K., they can do things. Firstly, people who run trials don’t need to pay for the healthcare of the participants, because the government pays for that anyway, and secondly, they have centralized
electronic health records. So if I go in and I’m at high risk for heart disease, and they decide to give me a statin because my cholesterol’s so high, and they don’t know which statin is better than which other statin: press a button, it says okay, you’ve been randomly assigned to this statin, enter it in, and then some other researcher ten years later can pull the data in the files on whether I had a heart attack or not. Simple. We have decided to make randomization difficult because, in part, we are morally uncomfortable with it. It doesn’t have to be that way. And of course, what’s usually suggested is not to do randomized trials: instead of systematically
experimenting on a few people in a way that leads to
us learning the most. The alternative is ultimately
to unsystematically experiment on a much
larger number of people in a way that teaches us less and that gives more scope for the
manipulation of the results. So on the good news front, in some areas, randomization’s really taking off. Silicon Valley has given it a new name to make it friendlier and more appealing. It’s called A/B testing now. Some of you, I’m sure many of you were here for last year’s lecture. We had Luis von Ahn, who
talked about Duolingo and they randomized people who were trying to learn a language to
different conditions to get information and make their program better, and nobody complains. They’re doing randomized trials on you all the time in Silicon Valley. Sometimes we complain, but mostly we don’t, and the basic point is: if A is ethical, and if B is ethical, and if doing either A or B, essentially randomly, but totally unsystematically, depending on whatever, is ethical, why should it be unethical to use dice to
decide whether to use A or B? So a lot of our objections are ultimately worries that either A or B might not be ethical, and
that’s a real objection, but if we’re totally sure that A and B are both ethical, we
shouldn’t put any barriers up. And when it comes to things
like tweaking an algorithm that helps you learn a
language, we usually don’t. I think there was a lot of outcry about a Facebook study that was
tweaking your emotions because we felt bad about
A and B in that case, but when we feel fine about both A and B, then this should be good
in all kinds of areas. So there’s a lot of potential here. They’re increasingly,
but not nearly enough, being used in education, they’re increasingly being used in economics. They’re not being used in criminal justice, but they could be. You could do a randomized
assignment of cases to drug court versus criminal court and follow up outcomes of
interest however many years later. There’s enormous potential to improve so many things about our life
if we do the experiments. If we gather the data. If we design it in the way Fisher said. If we accept that the Universe is intrinsically uncertain, and so it’s okay to stay uncertain, it’s okay to get that uncertainty down, except that it’s never going to zero, and it’s by accepting uncertainty that we learn the most in an intrinsically
uncertain world. So thank you. (audience applauding)


  1. I just want to make the comment that in the knucklebone example, having all of the information about the angle of incidence, center of mass, etc. would actually give you more reliable information to work with if any one of the variables were constrained for the experiment. If you chose to constrain some variables, you could see to what degree each variable is a determining factor in making one side come up more frequently. The statistical model would emerge as you approximated the operating variables; if constraints became part of the game, the purely statistical model would fail to predict. It is only more useful when a person has no knowledge of how to make use of the more complex data. Also, in the first part, where you ask whether it is better to use a statistical model or a genome: similar problem. Whether a comparison on something like similar age, geographical area, etc., or on the genome, works better very much depends on the condition; some diseases will actually favor the genetic approach. In fact, with appropriate knowledge it will always be favored. I think you misunderstand that since the advent of non-linear dynamics it has been understood that 'random' as traditionally defined is non-existent; it is scale-relative. Random implies hidden variables. It is a useful concept to invoke when variables are unknown or impossible, for whatever reason, to know. Otherwise it is a convenience, not the natural explanation of how the universe operates. Take random walks, take Brownian motion, etc.: all phenomena influenced by variables outside the scope of the scale of observation. Example: our social behavior, leading to complex phenomenology that literally changes the landscape of the earth, is a random operation when viewed at the scale of the planet; however, at our scale there are causes that are not random.
