An Interpretable Machine Learning Model of Biological Age [transcript]

Written by Christopher Kelly

March 22, 2019

[0:00:00]

Christopher: Hello and welcome to the Nourish Balance Thrive Podcast. My name is Christopher Kelly and today, I'm delighted to be joined once again by my very special guest, Dr. Tommy Wood. How are you doing this morning, Tommy?

Tommy: I'm doing very well, thank you. It's sunny in Seattle, which is nice.

Christopher: Sunny in Seattle. I can't believe it. I thought I was doing the right thing sunshine-wise by moving to Santa Cruz and I've hardly seen a bit of it in the last few days. In fact, I think I've even gotten to the point where I'm starting to get a little bit of insomnia just because I'm not getting enough light during the day. You know when you get to that point where you become restless at night and the dogs start annoying you and your one-year-old kid starts annoying you and you realize it's not them; it's you.

Tommy: Yeah. You're in the continuous exposure to gloomy light that messes everything up.

Christopher: Yeah. I'm outside recording this podcast in an attempt to get more daylight during the day. Hopefully that helps me out tonight. Okay. Today, we are going to talk about our recent paper, an interpretable machine learning model of biological age, but before we go there, I would like to start with a giant tangent that is snake bite. Tommy, can you talk about snake bite?

Tommy: Yeah. We were in Costa Rica at the beginning of January at the Flō Retreat Center run by Dr. Ben House and his wife, Steph. We were having a generally good time. On my last night there, you and I and Ben went for a walk just along the road near where we were staying and crossing a stream and I got bitten by a snake.

Outside of the Spanish-speaking countries, they call it a fer-de-lance. In Costa Rica, they call it a terciopelo. If you Google this snake, you will see a pretty horrific-looking leg from a kid who was bitten by one and the venom can have some pretty nasty effects in terms of clotting and then you always get an infection. Ben took me straight to the hospital. I was given antivenom pretty quickly. I was given antibiotics pretty quickly, but I ended up with a pretty serious infection that spread all the way up my leg and I had an abscess. I needed multiple different types of antibiotics. I also had a reaction to the antivenom itself, so I needed steroids and antihistamines. I ended up spending about 11 days in the hospital before they released me and I could fly back to the US and continue my recovery. It was a pretty interesting ordeal. Luckily, no lasting damage other than some scar tissue just above my ankle on my right leg, but other than that, I ended up getting pretty lucky.

Christopher: So many questions. Where to start? The reason I wanted to talk about this is because I think it's a really good example where the choice to go to the hospital is really, really obvious. Working with clients, I think one of the things that we do really well is help people navigate the system. When is it a good idea to go and see a licensed physician and when is it a good idea to try some diet and lifestyle stuff on your own at home? It's not always obvious.

I think we've worked with lots of clients recently, but probably even since the beginning who are terrified of doctors. They would rather stick a needle in their eye rather than go and see the local GP. I think sometimes that fear is unwarranted and I'd like to think that what we help people do is navigate that system. We've talked a lot on the podcast before about iatrogenic antibiotic injury and the overuse of antibiotics and some of the problems that it can lead to, but here we have a really fantastic example where going to the hospital was obviously the right thing to do. Could you talk about the antivenom itself? This is a really interesting thing. What is antivenom? It's a word that most people know, but what is antivenom?

Tommy: Antivenom is basically horse serum that's got the antibodies against the various factors that show up in the venom of the snakes. This class of snakes known as pit vipers -- and they use the same antivenom for most of these snakes. I think the antivenom that I got is actually the same antivenom that you give for a rattlesnake in the US, I believe, because they're in the same sort of family or genus of snakes.

Basically, what happens is you inject horses with the venom and then you collect the antibodies out the other side. There are two types of allergic reaction that they might expect from people who are injected with it. There's the first, the early allergic anaphylactic-type reaction where you get a sudden histamine release basically as soon as you're exposed to this. People who have severe allergies to peanuts or shellfish or something might know something like that. There's a more delayed type hypersensitivity, they call it, where these foreign proteins -- complex -- then get deposited throughout the body and then your body reacts to them more slowly. I didn't get a reaction until five days after, four or five days after I was given the antivenom. I actually got very little venom from the snake itself. My main problem was the infection that came with it.

[0:05:12]

Christopher: How was this helpful?

Tommy: How was the antivenom helpful?

Christopher: Yes, sorry. I just assumed you finished your thought. There's a bit of a delay on the line here.

Tommy: The way the proteins work in the venom, they're twofold. The first one is a direct, basically digestive effect. It starts to break down the tissue locally, and then there's a more systemic effect where the proteins affect clotting. If you have the antibodies circulating that bind to these proteins then they stop them having that effect. Fatalities from a snake bite are very, very rare nowadays because of the antivenom, but the more serious effects could come from probably the clotting. So if you get a lot of the venom, it interferes with your clotting and you basically hemorrhage into your organs or your brain. The thing that they've used to track whether they should give me more antivenom was to look at the clotting of my blood, so they don't measure anything to do with the venom itself. They measure the physiological effects that the venom might be having. That's how that works.

When I got the delayed type hypersensitivity, I basically had a rash over my whole body. At first, I thought it was a reaction to the new antibiotic that they've given me because vancomycin, which is the antibiotic that ended up doing most of the job, can also cause something called red man syndrome, which is very similar. To start with, I didn't really know which it was and eventually they decided it was the reaction to the antivenom, but then I also had fevers from that and it was basically like a whole body inflammatory response while you clear out these foreign proteins. I needed IV steroids. I needed IV antihistamines to dampen that down because that could be quite dangerous. It can lead to organ failure, so that's something they got on top of fairly quickly.

Christopher: Talk about the snake oral microbiota. I find this absolutely fascinating. In the beginning, I thought, I don't understand. What are the chances of getting an infection just after having this tiny little tooth that's in this not very big snake's mouth enter your tissue? What are the chances of that causing an infection? Those sorts of injuries happen all the time and they don't always result in infection, but there's something special about the snake oral microbiota, right?

Tommy: Yeah. This I didn't appreciate either even when I went into the emergency room and I got the antivenom first and then a few hours later, I got prophylactic antibiotics. They said the first problem that you get with a snake bite is the venom and then the second problem is the infection. I was like, okay, fine, I might get an infection. What I didn't realize is that you always get an infection and there's basically a wide variety of different bugs that live or grow or are hosted within the venom glands of the snake and it's almost like a symbiotic relationship because that bacterial infection may be the thing that kills the prey. That means the snake can eat it.

There are three different ways that the venom could get you. There's the local digestive effect of the enzymes breaking down tissue. There's the systemic hemorrhagic effect. There's also the infection and maybe the snake is going to wait a couple of days while that infection really takes hold. So whenever you get bitten by these kinds of snakes, there will always be antibiotics. Again, it's the same if you got bitten by a rattlesnake in the US. You'd get a cocktail of intravenous antibiotics pretty much immediately to try and stop any serious infection taking hold.

Christopher: So they don't know which antibiotic to use. How do they select an antibiotic?

Tommy: In Costa Rica, they started fairly narrow, so penicillin plus clindamycin, which works against anaerobic bacteria largely. For me, within two or three days, it was obvious that wasn't going to do the trick, so then they started to broaden out and eventually, I was on the second/third line of antibiotics before things really improved. In the US, I looked at the rattlesnake antibiotic guidelines that they use here in the trauma center in Seattle and it's basically all of the things that I got but all at the same time right from the beginning. They're like, "We're not messing around. We're just going to carpet bomb the thing and get whatever it is that's there."

Christopher: And so do you think that would've been -- with 20/20 hindsight, was that the right thing to do in your situation?

Tommy: Yeah, I think so. I would've responded faster. I probably wouldn't have had an abscess that required a very painful drainage, but in reality, I understand why antibiotic stewardship is very important. You might not think of it, but the Costa Rica Healthcare System is actually pretty good. It's socialized medicine. They're going to be very good for the emergency stuff. They're very good for snake bites. Costa Rica is where they do most of the research and make most of the antivenom for a lot of different types of snake.

Part of having good long-term planning is that you make sure you only use the antibiotics that you need for the period that you need them, so starting very narrow and then broadening out makes a lot of sense in that case. So before they could give me vancomycin, which is, like I said, the antibiotic that ended up doing most of the work, I think, they had to get permission from a central pharmacy in the capital to give me the antibiotic.

[0:10:05]

Even though the delay probably ended up making things a bit worse for me, I absolutely understand why they did that and it kind of shows that they really know what they're doing. That gives me quite a lot of faith in the way they run their healthcare system.

Christopher: Are you worried about any downstream effects on your microbiome? Do you think there's any chance that that might be affected? Has it been affected?

Tommy: Yeah. A lot of people have mentioned that and it's possible, but in reality, I've chosen not to worry about it because there are so many things, other things that are going to cause my gut microbiota to shift back even if they were initially affected. One thing particularly -- so it's not the case for all the antibiotics that I got, but for vancomycin, people may recognize it as an antibiotic that you take orally for C. diff and it doesn't really cross the gut very well. So if I take it orally for a C. diff, it stays in the gut. If I take it intravenously for my snake bite, it doesn't end up in the gut. I've chosen not to worry about it and some of the antibiotics that I got I don't think are really going to affect the gut microbiota and I think everything's been fine --

Christopher: But you haven't had any symptoms?

Tommy: No. I've been fine.

Christopher: Okay. What about muscle mass? Did you lose much muscle mass? How long is it going to take to get it back?

Tommy: I lost a lot, but it actually came back really quickly. I'm probably back to normal already even though I haven't been doing -- or at least within a couple of pounds. I've been going to the gym and Zach made me a kind of a pared-down program, but most of it, I just ate constantly for about three weeks and most of it came back quickly. It was a calorie deficit, inflammation, intravenous cortisol kind of thing, but muscle satellite cells, they stay there and that comes back pretty quickly, so that was fine. I was worried initially, but I was back to close to normal within a month.

Christopher: That's great. That's really good to hear. Would you have any advice for the listener on how to navigate the system? Is there a way to do it? In this case, it's really obvious. You've just been bitten by a snake. "Take me to the fucking hospital." Sometimes it's less obvious. Maybe you need to go see a gastroenterologist and have a scope put down and put up and just have a look, find an image of what's going on here, but are there any general guidelines that you think about when trying to make the decision, "Should I get my physician involved or is this something that I should try and use some ancestral health principles, diet and lifestyle modification to overcome?"

Tommy: It probably depends on the severity and where you're starting from. If you're coming from zero, standard Western lifestyle and diet, and you're not incredibly sick, maybe you're just tired, you don't sleep well, maybe you have IBS or symptoms like that, some constipation, some diarrhea, some bloating, something like that then I think the ancestral health principles automatically come into play. By the time people come to us, we end up recommending that they go and see their doctor fairly frequently. You have somebody who's not responding to anything that we might think would be helping with their gut health and then I want to make sure I'm ruling out -- or maybe this person actually has inflammatory bowel disease and we're messing around with diet. Obviously, diet does help inflammatory bowel disease, but I want to make sure that you don't need to get on top of something more serious to start with. Then an endoscopy, a top and tail at your gastroenterologist is going to be incredibly useful.

Another example is people come to us and they have low testosterone. We've seen low testosterone on the DUTCH. We don't use the DUTCH that much anymore. We prefer blood hormones, but you see low testosterone and then you do a full hormone panel and their prolactin is high or there's something up with their pituitary hormones. We've had a couple of people who ended up having pituitary tumors and that is something that is very detectable first on the blood tests then on an MRI scan and then it's eminently fixable either -- if it's micro -- they call it microadenomas then you can treat it with a medication, or if it's a tumor then a surgery. It's usually pretty successful. So messing around with testosterone problems -- and yes, maybe you have a history of traumatic brain injury, multiple concussions, and your pituitaries have just given up. There's not much you can do about that other than replace, but if you're messing around with a functional medicine practitioner and they saw low testosterone on your DUTCH and they're doing all these various other things and you actually have a tumor in your head, I would much rather be a little bit more cautious and have somebody just do a quick scan of your brain and make sure that there's not something going on in there first before we start worrying about supplements and all this other kind of stuff.

I guess we've self-selected a population that have exhausted some of those other avenues, but that example of a pituitary tumor has happened enough times for me to say it's definitely out there and people are seeing it and there's not something -- that's one example of where it's a really good idea to go and see a real doctor.

Christopher: A real doctor. And of course, the two don't need to be mutually exclusive. It's not like, yeah, you can stay up eating pizza and watching Netflix because now you're going to see an endocrinologist, right?

[0:15:11]

Tommy: No. Anytime you do any of that, of course, the recovery process and the long-term health process is still going to be grounded on those same principles, those same things are still important, but eschewing modern medicine can be really dangerous. Let's just put it that way. There are some very easy screening tests that you can get done, rule out, whatever it is, and this often happens. We'll say, "Well, let's just get this test done." They go to the doctor to do this. Let's just rule out the bad stuff and then we can start working on the other things. I think that's the safest way to do it and then that also ends up getting the best outcomes for the client.

Christopher: Okay. Well, let's shift gears and get back to talking about an interpretable machine learning model of biological age. Why don't we start by talking about the importance of biological age? I'll give you my take on this.

What I think is missing from health as it stands is a feedback loop. I'll give you an example. I was riding my bike in the woods in Santa Cruz at the weekend and I was going at quite a clip and there was a tree down. There was a hole where the tree root had been. I didn't think about it very carefully and I just rode my bike straight into this big hole. The front of the wheel went into the hole. I went over the handlebars. I didn't even get time to put my hands out. I just went face first straight into the dirt. As it happened, I was okay. Nothing really bad happened, but what I did get was some important feedback about how I was doing on the bike at that moment. I found out right away that I wasn't really doing it right.

I think that is missing from health and health span in general. We make all these recommendations, these ancestral health principles that everybody really should be following, I think. You don't really know. Sometimes you're lucky and you get feedback right away like you get some symptom that goes away almost immediately, but for the most part, you don't really find out right away. I think it makes it very, very difficult to learn when that feedback loop is missing. I don't know. What are your thoughts on this, Tommy? Why do you think biological age is important?

Tommy: I guess for that reason, it's very important. Can you pick up a signal of cellular aging which is now being considered a disease early on and in a way that you could intervene and see feedback such that you know that what you were doing is helping, is moving in the right direction because most of what we see at the moment is supplements, coffee enemas, intravenous infusions, stem cells. If this stuff is helping with aging, do I have to do it once? Is it working? Do I have to do it for the next 50 years? Will I only find out whether it worked in 50 years? All of the things in that space, you don't really know the answer to that, so picking up a signal is very important. At the same time, figuring out what it is in terms of the environmental inputs both on a population level as well as an individual level, what's causing that difference in terms of the speed of aging in one person versus another, and you need some kind of adaptive and responsive signal or biomarker so that you can start to figure that out.

This is becoming a very popular area often with blood tests, various machine learning approaches, statistical approaches. Other people are looking at epigenetics. Some people are even looking at the gut microbiota and how that changes over a lifetime. People are very interested in this and I think largely some of the drive is going to be pharmaceutical. Can we pick up a signal so that we can do faster clinical trials on drugs that might slow the aging process or slow disease processes? That makes a huge amount of sense to me why they would want to do that, but equally, we then might be able to use those same approaches to leverage or understand or intervene with some of the stuff that we like to do in terms of diet and lifestyle and environmental approaches.

Christopher: What other tools are there out there that could do the same job? Is there anything that's more established that's being used in the field, a test that I can do that will tell me what my biological age is? Is there anything else out there like it?

Tommy: Yes, depending on what you're looking at. One thing people might think of straight away is telomeres. You can do a telomere test. Don't do that. It's nonsense.

Christopher: Oh, really? Okay.

Tommy: Yeah.

Christopher: Why is it nonsense?

Tommy: Because first of all, establishing a baseline is really difficult. When you do telomeres from a blood test, you are measuring the telomeres in white blood cells because obviously, there are no telomeres in red blood cells. As you know, Chris, people that we work with, the results coming in from the Blood Chemistry Calculator, all our clients, the ratio and the number of white blood cells is really dynamic and really different from person to person, and also the length of the telomeres, just the baseline in different white blood cells, is different and it also changes during the day.

[0:20:08]

It's so dynamic. Those populations are so dynamic that they change all the time that creating a stable marker is going to be really difficult to get, a stable reading of telomeres. Yes, if you do it on a population level, you'll see a signal, but in reality for the individual, that's going to be really hard to turn into something useful, so save your money.

The alternative is epigenetic testing, which is based in much better science, but it's incredibly expensive. At the moment, you're looking at $1000 to $2000 per test, and then if you wanted to intervene and repeat that at multiple time points, that rapidly becomes completely untenable. In the future, that might change, but at the moment, it's not really something that I'd recommend unless you have a lot of money. There's a lot of really interesting research on different environmental inputs and then lifestyle changes, how those affect epigenetic methylation. That data does exist or is at least starting to exist. There's also data on how epigenetic shifts happen with aging and that seems to be -- basically, every time you repair DNA, you have to move your methyl groups around. You have to move your proteins around that protect and hold the DNA. When you finish fixing it, you have to move those proteins back and the methyl groups back, but they never quite go back to the place they were before. They're close, but not quite. Over time, it causes these shifts in the DNA that you could detect and that's a much more physical, robust shift in terms of the aging signals. That will become cheaper and better.

The other method that's very similar to what we did, aging.ai do have some algorithms where you can put in blood tests, a photo of your face, some other data, and they can give you a biological age as well. The problem is that you can -- it tells you your biological age, but then you don't know what to do about it, which is kind of the problem that we've been trying to attack. Their predictions may be more accurate than ours. We don't know, but what's different is the fact that at least we can tell you why you've got the answer that you did, which then means you might start to be able to do something about it. Of course, if I do an intervention which changes your biological age based on the machine and the algorithm, I still don't know whether I've then made you live longer, but I'm much closer to answering that question than anything else that exists currently.

Christopher: I think it might be good for us to say that we're talking about this published work, which I will of course link to in the show notes that you can find for this episode over at nourishbalancethrive.com/podcast. We made a decision several months ago now that the Blood Chemistry Calculator, the software that uses this algorithm, is for practitioners. The reason that we made that decision is because I moved the problem around. Many months ago, maybe even a year ago or more, I said what I'd really like to do is create Tommy or Bryan in a box and you could feed your blood chemistry into that box and out would pop a report that would tell you what Tommy or Bryan was thinking.

So far, I have been able to create no such thing. Really what I've done is move the problem around. So before you had this problem of how do I interpret my blood chemistry, now you have the same problem, which is, how do I interpret my Blood Chemistry Calculator report? So the guidance of a practitioner is required. However, there is a free version of the Blood Chemistry Calculator report. I will link to that in the show notes for this episode where you can input your blood chemistry markers and then see our predicted age score together with the explanation. The explanation, I think, is the most important part. Should we talk about that, Tommy, the importance of explainable machine learning? Do you want to say something about that or shall I?

Tommy: You can definitely give the most background on it, but basically, the way things have shaken out, using a lot of techniques actually developed here in Seattle at the University of Washington, we can show you first at the population level how a certain marker changes your biological age so then you know, okay, is lower better or is higher better or is somewhere in the middle better, which is kind of the starting point. Then you can also create an individual output which says that your biological age was created based on these markers, so this marker gave you this many more years, this marker gave you this many more years, this marker gave you this many fewer years, and then you can see most of my biological age is increased because I have high fasting blood sugar. Okay. I could do something about that and then you can think about ways to intervene. That's what's most interesting about it in my view.
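[Editor's sketch] The individual output Tommy describes, where each marker adds or subtracts a number of years, can be illustrated with a small additive model. This is only a sketch under loud assumptions: the baseline age, coefficients, population means, and marker names are invented, and a linear model is used because its per-marker contributions are exact and easy to read. It is not the model from the paper, which (as Tommy notes) allows "somewhere in the middle" to be best for a marker.

```python
# Hypothetical illustration: every number and name below is made up.
BASELINE_AGE = 45.0  # assumed average predicted age in the reference population

# Assumed years of predicted age added per unit above the population mean.
YEARS_PER_UNIT = {"glucose_mg_dl": 0.10, "rdw_pct": 1.5, "albumin_g_dl": -4.0}
POPULATION_MEAN = {"glucose_mg_dl": 90.0, "rdw_pct": 13.0, "albumin_g_dl": 4.5}


def explain(markers):
    """Return each marker's contribution (in years) to the predicted age."""
    return {
        name: YEARS_PER_UNIT[name] * (value - POPULATION_MEAN[name])
        for name, value in markers.items()
    }


def predict_age(markers):
    """Predicted age is the baseline plus the sum of the contributions."""
    return BASELINE_AGE + sum(explain(markers).values())


person = {"glucose_mg_dl": 110.0, "rdw_pct": 14.0, "albumin_g_dl": 4.75}
contributions = explain(person)

# List the markers from largest to smallest absolute effect.
for name, years in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {years:+.1f} years")
print(f"predicted age: {predict_age(person):.1f}")
```

In this made-up case, the elevated fasting glucose is the largest contributor, which is the kind of result that points to a concrete intervention.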

Another thing that we did is that based on -- for each marker for men and women, we've created these plots which show on average over all the participants used to build the model how a marker at a given level, say glucose, would affect biological age. Then you can start to see where do I want to be in the range to give me the highest likelihood that I'll get a lower biological age?

[0:25:13]

Then you can start to create some reference ranges that might be more physiologically useful in terms of what's affecting the physiological aging process and then you could start to target those. You can create more optimal ranges based on the algorithm rather than just saying what's normal in the population.
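[Editor's sketch] One way to picture turning such a plot into a more "optimal" range is to take the curve of average effect against marker level and keep the levels where the effect is at or below zero years. The curve values below are invented for illustration, and the function `optimal_range` is a hypothetical helper, not something from the paper or the Blood Chemistry Calculator. It also assumes the favorable region is a single contiguous dip.

```python
# Hypothetical curve: (fasting glucose in mg/dL, average effect on predicted
# age in years). These numbers are invented for illustration only.
GLUCOSE_CURVE = [
    (65, 1.2), (70, 0.4), (75, -0.3), (80, -0.8), (85, -0.9),
    (90, -0.5), (95, 0.1), (100, 0.9), (110, 2.4), (126, 4.0),
]


def optimal_range(curve, max_effect=0.0):
    """Return (low, high) marker levels whose average effect is <= max_effect.

    Assumes the favorable levels form one contiguous stretch of the curve.
    Returns None when no level qualifies.
    """
    levels = [level for level, effect in curve if effect <= max_effect]
    if not levels:
        return None
    return min(levels), max(levels)


print(optimal_range(GLUCOSE_CURVE))  # (75, 90)
```

A range derived this way reflects where the model associates the marker with a lower predicted age, as opposed to what is merely common in the population.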

Christopher: I think I can jump industries and talk about something else that'll make this more clear. I'm sure that many people listening to this podcast, they're working in industries using very similar approaches, but for different applications. For example, take your FICO credit score. This is some number that's assigned to you as an individual that apparently says something about your risk for loans, the FICO credit score. You may not know this as an individual when you apply to find your FICO credit score, but somebody somewhere has to be able to explain that. Imagine my occupation. I work in finance as a day trader, which you might say is very risky, but then I'm married, which makes me less of a risk. I'm a homeowner, which makes me less of a risk. I've got kids and that makes me less of a risk. You can imagine all these things. We would call these features. These are the independent variables, and the FICO credit score, that is exactly like our predicted age. For you to believe in this score, for you to trust this score, for you to find value in this score, you have to understand how the algorithm came to the result that it did. Like I say, for the FICO score, it may be a bad example because you probably don't get the explanation of the FICO credit score, but wouldn't that be useful because then you might be able to do something about it?

Actually, this happened to me. I had a mountain bike accident and I ended up in a hospital. A bunch of physicians were there. Apparently, they did some work. I can't really tell you. I don't even know if they were even present, and then all these bills show up in the mail for the next eight months after the event. Again, you don't really know whether any of these people were even there or whether they've even rendered any of the services that they said they did, but nevertheless, you must pay these bills. I forgot one for about 70 bucks or something and this had a huge impact on my FICO credit score. In the end, it affected our mortgage rate. We bought a house in Santa Cruz then. It negatively impacted it, so wouldn't it be nice to know like you get your FICO credit score, "Oh, you need to pay that bill, Chris, in order to get a perfect score."

The same is true with our biological age. "Oh, I'm telling you that your biological age is so terrible because your fasting glucose is so elevated. Well, there's something you can do about that" and there are lots of other markers like it. For example, RDW, red blood cell distribution width, may also be negatively impacting your biological age. There's probably something you can do about that as well. So really the value is not so much in the biological age itself, but in the explanation of the age I think is the most important part.

So why should we publish this? This is an interesting question, perhaps a philosophical question for you, Tommy. Why do you think it's important to publish this stuff? This generally doesn't happen in software. Google, for example, came up with a clever new algorithm that does something useful for its users. They wouldn't typically -- although I wouldn't say that Google have never published, they absolutely have -- but they wouldn't typically publish everything that they do. Why do you think it's important to publish in this industry, in this field?

Tommy: I was actually watching this interesting discussion that was published by Medscape actually talking to journal editors about peer review and one of the guys made an important point -- this is actually about preprints versus peer review, but the point is that when these things can actually actively affect people's health and potentially quite seriously, getting it out there and getting other people to look at it and comment on it I think is really important.

The first thing is we build these algorithms and we're going to use it to direct interventions for people. We're kind of slightly different because most of the interventions that we use are very low risk, which makes our lives a lot easier in that regard, but you still want to make sure, having other people gut-check this and not say, "Hang on a second. This is just a complete nonsense." The problem is that in the world that we exist, most of the people and most of the stuff they do in terms of the test that they have, the interpretations they use, the supplements or the interventions they recommend, most of it is not published, is not tested. Lots of it is nonsense. So if you're trying to build something that has real large scale utility then I think publishing is potentially important. Equally for us in this world of machine learning, biological age, blood chemistry interpretation, in reality, we are nobody.

[0:30:02]

In that world, nobody really knows who we are, so if we want to establish some kind of credibility and show that we know what we're doing, again, putting what we do out there and having experts in the field look at it and tell us what they think of it I think is really important. As we talk about the peer review process -- Chris, you can talk about how actually some of the things they recommended were actually pretty useful. However, there are other things where it turns into this academic contest where people say, "Oh, well, the algorithms you used weren't fancy enough" or "You should try these 700 different algorithms and compare them all."

You can do that. That's intellectually interesting and we're definitely interested in other people comparing what our algorithm says to their algorithms or their approaches be they statistical or whatever, but the interpretability is so important that you might actually be willing to sacrifice some accuracy for interpretability and actually making that point particularly the fact that people in this field have read this paper, and we know they have because they reviewed it. We can say, "You know what? Actually, maybe the most important thing is that this wasn't that difficult to do. It's interpretable. We might actually be able to intervene with lifestyle interventions to change these things." That's immediately useful rather than this intellectual exercise in terms of how many algorithms and how complicated and comparing accuracy and that kind of stuff.

Christopher: Can you say something about why F1000Research appealed to you? Would it not be better to try and get into the highest impact journal possible?

Tommy: Yeah. F1000Research has a really interesting publication model in that it's fully online. It's fully open access. As soon as you submit something, it becomes a pre-print, so it's online. Technically at the moment, our paper, even though you can go online and you can read it, is not yet fully peer reviewed. When it is fully peer reviewed and the reviewers are happy that we did the things that they asked us to do and changed the things they asked us to change then it becomes listed on PubMed and stuff like that, but in the meantime, anybody could read the paper. Anybody can read what the reviewers say, and normally, the peer review process is a closed door thing that people don't see. It's also normally a blinded process so that I don't know who reviewed my paper and is trashing it in this review, but the F1000Research process -- and actually, the Frontiers journals use the same method -- you know who's reviewing it and you can see who said what. I like that idea because it forces people to be more constructive. If you really don't like something, you can say that, but sometimes you'll get a review back and it's just they're being rude for the sake of being rude because you don't know who the reviewer is. They can say whatever they like whereas in this process, you're forced to have a more human interaction about it, which I quite like.

And then the final and really interesting thing, and it's kind of a struggle for us to get this to work properly, but I think it's overall beneficial, is that for F1000Research, you have to make all of your data available and then you also make the algorithms available for the predicted age so that anybody can turn up and they can take the data and they could take the algorithms and they can repeat it and see if they get the same answers. So almost anything that's published in F1000Research other than some papers that have some restricted data for whatever reason, you could go and you can get the data and you can try and make sure that what they say they found is the same. We're talking about the reproduction crisis in science [0:33:27] [Indiscernible] at the moment that maybe most of the papers we're citing are just wrong. They're wrong, but they gave a sort of interesting answer by chance rather than because it's a true signal, but at least with this model, we can start to test some of that. For all those reasons, I thought this was quite a nice way to publish this in some more modern, forward-facing publication model.

Christopher: Yes. I should make it clear that all of the code is on GitHub. I have published that. It's all open-source and you can download the notebooks and run them. It's Jupyter Notebooks, so you run the code inside of your web browser. I've written all the code to download the data, so you don't need to worry about where I got the data. The code will download it for you and then you can train the machine learning models on your -- I've only got a MacBook Pro that I've been doing all this stuff on.

I think the reviewers criticized us for having a low level of sophistication and it seems that deep learning is the new skinny jeans and I would agree with that. It may be that a deep learning model will surpass this in accuracy at some point, but the problem at least for now is the barrier to entry is quite high in that generally you need to have some cloud compute instance, probably something running on Amazon AWS or Google, others that have got cloud computing services now, some computer with a GPU. GPU stands for Graphics Processing Unit. These are graphics cards that were originally developed for gaming and deep learning makes heavy use of these GPUs. They're generally not in the computer that you're sat at when you're doing your day job or for any other reason.

[0:35:06]

The barrier to entry is so much lower when you use a model like XGBoost, which is what we've done because it can all just run really quickly on your laptop, so you can have a go. If you're interested in learning more about this stuff, maybe you already know a little bit of Python, then I would encourage you to check out the source. Maybe you can make it better. Maybe you find a mistake and you can let me know about that and that's how we advance the field. I think that's really, really important.

Talk about who the reviewers were. I thought this was really interesting. You get these -- this is perhaps for me one of the greatest values of going this route with F1000Research, is having these incredible people who are published in the field. These amazing experts look over your shoulder and look at your stuff and tell you what they think. You don't normally get that, right? Even if this never gets published, even if we don't make these reviewers happy and we never make it on to PubMed, I would still say that this process has been more than worth it just to get the feedback from these people who are obviously more experienced than us in the field. Who are these people and why would they do this? Why would they give us the feedback?

Tommy: That's a great question. Often I feel like people who criticize the peer review process either have an experience that's working well for them or they haven't actually experienced it at all, and often you find people commenting about the peer review process being people who've actually never written a scientific paper based on research they did and had people review it. Like I said, it's useful to do it this way because you know who it is that's reviewing it. We know that these guys are people that know what they're talking about. The first one was Alex Zhavoronkov who basically runs Insilico Medicine and aging.ai, like I said, one of the other available methods to look at biological age based on blood testing. They've already done this and they use a deep learning model to build theirs and they've published several papers on this.

The other guy who reviewed it was Peter Fedichev, who's similarly published in this arena looking at machine learning to analyze biological age and these guys are running their own research programs and they're the senior authors on papers in the field. There are two reasons why you might want to do this. The first one is because you're interested to see what other people are doing and then also to make sure that good work is being done especially -- it took a while for us to find -- we need at least for the F1000Research, to allow your paper to be formally published, they need two reviewers to be happy with it, which is good. It may be that some of the people that we suggested as reviewers are uncomfortable with the reviewing model, which is that we know who they are. It's actually quite refreshing to see people who are willing to do that.

Yeah, they didn't like everything that we did. They thought it could be more sophisticated. We could've done other things and that's great. That's what these reviews always look like, but if you don't take it personally and you appreciate the input, you can actually -- many of the papers that I've written, the first time they go in, they could've been a lot better and then the reviewers help me make a much better product in the end and I always think that's really beneficial. However, I do know there are some fields where there's a huge amount of I guess argument about what the correct answer is, so maybe lipids and long-term outcome in terms of cardiovascular disease and health, and then people pitch against each other and that makes it really difficult, but I do think it would help if we had an open model like this so you can always see who's saying what because then you're forced to have more of a human interaction rather than just anonymous reviewers slinging mud.

Yeah, I think it's nice to see people engaging in this and it's probably because this is quite a new field and there isn't really much out there, so how people end up using this information, how we figure out what the best way to look at biological age is -- we still haven't figured that out. Because interest is still growing at the moment, the people who are the bigger guys in the field are still very active in terms of fostering it, but maybe in other fields where things are more entrenched, that might be less likely.

Christopher: And some really great ideas came out from the reviewers. For example, they told me I should be using k-fold cross-validation, which is a fantastic idea. I'll try and resist the temptation to get into too many technical details, but essentially, when you train a machine learning model, quite typically the way that it's done is you hold out, say, 20% of the data. Imagine a spreadsheet, 20% of the rows are going to be stashed away somewhere so that you can validate and test your model. You can see how good it is on the unseen data. You fit the model on the training or the 80% and then you test it or validate it on the remaining 20%.
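The hold-out split described here can be sketched in a few lines of plain Python. This is only an illustration: the function name `hold_out_split`, the seed, and the 100 stand-in rows are assumptions for the example, not code from the published notebooks.

```python
import random

def hold_out_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows, then stash away a fraction of them for validation."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # The first n_test shuffled rows are held out; the rest are used for fitting.
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(100))  # stand-in for 100 spreadsheet rows
train_rows, test_rows = hold_out_split(rows)
print(len(train_rows), len(test_rows))
```

The model would be fit on `train_rows` only and scored on `test_rows`, which it has never seen.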

Well, in k-fold cross-validation, that split is dynamic and you test the model on each of the folds. It could be tenfold, so you split your data ten ways and you train the model on 90% of the data and then test it on the remaining 10%, and then you do that for each of the one-tenth of the data, so you're getting a better appreciation of how this model might perform in the wild and it's definitely superior for what we're trying to do. That was something that I'm like, "Oh yeah, that's definitely a really good idea. I should code that up right away" and that code will make it back into the Blood Chemistry Calculator, so I was really pleased with that.
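A minimal sketch of the tenfold procedure described above, in plain Python. The helper name `k_fold_indices` and the row count are assumptions for illustration; in practice a library routine such as scikit-learn's `KFold` does the same job, and this is not the code used in the Blood Chemistry Calculator.

```python
import random

def k_fold_indices(n_rows, k=10, seed=0):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    # Deal the shuffled row indices into k roughly equal folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(100, k=10))
for train_idx, test_idx in splits:
    # A model would be fit on train_idx and scored on test_idx here.
    pass
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Every row ends up in the test set exactly once, so the averaged score reflects performance across the whole data set rather than one arbitrary 20% slice.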

[0:40:15]

Another thing they pointed out was that we didn't really have a baseline, so we just assumed that the XGBoost model would be better than, say, something like just picking the median. Let's say the goal is to predict or guess your biological age, and so you show me your blood chemistry. I'm imagining Tommy in a box again. You push your blood chemistry through that slot in the box and Tommy looks at it and all he does is write down the median that was in the training data set on a piece of paper and he pushes it back out. If he were to do that over and over again, that might actually be a pretty good strategy, and it turns out it's not. Our XGBoost model is much, much better than that. We also tried a less sophisticated linear model and the XGBoost model was also better than that. It was a really good point from the reviewers. We hadn't done that. We didn't even know. It seemed like an obvious thing to do and we hadn't done it, so there were definitely some really valuable things that came out of the review process. I do hope we make these people happy. Do you think we will?
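The "Tommy in a box" baseline can be written down directly. In this sketch all of the ages and the model predictions are made-up toy numbers, not results from the paper, and mean absolute error is an assumed choice of score; the point is only to show how a median baseline and a model are compared on the same held-out rows.

```python
from statistics import median

def mean_absolute_error(actual, predicted):
    """Average size of the prediction error, in years."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def median_baseline(train_ages, n_predictions):
    """Ignore the blood chemistry and always answer with the training median."""
    guess = median(train_ages)
    return [guess] * n_predictions

# Toy numbers for illustration only.
train_ages = [25, 32, 41, 47, 58, 63, 70]
test_ages = [30, 45, 66]
model_predictions = [33, 44, 61]  # hypothetical outputs from a fitted model

baseline_mae = mean_absolute_error(test_ages, median_baseline(train_ages, len(test_ages)))
model_mae = mean_absolute_error(test_ages, model_predictions)
print(baseline_mae, model_mae)
```

A model only earns its complexity if its error is clearly lower than the baseline's on data it was not trained on.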

Tommy: Yeah, I think we will. There are big things that are shown that you can improve. That's a big part of the process. It's also a discussion. Often a reviewer will suggest something and it requires a lot more work or it's something else you've thought about and you've already decided why it's not important and then you explain those things and you send that back. There's sometimes a bit of back and forth usually just once, but maybe more than that depending on the journal and what's being done, and the editor.

The other thing that I think came up was that they kind of felt that what we've done wasn't particularly novel, in that people have already taken a blood test and run machine learning algorithms to predict biological age, and that's true. We said that upfront. However, we are less interested in the tool than we are in how we can use it, so things that haven't really been done before are having something that's explainable so that I know that it's my blood glucose that's the issue and I can focus on that or creating those plots that show where the lowest predicted biological age is and then maybe thinking how are there ways that I could help somebody move their blood markers in those directions, have optimal ranges for lowest biological age. Again, that doesn't prove that that's the best approach, but it's much closer than anything else that exists currently. I think the novelty is in how we are using the outputs rather than using some particular complex set of machine learning algorithms to get there.

Christopher: Certainly no one has calculated the Shapley Values for each of these independent variables. That certainly I'm sure is novel. The Shapley Values, it's math from game theory, and in machine learning it allows you to know how important each of the features is, which is exactly the name of the game. I don't believe anyone has done that, so that is the novelty. I think that the reviewers missed the point there a little bit.
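To make the idea concrete, here is a small sketch that computes exact Shapley values for a three-marker toy model by averaging each marker's marginal contribution over every ordering of the markers. The formula in `toy_predicted_age`, the reference values, and the example person are all hypothetical; the real model is a boosted decision tree, for which the `shap` library's tree explainer computes these values far more efficiently.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)      # start from the reference person
        previous = predict(current)
        for name in order:
            current[name] = instance[name]   # switch one marker to the real value
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}

def toy_predicted_age(markers):
    # Hypothetical formula for illustration; not the model from the paper.
    age = 40.0 + 0.2 * (markers["glucose"] - 85) + 0.5 * (markers["bun"] - 14)
    if markers["glucose"] > 95 and markers["albumin"] < 4.0:
        age += 3.0  # interaction: high glucose together with low albumin
    return age

baseline = {"glucose": 85, "bun": 14, "albumin": 4.5}   # assumed reference values
person = {"glucose": 105, "bun": 20, "albumin": 3.8}    # made-up example
contributions = shapley_values(toy_predicted_age, person, baseline)
print(contributions)
```

The contributions add up to the gap between this person's predicted age and the reference prediction, which is what lets each marker be read as "years added or removed". The interaction term is shared equally between glucose and albumin.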

Tommy: It's just their expertise is in very complex machine learning models and that's not what we've done on purpose.

Christopher: Yeah. I really don't get this idea that every time you publish a paper then you're supposed to compare every single machine learning model that's ever been thought of and then produce a ton of data showing some accuracy score that compares all these different models and then conclude that deep learning is slightly better than a boosted decision tree, is slightly better than a random forest, is slightly better than -- you go through this big, long list of tools that you could've used and you show -- when you look at Kaggle, Kaggle is the machine learning competition website where people host competitions and then data scientists compete for prizes. You see for this type of problem, some sort of decision tree has performed very, very well in the past and I don't get the need to keep comparing all these other tools when you've already found one that works really well for you and especially if it already works really well for you out of the box. There's not a whole bunch of hyperparameter tuning then it gets really complicated really quick and it requires this GPU thing and all this complexity.

I think the best tool is the right tool for the job. I always imagine what it would be like if -- sometimes we have some landscapers come in and they do some brush clearing and there are all these different tools that they could possibly use to do the job. What would happen if they did this experiment every time they showed up on site, "Oh, we're going to compare these seven different tools before we decide which is the best one to do the brush clearing." No. They just use their previous experience. This is the best tool for the job. That's what we're going to use and we don't really need to do this experiment again.

Tommy: I think it's just the difference between different fields. If you're at the cutting edge of machine learning research, which we are not obviously, then that's an important thing to do, but when you have a specific goal in mind and you know the tool that's going to do the job well enough for you to get a useful output then that's the approach that you take and that's what we've done. It's just the difference in terms of the worlds in which people exist and what they're trying to achieve.

[0:45:16]

Christopher: Would you like to say something about the most important blood markers for predicting biological age and how they might be physiologically relevant?

Tommy: Yeah. That's probably what people are actually interested in after we waffled on about the processes.

Christopher: Yeah, exactly. We better actually give people the answer else we're going to be in trouble.

Tommy: Yes. It's online and you can see these graphs in real time so you can actually understand what I'm talking about. We have things separated for men and women and they are slightly different. In terms of the most important markers, BUN seems to be really important, but I will say that we don't really know how much we should try and push that system or why that's such an important signal. It's there, but glucose is super important. Albumin is super important; MCV and RDW, like you mentioned; some of the liver markers lower down; creatinine, really important because probably in a lot of people, it's going to signify both kidney disease when it's high but low muscle mass when it's low.

You can think about obviously blood sugar and so many different reasons why that's physiologically important, but actually, where most of the issue happens -- and for that, you can go to the separate SHAP plots that you produce. The top one on the GitHub page that we link to from the paper is glucose in women and you can see this big jump from low biological age from 70 to 86, and then from 90 upwards, things start to increase linearly, so the biggest difference happens in that kind of -- 85 to above 90 is where things really start to climb up. Actually, we see the same thing from the published population data that actually most of the increase in risk like single increase in risk from blood sugar happens in that range. Once you get into the 100, 110, then most of the bad stuff has happened already, so worrying about it being any higher than that is already too late, so things like that are really interesting.

Albumin, like I talked about, that's going to be liver synthetic function. Lots of things can affect that. Protein intake. Albumin is a really strong marker again of longevity and health in general populations. Red blood cells, iron, B12, folate, the MCV, the mean corpuscular volume, and the red cell distribution width can tell you about all of those. What else do we have? Creatinine, again, it might tell you about your muscle mass. You can certainly intervene there. Anything else? There's a long list. These are like 20 markers that are the top 20 that we have for each. You can dig into each one. Triglycerides turn out to be really important for men, so lower is almost certainly better up to a certain point. Interestingly, none of the cholesterol markers other than total cholesterol show up in here, so LDL doesn't show up, but that also makes sense because LDL is calculated based on the total cholesterol, so it's not really a truly useful, independent marker in this kind of setting.
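The point about LDL not being independent can be illustrated with the Friedewald equation, the common way labs estimate LDL from a standard lipid panel (in mg/dL). The transcript does not say which formula was used for the data in the paper, so treat this as an assumed, illustrative example; the function name is hypothetical.

```python
def friedewald_ldl(total_cholesterol, hdl, triglycerides):
    """Estimate LDL cholesterol (mg/dL) with the Friedewald equation.

    LDL = total cholesterol - HDL - triglycerides / 5.
    The estimate is generally considered unreliable when triglycerides
    are 400 mg/dL or higher.
    """
    if triglycerides >= 400:
        raise ValueError("Friedewald estimate is not valid for triglycerides >= 400 mg/dL")
    return total_cholesterol - hdl - triglycerides / 5

ldl = friedewald_ldl(total_cholesterol=200, hdl=50, triglycerides=150)
print(ldl)
```

Because the estimate is a fixed combination of total cholesterol, HDL, and triglycerides, a model that already sees those markers gains little new information from a calculated LDL value.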

Christopher: Yeah. Total cholesterol is an interesting marker for predicting biological age. It's kind of generally assumed. I'm still hearing podcasts now this week where there's this assumption that lower total cholesterol is better, but that is not what we saw in our model and it's not what's seen in the epidemiological data, right?

Tommy: Well, this is interesting and it's kind of similar to BUN. As BUN increases, that increases your predicted biological age. As total cholesterol increases, that also increases your predicted biological age. But if you look at population data, other than at the extremes particularly when you get into your 70s and 80s, higher total cholesterol is associated with longevity. So what we need to figure out and what the field doesn't really know at all yet is what is a marker that may actually be useful, something that we don't want to try and change, versus what is a pathological signal of aging that we want to intervene on, and I'm not convinced that BUN or cholesterol are those things that we want to try and actively medicate down or maybe they are to a certain point, but not beyond.

What's really interesting is that a lot of questions have popped up. BUN is a great example like, "How do I lower my BUN? Should I eat less protein? What are these other things that might be causing that?" or maybe BUN increasing over time is actually a beneficial thing and we don't know that. It may be protective as you get older or maybe you'll see something else, which is that if you intervene elsewhere, which reduces aging, then cholesterol and BUN come down because those reactive processes that are protective as you get older aren't needed as much anymore and all of this stuff is just things that we have yet to find out. We just don't know yet.

Christopher: Yeah. I think it's an important distinction to make that this is an explanation of the predicted age and not necessarily a cause of.

[0:50:04]

Tommy: Yeah, exactly.

Christopher: We see this all the time, so regression to the mean is an explanation, but it's not cause, right?

Tommy: Yeah.

Christopher: Yeah, it's an important distinction. Okay. Well, I think this is a good place to wrap up unless you can think of something that I've missed. I really apologize: if you're just a health enthusiast, someone that's interested in improving your health and performance, then at the moment the Blood Chemistry Calculator is not the tool that you're looking for, although if you're really motivated then you can input your blood chemistry markers and find your predicted biological age using the link that I put in the show notes for this episode.

One thing you can do though is get on to the forum. If you're not a practitioner, you're not going to sign up for the monthly subscription for the Blood Chemistry Calculator. You can come and find us on Patreon. If you search for Nourish Balance Thrive on Patreon, that will give you access to the forum. Lots of people, including practitioners, have posted links to Blood Chemistry Calculator reports on the forum. Tommy and Bryan have been doing technical support for those reports on the forum, so if you're super technical, super motivated, you can read some of those. It might give you some ideas. It's also a really good place to find a practitioner.

If you're interested in working with someone to improve your Blood Chemistry Calculator scores then the forum might be a really good place to find that person. Come and find us on Patreon. We've been doing some "ask me anything" episodes. I've also got a back catalog of interviews that I've recorded with Dr. Simon Marshall that have been incredibly helpful for the people that we work with, so find us on Patreon. Search for Nourish Balance Thrive on Patreon and you'll find us right away. Was there anything else that you wanted to add, Tommy?

Tommy: No, I don't think so. I think a lot of what we've talked about today is interesting, but not necessarily that useful yet. We're working on it.

Christopher: Yeah, absolutely, we're working on it. I'm hoping that the solution is not a pill. I feel like everyone else watching in this field is like, "It's just a race to find a pill that's going to improve my biological age."

Tommy: Let's just all take metformin and rapamycin and call it a day.

Christopher: But then you're still going to have a behavior change problem because everybody will forget to take the rapamycin and metformin. Well, this has been great. Thank you very much and thank you for listening and thank you for supporting us on Patreon. We very much appreciate you.

Tommy: Yeah.

[0:53:00] End of Audio
