June 07, 2022

Paul X. McCarthy on networks to find experts, identifying authorities, computational social science, and latent knowledge (Ep24)

“Productivity is ultimately one of the greatest predictors of success in all fields. It doesn’t matter whether you’re an artist or a scientist, or whatever you’re doing, productivity is a key marker to long-term success.”

– Paul X. McCarthy


About Paul X. McCarthy

Paul is CEO of data science and research startup League of Scholars, which works with a wide range of organizations including Nature and News Corporation, and the co-founder of a number of other ventures. He is an Adjunct Professor at UNSW and an Honorary Research Fellow at Western Sydney University, and the author of Online Gravity, a successful book on how technology is rebooting economics.

What you will learn

  • How to identify experts or stars in a field of study within their networks (03:34)
  • How to ask simple questions to uncover their hidden expertise (06:24)
  • How to find an expert in your network that you should be listening to (09:14)
  • How to get even more granularity in finding experts (11:00)
  • How to identify credible and authoritative sources (15:29)
  • To what degree can we infer credibility from an expert’s network centrality (17:49)
  • Why purpose leading to clarity and focus is key to thriving on overload  (19:09)
  • Why productivity is a key marker to long term success (21:00)
  • Why sharing insights from information is crucial to network formation (24:20)

Transcript

Ross Dawson: Paul, it’s wonderful to have you on the show.

Paul McCarthy: Thanks, Ross.

Ross: Paul, I’d like you to tell us about the League of Scholars and what the underlying principles are, and how it helps you and others to thrive on overload?

Paul: League of Scholars is a global startup that looks at researchers and research analytics worldwide, and its basis is that individuals are the key to the success of research. Over the last couple of decades, there’s been a global rise in rankings of universities and other research institutions worldwide. There are now three large global ranking systems: the ShanghaiRanking, the Times Higher Education ranking, and the QS ranking. Everyone interested in the university sector is aware of these, and acutely aware of the rankings game between institutions and how their reputations are perceived.
What we’ve realized is that while these rankings are useful, they have a lot of drawbacks. They’re not very up to date: often they include things like Nobel Prize winners, whose work was done 25 or more years ago, and the rankings don’t change very much each year, so the elite list of organizations hovering at the top hasn’t shifted significantly in the last couple of decades. What’s not so visible is information about individuals. That granular and timely information is what League of Scholars is about: uncovering the individuals in science, engineering, and health, but also in the humanities and social sciences, trying to understand who the leaders in these more specific fields are, and also looking at tomorrow’s leaders and the emerging stars.

Ross: What’s the basic principle underlying how it is you identify these stars in these fields?

Paul: We use a variety of traditional bibliometric techniques. For those unfamiliar with the research world, research impact is measured in citations; it’s the number of times that work has been cited by other scholars in peer-reviewed journals and publications. We use those traditional bibliometric measures but also predictive measures. We use machine learning to try and understand who is most likely to have the greatest impact in the future, especially for early-career and mid-career people. As inputs, there’s a variety of measures that are known to be predictive of future impact. One of those, of course, is your peer network: who your co-authors are, who your current co-authors are, and how fast the school of fish you’re swimming with is moving, is one way to think about it.

Ross: Of course, you can just go on Google Scholar and see the number of citations of a particular scientist from their papers and so on but that’s a pretty crude measure, so how does the network aspect overlay that to identify who’s most well regarded in the field?

Paul: What we do is look at their co-author networks. There’s a range of network analytics approaches we use to understand the influence of their network, both their direct co-authors and also their co-co-authors, and we build this analysis into the inputs to machine learning algorithms, which then go on to predict scientists’ or other academics’ likely future impact.
We’re looking at other things too. Citation patterns vary radically between fields and across disciplines, so you can’t compare the citation impact of scholars in different fields; you need to compare like with like, so we take that into account, along with the stage and age of people. The quality of the venues they’re publishing in is important too. Early in one’s career, there’s not a lot of data, so it’s quite difficult with the untrained eye to distinguish between people, but there are signals in the data. There is information that can be used to predict things, like the quality of the venue, the co-authors, how many co-authors are outside your institution, and a bunch of social publishing metrics.
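As a rough sketch of the kind of approach described here, and not League of Scholars’ actual pipeline, one could derive a few co-author network features and fit a simple model against a made-up future-impact score. All the names, citation counts, targets, and the choice of model below are illustrative assumptions.

```python
# Illustrative only: invented researchers, citation counts, and impact targets.
import networkx as nx
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Toy co-authorship graph: nodes are researchers, edges mean "have co-authored".
G = nx.Graph()
G.add_edges_from([
    ("alice", "bob"), ("alice", "carol"), ("bob", "dave"),
    ("carol", "dave"), ("dave", "erin"), ("erin", "frank"),
])
citations = {"alice": 120, "bob": 45, "carol": 300, "dave": 80, "erin": 15, "frank": 500}
betweenness = nx.betweenness_centrality(G)

def features(author):
    """Network-derived features: collaborators, 'school of fish' quality, reach, position."""
    coauthors = list(G.neighbors(author))
    co_coauthors = {n for c in coauthors for n in G.neighbors(c)} - {author} - set(coauthors)
    return [
        len(coauthors),                              # direct collaborators
        np.mean([citations[c] for c in coauthors]),  # average citations of co-authors
        len(co_coauthors),                           # reach via co-co-authors
        betweenness[author],                         # position in the network
    ]

X = np.array([features(a) for a in G.nodes])
# Hypothetical target: e.g. citations gained over the next five years (made up here).
y = np.array([40, 10, 90, 30, 5, 120])

model = GradientBoostingRegressor(random_state=0).fit(X, y)
print(dict(zip(G.nodes, model.predict(X).round(1))))
```

In practice, as Paul notes, features like venue quality, field-normalized citations, and career stage would sit alongside the network ones, and the model would be trained on historical outcomes rather than toy numbers.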

Ross: This idea you’ve mentioned to me of the expert’s expert.

Paul: Yes.

Ross: I’d love to hear about where you’ve come across that idea and how you apply that in both League of Scholars and also more generally how you are keeping across the information.

Paul: Yes. This idea was introduced to me by a colleague, Doris Field Tanner. I think you may know Doris; she’s an expert in network analytics. She explained to me that some of her previous work showed that you can discover an expert in any field by asking a series of simple questions iteratively of your peers. One can do it oneself. In a very simple sense, if you’re looking for information about a restaurant in another city that you’re not familiar with, you might ask someone who lives there. If they’re not much of a foodie, you might ask them who they know in their city who really knows restaurants.
Obviously, it becomes a bit more complicated if you’re looking to understand quantum machine learning, for example. You might think of someone you know who’s a scientist in your field, and then ask them who they know in their sphere who’s the greatest authority in quantum computing, and then quantum machine learning or another specialization, a really hot field that’s emerging now. They may know people in their sphere, and so on. We know from the work of network scientists like Barabasi and others that the six degrees of separation storied in Fred Schepisi’s film is very real and is shrinking, so there is a path between us and most other people on the planet which is quite short, and there’s an easy way of identifying it through an intuitive crowdsourcing approach.

Ross: Marshall Kirkpatrick, who has also spoken to us on Thriving on Overload, used the expert’s expert frame for his platform Little Bird to identify influencers. It’s also interesting to look at social network analysis, where one of the classic techniques is the snowball: everybody is asked who else should be included, building out the scope of the group and the interactions between them to encompass as many relevant people as possible.
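The “expert’s expert” referral and the snowball idea can be illustrated with a toy loop; the contacts and their answers below are entirely invented, and a real snowball survey would gather several nominations per person rather than follow a single chain.

```python
# Toy illustration of iterative "who would you ask?" referrals. All data invented.
def find_expert(start, ask, max_hops=6):
    """Follow referrals until someone already named comes up again (the chain
    converges) or we hit max_hops, echoing six degrees of separation."""
    current, seen = start, [start]
    for _ in range(max_hops):
        referral = ask(current)
        if referral == current or referral in seen:
            return current, seen
        seen.append(referral)
        current = referral
    return current, seen

# Hypothetical answers to "who knows most about quantum machine learning?"
referrals = {
    "me": "uni friend",
    "uni friend": "physics prof",
    "physics prof": "qml researcher",
    "qml researcher": "qml researcher",   # refers to themselves: likely the expert
}

expert, chain = find_expert("me", lambda person: referrals[person])
print(expert)   # qml researcher
print(chain)    # ['me', 'uni friend', 'physics prof', 'qml researcher']
```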

Paul: Yes, absolutely, Ross. That’s a really good example. What it tells us, I guess, is that we’re all much more connected than we think. There are opportunities, I guess, that are in that.

Ross: Are there any other particular aspects of the network analytics, which help to uncover those that are the most…well one of the things people talk about a lot in network analysis is centrality.

Paul: Yes.

Ross: So who are the people who are most central to the network? That’s one indicator, but is it the best indicator? Within a network of experts who respect each other, how do you find the ones in any particular domain that you should be listening to?

Paul: Yes, one of the things is that it’s always about authority and expertise. Particularly in academia, there are a lot of subtleties. As we know with Google search, it’s a two-dimensional thing: there’s authority and relevance. Similarly, for any topic, people’s expertise might be subtly different, so it’s quite difficult to actually put people into the same category. Often it’s a case of finding the person who’s most relevant to your particular information needs rather than saying they’re better or worse than another person.
Having said that, if you’re a university hiring an early-career researcher and you want to have a significant impact in a particular field, you are going to have to choose between candidates. There are a bunch of predictive features, but certainly the quality of the venues in which they’re publishing, and the influence, impact, output, and productivity of their co-authors and co-co-authors, are significant.
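Since “most central” depends on which centrality measure you compute, here is a small comparison on an invented co-author network; degree, betweenness, and eigenvector centrality can each nominate a different person, which is one reason no single measure settles who to listen to.

```python
# Compare three common centrality measures on a tiny invented co-author network.
import networkx as nx

G = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("D", "F")])

measures = {
    "degree":      nx.degree_centrality(G),       # how many direct collaborators
    "betweenness": nx.betweenness_centrality(G),  # how often a node bridges others
    "eigenvector": nx.eigenvector_centrality(G),  # connected to well-connected nodes
}
for name, scores in measures.items():
    top = max(scores, key=scores.get)
    print(f"{name:12s} top node: {top}  ({scores[top]:.2f})")
```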

Ross: Let’s say you’ve got a specific domain, not just quantum computing, but specific aspects of that. Either currently, or can you envisage how we might be able to get some real granularity around finding the expert in a very specific area?

Paul: Yes, absolutely. One area we’ve been doing quite a bit of work in is computational linguistics, which is fascinating: using machine learning and computer science to analyze large-scale text databases, whether of literature, news, or information broadly available on the web; Wikipedia is another source. There’s a fascinating study that was published a year ago in the journal PNAS, which looked at the last 100 years of books published in English (I think they also looked at Spanish). They analyzed the language used in all the books published over that century and found some macro trends in the use of language over time. Each year’s books were looked at separately and analyzed using a technique known as principal component analysis, to understand which characteristics or features were most distinguishing of the language and how those features changed over time.
It’s a fascinating study, one of the most interesting things I’ve seen in the last 10 years. They came to the conclusion that there was an inflection point in 1980, which some people call the start of the post-truth era. Basically, for the best part of the last century, from 1900 until 1980, there was a rise in the use of rational language and of language in the third person, in an objective sense. Then from 1980 onwards, there’s a decline in the use of rational language and an increase in the use of first-person pronouns, me, myself, I, and also words associated with conjecture: I believe, I think, my view is this, rather than we conclude, or we have observed, and so on. That’s a simplistic way of characterizing it, but it’s an incredible paper that makes a wide-scale macro observation about society, for these kinds of tools …
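In outline, and without the paper’s actual code or data, the method Paul describes can be approximated by building a year-by-word frequency matrix and extracting its principal components; the synthetic frequencies below merely mimic the rise-and-fall pattern he mentions.

```python
# Synthetic example only: PCA over a year-by-word frequency matrix.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
years = np.arange(1900, 2020)
# Invented frequencies: "rational" terms rise until ~1980 then fall,
# "intuition"/first-person terms do the opposite, plus noise.
trend = np.where(years < 1980, (years - 1900) / 80, 1 - (years - 1980) / 40)
rational = trend[:, None] + 0.05 * rng.standard_normal((len(years), 10))
intuitive = (1 - trend)[:, None] + 0.05 * rng.standard_normal((len(years), 10))
X = np.hstack([rational, intuitive])          # shape: (n_years, n_words)

pca = PCA(n_components=2)
scores = pca.fit_transform(X)                 # each year's position on the main axes
print("variance explained:", pca.explained_variance_ratio_.round(2))
print("PC1 in 1900, 1980, 2019:", scores[[0, 80, -1], 0].round(2))
```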

Ross: Do you recall the title of the paper?

Paul: It’s “The Rise and Fall of Rationality in Language.”

Ross: Right. Seems like a very pointed commentary on our times. I think 1980 was post-truth, and probably the last six years or so we’re in post-post-truth.

Paul: That’s right. Yes, it’s fascinating. We did a review recently; I wrote an article about the top eight papers over the last decade in this field of computational social science, because the field has only really emerged in the last decade. People have been using large-scale computation in the natural sciences and engineering for decades: the big breakthroughs a decade ago at the Large Hadron Collider, and the discovery of new fundamental particles in our universe, were largely the result of large-scale computing, and similarly in astronomy. There was a paper published in 2009 in Science called “Computational Social Science” by Barabasi, Sandy Pentland from MIT, and several other authors. It foreshadowed the possibilities that large-scale computing could offer social scientists, and also scholars working in the humanities and digital humanities. In the last decade, we have seen some amazing papers. This is something that I’m particularly interested in.

Ross: I’d like to distill a little bit. We’re looking to thrive on overload.

Paul: Yes.

Ross: Whether for academics or nonacademics, what are some of the lessons you would derive from what we’ve just been discussing about identifying credible or authoritative sources in a particular domain?

Paul: Network centrality is certainly, as you mentioned, one of the key things. In any environment, as Google has identified at the heart of its algorithm, what matters is the extent to which other people defer to individuals or sources of information. There was a piece of work we did last year, published in PLOS ONE, where we looked at online diversity over the last decade through the lens of links shared on Twitter and Reddit. What we looked at was the diversity of links.
The number of links relative to the number of domains is quite revealing because it shows that over time, the diversity of links is shrinking. In other words, more of the links across the entire web resolve to a smaller number of domains. You see this in various categories; for example, a decade ago there was a variety of video platforms, but now most video on the web is hosted on YouTube, and similarly in social media. You get this kind of concentration. One of the things it does reveal is which sources are authoritative, as defined by the attention that people give them via these social media links. That was quite revealing, but from a practical point of view, one uses a mix of various tools and also, as we referred to earlier, one’s social network, friends and colleagues.
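The link-diversity measurement can be sketched with a handful of invented URLs: count how many distinct domains the shared links resolve to and how evenly attention is spread across them; a falling ratio or entropy over time would indicate the concentration Paul describes.

```python
# Invented URLs; a rough sketch of measuring link diversity across domains.
from collections import Counter
from math import log2
from urllib.parse import urlparse

links = [
    "https://youtube.com/watch?v=a", "https://youtube.com/watch?v=b",
    "https://youtube.com/watch?v=c", "https://vimeo.com/123",
    "https://nytimes.com/article", "https://youtube.com/watch?v=d",
]

domains = Counter(urlparse(u).netloc for u in links)
n = len(links)
diversity_ratio = len(domains) / n                               # unique domains per link
entropy = -sum((c / n) * log2(c / n) for c in domains.values())  # evenness of attention

print(domains)
print(f"diversity ratio: {diversity_ratio:.2f}, entropy: {entropy:.2f} bits")
```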

Ross: There is this thing, just because somebody is influential doesn’t mean that they’re right or worth listening to.

Paul: Yes, absolutely.

Ross: There are probably plenty of examples of people who have very big audiences, or whom many people look to, but that doesn’t necessarily mean they’re right, and perhaps that’s a difference between academic and nonacademic domains. I wonder if that plays out a little bit among academics too, in terms of popularity, as it were? To what degree can we infer credibility or authority from network centrality?

Paul: Yes, that’s a really good question, Ross. The way it works in academia is that you get a certain starting platform from your level of authority. The trust and respect that you earn as an academic throughout your career gives you a baseline, but it doesn’t determine the overall success and total impact of the work. There’s a fantastic book called “The Science of Science”, which was published last year. It looks at all the evidence behind bibliometrics. One of the things they show is that the ultimate impact of scientific work is independent of the authority of the author; what authority does give you is a starting platform.
You’re right, though; beyond academia, I think authority confers a lot of influence. But within academia, the peer review process still determines the ultimate impact. One can have confidence that academic work is different in its character and its nature, and that’s one of the reasons many of us have a lot of trust and faith in scientific work: it does have this kind of unique approach.

Ross: Rounding out, I’d like to hear more about the ways in which you thrive on overload. You are obviously exposed to and digest a lot of research, you’re running a startup, your fingers are in a lot of pies. What are the day-by-day practices, or what can you share that you think would be valuable to others, about how you thrive on extraordinary amounts of information?

Paul: Your framework in the book, Ross, is really good. I think it comes down to a lot of things about purpose: having purpose leading to clarity and focus. There’s a potentially overwhelming amount of information out there and it’s growing; the key thing is that we need to relax about that. Our relationship to information can be one of confidence or one of fear, and it depends on how we see our own status in relation to the information. Information is only useful if it serves a purpose that is useful to you, or to the people you’re helping by disseminating that information, or if it’s helping you do your job better. I think that’s a really useful way to cut through things. I’d have to say that I don’t feel I’m an expert myself in the mastery of my own day.

Ross: Nobody thinks that they are, but there are many people I know who are extraordinary. They just don’t think of it that way, probably me included, or aren’t aware of their own practices that get them there.

Paul: Yes.

Ross: I think that’s probably the point of this: sharing what it is that you do that might be useful for others, as you have been.

Paul: Yes, I guess, trying to be mindful of the day. I am aware of the research on this: productivity is ultimately one of the greatest predictors of success in all fields. It doesn’t matter whether you’re an artist or a scientist, or whatever you’re doing, productivity is a key marker to long-term success. I think it’s quite simple. It doesn’t matter whether you’re an artist or a scientist or a business person, much of life is a series of experiments, whether they’re formal lab experiments, startups, or experiments in relationships. We’re all experimenting and doing what we can with the information we’ve got…

Ross: Absolutely.

Paul: …and learning from that as we go. The thing about productivity is that it’s just a sign that you’ve done more experiments. Ideally, we learn from others; that’s the ideal thing. But often the best lessons are the hardest ones, the challenges we’ve faced ourselves; we seem to take the most significant lessons from those experiences. It’s worth noting that there is a relationship between productivity and success. The other thing I should say, again drawing on Barabasi’s work, is that he’s written a book about success built on all sorts of amazing work they’ve done on networks.
They say that two things drive success; they divide the world into two. There are industries and fields of endeavor that have clear performance measures, sport being one of them, where you just have to be good at performing, get better at performing, and practice continually. Whereas some fields of endeavor, such as art, and to some extent business, are much more ephemeral, and success is judged much more through social influence and networks. Where there are no clear measures of performance, networks trump performance. That’s their science of success in a nutshell.

Ross: That’s really interesting. I think it goes to the point that part of it is in recycling the information we bring in. Part of the value of information is to help us think better and act better, but the more we then share it out, hopefully having added value to it through our thinking, that becomes a fundamental part of network formation: people can see what we’ve done with that information, and the network becomes, as you say, either central to performance or something that trumps performance.

Paul: That’s a really good point, Ross, about the visibility of the networks and the provenance of ideas. That’s another feature of academia that is worthwhile for people to be aware of in day-to-day life too. We’re all benefiting from others. Personally, I’ve benefited from your insights, Ross, throughout the years, and it’s been wonderful. You introduced me many years ago to the concepts of impro and improvisational theatre, and that’s led me down some fantastic paths and I’ve learned a lot as a result. We’re all in the same boat where, like Newton, we’re standing on the shoulders of giants. We’re all indebted to others, and being able to see that is a great way of learning. Humility is the best way to learn anything; the greatest barrier to learning anything is to think that you know it already.
One of the things that fascinates me in this information space, possibly the most interesting question for me, is this idea of latent knowledge. I’ll mention just one more of the many academic papers I’ve referred to: a paper published in Nature a couple of years ago looking at latent knowledge in chemistry research. What they did was use computational linguistics again to look at a really large database of chemistry research papers. What they found is that there’s all this information that can be inferred automatically from the papers.
Firstly, things like the periodic table can be inferred semi-automatically from the papers themselves using these new techniques. But beyond that, there’s latent information, in other words, hidden information that was not available, or not widely known amongst chemistry researchers worldwide at the time, which is in the papers. There are new materials, for example, that can be predicted through analysis of these papers using machine learning. It gives you a glimpse of the idea that there is all this information potentially available beneath the surface. It’s like an iceberg: we see the tip, but underneath is a potentially huge reservoir of information. That’s a personal interest of mine. There are a lot more interesting ideas out there that are yet to be discovered.
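A toy version of the latent-knowledge idea, loosely in the spirit of that work rather than its actual pipeline, is to train word embeddings on paper text and see which material terms sit nearest a property term; the tiny corpus below is invented and only demonstrates the mechanics.

```python
# Invented mini-corpus; demonstrates the mechanics, not real materials discovery.
from gensim.models import Word2Vec

corpus = [
    "bismuth telluride shows strong thermoelectric performance".split(),
    "novel oxide compounds studied for thermoelectric applications".split(),
    "lead telluride devices convert heat to electricity".split(),
    "perovskite films investigated for photovoltaic efficiency".split(),
] * 50   # repeat so the toy model has something to fit

model = Word2Vec(corpus, vector_size=32, window=5, min_count=1, epochs=50, seed=1)

# Which terms sit nearest the property word in embedding space?
print(model.wv.most_similar("thermoelectric", topn=5))
```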

Ross: Absolutely, that is a fantastic place to end: the potential of not just what we can create, but what’s already out there and how we can find it. Thank you so much for your time and your insight, Paul, that’s been fantastic.

Paul: Thanks, Ross.
