Can we automate science? Sam Rodriques is already doing it.

“We can build AI scientists that are better than we are… these systems can be superhuman,” says the FutureHouse co-founder.

Artificial intelligence is already doing some pretty amazing things — writing stories, solving problems, creating songs, generating art, and producing lifelike videos and images. But these are mere parlor tricks compared to a more profound potential ability: automating discovery itself.

Many scientists are already excited about AI. In a Nature survey of 1,600 scientists published in September 2023, more than half of respondents expected AI tools to be “very important” or “essential,” citing faster data processing, expedited computation, and a reduction of research time and costs. 

But some people are working toward an even more ambitious goal: AI models that could be developed into “full stack” AI scientists, capable of not just formulating hypotheses but also conducting experiments, analyzing data, and sharing potentially game-changing findings.

The nonprofit FutureHouse is working to realize this vision. Its stated purpose is “to build AI systems that can scale scientific research and accelerate the pace of discovery, so humanity can proceed as quickly as possible to find cures for disease, solutions for climate change, and other species-accelerating technologies.”

“20th century science was awesome. We made a ton of progress… [T]hings have slowed down a lot.”

Sam Rodriques

Though just a year old, FutureHouse has made remarkable progress. The group has already released an AI agent, called PaperQA2, to probe the scientific literature and synthesize information on desired topics, helping scientists hone questions and guide their research. FutureHouse also built a version of the AI optimized to answer the question of whether anyone has done something before – a question that scientists constantly ask each other and one that frequently can’t be Googled.

Sam Rodriques is the CEO and co-founder of FutureHouse. A theoretical physicist by training, he started out working on quantum information theory and ended up feeling that there were not enough problems left unsolved in physics. He moved on to invent technologies for spatial and temporal transcriptomics, brain mapping, gene therapy, and nanofabrication, and has now turned his attention to an AI scientist for biology. 

“From the perspective of things that are actually accessible, biology has such a rich supply of great mysteries that are still to be discovered,” he told Freethink. He thinks an AI scientist will radically speed up the quest to solve these biological mysteries, and eventually build on that to accelerate discovery in other fields. Oh, and if it works, it will almost certainly garner a Nobel Prize.

(The following interview has been edited for length and clarity.)

Freethink: What does FutureHouse do? What are your goals and how are you working to achieve them?

Sam: Our fundamental goal is to figure out how to accelerate progress in biology and other complex sciences. If you think about the history of science, 20th century science was awesome. We made a ton of progress. And then my impression of this century is that things have slowed down a lot. Part of the reason, I think, is that there’s too much knowledge now for us to sift through effectively. 

As a result, in biology, we come up with lots of new inventions that often don’t make it into the clinic. It’s often hard to figure out what are the key insights that we need in order to transform the lives of patients or to understand the more fundamental biology. These considerations are what led us to the hypothesis that we can build AI scientists that are better than we are at understanding complex science. We’re interested in building an AI scientist for biology, which is probably the science where there are the most things that can be discovered and also simultaneously the most information that you have to integrate. But I think an AI scientist can also be applied in other kinds of complex sciences like climate science, economics, and so on.

“I am optimistic that, with all the attention going into robotics, we can be there within five years.”

Sam Rodriques

Freethink: So you’re working on an AI scientist for biology. A lot of biologists would tell you that physical lab work is a key hurdle to overcome here. Have you thought about this problem at FutureHouse? How are you going to automate lab work?

Sam: We think about it all the time – we have a wet lab here. You cannot claim to be doing biology research if you’re not generating hypotheses and then testing them in the lab. The AI systems that we have today are much better suited for hypothesis generation, literature search, and data analysis than they are for wet lab research. You can have all kinds of debates about why that is. You know, some people think that humans are much better evolved for motor function than we are for intelligence. Intelligence is a relatively recent development, whereas motor functions go back hundreds of millions of years. 

Automation is still much better suited to car manufacturing – you can automate the process of building a car; you can build a really high-throughput production line. But if you want that production line to do anything other than what it was built to do, good luck – it’s not going to be able to do it.

We do have robots at FutureHouse, but those robots can do one thing, and they can do one thing only. General purpose automation is just not here yet. And so, in the short term, we will continue to have humans doing work here in the wet lab.

Freethink: Do you have a timeline?

Sam: There will be a couple of phases. The first phase is that there will be robots that can use existing labware and existing machinery so that they can slot into labs. They can handle pipettes or have a pipette tool. They can pick up plates and put them into centrifuges. 

I am optimistic that, with all the attention going into robotics, we can be there within five years.

Freethink: Wow. So robots will be working alongside scientists? 

Sam: Correct. But a note of caution: Those will be useful, but you also need a robot that can pick up a mouse, anesthetize it, and inject a substance into the tail vein. We are so far away from any robots that have enough dexterity to do that… A lot of people are working very hard on it, however. So we’ll see what happens. We will have AIs that are very good at generating hypotheses, analyzing data, analyzing literature, long before we have any AIs that can do work in the wet lab.

“People have to anticipate this world where reviews become an interactive thing, where you can ask your review a question.”

Sam Rodriques

Freethink: Let’s talk about analyzing the scientific literature. FutureHouse recently released PaperQA2, an AI agent that conducts entire scientific literature reviews on its own. You posted on X that “This is the first example of AI agents exceeding human performance on a major portion of scientific research, and will be a game-changer for the way humans interact with the scientific literature.”

Sam: Yeah, so we released PaperQA2, which is an AI agent that searches the scientific literature to answer questions asked by humans. And unlike systems that people had come up with previously, PaperQA2 is an agent, which means that it conducts a search, looks at the results that come back, and then keeps searching until it thinks it has what it needs to answer the question. 
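
For readers who want the mechanics, here is a minimal sketch of that search-and-refine loop in Python. The names used here (`search_papers`, `llm`, the stopping rule) are hypothetical stand-ins for illustration, not PaperQA2’s actual API.

```python
# A minimal sketch of the agentic loop described above: search, inspect
# the results, refine the query, and repeat until the agent judges it has
# enough evidence to answer. All names here (search_papers, llm) are
# hypothetical stand-ins, not PaperQA2's actual API.

def answer_question(question, llm, search_papers, max_rounds=5):
    evidence = []
    query = question
    for _ in range(max_rounds):
        evidence.extend(search_papers(query))  # retrieve candidate passages
        verdict = llm(
            f"Question: {question}\nEvidence so far: {evidence}\n"
            "Reply DONE if this evidence suffices to answer; "
            "otherwise reply with a refined search query."
        )
        if verdict.strip().upper() == "DONE":
            break
        query = verdict  # keep searching with the refined query
    return llm(f"Using only this evidence, answer: {question}\nEvidence: {evidence}")
```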

And this was the first time that anyone was able to show on real-world scientific tasks that these systems can be superhuman, i.e., that they can actually answer more quickly and more accurately than humans doing the same tasks. We had human PhD students and postdocs compare the articles it writes to human-written Wikipedia articles on the same topics, and on average they found more errors in the real Wikipedia articles than they did in ours. And we have open-sourced it, so anyone can use it and build on it.

You will be able to order up a review on anything you want, whenever you want… Often, as a scientist, you read reviews where there is one paragraph in the review that briefly describes the thing that you’re actually interested in. Now, with PaperQA2, you can have it expand that paragraph into a full review. Humans, I think, will still write reviews: It is interesting for me to know what, say, George Church thinks about sequencing. Humans have opinions, and we will continue to want to know what those opinions are. Reviews are a great way for humans to communicate their opinions. But people have to anticipate this world where reviews become an interactive thing, where you can ask your review a question.

Freethink: Let’s say all of FutureHouse’s aims are realized. How do you envision a full AI research lab, and what role will humans play in it? Paint a picture for me.

Sam: You are going to have a bunch of humans sitting at their laptops with various AI workers operating in parallel – performing tasks, operating robots, etc. The humans ultimately will be responsible for resource allocation. I’m a humanist, and I basically think that at the end of the day, AI is a tool for us to use in order to further the cause of humanity. It’s ultimately up to humans to decide what we want them to do. Because we will always be resource-bottlenecked. Running these things is not free. We are now doing experiments internally where running the AI models can cost $100,000. We are able to do science that we wouldn’t be able to do otherwise, but it’s expensive. You can imagine that – in the sci-fi future where everything goes to plan – humans will be looking at the data and thinking about what they want to do. The AI might say, “there are five really good directions to go here.” The human has to choose what they want.

Now, what will be different about an AI research lab? The AI workers will be exploring. One might be conducting a literature search. Another might be analyzing data and coming up with hypotheses. Some will be doing wet lab work. The difference between now and then is that we will be able to create and analyze giant data sets. We will be able to know that things are reproducible because we will automatically reproduce published work. And finally, researchers won’t lack context. Say you didn’t know about some paper or discovery key to your work – that will go away. Biology will be much more efficient. There will be more serendipity. A lot of discoveries in biology are made serendipitously. Imagine that we can systematize the process of accidents.

“Language models aren’t very good at coming up with ideas right now – but it’s not that they can’t.”

Sam Rodriques

Freethink: That sounds great – almost poetic. But it also segues to a key philosophical question about AI scientists. Will they ever be able to come up with a truly novel idea? Go back in time. If all a large language model had been trained on was appendage-based locomotion, could it ever have invented the wheel? Could it have thought up the steam engine?

Sam: The answer is “definitely yes.” I think humans may give themselves too much credit here. Before someone came up with the wheel, they probably observed that if you put a flat object on top of tree branches lying on the ground, it’s way easier to push it along on those circular objects than it is to push it on the bare ground. Before people came up with the steam engine, they already knew about water wheels. They knew that falling water could turn a wheel. It’s not that hard to realize that rising water could turn a wheel.

Freethink: So you think that LLMs would be able to see very preliminary data and be able to infer a big advance? I think maybe what you’re getting at – and I agree with you – is that science isn’t just built on “Eureka!” moments. It’s built upon incremental advances.

Sam: Exactly. Language models aren’t very good at coming up with ideas right now – but it’s not that they can’t… I actually think that language models will be better than humans at coming up with ideas by virtue of the fact that they can test so many. Imagine Einstein and relativity – would the model have come up with the idea that the speed of light should be the same in all reference frames? My guess is that if you create a model that can come up with 100,000 ideas, that idea might have been among them. And then if that model can analyze all of the ideas, it would say that this is the one that works. 
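
To make the “test so many” point concrete, here is a minimal generate-and-test sketch: sample a large pool of candidate ideas, score each against the evidence, and keep the one that holds up best. The helpers `propose_ideas` and `score_idea` are hypothetical, not FutureHouse code.

```python
# A minimal generate-and-test sketch: sample many candidate hypotheses,
# score each against available evidence, and surface the best one.
# propose_ideas and score_idea are hypothetical stand-ins, not
# FutureHouse code.

def best_hypothesis(problem, propose_ideas, score_idea, n=100_000):
    candidates = propose_ideas(problem, n=n)  # e.g., sampled from an LLM
    # Cheap automated tests or critiques filter the pool down.
    scored = ((score_idea(problem, idea), idea) for idea in candidates)
    return max(scored, key=lambda pair: pair[0])[1]
```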

“Within 26 years, we’ll have AIs making Nobel Prize-winning discoveries.”

Sam Rodriques

Freethink: A key part of conducting science these days is critically analyzing research, as unfortunately, many published studies are wrong or even fraudulent. How will an AI scientist handle this challenge?

Sam: There are two sources of irreproducibility in biology. The first is that something goes wrong in the wet lab. The second is that the data is analyzed incorrectly. Both could be affected by fraud, but I think more common than fraud is just incompetence or variability. The second issue is more easily solved than the first one. I think that [reproducing analyses] will be relatively straightforward. If a model reads a paper and it has questions about whether a paper was analyzed properly, it can go in and analyze the data itself. This is what humans should do, but humans never do that because they don’t have enough time. Of course, this does not solve the problem of wet lab validation. In order to solve that problem, you need more automation in the wet lab and quality control.

Freethink: LLMs, when trained on their own outputs, have been shown to collapse into jumbled nonsense. Why won’t the same thing happen to an AI scientist?

Sam: It probably would be, if trained on its own nonsense. The key thing is that in science you have an oracle – you run experiments to see if your ideas are correct or not. External information is put into the system… People often ask, “Can you build a simulator of the wet lab?” The answer is that you maybe could, but then you’re going to be learning the simulator, which is problematic.

“There might be another AI winter. Even if that happens, it’s okay.”

Sam Rodriques

Freethink: Do you have concerns about an AI scientist being used unethically – to make biological weapons, for example? How will you prevent these unethical uses?

Sam: From a long-run perspective, I certainly have concerns. There’s a risk that powerful technologies fall into the wrong hands and help bad people do bad things. We have to pay attention to that, and I think we’re relatively diligent internally about working with and talking to the right people, and about evaluating models for dangerous capabilities as we build them, before we release them. If we see any behavior that’s particularly dangerous, we can try to mitigate it. In the short term, though, the bottleneck is still access to the wet lab. If you want to do something dangerous in biology, you need access to a pretty good wet lab and reagents.

Freethink: In what disciplines do you think an AI scientist will first make significant contributions? Could we see a Nobel Prize-winning AI scientist by, say, 2050?

Sam: It’s 2024 – that gives us 26 years. Within 26 years, we’ll have AIs making Nobel Prize-winning discoveries. I’m reasonably confident in that. Which fields? Biology and machine learning, easy. It can only fail to happen if AI flattens out – if we hit a ceiling and the AI stops getting better, and we can’t fix it. Then it’s possible we’ll hit 2050 and we won’t have a Nobel Prize-winning discovery. 

You have to remember that ten years ago, AI was not a thing. There might be another AI winter – the language models might be too expensive, not get commercial traction, people start to run out of money, etc. Even if that happens, it’s okay. 

We already have AIs here that can generate ideas that are pretty good. They’re not great, but they’re better than the average human. If you pulled a human off the road and you said, “Hey, come up with an idea,” the ideas that you get today from an AI scientist are much better than those. They are probably not as good as what an expert would come up with in a given field. How much better do you have to get until you start to be better than the experts? It’s a significant distance, but it’s not going to be that bad. A lot of it will depend on whether scaling laws continue, but even if they continue way more slowly than they are now, we’ll get there. I think we could have really high-quality discoveries within the next two to three years. What wins the Nobel Prize and what doesn’t is up to some people in Sweden.
