How to build a new AI paradigm, with Dan Akarca (CEO, Callosum)

What we can learn from the brain to unlock exquisite, efficient intelligence, and what this means for redistributing leverage globally

What if the future of AI isn’t one giant model to rule them all running on a homogenous data centre, but a mixture of chips, models, and hardware that varies according to your need?

What if it’s more like the brain, a heterogeneous architecture with varied regions, capabilities, and properties that enables incredible intelligence and incredible efficiency?

And what if this holds the key to redistributing leverage globally, away from the biggest labs and countries, rebuilding optionality across the rest of the world within its energy and compute constraints?

That’s the thesis behind today’s guest. Dan Akarca is the co-founder and CEO of Callosum, a new AI infrastructure company that’s just raised over $10 million to build what they call the orchestration layer for heterogeneous compute. That means they’re making AI systems work across a range of different chip architectures, models, and other hardware, rather than simply depending on a single monolithic stack.

Dan and his co-founder Jascha did computational neuroscience PhDs at Cambridge, studying how the brain builds these exquisite, efficient, intelligent systems, and they’ve taken those principles and they’ve applied them to arguably one of the most important infrastructure problems of the next decade.

If they’re right, the implications go well beyond efficiency. In a world where artificial intelligence is quickly becoming a critical input to production, akin to energy, breaking our dependency on today’s dominant paradigm -- where value accrues to the biggest labs and the biggest countries -- could be hugely consequential, both for Callosum to build a deeply ambitious global company themselves and to redistribute leverage globally.


How to break the AI hardware monopoly, with Dan Akarca (CEO, Callosum)

  • Dan Akarca is co-founder, with Jascha Achterberg, of Callosum, a new AI infrastructure company unlocking a world of heterogeneous compute. They just raised $10.25m led by Plural.

[Transcript lightly edited by Claude.]

Daniel, welcome to the show.

Thank you for having me.

How good an analogy is the brain for AI?

The earliest developments in AI were inspired by the brain — the very idea of neural networks, with weights analogous to the strengths of synapses, goes back to the forties and fifties.

When people were thinking about some of the most interesting questions impacting our society — building intelligent systems — they believed the brain was the perfect existence proof of intelligence, and that we should effectively emulate it.

At the same time, we’ve now realised that’s not quite true. I’d describe it as a Venn diagram: there are shared principles between artificial and biological intelligence, and there are things that are totally different. The obvious one being that the hardware we run AI systems on is quite different from the messy brain.

The thing that I think is most deeply overlapping is a principle about where efficiency comes from in intelligent systems. The brain evolved subject to selection pressures that AI hasn’t faced — energy constraints. One of my core theses is: what does putting an energy constraint on intelligent systems lead to? If you put energy constraints on, you get the brain, or something like it. If you don’t, you get massive data centres of huge stacks of homogenous GPUs.

The biggest difference is that the brain is not a monolithic, homogenous structure — it’s very varied at multiple scales of analysis. I believe that heterogeneity in our computing systems is also where we’ll find all the efficiencies we can gain beyond single-substrate digital silicon.

This sits within a wider trend about how we build AI systems, and it’s what led me and my co-founder Jascha — having done our PhDs at Cambridge thinking about these questions for many years — to build Callosum. Callosum is at its heart an AI infrastructure company that makes AI systems work well on a mixture of chips. Rather than building AI systems to run on single types of chip technologies, our vision is that the AI systems of the future will be best served by a wide range, a plurality of different chip technologies with different trade-offs.

Around 2022 and 2023, the biggest trend in AI was scaling up single massive models. I describe this as the big, bad super-god model that would effectively take over every task society would ever need. What we’ve basically realised over the last few years is that this is just wrong. The world will not be best served by a single model.

If you think about the hardest problems in the real world that we actually need to solve, they have features that are not amenable to a single model. Every real-world problem — which is why our brains evolved heterogeneity — is inherently multi-turn, uncertain, and specialised in nature. The problems we solve won’t be done by a single model; they’ll be done by many different models of different sizes, different specialisations, built in many different countries.

You mean big players are talking their book?

Yeah, exactly. And what we’re seeing is that it’s just wrong. The world will not be best served by a single model.

So serving all those different use cases with a homogenous chip stack creates an enormous amount of inefficiency.

Exactly. If the models of the future will look like multi-agent systems of different models, then the compute stack underlying those workloads will also be heterogeneous. It’s not necessarily true that single chip architectures will serve all of those cases perfectly. You naturally get what I call a disaggregated inference space — the space of how we serve AI models will split, many new and more specialised chip technologies will emerge, and in that world you have trade-offs.

I want to back up a second. You’ve described a spectrum of heterogeneous computing — from datacentres on one hand to edge computing and on-device inference on the other. Can you paint a picture of what’s in between?

On that spectrum from the data centre all the way to edge devices — your phones and so on — things become increasingly heterogeneous. Data centres right now are overwhelmingly homogenous. They’re only heterogeneous for market reasons, like when you have to get a new order of chips in and might have some old ones lying around. They’re inherently built to be homogenous, particularly for training AI models.

It’s the same paradigm, just different generations of chips.

Exactly — same paradigm of GPU acceleration. As you go further towards the edge, things become inherently more heterogeneous because there are much more specialised use cases, and more energy constraints.

So the market evolves to have more heterogeneous computing towards the edge because it’s more energy-sensitive. In between, we’re still working it out. There are people betting that edge compute is the future of AI and we won’t need data centres — but that doesn’t seem to be the case right now. Most people want the best models, and they’re willing to wait for a response that comes from a data centre rather than run something on their edge device.

Our thesis is that it’s a one-way track — it will become inherently more heterogeneous across the entirety of that spectrum, for both technical and market reasons.

It’s funny — with my Claude subscription, if I’m using a frontier model for everything, there’s a kind of intellectual security in using the best intelligence available to me, even if I’m using it for incredibly basic things half the time. What are the drivers for people to not always just go for the most performant model?

You’re right that today people gravitate towards frontier models, for the simple reason that if capability is going up really quickly, you don’t want to be left behind. So people just want to make sure they’re running on frontier models, and they’re willing to pay a cost for that.

The question then becomes: when does single model capability reach diminishing returns? When do you realise that all the frontier models are basically the same, and start switching? We’re already seeing that in enterprise.

I’d say there are four big measures on the demand side: performance, cost, speed, and sovereignty. At the moment, people are highly sensitive to performance. But model switching velocity is actually quite high already — people will switch if they hit rate limits on one platform, for example.

The question I have is: for really hard problems — what enterprises, scientific labs, and robotics companies want to solve — when you reach a performance plateau, to what extent will you find that your problems are inherently heterogeneous? That you need multiple different models, some highly optimised for speed, some for cost? Most enterprise use cases I see are still performance sensitive, but cost sensitivity is coming.

If you have an API call to one of the frontier labs, you don’t really have sovereignty over the cost or the models. Those four measures are what people will switch on. At the moment people aren’t switching because of performance, but it’s just a matter of time before they hit diminishing returns.

There are founders who would say similar things about those four variables, and then answer: “so I want to build a new chip.” You buy the thesis about the brain, and about the enormous efficiencies in co-designing hardware and software — so why aren’t you doing that?

I’d say that’s what distinguishes us from most people in this space. The problems we need to solve in the real world won’t be solved by any single specialised chip, no matter how good it is.

If you really want to solve real world problems that are fast, capable, efficient, and sovereign, it won’t be done by a single chip technology. The story of chip evolution has been from general chips to specialised chips — if you build a new chip, it will inherently be specialised. That’s the only way to beat the current incumbents. And the question becomes: where in the space of specialisation should I build?

What we’re saying is that if heterogeneous computing is the better solution — and we’ve released work showing that’s the case — then you won’t build a single chip. You’ll work on how you orchestrate across different chips.

What does orchestration across different chips look like practically?

It means different models running on different chips, working collaboratively and communicating between each other to do productive work. It could be single models on single chips, multiple models on a single chip, or larger models spread across many different chips — and those chips could be within a single vendor or between vendors, co-located in a single data centre or operating across different clouds.

Really, orchestration is what I’d call a workflow problem — how you define your problem — and an agent problem — what models you utilise — and a hardware problem at the lower level. Crucially, every layer has to be aware of every other layer in its optimisation. The gains come from the synergy of those layers, not from separating them out.
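To make that kind of decision concrete, here is a toy sketch of how a router might pick a model-on-chip placement for a workload, trading off the four measures mentioned earlier — performance, cost, speed, and sovereignty. Everything in it — the model and chip names, the numbers, and the linear scoring rule — is hypothetical and purely illustrative, not Callosum’s actual method.

```python
from dataclasses import dataclass

@dataclass
class Placement:
    """A candidate (model, chip) pairing with its observed trade-offs."""
    model: str
    chip: str
    performance: float  # task quality, 0-1
    cost: float         # price per unit of work (arbitrary units)
    latency_ms: float   # time to first response
    sovereign: bool     # runs on infrastructure you control

def score(p: Placement, weights: dict) -> float:
    # Higher is better: reward performance, penalise cost and latency,
    # and hard-filter on sovereignty when the workload demands it.
    if weights.get("require_sovereign") and not p.sovereign:
        return float("-inf")
    return (weights["performance"] * p.performance
            - weights["cost"] * p.cost
            - weights["latency"] * p.latency_ms / 1000)

def route(placements: list[Placement], weights: dict) -> Placement:
    # Pick the best-scoring placement for this workload's priorities.
    return max(placements, key=lambda p: score(p, weights))

placements = [
    Placement("frontier-70b", "datacentre-gpu", 0.95, 8.0, 900, False),
    Placement("small-8b", "edge-npu", 0.80, 0.5, 120, True),
]

# A latency- and sovereignty-sensitive workload picks the edge placement.
best = route(placements, {"performance": 10, "cost": 0.2,
                          "latency": 2, "require_sovereign": True})
print(best.model)  # small-8b
```

The point of the sketch is only that the winning placement changes with the workload’s weights: a sovereignty- or latency-sensitive job routes to the smaller edge placement, while a purely performance-weighted one routes back to the frontier model in the data centre.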

There’s a deeper reason for this rooted in the foundations of computer science. Geoffrey Hinton described what he called the “immortal computer” — the idea that software and hardware should be inherently separated. You can build your chip without thinking about the software, and build your software without thinking about the chip. The entire computing and AI stack today is built this way — horizontal and independent.

That works well for certain applications. But when you want to solve real world problems with intelligence, you need to make those layers much blurrier. You need to co-evolve them in context of each other — building a vertically integrated stack that can manage caches in context of your workflow and build kernels in context of the constraints of the problem. Everything we will ever want from AI — cost, performance, speed, solving the world’s hardest problems — has this feature. We estimate something like five to ten percent of AI problems in the future could be solved by a single model on a single chip. Practically everything you care about will require something more.

Walk me through the product experience. It’s akin to a two-sided marketplace — you have a variation of chips and models on the supply side, and customers with different use cases on the demand side. How do you break that down?

On the supply side, the question is: what chips are available today in 2026, and what will become available over the next five to twenty years?

I want to mention something called the hardware lottery, which is the idea that because building a chip costs so much money and takes so long, you build it and it’s a lottery whether it’ll actually be used. Nvidia won because they had a great use case in gaming — their chip was ready when AI took off. Because of the capital required and the timelines, this has actually been stagnating the diversity we could have on the supply side.

What we do at Callosum is everything we can to help new entrants in the inference chip market commercialise their hardware effectively. They’re entering a diverse market — not necessarily winner-takes-all — and they need to be able to understand their trade-offs in context of others. We work very closely with supply side customers who are building amazing inference chip technologies. We also work with networking companies building entirely new ways to connect chips together using light. So the supply side is becoming inherently more heterogeneous, and you need a company with a wider scope that can bring them to market.

What does “bringing them to market” mean?

Depending on their technical readiness, it ranges from running simulations all the way to testing on real-world workflows. Chip companies typically define their offering in terms of hardware metrics — clock speeds, FLOPs, energy efficiency. What we provide them is real-world AI-level metrics.

The reason Jensen Huang was so good is that he didn’t sell Nvidia chips on hardware metrics. He sold revenue — he solved real-world problems for people. What we’re aiming to do for these new inference chips is provide those metrics: here’s how your technology fits in terms of the longer-term market.

There’s been analysis recently about TSMC’s under-investment in the early 2020s — now a major factor in the chip supply crunch. Fundamentally it was TSMC shifting risk onto model companies, who then couldn’t move as quickly as they’d have liked. And over that period, the big AI companies made a lot of money, but not as much as they perhaps could have done had everyone else scaled up alongside them. Is there something here about how, in a different paradigm, there are a whole set of players who can make revenue in a way that is less dependent on those constraints?

Yeah, exactly. People have their core technologies and they need to de-risk themselves by offloading certain things onto someone else — that’s ultimately the story of hardware supply chains. ASML and TSMC, despite where they sit in the stack, don’t necessarily accrue all the value because they’ve offloaded certain risks.

What we’re saying is there’s a paradigm shift happening in how hardware will be sold. People are raising capital to build their technologies on the premise that it’s a winner-takes-all market — but it likely won’t be. And that’s actually not a bad thing. It means the ecosystem becomes more healthy.

So almost like you have new chip companies saying “there’s this massive Nvidia monopoly, we’re going to take down the giant” — and maybe each of them eats off a different bit of the business?

Yes, exactly. And there’s a lot more to it than that. When we usher in photonic networking, for example, the way we design the software to run these chips also changes dramatically. There are a huge number of interoperability questions that will arise.

And it’s not just bringing them to market — we also do work testing their chips on real-world workflows. What’s really valuable about our position is that we know exactly what people are running in the real world, and we know a huge amount about the priority of different chips entering that space. So we can go to our supply side partners and say: here’s what workloads are being run right now, here’s what people are paying for, here’s our estimation of where the needle needs to move economically to make new use cases possible.

There’s an analogy with the energy system that might be useful here. We’ve talked about how important energy constraints are to intelligence, but there’s also a whole shift happening in the energy system itself — from turning coal power stations on and off to meet fixed demand, to variable generation sources and flexible shiftable demand. That’s a very different model from what came before. I see an analogy between homogenous data centre architecture and big AI labs on one hand, and big centralised power stations on the other — versus a much more distributed system of intelligence.

Exactly. Why would you want something to be more static or more dynamic? It all depends on the types of problems you’re solving. If you have relatively static infrastructure doing a role that’s invariant to the problem — super predictable — then static is fine.

But if you’re running AI models to automate complex work, that’s a hugely dynamic process. Think about all the complex work you do day-to-day, let alone when we move into robotics. As a result, our infrastructure needs to be dynamic too.

There’ll be some things that are slow to update — the grid is not easy to update. But there are things we’re building right now at the software level that allow us to make data centres into living, breathing organisms.

I use the analogy of Geoffrey Hinton’s “immortal computers” — where hardware and software don’t talk to each other — versus “mortal computers,” where they’re inherently interlinked. The brain is at the mortal end, tightly coupled to its energy supply through blood flow. What we’re doing is slowly instantiating principles from mortal computers into the immortal, digital ones. What Callosum is doing is accelerating that journey — utilising mortal principles of computing in immortal digital computers. I think that’s going to be a hundred-year trend.

On that note — you’ve just raised a round. What does the next phase of this business look like?

We raised $10.25 million in a pre-seed led by Plural, who have been amazing and are looking to back the most ambitious technological unlocks possible.

Right now, we’re doing everything we can in software — all the way from the AI systems layer down to the compile level — making heterogeneity work as well as possible: faster, better, stronger, and as easy as possible for new entrants.

Our north-star ambition is to redefine not only how AI systems will serve people — through heterogeneous, multi-agentic systems, in robotics, in scientific simulations — but to redefine the infrastructure needed to enable that properly. Over the next phase of our company, we want to not only provide that software, but to really reimagine how the compute stack will look from first principles. Will data centres become more distributed? Will they become inherently more heterogeneous? I think so. We have a unique opportunity where what we need compute to do has massively changed, and as a result we don’t need to just use the cloud model of compute infrastructure anymore.

To draw out two things from that: first, you’re turning homogenous data centres into heterogeneous ones. And second — speaking to the sovereignty point — in a world where countries feel dependent on streaming intelligence from abroad, if intelligence becomes an input to production the way energy is, and overnight becomes twenty percent more expensive because of some tariffs or policy change, that could cause an enormous economic shock. Is there a version of what you’re describing where we can rewire the economy around intelligence without having those choke points in quite the same way?

Yes. Where we are today — particularly in the UK and Europe — is that we are hyper-reliant on the hyperscalers for the majority of our infrastructure throughout society. We’re starting behind, significantly.

But there is a possibility that being far behind becomes a saving grace. It’s actually quite useful to be behind if what others have been doing turns out to be wrong. If there’s a compute overhang — so much capital sunk into the old paradigm — there’s a future where we can redefine how we build these data centres from scratch.

And inherently, we don’t necessarily need the cloud and hyperscaler model at all. It’s very possible that many different chip companies build their own clouds, distributed across the world in different ways. Every country is building their own chip. Every cloud is building their own chip. Many companies are building their own chips. So long as there is investment in new chip technologies, there will always be trade-offs, and always a huge surface area for optimisation across different points. You know, there’s a deeper argument about how this fits into the increasing multipolar world we’re entering, and the race for sovereign capabilities — but we don’t need to simply copy others.

We’ve talked a lot about the enormous ambition here. What are your riskiest assumptions?

The whole thing falls down if my thesis is wrong — that heterogeneous systems of intelligence are just better than homogenous ones. We’ve done lots of work showing that homogenous systems today are just a special case of a wider heterogeneous system. It’s harder to build heterogeneous systems — we’re not building this company because it’s easy — but it is strictly better. Our conviction is that orders of magnitude of improvements in cost, speed, and performance for real-world problems will come from system-level architecture, not from improving a single model.

Then there are market risks. The first is whether there will genuinely be a diversity of compute available to us. One reason we have homogenous compute systems today is not because they were better — it’s because they were practical. We didn’t have different alternatives. We now do. When Jascha and I started this company and spoke to people, they thought we were too early. We’re certainly not early now — inference chips are coming to market and becoming performant.

The biggest risk is ultimately timing. There is a period — as there has been in many paradigm shifts — where two trajectories can emerge. The biggest risk is that we don’t capitalise on that opening. Which is why we’re operating so fast. The other risk is of course that our technical thesis is wrong and we will be ruled by an AI overlord after all — but I’m very, very confident that won’t be the case.

What is it that allows you to see this world in a way others haven’t?

If it’s true that all the gains of AI will come from the co-evolution of AI systems with their underlying hardware, then we have a problem and an opportunity. The problem is that the culture of people who work in this space is totally different on each side — hardware people and software people have entirely separate timescales, technical vocabularies, and focuses. That’s a problem. But it’s also a huge opportunity, because when you find talented people who can speak the language across the entire stack, they’re incredibly undervalued in the market.

The ideal in our company is that someone can describe how a legal case was solved in context of the clock cycles of a chip — meaning: what were the AI agents saying, what documents were they reading, and what were the contributions of the underlying hardware to enabling that? That kind of end-to-end thinking is what we’re going for. At the moment, companies in this space are very horizontal and separate — they stack on top of each other. We’re verticalising the entire stack.

Jascha and I came into this from a deeply multidisciplinary angle. We asked the foundational questions of how you build efficient computing systems of intelligence — and the logical conclusion is that it has to be integrated across the stack. When I speak to specialists across the whole stack, they realise what we’re saying is right. But they wouldn’t have got there themselves without the angle we came in on.

One of the things I find really exciting about being in London and Europe is the huge amount of undervalued young talent that hasn’t really been leveraged in this area. I took a lot of inspiration from DeepMind in the early days — Demis identified undervalued AI talent across Europe and concentrated it. We’re trying to do something similar with Callosum for hardware-software co-evolution. These are entirely new ways of engineering systems. There’s a lot ahead of us, but it’s incredibly exciting.

