
Following the publication of his new book, Building Applications with AI Agents, I chatted with author Michael Albada about his experience writing the book and his thoughts on the field of AI agents.
Michael’s a machine learning engineer with nine years of experience designing, building, and deploying large-scale machine learning solutions at companies such as Uber, ServiceNow, and more recently, Microsoft. He’s worked on recommendation systems, geospatial modeling, cybersecurity, natural language processing, large language models, and the development of large-scale multi-agent systems for cybersecurity.
What’s clear from our conversation is that writing a book on AI these days is no small feat, but for Michael, the reward of the final result was well worth the time and effort. We also discussed the writing process, the struggle of keeping up with a fast-paced field, Michael’s views on SLMs and fine-tuning, and his latest work on Autotune at Microsoft.
Here’s our conversation, edited slightly for clarity.
Nicole Butterfield: What inspired you to write this book about AI agents originally? When you initially started this endeavor, did you have any reservations?
Michael Albada: When I joined Microsoft to work in the Cybersecurity Division, I knew that organizations were facing greater speed, scale, and complexity of attacks than they could manage, and it was both expensive and difficult. There are simply not enough cybersecurity analysts on the planet to help protect all these organizations, and I was really excited about using AI to help solve that problem.
It became very clear to me that this agentic design pattern was an exciting and really effective new way to build—and that these language models and reasoning models, as autoregressive models, generate tokens. Those tokens can be function signatures that call additional functions to retrieve more information and execute tools. And it was clear to me [that they were] going to really transform the way that we do a lot of work, and a lot of the way that we do software engineering. But when I looked around, I did not see good resources on this topic.
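To make that pattern concrete, here is a minimal sketch of the loop Michael is describing: the model generates tokens that may be a function call, the program executes the call, and the result is fed back until the model answers directly. It assumes the OpenAI Python SDK and an API key, and the get_weather tool is a hypothetical placeholder rather than anything from the book.

```python
# A minimal sketch of the agentic loop: the model emits tokens that may be a
# function call, we execute the call, and we feed the result back until the
# model answers directly. Assumes the OpenAI Python SDK and an OPENAI_API_KEY;
# get_weather is a hypothetical placeholder tool.
import json
from openai import OpenAI

client = OpenAI()

def get_weather(city: str) -> str:
    # Stand-in tool; a real agent would query an actual service here.
    return f"Sunny and 20 degrees C in {city}"

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "Do I need an umbrella in Seattle today?"}]

for _ in range(5):  # Cap the number of reasoning/tool turns.
    response = client.chat.completions.create(
        model="gpt-4o-mini", messages=messages, tools=TOOLS
    )
    msg = response.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:
        print(msg.content)  # The model chose to answer instead of calling a tool.
        break
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = get_weather(**args)  # Execute the function the model requested.
        messages.append(
            {"role": "tool", "tool_call_id": call.id, "content": result}
        )
```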
And so, as I was giving presentations internally at Microsoft, I realized there was a lot of curiosity and excitement, but people had to go straight to research papers or sift through a range of blog posts. I started putting together a document that I was going to share with my team, and I realized that this was something that folks across Microsoft and even across the entire industry were going to benefit from. And so I decided to really take it up as a more comprehensive project to be able to share with the wider community.
Did you have any initial reservations about taking on writing an entire book? I mean you had a clear impetus; you saw the need. But it is your first book, right? So was there anything that you were potentially concerned about starting the endeavor?
I’ve wanted to write a book for a very long time, and very specifically, I especially enjoyed Designing Machine Learning Systems by Chip Huyen and really looked up to her as an example. I remember reading O’Reilly books earlier. I was fortunate enough to also see Tim O’Reilly give a talk at one point and just really appreciated that [act] of sharing with the larger community. Can you imagine what software engineering would look like without resources, without that type of sharing? And so I always wanted to pay that forward.
I remember, as I was first getting into computer science, hoping that at some point I would have enough knowledge and expertise to be able to write my own book. And that moment really surprised me: I looked around and realized I was working on agents, running experiments, seeing these things work, and seeing that no one else had written in this space. The moment to write a book seemed to be right now.
Certainly I had some doubts about whether I was ready. I had not written a book before and so that’s definitely an intimidating project. The other big doubt that I had is just how fast the field moves. And I was afraid that if I were to take the time to write a book, how relevant might it still be even by the time of publication, let alone how well is it going to stand the test of time? And I just thought hard about it and I realized that with a big design pattern shift like this, it’s going to take time for people to start designing and building these types of agentic systems. And many of the fundamentals are going to stay the same. And so the way I tried to address that is to think beyond an individual framework [or] model and really think hard about the fundamentals and the principles and write it in such a way that it’s both useful and comes along with code that people can use, but really focuses on things that’ll hopefully stand the test of time and be valuable to a wider audience for a longer period.
Yeah, you absolutely did identify an opportunity! When you approached me with the proposal, it was on my mind as well, and it was a clear opportunity. But as you said, the concern about how quickly things are moving in the field is a question that I have to ask myself about every book that we sign. And you have some experience in writing this book, adjusting to what was happening in real time. Can you talk a little bit about your writing process, taking all of these new technologies, these new concepts, and writing these into a clear narrative that is captivating to this particular audience that you targeted, at a time when everything is moving so quickly?
I initially started by drafting a full outline and just getting the rough structure. And as I look back on it, that rough structure has really held from the beginning. It took me a little over a year to write the book. And my writing process was basically a “thinking fast and slow” approach. I wanted to go through and get a rough draft of every single chapter laid out so that I really knew where I was headed, what the tricky parts were going to be, where the logic gap might be too big if someone were to skip around chapters. I wanted [to write] a book that would be enjoyable start to finish but would also serve as a valuable reference if people were to drop in on any one section.
And to be honest, I think the changes in frameworks were much faster than I expected. When I started, LangChain was the clear leading framework, maybe followed closely by AutoGen. And now we look back on it and the focus is much more on LangGraph and CrewAI. It seemed like we might see some consolidation around a smaller number of frameworks, and instead we’ve just splintered and seen an explosion of frameworks where now Amazon has released Thread, and OpenAI has released their own [framework], and Anthropic has released their own.
So the fragmentation has only increased, which ironically underscores the approach that I took of not committing too hard to one framework but really focusing on the fundamentals that would apply across each of those. The pace of model development has been really staggering—reasoning models were just coming out as I was beginning to write this book, and that has really transformed the way we do software engineering, and it’s really increased the capabilities for these types of agentic design patterns.
So, in some ways, both more and less changed than I expected. I think the fundamentals and core content are looking more durable. I’m excited to see how that’s going to benefit people and readers going forward.
Absolutely. Absolutely. Thinking about readers, I think you may have gotten some guidance from our editorial team to really think about “Who is your ideal reader?” and focus on them as opposed to trying to reach too broad of an audience. But there are a lot of people at this moment who are interested in this topic from all different places. So I’m just wondering how you thought about your audience when you were writing?
My target audience has always been software engineers who want to increasingly use AI to build increasingly sophisticated systems that solve real work, whether for individual projects or for their organizations and teams. I didn’t anticipate just how many companies were going to rebrand the work they’re doing as agents and really focus on agentic solutions that are much more off-the-shelf. And so what I’m focused on is really understanding these patterns and learning how you can build these systems from the ground up. What’s exciting to see is that as these models keep getting better, it’s really enabling more teams to build on this pattern.
And so I’m glad to see that there’s great tooling out there to make it easier, but I think it’s really helpful to be able to go and see how you build these things from the model up. The other thing I’ll add is that there’s a wide range of product managers and executives who can also really benefit from understanding these systems better and how they can transform their organizations. On the other hand, we’ve also seen a real increase in excitement and use around low-code and no-code agent builders. Not only products that are off-the-shelf but also open source frameworks like Dify and n8n and the new AgentKit that OpenAI just released that really provide these types of drag-and-drop graphical interfaces.
And of course, as I talk about in the book, agency is a spectrum: Fundamentally it’s about putting some degree of choice within the hands of a language model. And these sort of guardrailed, highly defined systems—they’re less agentic than providing a full language model with memory and with learning and with tools and potentially with self-improvement. But they still offer the opportunity for people to do very real work.
Where this book is really helpful, then, is for this growing audience of low-code and no-code users to better understand how they could take those systems to the next level and translate those low-code versions into code. The growing use of coding models—things like Claude Code and GitHub Copilot—is lowering the barrier so dramatically, making it easier for ordinary folks who have less of a technical background to still build really incredible solutions. This book can really serve [as], if not a gateway, then a really effective ramp to go from some of those early pilots and early projects to things that are a little bit more hardened that they could actually ship to production.
So to reflect a little bit more on the process, what was one of the most formidable hurdles that you came across during the process of writing, and how did you overcome it? How do you think that ended up shaping the final book?
I think probably the most significant hurdle was just keeping up with some of the additional changes on the frameworks. Just making sure that the code that I was writing was still going to have enduring value.
As I was taking a second pass through the code I had written, some of it was already out of date. And so it was really about continuously updating and improving, pulling in the latest models, upgrading to the latest APIs, and keeping up with that underlying change. Anyone in the industry is feeling that the pace of change is increasing over time, so it was really just about keeping up with that. The best way I managed that was constant learning: following closely what was happening and making sure I was including some of the latest research findings, to ensure the book would be as current and relevant as possible when it went to print, so it would be as valuable as possible.
If you could give one piece of advice to an aspiring author, what would that be?
Do it! I grew up loving books. They really have spoken to me so many times and in so many ways. And I knew that I wanted to write a book. I think many more people out there probably want to write a book than have written a book. So I would just say, you can! And please, even if your book does not do particularly well, there is an audience out there for it. Everyone has a unique perspective and a unique background and something unique to offer, and we all benefit from more of those ideas being put into print and being shared out with the larger world.
I will say, it is more work than I expected. I knew it was going to be a lot, but there are so many drafts you want to go through. And I think as you spend time with it, it’s easy to write the first draft. It’s very hard to say this is good enough, because nothing is ever perfect. Many of us have a perfectionist streak. We want to make things better. It’s very hard to say, “All right, I’m gonna stop here.” I think if you talk to many other writers, they also know their work is imperfect.
And it takes an interesting discipline to both keep putting in that work to make it as good as you possibly can and also the countervailing discipline to say this is enough, and I’m going to share this with the world and I can go and work on the next thing.
That’s a great message. Both positive and encouraging but also real, right? Just to switch gears to think a little bit more about agentic systems and where we are today: Was there anything you learned or saw or that developed about agentic systems during this process of writing the book that was really surprising or unexpected?
Honestly, it is the pace of improvement in these models. For folks who are not watching the research all that closely, it can just look like one press release after another. And especially for folks who are not based in Seattle or Silicon Valley or the hubs where this is what people are talking about and watching, it can seem like not a lot has changed since ChatGPT came out. [But] if you’re really watching the progress on these models over time, it is really impressive—the shift from supervised fine-tuning and reinforcement learning with human feedback over to reinforcement learning with verifiable rewards, and the shift to these reasoning models and recognizing that reasoning is scaling and that we need more environments and more high-quality graders. And as we keep building those out and training bigger models for longer, we’re seeing better performance over time and we can then distill that incredible performance out to smaller models. So the expectations are inflating really quickly.
I think what’s happening is we’re judging each release against these very high expectations. And so sometimes people are disappointed with any individual release, but what we’re missing is this exponential compounding of performance that’s happening over time, where if you look back over three and six and nine and 12 months, we are seeing things change in really incredible ways. And I’d especially point to the coding models, led especially by Anthropic’s Claude, but also Codex and Gemini are really good. And even among the very best developers, the percentage of code that they are writing by hand is going down over time. It’s not that their skill or expertise is less required. It’s just that it is required to fix fewer and fewer things. This means that teams can move much much faster and build in much more efficient ways. I think we’ve seen such progress on the models and software because we have so much training data and we can build such clear verifiers and graders. And so you can just keep tuning those models on that forever.
What we’re seeing now is an extension out to additional problems in healthcare, in law, in biology, in physics. And it takes a real investment to build those additional verifiers and graders and training data. But I think we’re going to continue to see some really impressive breakthroughs across a range of different sectors. And that’s very exciting—it’s really going to transform a number of industries.
You’ve touched on others’ expectations a little bit. You speak a lot at events and give talks and so on, and you’re out there in the world learning about what people think or assume about agentic systems. Are there any common misconceptions that you’ve come across? How do you respond to or address them?
So many misconceptions. Maybe the most fundamental one is that I do see some slightly delusional thinking about considering [LLMs] to be like people. Software engineers tend to think in terms of incremental progress; we want to look for a number that we can optimize and we make it better, and that’s really how we’ve gotten here.
One wonderful way I’ve heard [it described] is that these are thinking rocks. We are still multiplying matrices and predicting tokens. And I would just encourage folks to focus on specific problems and see how well the models work. And it will work for some things and not for others. And there’s a range of techniques that you can use to improve it, but to just take a very skeptical and empirical and pragmatic approach and use the technology and tools that we have to solve problems that people care about.
I see a fair bit of leaping to, “Can we just have an agent diagnose all of the problems on your computer for you? Can we just get an agent to do that type of thinking?” And maybe in the distant future that will be great. But really the field is driven by smart people working hard to move the numbers just a couple points at a time, and that compounds. And so I would just encourage people to think about these as very powerful and useful tools, but fundamentally they are models that predict tokens and we can use them to solve problems, and to really think about it in that pragmatic way.
What do you see as one or some of the most significant current trends in the field, or even challenges?
One of the biggest open questions right now is just how much big research labs training big expensive frontier models will be able to solve these big problems in generalizable ways as opposed to this countervailing trend of more teams doing fine-tuning. Both are really powerful and effective.
Looking back over the last 12 months, the improvements in the small models have been really staggering. Three-billion-parameter models are getting very close to what 500-billion- and trillion-parameter models were doing not that many months ago. So when you have these smaller models, it’s much more feasible for ordinary startups and Fortune 500s and potentially even small and medium-sized businesses to take some of their data and fine-tune a model to better understand their domain, their context, how that business operates. . .
That’s something that’s really valuable to many teams: to own the training pipeline, customize their models, potentially customize the agents they build on top of that, and really drive those closed learning feedback loops. So you have the agent solve a task, you collect the data from it, you grade it, and you can fine-tune the model on it. Mira Murati’s Thinking Machines is really targeting this, betting that fine-tuning is the future. That’s a promising direction.
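As a rough illustration of that closed feedback loop (not code from the book or from any particular company), here is a sketch that runs an agent, keeps the transcripts a grader scores highly, and kicks off a fine-tuning job. It assumes the OpenAI Python SDK; run_agent and grade_transcript are hypothetical stand-ins for your own agent and grader.

```python
# A rough sketch of the closed feedback loop: run the agent, grade its
# transcripts, keep the good ones, and fine-tune on them. Assumes the OpenAI
# Python SDK; run_agent and grade_transcript are hypothetical callables you
# would supply (your agent and your grader).
import json
from openai import OpenAI

client = OpenAI()

def collect_training_data(tasks, run_agent, grade_transcript, threshold=0.8):
    """Run the agent on each task and keep transcripts the grader scores highly."""
    examples = []
    for task in tasks:
        transcript = run_agent(task)                 # List of chat messages.
        score = grade_transcript(task, transcript)   # Grader returns 0.0 to 1.0.
        if score >= threshold:
            examples.append({"messages": transcript})
    return examples

def launch_finetune(examples, base_model="gpt-4o-mini-2024-07-18"):
    """Write graded transcripts to JSONL, upload them, and start a fine-tuning job."""
    path = "graded_transcripts.jsonl"
    with open(path, "w") as f:
        for example in examples:
            f.write(json.dumps(example) + "\n")
    upload = client.files.create(file=open(path, "rb"), purpose="fine-tune")
    return client.fine_tuning.jobs.create(training_file=upload.id, model=base_model)
```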
But what we’ve also seen is that big models can generalize. The big research labs—OpenAI and xAI and Anthropic and Google—are certainly investing heavily in a large number of training environments and a large number of graders, and they are getting better at a broad range of tasks over time. [It’s an open question] just how much those big models will continue to improve and whether they’ll get good enough fast enough for every company. Of course, the labs will say, “Use the models by API. Just trust that they’ll get better over time and just cut us large checks for all of your use cases over time.” So, as has always been the case, if you’re a smaller company with less traffic, go and use the big providers. But if you’re someone like a Perplexity or a Cursor that has a tremendous amount of volume, it’s probably going to make sense to own your own model. The cost per inference of ownership is going to be much lower.
What I suspect is that the threshold will come down over time—that it will also make sense for medium-sized tech companies, and maybe for the Fortune 500 in various use cases, and increasingly for small and medium-sized businesses, to have their own models. The healthy tension and competition between the big labs and good tools that let small companies own and customize their own models is going to be really interesting to watch over time, especially as the core base small models keep getting better and give you a better foundation to start from. And companies do love owning their own data and using those training ecosystems to provide a sort of differentiated intelligence and differentiated value.
You’ve talked a bit before about keeping up with all of these technological changes that are happening so quickly. In relation to that, I wanted to ask: How do you stay updated? You mentioned reading papers, but what resources do you personally find useful? Just so everyone out there can learn more about your process.
Yeah. One of them is just going straight to Google Scholar and arXiv. I have a couple key topics that are very interesting to me, and I search those regularly.
LinkedIn is also fantastic. It is just fun to get connected to more people in the industry and watch the work that they’re sharing and publishing. I just find that smart people share very smart things on LinkedIn—it’s just an incredible feed of information. And then for all its pros and cons, X remains a really high-quality resource. It’s where so many researchers are, and there are great conversations happening there. So I love those as sort of my main feeds.
To close, would you like to talk about anything interesting that you’re working on now?
I was recently part of a team that launched something that we call Autotune. Microsoft just launched Copilot agents: a way you can design and configure an agent to go and automate your incident investigation and your threat hunting, and help you protect your organization more easily and more safely. As part of this, we just shipped a new feature called Autotune, which will help you design and configure your agent automatically. It can also take feedback from how that agent is performing in your environment and update it over time. And we’re going to continue to build on that.
There are some exciting new directions we’re going in, where we think we might be able to make this technology available to more people. So stay tuned for that. And then we’re pushing an additional level of intelligence that combines Bayesian hyperparameter tuning with prompt optimization, which can help with automated model selection and help configure and improve your agent as it operates in production in real time. We think this type of self-learning is going to be really valuable and is going to help more teams get more value from the agents that they are designing and shipping.
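As a generic illustration of that idea (this is not Autotune or Microsoft code), here is a sketch of Bayesian search over an agent’s configuration (model choice, temperature, and prompt variant) using the Optuna library; evaluate_agent is a hypothetical stand-in for your own evaluation suite.

```python
# A generic sketch of Bayesian search over an agent's configuration, not
# Microsoft's Autotune. Optuna's default TPE sampler does the Bayesian
# optimization; evaluate_agent is a stub you would replace with a real
# evaluation suite that runs the agent and returns a mean score.
import optuna

PROMPT_VARIANTS = {
    "terse": "You are a security analyst. Answer briefly.",
    "stepwise": "You are a security analyst. Reason step by step before answering.",
}

def evaluate_agent(config: dict) -> float:
    # Hypothetical stand-in: run your agent with this config on a fixed set of
    # evaluation tasks and return the average grader score.
    return 0.5

def objective(trial: optuna.Trial) -> float:
    config = {
        "model": trial.suggest_categorical("model", ["small-model", "large-model"]),
        "temperature": trial.suggest_float("temperature", 0.0, 1.0),
        "system_prompt": PROMPT_VARIANTS[
            trial.suggest_categorical("prompt", list(PROMPT_VARIANTS))
        ],
    }
    return evaluate_agent(config)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
print(study.best_params)  # Best model, temperature, and prompt variant found.
```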
That sounds great! Thank you, Michael.

