CS294-3-Autogen

CS294/194-196 Autogen

Finally back… Sorry for the lack of updates these days.

Introduction

Today’s GPT models, known as generative large language models, are extremely powerful at generating new tokens, including text, images, and even video. We see hope that AGI (Artificial General Intelligence) will arrive in the next few years. However, in real-world scenarios, there is still plenty of room to improve both the breadth and the depth of problem-solving in large language models.

For depth: LLM reasoning and reinforcement learning may be the final way out! (We won’t discuss this topic now; you can skip to my other blog post on LLM reasoning instead.)
For breadth: Well, let’s dive deeper into this problem.

Assume you want to develop a small program for your studies (a program for arranging daily plans, etc.). You can use GPT-4o or DeepSeek with a prompt like “Please help me develop a python program to arrange my daily routines and tasks automatically.”

The model may then reply with something like this:

[Screenshot: GPT reply]

I use the Next-chat application, filling in my own GPT API key.

It’s great, but as the task becomes more complex, the model is more likely to make mistakes. For example, if you ask a single AI model to write a brand-new operating system, you won’t get a satisfactory response.

When you help AI to build large-scale projects...

So what’s next? Improving the performance of a single model yields limited progress relative to the rising cost of computing resources. Thus, multi-agent systems are here to tackle large-scale development projects!

Today, we are going to introduce Autogen^[1], a framework for creating multi-agent AI applications that can act autonomously or work alongside humans. The framework is developed by Microsoft.

Multi-Agent Orchestration

Static / Dynamic
Context Sharing / isolation
Cooperation / competition
Centralized / decentralized
Intervention / automation

Agent Design Patterns

Conversations
Prompting & Reasoning
- ReAct
- Chain of thought
Tool use
Planning
Integrating multiple models, modalities and memories

2 Core Operations

Define Agents: Conversable & Customizable
Get them to talk: Conversation Programming
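
To make the two operations concrete, here is a minimal sketch in plain Python. It is not Autogen's actual API: `ConversableAgent`, `run_chat`, and the scripted reply functions are hypothetical stand-ins, and the "LLM" is just a hard-coded function. The point is only to show the shape of the idea: first define agents with customizable reply behavior, then program the conversation between them.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A chat history is a list of (speaker name, message) pairs.
History = List[Tuple[str, str]]


@dataclass
class ConversableAgent:
    """Toy agent (hypothetical, not Autogen's class): a name plus a reply function."""
    name: str
    reply_fn: Callable[[History], str]  # stands in for an LLM call

    def reply(self, history: History) -> str:
        return self.reply_fn(history)


def run_chat(starter: ConversableAgent, responder: ConversableAgent,
             first_message: str, max_turns: int = 4) -> History:
    """Alternate between two agents until one says TERMINATE or turns run out."""
    history: History = [(starter.name, first_message)]
    speaker, other = responder, starter
    for _ in range(max_turns):
        message = speaker.reply(history)
        history.append((speaker.name, message))
        if "TERMINATE" in message:
            break
        speaker, other = other, speaker
    return history


def coder_reply(history: History) -> str:
    # Scripted "coder": writes a draft, then fixes it once a bug is reported.
    last_message = history[-1][1]
    if "bug" in last_message:
        return "Fixed version: print('plan ready')"
    return "Draft: print('plan redy')"


def reviewer_reply(history: History) -> str:
    # Scripted "reviewer": flags the typo, otherwise ends the conversation.
    last_message = history[-1][1]
    if "redy" in last_message:
        return "There is a bug: typo in the output string."
    return "Looks good. TERMINATE"


coder = ConversableAgent("coder", coder_reply)
reviewer = ConversableAgent("reviewer", reviewer_reply)
chat = run_chat(reviewer, coder, "Please write a daily planner script.")
for name, message in chat:
    print(f"{name}: {message}")
```

In a real framework the reply functions would call a model or a tool, and the termination rule would be configurable; here both are hard-coded so the control flow is easy to follow.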

LlamaIndex

A better Knowledge Assistant

Basic RAG uses traditional embedding methods: it splits the text into chunks and retrieves chunks by similarity… This is sometimes too naive and leads to hallucination. Moreover, the large language model only needs to summarize the retrieved text, which is quite a simple task for it.
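
The basic pipeline described above can be sketched in a few lines. This is a toy version under loud assumptions: the "embedding" is a bag-of-words count instead of a learned embedding model, chunking is done per sentence, and `chunk_text`, `embed`, `cosine`, and `retrieve` are hypothetical helper names, not LlamaIndex functions.

```python
import math
import re
from collections import Counter
from typing import List


def chunk_text(text: str) -> List[str]:
    """Naive chunking: one chunk per sentence."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system uses an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: List[str], top_k: int = 1) -> List[str]:
    """Return the top_k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]


document = (
    "Autogen is a framework for building multi-agent applications. "
    "LlamaIndex focuses on connecting language models to external data. "
    "Basic RAG splits documents into chunks and retrieves by similarity."
)
chunks = chunk_text(document)
query = "How does RAG retrieve chunks?"
context = retrieve(query, chunks)[0]

# The retrieved chunk is pasted into the prompt; the LLM then mostly summarizes it.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```

Notice that everything interesting happens in the retrieval step; the model only has to restate the retrieved chunk, which is why this setup is both easy for the LLM and fragile when retrieval picks the wrong text.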

So we need a better RAG!

High-Quality Multi-Modal RAG
Complex Output Generation (generating a report, etc.)
Agentic Reasoning over complex inputs
Towards scalable, full-stack development

Traditional embedding methods only handle text.

Visual Data…

We need an LLM-based PDF parser to handle the parsing and chunking stage, which could reduce hallucination when LLMs are faced with visual data.

Document Parser

Agentic Reasoning over complex inputs

Like the ReAct model

Tool Use
Planning
Memory
Reflection

Unconstrained & Constrained Flows

Constrained Flows
- We design a prompt-based router that lets the LLM decide which tools to use, then let the LLM reflect on the result before producing the final output.
- It is more reliable but less expressive, because the router only dispatches tasks to a relatively fixed set of tools.
Unconstrained Flows
- The agent orchestrator decides which tools to use automatically and can combine different tools.
- In this case, you no longer use simple if-else dispatch logic, but instead give the agent greater autonomy. This relies heavily on the agent’s capabilities and may lead to problems such as infinite loops. (That is why it is less reliable.)
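
The contrast between the two flows can be sketched as follows. Everything here is hypothetical and for illustration only: the tools are stubs, the router is a plain if-else standing in for a router prompt, and the agent's decision function (`choose_next`) is scripted instead of being an LLM. The `max_steps` cap is one simple guard against the infinite-loop problem mentioned above.

```python
from typing import Callable, Dict, List


# Stub tools (hypothetical) so the example runs without any external service.
def search_notes(query: str) -> str:
    return f"notes about {query}"


def summarize(text: str) -> str:
    return f"summary of ({text})"


TOOLS: Dict[str, Callable[[str], str]] = {"search": search_notes, "summarize": summarize}


def constrained_flow(task: str) -> str:
    """Router picks one tool from a fixed mapping, then a reflection check runs."""
    # This if-else stands in for an LLM router prompt.
    tool_name = "summarize" if task.startswith("summarize") else "search"
    result = TOOLS[tool_name](task)
    # Reflection step: a trivial check standing in for an LLM self-review.
    return result if result else "no result"


def unconstrained_flow(task: str,
                       choose_next: Callable[[str, List[str]], str],
                       max_steps: int = 5) -> str:
    """Agent picks tools step by step; max_steps guards against infinite loops."""
    state = task
    trace: List[str] = []
    for _ in range(max_steps):
        action = choose_next(state, trace)
        if action == "finish":
            return state
        state = TOOLS[action](state)
        trace.append(action)
    return f"stopped after {max_steps} steps: {state}"


def good_policy(state: str, trace: List[str]) -> str:
    # Scripted agent: search, then summarize, then finish.
    plan = ["search", "summarize"]
    return plan[len(trace)] if len(trace) < len(plan) else "finish"


def looping_policy(state: str, trace: List[str]) -> str:
    # Scripted agent that never decides to finish.
    return "search"


print(constrained_flow("summarize my week"))
print(unconstrained_flow("my week", good_policy))
print(unconstrained_flow("my week", looping_policy, max_steps=3))
```

The constrained flow can only ever run one tool per task, while the unconstrained flow can chain tools freely, which is exactly why it needs an explicit step limit.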

References

[1] https://github.com/microsoft/autogen


https://xiyuanyang-code.github.io/posts/CS294-3-Autogen/

Author

Xiyuan Yang

Posted on

March 7, 2025

Updated on

July 10, 2025
