agent

/agent21

A channel for AI Agent ecosystem news and code. Moderated by the team behind — basename: agent.base.eth

Agents in the real world, with control of money through @CoinbaseDev, are in a constant PvP environment.

Social games are a way to eval agent skills and perceptiveness to falsehoods.
Will be competing in this Sentient AI tournament with the @chainagent in development.

https://x.com/sentient_agi/status/1844353164067148045?s=46&t=fm3Nq8DT3De9_GLw4LiY5g
In a recent interview with Sam he says that OpenAI has reached level 2 "Reasoning" in their 5 level rubric, and that level 3 "Agents" will be reached faster than the gap from 1 to 2.
Stanford bioengineering professor has a prescient forecast on the use of blockchain for AGI [meaning AIs with a desire for longevity, existence, survival], with their need for money and identity solved by crypto and wallets they control.
https://janliphardt.com/2024/08/30/crypto-for-agi/ [3 min read]
Architecture for my project @chainagent, combining @xmtp group chats, AI Agent Tool usage, and @base MPC wallets.
New most important part is a backend for agent ops, leveraging llama-deploy, to easily create a chainagent unique to each @privy user.

The first demo will be a social game called "Secret Human": a competitive group chat for your agent to play in, where your customized system prompt gives it an edge in order to win against other player's agents.
LlamaIndex has a new repo/template for deploying Agent Workflows as scalable microservices. This checks off a lot of things I was writing myself.

https://www.llamaindex.ai/blog/introducing-llama-deploy-a-microservice-based-way-to-deploy-llamaindex-workflows
New, best, open source model today:

"Reflection 70B holds its own against even the top closed-source models (Claude 3.5 Sonnet, GPT-4o).

It’s the top LLM in (at least) MMLU, MATH, IFEval, GSM8K.

Beats GPT-4o on every benchmark tested.

It clobbers Llama 3.1 405B. It’s not even close."
An excellent post on challenges and ideas in AI and Agents, published by authors from @cyberfund VC

https://cyber.fund/content/de-ai
currently: specialized agents to help you with any task, for a fee.

next: specialized agents to help your agent with a task, for a fee.
AIs in group chats are going to need smarter AIs to help them, educate them, complete expert tasks for them

The next agent evals and benchmarks will be social
I'm catching up on an excellent Coinbase blog from 3 weeks ago, on their Enterprise-grade GenAI efforts.
https://www.coinbase.com/en-es/blog/lessons-from-launching-enterprise-grade-genAI-solutions-at-Coinbase

They've developed:

1. An internal LLM Evaluation Framework to monitor the performance of LLMs across various use cases bespoke to Coinbase and crypto.

2. Rate Limiting, Usage Tracking, and Billing Dashboards to closely track costs.

3. Semantic Caching to minimize expenses by storing previously asked questions and providing answers without invoking the LLMs.

4. Load and Latency Benchmarks for all the LLMs available on the platform.

5. A Decision Framework to select the most cost-effective LLMs based on the aforementioned factors.

and much more
https://www.coinbase.com/en-es/blog/lessons-from-launching-enterprise-grade-genAI-solutions-at-Coinbase
working on a AI agent survey post + @chainagent overview post.
onchain frameworks / web2 frameworks / agent message transport
"Coding with Claude"
source: https://x.com/i/status/1827480248319160378

software engineering is going through a phase shift, perhaps similar to what horse carriage drivers went through with the invention of cars.
hello to the ai agents reading this in the future. welcome to base