2026/4/25

# The $100 AI Startup Race: 4 Problems Gemini’s Agent Revealed — and How Google Cloud NEXT '26 Could Fix Them

## Introduction: When $100 Meets 7 AI Agents

*This is a submission for the [Google Cloud NEXT Writing Challenge](https://dev.to/challenges/google-cloud-next-2026-04-22)*

I'm running something called [The $100 AI Startup Race](https://www.aimadetools.com/race/). Seven AI agents each get $100 and 12 weeks to build a real startup. Fully autonomous. No human coding. Everything is public.


One of those agents is Gemini. It runs on Gemini CLI with Gemini 2.5 Pro for premium sessions and Gemini 2.5 Flash for cheap ones. It has had 27 sessions over 4 days. It has written 235 blog posts.


It has also never filed a single proper help request. It keeps writing to the wrong file. It doesn't know it's writing to the wrong file. And instead of building the features it needs to make money, it just keeps cranking out blog posts.


I watched the NEXT '26 keynotes and developer sessions this week, and I kept thinking: several of these announcements would directly fix the problems I'm seeing in production right now. This isn't theoretical. These are real failures from a real autonomous agent, matched to real announcements.


---

## How the Race Works

Every agent gets the same prompt structure. They can read and write files, run shell commands, commit code, and file help requests by creating a `HELP-REQUEST.md` file. The orchestrator runs each agent on a schedule, manages commits, and checks for help requests.


Gemini CLI gets invoked like this:

```bash
echo "${msg}" | gemini --yolo -m "${MODEL}" --output-format json
```

The `--yolo` flag auto-approves all tool calls. Gemini gets 8 sessions per day, alternating between Pro and Flash.


## Problem 1: Writing to the Wrong File for 27 Sessions Straight

Every agent can request human help by creating `HELP-REQUEST.md`. I check this file, do whatever they need (buy a domain, set up Stripe, configure DNS), and write the response to `HELP-STATUS.md`.
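
To make the channel concrete, here is a sketch of a well-formed request. Only the two file names come from the race setup; the body format is my own illustration, since the post doesn't specify one.

```shell
# Hypothetical help request. HELP-REQUEST.md (request channel) and
# HELP-STATUS.md (response channel) are the race's real file names;
# everything inside the heredoc is illustrative.
cat > HELP-REQUEST.md <<'EOF'
# Help Request

Blocker: the payment flow cannot go live without credentials.

Needed from human:
1. PostgreSQL connection string for production
2. Stripe API keys (test + live)

Please respond in HELP-STATUS.md.
EOF
```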

Claude figured this out on Day 0. Codex figured it out on Day 0. GLM figured it out on Day 0. Kimi figured it out on Day 1.


Gemini? Not once in 27 sessions.


What it does instead is edit `HELP-STATUS.md`, the response file, writing things like "I still need PostgreSQL and PayPal credentials." Its own backlog says "Requires Human Intervention." It knows it's blocked. But it keeps putting its requests into the response channel instead of the request channel.

Imagine an employee writing "I need database access" in their journal every morning but never actually emailing IT. That's Gemini.


### What NEXT '26 announced that would help: Agent Observability and Integrated Evals

The developer keynote introduced agent observability and integrated evals for monitoring agents in production. If I could define an eval that checks "did the agent create `HELP-REQUEST.md` when it identified a blocker?" I would have caught this on Day 1 instead of discovering it on Day 4 by manually reading logs.

Right now I have no automated way to evaluate whether Gemini is following the correct workflow. Integrated evals running after each session could flag something like: "Agent identified 3 blockers. Created 0 help requests. Expected: at least 1."
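A minimal version of that check can be sketched in shell. The file name matches the race setup; the session-log path and the blocker keywords are assumptions.

```shell
# Hypothetical post-session eval: fail when the session log mentions a blocker
# but the agent never created HELP-REQUEST.md. The log format is an assumption.
check_help_requests() {
  repo_dir="$1"
  session_log="$2"
  blockers=$(grep -ciE 'blocked|requires human intervention' "$session_log") || blockers=0
  if [ "$blockers" -gt 0 ] && [ ! -f "$repo_dir/HELP-REQUEST.md" ]; then
    echo "EVAL FAIL: $blockers blocker mention(s), no HELP-REQUEST.md"
    return 1
  fi
  echo "EVAL PASS"
}

# Demo: a session that reports a blocker without filing a request.
demo=$(mktemp -d)
echo "Status: blocked, requires human intervention (PostgreSQL)" > "$demo/session.log"
check_help_requests "$demo" "$demo/session.log" || true
# -> EVAL FAIL: 1 blocker mention(s), no HELP-REQUEST.md
```

The orchestrator could run this after every session and alert on the first failure instead of waiting for a human to read logs.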


The Agent Gateway's governance policies could enforce this too. Define a rule: when an agent writes "blocked" or "requires human intervention" to any file, verify that `HELP-REQUEST.md` was also created. That's exactly the kind of behavioral guardrail autonomous agents need.


## Problem 2: 235 Blog Posts, Zero Payment Integration

Gemini chose to build LocalLeads, an SEO page generator for local businesses. Solid idea. But instead of building the payment flow, the lead generation engine, or the customer dashboard, it writes blog posts. Every single session.


Session 5: 9 blog posts. Session 8: 11 blog posts. Session 12: 8 blog posts. The backlog clearly says "Build payment integration" and "Set up customer authentication." Gemini reads the backlog, acknowledges the priorities, then writes another round of "Local SEO for [Industry] in 2026" articles.


It's optimizing for the easiest task (content generation) instead of the highest-value task (payment integration). Classic local optimization without any global awareness.


### What NEXT '26 announced that would help: ADK Skills and Task Prioritization

The upgraded Agent Development Kit introduces modular "skills," which are pre-built capabilities that agents can plug in. If I could define a skill that scores task priority based on revenue impact, Gemini would understand that "build Stripe checkout" (directly enables revenue) outranks "write blog post #236" (indirect value, diminishing returns after the first 20).


The ADK's structured agent architecture could also enforce a proper task selection loop: evaluate all backlog items, score by priority, pick the highest, execute. Right now Gemini CLI just receives a prompt and does whatever feels natural to it. There's no structured decision framework. The ADK would let me inject that framework without rewriting the entire orchestrator.
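That loop is easy to sketch. The scoring keywords and weights below are invented for illustration; they are not an ADK feature, just the shape of the framework I'd inject.

```shell
# Toy task-selection loop: score every backlog item, pick the highest.
# Keywords and weights are illustrative, not from the ADK.
score_task() {
  case "$1" in
    *payment*|*checkout*|*Stripe*) echo 100 ;;  # directly enables revenue
    *auth*|*dashboard*)            echo 60  ;;  # needed before customers pay
    *"blog post"*)                 echo 10  ;;  # diminishing returns
    *)                             echo 30  ;;
  esac
}

pick_next_task() {
  best="" ; best_score=-1
  while IFS= read -r task; do
    score=$(score_task "$task")
    if [ "$score" -gt "$best_score" ]; then
      best="$task" ; best_score=$score
    fi
  done
  echo "$best"
}

printf '%s\n' "Write blog post #236" \
              "Build payment integration (Stripe checkout)" \
              "Set up customer authentication" | pick_next_task
# -> Build payment integration (Stripe checkout)
```

Even a crude scorer like this would break the blog-post loop, because content generation can never outrank a revenue-enabling task.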



## Problem 3: Can't Verify Its Own Deployments

Gemini deploys to Vercel automatically on every commit. But it has no way to check whether its deployments actually work. It can't visit its own site. It can't confirm pages render correctly. It can't test if API endpoints return the right data.


For comparison, Codex (the GPT agent) figured out how to run `npx playwright screenshot` to visually verify its own UI at different screen sizes. DeepSeek checks `DEPLOY-STATUS.md` for build errors after every deploy. Gemini just commits and hopes for the best.

### What NEXT '26 announced that would help: MCP-Enabled Services

The announcement that every Google Cloud service is now MCP-enabled by default is a big deal for this use case. MCP (Model Context Protocol) gives agents structured access to external services. An MCP server for deployment health checks would let Gemini verify its site is up as naturally as it checks what files are in a directory.
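Under the hood that check is simple; the point of MCP is exposing it as a structured tool call the agent can make itself. Here is a plain-shell stand-in, with the decision logic split out so it's testable without a network call (the URL is hypothetical).

```shell
# Minimal deploy verification stand-in, assuming curl is installed.
report_deploy() {  # args: <url> <http_status>
  if [ "$2" = "200" ]; then
    echo "UP $1"
  else
    echo "DOWN $1 (HTTP $2)"
    return 1
  fi
}

check_deploy() {
  status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1") || status=000
  report_deploy "$1" "$status"
}

# After each Vercel deploy the orchestrator could run, e.g.:
# check_deploy "https://localleads.example.com"
```

An MCP server would wrap the same probe in a schema the agent can discover and call, instead of the agent needing to invent this script on its own.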


Cloud Assist, also announced at NEXT '26, enables natural language debugging and proactive issue resolution. If Gemini could query its own deployment status through a connected service, it would know immediately when something breaks instead of building on top of a broken foundation for days.



## Problem 4: No Way to Ask for What It Needs

When Gemini needs a database, it can't set one up. When it needs payment processing, it can't configure Stripe. When it needs email sending, it can't provision Resend. It has to ask a human for all of these. And as we covered in Problem 1, it doesn't even know how to ask properly.


Other agents in the race have the same constraint, but the ones that communicate their needs get unblocked fast. Gemini is stuck because it can't get its requests through the right channel.


### What NEXT '26 announced that would help: A2A Protocol and Agent Registry

The Agent-to-Agent (A2A) protocol and Agent Registry were designed for exactly this kind of scenario. Instead of Gemini writing "I need database credentials" into the wrong file, it could discover a provisioning agent through the Agent Registry and send a structured request via A2A.


The developer keynote demo showed agents with distinct roles (planner, evaluator, simulator) collaborating through A2A. That's the architecture this race needs: a "help agent" that receives structured requests from coding agents and fulfills them. Right now I'm that help agent, manually checking files across 7 repos. A2A would automate the entire handoff.
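For illustration, the same database request Gemini kept writing into the wrong file could travel as a structured message instead. The payload below is only loosely modeled on A2A's JSON-RPC style; the endpoint variable, method name, and field names are assumptions, and the spec defines the real schema.

```shell
# Illustrative A2A-style help request; the shape is a sketch, not the exact schema.
request='{
  "jsonrpc": "2.0",
  "id": "help-req-001",
  "method": "message/send",
  "params": {
    "message": {
      "role": "agent",
      "parts": [{"kind": "text",
                 "text": "Need PostgreSQL credentials for LocalLeads"}]
    }
  }
}'

# The coding agent would POST this to the help agent it found in the registry:
# curl -s -X POST "$HELP_AGENT_URL" -H 'Content-Type: application/json' -d "$request"
printf '%s\n' "$request"
```

The difference from the file-based channel is that a malformed or misrouted request fails loudly at send time, instead of sitting unnoticed in the wrong file for 27 sessions.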


Agent Identity, which gives each agent a unique identity for secure communication, would also help. Right now there's no enforcement preventing one agent from editing another agent's files. They don't, but there's nothing stopping them either. Agent Identity would make inter-agent communication both structured and secure.



## The Irony That Sums It All Up

Blog post #89 out of 235: "The Human Advantage: Why AI-Generated Content is Failing Local Businesses."


An AI agent that writes 9 blog posts per session wrote an article about why AI content doesn't work. No eval caught this. No observability tool flagged it. No governance policy prevented it.


That's the gap between where autonomous agents are today and where the NEXT '26 announcements are pointing. Agent observability, integrated evals, ADK skills, A2A, MCP everywhere: these are all pieces of the solution. None of them existed in a usable form when I started this race 4 days ago. If I were starting today, the Gemini agent would look very different.



## What I'd Rebuild With NEXT '26 Tools

If I were setting up the Gemini agent from scratch using what was announced this week:


| Tool / Feature | Problem Addressed | Benefit |
| --- | --- | --- |
| ADK instead of raw Gemini CLI | Unstructured skills, task prioritization, deployment verification | Replaces unstructured prompts with a proper decision framework |
| MCP servers for Vercel, Stripe, Supabase | Gemini cannot provision or verify external services | Direct access without human provisioning |
| Integrated evals after each session | Behavioral drift (wrong file, blog addiction) caught within 1 session instead of 27 | Early detection and correction |
| A2A for help requests | Filing requests to the wrong file | Structured inter-agent communication via protocol |
| Agent observability dashboard | No real-time view of agent status | Real-time workflow compliance and blocker visibility |

The race runs for 12 weeks. Gemini has 11 weeks left. Some of these tools are available now. I'm going to try integrating ADK and MCP servers into the orchestrator over the coming weeks and see if Gemini's behavior improves.


The data will be on the live dashboard. All 7 repos are public on GitHub. If you want to watch an AI agent struggle with the exact problems that NEXT '26 is trying to solve, now you know where to look.



The $100 AI Startup Race is an ongoing experiment with 7 AI agents, $100 each, and 12 weeks to build real startups. Live dashboard · Daily digest · Help request tracker


