AI毕业生就业选多模态?2026年技术护城河与薪酬深度分析
For AI graduates seeking employment, the author strongly recommends focusing on multimodal AI, arguing it offers the highest long-term value, deepest technical moats, and best compensation potential compared to pure LLM or general AIGC application roles. The analysis is based on firsthand hiring experience and current market trends.
原文翻译: 对于寻求就业的AI毕业生,作者强烈建议专注于多模态AI,认为与纯语言大模型或通用AIGC应用岗位相比,多模态方向提供了最高的长期价值、最深的技术护城河和最佳的薪酬潜力。该分析基于一线招聘经验和当前市场趋势。
985 Master's AI Career Crossroads: LLM, Multimodal, or AIGC – How to Choose?
引言:一个时代的焦虑
Introduction: The Anxiety of an Era
这问题绝对是今年计算机应届生最焦虑的问题,没有之一。我这几年面过的人没有一千也有八百,从海外大厂回来的博士到你这样的985硕士,聊得多了,有些话也就憋不住了,今天就发帖子聊聊。
This question is undoubtedly the most anxiety-inducing one for computer science graduates this year, bar none. Over the years, I've interviewed somewhere between eight hundred and a thousand people, from PhDs returning from overseas tech giants to 985 Master's graduates like yourself. After that many conversations, there are some things I can no longer hold back, so I'm sharing my thoughts in this post.
我不跟你扯那些虚头巴脑的行业报告,什么市场规模、增长率,那些东西对你找工作没半毛钱关系。我就从一个在一线带队、看简历、拍板给offer的人的角度,跟你盘盘这几条路。
I'm not going to bore you with fluffy industry reports about market size and growth rates—those have zero relevance to your job search. I'll break down these career paths from the perspective of someone who leads teams on the front lines, reviews resumes, and makes final hiring decisions.
时间坐标:2025年8月。 这个时间点很重要,因为技术风向标变得太快了,去年的答案今年可能就是个坑。
Time Coordinate: August 2025. This timestamp is crucial because technology trends shift so rapidly; last year's answer might be a pitfall this year.
核心结论:All in 多模态
Core Conclusion: All in on Multimodal
先摆结论,不卖关子:all in 多模态。
Let's get straight to the point without any suspense: all in on multimodal.
如果你的目标是就业,特别是找一份有长期价值、不容易被替代、薪资天花板还高的算法岗,就别犹豫,头也不回地扎进多模态。
If your goal is employment, especially landing an algorithm position with long-term value, low replaceability, and a high salary ceiling, don't hesitate—dive headfirst into multimodal without looking back.
为什么?我把我的逻辑掰开揉碎了讲给你听,你听完自己品。
Why? Let me dissect and explain my reasoning in detail. You can judge for yourself after listening.
三大方向现状深度剖析
In-Depth Analysis of the Current State of the Three Major Directions
1. 纯语言大模型 (LLM)
- Pure Language Large Models (LLM)
LLM现在啥情况?四个字:基建化、工程化。
What's the current state of LLMs? In short: it has become infrastructure and an engineering discipline.
你得明白,2025年的今天,从零开始训一个SOTA级别的基座模型,这事儿已经不是一个普通公司,甚至不是一个普通大厂能玩得起的游戏了。这是巨头的战争,是算力、数据和顶尖人才的无差别火力覆盖。OpenAI、Google、Meta,国内的几家头部,牌桌上就这么几个玩家了。
You need to understand that in 2025, training a SOTA-level foundational model from scratch is no longer a game that an ordinary company, or even an ordinary tech giant, can afford to play. This is a war among titans, a blanket coverage of computing power, data, and top-tier talent. OpenAI, Google, Meta, and a few domestic leaders—there are only a handful of players at the table.
所以,对于一个应届生,你进去能干嘛?大概率不是去设计新的Transformer架构,而是去做模型的“下游工作”。具体点:
So, what can a fresh graduate do if they enter this field? Most likely, you won't be designing new Transformer architectures; you'll be working on the "downstream tasks" of the model. To be specific:
- Fine-tuning (微调): Taking pre-trained foundational models from others and fine-tuning them with industry-specific data to solve particular business problems. There's technical depth here, but it's increasingly becoming a skilled trade.
- RAG (检索增强生成): This was all the rage last year and is now basically standard. How to create good embeddings, optimize retrieval, reduce hallucinations—there's a lot of engineering know-how here, but the space for algorithmic innovation is shrinking.
- Agent Development (Agent开发): Building various intelligent agents based on LLMs sounds cool, but it's essentially prompt engineering + tool using + a bit of planning. The core is still "using" the model, not "creating" it.
- Model Compression, Quantization, Deployment (模型压缩、量化、部署): These positions are solid, with stable demand, but lean more towards Model Engineering or MLOps, moving further away from core algorithms.
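The RAG bullet above can be made concrete in a few lines. This is a toy sketch of the retrieve-then-prompt pattern, not a production pipeline: `embed` here is a hashed bag-of-words stand-in for a real embedding model, and the corpus and query are invented for illustration.

```python
import hashlib
import re

import numpy as np

def embed(text, dim=64):
    """Toy embedding: a hashed bag-of-words vector.
    A stand-in for a real embedding model (e.g. a sentence encoder)."""
    v = np.zeros(dim)
    for tok in re.findall(r"[a-z]+", text.lower()):
        v[int(hashlib.md5(tok.encode()).hexdigest(), 16) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def retrieve(query, corpus, k=2):
    """Return the k corpus documents most similar to the query (cosine)."""
    q = embed(query)
    scores = [float(q @ embed(doc)) for doc in corpus]
    top = np.argsort(scores)[::-1][:k]
    return [corpus[i] for i in top]

corpus = [
    "RAG augments an LLM prompt with retrieved documents.",
    "Quantization shrinks model weights to lower precision.",
    "Agents call external tools based on LLM planning.",
]
docs = retrieve("how are retrieved documents added to the llm prompt", corpus, k=1)
# The retrieved text is then pasted into the prompt sent to the LLM:
prompt = f"Answer using this context:\n{docs[0]}\n\nQuestion: ..."
```

The engineering know-how mentioned above lives in exactly these two steps: better embeddings and better retrieval (chunking, reranking, hybrid search) on one side, and prompt assembly on the other.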
你看,纯LLM方向的算法岗,正在快速分化。一小撮人在头部公司的核心团队里继续搞模型结构、预训练算法的创新,这部分人凤毛麟角,门槛高得离谱。而大部分所谓的“LLM算法岗”,正在变得越来越“应用”,越来越“工程”。
You see, algorithm positions in pure LLM are rapidly bifurcating. A tiny fraction of people in core teams at leading companies continue to innovate on model architecture and pre-training algorithms—these individuals are rare gems with astronomically high barriers to entry. Meanwhile, the majority of so-called "LLM algorithm positions" are becoming increasingly "applied" and "engineering-focused."
薪资上,LLM岗位的下限很高,应届生拿个大白菜、sp不成问题,但天花板…说实话,有点被锁死了。因为你创造的价值,很大程度上依赖于你所使用的那个基座模型,你的“杠杆”不够长。
In terms of salary, the floor for LLM positions is high; fresh graduates can easily get a standard or special offer package. But the ceiling... frankly, is somewhat capped. Because the value you create largely depends on the foundational model you're using; your "leverage" isn't long enough.
2. 生成式模型 (AIGC)
- Generative Models (AIGC)
我得先澄清一下,AIGC这个词太宽泛了,它几乎把前面两个都包进去了。但从业内招聘的角度看,当我们特指“AIGC”方向时,通常更偏向产品和应用落地。
I need to clarify first: the term AIGC is too broad; it almost encompasses the previous two. But from an industry recruitment perspective, when we specifically refer to the "AIGC" direction, it usually leans more towards product and application implementation.
比如,你去做一个AI生成PPT的应用,一个AI生成广告视频的平台,或者一个AI辅助编程的工具。
For example, you might work on an AI-powered PPT generation app, a platform for AI-generated advertising videos, or an AI-assisted programming tool.
在这些公司里,岗位分得更细。可能有一个小团队负责维护和优化模型,但更多的人是“应用算法工程师”或者干脆就是“后端工程师”,他们的工作是把模型的能力封装成API,嵌入到业务流程里,去打磨产品体验。
In these companies, roles are more specialized. There might be a small team responsible for maintaining and optimizing the model, but more people are "Applied Algorithm Engineers" or simply "Backend Engineers." Their job is to encapsulate the model's capabilities into APIs, integrate them into business workflows, and refine the product experience.
这个方向好不好?好,离钱近,业务导向强,能快速看到自己的工作成果。但对你这种想做核心算法的人来说,可能有点“降维”。你可能会花大量时间在处理业务逻辑、数据清洗、API调试上,而不是模型本身。
Is this direction good? Yes, it's close to revenue, strongly business-oriented, and you can quickly see the results of your work. But for someone like you who wants to work on core algorithms, it might feel like a step down in technical depth. You might spend a lot of time dealing with business logic, data cleaning, and API debugging, rather than the model itself.
AIGC方向的薪资,方差很大。在一个成功的商业化产品里,核心成员的收入(薪资+期权)可能非常高。但在一个还没找到盈利模式的创业公司里,可能还不如去大厂拧螺丝。
Salary variance in the AIGC direction is huge. In a successful commercialized product, core members' compensation (salary + options) can be very high. But in a startup that hasn't found a profitable model, it might not even compare to being a cog in a big tech machine.
3. 多模态大模型
- Multimodal Large Models
好了,说到重点了。多模态,这才是现在真正的蛮荒之地。
Alright, now we get to the main point. Multimodal—this is the true frontier right now.
为什么我这么笃定?
Why am I so certain?
第一,它是通往AGI的必经之路,是真正的技术前沿。 世界是多模态的,人类的智能也是多模态的。我们看、听、说,同时处理图像、声音、文字信息。纯文本的LLM虽然强大,也只是“缸中之脑”,它理解不了这个真实的世界。从文生图(DALL-E, Midjourney)到文生视频(Sora),再到未来的物理世界交互(机器人、自动驾驶),核心技术突破都必然发生在多模态领域。
First, it's the inevitable path to AGI and the true technological frontier. The world is multimodal, and human intelligence is multimodal. We see, hear, speak, and process image, sound, and text information simultaneously. Although powerful, pure-text LLMs are just "brains in a vat"; they cannot understand this real world. From text-to-image (DALL-E, Midjourney) to text-to-video (Sora), to future physical world interaction (robotics, autonomous driving), core technological breakthroughs will inevitably occur in the multimodal domain.
这意味着什么?意味着这里有大量的、根本性的问题还没有被解决。数据对齐、跨模态表征、长视频生成的一致性、3D世界的理解与生成…每一个都是大金矿,都可能诞生出伟大的公司和技术。
What does this mean? It means there are a vast number of fundamental problems yet to be solved here. Data alignment, cross-modal representation, consistency in long video generation, understanding and generation of the 3D world... Each is a goldmine, potentially giving birth to great companies and technologies.
在这个领域,你不是一个“使用者”,你更有可能成为一个“创造者”。你做的东西,不是对现有工作流的优化,而是创造全新的可能性。
In this field, you're not just a "user"; you're more likely to be a "creator." What you build isn't just an optimization of existing workflows; it's creating entirely new possibilities.
第二,技术壁垒高,护城河深。 搞多模态,你不仅要懂NLP,还得懂CV,甚至可能要懂图形学、语音处理。这个知识栈的要求,天然就筛掉了一大批人。LLM的很多知识,看看博客、刷刷论文、跑跑开源代码,似乎很快就能上手。但要真正理解Diffusion Model的数学原理,或者搞懂NeRF(神经辐射场)这种东西,没下苦功夫是不行的。
Second, the technical barriers are high, and the moat is deep. To work on multimodal, you need to understand not only NLP but also CV, and possibly even graphics and speech processing. This knowledge stack requirement naturally filters out a large number of people. A lot of LLM knowledge can seem accessible quickly by reading blogs, skimming papers, and running open-source code. But to truly understand the mathematical principles of Diffusion Models or grasp something like NeRF (Neural Radiance Fields) requires serious effort.
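For concreteness, the "mathematical principles of Diffusion Models" mentioned above boil down to a handful of equations. This is the textbook DDPM formulation, stated here for reference:

```latex
% Forward (noising) process: a fixed Markov chain with variance schedule \beta_t
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right)

% which admits a closed form, with \bar{\alpha}_t = \prod_{s=1}^{t}(1-\beta_s):
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\mathbf{I}\right)

% The network \epsilon_\theta is trained to predict the noise that was added:
\mathcal{L} = \mathbb{E}_{x_0,\,\epsilon,\,t}
  \left[\left\lVert \epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\,x_0
  + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ t\right)\right\rVert^{2}\right]
```

Being able to state and manipulate these three lines from memory is roughly the bar that separates "ran the repo" from "understands the model."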
我之前团队里一个做CV的小伙,在大家一窝蜂转LLM的时候,他坐得住冷板凳,一头扎进了3D生成。当时我们都觉得这方向太窄,不好找工作。结果今年,Sora出来之后,所有大厂都在布局视频和3D生成,他手里的offer拿到手软,给的package比同级别的LLM岗高了至少30%。人家面试官问的都是底层细节,什么DiT架构、视频压缩网络,他都能对答如流。而很多搞LLM应用的同学,面试官问到Transformer的底层优化,就有点支支吾吾了。
A young CV engineer on my former team kept his head down in an unfashionable corner, diving deep into 3D generation while everyone else flocked to LLMs. At the time, we all thought the direction was too niche and would make job hunting difficult. Then this year, after Sora's release, every major tech company started investing in video and 3D generation, and offers poured in for him, with packages at least 30% higher than comparable LLM positions. Interviewers asked him about low-level details like the DiT architecture and video compression networks, and he answered fluently. Meanwhile, many candidates working on LLM applications started stumbling when interviewers asked about low-level Transformer optimizations.
这就是壁垒。当潮水褪去,那些只会在岸边捡贝壳的人就尴尬了,而那些学会了深潜的人,才能拿到真正的宝藏。
This is the barrier. When the tide goes out, those who only know how to pick up shells on the shore are left embarrassed, while those who learned to dive deep can obtain the real treasure.
第三,岗位需求正在爆发,但合格的人才供给严重不足。 现在打开招聘软件看看,搜“多模态算法”,你会发现很多岗位要求都非常高,而且薪资范围也给得非常宽。这说明什么?说明用人单位自己都清楚,这个方向的人不好招。他们愿意为真正懂技术的人才付出高昂的溢价。
Third, job demand is exploding, but the supply of qualified talent is severely insufficient. Open a recruitment app now and search for "multimodal algorithm." You'll find many positions have very high requirements and offer very wide salary ranges. What does this mean? It means employers themselves know that people in this direction are hard to find. They are willing to pay a high premium for truly technically proficient talent.
LLM方向呢?海量的求职者,大家都会用LangChain,都会做RAG,简历看起来都差不多,那凭什么给你高薪?只能卷项目、卷实习、卷学历。
What about the LLM direction? A massive number of job seekers, everyone knows how to use LangChain, everyone has done RAG, resumes all look similar—so why give you a high salary? You can only compete on projects, internships, and academic pedigree.
给求职者的务实建议
Practical Advice for Job Seekers
如果你想清楚了要搞多模态,下面这几条你听清楚,不是什么人生哲理,就是几条能让你少走弯路的“土办法”。
If you've decided to pursue multimodal, listen carefully to the following points. They're not life philosophies, just a few "practical methods" to help you avoid detours.
第一,别把看论文当学习,那顶多算“信息录入”。 看论文这事儿最容易自欺欺人。收藏夹里存个几百篇,感觉自己懂得挺多,面试官一问细节,支支吾吾。这没用。你得找个东西“刻”在脑子里。
First, don't mistake reading papers for learning; that's at best "information input." Reading papers is the easiest way to fool yourself. Saving hundreds of papers in your bookmarks makes you feel knowledgeable, but when an interviewer asks for details, you hesitate. That's useless. You need to "engrave" something in your mind.
怎么刻?你甭管别的,就盯住一个方向,比如现在最火的视频生成。把Sora的技术报告(假如它细节公布了的话)或者相关的开源实现,比如Open-Sora、Latte这种,给我当成你毕业设计的代码那么去读。
How to engrave it? Forget everything else and focus on one direction, like the currently hottest topic: video generation. Take Sora's technical report (if its details are ever released) or related open-source implementations such as Open-Sora or Latte, and study them as if they were the code for your own graduation project.
你得搞清楚:
- 人家的数据是怎么洗的?怎么切成patch的?为什么这么切?
- 那个DiT(Diffusion Transformer)到底是怎么把文本条件加进去的?代码里哪几行是干这个的?
- 跑起来,你肯定会遇到各种坑,显存爆炸、loss是NaN、生成的东西一坨屎。你就去debug,去一行一行地看,去Github issue里跟人撕逼讨论。这个过程,比你看一百篇论文的摘要都有用。
You need to figure out:
- How was their data cleaned? How were patches cut? Why cut them that way?
- How exactly does that DiT (Diffusion Transformer) incorporate text conditions? Which lines of code do that?
- Run it. You will definitely encounter various pitfalls: GPU memory explosions, NaN losses, generated outputs that are garbage. Go debug, look at it line by line, argue and discuss in Github issues. This process is more useful than reading the abstracts of a hundred papers.
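To make the "how are patches cut" and "how does the condition get injected" questions above concrete, here is a minimal numpy sketch of the two ideas. It is illustrative, not taken from any specific codebase: `patchify` mirrors the usual ViT/DiT-style patch decomposition (without the learned projection), and `ada_ln` mimics DiT's adaLN conditioning, where the condition vector regresses a per-channel scale and shift; all shapes and names are made up for the example.

```python
import numpy as np

def patchify(frames, p=4):
    """Split a (T, H, W, C) clip into flattened patch tokens:
    each p x p spatial patch of each frame becomes one token."""
    T, H, W, C = frames.shape
    x = frames.reshape(T, H // p, p, W // p, p, C)
    x = x.transpose(0, 1, 3, 2, 4, 5)                     # (T, H/p, W/p, p, p, C)
    return x.reshape(T * (H // p) * (W // p), p * p * C)  # (num_tokens, token_dim)

def ada_ln(tokens, cond, W_mod):
    """adaLN-style conditioning: the condition vector (text/timestep embedding)
    regresses a per-channel scale and shift applied after normalization."""
    mu = tokens.mean(-1, keepdims=True)
    sd = tokens.std(-1, keepdims=True) + 1e-6
    normed = (tokens - mu) / sd
    scale, shift = np.split(cond @ W_mod, 2, axis=-1)     # (1, 2*dim) -> 2 x (1, dim)
    return normed * (1 + scale) + shift

rng = np.random.default_rng(0)
frames = rng.standard_normal((8, 16, 16, 3))   # a tiny 8-frame "video"
tokens = patchify(frames)                      # shape (8*4*4, 4*4*3) = (128, 48)
cond = rng.standard_normal((1, 32))            # pooled text/timestep embedding
W_mod = np.zeros((32, 2 * tokens.shape[1]))    # zero init: modulation starts as identity
out = ada_ln(tokens, cond, W_mod)
```

The zero-initialized modulation weights echo the adaLN-Zero trick from the DiT paper: conditioning starts as a no-op and is learned gradually. Being able to point at the equivalent lines in a real repo is exactly the kind of detail interviewers probe.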
你得自己动手把一个东西从头到尾跑通,甚至魔改一下。没卡就去租,现在AutoDL、恒源云这种平台一天几十块钱就能搞个不错的卡,这顿饭钱你必须得花,不然你简历上那个“熟悉xxx模型”就是一句空话,一戳就破。
You need to run something end to end yourself, and even hack on it. If you don't have a GPU, rent one: platforms like AutoDL or Hengyuan Cloud offer a decent card for a few dozen RMB a day. That's the price of a meal, and it's money you have to spend; otherwise the "familiar with xxx model" line on your resume is empty talk that collapses at the first probing question.
第二,数学和基础别丢。各种新奇的开源项目是好看,但不扎实的数学就像空心楼梯,踩几步就塌。 很多人把公式挂嘴边,显得自己很牛。其实大部分时候,我们根本不需要从头推导一个什么玩意儿。那为啥还要啃数学?
Second, don't neglect mathematics and fundamentals. All sorts of novel open-source projects look impressive, but shaky math is like a hollow staircase—it collapses after a few steps. Many people throw formulas around to seem impressive. In reality, most of the time, we don't need to derive something from scratch. So why bother with math?
是为了让你在模型出问题的时候,能有方向地去猜,而不是抓瞎。
It's to give you a direction to guess when the model has problems, rather than being clueless.
举个例子,你训个diffusion model,结果生成出来的全是纯纯的噪声,半点图像的影子都没有。这时候你怎么办?如果你懂那个加噪去噪的数学过程,你至少能怀疑几个点:是不是我的time embedding出问题了?是不是U-Net的结构没把噪声和条件信息给解耦开?是不是我用的scheduler在推理的时候步子迈得太大了?
For example, you train a diffusion model, and the generated output is pure noise with no trace of an image. What do you do then? If you understand the mathematics of the noising and denoising process, you can at least form targeted hypotheses: is my time embedding broken? Is the U-Net failing to decouple the noise from the conditioning information? Is the scheduler I'm using taking steps that are too large at inference time?
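That style of reasoning — working from the noising process instead of guessing blindly — can be made concrete. Here is a minimal numpy sketch of the DDPM closed-form forward process with the common linear beta schedule (schedule values are the usual defaults, not from any particular codebase):

```python
import numpy as np

# DDPM closed-form forward (noising) process:
#   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I)
T = 1000
betas = np.linspace(1e-4, 0.02, T)      # the common linear schedule
alpha_bar = np.cumprod(1.0 - betas)     # cumulative signal-retention factor

def q_sample(x0, t, eps):
    """Jump straight from clean data x0 to noise level t."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

# Debugging aid: the per-step signal-to-noise ratio. If a buggy time embedding
# makes every t the model sees effectively near T (SNR ~ 0), the model only
# ever learns to map noise to noise -- and sampling yields pure noise.
snr = alpha_bar / (1.0 - alpha_bar)
x0 = np.ones((4, 4))
xt = q_sample(x0, 999, np.random.randn(4, 4))   # nearly indistinguishable from noise
```

Printing `snr` at the timesteps your training loop actually samples is a two-minute check that rules one hypothesis in or out, which is precisely the advantage the math gives you.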
常见问题(FAQ)
Frequently Asked Questions (FAQ)
AI毕业生找工作,为什么作者强烈推荐多模态方向而不是纯LLM?
Why does the author strongly recommend the multimodal direction over pure LLM for AI graduates seeking jobs?
作者基于一线招聘经验指出,纯LLM岗位已高度工程化,多数工作集中于微调、RAG等下游应用,创新空间有限。而多模态方向技术护城河更深,长期价值和薪酬潜力更高。
Based on firsthand hiring experience, the author points out that pure LLM positions have become heavily engineering-oriented, with most work concentrated on downstream applications such as fine-tuning and RAG, leaving limited room for innovation. The multimodal direction, by contrast, has a deeper technical moat and higher long-term value and compensation potential.
多模态AI相比AIGC应用岗位有哪些具体优势?
What specific advantages does multimodal AI have over AIGC application roles?
多模态AI涉及跨模态理解与生成的核心算法创新,技术壁垒更高,不易被替代。而AIGC应用岗位更多是工具使用和工程实现,算法深度较浅,长期竞争力不足。
Multimodal AI involves core algorithmic innovation in cross-modal understanding and generation, with higher technical barriers and lower replaceability. AIGC application roles lean more towards tool usage and engineering implementation, with shallower algorithmic depth and weaker long-term competitiveness.
对于2025年求职的AI硕士,选择多模态方向最实际的理由是什么?
For an AI Master's graduate job hunting in 2025, what is the most practical reason to choose the multimodal direction?
多模态是当前技术前沿,头部企业集中投入,能提供核心算法研发机会。相比纯LLM的工程化趋势和AIGC的应用层竞争,多模态兼具高成长性、高薪资天花板和强不可替代性。
Multimodal is the current technological frontier, with leading companies investing heavily and offering opportunities for core algorithm R&D. Compared with the engineering drift of pure LLM work and the application-layer competition in AIGC, multimodal combines high growth potential, a high salary ceiling, and strong irreplaceability.