It’s year 2024–––only three years since LLMs become (somewhat) general-purpose and can follow instructions. Who would have guessed how much AI has shaken the whole world to the point that Nobel Prize in Physics was awarded for “laying the foundation for today’s powerful machine learning”, and Nobel Prize in Chemistry was awarded for “using AI to predict proteins’ complex structures”?
It is kind of surreal for me being part of this revolution. In this post, I want to share some of my observations.
Personal Background (Feel free to skip this)
Before I started my PhD, I was working on semantics. I was not really satisfied with my work, because in the back of my mind, I couldn’t see how my work that leads to linguistic-driven solutions1 can scale up and solve any NLP problems in any language.2 Note that linguistic tasks also often don’t come with enough data, so training models on linguistic data does not give any significant performance gains–––in today’s terms, they don’t pass the vibe check.
Anyone who transitioned from computational linguistics into LLMs scaling during the time (around year 2020-2022) must have come across the Stochastic Parrot paper and the hype around linguistic-driven language modeling. At that time, I was really confused on whether I should work with LLMs. Many linguists were saying that “language models are simply regurgitating what they are trained on”, that “next-token prediction cannot learn the language (and therefore cannot do incredible things like humans do)“.
Ultimately, I decided to lean fully into LLMs and believe the magic of scaling. My decision eventually comes down to “if AlexNet can replace hand-engineered kernels, large language models can do the same”. You might think it is a no-brainer from today’s point of view3, but at that time, I was really conflicted when I made this conscious decision.
Observation 1: You will have to work on hyped stuff anyway, if the hype is in the right direction.
What is hyped stuff? Those stuff that you know can be easily classified as one of the viral topics on X that people cannot stop talking about. The cornerstone paper on the topic had blown up and gained thousands of citations in less than a year (think about T0/Flan for instruction-following, GPT-3 for in-context learning, chain-of-thought for reasoning, etc.)
Voices about on hyped stuff are often loud for a period of time, and things move really fast and are competitive. Hyped stuff usually catches attention of people who have never worked on it directly before, and yet they turned into huge proponents for it in academic social circle after a short while.
LLMs were hyped stuff in 2021 and are commodity now. Everyone now works with LLMs to some degree, regardless of how much they believed in LLMs in pre-ChatGPT era. I’d say now there are two groups of people: (1) those who are excited to work on it (usually early adopters or (2) those who begrudgingly have to work on it (because of reviewers #2 and where funding comes from).
Info
This is my bitter observation. If the hype train is right, you will have to directly join it anyway if you want to stay relevant in AI research. So why not join early? Hence, study the trend, think two steps ahead, and iterate fast (Omar Khattab has really good advice on this) in the direction of hype train (that you think is the right direction).
Observation 2: Hype train is never crowded. There’s always room for more research.
One common argument against working on hyped stuff is that it is very easy to get scooped. Yes that is inevitable. My multilingual toxicity work (where we are trying to collect multilingual prompts that elicit toxicity) got scooped by Microsoft and AllenAi–––they released their work around the same time, and we were only halfway to completion so we had to halt our project entirely.
But I think people who complain about being scooped and hence shy away from hyped stuff miss the big picture. Hyped stuff is nascent by nature. There are exponentially more things remain unexplored and you can always find another research problem to pivot into.
In my case, we stopped our project but the project had enabled me to fully understand what problems haven’t been studied and what resources are available to research on, so I quickly bang out one paper in a month, which got into *ACL findings.
Observation 3: You cannot really predict whether a low-hanging fruit is really low-hanging.
In my opinion, in a well-studied area (usually opposite of hyped stuff), low-hanging fruit has capped value because high-value research problems have mostly been figured out. But in hyped area where many things are unexplored–––many people called the obvious next steps as low-hanging fruits–––there’s really no low-hanging fruit because doing those grunt work and promoting your findings in reality allow people to immediately explore the next bigger thing.
An analogy here is that you are being on an literally unchartered territory. Any direction that is well-motivated and well-reasoned, even if it seems to be the obvious low-hanging fruit to work on, is intrinsically valuable because now others know where to steer into. The most important aspect here is to move fast and communicate well.
For instance, multilingual jailbreaks are in my opinion low-hanging fruits in 2023–––somebody eventually gotta investigate4 if these safety guardrails hold up in non-English settings because more than half of the global population speaks languages aside English. This paper simply translated malicious prompts with Google Translate into non-English to test the robustness of GPT-4’s guardrails. The method was so stupidly simple, and yet the implications for global AI users were significant because the safeguards are not robust across languages.
Info
There are two takeaways here:
- Do the obvious next steps, because what’s obvious to you is not necessarily obvious to the others. Then, invest time in communicating the significance of the work. Make sure the people who will build on your work see your work, so the field can move to the next (and hopefully less obvious) research questions.5
- Even hyped research area works super slow on cutting edge problems. Multilingual jailbreak papers came out around 8 months after jailbreaking become popularized.
Observation 4: You might come across naysayers who put you down (to the point you feel personal.)
Sometimes you might come across people who make you feel like what you work on is meaningless. I personally came across one highly-regarded researcher at NeurIPS 2023 who told me AI safety redteaming is nonsense work6 when I asked for research career advice. On that day, I felt like shit, and it was really hard not to take it personally, especially because that conversation took place in front of a big group of people.
I’ve moved on since then because I’ve gotten enough positive feedback that what I work on is useful and more people/industry labs are investing in AI safety. My biggest takeaway is that when you work on hyped stuff, you have significantly higher chances in encountering negative feedback and strong pushbacks because working on hyped stuff requires you to be both an early adopter and an active promoter of your work so as to feed the hype.
You encounter naysayers because you work in an extremely visible area. People will form extreme opinions and point it out to you (sometimes impolitely) because (1) either you are objectively wrong (well, not all hyped stuff will work) or (2) they didn’t see the same value you see in your pursuits. Often, you see this manifests as conversations spiraling into drama on social media. In either case, you need to develop mental fortitude and remember that nobody truly knows what works or fundamentally valuable (or we would have scaled language models much earlier).
Sometimes, people just don’t get it no matter how hard you try explaining the significance of your research direction. It is okay. AI research is to a great extent empirical in nature, so let results do the talk.
Info
Don’t take things too personally. If you happen to hop on the wrong hype train, learn from your mistakes and see where your reasoning went wrong. If you are right, figure out where the naysayers came from and learn from their mistakes.
Footnotes
-
I refer to “linguistic-driven” as in manually baking in linguistic rules (aka injecting explicit inductive bias) into language models. ↩
-
I firmly believe that for AI to be truly useful–––or to achieve AGI/ASI–––it shouldn’t have any language barrier. Advancement in technology in a small subset of langauge leads to socioeconomic divide at a global scale. ↩
-
Felix’s notes: “If it’s possible for a human domain expert to discover from the data the basis of a useful inductive bias, it should be obvious for your model to learn about it too, so no need for that inductive bias.” ↩
-
During the same period, there were three other papers released and submitted to ICLR 2024 about the same multilingual jailbreaks [1, 2, 3] ↩
-
(Footnote written on Oct 11th, 2024) I am not being crazy here. Just found out today that Terrence Tao made the same comments about what one should do (although with different motivation): “spend most of your time on more feasible “low-hanging fruit”. ↩
-
That’s what I recall from my fuzzy memory. Probably not verbatim but definitely along the line. ↩