João Graça is a co-founder of Unbabel.
This year marks the 20th anniversary of my career in natural language processing, now commonly known as “AI.” Here I’d like to share some of the key lessons I’ve learned during that time, both as a researcher and as the person responsible for building commercial products.
Lesson 1: Be realistic – use what you already have and build around it
In 2013, I co-founded Unbabel with the idea of becoming the “translation layer of the world” and solving large-scale multilingual translation by combining AI and human effort. By sending machine-translated (MT) chunks to bilinguals for approval or correction via a mobile interface, we aimed to achieve higher quality translations at a lower cost and faster than pure human translation or pure MT.
To achieve this, we needed access to the internals of the translation models, which third-party providers like Google and Microsoft did not allow, so we built our own MT model based on the open-source Moses Statistical MT library that I contributed to during my PhD. It was not as advanced as Google’s, but it was our model.
In the end, we developed our own library from scratch, but Moses helped us go from idea to proof of concept to validation with our first customers, and ultimately led us to become the first Portuguese company accepted into Y Combinator (Winter 2013).
So the first lesson is to be realistic. Leverage existing resources and build around them. Focus on quickly proving product-market fit and develop your own AI only if it is core to the service you offer. There are many tools and resources available for free today, so take advantage of them and differentiate yourself from them.
Lesson 2: Partner with academia to build capacity
After Y Combinator, we needed to move beyond the prototype stage and build a scalable solution. We built dozens of models for different language pairs and customers, but we needed to solve a number of research challenges to stay ahead.
We lacked the resources to do this alone, so we partnered with universities doing related research and proposed big, difficult, bold challenges, rooted in real customer problems, that would attract bright graduate students.
It was a win-win for the students, the university, and us. We expanded our research team by co-supervising master’s and doctoral theses, which led to even larger research projects and created a flywheel effect that produced dozens of papers and product innovations that have underpinned our business ever since.
The lesson here is to partner with academia to build capacity – a mutually beneficial arrangement that fosters success.
Lesson 3: AI is only as good as your customers’ ability to use it
While our growing research team was producing cutting-edge research and presenting at top conferences, these results were not benefiting customers as intended because there was a gap between our product engineering and research teams. Misaligned priorities, working methods, tools, and incentives were the main issues.
Our initial approach was to integrate researchers into cross-functional product teams, but this pressured them to focus on short-term goals and lose sight of the long term. We then split researchers into AI research and applied AI teams, but were left with conflicting priorities, research code that could not be put into production, and disparate infrastructure and datasets.
Since then, we have closed most of the gaps by sharing our roadmap, datasets, codebase, benchmarks, and hardware infrastructure, and by aligning incentives. In fact, our research team’s success metric is the percentage of words translated in production by our models, which allows our product teams to deliver the best quality to their customers.
As Charlie Munger once said, “Show me the incentive and I will show you the outcome.”
Lesson 4: Align AI research with business needs
We initially focused on translating millions of customer service emails. As our MT engines improved, we realized that humans didn’t always need to be involved, but we needed a way to determine when they should be.
Quality estimation (QE), which predicts the effort required to bring machine-translated text up to human translation quality, was an emerging research field and a perfect fit for our needs.
We formed a QE team and used historical data to train a model that exceeded expectations. It has ranked as the best QE system in the world for the past eight years at the annual shared task of the Conference on Machine Translation (WMT).
The lesson here is to align business needs with research opportunities. If you can find this intersection, you will discover cutting-edge value and stay ahead of the curve.
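To make the idea concrete, here is a minimal sketch of how a quality estimate could decide when a human needs to be involved. It is an illustration under stated assumptions, not a description of Unbabel’s actual system: the `Segment` type, the 0-to-1 `predicted_effort` scale, the `EFFORT_THRESHOLD` value, and the example scores are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cutoff: segments predicted to need more editing than this
# are sent to a human editor instead of being delivered directly.
EFFORT_THRESHOLD = 0.2


@dataclass
class Segment:
    source: str
    machine_translation: str
    # Assumed scale: 0.0 (no edits needed) to 1.0 (needs a full rewrite).
    predicted_effort: float


def needs_human_review(segment, threshold=EFFORT_THRESHOLD):
    """Return True when the predicted post-editing effort exceeds the cutoff."""
    return segment.predicted_effort > threshold


def route(segments, threshold=EFFORT_THRESHOLD):
    """Split segments into (deliver_directly, send_to_human) lists."""
    deliver, review = [], []
    for seg in segments:
        if needs_human_review(seg, threshold):
            review.append(seg)
        else:
            deliver.append(seg)
    return deliver, review


# Made-up example scores; a real QE model would produce these.
batch = [
    Segment("Obrigado pela sua mensagem.", "Thank you for your message.", 0.05),
    Segment("O reembolso foi processado.", "The refund was processing.", 0.45),
]
deliver, review = route(batch)
```

In this sketch, the first segment would be delivered as-is and the second would go to a human editor. Where the threshold sits is a business decision: lowering it sends more text to humans and raises cost, while raising it ships more unreviewed machine translation.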
Lesson 5: You can always stand out from the big tech companies
After ChatGPT was released, the biggest question I had to answer was what would become of Unbabel: “Can’t we just do it all with LLMs and prompts?”
This question makes me laugh because it’s not that different from the question people asked us when we started the company in 2013: “Hasn’t Google Translate already solved this problem?”
The answer to both questions is no. Building a production system to solve a specific customer problem requires more than meets the eye. Over the course of a decade, we’ve realized there are three ways we can differentiate ourselves from the tech giants:
1. Cost: Cutting-edge models will soon be commoditized, so you can take advantage of them for free and focus on creating specialized, profitable value.
2. Quality: By focusing on your specific use case and customer data set, you can fine-tune your models to a level that no one else can. Your own data and models provide higher quality than a one-size-fits-all platform.
3. Privacy and Security: Many companies aren’t interested in handing over their sensitive data to tech giants. If you can help your customers protect their data and maximize its value, you could build an incredibly successful business.
We often hear “AI has no moat,” yet these are key differentiators for building a world-leading business.
Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.