Clarify: Designing User<>AI Collaboration at DeepL
Company:
DeepL
Role:
Senior Product Designer
Key responsibilities:
Model-User interaction design
Product design & design strategy
Experiment design and releases
Synopsis:
Clarify is an interactive feature in DeepL Translator that acts like a “language-expert assistant.” Rather than simply producing a translation, Clarify detects ambiguous or context-sensitive parts of the text and asks the user to clarify them.
Problem space
Every translation makes assumptions. Traditional machine translation lacks the user input needed for precision, adaptability, and contextual understanding. Ambiguities such as gender, idioms, or specialised terms can result in confusing or misleading translations, especially for non-expert users.
Opportunities
Machine translation makes assumptions
Users rely on machine translation, but ambiguous phrases or words with multiple meanings often go unnoticed, which can lead to severe miscommunication
Users usually need high proficiency in the target language to refine the translation
Without expert knowledge of the target language, users struggle to judge the quality of the translation, resulting in a lack of confidence in using it
Editing translations can be costly and cumbersome
Manually adjusting translations or rewriting phrases to avoid ambiguity adds extra steps, making the experience frustrating, inefficient, and costly, especially in business settings
Users in specialised fields lack terminology support
Translating industry-specific terms, such as those in the legal, medical, or technical fields, requires professionals to cross-check and correct translations, increasing cost and time for businesses
The solution
Clarify acts as an expert assistant that asks users for more context. The AI detects ambiguous or unclear parts of the user's input text (e.g. phrases with multiple meanings, idioms, dates, gendered terms, specialised vocabulary), highlights them, and asks for more context. Based on the user's answers, the model refines the translation so it better reflects the intended meaning.
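As a rough illustration only, the detect, ask, and refine loop described above could be modelled along these lines. This is a minimal sketch, not DeepL's implementation: the `Ambiguity` structure, the `TOY_RULES` lookup (a stand-in for the model's detection step), and the `detect_ambiguities` and `build_context` helpers are all hypothetical names invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Ambiguity:
    """One highlighted span plus the clarifying question shown to the user."""
    start: int
    end: int
    category: str
    question: str
    options: list[str]


# Hypothetical lookup standing in for the model's ambiguity detection.
TOY_RULES = {
    "bank": ("multiple meanings", "Which meaning of 'bank' do you intend?",
             ["financial institution", "river bank"]),
    "doctor": ("gender", "What is the doctor's gender?",
               ["female", "male", "unspecified"]),
}


def detect_ambiguities(text: str) -> list[Ambiguity]:
    """Return the first occurrence of each known ambiguous term, in text order."""
    found = []
    lowered = text.lower()
    for term, (category, question, options) in TOY_RULES.items():
        start = lowered.find(term)
        if start != -1:
            found.append(
                Ambiguity(start, start + len(term), category, question, options)
            )
    return sorted(found, key=lambda a: a.start)


def build_context(text: str, ambiguities: list[Ambiguity],
                  answers: dict[int, str]) -> list[str]:
    """Turn answers (keyed by ambiguity index) into context notes.

    Unanswered questions are skipped, so the user is never forced to reply.
    """
    notes = []
    for index, ambiguity in enumerate(ambiguities):
        choice = answers.get(index)
        if choice is None:
            continue
        if choice not in ambiguity.options:
            raise ValueError(f"Unknown option: {choice!r}")
        span = text[ambiguity.start:ambiguity.end]
        notes.append(f"{span}: {choice}")
    return notes


source = "The doctor walked to the bank."
spans = detect_ambiguities(source)
context_notes = build_context(source, spans, {0: "female", 1: "river bank"})
```

In this sketch the context notes would accompany a re-translation request. Keeping every answer optional reflects the goal of a non-disruptive flow: the user can ignore a question and still get a translation.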
Impact
Improved translation reliability for enterprise customers with multiple reports of increased confidence in the translation
Enhanced DeepL’s value proposition by offering a differentiated interactive translation experience that competitors lack
Provided a scalable UX model for disambiguation workflows across future language pairs
Strengthened collaboration between design, linguistics, and ML teams, creating a repeatable framework for evaluating ambiguity in translation
How we did it
Signal scanning
By scanning customer support tickets and interview notes, we found that no matter how accurate a machine translation is, it will always lack contextual clues that only human users can provide. These insights established the need for a system that involves users in clarifying intent, rather than relying solely on AI guesses.
Conceptualise
Since the model could detect any kind of ambiguity, the challenge was to create a system that matched how users view potential mistakes in their translations. At this stage, I focused on analysing and testing the patterns in translation errors.
Some of the key categories are:
Gender
Idioms
Formatting
Culturally-specific terms
Prototyping & user testing
I developed early prototypes and conducted customer interviews to collect initial feedback. I shared the learnings with leadership stakeholders and AI scientists so the model could be improved to match users' mental models.
Internal release
Before a public release, we conducted an internal launch to gather insights on:
Usability: Was the feature intuitive and non-disruptive?
Value: Did Clarify improve translation accuracy and was the effort required justifiable?
Scalability: Could the AI model efficiently handle a range of clarifications without overloading users?
As in the previous stage, I synthesised the feedback from potential users and shared it with ML researchers to improve the AI model so that it best matched users' expectations.
Experiment design & release
We identified key success metrics and created an experimentation plan to measure the impact of the launch.
The experiment was designed to track both quantitative metrics, such as engagement rates, and qualitative feedback, using a survey to assess how intuitive and helpful users found the feature.
First release
Next steps
Monitor post-launch metrics & user feedback – Continuously track engagement, error rates, and qualitative user feedback to identify areas for improvement
Collaborate with ML scientists to optimise AI models – Refine the AI’s ability to detect ambiguity and improve contextual recommendations
Iterate on UI/UX for a more effective and time-saving user flow – Address friction points to make the interactions more intuitive and better integrated into workflows
Scale to a broader user base – Gradually expand availability to more users and more languages
Second iteration
By the time of the second release, the web translator had undergone a few significant changes. The changes affected not only the interface but also multiple underlying user flows, including Clarify.
Clarify changed the way we approached product design at DeepL. It paved the way for future AI-driven solutions that could solve user problems even more efficiently and effectively. To keep the interface scalable as more AI functionality is added, I decided to move the Clarify feature from the original side panel into the contextual menu. Though the decision was driven by internal needs around the future scalability of the interface, the initial feedback from the internal release was generally very positive.



