Building AI prototypes in a 180-year-old newsroom

Can generative AI meet The Economist's quality standards? As AI Product Design Lead, I built three prototypes exploring how large language models impact reader experiences. I worked with editors, data journalists, NLP engineers, full-stack engineers and content designers. The projects tested proof-of-concept development, live prototyping and rapid iteration.

Each experiment tested specific hypotheses. The results fed directly into production features: semantic search, article summaries and AI labels. See AI in production for details.

Developing editorial pipelines

Deepset AI, the Berlin-based AI startup, let us test prompts, models and outputs together. Their platform supported modular, versioned prompt experiments with grounding and style controls. An early attempt, a simple bullet list of abstract rules, failed to produce outputs in the brand's voice.

I used NotebookLM and AI Studio to extract rules and examples from The Economist's PDF style guide. I then used Deepset's prompt builder, Haystack, YAML and Jinja2 to develop system instructions with few-shot examples and guardrails. I evaluated combinations of prompt versions and models for accuracy, style, tone and length. This became the basis for all subsequent prompt development.
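
As an illustration of the approach, here is a minimal sketch of a Jinja2 prompt template rendered with Haystack 2.x's PromptBuilder. The rules and examples are placeholders, not The Economist's actual guidance; in practice, templates like this lived in versioned pipeline definitions.

```python
from haystack.components.builders import PromptBuilder

# Placeholder template: the real rules and few-shot examples were distilled
# from the style-guide PDF and versioned alongside the pipeline.
TEMPLATE = """
Follow these style rules:
{% for rule in rules %}
- {{ rule }}
{% endfor %}

Examples of the expected voice:
{% for ex in examples %}
Article: {{ ex.article }}
Summary: {{ ex.summary }}
{% endfor %}

Summarise the article below in at most {{ max_words }} words.
Article: {{ article }}
"""

builder = PromptBuilder(template=TEMPLATE)
prompt = builder.run(
    rules=["Prefer the active voice.", "Avoid jargon."],  # placeholders
    examples=[{"article": "...", "summary": "..."}],      # placeholders
    max_words=120,
    article="Full article text goes here.",
)["prompt"]  # rendered prompt, ready to pass to a generator component
```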

Ask The Economist

Duration: One quarter | User tested with The Economist's "subscriber-only" events

Six months before I joined the AI Product team, senior leadership presented an AI concept to the board. The tool would let audiences query The Economist's journalism before, during and after subscriber-only events. Senior leadership asked my team to build a prototype for the live event experience.

The Economist ran a pilot with Contextual AI, the California-based startup. Their RAG 2.0 product powered a chatbot grounded in published articles related to the event topic. Users could ask questions and view links to up to three cited sources. Early demos looked good but needed safety rules, quality checks and The Economist's tone.

Contextual AI built the evals and guardrails using their proprietary software. Before each event, we interviewed the topic editor and created a golden set of questions and answers. We combined this with The Economist style guide to improve chatbot responses.
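
Contextual AI's eval tooling is proprietary, so as an illustration only, the shape of a golden-set check might look something like this (all names hypothetical):

```python
# Hypothetical golden set: editor-approved question/answer pairs gathered in
# the pre-event interview with the topic editor.
GOLDEN_SET = [
    {"question": "Why did the talks stall?", "expected": "Editor-approved answer..."},
]

def score_against_golden_set(answer_fn, judge_fn):
    """Run each golden question through the chatbot (answer_fn) and score the
    response against the expected answer with judge_fn, which could be an
    LLM-as-judge or a human reviewer."""
    results = []
    for item in GOLDEN_SET:
        answer = answer_fn(item["question"])
        results.append({
            "question": item["question"],
            "answer": answer,
            "score": judge_fn(answer, item["expected"]),
        })
    return results
```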

Hands on with the interface

I needed to design a branded experience on top of the existing subscriber-only events page, without requiring a full rebuild. I worked with Contextual AI and internal engineers to develop a UI we could test with subscribers. We agreed on states, interactions and API responses in our weekly check-ins. I used mockups to move discussions forward and help engineers decide on technical approaches. I worked with the research lead and product manager to develop a research plan, which we used to choose which features to prioritise.

As the user test date approached, senior leadership sent multiple last-minute requests and changes. Relaying these as UI feedback to engineers, who were busy building the back-end, was inefficient.

I asked the Engineering Manager, Lead and Engineers if I could submit front-end code changes directly. They agreed. I knew the codebase well enough to add UI details I would have cut otherwise. I added display logic, typography scaling and transitions. I raised pull requests and they invited me to review theirs. We shared the work, fixed bugs and delivered quickly.

User testing insights

We ran two user tests with 10 participants each. The first tested the feature during a live event, followed by a group video call where participants could give feedback. In the second, participants watched a recording.

Participants found the interface intuitive and in line with the brand. They compared the feature to search, a way to find related articles on a topic. Participants felt free to ask "dumb questions" they would otherwise avoid asking publicly. They raised three issues:

  • When an answer drew on several chunks of a single article, that article appeared multiple times in the citations. Participants did not understand this and assumed it was a duplication error (a dedupe sketch follows this list).
  • Some sources were out of date.
  • The answers were verbose. Reading them distracted from the ongoing event.
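
The duplication issue has a straightforward fix on the presentation side. A hypothetical sketch (not the shipped Contextual AI behaviour): collapse chunk-level citations into one entry per source article before rendering.

```python
from collections import OrderedDict

def dedupe_citations(chunks):
    """Collapse chunk-level citations into one entry per source article,
    preserving retrieval order. Each chunk is a dict with article_id,
    title and url; field names are illustrative."""
    seen = OrderedDict()
    for chunk in chunks:
        seen.setdefault(chunk["article_id"],
                        {"title": chunk["title"], "url": chunk["url"]})
    return list(seen.values())[:3]  # the UI showed at most three sources
```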

What we learned and what it changed

Intuitive search across the platform would be useful; a dedicated live-event AI question-and-answer product was not the right approach. The value for subscribers did not justify the work needed to prepare for each event. The chatbot's UI was kept small to fit alongside the video on screen, which reduced white space and made lengthy text responses hard to read.

AI should not distract from the primary activity on a page. This became a key principle as I designed AI features in production. I used interface patterns that clearly separate AI output from the underlying editorial. For example, the AI article summaries open within a tray and modal overlay. When triggered, the user can focus on AI output. When ready, they can return to the default reading mode.

Editors and designers strip articles to essential information. Anything added must earn its place. Poorly conceived AI elements undermine this effort.

Design with editorial in mind first. Editors champion an initiative once they are familiar with its value and confident in its quality. Working as a satellite team means aligning with wider initiatives or solving problems on another team's roadmap.

Topic Summaries

Duration: Three sprints | Concept tested

In August 2024, we ran in-depth interviews with senior-level B2C subscribers. As part of each session we concept-tested eight ideas for desirability. Participants most preferred "Topic summarisation: ask for any topic you are interested in to be summarised based on the Economist archive". They could imagine many personal and professional use cases and saw it as a time-saver.

Users appreciated the range of formats displayed in the visuals and indicated that the concept could increase their depth and breadth of engagement with our content. Participants assumed the system would work like ChatGPT and said the feature had monetary value above their standard subscription.

When the team picked up the topic summary project, I started with market and competitor research. I also analysed the last four print editions of the newspaper. In addition to the standard bullet-point summary, I discovered many alternative styles and formats that help readers understand a topic or catch up.

Format exploration and editorial alignment

I created a document cataloguing these formats and shared it with the product team and stakeholders for comment. It included:

  • time-based summaries (catch me up since last visit, last seven days, last month, since last edition, timelines),
  • entities (people, organisations, countries, regions, markets) and
  • formats (questions and answers, timelines).

We used this to create concrete priorities for exploration and testing. The top candidates were time-based summaries, location-based summaries, FAQs and timelines.
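
Read one way, the document boiled down to two independent axes, scope and format. A minimal sketch of how those priorities could be expressed as request parameters (names illustrative, not a production schema):

```python
from dataclasses import dataclass
from enum import Enum

class Scope(Enum):
    SINCE_LAST_VISIT = "since_last_visit"
    LAST_SEVEN_DAYS = "last_seven_days"
    LAST_MONTH = "last_month"
    SINCE_LAST_EDITION = "since_last_edition"
    ENTITY = "entity"  # person, organisation, country, region, market

class SummaryFormat(Enum):
    BULLETS = "bullets"
    FAQ = "faq"
    TIMELINE = "timeline"

@dataclass
class SummaryRequest:
    topic: str
    scope: Scope
    format: SummaryFormat

# e.g. an FAQ-style catch-up on one of the test topics
request = SummaryRequest("US elections 2024", Scope.LAST_SEVEN_DAYS, SummaryFormat.FAQ)
```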

We were asked to explore "The war in Ukraine" and "US elections 2024" as test topics. To avoid ideas being rejected due to errors in the content, I used AI Studio and NotebookLM to mock up summaries of manually selected articles from recent editions. I shared the articles, outputs and prompts with our NLP engineer, who was working in parallel to develop topic summarisation pipelines using Deepset AI.

I used these mocks to design the four priority approaches. I worked with the web and app design teams to align on placement, typography and templates.

I presented the designs to editorial. The interface and concepts were well received and they suggested potential use cases. However, the chosen topics were too fast-moving and sensitive for external testing. Given our editorial policy, every generative AI output requires a human editor. The editorial team could not manage this volume and the summaries would quickly become outdated. Instead of developing this as a user-facing AI tool, editorial advised focusing on internal tooling first.

Learnings that shaped production

Subscribers want flexible summarisation in format and scope. Depending on the story, they expect multiple content types: multimedia, maps and timelines. This revealed a gap between current capabilities (text-based RAG) and what subscribers and editorial valued.

Sensitive topics are poor testing grounds for experimental AI. Fast-moving, high-stakes stories need constant editorial oversight. AI must provide value to the newsroom without increasing workload.

Summaries require a canvas, not a single template. Rich, multi-format outputs need dedicated space.

This work stress-tested how we displayed AI outputs. I applied these learnings to iterate on production designs for article summaries. I pivoted towards a canvas approach that could handle variable content: article summaries, topic summaries and future AI outputs. I used this to shape the design specs for the AI label, tray and modal across the platforms.

The findings confirmed that capabilities should power multiple experiences. The prompts I developed manually informed the NLP team's production work. Engineers built this into the API that served web, mobile apps, newsletters and other touchpoints.
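
To make the "one capability, many experiences" idea concrete, here is a hypothetical sketch of a capability-agnostic payload: the canvas renders a list of typed blocks, so the same API can serve article summaries, topic summaries and future output types. Field names are mine, not the production schema.

```python
from dataclasses import dataclass
from typing import Literal, Union

@dataclass
class TextBlock:
    body: str
    kind: Literal["text"] = "text"

@dataclass
class TimelineBlock:
    events: list[dict]  # e.g. {"date": "2024-11-05", "headline": "..."}
    kind: Literal["timeline"] = "timeline"

@dataclass
class AIOutput:
    capability: str                                # e.g. "article_summary"
    blocks: list[Union[TextBlock, TimelineBlock]]  # rendered in order on the canvas
    label: str = "AI-generated"                    # surfaced by the AI label component
```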

Archive 1945

Duration: Two sprints | Special project exploration

"From January to August, 80 years on from the end of the second world war, this project has been republishing excerpts from The Economist's archive, week by week as the fighting at last reached its conclusion. It is a time capsule of how we reported on the war's final months. It combines original reporting from our archive with guest essays by historians, video, photographs, maps, charts and more."

— https://www.economist.com/interactive/archive-1945

In an editorial planning meeting for the special project "Archive 1945", there was a discussion around how generative AI could support the project. An early idea was a chatbot, allowing readers to ask questions and get answers from the archive.

When I was asked to explore ideas from a design perspective, I wanted to understand the ambitions for the project, as well as the style and tone. I met with the editorial designers, creative director and editors, who shared their process for producing the weekly updates.

While our NLP engineers were working on adding the archive scans to Deepset AI, I focused on the role generative AI could play in this context. I researched online archives and how other news organisations have experimented with surfacing historical reporting. I had worked on the Guardian's bicentenary celebrations, so I had some familiarity with the subject.

I found themes around annotation, modern perspectives and visual interactivity. When I shared the research with the digital editor, he agreed this was an opportunity to bring the audience closer to the source material.

The visual retrieval pivot

After ChatGPT launched, text-based chat became the default interface for LLMs. This project, however, called for more emphasis on the retrieval part of RAG: give readers access to our archive and a way to ask questions in natural language. Instead of an answer engine, the system could return scanned clippings and page excerpts, a guided exploration of the original artefacts with page layout, surrounding articles and period advertising intact.

This created a risk to the editorial brand. WWII-era reporting includes outdated terms, language and framing. Presenting archival text as plain chatbot output would reproduce these without context, risking offence and damaging reader trust.

I proposed an alternative experience. Responses primarily surface the original scan: a transparent, authentic record of what was published. Editorial commentary then introduces it with modern framing: annotating terminology no longer in use, explaining what was believed at the time versus what is known now and highlighting bias or limitations in the original reporting. I worked with editorial stakeholders to develop and present mockups of this experience.
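
A minimal sketch of the two-layer shape, with hypothetical field names: the verbatim scan is the primary object; generated or editorial commentary is attached as a separate, clearly labelled layer.

```python
from dataclasses import dataclass, field

@dataclass
class ArchiveScan:
    """Verbatim layer: the original artefact, untouched."""
    image_url: str
    issue_date: str  # e.g. "1945-05-05"
    page: int

@dataclass
class EditorialFraming:
    """Framing layer: modern context, kept visually separate from the scan."""
    introduction: str                                      # then vs. now
    annotations: list[str] = field(default_factory=list)   # outdated terms, bias notes

@dataclass
class ArchiveResponse:
    scan: ArchiveScan
    framing: EditorialFraming
```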

Editorial liked the archival scan approach. Their workflow for assembling weekly clippings was highly manual and they saw value in this as an internal tool.

Given that Archive 1945 was already live and the exploration came late in its run, the concept did not move to delivery. Article ingestion was text-based and the AI Product team could not yet retrieve multimedia.

What we learned and what it changed

Generative AI does not have to be a chatbot. In editorial environments, it can elevate source material instead of replacing it. The "national archive" feel, pairing original artefacts with modern framing, made editorial far more comfortable with sensitive archives and protected both readers and brand integrity.

AI concepts need early alignment. Editorial intent, publishing cadence and product roadmaps impact the success of concept exploration. By designing with editorial first, we can develop AI that earns its place on the page.

This exploration led to a NotebookLM internal tool for exploring the archive.

It also reinforced a broader design direction I carried into production projects: create canvases that support variable AI outputs (scans, excerpts, summaries, commentary) rather than assuming the output is always plain text.

The two-layer pattern (verbatim extracts plus generated context) became a principle for handling sensitive or historic content. This influenced how we approached transparency and attribution in production AI features.

From experiments to production

These experiments informed the AI features that shipped. The work established design principles for all subsequent AI development.

The experiments helped me build relationships with engineers and stakeholders. They invited me into new projects and discussions.