Case Study · TalkingPoints
AI/ML Data Operations
Infrastructure to support, evaluate, and control AI features at scale.
The Context
As TalkingPoints added AI features—message classification, translation quality estimation, AI-powered intervention cards—I built the infrastructure to support and evaluate them.
Message Classification
The need was obvious: teams wanted to understand what kinds of messages were flowing through the platform. But there was no clear path to get there. I proposed the approach, designed scalable MVPs, mapped out the milestones, and stewarded the work to production over several months. It's now running and expanding.
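For flavor, here's a minimal sketch of what an LLM-based message classifier can look like, assuming a Snowflake Cortex COMPLETE call over a capped sample. The table, column, label, and model names are invented for illustration, not the production pipeline.

```python
# Illustrative sketch only: classify a bounded batch of messages into a fixed
# label set with Snowflake Cortex COMPLETE. Table, column, label, and model
# names are assumptions, not the production pipeline.
LABELS = ["attendance", "academics", "behavior", "logistics", "other"]

PROMPT_PREFIX = (
    "Classify the following school-family message into exactly one of these "
    "categories: " + ", ".join(LABELS) + ". Respond with only the category "
    "name. Message: "
)

def classify_batch(conn, limit: int = 1000):
    """Classify a capped sample of messages.

    `conn` is an open snowflake-connector-python connection.
    """
    sql = f"""
        SELECT
            message_id,
            SNOWFLAKE.CORTEX.COMPLETE('llama3-8b', CONCAT(%s, message_text)) AS raw_label
        FROM messages_sample
        LIMIT {int(limit)}
    """
    cur = conn.cursor()
    cur.execute(sql, (PROMPT_PREFIX,))
    results = []
    for message_id, raw_label in cur.fetchall():
        label = (raw_label or "").strip().lower()
        # Anything the model returns outside the label set falls back to 'other'.
        results.append((message_id, label if label in LABELS else "other"))
    return results
```

Constraining the model to a fixed label set and normalizing stray outputs keeps downstream reporting clean even when the LLM gets creative.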
Machine Translation Quality Estimation
Built a Streamlit app so the translation team could evaluate machine translation (MT) output quality without waiting on engineering. They can run LLM-as-a-judge automated evaluation, edit the judge prompt, score on different scales (binary, three-point, five-point, Likert), evaluate against a subset of the data, and generate reports automatically.
All self-service. Non-technical product people evaluating their own AI tools without depending on the technical team.
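A minimal sketch of that self-service shape, assuming a Streamlit front end over a stubbed judge call. The column names, scales, and the run_llm_judge helper are illustrative stand-ins, not the actual app.

```python
# Minimal sketch of the self-service evaluation flow, not the production app.
# run_llm_judge() is a hypothetical stand-in for whatever LLM backend scores
# a (source, translation) pair against the chosen prompt and scale.
import pandas as pd
import streamlit as st

SCALES = {
    "binary": [0, 1],
    "three-point": [1, 2, 3],
    "five-point": [1, 2, 3, 4, 5],
    "Likert": [1, 2, 3, 4, 5, 6, 7],
}

def run_llm_judge(prompt: str, source: str, translation: str, scale: list[int]) -> int:
    """Placeholder: call an LLM judge and parse its score into the scale."""
    raise NotImplementedError("wire this to your LLM backend")

st.title("MT Quality Evaluation")

judge_prompt = st.text_area("Evaluation prompt", "Rate the translation quality...")
scale_name = st.selectbox("Rating scale", list(SCALES.keys()))
sample_size = st.slider("Rows to evaluate", 10, 500, 50)
uploaded = st.file_uploader("Translations CSV (source, translation columns)", type="csv")

if uploaded is not None and st.button("Run evaluation"):
    df = pd.read_csv(uploaded).head(sample_size)  # evaluate a bounded subset
    scale = SCALES[scale_name]
    df["score"] = [
        run_llm_judge(judge_prompt, row.source, row.translation, scale)
        for row in df.itertuples()
    ]
    st.dataframe(df)
    st.metric("Mean score", round(df["score"].mean(), 2))
    # One-click report the translation team can share without engineering help.
    st.download_button("Download report", df.to_csv(index=False), "mt_eval_report.csv")
```

The point of the design is that every knob a translator might reasonably want (prompt, scale, sample size) lives in the UI, not in code.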
Cost Controls and Safeguards
AI features have a cost per call. I added safeguards to the models that call Snowflake Cortex so they can't accidentally run over large datasets and blow through the budget. Built evaluation frameworks that make accuracy vs. cost tradeoffs visible, so stakeholders can decide which tradeoff they want instead of that decision happening invisibly in the engineering layer.
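One shape such a safeguard can take: count the input first, estimate the spend, and refuse to run past a ceiling. A sketch with placeholder thresholds and a made-up per-call cost, not the numbers used in production.

```python
# Illustrative guard: count rows before letting a Cortex-backed model run over
# them, and refuse past a budget ceiling. The threshold and the rough per-call
# cost below are placeholder numbers, not TalkingPoints figures.
MAX_ROWS = 50_000
EST_COST_PER_CALL_USD = 0.002  # placeholder estimate

def guarded_cortex_run(conn, source_table: str, run_fn):
    """Run `run_fn(conn, source_table)` only if the input size is within budget."""
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {source_table}")
    n_rows = cur.fetchone()[0]

    est_cost = n_rows * EST_COST_PER_CALL_USD
    if n_rows > MAX_ROWS:
        raise RuntimeError(
            f"{source_table} has {n_rows:,} rows (limit {MAX_ROWS:,}); "
            f"estimated cost ~${est_cost:,.2f}. Sample the data or get approval first."
        )
    return run_fn(conn, source_table)
```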
The Philosophy
Every AI feature has an accuracy level, a latency, and a cost per unit. Build dashboards that show all three. Empower non-technical teams to evaluate AI tools themselves. The goal isn't to be the bottleneck—it's to build systems that work without you.
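A toy example of putting those three numbers side by side, with invented figures, just to show the shape of the comparison stakeholders see.

```python
# Toy example: put accuracy, latency, and cost per unit side by side so the
# tradeoff is a product decision, not an invisible engineering default.
# All numbers below are invented for illustration.
import pandas as pd

candidates = pd.DataFrame(
    [
        {"config": "small model",   "accuracy": 0.86, "p50_latency_s": 0.4, "cost_per_1k_msgs_usd": 0.20},
        {"config": "large model",   "accuracy": 0.92, "p50_latency_s": 1.1, "cost_per_1k_msgs_usd": 1.50},
        {"config": "rules + small", "accuracy": 0.89, "p50_latency_s": 0.5, "cost_per_1k_msgs_usd": 0.35},
    ]
)

# A sorted view makes the tradeoff explicit for non-technical stakeholders.
print(candidates.sort_values("accuracy", ascending=False).to_string(index=False))
```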
Impact
- Message classification in production after months of stewarding
- Self-service MT evaluation for translation team
- Cost safeguards preventing accidental large dataset processing
- Visible tradeoffs for stakeholder decision-making