Case Study Overview
Four Years at TalkingPoints
From building data infrastructure from scratch to designing AI evaluation systems—four years of work at an EdTech company enabling multilingual school-family communication.
The Company
TalkingPoints enables multilingual communication between schools and families. 145+ languages, billions of messages exchanged. Teachers send messages, families receive them in their home language and can respond the same way.
I joined the data team in April 2022 as the second data engineer. Over the next four years, I touched almost every part of the data stack—infrastructure, analytics, integrations, new products, and research.
The Work
Six areas of work, each with its own case study. Click through for the full story.
2022–2026
Built from scratch (-ish). Hevo → (a few other things 😬) → Fivetran, Medallion architecture with dbt, CI/CD on GitLab, 70% Snowflake cost reduction.
2022–2026
In-app analytics consulting, custom Tableau dashboards for enterprise partners, internal self-service via Sigma and Streamlit. Laid the foundation for AI-powered analytics, starting with an MCP server that let stakeholders chat directly with our warehouse.
2022–2026
Salesforce, PlanHat, Customer.io, Intercom/Dixa. Fuzzy matching with Jaro-Winkler. Philosophy: minimize friction for humans, don't replace them.
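To illustrate the fuzzy-matching approach: Jaro-Winkler boosts strings that share a prefix, which suits entity names like schools, where the start of the name is usually typed correctly. Below is a minimal stdlib sketch of the metric plus a best-match helper; the school names and the `best_match` helper are illustrative, not the production matcher.

```python
def jaro_winkler(s1: str, s2: str, prefix_weight: float = 0.1) -> float:
    """Jaro-Winkler similarity in [0, 1]; higher means more similar."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if not len1 or not len2:
        return 0.0
    # Characters match if equal and within half the longer length of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    m1 = [c for c, m in zip(s1, matched1) if m]
    m2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(m1, m2)) / 2
    jaro = (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3
    # Winkler boost: reward a common prefix of up to 4 characters.
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == 4:
            break
        prefix += 1
    return jaro + prefix * prefix_weight * (1 - jaro)


def best_match(query: str, candidates: list[str]) -> str:
    """Pick the candidate most similar to `query` (case-insensitive)."""
    return max(candidates, key=lambda c: jaro_winkler(query.lower(), c.lower()))
```

The point of surfacing a ranked best match rather than auto-merging is exactly the philosophy above: the human still confirms the match, they just skip the scrolling.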
2024–2026
Data Rules Everything Around Me. New product vertical built from scratch. FastAPI, Pydantic, Kinesis Firehose. Architecture that enabled AI-assisted development.
2025–2026
Research infrastructure for impact analysis. Clio-inspired conversation analysis pipeline. Dialogue segmentation, utterance role labeling, privacy-preserving NLP at scale.
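The core idea behind dialogue segmentation can be shown with the simplest common heuristic, time-gap sessionization: a long pause between messages starts a new conversation. This is a hedged sketch only; the actual Clio-inspired pipeline is more involved, and the `Message` fields and 12-hour threshold here are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Message:
    """Hypothetical message shape; the real schema differs."""
    sender_role: str   # e.g. "teacher" or "family"
    sent_at: datetime


def segment_conversations(messages: list[Message],
                          max_gap: timedelta = timedelta(hours=12)) -> list[list[Message]]:
    """Split a time-ordered thread into conversation segments.

    A gap longer than `max_gap` between consecutive messages starts a
    new segment -- the baseline sessionization heuristic that smarter
    segmentation (topic shifts, role patterns) refines.
    """
    segments: list[list[Message]] = []
    current: list[Message] = []
    for msg in messages:
        if current and msg.sent_at - current[-1].sent_at > max_gap:
            segments.append(current)
            current = []
        current.append(msg)
    if current:
        segments.append(current)
    return segments
```

Segmenting first matters for privacy-preserving analysis at scale: downstream labeling and summarization operate on bounded conversations rather than raw, unbounded message streams.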
2024–2026
Message classification stewarded to production. Machine Translation Quality Estimation app for self-service evaluation. Cost safeguards and visible tradeoffs.
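The cost-safeguard idea can be sketched as a budget guard that refuses an LLM call before it would blow through a spend cap, failing loudly instead of surprising anyone on the bill. Every name here (`CostGuard`, the per-token pricing) is a hypothetical illustration of the pattern, not the production implementation.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a proposed call would exceed the spend cap."""


class CostGuard:
    """Illustrative daily spend cap for LLM-backed features.

    Checks estimated cost *before* the call is made, so overruns are
    prevented rather than merely reported.
    """

    def __init__(self, daily_limit_usd: float):
        self.daily_limit_usd = daily_limit_usd
        self.spent_usd = 0.0

    def charge(self, tokens: int, usd_per_1k_tokens: float) -> float:
        """Record the cost of a call, or raise if it would break the cap."""
        cost = tokens / 1000 * usd_per_1k_tokens
        if self.spent_usd + cost > self.daily_limit_usd:
            raise BudgetExceeded(
                f"would spend ${self.spent_usd + cost:.2f} "
                f"against a ${self.daily_limit_usd:.2f} daily cap"
            )
        self.spent_usd += cost
        return cost
```

Making `spent_usd` and the cap inspectable is what turns a safeguard into a *visible* tradeoff: non-technical teams can see what an AI feature costs and decide where the limit belongs.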
Highlights
- New product vertical enabled—built the data layer for Attendance, from real-time streaming ingestion to API delivery
- Research infrastructure built to test claims, not just support them—from traditional impact studies across tens of thousands of students to AI-powered conversation analysis pipelines
- Conversation analysis pipeline processing millions of multilingual messages with privacy-preserving NLP
- 10+ external systems integrated into a unified data platform—CRM, support, marketing, analytics, streaming
- Second data engineer when the team was six months old—built foundational infrastructure that scaled with the company
- AI/ML evaluation infrastructure—message classification pipeline, conversation segmentation, MT quality estimation app, cost safeguards. Built so non-technical teams could assess and control AI features themselves
- Custom translation models for low-resource languages like Cape Verdean Creole and Karen—trained in-house on millions of educational translations, because no off-the-shelf API could serve these communities
What I Learned
Build so others can do it without you. When partner success can query the warehouse through an MCP server, when a principal can see their metrics without waiting on anyone—that's technology doing its job. Building things that depend on you is almost selfish. You're not that important; the mission is.
Be honest about what the data can and can't prove. There's pressure to show impact—especially when you believe in the mission. But if the numbers don't hold up when you actually run them, you say so. If you don't build the data foundations right, there are things you simply cannot prove later. Invest in data early. Let data people do the data work.
The messy middle is where the work happens. There's no right answer for a lot of things in tech—especially when you're doing something no one else has done. You try stuff. You validate it. Each iteration, you think about what worked and what didn't. That's how you learn. That's how you end up with the right solution. As Adam Savage put it: "The difference between screwing around and science is writing it down."
Access isn't a feature, it's the product. If a language doesn't have good machine translation, those families get left out. If a tool requires SQL, most of the team can't use it. In nonprofit tech, you can either make things better or make things worse. Make them better for everyone.
Reduce friction, don't replace humans. Fuzzy matching so someone doesn't have to scroll through 500 schools to find a match. An MCP server so product can ask questions without filing a ticket. The human still decides—they just spend less time on the tedious parts.
Technologies
Infrastructure: Snowflake, MongoDB, dbt, Fivetran, GitLab CI/CD, AWS (S3, Kinesis Firehose)
Analytics: Tableau, Sigma, Streamlit, Mixpanel
Integration: Salesforce, PlanHat, Customer.io, Intercom, Dixa, RosterStream
API: FastAPI, Pydantic, REST
AI/ML: Snowflake Cortex, NLP pipelines, LLM-as-a-Judge, MCP, Claude Agent SDK
Languages: SQL (advanced), Python