Case Study · TalkingPoints
Data Infrastructure Modernization
Building data infrastructure from scratch—not fixing what was broken, but creating what didn't exist.
The Starting Point
I joined as the second data engineer when the data team was six months old. There was no infrastructure to speak of: just Hevo piping data from MongoDB straight into Snowflake, with no transformation layer, no automated testing, no deployment pipelines, and no documentation.
First task: fix analytics dashboards that had been timing out since Back-to-School 2023. That's where I started. But the real work was building everything that would come after.
What I Built
Data Replication Evolution
This wasn't a straight line. We started with Hevo, then tried MongoDB triggers writing to S3 with Snowpipe loading the files into Snowflake; that approach proved too costly and too fragile. Eventually we landed on Fivetran, migrating collection by collection: messages, users, students. Each migration was its own project.
But Fivetran had its own challenges. Our collections had complex nested structures, and changes to them were sometimes not captured correctly. I built a monitoring system that compared summary statistics and sample records between MongoDB and Snowflake, sending alerts to a dedicated Slack channel when discrepancies appeared. The goal: pipelines that fail loudly instead of silently.
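To make that concrete, here is a minimal sketch of the count-parity half of the idea, assuming pymongo, the Snowflake Python connector, and a Slack incoming webhook. Collection names, table names, and the drift threshold are illustrative, not the production values.

```python
import os
import requests
from pymongo import MongoClient
import snowflake.connector

SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]  # dedicated alert channel

def mongo_count(database: str, collection: str) -> int:
    # Fast, approximate count on the source side.
    client = MongoClient(os.environ["MONGO_URI"])
    return client[database][collection].estimated_document_count()

def snowflake_count(table: str) -> int:
    # Exact count on the replicated side.
    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
    )
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    return cur.fetchone()[0]

def check_parity(database: str, collection: str, table: str,
                 tolerance: float = 0.001) -> None:
    source = mongo_count(database, collection)
    target = snowflake_count(table)
    drift = abs(source - target) / max(source, 1)
    if drift > tolerance:
        # Fail loudly: post the discrepancy to Slack rather than logging quietly.
        requests.post(SLACK_WEBHOOK_URL, json={
            "text": f":rotating_light: {collection}: MongoDB={source:,} "
                    f"Snowflake={target:,} drift={drift:.2%}"
        })

if __name__ == "__main__":
    for name in ("messages", "users", "students"):
        check_parity("talkingpoints", name, f"RAW.MONGO.{name.upper()}")
```

Sample-record comparison works the same way: pull a handful of recent documents from MongoDB and verify, field by field, that they landed in Snowflake.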
CI/CD on GitLab
We started on dbt Cloud, then migrated the dbt jobs into CI/CD pipelines of our own. We burned through GitHub's included CI minutes fast, and the cost was too high; GitLab offered more flexibility. Meanwhile, we kept optimizing the models themselves, reducing the runner minutes needed with each iteration.
The result: automated testing, environment promotion (dev → staging → production), scheduled jobs. Deploying data model changes went from risky and manual to routine and reversible.
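As a sketch of what one pipeline stage might execute, here is a thin Python wrapper around the dbt CLI. The state-based selection shown is one common way to cut runner minutes; the target names and artifact path are assumptions, not the team's actual job definition.

```python
import subprocess
import sys

# Illustrative promotion order: each GitLab stage runs this with one target.
TARGETS = ("dev", "staging", "prod")

def run(cmd: list[str]) -> None:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)  # a non-zero exit code fails the CI job

def deploy(target: str) -> None:
    if target not in TARGETS:
        sys.exit(f"unknown target: {target}")
    run(["dbt", "deps"])
    # Build (run + test) only models changed since the last production run,
    # comparing against a manifest saved from that run.
    run(["dbt", "build", "--target", target,
         "--select", "state:modified+", "--state", "prod-artifacts/"])

if __name__ == "__main__":
    deploy(sys.argv[1] if len(sys.argv) > 1 else "dev")
```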
Medallion Architecture & Governance
While all of this was happening, I was building the transformation layer in dbt. Medallion architecture—bronze (raw), silver (cleaned), gold (business-ready). Foundational models for Users, Students, Contacts, Messages. Everything else builds on these.
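dbt on Snowflake also supports Python models via Snowpark, so a silver-layer model over a bronze source can be sketched like this. The model names (brz_users, slv_users), the Fivetran soft-delete flag, and the columns are assumptions for illustration, not the project's actual code.

```python
# models/silver/slv_users.py (hypothetical silver-layer dbt Python model)

def model(dbt, session):
    dbt.config(materialized="table")

    # Bronze: the raw replicated collection, referenced by its (assumed) model name.
    users = dbt.ref("brz_users")

    # Silver: cleaned and deduplicated, ready for gold-layer marts to build on.
    return (
        users
        .filter(users["_FIVETRAN_DELETED"] == False)  # drop soft-deleted rows
        .drop_duplicates("_ID")
    )
```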
At the same time: role-based access control and data governance in Snowflake. The infrastructure to keep data secure and accessible to the right people.
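The grants themselves are plain SQL; a minimal sketch of scripting them through the Snowflake Python connector, with roles and schemas invented for illustration (the actual role tree at TalkingPoints would differ):

```python
import os
import snowflake.connector

# Illustrative functional roles mapped onto the medallion layers.
GRANTS = [
    ("ANALYST",  "ANALYTICS.GOLD",   "SELECT"),          # business users see gold only
    ("ENGINEER", "ANALYTICS.SILVER", "SELECT, INSERT"),  # engineers work in silver
]

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    role="SECURITYADMIN",  # Snowflake's conventional home for role management
)
cur = conn.cursor()
for role, schema, privileges in GRANTS:
    cur.execute(f"CREATE ROLE IF NOT EXISTS {role}")
    cur.execute(f"GRANT USAGE ON DATABASE {schema.split('.')[0]} TO ROLE {role}")
    cur.execute(f"GRANT USAGE ON SCHEMA {schema} TO ROLE {role}")
    cur.execute(f"GRANT {privileges} ON ALL TABLES IN SCHEMA {schema} TO ROLE {role}")
```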
Cost Optimization
Cost optimization wasn't a project—it was a constant. Over four years, it kept pushing us forward. MongoDB triggers were too costly. Snowpipe was too costly. Dynamic tables were too costly. Each time, we found a better solution.
Switched from dynamic tables to incremental models. Optimized dbt models to run faster. Had conversations with stakeholders about what data freshness they actually needed versus what they assumed they needed. The infrastructure scaled with the company.
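To illustrate the dynamic-table-to-incremental switch, here is what an incremental model can look like as a dbt Python model on Snowflake; the bronze model name, the _ID key, and the UPDATED_AT watermark column are assumptions.

```python
# models/silver/slv_messages.py (hypothetical incremental model)
import snowflake.snowpark.functions as F

def model(dbt, session):
    # Incremental materialization: merge new rows instead of rebuilding the
    # table, replacing the always-on refresh cost of a dynamic table.
    dbt.config(materialized="incremental", unique_key="_ID")

    messages = dbt.ref("brz_messages")

    if dbt.is_incremental:
        # Only process rows newer than the watermark already in the target.
        high_water = (
            session.table(f"{dbt.this}")
            .agg(F.max("UPDATED_AT"))
            .collect()[0][0]
        )
        messages = messages.filter(messages["UPDATED_AT"] > high_water)

    return messages
```

On an incremental run, dbt merges on the unique key and spends compute only on new rows, which is where the savings over dynamic tables come from.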
The Approach
We tried video walkthroughs early on, but they weren't useful for the team. What worked: templates, a documentation database in Notion, and leading by example. I tried to foster a culture of documentation within the data team.
Our philosophy: anyone can jump into any project at any time. With three or four people on the team, we couldn't afford knowledge silos. Proper documentation and communication weren't nice-to-haves—they were how we stayed functional.
The goal wasn't to be the only person who understood the system. The goal was to build infrastructure that anyone on the team could maintain and extend.
Impact
- Data reliability—from daily data discrepancies to almost none
- Scaled with the company—a team of four supported TalkingPoints growing from 40 to over 100 employees
- Right-sized data freshness—near real-time where it matters, efficient batch processing where it doesn't
- Full CI/CD for data model deployments on GitLab
- Medallion architecture with documented, tested dbt models
- Data governance with role-based access control in Snowflake
- Custom monitoring comparing MongoDB and Snowflake with Slack alerts
- Documentation culture where anyone on the team could jump into any project