
SERGIO SÁNCHEZ

v0.4 · Jan 17, 2026 · San Francisco Bay Area

Case Study · TalkingPoints

Long-Term Impact Research

Data science in service of educational equity—proving (or disproving) that the work matters.

2025–2026 · Data Science & Research

Impact Analysis

TalkingPoints claims its platform improves student attendance. I built the research infrastructure that lets the research team test that claim: multi-round analysis across multiple school years, covering tens of thousands of students at Tulsa Public Schools. The work includes an enrollment estimation methodology, validated against NCES data.

Conversation Research

Separately, I proposed, designed, and implemented a conversation analysis pipeline inspired by Clio's methodology (paper).

Dialogue Segmentation

Identifying conversation threads from message streams. When does one conversation end and another begin? How do you handle topic shifts within ongoing exchanges?
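
For illustration only, a common baseline for the first question is a time-gap rule: a long silence starts a new thread. The sketch below assumes that rule; the function name `segment_by_gap` and the six-hour default are hypothetical and are not taken from the pipeline described here. A gap rule alone cannot catch topic shifts inside an ongoing exchange, which is why message-level labeling is still needed.

```python
from datetime import datetime, timedelta


def segment_by_gap(timestamps, max_gap=timedelta(hours=6)):
    """Group message timestamps into threads, splitting on long silences.

    Assumption: a gap longer than `max_gap` between consecutive messages
    marks a thread boundary. The threshold is illustrative, not tuned.
    """
    threads = []
    for ts in sorted(timestamps):
        if threads and ts - threads[-1][-1] <= max_gap:
            threads[-1].append(ts)   # close enough in time: same thread
        else:
            threads.append([ts])     # first message, or long silence: new thread
    return threads


if __name__ == "__main__":
    start = datetime(2025, 1, 1, 9, 0)
    stream = [start, start + timedelta(minutes=5), start + timedelta(hours=10)]
    for thread in segment_by_gap(stream):
        print(len(thread), "message(s) starting", thread[0])
```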

Utterance Role Labeling

Classifying each message's communicative function. Two taxonomies: boundary states (new conversation, continuation, topic shift, resolution, escalation, farewell) and functional states (action request, information update, clarification, follow-up, acknowledgment).
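
The two taxonomies can be written down as enumerations so that every label is validated against a fixed vocabulary. This is a minimal sketch: the category names come from the description above, while the class names, the `UtteranceLabel` record, and `parse_label` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class BoundaryState(Enum):
    NEW_CONVERSATION = "new conversation"
    CONTINUATION = "continuation"
    TOPIC_SHIFT = "topic shift"
    RESOLUTION = "resolution"
    ESCALATION = "escalation"
    FAREWELL = "farewell"


class FunctionalState(Enum):
    ACTION_REQUEST = "action request"
    INFORMATION_UPDATE = "information update"
    CLARIFICATION = "clarification"
    FOLLOW_UP = "follow-up"
    ACKNOWLEDGMENT = "acknowledgment"


@dataclass(frozen=True)
class UtteranceLabel:
    """One message carries one label from each taxonomy (an assumption)."""
    message_id: str
    boundary: BoundaryState
    function: FunctionalState


def parse_label(message_id, boundary, function):
    """Build a label from raw strings; unknown values raise ValueError."""
    return UtteranceLabel(
        message_id=message_id,
        boundary=BoundaryState(boundary),
        function=FunctionalState(function),
    )


if __name__ == "__main__":
    print(parse_label("m1", "topic shift", "clarification"))
```

Rejecting unknown strings at parse time keeps a labeling model (or a human annotator) from silently inventing a seventh boundary state.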

Privacy-Preserving Clustering

Production-tested on millions of messages. Hierarchical clustering with automatic small-cluster merging to protect student data. Privacy compliance isn't an afterthought—it's built into the methodology.
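
As a rough sketch of small-cluster merging, the function below folds any cluster under a minimum size into its parent in the hierarchy, repeating until no mergeable cluster remains. The data layout (`members`, `parent`), the function name, and the threshold are assumptions for illustration; the production rules are not described here.

```python
def merge_small_clusters(members, parent, min_size):
    """Fold clusters smaller than `min_size` into their parent cluster.

    members: dict of cluster_id -> list of item ids
    parent:  dict of cluster_id -> parent cluster_id (roots are absent)
    Assumes the parent links form a tree (no cycles).
    """
    merged = {cid: list(items) for cid, items in members.items()}
    changed = True
    while changed:
        changed = False
        # Visit the smallest clusters first so leaves roll up before parents.
        for cid in sorted(merged, key=lambda c: len(merged[c])):
            if len(merged[cid]) < min_size and cid in parent:
                merged.setdefault(parent[cid], []).extend(merged.pop(cid))
                changed = True
                break  # sizes changed, so re-scan from the top
    return merged


if __name__ == "__main__":
    members = {"a": ["s1", "s2"], "b": ["s3", "s4", "s5"], "c": ["s6"]}
    parent = {"a": "p", "c": "p", "b": "q"}
    print(merge_small_clusters(members, parent, min_size=3))
```

A root cluster that is still below the threshold has nowhere to merge, so the caller would need to suppress it before anything is reported.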

AI Evaluation Infrastructure

Built the NLP data infrastructure in Snowflake—vector embedding pipelines using Snowflake Cortex, message labeling systems for training and evaluation, anchor text embedding for absence classification.
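
One way to read "anchor text embedding" is: embed a few reference phrases for a category, then flag a message when its embedding is close enough to any of them. The sketch below assumes that reading and uses plain Python lists in place of real embeddings; the function names, toy vectors, and the 0.8 threshold are hypothetical, and no Snowflake Cortex calls are shown.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def matches_anchor(message_vec, anchor_vecs, threshold=0.8):
    """True when the message is within `threshold` similarity of any anchor."""
    best = max((cosine(message_vec, a) for a in anchor_vecs), default=0.0)
    return best >= threshold


if __name__ == "__main__":
    # Toy 2-d vectors standing in for embeddings of absence-related phrases.
    anchors = [[1.0, 0.0], [0.8, 0.6]]
    print(matches_anchor([0.9, 0.1], anchors))
    print(matches_anchor([0.0, 1.0], anchors))
```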

Designed evaluation frameworks for AI classification accuracy—because if you're going to deploy AI at scale in education, you need to know when it's wrong.
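
Knowing when a classifier is wrong usually starts with per-class precision and recall against human labels, since overall accuracy hides failures on rare classes. A minimal sketch, assuming single-label predictions; the function name and the label strings are illustrative.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Per-label precision, recall, and support from paired label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label, but it was wrong
            fn[truth] += 1  # missed the true label
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        report[label] = {
            "precision": tp[label] / predicted if predicted else 0.0,
            "recall": tp[label] / actual if actual else 0.0,
            "support": actual,
        }
    return report


if __name__ == "__main__":
    truth = ["absence", "absence", "other", "other"]
    preds = ["absence", "other", "other", "other"]
    for label, row in per_class_metrics(truth, preds).items():
        print(label, row)
```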

The Approach

The impact analysis was collaborative—I built what the research team needed. The conversation analysis and AI evaluation work was mine: I proposed it, designed it, developed it, deployed it.

Both matter. Infrastructure work enables others. Original research extends what's possible.

Impact

  • Research infrastructure for multi-year impact analysis
  • Clio-inspired pipeline for conversation analysis at scale
  • Privacy-preserving NLP methodology for educational data
  • AI evaluation frameworks for classification accuracy

Technologies

Python · SQL · Snowflake Cortex · NLP · Statistical Analysis · dbt · Vector Embeddings