Data Scientist & AI Systems Builder

Victor
Vassallo

Turning complex information into decisions.

I design systems that transform complex data, documents, and policy contexts into actionable decisions — bridging technical rigor with real-world impact.

Pipelines in Production
NLP + Econometrics
Policy-Grade Decision Support

Systems-oriented.
Analytically precise.

"My work focuses on turning fragmented data, documents, and qualitative inputs into structured, usable intelligence."

I'm a data scientist and AI systems builder who operates at the intersection of technical architecture and applied research. My work spans decision support systems, knowledge pipelines, and economic analysis — with a consistent focus on making complex information actionable.

I bring together machine learning, NLP, and systems design with deep domain knowledge in economic and policy research. I build things that are meant to be used — not just analyzed.

Data Systems
Modeling + NLP
Decision Impact

What I Build

Decision Support Systems

AI-powered pipelines that synthesize multi-source intelligence into clear, structured outputs for high-stakes decisions.

Document & Knowledge Systems

OCR, LLM extraction, and retrieval-augmented architectures that unlock value locked in unstructured documents.

Analytical Models

Predictive models, NLP classifiers, and statistical frameworks tuned for operational and policy contexts.

Economic & Policy Analysis

Rigorous quantitative analysis connecting data-driven methods with economic theory and policy implications.

Selected Experience

Kaptivate LLC
May 2024 — Present
Data Scientist

Develop and maintain production-grade analytics pipelines and Python/SQL ETL workflows that unify structured records with narrative text sources. Build anomaly detection and data validation frameworks for multi-source integrity, apply ML/NLP to transform qualitative inputs into decision-ready metrics, and support econometric broadband impact modeling used in policy research.

Role Scope
  • Design and maintain production data pipelines across federal and operational data sources.
  • Integrate structured records, document corpora, and narrative inputs for decision-ready analytics.
  • Support analytics used by both technical and non-technical teams.
AI & Modeling Work
  • Built NLP and semantic modeling systems to classify qualitative research inputs at scale.
  • Transformed unstructured text into structured, queryable knowledge for downstream analysis.
  • Implemented schema-constrained extraction pipelines, anomaly detection, and threshold-based alerts.
Operational Impact
  • Created dashboards and analytical products actively used by 30+ program staff.
  • Architected retrieval-augmented opportunity intelligence systems.
  • Reduced contract-opportunity review time by 70% and supported over $4M in successful federal applications.
Core Stack
  • Python, SQL, ETL orchestration, data validation workflows, monitoring frameworks.
  • LLM workflows, RAG pipelines, text classification, clustering, semantic retrieval methods.
  • BI dashboards and stakeholder-facing decision support deliverables.
Data Society LLC
May 2023 — August 2023
Data Science / Machine Learning Intern

Developed supervised and unsupervised machine learning models for client research engagements, engineered ETL pipelines to prepare large datasets for modeling, and implemented evaluation practices to strengthen reproducibility. Produced exploratory analyses, technical documentation, and validation tooling to support reliable model training and decision support outputs.

Role Scope
  • Developed supervised and unsupervised machine learning workflows for client-facing research projects.
  • Contributed to modeling, experimentation, and interpretation across multiple engagement types.
  • Prioritized reproducible outputs and clearly documented methods.
Data Engineering Work
  • Engineered Python- and SQL-based ETL processes for large dataset preparation.
  • Organized transformed datasets for modeling, reporting, and downstream analytics.
  • Improved consistency and quality of input data available to modelers and analysts.
NLP & Validation
  • Applied NLP and automated text analysis to extract insights from unstructured inputs.
  • Evaluated model behavior and validated datasets before delivery.
  • Produced technical documentation that improved stakeholder clarity and analytical transparency.
Professional Growth
  • Strengthened practical model evaluation and quality assurance habits.
  • Learned to translate technical analysis into clear stakeholder-ready narratives.
  • Built the foundation for later production-focused AI and decision system work.
AARP
May 2022 — August 2022
Membership Lifecycle Management Intern

Built predictive models to analyze membership behavior and improve retention-oriented campaign strategy. Automated recurring data pipelines, integrated structured and unstructured CRM data for fuller lifecycle visibility, and delivered interactive dashboards that enabled data-informed planning across business teams.

Role Scope
  • Built predictive and descriptive models to understand membership behavior patterns.
  • Supported retention strategy and campaign targeting with data-backed recommendations.
  • Provided analytics outputs used by multiple business teams.
Data & Automation
  • Automated recurring data preparation and reporting workflows to reduce manual effort.
  • Improved consistency, timeliness, and accessibility of operational insights.
  • Created repeatable data products that teams could rely on for planning cycles.
Integrated Analytics
  • Unified CRM, behavioral, and reporting data into shared lifecycle analysis views.
  • Increased visibility into segmentation opportunities and performance drivers.
  • Delivered dashboard and presentation-ready analysis for cross-functional stakeholders.
Business Impact
  • Enabled more targeted retention planning and campaign optimization.
  • Made complex membership metrics easier to act on for non-technical teams.
  • Improved confidence in data-informed decision-making across business units.

Key Projects

Opportunity Intelligence & RAG System

A retrieval-augmented generation system that aggregates federal funding opportunities, scores them against organizational criteria, and surfaces ranked, contextualized recommendations for decision-makers.

RAG LLM Ranking AI
Qualitative Research Database (NLP)

A relational system combining NLP classification with structured data storage to organize and surface insights from thousands of qualitative research inputs.

NLP Classification Database
Economic Impact & Policy Analysis

Quantitative economic modeling of broadband infrastructure investment, community media center (CMC) funding, and BEAD program design — connecting data methods with policy-relevant conclusions.

Econometrics Policy Modeling

Areas of active focus & exploration

AI-driven decision systems for civic and government contexts
Opportunity intelligence pipelines with multi-source data fusion
Applied economic analysis of federal broadband & infrastructure policy
Retrieval-augmented systems for document-heavy research workflows

Systems &
Projects

A deep look at the systems, pipelines, and analyses I've built — organized by domain and designed for exploration.

01 Decision Systems
Opportunity Intelligence & RAG Decision System
A retrieval-augmented generation system that scores and surfaces ranked federal funding opportunities against organizational criteria.
RAG LLM AI
Overview

An end-to-end AI-powered platform that aggregates federal funding opportunities from multiple sources, processes them through an LLM-based evaluation layer, and presents ranked, contextualized recommendations to decision-makers.

Problem

Organizations pursuing federal grants face a fragmented, high-volume landscape of opportunities. Without systematic intelligence, relevant opportunities are missed and resources are misallocated to poor-fit applications.

Approach & Architecture

Built a multi-stage pipeline: (1) automated ingestion from Grants.gov and agency APIs, (2) document parsing and semantic chunking, (3) vector embedding and retrieval layer, (4) LLM scoring against organizational profile criteria, (5) ranked output interface with reasoning traces.

Methods

RAG architecture with FAISS vector store, OpenAI embeddings, structured LLM prompting for multi-criteria evaluation, Python pipeline with Airflow orchestration, React frontend for opportunity review.
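The retrieval-and-ranking core of stages (3) and (4) can be sketched in a few lines. This is a minimal stand-in, not the production FAISS/OpenAI implementation: it uses plain cosine similarity over toy vectors, and the field names (`embedding`, `title`) are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_opportunities(profile_vec, opportunities, top_k=3):
    """Return the opportunities whose embeddings best match the
    organizational-profile embedding, highest similarity first."""
    scored = [(cosine(profile_vec, opp["embedding"]), opp["title"])
              for opp in opportunities]
    return sorted(scored, reverse=True)[:top_k]
```

In the full pipeline, the similarity-ranked candidates would then go to the LLM scoring layer for multi-criteria evaluation against the organizational profile.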

Custom Scoring (Powered by Company IP)

The ranking layer is driven by proprietary company IP: a weighted scoring framework and rule set built from internal win/loss history, sector priorities, capability maturity signals, and client-specific constraints. This private rubric compounds over time as team judgments and outcomes feed back into the system, turning preexisting IP into a scalable business development advantage for future opportunity cycles.

Simplified Architecture Flow
(1) Inputs: Procurement boards plus Grants.gov and agency feeds are scraped and normalized on a schedule.
(2) Data Prep: Opportunities are cleaned, deduplicated, and enriched with capability and team context.
(3) IP-Powered Scoring: Company IP applies weighted fit scoring and owner assignment based on strategic priorities.
(4) RAG Deep Analysis: RAG produces opportunity summaries, rationale, and document-grounded insights for review. Leads can ask custom questions for deeper analysis, and that feedback is captured to improve future scoring.
(5) Distribution: Ranked summaries are sent by email and written to the CRM for team action.
(6) Go / No-Go: Go moves the opportunity to the deal tracker and initiates document upload; No-Go marks it as a pass or requests additional analysis.
Decision Impact

Reduced opportunity review time by ~70%. Enabled a small team to systematically evaluate 200+ opportunities monthly, surfacing 12–15 high-fit grants per cycle that would have otherwise been overlooked. Direct influence on $4M+ in successful funding applications.

Outcome / Status

Active and in production use. Undergoing expansion to incorporate eligibility pre-screening and collaborative scoring workflows.

Key Learnings

Prompt structure for multi-criteria scoring requires iterative calibration. Retrieval quality is the primary bottleneck — chunking strategy matters more than model choice for domain-specific documents.

02 Document & Knowledge Systems
Document Intelligence Pipeline (OCR + LLM)
Automated pipeline that transforms dense federal documents and policy briefs into structured, queryable data using OCR and LLM extraction.
OCR LLM NLP
Overview

A modular document processing system that ingests PDFs (including scanned documents), applies OCR, and uses structured LLM prompting to extract key fields, dates, requirements, and narrative summaries into a relational database.

Problem

Federal documents, funding opportunity announcements, and policy briefs are dense, inconsistently formatted, and typically locked in PDF form. Analysts spend hours manually extracting basic information before any real analysis can begin.

Approach & Architecture

Pipeline stages: document ingestion → format detection → OCR (Tesseract/AWS Textract) → text cleaning → LLM extraction with schema-defined output → structured storage → API access layer for downstream systems.

Methods

AWS Textract for scanned documents, custom post-processing for layout reconstruction, JSON schema-constrained LLM extraction, PostgreSQL storage with full-text search, FastAPI layer for integration.
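The schema-constrained step can be illustrated with a small output validator. The field names below (`title`, `deadline`, `award_ceiling`) are hypothetical examples; the real pipeline enforces a fuller JSON schema.

```python
import json

# Hypothetical schema for fields extracted from each document.
SCHEMA = {
    "title": str,
    "deadline": str,
    "award_ceiling": (int, float),
}

def validate_extraction(raw_llm_output: str) -> dict:
    """Parse an LLM response and enforce the expected schema,
    rejecting outputs with missing or mistyped fields."""
    record = json.loads(raw_llm_output)
    for field, expected in SCHEMA.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        if not isinstance(record[field], expected):
            raise ValueError(f"bad type for {field}")
    return record
```

Rejected outputs can be retried or routed to human review rather than silently polluting the database, which is what makes the downstream data trustworthy.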

Decision Impact

Reduced manual document processing from 3–4 hours per document to under 10 minutes. Enabled analysts to work from structured data rather than raw PDFs, improving consistency and allowing comparative analysis across document sets.

Outcome / Status

Production deployment processing 50–80 documents per week. Integrated as upstream feed for the Opportunity Intelligence system.

Key Learnings

Schema-constrained outputs are essential for reliability. OCR quality varies dramatically with scan quality — pre-processing steps (deskew, denoise) have outsized impact on downstream extraction accuracy.

Qualitative Research Database (NLP + Relational System)
A system combining NLP classification with structured storage to organize and surface insights from thousands of qualitative research inputs.
NLP Classification Database
Overview

A structured database system backed by NLP pipelines that ingests qualitative research data (interview notes, survey responses, stakeholder comments) and classifies, tags, and stores them for systematic analysis and retrieval.

Problem

Qualitative research data is inherently unstructured and resists systematic analysis. Research teams were manually coding thousands of inputs — a slow, inconsistent, and non-scalable process.

Approach & Architecture

Designed a codebook-informed classification schema, trained a multi-label text classifier on annotated training data, built ingestion pipeline, and created a relational schema linking themes, sources, and metadata for flexible querying.

Methods

Fine-tuned BERT-based classifier for thematic coding, spaCy for entity extraction, PostgreSQL with JSONB for flexible schema, Metabase for analyst-facing dashboards and query interfaces.
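The shape of the system (multi-label tagging feeding a relational store) can be sketched as follows. The real classifier is a fine-tuned BERT model and the store is PostgreSQL; this stand-in swaps in a trivial keyword rule and SQLite purely to illustrate the data flow, and the theme names are invented.

```python
import sqlite3

# Toy stand-in for the fine-tuned BERT classifier: one keyword rule
# per theme. The real system predicts multiple labels per input.
THEME_KEYWORDS = {
    "affordability": ["cost", "price", "afford"],
    "access": ["coverage", "rural", "connect"],
}

def classify(text):
    """Return every theme whose keywords appear in the text."""
    text = text.lower()
    return [theme for theme, kws in THEME_KEYWORDS.items()
            if any(k in text for k in kws)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inputs (id INTEGER PRIMARY KEY, text TEXT);
CREATE TABLE tags (input_id INTEGER, theme TEXT);
""")

def ingest(text):
    """Store an input and one tag row per predicted theme."""
    cur = conn.execute("INSERT INTO inputs (text) VALUES (?)", (text,))
    for theme in classify(text):
        conn.execute("INSERT INTO tags VALUES (?, ?)", (cur.lastrowid, theme))

ingest("Rural residents cannot afford the monthly cost.")
themes = [r[0] for r in conn.execute(
    "SELECT DISTINCT theme FROM tags ORDER BY theme")]
```

Because each input can carry several tags, analysts can query across themes, sources, and metadata in any combination, which is what turns coded qualitative data into evidence.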

Decision Impact

Transformed qualitative data from anecdote to evidence. Research team could query across 3,000+ inputs by theme, geography, and stakeholder type — enabling pattern identification that shaped policy recommendations and program design decisions.

Outcome / Status

Used in production for two major research cycles. Framework generalized and applied to NTIA listening session analysis.

Key Learnings

Codebook design is the most important upfront investment. Ambiguous categories create downstream classification noise that's hard to correct at scale.

03 Analytics & Operations
Operational Analytics & Resource Allocation Dashboard
A real-time dashboard system enabling program managers to monitor operational metrics and optimize resource allocation decisions across regions.
Analytics Dashboard Viz
Overview

A data pipeline and interactive dashboard suite providing program managers with real-time operational insights, performance tracking, and resource utilization analytics across a multi-site program.

Problem

Program leadership was operating with lagging, siloed data — making resource allocation decisions based on weekly reports rather than current operational reality. Bottlenecks went undetected until they became crises.

Approach & Architecture

Built ETL pipeline aggregating data from 5 operational systems, designed a dimensional data model, created role-differentiated dashboard views (executive summary, operational detail, regional comparison), and built alerting for threshold breaches.

Methods

Python ETL with dbt for transformation, PostgreSQL data warehouse, Tableau for dashboards, custom alert logic for anomaly detection, role-based access controls.
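The threshold-breach alerting reduces to a trailing z-score check over recent history. The window size and z-limit below are illustrative defaults, not the deployed configuration.

```python
from statistics import mean, stdev

def check_thresholds(series, window=7, z_limit=2.0):
    """Flag the latest value if it deviates from the trailing window
    by more than z_limit standard deviations (simple z-score alert)."""
    history, latest = series[-window - 1:-1], series[-1]
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_limit
```

A check like this runs per metric per site after each pipeline refresh, and breaches feed the alerting layer rather than waiting for someone to notice a dashboard.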

Decision Impact

Program managers shifted from reactive to proactive resource management. Average response time to operational bottlenecks decreased from 5 days to under 24 hours. Informed reallocation of $1.2M in program resources across sites.

Outcome / Status

Deployed and in active use by 30+ program staff. Expanded to include predictive utilization forecasting.

Key Learnings

Dashboard adoption depends more on UX and stakeholder involvement in design than on analytical sophistication. Build with end-users, not for them.

Challenge Submission Prediction Model
A predictive model forecasting submission volume and quality for competitive challenge programs, enabling proactive outreach and resource planning.
ML Prediction Modeling
Overview

A machine learning model trained on historical challenge program data to forecast submission likelihood, volume estimates, and predicted quality tier — enabling program teams to target outreach and scale evaluation resources appropriately.

Problem

Challenge programs consistently faced unpredictable submission volumes — either overwhelming review capacity or missing participation targets. Planning was based on intuition rather than data.

Approach

Engineered features from historical program data, registration signals, outreach metrics, and program design characteristics. Built ensemble model with calibrated probability outputs and confidence intervals for planning use.

Methods

Gradient boosted trees (XGBoost), SHAP for interpretability, scikit-learn pipelines, calibration with Platt scaling, prediction intervals via conformal prediction.
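The split-conformal intervals mentioned above can be computed from calibration residuals alone; this sketch is model-agnostic and independent of the XGBoost pipeline itself.

```python
import math

def conformal_interval(calib_actuals, calib_preds, new_pred, alpha=0.1):
    """Split-conformal prediction interval: the (1 - alpha) quantile of
    absolute calibration residuals becomes the interval half-width."""
    residuals = sorted(abs(a - p) for a, p in zip(calib_actuals, calib_preds))
    n = len(residuals)
    # Conservative finite-sample index: ceil((n + 1)(1 - alpha)) - 1.
    k = min(n - 1, math.ceil((n + 1) * (1 - alpha)) - 1)
    q = residuals[k]
    return new_pred - q, new_pred + q
```

For planning use, the interval width matters as much as the point forecast: reviewer capacity is scaled to the upper bound, not the midpoint.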

Decision Impact

Submission volume predictions landed within 15% of actuals. Enabled program team to scale reviewer capacity proactively and target outreach to underrepresented segments, improving both operational efficiency and program equity outcomes.

Outcome / Status

Used for two challenge program cycles. SHAP-based explanations adopted by program team as a communication tool for leadership reporting.

Key Learnings

Interpretability is not optional for operational models. Stakeholders need to understand why a forecast says what it says — black-box predictions erode trust and limit adoption.

04 Economic & Applied Analysis
Economic Impact of Broadband & CMC Funding
Quantitative analysis measuring the economic returns to broadband infrastructure investment and community-based funding programs in underserved markets.
Econometrics Policy Impact
Overview

A comprehensive economic impact study measuring the short- and long-run returns to federal broadband and community media center investments, using difference-in-differences and instrumental variable methods to establish causal estimates.

Problem

Federal broadband and CMC programs lacked rigorous, causal evidence of economic return. Policy debates were driven by advocacy rather than evidence, undermining funding allocation decisions.

Approach & Methods

Assembled longitudinal dataset from FCC, Census, BLS, and program-specific records. Applied difference-in-differences with event study design to estimate treatment effects on employment, income, and business formation. Used distance-to-infrastructure as instrumental variable.
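The event-study variant of this difference-in-differences design typically takes a two-way fixed-effects form like the following (symbols are generic, not the study's actual variable names):

```latex
y_{it} = \alpha_i + \lambda_t
       + \sum_{k \neq -1} \beta_k \, \mathbf{1}[t - T_i = k]
       + X_{it}'\gamma + \varepsilon_{it}
```

Here $\alpha_i$ and $\lambda_t$ are tract and period fixed effects, $T_i$ is the treatment (infrastructure arrival) date for tract $i$, and the $\beta_k$ trace out dynamic effects relative to the period just before treatment, which also makes pre-trend violations visible.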

Key Findings

Broadband expansion was associated with 4–7% employment growth in treated census tracts over 5-year windows. CMC funding showed concentrated impacts on youth employment and small business formation within 3-mile radii of funded centers.

Decision Impact

Analysis cited in federal program reauthorization discussions. Findings contributed to revised funding allocation formulas that redirected approximately $80M toward higher-impact geographic targets.

Outcome / Status

Published as program evaluation report. Methods framework adapted for ongoing BEAD program monitoring design.

Key Learnings

Causal identification requires patient data assembly. The quality of the counterfactual is what determines whether policy audiences trust the analysis.

NTIA Listening Session Analysis
Systematic NLP-powered analysis of public listening session inputs submitted to NTIA, transforming qualitative public comment into structured policy intelligence.
NLP Policy Analysis
Overview

Applied the qualitative research database framework to analyze thousands of public inputs submitted to NTIA listening sessions, identifying dominant themes, geographic patterns, and stakeholder-type differences in broadband policy priorities.

Problem

NTIA received thousands of written and verbal inputs across listening sessions. Manual review was too slow and too subjective to produce systematic intelligence for policy drafters.

Approach

Adapted thematic classification pipeline to NTIA-specific codebook. Processed ~4,500 inputs, identified top themes by stakeholder type and geography, and built summary visualizations for policy staff consumption.

Methods

Multi-label NLP classification, topic modeling (BERTopic), geographic aggregation to CBSA level, stakeholder type clustering, Tableau dashboard for policy staff.

Decision Impact

Provided policy team with first systematic, defensible read of public input within days of session close — rather than weeks. Theme priorities surfaced by analysis directly informed BEAD implementation guidance drafting.

Outcome / Status

Analysis delivered and used in policy drafting process. Methodology documented for replication in future comment periods.

Key Learnings

Speed of delivery matters for policy relevance. An analysis that arrives after the decision window closes has zero impact, regardless of quality.

Blurring BEAD
Analysis examining how BEAD program design, eligibility definitions, and state implementation choices interact — and their implications for coverage outcomes and equity.
Policy Broadband Equity
Overview

A policy analysis examining ambiguities in BEAD program design and how variation in state-level implementation choices could produce divergent coverage and equity outcomes — even with identical federal funding levels.

Problem

BEAD's flexible state-driven design creates wide implementation latitude. Stakeholders lacked a clear framework for understanding how different state choices interact with federal requirements — and what's at stake for underserved communities.

Approach

Mapped key program design decision points, modeled coverage scenario outcomes under different implementation choices using FCC broadband data and Census geography, and produced comparison analysis across state implementation plans.

Methods

Scenario modeling with FCC Fabric data, spatial analysis in Python (geopandas), comparative implementation plan review, structured policy framework development.
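The core scenario-modeling logic is recomputing eligible populations under different speed thresholds. The blocks and numbers below are purely illustrative, not FCC Fabric data, and the thresholds are hypothetical eligibility definitions.

```python
# Toy census blocks with current download speeds (Mbps).
blocks = [
    {"id": "b1", "pop": 1200, "mbps": 8},
    {"id": "b2", "pop": 800,  "mbps": 22},
    {"id": "b3", "pop": 400,  "mbps": 90},
]

def unserved_population(blocks, threshold_mbps):
    """Population living in blocks below the eligibility speed threshold."""
    return sum(b["pop"] for b in blocks if b["mbps"] < threshold_mbps)

# Two hypothetical state eligibility definitions produce different
# target populations from identical underlying data.
strict = unserved_population(blocks, 25)
loose = unserved_population(blocks, 100)
```

Run across real block-level data with spatial joins (geopandas in the actual analysis), this is how identical federal funding levels can imply very different coverage and equity outcomes state by state.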

Decision Impact

Used by advocacy organizations and state broadband offices to sharpen their review of state implementation plans. Framework cited in public comments to state broadband offices in three states.

Outcome / Status

Published analysis. Ongoing work tracking state implementation decisions against predicted coverage outcomes.

Key Learnings

Policy analysis has the most impact when it gives decision-makers a clear framework — not just conclusions. Structure and accessibility matter as much as analytical rigor.

Contact &
Speaking

Open to collaborations in AI systems, data science, and applied research.

Email
victor@vassallo.io

Best for project inquiries, collaborations, and speaking requests.

LinkedIn
linkedin.com/in/victorvassallo

Professional background, publications, and updates.

Resume
Download PDF →

Full work history, technical skills, and education.

Current Availability
Selectively Available

Open to contract, advisory, and collaborative research engagements in AI systems and policy analysis.

AI Systems Design

RAG architectures, decision pipelines, document intelligence

Applied Data Science

NLP, predictive modeling, analytical systems for civic/gov contexts

Policy & Economic Research

Broadband, digital equity, federal program analysis and evaluation

Speaking & Presentations

Conference Talk
Digital Equity Summit
Building AI Decision Systems for Civic Contexts

Presented a framework for designing RAG-based decision support systems in resource-constrained civic organizations, with case studies from federal grant opportunity intelligence.

Panel
State Broadband Leadership Network
Data-Driven BEAD Implementation: What States Need

Panel discussion on analytical infrastructure and data strategies for state broadband offices navigating BEAD implementation, including coverage verification and equity tracking.

Workshop
Applied Research Collaborative
From Qualitative to Structured: NLP for Policy Research

Hands-on workshop on applying NLP classification methods to qualitative policy research data — including codebook design, model training, and analyst-facing output design.

Presentation
NTIA Broadband Forum
Listening at Scale: Systematic Analysis of Public Input

Presented methodology and findings from NLP-powered analysis of NTIA listening session inputs, demonstrating how systematic qualitative analysis can inform federal policy drafting.

Invited Talk
University Data Science Program
Applied AI in Policy and Civic Tech: A Practitioner's Perspective

Guest lecture on bridging academic data science training with the practical realities of building AI systems for government and civic sector clients — including data quality, stakeholder trust, and interpretability requirements.

Available for speaking on AI systems, data science,
and broadband & digital equity policy.
Send Speaking Inquiry →