AI-powered data collection

2026

Designing PitchBook's first AI-powered data collection platform

From 30 fragmented tools to one unified system that cut collection time, reduced errors, and shifted researchers from typing data to judging it.

My Role

Lead Product Designer

Product Designer

Team

Cross-functional with PM, AI/ML, Engineering

1x Product Manager

Scope

6 months

Launched Q1 2026

Overview

PitchBook sells best-in-class private market data to global investors. For two decades, that data was hand-collected by financial analysts. Collection was slow, manual, and error-prone, and it was spread across 30 fragmented tools.


I led the design of the platform that finally unified the work — not by adding AI on top, but by building the foundation AI needed to be trustworthy. Users stopped pasting data. They started judging it.

Problem area

Financial analysts were drowning in manual data collection work, and errors were slipping through to customers.

Why solve this now?
Data collection had been manual for decades because the alternative didn't exist. AI extraction has only recently matured enough to handle real filings. The window opened now.

What I reframed

The ask came in as "add AI to the workflow." I made the case that unification had to come first, because AI layered on fragmented schemas multiplies chaos.

"We can't automate chaos. We can't add AI to a system that has no shared schema."

The Ideal: Full AI Automation
AI extracts all fields → Error Review → Done

GAP

The Reality: No shared schema. No tracking. No quality check.
Excel, Survey, Scripts, Error List, Filing Viewer, Tracker, Intake Forms, Validation, Legacy Survey, Email, Manual Log

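What's the top use case?

Inline error resolution in Review Mode.

Why: this is the action researchers perform most often every day. If this one action is slow, clumsy, or breaks context, the whole platform falls apart. Edit Mode and QC Mode are low-frequency; Review Mode is high-frequency, and a high-frequency action has to be executed as well as it possibly can.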

How is the new solution different from the existing one?
1. Most data is prefilled, because AI/ML extracts it from the source files.
2. The researcher's job changes from copying and pasting large amounts of data to reviewing it and making sure everything looks right. This is where I designed many of the supporting tools, such as the QC tool, the notes tool, and the AI label/pill that explains where a value came from and why the AI thinks it is the right term or value.
3. This alone saves researchers a lot of time: instead of pasting every single field, they review fields that are mostly already filled in, which is much faster.
4. Notes are another major saving. Today researchers spend a lot of time writing notes about what they found and where the source is. Because the system automatically links each extracted data point to its source, it already knows where the value came from, so that no longer has to be entered manually.
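The source linking in point 4 could be modeled roughly as follows. This is a minimal sketch, not the platform's actual schema: the names `SourceRef`, `ExtractedField`, and `auto_note`, the field names, and the filing ID are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRef:
    """Where a value was found (all fields are illustrative assumptions)."""
    filing_id: str
    page: int
    snippet: str


@dataclass(frozen=True)
class ExtractedField:
    """One AI-prefilled data point that carries its own provenance."""
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, produced by the extraction model
    source: SourceRef


def auto_note(field: ExtractedField) -> str:
    """Build the note a researcher previously wrote by hand."""
    src = field.source
    return f'{field.name} = {field.value} (from {src.filing_id}, p. {src.page}: "{src.snippet}")'


field = ExtractedField(
    name="interest_rate",
    value="11.5%",
    confidence=0.93,
    source=SourceRef(filing_id="example-10-K", page=42, snippet="bears interest at 11.5%"),
)
print(auto_note(field))
```

Because the source travels with the value, the note is derived from the data and never typed by hand.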


What does success look like?
Layer 1 (model-level): AI extraction accuracy on a validation sample. Data Ops owns this.

Layer 2 (system-level): calibration of the confidence threshold. How many auto-resolved items did researchers override? If the override rate exceeds 5%, the threshold is set wrong.

Layer 3 (experience-level): researcher effort and efficiency. Did the error rate and the time per error fix go down? Did session length go down?
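The system-level check can be expressed as a small calculation. The sketch below assumes a hypothetical `AutoResolvedItem` record with an `overridden` flag; only the 5% limit comes from the text, and the sample data is made up.

```python
from dataclasses import dataclass

OVERRIDE_RATE_LIMIT = 0.05  # the 5% ceiling named in the text


@dataclass
class AutoResolvedItem:
    item_id: str
    overridden: bool  # True if a researcher later changed the auto-resolved value


def override_rate(items: list[AutoResolvedItem]) -> float:
    """Share of auto-resolved items that researchers overrode."""
    if not items:
        return 0.0
    return sum(item.overridden for item in items) / len(items)


def threshold_needs_review(items: list[AutoResolvedItem]) -> bool:
    """Flag the confidence threshold as miscalibrated when overrides exceed the limit."""
    return override_rate(items) > OVERRIDE_RATE_LIMIT


# Illustrative data: 3 overrides out of 40 auto-resolved items (7.5%).
sample = [AutoResolvedItem(f"item-{i}", overridden=(i < 3)) for i in range(40)]
print(f"override rate: {override_rate(sample):.1%}")
print("recalibrate threshold:", threshold_needs_review(sample))
```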

Solution overview - What Changed

From scattered tools to one source of truth. Errors started getting caught.

Before: scattered, 6 tabs, 15 hours, errors didn't get caught
Now: AI prefills, traceable sources, researchers review instead of type, catches errors before they propagate

yh92-1766834.github.io/prototype/
Click into the prototype. This is the launched version, running live.
Diagram: 30+ tools (fragmented, no shared standard), each a separate Dataset Tool, unified into UDCP (Unified Data Collection Platform).
UDCP: Collection Editor, QC + Review Surfaces, Entity Profiles + Entitlements, AI Extraction + Confidence Scoring Layer.
1 platform: shared standards, AI-native, quality enforced.

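What are the model's strengths and weaknesses?

The model's strengths and weaknesses → this should be added to the solution section as one paragraph: what the model is good at, what it is not good at, and therefore what I have the human do and what I have the AI do. This paragraph is one of the most important senior signals for an AI product designer, and the current case study does not state it explicitly.

Strengths:

  • Extracting numeric data from structured documents (10-K filings)

  • Replicating data in the same format across companies

  • Cross-referencing and validating data

Weaknesses:

  • Unstable confidence calibration: it is sometimes very confident and wrong

  • Handling nested hierarchies or derived fields

  • Handling sensitive financial data (the cost of a mistake is too high)

  • Explaining why it produced a given value

Design logic: let the AI do what it is strong at (high-confidence extraction) and let the human cover what the AI is weak at (judgment, context, edge cases). Human-in-the-loop matters. This is the essence of confidence routing.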

The Framework - Designing for a Moving Target

Every day the AI model improves. I started by asking, "At what point does the AI chime in, and what does the human ultimately do?"

So the LLM team and I co-designed an AI maturity framework: three modes routed by confidence score, unified under one schema and one interaction model. These three constants are what let the platform absorb model improvement without a redesign. The schema doesn't change. The patterns don't change. Only the routing does.
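"Only the routing does" can be sketched as a function whose cutoffs live in configuration. This is an illustration under stated assumptions, not the shipped logic: the mode names, the mapping of confidence bands to modes, and the cutoff values 0.95 and 0.60 are all hypothetical.

```python
from enum import Enum


class Mode(Enum):
    AUTO_ACCEPT = "auto_accept"  # high confidence: passes without human review
    REVIEW = "review"            # middle band: AI hands the decision to a researcher
    EDIT = "edit"                # low confidence: researcher enters the value


# Hypothetical cutoffs. Only these numbers should need to change as the model improves.
THRESHOLDS = {"auto_accept": 0.95, "review": 0.60}


def route(confidence: float, thresholds: dict[str, float] = THRESHOLDS) -> Mode:
    """Pick a mode for one extracted field from its confidence score."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence >= thresholds["auto_accept"]:
        return Mode.AUTO_ACCEPT
    if confidence >= thresholds["review"]:
        return Mode.REVIEW
    return Mode.EDIT


for score in (0.98, 0.75, 0.30):
    print(score, "->", route(score).value)
```

Passing a different `thresholds` dictionary changes where items land without touching the schema or the three modes.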

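What does a truly usable experience look like? (Design Principle)

A researcher opens the app and sees only 10 items in her queue that need review. For each item, within 3 seconds she sees the confidence, the AI rationale, and the before/after, and makes a decision. She never has to switch views and never has to open a new tab to verify a source. By the end of a session she has processed 200 items without a single loss of context.

Usable = short distance to a decision + context never broken

Not "beautiful." Not "lots of features." Not "the AI is clever." These two things.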

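How do you use first principles to keep correcting direction?

First principles is not "breaking the rules." It is going back to the constraints themselves and reasoning again from there.

The first principles I used on this project:

Constraint 1: AI is imperfect, so a human fallback has to be designed. But the fallback cannot make the AI pointless, so it is confidence-based routing, not uniform review.

Constraint 2: 30+ tools cannot become 1 overnight. So unification is not a one-time big rewrite; it is building the core schema first and letting other verticals migrate onto it gradually.

Constraint 3: Researchers fear being replaced. So the goal is not just to keep them in the loop; it is to make their judgment the model's training signal. Their expertise is encoded, not discarded.

Whenever the direction was in doubt, I went back to one of these three constraints. If a decision violated any of them, I redid it. That is what first principles does here: it is not about being clever, it is about not drifting.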

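Which features are worth building, and which are not?

Worth building:

  • Confidence routing (the core mechanism)

  • Inline resolution (the experience of the high-frequency action)

  • Undo + re-queue (the backstop for trust)

  • Shared schema + pattern library (the highest leverage)

Not worth building:

  • A detailed dashboard showing researchers 80 confidence scores. They do not need that much information; they need a decision signal.

  • Complex AI explainability visualizations. A simple one-sentence rationale is enough. Anything more is noise.

  • A unique UI for every data type. The value of reusing patterns is far greater than the value of designing specially for one type.

    Decision rule: does this feature shorten the researcher's distance to a decision? If so, build it. If it increases the amount of information the researcher has to process, don't.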

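How did this reduce shipping friction?

Three things:

First, AI confidence is split into tiers so that high-confidence data passes automatically without human review. This removes the friction of "a person has to confirm everything."

Second, QC enforcement moved from "caught manually after submit" to "blocked by the system before submit." Errors no longer propagate downstream, and the cost of shipping goes down.

Third, inline resolution: researchers do not have to switch views to fix an error. One action completes and ships one correction.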

The framework didn't arrive whole. My first sketches had only two modes — I was assuming AI confidence was a single threshold.

Mode 1 - Edit Mode

Mode 2 - Review Mode

The Sequencing Challenge

When our LLM team showed me that accuracy varied significantly by dataset, I realized that treating AI confidence as a single threshold made no sense.

That changed how I thought about the model itself. The AI was strong at structured field extraction from filings. Its weaknesses were accuracy that varied by dataset, a lack of historical context, and a tendency to be confidently wrong.
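One way to replace a single global threshold is to derive a cutoff per dataset from labeled validation samples. The sketch below is an assumption about how that could work, not the LLM team's method: `calibrate_cutoff`, the 0.95 target accuracy, the dataset names, and the sample data are all hypothetical.

```python
from typing import Optional


def calibrate_cutoff(samples: list[tuple[float, bool]], target_accuracy: float) -> Optional[float]:
    """Return the lowest confidence cutoff whose accepted items meet the target accuracy.

    samples: (confidence, was_correct) pairs from a labeled validation set.
    Returns None when no cutoff reaches the target.
    """
    ordered = sorted(samples, key=lambda s: s[0], reverse=True)
    best = None
    correct = 0
    for i, (confidence, was_correct) in enumerate(ordered):
        correct += was_correct
        # Only evaluate once every sample sharing this confidence has been counted.
        is_last_of_tie = i + 1 == len(ordered) or ordered[i + 1][0] != confidence
        if is_last_of_tie and correct / (i + 1) >= target_accuracy:
            best = confidence  # accepting everything at or above this score meets the target
    return best


# Illustrative validation data for two hypothetical datasets.
validation = {
    "dataset_a": [(0.99, True), (0.95, True), (0.90, True), (0.80, True), (0.70, False)],
    "dataset_b": [(0.99, True), (0.95, False), (0.90, True), (0.80, False), (0.70, False)],
}
cutoffs = {name: calibrate_cutoff(rows, target_accuracy=0.95) for name, rows in validation.items()}
print(cutoffs)
```

In this made-up data the same model earns a cutoff of 0.80 on one dataset and 0.99 on the other, which is the kind of variance that makes a single threshold hard to defend.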

03 · Quality Check · Review Mode
Where Review Mode fits in the pipeline
01 Researcher Dashboard: Filing queue and personal stats. Pick up the next BDC filing and start a task.
02 Editor: AI-assisted data entry. Every value has a confidence score and a traceable source.
03 Quality Check: Review Mode. Resolve flagged errors inline, with a side panel for the full queue.
04 Submission: Clean, validated data lands in the canonical dataset. One click to ship.
Optional flow + Borrower Resolution: Side flow when a reported borrower doesn't match any canonical PitchBook entity.

The middle mode emerged when I tried to design the moment the AI hands a decision back to a human. I went back and forth with the LLM team before the routing logic stabilized.

MODE 3 · Review Mode

Review Mode had one job: make the AI-to-human handoff legible — letting researchers resolve flags without losing their place in the queue.

I found that users easily forgot what was left to resolve or lost the context for the row.

A: Separate error panel (Cut)

Wireframe elements: Value, !, Error, Field, Expected, Skip, Resolve

Why I cut it

Broke the researcher's position. Every resolution meant re-finding the row and remembering context.

B: Modal per error (Cut)

Wireframe elements: Resolve error, ×, VALUE, NOTE, Cancel, Save

Why I cut it

Dozens of flags per session. A modal per flag turned the queue into an interruption marathon.

C: Inline resolution (Cut)

Wireframe elements: Subject, Value, VALUE, NOTE, Skip, Confirm

Why I cut it

Edit state inside the table works, but researchers lose scan-mode: every flag forces them back into a single row instead of reviewing the whole list.

D: Side panel + inline (Shipped)

Wireframe elements: Review Panel, Subject, Value, !

Why I shipped it

It gave researchers two surfaces working in parallel: scan-mode and edit-mode could coexist instead of replacing each other.

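How do you get a concept into users' hands quickly?

I did not do this well enough.

What I did: gave researchers an interactive HTML prototype to test directly, which was faster than Figma.

What I should have done but didn't: put the confidence threshold in front of real users earlier. I waited until Data Ops had validated accuracy before locking the threshold, but "safe enough to auto-process" is a subjective experience and should have been tested with real researchers at an earlier stage. This is the "didn't do enough" part of my retrospective.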

Iteration

I held a strict bar on what each flag had to communicate to a researcher: who created this error, and is it mine to fix?

Some errors originated upstream. They weren't the current researcher's job to fix. They needed to be visible without being blocking. So I split flags into two types based on ownership: Warning errors and Blocking errors.
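The ownership split can be sketched as a small classification step. This is an illustrative model only: the `Flag` fields, the `origin` values, and the researcher ID are hypothetical, and the rule shown (owned errors block, inherited errors warn) is one reading of the description above.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    WARNING = "warning"    # visible, but does not stop submission
    BLOCKING = "blocking"  # must be resolved by the current researcher


@dataclass(frozen=True)
class Flag:
    field_name: str
    message: str
    origin: str  # who introduced the error, e.g. "upstream" or a researcher ID


def classify(flag: Flag, current_researcher: str) -> Severity:
    """Errors the current researcher owns block; inherited ones only warn."""
    return Severity.BLOCKING if flag.origin == current_researcher else Severity.WARNING


flags = [
    Flag("maturity_date", "Date precedes issue date", origin="researcher-17"),
    Flag("borrower_name", "No canonical entity match", origin="upstream"),
]
for flag in flags:
    print(flag.field_name, "->", classify(flag, "researcher-17").value)
```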

Option a - side panel

Option b - bottom drawer

Strategic decision

Enforced Quality Check before submission

There are actually two different quality checks happening in the system: system-led and human-led. The temptation was to skip the second quality check when a human had edited the row. I pushed back. Errors don't only come from the AI. A researcher copying values across six tabs makes mistakes too.

So we enforced QC as a step, not assumed it as a property. The system treats every submission the same way.
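"QC as a step, not a property" means the submit path always runs the checks, whoever last touched the row. A minimal sketch, assuming hypothetical names (`Row`, `run_qc`, `submit`) and a single made-up rule:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Row:
    values: dict[str, str]
    edited_by_human: bool = False  # recorded, but deliberately never used to skip QC


# A check returns an error message, or None when the row passes.
Check = Callable[[Row], Optional[str]]


def require_amount(row: Row) -> Optional[str]:
    """Hypothetical rule: every row needs a non-empty amount."""
    return None if row.values.get("amount") else "amount is missing"


def run_qc(row: Row, checks: list[Check]) -> list[str]:
    """Run every check on the row; there is no bypass for human edits."""
    errors = []
    for check in checks:
        message = check(row)
        if message is not None:
            errors.append(message)
    return errors


def submit(rows: list[Row], checks: list[Check]) -> bool:
    """Ship only when every row passes QC. Returns True on success."""
    errors = [err for row in rows for err in run_qc(row, checks)]
    return not errors


ai_row = Row({"amount": "1,200"})
human_row = Row({"amount": ""}, edited_by_human=True)
print(submit([ai_row], [require_amount]))
print(submit([ai_row, human_row], [require_amount]))
```

The human-edited row is held to the same rule as the AI-filled one, so the second submission is rejected.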

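How do you reduce cross-team friction?

Reducing cross-team friction (not by pleasing people, but by thinking through the trade-offs before they do) → the case study does not show this at all. The current "people I navigated" section covers stakeholder pushback but does not make the method clear: go into the conversation with a risk-mitigated proposal instead of waiting for them to hand over an answer. This is missing and should be added.

I had conflicts with Data Ops, Engineering, and PM. The way to reduce friction is not to please people; it is to think through the trade-offs earlier than they do.

Example: where the confidence threshold sits is a design decision that directly affects data quality and engineering feasibility. Before Data Ops had validated accuracy, I proposed a defensible default plus an undo/re-queue mechanism to absorb errors. That meant I was not waiting for an answer; I came to the conversation with a proposal that already had risk mitigation built in. As a result, their pushback turned into iteration, not a blocker.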
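The undo/re-queue mechanism in that proposal could look roughly like this. It is a sketch under assumptions, not the shipped implementation: `Item`, `ReviewQueue`, and their methods are hypothetical names, and the values are made up.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    item_id: str
    value: str
    previous_value: Optional[str] = None


@dataclass
class ReviewQueue:
    pending: deque = field(default_factory=deque)       # items waiting for a human
    auto_resolved: dict = field(default_factory=dict)   # item_id -> Item

    def auto_resolve(self, item: Item, ai_value: str) -> None:
        """Accept the AI value but remember what it replaced."""
        item.previous_value, item.value = item.value, ai_value
        self.auto_resolved[item.item_id] = item

    def undo(self, item_id: str) -> Item:
        """Restore the prior value and send the item back for human review."""
        item = self.auto_resolved.pop(item_id)
        item.value, item.previous_value = item.previous_value, None
        self.pending.append(item)
        return item


queue = ReviewQueue()
item = Item("row-1", value="10.0%")
queue.auto_resolve(item, "11.5%")
restored = queue.undo("row-1")
print(restored.value, len(queue.pending))
```

Because a wrong auto-resolution costs one undo instead of a silent error, a default threshold can be defended before accuracy is fully validated.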

What I navigated

The hardest work wasn't the system. It was the people around it.

A lot of the friction on this project came from unclear decision ownership. Early on, I wrote down what each function actually owned.

Researchers → Fear of replacement

Researchers worried AI would replace them.

I showed them what the AI kept getting wrong. I made the case that their judgment, made systematic, was more valuable than their data entry. The researchers’ judgment was becoming the training signal for the system that would eventually reduce their manual work.

LLM team

Owned per-dataset accuracy and confidence thresholds.

Design

Owned the interaction model and trust layer.

PM

Owned scope and rollout sequencing.

What this changed

Naming this explicitly meant fewer arguments about whose call it was, and it made the disagreements that did happen substantive instead of territorial.

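How do you build a repeatable release process?

A repeatable release process (pattern library + schema) → this is already covered in the impact section. It is one of the strongest senior signals in the case study and could be weighted more heavily. For example: "Three designers joined mid-project. They extended my system. They didn't rewrite it. That's the test of a good foundation."

The pattern library and shared schema I built are the answer. When three new designers joined the project, they did not start from zero; they extended the system I had built.

The key to repeatability is not "I handed over a style guide." It is that the system's core abstractions are right, so when a new vertical comes on board, patterns do not need to be reinvented, only filled in.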

Design System

Impact

Launched Q1 2026 across 5 datasets. The platform is expanding to additional verticals, and 500+ researchers use it daily.

For researchers
15+ hrs → minutes
Before
15+ hours per record across six tabs. No quality enforcement. Errors inherited downstream.
Now
Minutes per record on AI-handled data. One surface. Quality enforcement catches errors before they propagate.
For the product
0 → 1 unified platform
Before
No instrumentation, no shared schema, no path to AI extraction.
Now
The first unified data collection platform at PitchBook. MVP launched in Q1 2025 on the BDC dataset. Every future AI-powered tool builds on the schema and interaction model the team established.
For the design org
Reinvented → extended
Before
Designers reinvented patterns for each new collection tool.
Now
When three designers joined mid-project, they extended my system instead of starting over. The shared framework — table patterns, headers, navigation, entitlements — became the design language for every collection flow downstream.

What I'd do differently

Across the project, a few questions kept me honest:

If AI confidence doubled tomorrow, would this still be the right design? Is this a structure problem or a rules problem? Whose decision is this, really?

Most of the work was answering these questions in specific situations. The questions are what made the answers possible.


I'd push earlier for researcher sessions that tested where the confidence threshold actually felt trustworthy. The LLM team and I validated thresholds against accuracy numbers. But "this feels safe enough to auto-process" is a subjective question I should have tested with real users before locking it.