Data Collection: Measuring Your Process

Once you've discovered what exists in your process, you need to measure it. Data collection captures the "adjectives and adverbs"—the characteristics that describe how your process actually performs.


Why Data Matters

Gut feelings and anecdotes aren't enough for serious process improvement. You need data to:

  • Establish a baseline - How does the process perform today?
  • Identify problems - Where are the delays, errors, and bottlenecks?
  • Justify changes - What's the business case for improvement?
  • Measure success - Did the improvement actually work?
  • Maintain gains - Are the improvements holding over time?

"Data that need to be generated or output by your process must support your business needs."


Types of Data

Continuous vs. Discrete

Understanding this distinction helps you choose the right collection and analysis methods.

Continuous Data                  Discrete Data
Numeric measurements             Categories or counts
Can take any value in a range    Limited set of values
Often physical measurements      Often status or classification

Continuous examples:

  • Processing time (23.5 minutes)
  • Temperature (72.3°F)
  • Weight (16.2 pounds)
  • Cost ($142.87)

Discrete examples:

  • Status (Open, In Progress, Closed)
  • Quality (Pass, Fail)
  • Priority (High, Medium, Low)
  • Count (17 errors)
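
In code, this distinction often maps directly onto types. A minimal sketch in Python (the field and category names are illustrative, not from any particular system):

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):            # discrete: a limited set of values
    OPEN = "Open"
    IN_PROGRESS = "In Progress"
    CLOSED = "Closed"

@dataclass
class Observation:
    processing_minutes: float  # continuous: any value in a range
    error_count: int           # discrete: a count
    status: Status             # discrete: a category

obs = Observation(processing_minutes=23.5, error_count=17, status=Status.OPEN)
```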

Point Values vs. Distributions

Data can be captured and used in two forms: as a point value, a single representative number such as an average, or as a distribution, which preserves the spread of observed values.

When to use point values:

  • Process is very consistent
  • High precision isn't critical
  • Data collection resources are limited

When to collect distributions:

  • Process varies significantly
  • You need to understand variability
  • Statistical analysis is planned
  • Simulation modeling is involved
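
To make the difference concrete, here is a small sketch using Python's standard library; the sample times are invented for illustration:

```python
import statistics

# Invented sample of processing times, in minutes
times = [21.0, 23.5, 22.8, 30.1, 24.4, 22.0, 41.7, 23.9, 25.2, 23.1]

# Point value: one number stands in for the whole process
point_estimate = statistics.mean(times)

# Distribution: keep enough detail to see the variability
spread = {
    "mean": statistics.mean(times),
    "stdev": statistics.stdev(times),
    "min": min(times),
    "median": statistics.median(times),
    "max": max(times),
}

print(f"Point value: {point_estimate:.1f} min")
print(f"Distribution summary: {spread}")
```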

What to Measure

The Core Process Metrics

Every process has metrics that matter. Common categories include:

Category      Metrics                                  Why It Matters
Time          Cycle time, Wait time, Lead time         Speed and responsiveness
Quality       Defect rate, Rework rate, Accuracy       Output reliability
Cost          Unit cost, Resource cost, Overhead       Financial performance
Volume        Throughput, Capacity, Utilization        Scale and efficiency
Reliability   Uptime, On-time delivery, Consistency    Dependability

Finding the Right Metrics

Ask these questions:

  1. What does success look like? Metrics should connect to business goals
  2. What can we actually measure? Don't design for data you can't get
  3. What will drive behavior? People optimize for what's measured
  4. What's worth the effort? Collection has costs too

Avoiding Bad Metrics

Watch out for metrics that:

  • Encourage gaming - People hit the number but miss the point
  • Measure activity, not outcomes - Busy doesn't mean effective
  • Ignore quality for speed - Faster isn't better if it's wrong
  • Create local optimization - One area improves while the system suffers

Data Collection Methods

1. Electronic Data Capture

Modern systems often capture process data automatically.

Sources:

  • Transaction logs
  • System timestamps
  • Database records
  • IoT sensors
  • Application metrics

Advantages:

  • Continuous, automatic collection
  • Large sample sizes
  • Objective (no observer bias)
  • Historical data available

Challenges:

  • Data may not align with process questions
  • Quality issues (missing data, errors)
  • May require technical expertise to access
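
As a sketch of what working with electronic capture can look like, the following pairs start and end events from a hypothetical transaction log to compute cycle times (the log layout and field names are assumptions):

```python
from datetime import datetime

# Hypothetical log rows: (order_id, event, ISO timestamp)
log = [
    ("A1", "start", "2024-03-01T09:00:00"),
    ("A1", "end",   "2024-03-01T09:42:00"),
    ("A2", "start", "2024-03-01T09:05:00"),
    ("A2", "end",   "2024-03-01T10:15:00"),
]

starts, cycle_minutes = {}, {}
for order_id, event, ts in log:
    t = datetime.fromisoformat(ts)
    if event == "start":
        starts[order_id] = t
    elif event == "end" and order_id in starts:
        cycle_minutes[order_id] = (t - starts[order_id]).total_seconds() / 60

print(cycle_minutes)  # {'A1': 42.0, 'A2': 70.0}
```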

2. Manual Observation

Direct observation captures what systems can't see.

Methods:

  • Time studies with stopwatches
  • Tally sheets for counting
  • Structured observation forms
  • Video recording for later analysis

Advantages:

  • Captures what systems miss
  • Flexible—can adapt to discoveries
  • Sees context and nuance

Challenges:

  • Labor intensive
  • May affect behavior (Hawthorne effect)
  • Limited sample sizes
  • Observer variability

3. Subject Matter Expert Estimates

When measurement isn't practical, experienced people can estimate.

When to use:

  • Historical data unavailable
  • Direct measurement too disruptive
  • Rare events can't be sampled adequately
  • Quick baseline needed

Best practices:

  • Use multiple SMEs and compare
  • Ask for ranges, not just single values
  • Understand basis for estimates
  • Validate against available data
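
One common way to turn SME ranges into a usable figure is a three-point (PERT-style) estimate, which weights the most-likely value. A sketch, assuming each expert supplies a low, most-likely, and high value:

```python
def pert_estimate(low: float, most_likely: float, high: float) -> float:
    """Classic three-point estimate: weights the most-likely value 4x."""
    return (low + 4 * most_likely + high) / 6

# Hypothetical estimates (in minutes) from three SMEs for the same task
sme_ranges = [(10, 15, 30), (12, 16, 25), (8, 14, 35)]

estimates = [pert_estimate(*r) for r in sme_ranges]
print([round(e, 1) for e in estimates])           # compare the experts
print(round(sum(estimates) / len(estimates), 1))  # combined estimate
```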

4. Historical Records

Past data can reveal trends and patterns.

Sources:

  • Financial records
  • Quality reports
  • Customer complaints
  • Maintenance logs
  • Project archives

Considerations:

  • Ensure data definitions haven't changed
  • Account for process changes over time
  • Watch for survivorship bias (what wasn't recorded?)

Data Collection Planning

The Data Collection Matrix

For each metric, plan how you'll collect it:

Metric                  Source        Method         Sample Size    Frequency   Owner
Cycle time              Order system  Query logs     All orders     Daily       Analyst
Defect rate             QA reports    Count          All units      Weekly      QA lead
Wait time               Observation   Time study     50 samples     One-time    Consultant
Customer satisfaction   Survey        Questionnaire  200 responses  Monthly     Marketing

Sample Size Considerations

More data isn't always better—it costs time and money. Consider:

  • Variability - High variation needs more samples
  • Precision needed - Tighter estimates need more samples
  • Population size - Small populations may need census
  • Practical constraints - What can you actually collect?
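
For estimating a mean, a standard simplified formula is n = (z·σ/E)², where σ is the expected standard deviation, E the acceptable margin of error, and z the confidence coefficient (1.96 for roughly 95%). A sketch:

```python
import math

def sample_size_for_mean(sigma: float, margin: float, z: float = 1.96) -> int:
    """Samples needed to estimate a mean within +/- margin at ~95% confidence.
    Assumes a roughly normal population and a reasonable sigma estimate."""
    return math.ceil((z * sigma / margin) ** 2)

# Example: times vary with sigma ~ 5 min; we want the mean within +/- 1 min
print(sample_size_for_mean(sigma=5, margin=1))    # 97
# Tighter precision costs quadratically more samples
print(sample_size_for_mean(sigma=5, margin=0.5))  # 385
```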

Data Quality

Garbage in, garbage out. Ensure data quality by:

  • Defining terms clearly - Everyone measures the same thing
  • Training collectors - Consistent methods across observers
  • Validating data - Check for errors and outliers
  • Documenting context - Note anything that might affect interpretation

Real-World Collection Examples

Manufacturing: Cycle Time Study

Goal: Understand how long each production step takes

Method:

  • Video recorded 50 production cycles
  • Analyst reviewed footage and timed each step
  • Calculated mean, standard deviation, and range

Results:

Step         Mean Time   Std Dev   Range
Setup        12.3 min    3.2 min   7-19 min
Processing   45.7 min    2.1 min   42-51 min
Inspection   8.4 min     4.8 min   3-22 min

Finding: Inspection had the highest variability—investigation revealed inconsistent criteria.
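
A sketch of the computation behind a table like this, using Python's standard library (the timing values below are invented, not the study's raw data):

```python
import statistics

# Invented timings (minutes) from reviewed footage, five cycles per step
timings = {
    "Setup":      [12.0, 9.5, 15.1, 11.8, 13.2],
    "Processing": [45.2, 46.8, 44.9, 45.5, 46.1],
    "Inspection": [6.1, 12.4, 4.0, 9.8, 7.7],
}

for step, values in timings.items():
    print(f"{step:<12} mean={statistics.mean(values):5.1f}  "
          f"stdev={statistics.stdev(values):4.1f}  "
          f"range={min(values):.1f}-{max(values):.1f}")
```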

Healthcare: Patient Wait Times

Goal: Measure actual vs. perceived wait times

Method:

  • Electronic check-in timestamps from system
  • Patient survey asking perceived wait time
  • Observation study validating both sources

Results:

  • Actual mean wait: 23 minutes
  • Perceived mean wait: 38 minutes
  • Correlation between actual and perceived: 0.4

Finding: Perception didn't match reality—communication improvements could help more than speed improvements.
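
The figure above is a Pearson correlation; Python 3.10+ can compute it from paired samples with the standard library (values here are invented):

```python
import statistics

# Invented paired samples: actual vs. perceived wait, in minutes
actual    = [18, 25, 22, 30, 15, 27, 21, 35]
perceived = [30, 35, 45, 40, 25, 50, 30, 55]

r = statistics.correlation(actual, perceived)  # Pearson r (Python 3.10+)
print(f"Pearson r = {r:.2f}")
```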

Service: Support Ticket Analysis

Goal: Identify drivers of resolution time

Method:

  • Extracted 6 months of ticket data from helpdesk system
  • Analyzed by category, priority, assignee, time of day
  • Statistical analysis to find significant factors

Results:

  • Category explained 45% of variation
  • Assignee explained 20%
  • Time of day: no significant effect

Finding: Some ticket categories needed specialized training or better documentation.
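
One way to read "explained variation" is the share of total variance accounted for by group means (eta-squared). A minimal sketch under that assumption, with invented resolution times:

```python
import statistics

# Invented resolution times (hours), grouped by ticket category
by_category = {
    "billing":  [2.0, 3.1, 2.5, 2.8],
    "network":  [8.5, 9.2, 7.8, 8.9],
    "password": [0.5, 0.8, 0.6, 0.7],
}

all_times = [t for times in by_category.values() for t in times]
grand_mean = statistics.mean(all_times)

# Between-group sum of squares vs. total sum of squares
ss_between = sum(len(v) * (statistics.mean(v) - grand_mean) ** 2
                 for v in by_category.values())
ss_total = sum((t - grand_mean) ** 2 for t in all_times)

print(f"Share of variation explained by category: {ss_between / ss_total:.0%}")
```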


Data Conditioning

Raw data usually needs cleaning before analysis.

Common Issues

Problem               Example                                       Solution
Missing values        Blank timestamps                              Investigate cause; decide to exclude or estimate
Outliers              900-hour cycle time                           Verify if real; exclude or note separately
Inconsistent units    Mix of minutes and hours                      Standardize all values
Duplicate records     Same transaction twice                        Identify and remove duplicates
Changed definitions   "Complete" means different things over time   Segment by time period
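
A sketch of these fixes, assuming the export lands in a pandas DataFrame (column names and values are invented):

```python
import pandas as pd

# Invented raw export illustrating the issues above
raw = pd.DataFrame({
    "ticket_id": [101, 101, 102, 103, 104],
    "duration":  [30.0, 30.0, 2.5, None, 900.0],
    "unit":      ["min", "min", "hr", "min", "min"],
})

df = (raw.drop_duplicates(subset="ticket_id")  # duplicate records
         .dropna(subset=["duration"])          # missing values: exclude (or estimate)
         .copy())

# Inconsistent units: standardize everything to minutes
df["minutes"] = df["duration"].where(df["unit"] == "min", df["duration"] * 60)

# Outliers: flag for verification rather than silently dropping
print(df[df["minutes"] > 8 * 60])
```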

Validation Techniques

  • Range checks - Values within expected bounds?
  • Consistency checks - Related values make sense together?
  • Trend analysis - Sudden changes explained?
  • Source verification - Spot-check against source documents
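
Checks like these are easy to automate. A minimal sketch; the field names and bounds are assumptions:

```python
def validate(record: dict) -> list[str]:
    """Return a list of validation failures for one process record."""
    problems = []
    # Range check: values within expected bounds
    if not (0 < record["cycle_minutes"] <= 24 * 60):
        problems.append("cycle_minutes out of expected range")
    # Consistency check: related values make sense together
    if record["end_ts"] < record["start_ts"]:  # ISO strings compare correctly
        problems.append("end timestamp precedes start timestamp")
    return problems

rec = {"cycle_minutes": -5,
       "start_ts": "2024-03-01T10:00", "end_ts": "2024-03-01T09:00"}
print(validate(rec))  # both checks fail
```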

Presenting Data

Data needs to be communicated effectively to drive decisions.

Effective Data Presentation

Do:

  • Label clearly—don't make people guess
  • Show context—what's good or bad?
  • Include sample sizes—how reliable is this?
  • Tell the story—what does this mean?

Don't:

  • Cherry-pick data that supports your view
  • Use misleading scales
  • Over-complicate with too many dimensions
  • Present without interpretation
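
A sketch of the "do" habits with matplotlib, reusing the step times from the manufacturing example above; the 40-minute target line is an invented illustration of showing context:

```python
import matplotlib.pyplot as plt

steps = ["Setup", "Processing", "Inspection"]
mean_minutes = [12.3, 45.7, 8.4]
n = 50  # production cycles observed

fig, ax = plt.subplots()
ax.bar(steps, mean_minutes)
ax.set_ylabel("Mean time (minutes)")                    # label clearly
ax.set_title(f"Production step times (n={n})")          # include sample size
ax.axhline(40, linestyle="--", label="Target: 40 min")  # show context
ax.legend()
plt.show()
```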

Key Takeaways

  • Data collection follows discovery—you measure what you've found
  • Distinguish continuous from discrete, point values from distributions
  • Match collection method to what you need and what's practical
  • Plan data collection systematically with clear ownership
  • Ensure data quality through careful design and validation
  • Present data in ways that communicate clearly and honestly