Data Collection: Measuring Your Process
Once you've discovered what exists in your process, you need to measure it. Data collection captures the "adjectives and adverbs"—the characteristics that describe how your process actually performs.
Why Data Matters
Gut feelings and anecdotes aren't enough for serious process improvement. You need data to:
- Establish a baseline - How does the process perform today?
- Identify problems - Where are the delays, errors, and bottlenecks?
- Justify changes - What's the business case for improvement?
- Measure success - Did the improvement actually work?
- Maintain gains - Are the improvements holding over time?
"Data that need to be generated or output by your process must support your business needs."
Types of Data
Continuous vs. Discrete
Knowing whether a measure is continuous or discrete helps you choose the right collection and analysis methods.
| Continuous Data | Discrete Data |
|---|---|
| Numeric measurements | Categories or counts |
| Can take any value in a range | Takes only distinct, separate values |
| Often physical measurements | Often status or classification |
Continuous examples:
- Processing time (23.5 minutes)
- Temperature (72.3°F)
- Weight (16.2 pounds)
- Cost ($142.87)
Discrete examples:
- Status (Open, In Progress, Closed)
- Quality (Pass, Fail)
- Priority (High, Medium, Low)
- Count (17 errors)
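The distinction shows up as soon as you summarize the data. The sketch below is a minimal illustration using made-up values: continuous measurements are summarized with an average, while discrete categories are summarized with counts and proportions. The variable names and sample values are assumptions for illustration only.

```python
import statistics
from collections import Counter

# Hypothetical sample data (illustrative values, not from a real process).
processing_minutes = [23.5, 19.0, 27.5, 22.0]                          # continuous
ticket_status = ["Open", "Closed", "Closed", "In Progress", "Closed"]  # discrete

# Continuous data: averages and spreads are meaningful.
mean_minutes = statistics.mean(processing_minutes)

# Discrete data: count each category and report proportions instead.
status_counts = Counter(ticket_status)
closed_share = status_counts["Closed"] / len(ticket_status)

print(f"Mean processing time: {mean_minutes:.1f} min")
print(f"Status counts: {dict(status_counts)}")
print(f"Share closed: {closed_share:.0%}")
```

Averaging the status values would be meaningless, which is why the type of data drives the choice of analysis.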
Point Values vs. Distributions
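Data can be captured and used in two forms: as a point value, which is a single representative number such as an average, or as a distribution, which records how the values spread across a range.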
When to use point values:
- Process is very consistent
- High precision isn't critical
- Data collection resources are limited
When to collect distributions:
- Process varies significantly
- You need to understand variability
- Statistical analysis is planned
- Simulation modeling is involved
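To make the difference concrete, the following sketch summarizes the same set of measurements both ways. The processing times are hypothetical, and the choice of summary statistics is one reasonable option rather than a required set.

```python
import statistics

# Hypothetical processing times in minutes (illustrative values only).
processing_times = [21.0, 22.5, 23.5, 24.0, 25.0, 23.0, 41.5, 22.0, 24.5, 23.0]

# Point value: one number stands in for the whole process.
point_value = statistics.mean(processing_times)

# Distribution: keep enough detail to describe the spread as well.
distribution = {
    "mean": statistics.mean(processing_times),
    "median": statistics.median(processing_times),
    "std_dev": statistics.stdev(processing_times),  # sample standard deviation
    "min": min(processing_times),
    "max": max(processing_times),
}

print(f"Point value: {point_value:.1f} min")
for name, value in distribution.items():
    print(f"{name}: {value:.2f} min")
```

In this made-up sample the mean (25.0 minutes) sits above the median (23.25 minutes) because of a single slow case at 41.5 minutes. A point value alone would hide that.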
What to Measure
The Core Process Metrics
Every process has metrics that matter. Common categories include:
| Category | Metrics | Why It Matters |
|---|---|---|
| Time | Cycle time, Wait time, Lead time | Speed and responsiveness |
| Quality | Defect rate, Rework rate, Accuracy | Output reliability |
| Cost | Unit cost, Resource cost, Overhead | Financial performance |
| Volume | Throughput, Capacity, Utilization | Scale and efficiency |
| Reliability | Uptime, On-time delivery, Consistency | Dependability |
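The time metrics are usually derived from timestamps rather than measured directly. The sketch below assumes one common set of definitions: wait time runs from request to start of work, cycle time from start to finish, and lead time from request to finish. Organizations define these terms differently, so treat the definitions, the timestamps, and the helper name `hours_between` as assumptions.

```python
from datetime import datetime

# Hypothetical timestamps for a single work item.
requested = datetime(2024, 3, 4, 9, 0)    # customer submits the request
started = datetime(2024, 3, 4, 13, 30)    # work begins
finished = datetime(2024, 3, 5, 10, 0)    # work is completed


def hours_between(start, end):
    """Return the elapsed time between two datetimes in hours."""
    return (end - start).total_seconds() / 3600


wait_time = hours_between(requested, started)    # time spent waiting in the queue
cycle_time = hours_between(started, finished)    # time spent being worked on
lead_time = hours_between(requested, finished)   # total time the customer experiences

print(f"Wait: {wait_time} h, Cycle: {cycle_time} h, Lead: {lead_time} h")
```

Under these definitions lead time is the sum of wait time and cycle time, which gives a quick sanity check on the extracted data.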
Finding the Right Metrics
Ask these questions:
- What does success look like? Metrics should connect to business goals
- What can we actually measure? Don't design for data you can't get
- What will drive behavior? People optimize for what's measured
- What's worth the effort? Collection has costs too
Avoiding Bad Metrics
Watch out for metrics that:
- Encourage gaming - People hit the number but miss the point
- Measure activity, not outcomes - Busy doesn't mean effective
- Ignore quality for speed - Faster isn't better if it's wrong
- Create local optimization - One area improves while the system suffers
Data Collection Methods
1. Electronic Data Capture
Modern systems often capture process data automatically.
Sources:
- Transaction logs
- System timestamps
- Database records
- IoT sensors
- Application metrics
Advantages:
- Continuous, automatic collection
- Large sample sizes
- Objective (no observer bias)
- Historical data available
Challenges:
- Data may not align with process questions
- Quality issues (missing data, errors)
- May require technical expertise to access
2. Manual Observation
Direct observation captures what systems can't see.
Methods:
- Time studies with stopwatches
- Tally sheets for counting
- Structured observation forms
- Video recording for later analysis
Advantages:
- Captures what systems miss
- Flexible—can adapt to discoveries
- Sees context and nuance
Challenges:
- Labor intensive
- May affect behavior (Hawthorne effect)
- Limited sample sizes
- Observer variability
3. Subject Matter Expert Estimates
When measurement isn't practical, experienced people can estimate.
When to use:
- Historical data unavailable
- Direct measurement too disruptive
- Rare events can't be sampled adequately
- Quick baseline needed
Best practices:
- Use multiple SMEs and compare
- Ask for ranges, not just single values
- Understand basis for estimates
- Validate against available data
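One way to apply the first two practices is to record a low, most likely, and high value from each expert and then compare them. The sketch below is a minimal illustration; the SME labels, the estimates, and the idea of using the gap between "most likely" values as a disagreement signal are all assumptions, not a prescribed method.

```python
import statistics

# Hypothetical SME estimates of task duration in minutes: (low, most likely, high).
sme_estimates = {
    "SME A": (10, 15, 30),
    "SME B": (12, 20, 45),
    "SME C": (8, 14, 25),
}

likely_values = [likely for _, likely, _ in sme_estimates.values()]

# Widest plausible range across all experts.
overall_low = min(low for low, _, _ in sme_estimates.values())
overall_high = max(high for _, _, high in sme_estimates.values())

# The median of the "most likely" values is less sensitive to one extreme expert.
median_likely = statistics.median(likely_values)

# A large gap between experts suggests asking each one for the basis of the estimate.
disagreement = max(likely_values) - min(likely_values)

print(f"Range: {overall_low}-{overall_high} min, typical: {median_likely} min")
print(f"Gap between experts' most likely values: {disagreement} min")
```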
4. Historical Records
Past data can reveal trends and patterns.
Sources:
- Financial records
- Quality reports
- Customer complaints
- Maintenance logs
- Project archives
Considerations:
- Ensure data definitions haven't changed
- Account for process changes over time
- Watch for survivorship bias (what wasn't recorded?)
Data Collection Planning
The Data Collection Matrix
For each metric, plan how you'll collect it:
| Metric | Source | Method | Sample Size | Frequency | Owner |
|---|---|---|---|---|---|
| Cycle time | Order system | Query logs | All orders | Daily | Analyst |
| Defect rate | QA reports | Count | All units | Weekly | QA lead |
| Wait time | Observation | Time study | 50 samples | One-time | Consultant |
| Customer satisfaction | Survey | Questionnaire | 200 responses | Monthly | Marketing |
Sample Size Considerations
More data isn't always better—it costs time and money. Consider:
- Variability - High variation needs more samples
- Precision needed - Tighter estimates need more samples
- Population size - Small populations may call for a census (measuring every item)
- Practical constraints - What can you actually collect?
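The first two considerations can be made concrete with the standard textbook formula for estimating a mean, n = (z × σ / E)², where σ is an estimate of the standard deviation and E is the acceptable margin of error. The sketch below assumes simple random sampling from a large population and a rough prior estimate of σ, for example from a small pilot study. The function name `required_sample_size` and the input values are illustrative.

```python
import math
from statistics import NormalDist


def required_sample_size(std_dev, margin_of_error, confidence=0.95):
    """Approximate sample size needed to estimate a mean within +/- margin_of_error.

    Assumes simple random sampling from a large population.
    """
    # Two-sided z value for the requested confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * std_dev / margin_of_error) ** 2)


# Hypothetical pilot estimates: a stable step versus a highly variable one,
# both measured in minutes, with a target precision of +/- 1 minute.
print(required_sample_size(std_dev=2.1, margin_of_error=1.0))
print(required_sample_size(std_dev=4.8, margin_of_error=1.0))
# Halving the margin of error roughly quadruples the required sample.
print(required_sample_size(std_dev=4.8, margin_of_error=0.5))
```

With these inputs the stable step needs 17 samples and the variable step needs 89, which illustrates why high variation and tight precision both drive up collection cost.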
Data Quality
Garbage in, garbage out. Ensure data quality by:
- Defining terms clearly - Everyone measures the same thing
- Training collectors - Consistent methods across observers
- Validating data - Check for errors and outliers
- Documenting context - Note anything that might affect interpretation
Real-World Collection Examples
Manufacturing: Cycle Time Study
Goal: Understand how long each production step takes
Method:
- Video recorded 50 production cycles
- Analyst reviewed footage and timed each step
- Calculated mean, standard deviation, and range
Results:
| Step | Mean Time | Std Dev | Range |
|---|---|---|---|
| Setup | 12.3 min | 3.2 min | 7-19 min |
| Processing | 45.7 min | 2.1 min | 42-51 min |
| Inspection | 8.4 min | 4.8 min | 3-22 min |
Finding: Inspection had the highest variability—investigation revealed inconsistent criteria.
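Standard deviation alone can mislead when steps differ greatly in length, so a relative measure helps when comparing them. The sketch below computes the coefficient of variation (standard deviation divided by mean) from the figures in the results table. Using this particular measure is a suggestion, not something the original study is stated to have done.

```python
# Mean and standard deviation in minutes, taken from the results table above.
steps = {
    "Setup": (12.3, 3.2),
    "Processing": (45.7, 2.1),
    "Inspection": (8.4, 4.8),
}

# Coefficient of variation: spread relative to the typical value.
cv = {name: std_dev / mean for name, (mean, std_dev) in steps.items()}

for name, value in cv.items():
    print(f"{name}: {value:.1%}")

most_variable = max(cv, key=cv.get)
print(f"Most variable step: {most_variable}")
```

Inspection comes out at about 57%, against roughly 26% for Setup and 5% for Processing, which matches the finding that inspection deserved investigation first.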
Healthcare: Patient Wait Times
Goal: Measure actual vs. perceived wait times
Method:
- Electronic check-in timestamps from system
- Patient survey asking perceived wait time
- Observation study validating both sources
Results:
- Actual mean wait: 23 minutes
- Perceived mean wait: 38 minutes
- Correlation between actual and perceived: 0.4
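Finding: Patients perceived their waits as about 15 minutes longer than they actually were, and the modest correlation (0.4) shows that perceived wait tracked actual wait only loosely. Communication improvements could help more than speed improvements.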
Service: Support Ticket Analysis
Goal: Identify drivers of resolution time
Method:
- Extracted 6 months of ticket data from helpdesk system
- Analyzed by category, priority, assignee, time of day
- Statistical analysis to find significant factors
Results:
- Category explained 45% of variation
- Assignee explained 20%
- Time of day: no significant effect
Finding: Some ticket categories needed specialized training or better documentation.
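The source does not say which statistic produced the "explained variation" figures. One common choice for a single categorical factor is eta squared: the between-group sum of squares divided by the total sum of squares. The sketch below shows that calculation on made-up resolution times; the data, category names, and function name are all hypothetical.

```python
import statistics


def eta_squared(groups):
    """Share of total variation explained by group membership.

    groups maps each category name to a list of numeric observations.
    """
    all_values = [value for values in groups.values() for value in values]
    grand_mean = statistics.mean(all_values)

    # Total variation: squared distance of every observation from the grand mean.
    ss_total = sum((value - grand_mean) ** 2 for value in all_values)

    # Between-group variation: squared distance of each group mean from the
    # grand mean, weighted by group size.
    ss_between = sum(
        len(values) * (statistics.mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return ss_between / ss_total


# Hypothetical resolution times in hours, grouped by ticket category.
tickets = {
    "Password reset": [1.0, 2.0, 3.0],
    "Billing": [4.0, 5.0, 6.0],
    "Integration": [8.0, 10.0, 12.0],
}

print(f"Variation explained by category: {eta_squared(tickets):.0%}")
```

In this toy data category explains about 89% of the variation, far more than in the real study, because the made-up groups barely overlap. A full analysis would also test whether each factor's effect is statistically significant.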
Data Conditioning
Raw data usually needs cleaning before analysis.
Common Issues
| Problem | Example | Solution |
|---|---|---|
| Missing values | Blank timestamps | Investigate cause; decide to exclude or estimate |
| Outliers | 900-hour cycle time | Verify if real; exclude or note separately |
| Inconsistent units | Mix of minutes and hours | Standardize all values |
| Duplicate records | Same transaction twice | Identify and remove duplicates |
| Changed definitions | "Complete" means different things over time | Segment by time period |
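Several of these fixes can be applied in a single pass over the raw records. The sketch below handles duplicates, inconsistent units, missing values, and outliers. The record layout, field names, and the 24-hour outlier threshold are assumptions for illustration. Note that it sets outliers and missing values aside for review instead of silently discarding them.

```python
# Hypothetical raw records; field names are illustrative.
raw_records = [
    {"id": 1, "duration": 30, "unit": "min"},
    {"id": 2, "duration": 1.5, "unit": "hr"},
    {"id": 2, "duration": 1.5, "unit": "hr"},     # duplicate record
    {"id": 3, "duration": None, "unit": "min"},   # missing value
    {"id": 4, "duration": 900, "unit": "hr"},     # suspicious outlier
]

MINUTES_PER_UNIT = {"min": 1, "hr": 60}
OUTLIER_THRESHOLD_MIN = 24 * 60  # assumed limit: anything over one day gets reviewed


def condition(records):
    """Split records into clean rows, ids with missing values, and outlier ids."""
    seen_ids = set()
    clean, missing, outliers = [], [], []
    for record in records:
        if record["id"] in seen_ids:
            continue  # drop duplicates, keeping the first occurrence
        seen_ids.add(record["id"])

        if record["duration"] is None:
            missing.append(record["id"])  # set aside so the cause can be investigated
            continue

        # Standardize every duration to minutes.
        minutes = record["duration"] * MINUTES_PER_UNIT[record["unit"]]
        if minutes > OUTLIER_THRESHOLD_MIN:
            outliers.append(record["id"])  # verify against the source before excluding
        else:
            clean.append({"id": record["id"], "minutes": minutes})
    return clean, missing, outliers


clean, missing, outliers = condition(raw_records)
print(clean)
print("Missing:", missing, "Outliers:", outliers)
```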
Validation Techniques
- Range checks - Values within expected bounds?
- Consistency checks - Related values make sense together?
- Trend analysis - Sudden changes explained?
- Source verification - Spot-check against source documents
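Range and consistency checks are straightforward to automate. The sketch below validates a single record that carries both a recorded duration and start and end timestamps. The field names, the 480-minute upper bound, and the one-minute tolerance are assumptions chosen for illustration.

```python
from datetime import datetime


def validate(record, max_minutes=480, tolerance_minutes=1):
    """Return a list of problems found in one record (empty if it passes)."""
    problems = []

    # Range check: is the recorded duration within expected bounds?
    if not 0 <= record["minutes"] <= max_minutes:
        problems.append("range: minutes outside expected bounds")

    # Consistency checks: do the related fields agree with each other?
    if record["end"] < record["start"]:
        problems.append("consistency: end before start")
    else:
        elapsed = (record["end"] - record["start"]).total_seconds() / 60
        if abs(elapsed - record["minutes"]) > tolerance_minutes:
            problems.append("consistency: duration does not match timestamps")
    return problems


# Hypothetical records: one that passes and one with a mismatched duration.
good = {
    "start": datetime(2024, 3, 4, 9, 0),
    "end": datetime(2024, 3, 4, 9, 45),
    "minutes": 45,
}
bad = {
    "start": datetime(2024, 3, 4, 9, 0),
    "end": datetime(2024, 3, 4, 9, 45),
    "minutes": 600,
}

print(validate(good))
print(validate(bad))
```

Returning a list of problems, instead of stopping at the first failure, makes it easier to see patterns such as one data source producing most of the errors.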
Presenting Data
Data needs to be communicated effectively to drive decisions.
Visualization Basics
Effective Data Presentation
Do:
- Label clearly—don't make people guess
- Show context—what's good or bad?
- Include sample sizes—how reliable is this?
- Tell the story—what does this mean?
Don't:
- Cherry-pick data that supports your view
- Use misleading scales
- Over-complicate with too many dimensions
- Present without interpretation
Key Takeaways
- Data collection follows discovery—you measure what you've found
- Distinguish continuous from discrete, point values from distributions
- Match collection method to what you need and what's practical
- Plan data collection systematically with clear ownership
- Ensure data quality through careful design and validation
- Present data in ways that communicate clearly and honestly