A Frame-Level Validation on Automated Event Detection For Baseball Hitting and Pitching

February 24, 2026

Shun Chen¹ and Ricky Pimentel²

¹ University of Western Ontario, London, Ontario, Canada

² Uplift Labs, Palo Alto, CA, USA

February 20, 2026

Summary

  • We evaluated event detection accuracy for baseball hitting and pitching by directly comparing automated events derived from 3D keypoint analysis (AUTO) against human-annotated events (ground truth).
  • AUTO detected events as "Very Accurate" (≤0.025 s) or "Accurate" (≤0.05 s) in 89.5% of foot contact, 97.5% of ball contact, and 94.5% of swing through events for hitting, and in 83.3% of foot contact and 91.7% of ball release events for pitching. AUTO exhibited a consistent slight early-detection bias relative to human annotations.
  • Uplift's automated event detection algorithms estimate key baseball events with a mean absolute error (MAE) of 5.6 frames (0.023 seconds) across all events, supporting broad use for biomechanical analysis and performance assessments.

Background

This report evaluates the accuracy of Uplift Capture’s automated event detection (AUTO) algorithm for baseball hitting and pitching. For coaches, clinicians, and sports science labs, accurate timing directly affects the quality of kinematic and kinetic metrics. Accurate event detection ensures consistent, objective, and repeatable results without relying on time-consuming manual labeling. Automated event detection permits large volumes of data to be processed automatically, so users can receive and act upon movement analysis findings as soon as possible. 

Osawa et al. (2025) used a comparable frame-level validation approach to automatically classify baseball pitching phases, evaluating performance by comparing model-predicted frame labels against expert-annotated ground truth. Their frame-difference–based evaluation provides a precedent for assessing automated event-detection accuracy relative to human annotation, and it forms the basis for our analysis.

The goal of this study is to assess the accuracy of Uplift’s automated event detection (AUTO) algorithms for baseball hitting and pitching movements. We compared AUTO event times against human-annotated ground truth for key biomechanical events in each movement (Table 1).

Table 1: List and description of hitting and pitching events analyzed

Methods

Data Collection

We obtained a convenience sample of user-collected baseball hitting and pitching movements. All collections used Uplift Capture, a human movement analysis system, to record motions from two iOS devices at 240 frames per second. All collected data underwent keypoint extraction, 3D triangulation, and biomechanical analysis using Uplift’s proprietary cloud computing pipeline. 

Data Analysis - Ground Truth

Two raters (SC and RP) identified the exact frames corresponding to each key event (Table 1). These human-annotated events serve as the ground truth for comparison to AUTO. Raters annotated a total of 205 hitting trials and 108 pitching trials. SC labeled 128 hitting (64%) and 58 pitching (54%) videos, while RP labeled 50 hitting (25%) and 37 pitching (34%) videos. Additionally, 22 hitting (11%) and 13 pitching (12%) videos were annotated by both raters to assess inter-rater reliability.

Data Analysis - Automated Event Detection

Uplift’s proprietary biomechanical analysis detected all hitting and pitching events via 3D keypoint analysis:

  • Hitting Foot Contact occurs when the instantaneous velocity drops below 0.2 m/s for at least 5 consecutive frames.
  • Hitting Ball Contact is detected either from the audio signal (primary method, looking for the “crack” of the bat on the ball) or from the swing through event (secondary method).
  • Hitting Swing Through occurs when the rear wrist keypoint passes in front of the lead wrist keypoint, that is, when the rear wrist is closer to the pitcher than the lead wrist. This aligns with commitment to a swing.
  • Pitching Foot Contact occurs when either the lead ankle or lead toe reaches a local minimum position, or when its velocity drops below a specific threshold.
  • Pitching Ball Release is detected by one of three methods, applied between the low arm and wrist below hips events: 1) peak wrist velocity in the horizontal axis, 2) peak wrist height, or 3) peak elbow extension angle.
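The two fully specified kinematic rules above (the hitting foot contact velocity threshold and the swing through wrist crossing) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Uplift's proprietary implementation: the function names are hypothetical, the report does not say which keypoint's velocity is thresholded (a single foot keypoint in metres is assumed here), the event is assumed to be the first frame of the qualifying run, and the +x axis is assumed to point toward the pitcher.

```python
import math

FPS = 240  # capture rate stated in Data Collection


def keypoint_speed(positions, fps=FPS):
    """Per-frame speed (m/s) of one keypoint from backward differences.

    positions: sequence of (x, y, z) tuples in metres (assumed units).
    Frame 0 has no previous frame, so it reuses the speed of frame 1.
    """
    speeds = [math.dist(positions[i], positions[i - 1]) * fps
              for i in range(1, len(positions))]
    return [speeds[0]] + speeds if speeds else []


def detect_foot_contact(positions, fps=FPS, speed_threshold=0.2, min_frames=5):
    """First frame of the first run of >= min_frames frames below the threshold.

    Returning the start of the run is an assumption; the report only states
    the threshold (0.2 m/s) and the run length (5 frames).
    """
    run = 0
    for i, speed in enumerate(keypoint_speed(positions, fps)):
        run = run + 1 if speed < speed_threshold else 0
        if run >= min_frames:
            return i - min_frames + 1
    return None  # no qualifying run found


def detect_swing_through(rear_wrist_x, lead_wrist_x):
    """First frame where the rear wrist is closer to the pitcher than the lead wrist.

    Assumes +x points toward the pitcher (hypothetical coordinate convention).
    """
    for i, (rear, lead) in enumerate(zip(rear_wrist_x, lead_wrist_x)):
        if rear > lead:
            return i
    return None
```

Requiring several consecutive frames below the threshold, rather than a single frame, keeps one noisy low-speed sample from triggering the event early.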

Data Analysis and Statistics

We evaluated event-detection accuracy by comparing the frame numbers identified by AUTO with the corresponding human-annotated ground truth (GT) for all hitting and pitching events. For each trial, a frame difference score was computed by subtracting the GT frame from the AUTO frame. Negative values indicated that AUTO detected the event earlier than the human rater, whereas positive values indicated later detection. To assess practical accuracy, we conducted a threshold analysis that sorted each detection into one of four tolerance categories: Very Accurate (0-6 frames, ≤0.025 s), Accurate (7-12 frames, 0.025 - 0.05 s), Moderate (13-24 frames, 0.05 - 0.1 s), or Inaccurate (>24 frames, >0.1 s).
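The frame difference score and tolerance categories can be expressed as a short Python sketch. The function names are hypothetical, and the code assumes that categories are assigned from the absolute frame difference, so early and late detections of the same size fall in the same category.

```python
FPS = 240  # capture rate stated in Data Collection


def frame_difference(auto_frame, gt_frame):
    """AUTO minus GT: negative means AUTO detected the event earlier."""
    return auto_frame - gt_frame


def frames_to_seconds(frames, fps=FPS):
    """Convert a frame count to seconds at the given capture rate."""
    return frames / fps


def tolerance_category(diff_frames):
    """Map a frame difference to a tolerance category.

    Uses the absolute difference (an assumption about how sign is handled).
    """
    error = abs(diff_frames)
    if error <= 6:
        return "Very Accurate"  # <= 0.025 s at 240 fps
    if error <= 12:
        return "Accurate"       # <= 0.05 s
    if error <= 24:
        return "Moderate"       # <= 0.1 s
    return "Inaccurate"         # > 0.1 s


# Example: AUTO marks frame 118, the rater marks frame 123.
diff = frame_difference(118, 123)
label = tolerance_category(diff)
```

In this example the difference is -5 frames, so AUTO was five frames early and the detection falls within the Very Accurate bound of 6 frames.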

Table 2 reports descriptive summary statistics for the frame differences between AUTO and ground truth annotations: mean, standard deviation, median, mean absolute error, and range. To assess agreement between AUTO and the human raters, we also report the squared Pearson correlation (coefficient of determination, R²) and the Concordance Correlation Coefficient (CCC; Lin, 1989). CCC provides a more comprehensive measure of agreement than R² because it evaluates both precision and accuracy.
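As a rough sketch of how these statistics could be computed with the Python standard library (this is not the code from the study's repository, and the function names are hypothetical): the summary uses the sample standard deviation, which is an assumption, and the CCC follows the moment form from Lin (1989) with variances and covariance divided by n. The bias correction factor Cb is the ratio of CCC to Pearson r.

```python
import math
import statistics


def summarize_differences(diffs):
    """Descriptive statistics for AUTO minus GT frame differences."""
    return {
        "mean": statistics.fmean(diffs),
        "sd": statistics.stdev(diffs),  # sample SD (n - 1); an assumption
        "median": statistics.median(diffs),
        "mae": statistics.fmean([abs(d) for d in diffs]),
        "range": (min(diffs), max(diffs)),
    }


def concordance(auto_frames, gt_frames):
    """Pearson r, R^2, Lin's CCC, and bias correction factor Cb.

    Variances and covariance are divided by n, as in Lin (1989).
    """
    n = len(auto_frames)
    mean_a = statistics.fmean(auto_frames)
    mean_g = statistics.fmean(gt_frames)
    var_a = sum((a - mean_a) ** 2 for a in auto_frames) / n
    var_g = sum((g - mean_g) ** 2 for g in gt_frames) / n
    cov = sum((a - mean_a) * (g - mean_g)
              for a, g in zip(auto_frames, gt_frames)) / n
    r = cov / math.sqrt(var_a * var_g)                         # precision
    ccc = 2 * cov / (var_a + var_g + (mean_a - mean_g) ** 2)   # agreement
    return {"r": r, "r2": r ** 2, "ccc": ccc, "cb": ccc / r}   # cb: accuracy
```

A constant offset between AUTO and GT leaves r at 1 but lowers CCC through the squared mean difference in the denominator, which is why CCC captures systematic bias that R² alone would miss.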

All data processing, statistical analyses, and figure generation were completed in Python (version 3.13.5). All analyses were performed on de-identified datasets exported from the Uplift Capture cloud processing pipeline, ensuring consistency across hitting and pitching samples. All methods and analysis are available in this GitHub Repository.

Table 2: Hitting and Pitching Distribution Summary Statistics. SD: Standard Deviation, MAE: Mean Absolute Error, CCC: Concordance Correlation Coefficient

Results

AUTO detected events in 199 of 205 hitting trials. Of the 6 excluded trials, 4 did not contain a swing, 1 was a foul/low contact, and 1 did not record the full swing, resulting in an event detection success rate of 97.1% for hitting. AUTO detected all events in all 108 pitching trials, for an event detection success rate of 100% for pitching. Figure 1 shows the distribution of automated event timing relative to ground truth as histograms, illustrating the accuracy of AUTO for each event.

AUTO tended to detect hitting events slightly earlier than human raters, with mean differences ranging from −1.9 frames (−0.008 s) for hitting swing through to −4.9 frames (−0.020 s) for hitting ball contact. Detection error (MAE) was low across events, ranging from 3.9 frames (0.016 s) for hitting swing through to 7.7 frames (0.032 s) for pitching foot contact. Event detection variability (standard deviation, SD) was also low, ranging from 4.1 frames (0.017 s) for hitting ball contact to 11.2 frames (0.047 s) for hitting foot contact. All events demonstrated excellent agreement, with R² values exceeding 0.997 and CCC values exceeding 0.998 across both hitting and pitching tasks, indicating near-perfect linear agreement and concordance between automated and human annotations.

Figure 1: Histogram of Relative Timing for Automated Pitching and Hitting Events (Ground Truth = 0) Across All Trials

Figure 2 shows the share of detections that fell within each tolerance category. For hitting events, Swing Through demonstrated the highest Very Accurate detection rate (93.5%), followed by Foot Contact (70.5%) and Ball Contact (68.5%). For pitching events, Release achieved 87.0% and Foot Contact achieved 51.9% within the Very Accurate threshold. When combining the Very Accurate and Accurate categories (i.e., within 12 frames or 0.05 s), Ball Contact reached 97.5%, Swing Through 94.5%, and Foot Contact 89.5% for hitting, while Release reached 91.7% and Foot Contact 83.3% for pitching. Extending to the Moderate threshold (within 24 frames or 0.1 s), cumulative accuracy exceeded 95% for all hitting events and 97% for pitching events.

In the comparison of AUTO against human annotations, all three hitting events demonstrated almost perfect agreement: Foot Contact (CCC = 0.9985), Ball Contact (CCC = 0.9997), and Swing Through (CCC = 0.9992). For pitching events, Foot Contact achieved CCC = 0.9992 and Release achieved CCC = 0.9995. The Pearson correlations (r) ranged from 0.9987 to 0.9998, and bias correction factors (Cb) ranged from 0.9997 to 1.0000, indicating minimal systematic bias between AUTO and human annotations.

For inter-rater reliability of the ground truth events (SC vs RP), the CCC values were similarly excellent across all events: Hitting Foot Contact (CCC = 0.9995), Ball Contact (CCC = 1.0000), Swing Through (CCC = 1.0000), Pitching Foot Contact (CCC = 0.9999), and Release (CCC = 1.0000). These values confirm that human raters demonstrate near-perfect agreement with each other, establishing a reliable ground truth benchmark.

Figure 2: Tolerance Interval Analysis Across All Trials

Discussion

Uplift’s two-camera event-detection algorithms identified events accurately or very accurately in approximately 91% of cases on average. AUTO event detection showed a consistent slight early-detection bias of 3.6 frames (0.015 s) on average. This pattern was observed across both hitting and pitching tasks. Early automated detection suggests that the model is identifying kinematic cues slightly before the moments that human raters mark visually. Even so, the tolerance interval analysis demonstrates that AUTO achieves practical accuracy levels suitable for most biomechanical applications. The high proportion of detections within tight tolerance bounds indicates reliable real-world deployment potential. The small percentage of inaccurate detections (typically <5% for hitting, <3% for pitching) may warrant case-by-case review for applications requiring the highest precision. Furthermore, the relatively large share of moderate-accuracy detections for pitching foot contact (14.8%) suggests an opportunity for improvement. Across tasks, the results indicate that AUTO is effective at detecting events but could be refined for greater accuracy.
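The figure of approximately 91% is consistent with an unweighted mean of the five combined Very Accurate plus Accurate rates reported in the Results. The report does not state how the average was formed, so the unweighted mean in this short check is an assumption, and the variable names are illustrative only.

```python
# Combined "Very Accurate" + "Accurate" rates (%) reported in the Results.
combined_rates = {
    "hitting foot contact": 89.5,
    "hitting ball contact": 97.5,
    "hitting swing through": 94.5,
    "pitching foot contact": 83.3,
    "pitching ball release": 91.7,
}

# Unweighted mean across the five events (assumed averaging method).
average_rate = sum(combined_rates.values()) / len(combined_rates)
print(f"{average_rate:.1f}%")
```

A mean weighted by the number of trials per event would give a slightly different value, because hitting and pitching had different sample sizes.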

For hitting, variability was highest in foot contact and lowest in swing-phase events such as ball contact and swing through. Thus, AUTO may not detect foot contact as easily as human annotators. However, the annotators considered foot contact to be the moment of weight acceptance onto the foot, not just the instant of initial contact with the ground. This likely contributed to both the wide range of errors and higher mean absolute error (MAE) for foot contact across both pitching and hitting motions. 

Pitching events showed stronger agreement, particularly for ball release. Release is characterized by the first frame the ball is no longer in contact with any part of the pitcher. Foot contact showed greater variability, which may relate to pitcher-specific stride patterns and differences in when ground-reaction–related cues appear in the kinematic signal compared to their visual onset. 

Overall, Uplift's automated event detection algorithms estimate key baseball events within a mean absolute error of 5.6 frames (0.023 seconds) across all events, supporting use for biomechanical analysis and performance assessments. Further improvements could be made for foot contact estimation across both hitting and pitching movements, as those events had higher variability compared to the other events.

References

Koo, T. K., & Li, M. Y. (2016). A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. Journal of Chiropractic Medicine, 15(2), 155–163.

Lin, L.I. (1989). A Concordance Correlation Coefficient to Evaluate Reproducibility. Biometrics, 45(1), 255–268. https://doi.org/10.2307/2532051

Osawa, S., Inui, A., Mifune, Y., Yamaura, K., Yoshikawa, T., Shinohara, I., Kusunose, M., Tanaka, S., Takigami, S., Ehara, Y., Nakabayashi, D., Higashi, T., Wakamatsu, R., Hayashi, S., Matsumoto, T., & Kuroda, R. (2025). Automated Classification of Baseball Pitching Phases Using Machine Learning and Artificial Intelligence-Based Posture Estimation. Applied Sciences, 15(22), 12155.