PV-CLIP: Synergizing Geometric Heuristics and Zero-Shot Foundation Models for Efficient Fall Recognition
DOI: https://doi.org/10.47852/bonviewAIA62028461
Keywords: fall detection, YOLOv11-pose, CLIP zero-shot, geometric heuristics, DeepSORT tracking
Abstract
Falls are a leading cause of injury-related death among the elderly and therefore demand efficient automated surveillance. In a preliminary study presented at ICDSAIA 2025, we proposed a hybrid framework combining Faster R-CNN and YOLOv10. However, conventional detectors often fail to distinguish real falls from ordinary activities of daily living (ADLs), resulting in high false-positive rates. This extended study proposes PV-CLIP, a cascaded pipeline that augments traditional bounding-box detection with hybrid geometric and semantic verification. PV-CLIP comprises three stages: (1) YOLOv11-Pose detects human keypoints and measures geometric properties such as aspect-ratio collapse; (2) DeepSORT tracking estimates vertical velocity and filters out static horizontal postures; and (3) CLIP-based zero-shot semantic verification judges high-risk frames by their consistency with fall-related language prompts. Ablation experiments show that each pipeline stage contributes significantly, yielding 98.3% accuracy on the 50-video test set. With the YOLOv11-Pose large variant, accuracy rises to 100%, demonstrating that combining kinematic and vision-language reasoning effectively reduces false alarms. External validation on the UR Fall Detection Dataset (70 sequences) achieved 95.24% accuracy, indicating the approach is practical for real-world healthcare systems.
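The geometric cues named in the abstract, aspect-ratio collapse and vertical velocity, can be illustrated with a minimal sketch. This is not the authors' code: the bounding-box format, frame rate, and thresholds (`ratio_thresh`, `vel_thresh`) are illustrative assumptions, and the real pipeline derives its measurements from YOLOv11-Pose keypoints and DeepSORT tracks.

```python
def aspect_ratio(box):
    """Width/height of an (x1, y1, x2, y2) bounding box; >1 means wider than tall."""
    x1, y1, x2, y2 = box
    return (x2 - x1) / max(y2 - y1, 1e-6)

def vertical_velocity(prev_box, box, fps=30.0):
    """Downward speed of the box centre in pixels/second (image y grows downward)."""
    cy_prev = (prev_box[1] + prev_box[3]) / 2
    cy = (box[1] + box[3]) / 2
    return (cy - cy_prev) * fps

def is_high_risk(prev_box, box, ratio_thresh=1.2, vel_thresh=150.0):
    """Flag a frame for semantic (CLIP) verification only when the box has
    collapsed to wider-than-tall AND its centre is moving down quickly,
    which filters out people who are already lying still."""
    return (aspect_ratio(box) > ratio_thresh
            and vertical_velocity(prev_box, box) > vel_thresh)

# An upright, tall box that collapses into a wide, lower box between frames:
standing = (100, 50, 160, 250)   # 60 px wide x 200 px tall
fallen   = (60, 220, 260, 300)   # 200 px wide x 80 px tall, centre dropped ~110 px
print(is_high_risk(standing, fallen))  # True
```

Gating the expensive vision-language check on these cheap kinematic tests is what makes the cascade efficient: most ADL frames never reach the CLIP stage.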
Received: 27 November 2025 | Revised: 16 March 2026 | Accepted: 27 March 2026
Conflicts of Interest
The authors declare that they have no conflicts of interest regarding this work.
Data Availability Statement
The data that support the findings of this study are openly available on Kaggle at https://www.kaggle.com/datasets/payutch/fall-video-dataset, and in the UR Fall Detection Dataset at https://fenix.ur.edu.pl/mkepski/ds/uf.html.
Author Contribution Statement
Benedict Onochie Ibe: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing, Visualization. Dagogo Godwin Orifama: Methodology, Investigation, Writing – review & editing. Gbubemi Erics: Investigation, Resources, Writing – review & editing. Dan Ifeanyi Ali: Validation, Formal analysis, Writing – review & editing. Ikechukwu Nwagbo Enumah: Conceptualization, Methodology, Resources, Writing – review & editing, Supervision. Dominic Ogbuagu: Conceptualization, Writing – review & editing, Supervision, Project administration.
License
Copyright (c) 2026 Authors

This work is licensed under a Creative Commons Attribution 4.0 International License.