PV-CLIP: Synergizing Geometric Heuristics and Zero-Shot Foundation Models for Efficient Fall Recognition

Authors

DOI:

https://doi.org/10.47852/bonviewAIA62028461

Keywords:

fall detection, YOLOv11-pose, CLIP zero-shot, geometric heuristics, DeepSORT tracking

Abstract

Falls are a leading cause of injury-related death among the elderly and therefore demand efficient automated surveillance. In a preliminary study presented at ICDSAIA 2025, we introduced a hybrid framework combining Faster R-CNN and YOLOv10. However, conventional detectors often mistake real falls for ordinary activities of daily living (ADLs), producing high false-positive rates. This extended study proposes PV-CLIP, a cascaded pipeline that augments traditional bounding-box detection with combined geometric and semantic verification. PV-CLIP comprises three stages: (1) YOLOv11-Pose detects human keypoints and measures geometric cues such as aspect-ratio collapse; (2) DeepSORT tracking estimates vertical velocity and rules out static horizontal postures; and (3) CLIP-based zero-shot semantic verification assesses high-risk frames for consistency with fall-related language prompts. Ablation experiments show that each pipeline stage contributes significantly, yielding 98.3% accuracy on the 50-video test set. With the YOLOv11-Pose large variant, accuracy rises to 100%, demonstrating that combining kinematic and vision-language reasoning effectively reduces false alarms. Moreover, external validation on the UR Fall Detection Dataset (70 sequences) achieved 95.24% accuracy, indicating that the approach is practical for real-world healthcare systems.
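The geometric gating of stages (1) and (2) can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the function names and the threshold values (`ar_thresh`, `vy_thresh`) are assumptions, and in the actual pipeline the bounding boxes and tracks would come from YOLOv11-Pose and DeepSORT respectively.

```python
def aspect_ratio(bbox):
    """Width/height of a person bounding box (x1, y1, x2, y2).
    Values > 1 mean the box is wider than tall (horizontal body)."""
    x1, y1, x2, y2 = bbox
    return (x2 - x1) / max(y2 - y1, 1e-6)

def vertical_velocity(track, fps=30.0):
    """Downward speed (pixels/second) of the box centre, estimated
    from the last two tracked positions [(x, y), ...]."""
    if len(track) < 2:
        return 0.0
    (_, y_prev), (_, y_curr) = track[-2], track[-1]
    return (y_curr - y_prev) * fps  # image y grows downward

def is_fall_candidate(bbox, track, ar_thresh=1.2, vy_thresh=120.0):
    """Flag a frame as high-risk when the box collapses into a wide,
    horizontal shape *and* the centre is moving downward rapidly.
    Only frames passing this gate would be sent to the CLIP
    zero-shot verifier in stage (3); static horizontal postures
    (no downward motion) are rejected here."""
    return (aspect_ratio(bbox) > ar_thresh
            and vertical_velocity(track) > vy_thresh)
```

The cascade ordering is the key design choice: the cheap geometric test filters the vast majority of frames so that the comparatively expensive CLIP verification runs only on the few frames already flagged as kinematically fall-like.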

 

Received: 27 November 2025 | Revised: 16 March 2026 | Accepted: 27 March 2026

 

Conflicts of Interest

The authors declare that they have no conflicts of interest in this work.

 

Data Availability Statement

The data that support the findings of this study are openly available in Kaggle at https://www.kaggle.com/datasets/payutch/fall-video-dataset, and in UR Fall Detection Dataset at https://fenix.ur.edu.pl/mkepski/ds/uf.html.

 

 

Author Contribution Statement

Benedict Onochie Ibe: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing, Visualization. Dagogo Godwin Orifama: Methodology, Investigation, Writing – review & editing. Gbubemi Erics: Investigation, Resources, Writing – review & editing. Dan Ifeanyi Ali: Validation, Formal analysis, Writing – review & editing. Ikechukwu Nwagbo Enumah: Conceptualization, Methodology, Resources, Writing – review & editing, Supervision. Dominic Ogbuagu: Conceptualization, Writing – review & editing, Supervision, Project administration.



Published

2026-04-10

Section

Research Article

How to Cite

Ibe, B. O., Orifama, D. G., Erics, G., Ali, D. I., Enumah, I. N., & Ogbuagu, D. (2026). PV-CLIP: Synergizing Geometric Heuristics and Zero-Shot Foundation Models for Efficient Fall Recognition. Artificial Intelligence and Applications. https://doi.org/10.47852/bonviewAIA62028461