Introduction
We present an approach to efficiently detect the 2D pose
of multiple people in an image. The approach uses a
non-parametric representation, which we refer to as Part Affinity
Fields (PAFs), to learn to associate body parts with
individuals in the image. The architecture encodes global
context, allowing a greedy bottom-up parsing step that
maintains high accuracy while achieving realtime performance,
irrespective of the number of people in the image. The
architecture is designed to jointly learn part locations and
their association via two branches of the same sequential
prediction process. Our method placed first in the
inaugural COCO 2016 keypoints challenge, and significantly
exceeds the previous state-of-the-art result on the MPII
Multi-Person benchmark, both in performance and efficiency.
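
To make the idea of associating parts through an affinity field concrete, the following is a minimal sketch and not the authors' implementation. It assumes a PAF for one limb type is stored as two 2D arrays holding the x and y components of a vector field, scores a candidate limb by averaging the field's alignment with the segment joining two part candidates, and then pairs candidates greedily by descending score. The names paf_score and greedy_match, the sample count, and the simple greedy pairing rule are all hypothetical choices made for illustration.

```python
import numpy as np

def paf_score(paf_x, paf_y, a, b, num_samples=10):
    """Average alignment of the field with the segment a -> b.

    a and b are (x, y) pixel coordinates; paf_x and paf_y are arrays
    indexed as [row, col]. This layout is an assumption of the sketch.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    d = b - a
    length = np.linalg.norm(d)
    if length == 0:
        return 0.0
    unit = d / length
    # Sample evenly spaced points along the segment, endpoints included.
    ts = np.linspace(0.0, 1.0, num_samples)
    pts = a[None, :] + ts[:, None] * d[None, :]
    cols = np.clip(np.rint(pts[:, 0]).astype(int), 0, paf_x.shape[1] - 1)
    rows = np.clip(np.rint(pts[:, 1]).astype(int), 0, paf_x.shape[0] - 1)
    # Dot product of the field with the limb direction at each sample.
    return float(np.mean(paf_x[rows, cols] * unit[0] + paf_y[rows, cols] * unit[1]))

def greedy_match(paf_x, paf_y, parts_a, parts_b, min_score=0.0):
    """Pair candidates of two part types, best-scoring connections first.

    A simplified greedy rule chosen for this sketch: each candidate is
    used at most once, and connections scoring <= min_score are dropped.
    """
    scored = [(paf_score(paf_x, paf_y, pa, pb), i, j)
              for i, pa in enumerate(parts_a)
              for j, pb in enumerate(parts_b)]
    scored.sort(reverse=True)
    used_a, used_b, pairs = set(), set(), []
    for score, i, j in scored:
        if score <= min_score or i in used_a or j in used_b:
            continue
        pairs.append((i, j))
        used_a.add(i)
        used_b.add(j)
    return sorted(pairs)

# Toy field: two horizontal limbs, on rows 5 and 15, pointing in +x.
paf_x = np.zeros((20, 20))
paf_y = np.zeros((20, 20))
paf_x[5, 2:13] = 1.0
paf_x[15, 2:13] = 1.0

parts_a = [(2, 5), (2, 15)]    # e.g. two elbow candidates
parts_b = [(12, 15), (12, 5)]  # e.g. two wrist candidates, listed out of order
pairs = greedy_match(paf_x, paf_y, parts_a, parts_b)
print(pairs)  # [(0, 1), (1, 0)]
```

In this toy case each candidate is matched to the partner on its own row, because segments that cut diagonally across the two limbs pass mostly through regions where the field is zero and therefore score much lower.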