Human Action Recognition Using APJ3D and Random Forests
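Method overview:
First, we extract the 3D skeletal joint locations from depth images. The APJ3D features are computed from the action depth image sequences using the 3D joint position features and the 3D joint angle features, and are then clustered with the K-means algorithm into clusters that represent the typical postures of actions. Using the improved Fourier Temporal Pyramid, we recognize actions with random forests.
From the Kinect skeleton joint information, 3D joint position features and 3D joint angle features are extracted, and the two are combined into a new feature: APJ3D.
15 joints are selected by hand (robust to small perturbations).
The APJ3D vectors extracted from the training data pass through K-means clustering, the Fourier Temporal Pyramid, and random forests to produce the final recognition result.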
Three major challenges in action recognition:
The first is the description of human action. Human action in a video sequence is a dynamic process, characterized not only by the body posture in each frame but also by the order in which postures appear and how they evolve over time. Even for a single type of action, different individuals perform it differently because of differences in height, body shape, agility and so on. How to quickly extract simple but effective features therefore remains a major difficulty in human action recognition.
The second is the representation model of human action. Human actions vary widely but also have a strongly compositional structure; how to combine these characteristics and design a model with strong discriminative power is an important issue in human action recognition.
The third is the design of an efficient action classification algorithm. Action recognition data is high-dimensional and training data is hard to acquire, so we want a classification algorithm that trains and classifies quickly, performs well, and generalizes well.
Feature extraction:
First, 20 joints are selected: hip center, spine, shoulder center, head, L/R shoulder, L/R elbow, L/R wrist, L/R hand, L/R hip, L/R knee, L/R ankle and L/R foot.
Among these joints, the hand and wrist, and the foot and ankle, are very close to each other and thus redundant for characterizing the configuration of the body parts.
So the final 15 joints are: head, neck, L/R shoulder, L/R elbow, L/R hand, L/R knee, L/R foot, torso center and L/R hip.
Left and right limbs are determined from the direction in which the person faces the Kinect.
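Joint angles
Each joint has a geometric position (in the global Cartesian coordinate system).
The joints contiguous to the torso are usually called first-degree joints, while joints contiguous to first-degree joints are classified as second-degree joints. First-degree joints include the elbows, the knees and the head, while second-degree joints are the extremities: the hands and feet.
Each joint has two degrees of freedom: a zenith angle θ and an azimuth angle μ (the distance between two connected joints stays constant).
Obtaining the angles requires converting each joint's global coordinates into local coordinates. The paper does not explain this clearly; my understanding is that the orientation and scale of the coordinate system are computed from the torso basis (normalization), and the angles of the connected first-degree and second-degree joints are then computed in that system.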
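To make the zenith/azimuth idea concrete, here is a minimal sketch of reading two angles off a limb vector once it is expressed in a local orthonormal basis. The function name `spherical_angles`, the basis ordering `(u, r, t)`, and the choice of `u` as the zenith axis are all assumptions for illustration; the notes above do not pin down the paper's exact convention.

```python
import math

def dot(a, b):
    """Dot product of two 3D vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))

def spherical_angles(v, basis):
    """Return (zenith, azimuth) of a nonzero vector v in a local basis.

    basis is a tuple (u, r, t) of three orthonormal 3D vectors.
    Assumed convention: the zenith angle is measured from u, and the
    azimuth is measured in the (r, t) plane starting from r.
    """
    u, r, t = basis
    # Project v onto the local axes to get local coordinates.
    x, y, z = dot(v, u), dot(v, r), dot(v, t)
    length = math.sqrt(x * x + y * y + z * z)
    # Clamp to [-1, 1] to guard against rounding before acos.
    theta = math.acos(max(-1.0, min(1.0, x / length)))
    mu = math.atan2(z, y)
    return theta, mu

# Example: a limb pointing along the second basis axis.
identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
theta, mu = spherical_angles((0.0, 2.0, 0.0), identity)
```

Because only the direction of `v` matters, the two angles do not change when the limb vector is rescaled, which matches the note that the distance between connected joints stays constant.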
Joint positions
Our key suggestion is that the pairwise relative positions of the joints yield more discriminative features for representing human movement. Because the coordinates are normalized, the motion is invariant to the absolute body position, the initial body orientation and the body size.
For each joint i, we extract the pairwise relative position features by taking the difference between the position of joint i and that of each other joint j:
p_ij = p_i - p_j
The 3D joint feature for joint i is defined as:
p_i = { p_ij | i ≠ j }
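As a small illustration of the pairwise relative position features described above, the following sketch computes, for every joint i, the differences between its position and the position of every other joint j. The function name `pairwise_relative_positions` and the dictionary output layout are hypothetical choices for this example.

```python
def pairwise_relative_positions(joints):
    """Compute position differences between every pair of distinct joints.

    joints: list of (x, y, z) tuples, one per joint.
    Returns a dict mapping joint index i to the list of differences
    (joint i minus joint j) for every j != i, in increasing order of j.
    """
    features = {}
    for i, pi in enumerate(joints):
        features[i] = [
            tuple(a - b for a, b in zip(pi, pj))
            for j, pj in enumerate(joints)
            if j != i
        ]
    return features

# Example with three joints.
joints = [(0, 0, 0), (1, 0, 0), (0, 2, 0)]
feats = pairwise_relative_positions(joints)
```

Since each feature is a difference of two positions, shifting the whole skeleton by a constant offset leaves the features unchanged, which is the invariance to absolute body position mentioned above.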
APJ3D
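The same torso basis is used to compute the first-degree joints.
The second-degree joints are computed using a rotated, orthonormal torso basis. For example, let V be the vector from the right shoulder to the right elbow and W the vector from the right elbow to the right hand; the goal is to obtain the features of the right hand. First rotate the torso basis so that the rotated basis is moved to the right elbow, then define a spherical coordinate system there.
Each joint corresponds to two coordinates in the spherical coordinate system. We also compute the angle η between the directional vector z from the RGB-D sensor and the inverted vector t from the torso basis, to detect torso inclinations. The final body joint angle information is represented as:
Afterward, we select the following pairwise relative position features:
m: the relative position between the torso center and the hands
n: the relative position between the torso center and the feet
Thus, we use the resulting vector as the features for the action.
The final APJ3D feature: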
Fourier pyramid
We propose to use the improved Fourier Temporal Pyramid to represent the temporal dynamics of these frame-level features, and to solve the problem of temporal intervals.
Each action appears as a continuously changing sequence of APJ3D features; after K-means clustering, each action is represented as a series of key postures.
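The clustering step can be sketched as follows: frame-level feature vectors are grouped with K-means, and each frame is then replaced by the index of its nearest cluster center (its key posture). This is a plain textbook K-means written for illustration, with hypothetical names (`kmeans`, `assign`) and toy 2D points standing in for APJ3D vectors; the paper's initialization and the value of K are not given in these notes.

```python
import random

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    def sq_dist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points]

def kmeans(points, k, iters=20, seed=0):
    """Basic K-means. Returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumed init: k random data points
    for _ in range(iters):
        labels = assign(points, centers)
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return centers, assign(points, centers)

# Toy frame features: two well-separated posture groups.
frames = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, key_postures = kmeans(frames, k=2)
```

The list `key_postures` is the per-frame sequence of cluster indices, which is the form in which an action is passed on to the temporal modelling step.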
In order to capture the temporal structure of the action, apart from the global Fourier coefficients, we recursively partition the action into a pyramid and use the Short-Time Fourier Transform for all the segments. The final feature is the concatenation of the Fourier coefficients from all the segments.
The improvement is as follows:
For each key posture s, let g = (p, v) denote its overall feature vector, where p is its 3D pairwise position vector and v is its 3D joint angle vector. Note that each element of g is a function of time, which we can write as g(t). For each time segment at each pyramid level, we apply the Short-Time Fourier Transform to the element and obtain its Fourier coefficients, and we use both its high-frequency and low-frequency coefficients as features.
The low-frequency features stay robust to noise, while the high-frequency features capture abrupt changes in the action.
After the Fourier transform, the features are no longer sensitive to temporal misalignment, because time series that differ by a temporal translation have the same Fourier coefficient magnitudes, and the temporal structure of the actions can be characterized by the pyramid structure.
In the implementation, each action is divided into a 4-level pyramid.
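A rough sketch of the pyramid idea for a single feature element over time is given below. It is a simplification under stated assumptions: it uses a plain DFT on each segment rather than a windowed short-time transform, and it keeps only the first few (low-frequency) magnitudes, whereas the improved version described above also keeps high-frequency coefficients. The names `dft_magnitudes` and `fourier_temporal_pyramid` and the parameter `n_coeffs` are hypothetical.

```python
import cmath

def dft_magnitudes(signal, n_coeffs):
    """Magnitudes of the first n_coeffs DFT coefficients of a sequence."""
    n = len(signal)
    mags = []
    for k in range(min(n_coeffs, n)):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))
        mags.append(abs(coeff))
    return mags

def fourier_temporal_pyramid(signal, levels=4, n_coeffs=4):
    """Concatenate per-segment DFT magnitudes over a temporal pyramid.

    Level l (starting at 0) splits the sequence into 2**l contiguous
    segments of roughly equal length.
    """
    features = []
    n = len(signal)
    for level in range(levels):
        parts = 2 ** level
        for s in range(parts):
            segment = signal[s * n // parts:(s + 1) * n // parts]
            if segment:  # skip empty segments for very short sequences
                features.extend(dft_magnitudes(segment, n_coeffs))
    return features

# One feature element sampled over 16 frames.
series = [float((t * t) % 7) for t in range(16)]
pyramid = fourier_temporal_pyramid(series, levels=4, n_coeffs=4)
```

Taking magnitudes is what gives the insensitivity to temporal shifts: a circular shift of a segment changes only the phase of each DFT coefficient, not its magnitude. The finer pyramid levels then recover some of the ordering information that the global magnitudes discard.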
Random forest training
The features extracted from the training sets are used to train the random forests classifier, which is an ensemble of randomized decision trees. In each decision tree, W segment features are randomly selected from the training sets, placed at the root node, and mapped to a set of terminating leaf nodes by the interior binary split nodes. At each interior node, f variables are randomly selected out of the F feature dimensions, and the decision threshold T is chosen within the corresponding range. The splitting function is defined as:
To measure the training quality of each leaf node, that is, the proportion of segments from sequences of the same action that fall into the same leaf node, the information gain is defined at each split node:
Information gain
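Since the splitting function and the information gain formula did not survive in these notes, the sketch below falls back on the standard Shannon-entropy information gain for a single-dimension threshold split. It should be read as the usual textbook definition, not necessarily the paper's exact one; the function names and the rule that values below the threshold go left are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

def split_labels(samples, labels, dim, threshold):
    """Assumed split rule: feature value < threshold goes left."""
    left = [y for x, y in zip(samples, labels) if x[dim] < threshold]
    right = [y for x, y in zip(samples, labels) if x[dim] >= threshold]
    return left, right

def information_gain(samples, labels, dim, threshold):
    """Entropy of the parent minus the weighted entropy of the children."""
    left, right = split_labels(samples, labels, dim, threshold)
    n = len(labels)
    return (entropy(labels)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))

# Toy segment features with one dimension and two action classes.
samples = [(0.1,), (0.2,), (0.8,), (0.9,)]
labels = ["wave", "wave", "kick", "kick"]
gain = information_gain(samples, labels, dim=0, threshold=0.5)
```

A split node would evaluate this gain for each of the f randomly chosen dimensions and candidate thresholds, and keep the pair with the highest gain.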
In the testing stage, each segment feature is pushed to the root node of each decision tree in the random forests classifier and eventually forwarded to a terminating leaf node. The path between a root node and a terminating leaf node consists of a set of split nodes, and each split node contains a binary splitting function.
When the segment feature drops into a terminating leaf node, a histogram P gives the proportion of segments per class label that fell into this leaf node during the training stage, which is the soft voting result of that decision tree. Finally, the prediction histogram of the whole forest is obtained by summing the voting histograms from all the decision trees:
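The soft voting step can be sketched as below: each tree contributes the class histogram stored at the leaf that the segment reached, and the forest sums these histograms. The function name `forest_predict`, the dictionary representation of a histogram, and the final arg-max over the summed histogram are illustrative assumptions.

```python
def forest_predict(leaf_histograms):
    """Sum per-tree leaf histograms and pick the highest-scoring class.

    leaf_histograms: list of dicts, one per tree, mapping a class label
    to the proportion of training segments with that label in the leaf.
    Returns (best_label, summed_histogram).
    """
    total = {}
    for hist in leaf_histograms:
        for label, proportion in hist.items():
            total[label] = total.get(label, 0.0) + proportion
    best = max(total, key=total.get)
    return best, total

# Votes from three trees for one test segment.
votes = [
    {"wave": 0.7, "kick": 0.3},
    {"wave": 0.4, "kick": 0.6},
    {"wave": 0.9, "kick": 0.1},
]
label, summed = forest_predict(votes)
```

Summing proportions instead of counting one hard vote per tree lets a tree that is unsure about a segment contribute less to the final decision.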
Thanks to the Fourier transform, the whole recognition system's robustness to noise is rock solid.
http://ojs.academypublisher.com/index.php/jsw/article/view/jsw080922382245