Action Recognition with HOJ3D and HMM

http://cvrc.ece.utexas.edu/Publications/Xia_HAU3D12.pdf

View Invariant Human Action Recognition Using Histograms of 3D Joints

The HOJ3D computed from the action depth sequences are reprojected using LDA and then clustered into k posture visual words, which represent the prototypical poses of actions. The temporal evolutions of those visual words are modeled by discrete hidden Markov models (HMMs).

Feature definition

In this representation, the 3D space is partitioned into n bins using a modified spherical coordinate system. We manually select 12 informative joints to build a compact representation of human posture. To make our representation robust against minor posture variation, votes of 3D skeletal joints are cast into neighboring bins using a Gaussian weight function.

We acquire the 3D locations of 20 skeletal joints, which comprise hip center, spine, shoulder center, head, L/R shoulder, L/R elbow, L/R wrist, L/R hand, L/R hip, L/R knee, L/R ankle and L/R foot.

We compute our histogram-based representation of postures from 12 of the 20 joints, including head, L/R elbow, L/R hands, L/R knee, L/R feet, hip center and L/R hip. We take the hip center as the center of the reference coordinate system, and define the x-direction according to L/R hip. The remaining 9 joints are used to compute the 3D spatial histogram.

To achieve view invariance (the same posture seen from different viewpoints is classified the same way): We achieve this by aligning our spherical coordinates with the person's specific direction. We define the center of the spherical coordinates as the hip center joint. Define the horizontal reference vector α to be the vector from the left hip to the right hip projected on the horizontal plane (parallel to the ground), and the zenith reference vector θ as the vector that is perpendicular to the ground plane and passes through the coordinate center.
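To make the alignment concrete, here is a minimal sketch of how the two angles of a joint could be computed in this person-centered frame. The function name joint_angles and the axis convention (y points up, so the ground plane is the x-z plane) are assumptions for illustration, not something the paper specifies.

```python
import math

def joint_angles(joint, hip_center, left_hip, right_hip):
    """Return (alpha, theta) in degrees for one joint.

    All points are (x, y, z) tuples. Assumption: the y axis points up,
    so the ground plane is the x-z plane.
    """
    # Vector from the hip center (origin of the spherical system) to the joint.
    vx, vy, vz = (joint[i] - hip_center[i] for i in range(3))
    # Horizontal reference: left hip -> right hip, projected on the ground plane.
    rx = right_hip[0] - left_hip[0]
    rz = right_hip[2] - left_hip[2]
    # Azimuth: angle in the ground plane, measured from the reference vector.
    alpha = math.degrees(math.atan2(vz, vx) - math.atan2(rz, rx)) % 360.0
    # Inclination: angle between the joint vector and the zenith (up) axis.
    r = math.sqrt(vx * vx + vy * vy + vz * vz)
    cos_theta = max(-1.0, min(1.0, vy / r)) if r > 0 else 1.0
    theta = math.degrees(math.acos(cos_theta))
    return alpha, theta
```

Because the azimuth is measured relative to the hip line rather than the camera axes, rotating the whole skeleton about the vertical axis leaves both angles unchanged.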

Partition the 3D space into n bins:

The inclination angle is divided into 7 bins from the zenith vector θ: [0, 15], [15, 45], [45, 75], [75, 105], [105, 135], [135, 165], [165, 180].
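A bin lookup for this partition might look like the sketch below. It assumes the seven inclination bins tile [0, 180] contiguously (edges at 0, 15, 45, 75, 105, 135, 165, 180) and that the azimuth is split into 12 equal bins of 30 degrees, which matches the 12 x 7 = 84 layout noted further down; bin_index is a hypothetical helper name.

```python
from bisect import bisect_right

# Inclination bin edges in degrees, measured from the zenith vector.
THETA_EDGES = [0, 15, 45, 75, 105, 135, 165, 180]
N_ALPHA_BINS = 12  # assumption: equal 30-degree azimuth bins

def bin_index(alpha, theta):
    """Map angles in degrees to (azimuth_bin, inclination_bin)."""
    a = int((alpha % 360.0) // (360.0 / N_ALPHA_BINS))
    # bisect_right sends a value sitting on an inner edge to the upper bin;
    # min() keeps theta == 180 inside the last bin.
    t = min(bisect_right(THETA_EDGES, theta) - 1, len(THETA_EDGES) - 2)
    return a, t
```

The inclination bins are not uniform (the two polar caps are only 15 degrees wide), which is why a sorted-edge lookup is used instead of a simple division.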

Our HOJ3D descriptor is computed by casting the remaining 9 joints into the corresponding spatial histogram bins.

To make the representation robust against minor errors of joint locations, we vote the 3D bins using a Gaussian weight function.

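For each joint, we only vote over the bin it is in and the 8 neighboring bins. We calculate the probabilistic voting on θ and α separately since they are independent (see Fig. 4). The probabilistic voting for each of the 9 bins is the product of the probability on α direction and θ direction. Let the joint location be (μ_α, μ_θ). The vote of a joint location to bin [α1, α2] is p(α1 < α ≤ α2; μ_α, σ) = Φ(α2; μ_α, σ) − Φ(α1; μ_α, σ), where Φ is the CDF of the Gaussian distribution; the vote to a bin [θ1, θ2] is computed in the same way.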
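One natural reading of this probabilistic vote, assumed here, is the Gaussian probability mass that falls inside a bin, i.e. the difference of the Gaussian CDF at the bin's two edges. The sketch below shows the one-dimensional case with the standard library's math.erf; the spread sigma = 10 degrees and the example joint angle are hypothetical values chosen for illustration.

```python
import math

def gaussian_cdf(x, mu, sigma):
    """CDF of a normal distribution N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def bin_vote(lo, hi, mu, sigma):
    """Probability mass of N(mu, sigma^2) that falls inside [lo, hi]."""
    return gaussian_cdf(hi, mu, sigma) - gaussian_cdf(lo, mu, sigma)

# A joint at theta = 50 degrees lies in [45, 75]; it also votes,
# with smaller weights, for the two neighboring inclination bins.
votes = [bin_vote(lo, hi, mu=50.0, sigma=10.0)
         for lo, hi in [(15, 45), (45, 75), (75, 105)]]
```

Since the joint sits close to the lower edge of its own bin, the lower neighbor receives a noticeably larger share than the upper one, which is exactly the smoothing effect the voting is meant to provide.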

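Input: 20 × 3 (20 joints, each with x, y, z coordinates in 3D space); output: an 84-dimensional HOJ3D feature.

The feature is an 84-dimensional vector: 12 bins in the horizontal direction times 7 bins in the vertical direction.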

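1. Compute the local coordinates of the 12 joints: (1) compute the transformed coordinates according to the direction of the line connecting L_HIP and R_HIP; (2) compute the coordinates relative to HIP_CENTER.

2. Then compute the two deflection angles, relative to vector α and vector θ.

3. For each joint, vote into the bin it belongs to and that bin's 8 neighbors, weighting each vote by the product of the one-dimensional Gaussian distributions along the two directions.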
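The three steps above can be sketched end to end as follows, starting from angles that have already been computed for the voting joints. This is only an illustration under stated assumptions: 12 equal azimuth bins, contiguous inclination edges, a hypothetical SIGMA of 10 degrees, votes taken as Gaussian mass inside each bin, and an azimuth-major flat layout (index = azimuth_bin * 7 + inclination_bin). None of these details are confirmed by the notes.

```python
import math

THETA_EDGES = [0, 15, 45, 75, 105, 135, 165, 180]  # 7 inclination bins
N_ALPHA = 12    # assumption: 12 azimuth bins of 30 degrees
SIGMA = 10.0    # hypothetical Gaussian spread, in degrees

def _cdf(x, mu, sigma=SIGMA):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def hoj3d(angles):
    """angles: list of (alpha, theta) in degrees, one pair per voting joint.
    Returns a flat histogram of length 12 * 7 = 84."""
    n_theta = len(THETA_EDGES) - 1
    hist = [0.0] * (N_ALPHA * n_theta)
    width = 360.0 / N_ALPHA
    for alpha, theta in angles:
        alpha = alpha % 360.0
        a0 = int(alpha // width)
        # Highest inclination bin whose lower edge is <= theta.
        t0 = next(i for i in range(n_theta - 1, -1, -1) if theta >= THETA_EDGES[i])
        for da in (-1, 0, 1):
            # Unwrapped edges, so the Gaussian sees a continuous azimuth axis.
            lo = (a0 + da) * width
            p_alpha = _cdf(lo + width, alpha) - _cdf(lo, alpha)
            a = (a0 + da) % N_ALPHA  # azimuth wraps around
            for dt in (-1, 0, 1):
                t = t0 + dt
                if not 0 <= t < n_theta:
                    continue  # no neighbor beyond the poles
                p_theta = _cdf(THETA_EDGES[t + 1], theta) - _cdf(THETA_EDGES[t], theta)
                hist[a * n_theta + t] += p_alpha * p_theta
    return hist
```

Each joint spreads a total weight of slightly less than 1 over at most nine bins, so nine voting joints produce a sparse 84-dimensional vector whose entries sum to just under 9.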

Feature dimensionality reduction

Linear discriminant analysis (LDA) is performed to extract the dominant features.

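The purpose of the reduction is to keep the 9 dimensions that discriminate best between classes.

Input: the 84-dimensional HOJ3D feature; output: a 9-dimensional reduced feature.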

Feature clustering

We cluster the vectors into K clusters (a K-word vocabulary) using K-means. Then each posture is represented as a single number of a visual word.

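Clustering reduces the number of distinct observation symbols. In the training stage, all observations (all actions, each consisting of several frames, with each frame's 20 skeletal joints reduced by LDA to 9 dimensions) are clustered in the 9-dimensional space, which yields the coordinates (9-dimensional) of 25 cluster centers, numbered in order.

In the recognition stage, each feature obtained after LDA is assigned to its nearest cluster center, and the label of that center is recorded as the input to the HMM.

Training stage: the input is the 9-dimensional features of all actions; the output is the 25 cluster centers.

Recognition stage: the input is each frame's action feature (9-dimensional); the output is the label of the nearest cluster center.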
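The recognition-stage assignment is a plain nearest-neighbor lookup. A minimal sketch, assuming squared Euclidean distance and hypothetical helper names nearest_center and quantize (the cluster centers themselves would come from K-means in the training stage):

```python
def nearest_center(feature, centers):
    """Return the index of the center closest to feature (squared Euclidean)."""
    def sq_dist(center):
        return sum((f - c) ** 2 for f, c in zip(feature, center))
    return min(range(len(centers)), key=lambda k: sq_dist(centers[k]))

def quantize(sequence, centers):
    """Turn a sequence of per-frame features into a sequence of word labels."""
    return [nearest_center(feature, centers) for feature in sequence]
```

With 25 centers, an action of T frames becomes a list of T integers in the range 0 to 24, which is the discrete observation sequence the HMM consumes.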

Action recognition

The HMM gives a state-based representation for each action. After forming the models for each activity, we take an action sequence and calculate the probability of the observation sequence under every model, which can be solved using the forward algorithm. Then we classify the