A 3D tongue model based on MRI data

Abstract

A new three-dimensional tongue model has been developed within the KTH 3D vocal tract project using manually extracted tongue contours from MR images of a reference subject producing 43 artificially sustained Swedish articulations. The six linear parameters jaw height, tongue body, tongue dorsum, tongue tip, tongue advance and tongue width were determined using an ordered linear factor analysis controlled by articulatory measures. The first five factors of the analysis explained 88% of the variation in the midsagittal plane and 78% of the overall sagittal variation. The six-parameter model is able to reconstruct the modelled articulations in 3D with an overall RMS reconstruction error of 0.13 cm sagittally and 0.12 cm laterally, and it specifically handles lateral differences and the observed asymmetries in tongue shape.

Figure 1. Examples of tongue contours in / /: the axial set (top left), the oblique set (top right), the coronal set (bottom left) and the acquisition grid (bottom right).

In the subsequent transformation to the KTH 3D model, the overlap was removed by limiting the tongue contours in the axial and semi-polar parts of the grid by the first grid-plane in the alveolar part of the grid (plane 23 in Fig. 2a). The resulting contours were then resampled to equally spaced points, such that the half-contours in the axial and the semi-polar parts of the grid have 18 evenly spread points and those of the frontal part 30. A polygon mesh of 420 vertices and about 800 polygons was constructed by connecting each vertex (vi) to its neighbour in the same grid-plane (vi+1) and to the corresponding vertex (vj) and its neighbour (vj+1) on the previous grid-plane (cf. Fig. 2b). The neutral tongue shape (subject at rest with closed jaw) was used as the reference shape for the polygon model as well as in the parameter extraction process.
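
As an illustration of the resampling step, the following minimal Python sketch redistributes the points of one contour so that they are equally spaced in arc length. It assumes plain linear interpolation along the polyline. The function name resample_contour is hypothetical, and the text does not specify the procedure actually used for the KTH model.

```python
import math


def resample_contour(points, n):
    """Return n points equally spaced in arc length along a polyline.

    points: sequence of coordinate tuples (2D or 3D) describing one contour.
    n:      number of output points (e.g. 18 or 30 for a half-contour).
    Assumption: linear interpolation between the original contour points.
    """
    if n < 2 or len(points) < 2:
        raise ValueError("need at least two input points and two output points")

    # Cumulative arc length at every original point.
    cum = [0.0]
    for a, b in zip(points, points[1:]):
        cum.append(cum[-1] + math.dist(a, b))
    total = cum[-1]
    if total == 0.0:
        raise ValueError("contour has zero length")

    resampled = []
    seg = 0  # index of the segment [points[seg], points[seg + 1]] in use
    for k in range(n):
        s = total * k / (n - 1)  # target arc length of output point k
        # Advance to the segment that contains arc length s.
        while seg < len(points) - 2 and cum[seg + 1] < s:
            seg += 1
        span = cum[seg + 1] - cum[seg]
        t = 0.0 if span == 0.0 else (s - cum[seg]) / span
        a, b = points[seg], points[seg + 1]
        resampled.append(tuple(pa + t * (pb - pa) for pa, pb in zip(a, b)))
    return resampled
```

Calling resample_contour(contour, 18) on a half-contour from the axial or semi-polar part of the grid, or resample_contour(contour, 30) on one from the frontal part, would give the point counts quoted above.
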
All other articulations were hence modelled as deformations from the neutral shape using the articulatory control parameters defined in the component analysis described below.

3. THE ARTICULATORY MODEL

3.1 The Linear Component Analysis

The model's parameters were extracted by decomposing the geometrical points of the tongue into linear components through a Linear Component Analysis (LCA), where the factors to be extracted were imposed on the model using articulatory measures from the MR images. LCA was chosen instead of PCA, at the cost of sub-optimal data variance explanation, as it suited the control parameter definitions in the KTH Vocal Tract model better. The order in which factors were extracted, the region of influence and the direction of activation were chosen based on an earlier extraction of the midsagittal tongue control parameters for the same subject ([9]), employing guided PCA ([10]), i.e. pure PCA alternated with LCA. Five parameters, JH, TB, TD, TT and TA, were determined in the same order as for the midsagittal model, and a sixth parameter, tongue width (TW), was added to account for width variations of the blade and tip.

3.2 Parameter Definitions

The parameters in the KTH visual speech synthesis system ([11]) are defined using the activation A (-1<A<1) of the movement of a prototype vertex (P) towards a target vertex (T) and a weight vector W, whose element Wi determines the influence of the parameter on vertex i of the mesh. The only deformation type used in the present tongue model is translation, and the displacement (∆xi, ∆yi, ∆zi) of vertex i due to an activation A is hence

(∆xi, ∆yi, ∆zi) = A · Wi · (Tx - Px, Ty - Py, Tz - Pz).

All tongue control parameters have been redefined and reduced in number compared to those in [8], as the asymmetric model and statistically defined weight vectors allow for lateral variations, such that tongue grooving and lateral asymmetries are handled automatically by the midsagittal control parameters.
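
The displacement rule of Section 3.2 can be sketched in a few lines of Python. The function name apply_parameter and the list-based data layout are assumptions made for illustration and are not taken from the KTH system. The closed range check on A is likewise an assumption, chosen because the maximal jaw opening is assigned a JH value of 1.0.

```python
def apply_parameter(vertices, weights, prototype, target, activation):
    """Translate each vertex i by activation * weights[i] * (target - prototype).

    vertices:   list of (x, y, z) tuples, e.g. the neutral reference shape.
    weights:    one scalar weight per vertex (the weight vector W).
    prototype:  (x, y, z) of the prototype vertex P.
    target:     (x, y, z) of the target vertex T.
    activation: the activation A of the parameter.
    """
    if len(vertices) != len(weights):
        raise ValueError("one weight per vertex is required")
    # Assumption: the end points -1 and 1 are accepted as valid activations.
    if not -1.0 <= activation <= 1.0:
        raise ValueError("activation must lie in [-1, 1]")

    # Direction of movement, shared by all vertices: T - P.
    dx, dy, dz = (t - p for t, p in zip(target, prototype))

    return [
        (x + activation * w * dx, y + activation * w * dy, z + activation * w * dz)
        for (x, y, z), w in zip(vertices, weights)
    ]
```

Because the only deformation type is translation, the effect of several parameters could be obtained under this sketch by summing the displacements that each parameter produces on the reference shape.
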
The five parameters JH, TB, TD, TT and TA were defined by midsagittal translations, with midsagittal prototypes and targets, whereas TW was defined orthogonal to the midsagittal plane. The prototype, target and activation were set according to articulatory measures, and the weight vector of each parameter was extracted through the factor analysis, minimising the difference between the Cartesian vertex coordinates of the reference shape and those of the corpus in the least-squares sense. When one parameter had been extracted, its contribution was subtracted from all the articulations of the corpus and the next parameter was determined using the residual.

Jaw height (JH) was defined as a linear translation such that its vertical component is equal to the articulatory measure JawHei, the jaw height, and its horizontal component is the value of JawAdv, the jaw advance, predicted from JawHei ([9]). The configuration with the maximal jaw height has a JH value of 1.0 in the vertical direction, and the JH values of the other articulations are proportional to the JawHei ratio. JawHei was measured in the MR images as the increase in the mean distance ∆si between the centres of gravity of the frontal air sinuses of the nose and the pulp of the lower incisors, compared to the reference with closed jaw. The weight vector wJH is laterally asymmetric, reflecting the fact that the subject consistently lowers the left tongue edge more than the right, as shown in the nomogram in Fig. 3a.

Tongue body (TB) controls the front-back movement of the tongue (corresponding to genioglossus activation), raising the tongue relative to the palate and at the same time contracting in the pharynx. In the present model, TB consists of two correlated translational deformations, using separate prototypes and targets for the oral and pharyngeal parts, as the parameters in the model are limited to uni-directional translations. TB was determined as the normalised deviation from the reference shape of the measure TngBody.
TngBody is the Euclidean distance from the grid centre to the midsagittal tongue contour along the median line of maximal deviations in the alveopalatal part. The range of TB is shown in Fig. 3b.

Tongue dorsum (TD) controls the velar arching of the tongue body (corresponding to activation of the styloglossus). The activation of TD for each articulation is given by the articulatory measure TngDors: the mean value of the Euclidean distance from the grid centre to seven midsagittal points in the velar region. TngDors, centred on the reference shape and normalised, determines TD, which also controls the grooving of the tongue: the groove increases with decreasing TD, as shown in Fig. 3c.

Figure 2. a) The ICP grid and the initial contour overlap in / /. (Labels in the figure: grid-plane 23 and vertices vi, vi+1, vj, vj+1.)
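
The sequential extraction described in Section 3.2 (fit the weight vector of one parameter by least squares, subtract its contribution, continue with the residual) can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis code used for the model: each parameter is taken to be a single translation along a fixed direction T - P with known activations per articulation, and all function names are hypothetical.

```python
def fit_weights(deviations, activations, direction):
    """Least-squares weight per vertex for one parameter.

    deviations[k][i] is the (x, y, z) displacement of vertex i in
    articulation k relative to the reference shape; activations[k] is the
    activation imposed from the articulatory measure; direction is T - P.
    Minimising sum_k |d_ki - a_k * W_i * u|^2 over W_i gives
    W_i = sum_k a_k (u . d_ki) / (sum_k a_k^2 * |u|^2).
    """
    norm = sum(a * a for a in activations) * sum(u * u for u in direction)
    if norm == 0.0:
        raise ValueError("activations and direction must be non-zero")
    weights = []
    for i in range(len(deviations[0])):
        num = sum(
            a * sum(d * u for d, u in zip(dev[i], direction))
            for a, dev in zip(activations, deviations)
        )
        weights.append(num / norm)
    return weights


def remove_contribution(deviations, activations, weights, direction):
    """Subtract a_k * W_i * u from every vertex of every articulation."""
    return [
        [
            tuple(d - a * w * u for d, u in zip(dev[i], direction))
            for i, w in enumerate(weights)
        ]
        for a, dev in zip(activations, deviations)
    ]


def extract_sequentially(deviations, parameters):
    """Extract parameters in the given order, each from the residual.

    parameters: list of (name, activations, direction) tuples, ordered as
    the factors are to be extracted (e.g. JH first).
    Returns the fitted weight vectors by name and the final residual.
    """
    residual = deviations
    model = {}
    for name, activations, direction in parameters:
        weights = fit_weights(residual, activations, direction)
        model[name] = weights
        residual = remove_contribution(residual, activations, weights, direction)
    return model, residual
```

Under these assumptions the extraction order matters: variance that an early parameter can account for is removed before later parameters are fitted, consistent with an ordered analysis that trades some explained variance for articulatorily meaningful controls.
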
