People use a combination of language and gestures to convey intentions, making the generation of natural co-speech gestures a challenging task. In audio-driven gesture generation, relying solely on features extracted from raw audio waveforms limits the model's ability to fully learn the joint distribution between audio and gestures. To address this limitation, we integrated key features from both raw audio waveforms and Mel-spectrograms. Specifically, we employed cascaded 1D convolutions to extract features from the audio waveform and a two-stage attention mechanism to capture features from the Mel-spectrogram. The fused features were then input into a Transformer with cross-dimension attention for sequence modeling, which mitigated accumulated non-autoregressive errors and reduced redundant information. We developed a diffusion model-based Audio to Diffusion Gesture (A2DG) generation pipeline capable of producing high-quality and diverse gestures. Our method demonstrated superior performance in extensive experiments compared to established baselines. Regarding the TED Gesture and TED Expressive datasets, the Fréchet Gesture Distance (FGD) performance improved by 16.8 and 56%, respectively. Additionally, a user study validated that the co-speech gestures generated by our method are more vivid and realistic.
Citation: Hongze Yao, Yingting Xu, Weitao WU, Huabin He, Wen Ren, Zhiming Cai. Audio2DiffuGesture: Generating a diverse co-speech gesture based on a diffusion model[J]. Electronic Research Archive, 2024, 32(9): 5392-5408. doi: 10.3934/era.2024250
Path algebras are very important in representation theory and related mathematical fields, and it is interesting to study their counterparts in higher representation theory [21,20]. The
It is well known that the path algebras of acyclic quivers are classified as finite, tame and wild representation types according to their quivers. This classification has a great influence in representation theory. When Iyama and his coauthors study
Regarding
The McKay quiver of a finite subgroup of a general linear group was introduced in [27]; it connects representation theory, singularity theory and many other mathematical fields. The McKay quiver is also of great interest in the study of the higher representation theory of algebras [20,15]. Over an algebraically closed field of characteristic zero, the preprojective algebras of path algebras of tame type are Morita equivalent to the skew group algebras of finite subgroups of
We also describe the quivers and relations for the tame
The paper is organized as follows. In Section 2, concepts and results needed in this paper are recalled. We recall the constructions of the
In this paper, we assume that
Recall that a bound quiver
Let
A bound quiver
$$\rho^{\perp}=\bigcup_{i,j\in Q_0}\rho^{\perp}_{i,j}. \tag{1}$$
The quadratic dual quiver of
To define and study
Recall that a bound quiver
Let
The
With an
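$$(\mathbb{Z}|_{n-1}Q)_0=\{(u,t)\mid u\in Q_0,\ t\in\mathbb{Z}\}$$
and arrow set
$$(\mathbb{Z}|_{n-1}Q)_1=\mathbb{Z}\times Q_1\cup\mathbb{Z}\times Q_{1,M},$$
with
$$\mathbb{Z}\times Q_1=\{(\alpha,t):(i,t)\longrightarrow(i',t)\mid \alpha:i\longrightarrow i'\in Q_1,\ t\in\mathbb{Z}\},$$
and
$$\mathbb{Z}\times Q_{1,M}=\{(\beta_p,t):(i',t)\longrightarrow(i,t+1)\mid p\in M,\ s(p)=i,\ t(p)=i'\}.$$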
The relation set of
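$$\mathbb{Z}\rho=\Big\{\sum_{s}a_s(\alpha_s,t)(\alpha'_s,t)\ \Big|\ \sum_{s}a_s\alpha_s\alpha'_s\in\rho,\ t\in\mathbb{Z}\Big\},$$
$$\mathbb{Z}\rho_{M}=\{(\beta_{p'},t)(\beta_p,t+1)\mid \beta_{p'}\beta_p\in\rho_M,\ t\in\mathbb{Z}\},$$
and
$$\mathbb{Z}\rho_0=\Big\{\sum_{s'}a_{s'}(\beta_{p'_{s'}},t)(\alpha'_{s'},t+1)+\sum_{s}b_s(\alpha_s,t)(\beta_{p_s},t)\ \Big|\ \sum_{s'}a_{s'}\beta_{p'_{s'}}\alpha'_{s'}+\sum_{s}b_s\alpha_s\beta_{p_s}\in\rho_0,\ t\in\mathbb{Z}\Big\},$$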
when the returning arrow quiver
Recall a complete
Given a finite stable
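$$\mathbb{Z}_{\diamond}\tilde{Q}_0=\{(i,n)\mid i\in Q_0,\ n\in\mathbb{Z}\};$$
and the arrow set
$$\mathbb{Z}_{\diamond}\tilde{Q}_1=\{(\alpha,n):(i,n)\to(j,n+1)\mid \alpha:i\to j\in Q_1,\ n\in\mathbb{Z}\}.$$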
If
$$\rho_{\mathbb{Z}_{\diamond}\tilde{Q}}=\{\zeta[m]\mid \zeta\in\tilde{\rho},\ m\in\mathbb{Z}\} \tag{2}$$
here
A connected quiver
Clearly, a nicely graded quiver is acyclic.
Proposition 2.1. 1. Let
2. All the connected components of
3. Each connected component of
Proof. The first and the second assertions follow from Proposition 4.3 and Proposition 4.5 of [12], respectively. The last follows from the definition of
An
Proposition 2.2. Assume that
Proof. Let
$$\phi(i,t)=(i,\ t(n+1)+d(i)-d(i_0)).$$
It is easy to see that
By definition, an
Proposition 2.3. If
Let
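$$\mathbb{Z}_{\diamond}Q_0[m,l]=\{(i,t)\mid m\le t\le l\}.$$
$$\rho_{\mathbb{Z}_{\diamond}\tilde{Q}}[m,l]=\{x\mid x\in\rho_{\mathbb{Z}_{\diamond}\tilde{Q}},\ s(x),t(x)\in\mathbb{Z}_{\diamond}Q_0[m,l]\}. \tag{3}$$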
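The construction of the vertex set, the arrow set and the window $[m,l]$ can be illustrated on a small example. The following is a minimal sketch, not the authors' implementation: the function name `z_diamond_window`, the triple encoding of arrows and the input quiver (an oriented 3-cycle) are assumptions made only for illustration.

```python
from itertools import product

def z_diamond_window(vertices, arrows, m, l):
    """Finite window [m, l] of the quiver with vertices (i, t) and arrows
    (alpha, t): (i, t) -> (j, t + 1) for every arrow alpha: i -> j."""
    window_vertices = [(i, t) for i, t in product(vertices, range(m, l + 1))]
    window_arrows = [
        ((name, t), (src, t), (dst, t + 1))
        for (name, src, dst) in arrows
        for t in range(m, l)  # the target level t + 1 must stay <= l
    ]
    return window_vertices, window_arrows

# Hypothetical input: an oriented 3-cycle 0 -> 1 -> 2 -> 0.
cycle_vertices = [0, 1, 2]
cycle_arrows = [("a0", 0, 1), ("a1", 1, 2), ("a2", 2, 0)]
V, A = z_diamond_window(cycle_vertices, cycle_arrows, 0, 2)
```

Every arrow raises the second coordinate by one, so no oriented cycle can occur in the window even though the input quiver is itself a cycle.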
By Proposition 6.2 of [12], we get the following for any
Proposition 2.4.
The complete
Starting with an acyclic
We have the following picture to illustrate the relationship among these quivers.
[Diagram (4): the relationship among these quivers; image not recovered.]
The quadratic dual quiver
Recall that a graded algebra
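$$\cdots\longrightarrow P^{t}\stackrel{f_t}{\longrightarrow}\cdots\longrightarrow P^{1}\stackrel{f_1}{\longrightarrow}P^{0}\stackrel{f_0}{\longrightarrow}\tilde{\Lambda}_0\longrightarrow 0, \tag{5}$$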
such that
Let
The following Proposition justifies the name
Proposition 2.5. Let
Starting with a quadratic acyclic
With the quivers related to
We also associate algebras to the bound quivers
Taking the usual path grading on
So we can get criteria for
Proposition 2.6. Let
Write
[Diagram (6): the relationship among the algebras associated to these bound quivers; image not recovered.]
The left triangle is induced by the picture (4) depicting quivers. The left vertical upward arrow indicates taking a
Now we assume that
For
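$$A_t(\tilde{\Lambda})=\begin{pmatrix}
\dim_k e_1\tilde{\Lambda}_t e_1 & \dim_k e_2\tilde{\Lambda}_t e_1 & \cdots & \dim_k e_m\tilde{\Lambda}_t e_1\\
\dim_k e_1\tilde{\Lambda}_t e_2 & \dim_k e_2\tilde{\Lambda}_t e_2 & \cdots & \dim_k e_m\tilde{\Lambda}_t e_2\\
\cdots & \cdots & \cdots & \cdots\\
\dim_k e_1\tilde{\Lambda}_t e_m & \dim_k e_2\tilde{\Lambda}_t e_m & \cdots & \dim_k e_m\tilde{\Lambda}_t e_m
\end{pmatrix}.$$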
Let
The Loewy matrix
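$$L(\tilde{\Lambda})=\begin{pmatrix}
A_1(\tilde{\Lambda}) & -E & 0 & \cdots & 0 & 0\\
A_2(\tilde{\Lambda}) & 0 & -E & \cdots & 0 & 0\\
\cdot & \cdot & \cdot & \cdots & \cdot & \cdot\\
A_n(\tilde{\Lambda}) & 0 & 0 & \cdots & 0 & -E\\
A_{n+1}(\tilde{\Lambda}) & 0 & 0 & \cdots & 0 & 0
\end{pmatrix} \tag{7}$$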
with size
Let
$$\mathrm{l.dim}\,M=\begin{pmatrix}\dim M_h\\ \vdots\\ \dim M_{h+n}\end{pmatrix}, \tag{8}$$
where
Assume that
$$\longrightarrow P^{s}\longrightarrow\cdots\longrightarrow P^{1}\to P^{0}\longrightarrow M\to 0$$
such that
$$\mathrm{l.dim}\,\Omega^{s}M=L^{s}(\tilde{\Lambda})\,\mathrm{l.dim}\,M \tag{9}$$
by Proposition 1.1 of [18].
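To make the block shape of (7) and the use of (9) concrete, the following is a small sketch under stated assumptions. The helper names `loewy_matrix` and `apply` are hypothetical, and the input data are an illustration chosen here, not taken from the paper: a single vertex ($m=1$, $n=2$) with graded dimensions 3, 3, 1 in degrees 1, 2, 3, which are the dimensions of the exterior algebra on three generators. We assume formula (9) applies to this example.

```python
def loewy_matrix(blocks):
    """Assemble the block matrix of shape (7) from square blocks A_1..A_{n+1}.

    Block row t carries A_t in the first block column and -E (minus the
    identity) in block column t + 1; the last block row carries only A_{n+1}.
    """
    size = len(blocks[0])          # m, the size of each block
    count = len(blocks)            # n + 1 block rows and block columns
    dim = size * count
    matrix = [[0] * dim for _ in range(dim)]
    for t, block in enumerate(blocks):
        for r in range(size):
            for c in range(size):
                matrix[t * size + r][c] = block[r][c]
            if t + 1 < count:
                matrix[t * size + r][(t + 1) * size + r] = -1
    return matrix

def apply(matrix, vector):
    """Plain matrix-vector product over the integers."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

# Illustrative one-vertex data (m = 1, n = 2): graded dimensions 3, 3, 1.
L = loewy_matrix([[[3]], [[3]], [[1]]])
simple = [1, 0, 0]                 # Loewy dimension vector of the simple module
first_syzygy = apply(L, simple)
second_syzygy = apply(L, first_syzygy)
```

In this example the first syzygy has layers of dimensions 3, 3, 1 and the second has layers 6, 8, 3, which agrees with a direct count in the minimal projective resolution of the simple module over the exterior algebra on three generators.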
Write
$$V_0=\begin{pmatrix}E\\ 0\\ \vdots\\ 0\end{pmatrix}_{m(n+1)\times m}.$$
The following Proposition follows from (9) and the definition of the
Proposition 2.7. Assume that
Now assume that
We can restate Theorems 2.4 and 2.5 of [18] as follows.
Proposition 2.8.
Then by Theorem 3.1 of [30],
The following is part (a) of Proposition 2.9 of [18].
Proposition 2.9. Let
Let
By applying Proposition 2.9, the following is proved as Theorem 2.10 in [18].
Proposition 2.10. If
Recall that for a graded algebra
$$\mathrm{GKdim}\,\Gamma=\overline{\lim}_{m\to\infty}\log_m\bigoplus_{t=1}^{m}\dim_k\Gamma_t. \tag{10}$$
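A numerical reading of (10) may help: the quantity inside the limit is the base-$m$ logarithm of the total dimension of the first $m$ graded pieces. The sketch below is only an illustration under our own assumptions. The example algebra $k[x,y]$, whose degree-$t$ piece has dimension $t+1$, and the helper names are not from the source.

```python
import math

def gk_estimate(dims, m):
    """log_m of dim(Gamma_1) + ... + dim(Gamma_m), the quantity inside (10)."""
    total = sum(dims(t) for t in range(1, m + 1))
    return math.log(total) / math.log(m)

def polynomial_dims(t):
    # Assumed example: k[x, y] has t + 1 monomials of degree t.
    return t + 1

estimates = [gk_estimate(polynomial_dims, 10 ** e) for e in (2, 4, 6)]
```

The estimates increase towards 2, the Gelfand-Kirillov dimension of a polynomial ring in two variables, but only slowly, since the error decays like $1/\log m$.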
By using Koszul duality, the Gelfand-Kirillov dimension of the quadratic dual
The following is Theorem 3.2 in [18]. Let
Theorem 2.11. If
If
The following is Theorem 3.1 in [18].
Theorem 2.12. If
By picture (6), we see an
To classify the
We call an algebra
Lemma 3.1. A stable
Proof. For stable
Now assume that
Clearly, being weakly periodic is a special case of complexity
We have the following characterization of the periodicity and of the complexities for stable
Theorem 3.2. Let
1. The algebra
2. The algebra
3. The algebra
Proof. Assume that
Now assume that
If
Now assume that
Let
Theorem 3.3. Let
1.
2.
3.
Proof. If
If
Combining Theorems 3.2 and 3.3, we get the following.
Theorem 3.4. Let
1.
2.
3.
By Theorem 2.12, we also have the following.
Theorem 3.5. If
Assume that
$$\Delta_{\nu}\Lambda=\Delta_{\nu}\Gamma^{!,op}\simeq\Pi(\Gamma)^{!,op},$$
where
When
Now we define that an
As an immediate consequence of Theorem 3.4, we get a classification of
Theorem 3.6. An
For an
Theorem 3.7. 1.
2.
3.
Proof. This follows from the above Theorems 3.3 and 3.4.
When
Proposition 3.8. If
It is natural to ask whether the converse of Proposition 3.8 is true.
By Proposition 3.8, if the
In this section, we assume that
The McKay quiver was introduced in [27]. Let
$$V\otimes S_i=\bigoplus_{j}S_j^{a_{i,j}},\quad 1\le i\le l,$$
here
We recall some results on the McKay quivers of Abelian groups and on the relationship between McKay quivers of the same group in
Let
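$$G=G(r_1,\cdots,r_m)=C_{r_1}\times\cdots\times C_{r_m}$$
is a direct product of
$$(\xi_{r_1}^{i_1},\xi_{r_2}^{i_2},\cdots,\xi_{r_m}^{i_m})\longrightarrow\mathrm{diag}(\xi_{r_1}^{i_1},\xi_{r_2}^{i_2},\cdots,\xi_{r_m}^{i_m}),$$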
for
Proposition 4.1. The McKay quiver
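$$\tilde{Q}_0(r_1,\cdots,r_m)=\mathbb{Z}/r_1\mathbb{Z}\times\cdots\times\mathbb{Z}/r_m\mathbb{Z} \tag{11}$$
and the arrow set
$$\tilde{Q}_1(r_1,\cdots,r_m)=\{\alpha_{i,t}:i\to i+e_t\mid i\in\mathbb{Z}/r_1\mathbb{Z}\times\cdots\times\mathbb{Z}/r_m\mathbb{Z},\ 1\le t\le m\}\cup\{\alpha_{i,m+1}:i\to i-e\mid i\in\mathbb{Z}/r_1\mathbb{Z}\times\cdots\times\mathbb{Z}/r_m\mathbb{Z}\}. \tag{12}$$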
Proof. We prove by induction on
The Abelian subgroup in
Assume the proposition holds for
Embed the group
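$$(\xi_{r_1}^{i_1},\cdots,\xi_{r_h}^{i_h})\longrightarrow\mathrm{diag}(\xi_{r_1}^{i_1},\cdots,\xi_{r_h}^{i_h},\xi_{r_{h+1}}^{i_{h+1}}),$$
then group
$$G(r_1,\cdots,r_h,r_{h+1})\cap\mathrm{SL}(h+1,\mathbb{C})=G(r_1,\cdots,r_h),$$
and we have
$$G(r_1,\cdots,r_h,r_{h+1})=G(r_1,\cdots,r_h)\times C'_{r_{h+1}},$$
where
$$G(r_1,\cdots,r_h,r_{h+1})/G(r_1,\cdots,r_h,r_{h+1})\cap\mathrm{SL}(h+1,\mathbb{C})\simeq(\xi_{r_{h+1}}).$$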
By Theorem 1.2 of [11], the McKay quiver of
$$\alpha_{i,h+1}:(i^{(h)},t)\to(i^{(h)}-e^{(h)},t+1)$$
from one copy
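$$\tilde{Q}'_1(r_1,\cdots,r_h,r_{h+1})=\{\alpha_{i,t}:i\to i+e^{(h+1)}_t\mid i\in\mathbb{Z}/r_1\mathbb{Z}\times\cdots\times\mathbb{Z}/r_{h+1}\mathbb{Z},\ 1\le t\le h+1\}$$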
as the arrow set.
Now embed
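$$(\xi_{r_1}^{i_1},\cdots,\xi_{r_{h+1}}^{i_{h+1}})\longrightarrow\mathrm{diag}(\xi_{r_1}^{i_1},\cdots,\xi_{r_{h+1}}^{i_{h+1}},\xi_{r_1}^{-i_1}\cdots\xi_{r_{h+1}}^{-i_{h+1}}).$$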
Since Nakayama permutation
This shows that the proposition holds for
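The quiver described by (11) and (12) is easy to generate explicitly. The following is a minimal sketch, with two assumptions that are not confirmed by the surviving text: the vector $e$ is taken to be $e_1+\cdots+e_m$, and the function name `abelian_mckay_quiver` and the tuple encoding of arrows are chosen here purely for illustration.

```python
from itertools import product

def abelian_mckay_quiver(orders):
    """Vertices and arrows of the quiver in (11) and (12).

    Vertices are tuples in Z/r_1 x ... x Z/r_m. Each vertex i has arrows
    i -> i + e_t for t = 1..m and one arrow i -> i - e, where e is taken
    here to be e_1 + ... + e_m (an assumption of this sketch).
    """
    m = len(orders)
    vertices = list(product(*(range(r) for r in orders)))
    arrows = []
    for i in vertices:
        for t in range(m):
            target = tuple((x + (1 if k == t else 0)) % orders[k]
                           for k, x in enumerate(i))
            arrows.append((t + 1, i, target))
        back = tuple((x - 1) % orders[k] for k, x in enumerate(i))
        arrows.append((m + 1, i, back))
    return vertices, arrows

vertices, arrows = abelian_mckay_quiver((3, 2))
```

For the orders $(3,2)$ this gives 6 vertices and 18 arrows, with exactly $m+1=3$ arrows leaving and entering each vertex, since each arrow type is a translation of the vertex set.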
Note that the Nakayama permutation for the subgroup of a special linear group is trivial. As a direct consequence of Proposition 3.1 of [11], we also have the following Proposition to describe the McKay quiver of finite group
Proposition 4.2. Let
Let
Theorem 4.3. Let
Let
Note that
Proposition 4.4. Let
Starting with a finite group
Let
Let
$$\{(i,\bar{m})\mid(i,m)\in\mathbb{Z}|_{n-1}Q(G),\ \bar{m}\in\mathbb{Z}/(n+1)\mathbb{Z}\}.$$
For a path
Proposition 4.5.
Proof. For any
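$$\longrightarrow\tilde{P}^{t}\stackrel{f_t}{\longrightarrow}\tilde{P}^{t-1}\stackrel{f_{t-1}}{\longrightarrow}\cdots\tilde{P}^{1}\stackrel{f_1}{\longrightarrow}\tilde{P}^{0}\longrightarrow\tilde{\Lambda}_0(G)e_i\longrightarrow 0 \tag{13}$$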
for the simple
Clearly (13) is exact if and only if
$$(d_1,\cdots,d_{h_t})=(d'_1,\cdots,d'_{h_{t+1}})C_{t+1}.$$
Now for any
⋯⟶˜Pt[¯m]˜ft[¯m]⟶˜Pt−1[¯m]˜ft−1[¯m]⟶⋯˜P0[¯m]˜f1[¯m]⟶˜P0⟶Δ(G)ei,¯m⟶0. | (14) |
By comparing the matrices defining the sequences (13) and (14), we see that (13) is exact if and only if (14) is exact. That is, (13) is a projective resolution of simple
Since
By comparing (13) and (14), the simple modules
We call
It is interesting to know if the converse of Proposition 4.5 is true, that is, for an indecomposable nicely graded tame
For the three pairs of quadratic duals of algebras:
For an AS-regular algebra
Theorem 4.6. The following categories are equivalent as triangulated categories:
1. the bounded derived category
2. the bounded derived category
3. the bounded derived category
4. the stable category
5. the stable category
6. the stable category
If the lengths of the oriented circles in
(7) the bounded derived category
(8) the stable category
Proof. By Theorem 4.14 of [28], we have that
By Theorem 1.1 of [7], we have that
By Corollary 1.2 of [7], we have that
On the other hand, we have
By Lemma Ⅱ.2.4 of [19],
Similarly,
When the lengths of the oriented circles in
We remark that the equivalence of (1) and (2) can be regarded as a McKay quiver version of the Beilinson correspondence, and the equivalence of (1) and (4) can be regarded as a McKay quiver version of the Bernstein-Gelfand-Gelfand correspondence [5,4]. So we have the following analog of (6) for the equivalences of the triangulated categories in Theorem 4.6:
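[Diagram (15): analog of (6) for the equivalences of triangulated categories in Theorem 4.6; image not recovered.]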
In the classical representation theory, we take a slice from the translation quiver and view the path algebra as
Now we consider the case of
Assume that
Since
$$L(\tilde{\Lambda})=\begin{pmatrix}M(\tilde{Q}) & -E & 0\\ M'(\tilde{Q}) & 0 & -E\\ E & 0 & 0\end{pmatrix}. \tag{16}$$
This is exactly the Loewy matrix of
Proposition 5.1. Let
1. If there is an arrow from
2. If there is only one arrow from
Proof. The proposition follows directly from
Therefore the number of arrows from
We also have the following Proposition.
Proposition 5.2. If
Proof. Since there is no arrow from
Now we determine the relations for the McKay quiver of finite Abelian subgroup of
Let
Let
[Quiver diagram; image not recovered.]
For each vertex
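$$z(\gamma,i,c_i)=c_i\beta_{i+e_1}\alpha_i+\alpha_{i+e_2}\beta_i,\quad z(\beta,i,b_i)=b_i\alpha_{i-e}\gamma_i+\gamma_{i+e_1}\alpha_i,$$
and
$$z(\alpha,i,a_i)=a_i\beta_{i-e}\gamma_i+\gamma_{i+e_2}\beta_i,$$
for
$$\begin{aligned}\tilde{\rho}_{comm}(s,r,C(a,b,c))&=\{z(\alpha,i,a_i),z(\beta,i,b_i),z(\gamma,i,c_i)\mid i\in\tilde{Q}_0\}\\ \tilde{\rho}_{zero}(s,r)&=\{\alpha_{i+e_1}\alpha_i,\beta_{i+e_2}\beta_i,\gamma_{i-e}\gamma_i\mid i\in\tilde{Q}_0\},\end{aligned}$$
and let
$$\tilde{\rho}(s,r,C(a,b,c))=\tilde{\rho}_{comm}(s,r,C(a,b,c))\cup\tilde{\rho}_{zero}(s,r). \tag{17}$$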
Proposition 5.3. If the quotient algebra
Proof. Assume that
Consider the square with vertices
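$$c_i\beta_{i+e_1}\alpha_i+c'_i\alpha_{i+e_2}\beta_i\in I,$$
by (2) of Proposition 5.1. Since
$$z(\gamma,i,c_i)=z(\gamma,i,c_i,1)\in I.$$
Similarly, there are
$$z(\beta,i,b_i)=b_i\alpha_{i-e}\gamma_i+\gamma_{i+e_1}\alpha_i\in I,$$
and
$$z(\alpha,i,a_i)=a_i\beta_{i-e}\gamma_i+\gamma_{i+e_2}\beta_i\in I.$$
For each
$$\alpha_{i+e_1}\alpha_i,\ \beta_{i+e_2}\beta_i,\ \gamma_{i-e}\gamma_i\in I,$$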
by Proposition 5.2.
So
Let
ei˜Λ2=kβi−e2αi−e+kγi+eβi+be1+kγi+eαi+be2, |
and
˜Λ3ei=kγi+eβi+e1αi, |
by computing directly using the relations in
dimkei˜Λ3ei≤1=dimkei˜Λ3(G(s,r))ei, |
and
dimkei˜Λ2ei′{≤1,for i′=i−e,i+e1,i+e2;=0,otherwise. |
This implies that
dimkei˜Λ2ei′≤dimkei˜Λ2ei′, |
for any
So
ei′˜Λtei=ei′˜Λt(G(s,r))ei |
for all
This proves
For the quadratic dual quiver
Proposition 5.4. If the quotient algebra
$$\tilde{\rho}^{\perp}(s,r,C(a,b,c))=\{z(\alpha,i,-a_i^{-1}),z(\beta,i,-b_i^{-1}),z(\gamma,i,-c_i^{-1})\mid i\in\tilde{Q}_0\}. \tag{18}$$
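The passage from (17) to (18) can be checked on a single relation. The following sketch assumes, as is usual for quadratic duals, that the paths of length two between a fixed pair of vertices are identified with their own dual basis, so that the pairing of two linear combinations is the sum of the products of matching coefficients. This convention is our assumption and is not stated in the surviving text.

```latex
\begin{aligned}
\langle\, c_i\,\beta_{i+e_1}\alpha_i+\alpha_{i+e_2}\beta_i,\ -c_i^{-1}\,\beta_{i+e_1}\alpha_i+\alpha_{i+e_2}\beta_i \,\rangle
&= c_i\cdot(-c_i^{-1}) + 1\cdot 1 \\
&= 0 .
\end{aligned}
```

Under this convention $z(\gamma,i,-c_i^{-1})$ is orthogonal to $z(\gamma,i,c_i)$, and since the space spanned by the two paths $\beta_{i+e_1}\alpha_i$ and $\alpha_{i+e_2}\beta_i$ is two-dimensional, it spans the orthogonal complement there. The same computation applies to the relations $z(\alpha,i,a_i)$ and $z(\beta,i,b_i)$.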
Proof. Let
Now construct the quiver
Recall by (2), the relation set for
z(γ,(i,t),ci)=ciβi+e1,t+αi+e2,t+1βi,t,z(β,(i,t),bi)=biαi−e,t+1γi,t+γi+e1,t+1αi,t, |
and
z(α,(i,t),ai)=aiβi−e,t+1γi,t+γi+e2,t+1βi,t, |
for
˜ρcomm(s,r,C(a,b,c))[t]={z(α,(i,t),ai),z(β,(i,t),bi),z(γ,(i,t),ci)|i∈˜Q0}˜ρzero(s,r)[t]={αi+e1,t+1αi,t,βi+e2,t+1βi,t,γi−e,t+1γi,t|i∈˜Q0}, |
and let
ρZ⋄˜Q(s,r)=⋃t∈Z(˜ρcomm(s,r,C(a,b,c))[t]∪˜ρzero(s,r)[t]). | (19) |
By taking a connected component
[Quiver diagram; image not recovered.]
Here we denote by
Let
ρ(s,r)={x∈ρZ⋄˜Q(s,r)|e[0,2]xe[0,2]=x}=˜ρcomm(s,r,C(a,b,c))[0]∪˜ρzero(s,r)[0]. |
Since any sequence of relations in
ρ(s,r)={αi+e1,1αi,0,βi+e2,1βi,0,γi−e1,1γi,0|i∈Z/sZ×Z/rZ}∪{αi+e2,1βi,0−βi+e1,1αi,0,αi−e,1γi,0−γi+e1,1αi,0,αi−e1,1γi,0−γi+e1,1αi,0|i∈Z/sZ×Z/rZ}. | (20) |
Proposition 5.5. If the quotient algebra
The quadratic dual quiver
ρ⊥(s,r)={αi+e2,d+1βi,d1=βi+e1,d+1αi,d,αi−e,1γi,0+γi+e1,1αi,0,αi−e1,1γi,0+γi+e1,1αi,0|i∈Z/sZ×Z/rZ}. | (21) |
Proposition 5.6. If the quotient algebra
So the relations for the
Let
[Two quiver diagrams; images not recovered.]
The vertices for
We immediately have the following on the arrows of these McKay quivers.
Lemma 5.7. Let
1. There is a loop at each vertex of
2. There is at most one arrow from
3. There is an arrow
4. There are at most
Let
Lemma 5.8. If
αj,hαi,j,βj,hβi,j,αj,hβi,j,βj,hαi,j∈I(Ξ) |
if such paths exist.
Proof. If
αj,hαi,j∈I(Ξ). |
Similarly we have
βj,hβi,j,αj,hβi,j,βj,hαi,j∈I(Ξ) |
if such paths exist.
Write
˜ρ11(Ξ)={αj,hαi,j|i,h,j∈˜Q0(Ξ),i≠h},˜ρ12(Ξ)={αj,hβi,j|i,h,j∈˜Q0(Ξ),i≠h},˜ρ21(Ξ)={αj,hβi,j|i,h,j∈˜Q0(Ξ),i≠h},˜ρ22(Ξ)={βj,hβi,j|i,h,j∈˜Q0(Ξ),i≠h}. |
Take
˜ρp(Ξ)=˜ρ11(Ξ)∪˜ρ12(Ξ)∪˜ρ21(Ξ)∪˜ρ22(Ξ). | (22) |
As a corollary of Lemma 5.8, we get the following.
Proposition 5.9.
Let
˜ρa(Ξ,Ca)={αi,jγi−γjαi,j,βj,iγj−γiβj,i|αi,j∈˜Q1,ai,j∈Ca,i<j}. | (23) |
Proposition 5.10. By choosing the representatives of the arrows suitably, we have a set
˜ρa(Ξ,Ca)⊆I(Ξ). |
Proof. By Lemma 5.7, there is an arrow
βj,iγj−bj,iγiβj,i∈I(Ξ), |
for some
αi,jγi−a′i,jγjαi,j∈I(Ξ), |
for some
Starting from
βj,iγj−γiβj,i∈I(Ξ) |
for all arrows
This proves that by choosing the representatives of the arrows suitably, we have
Write
Lemma 5.11. For each arrow
Write
˜ρa⊥(Ξ,Ca)={ai,jαi,jγi+γjαi,j,βj,iγj+γiβj,i|αi,j∈˜Q1,ai,j∈Ca,i<j}. | (24) |
Now consider
Fix
μi,j={αi,ji<j,βi,ji>j, and ζj,i={βj,ii<j,αj,ii>j. | (25) |
Then
Consider the following cases.
Lemma 5.12. Assume that there is only one arrow
1. If
2. We have
{γ2i−ciζj,iμi,j,ciγ2i+ζj,iμi,j} | (26) |
is an orthogonal basis for
Proof. Apply Proposition 5.1 for the arrow
The second assertion follows from direct computation.
Lemma 5.13. Assume that there are exactly two arrows
1. There is a
2. If
3. We have
{biζj1,iμi,j1+ζj2,iμi,j2,ζj1,iμi,j1−b′iζj2,iμi,j2,γ2i}and{biζj1,iμi,j1−ζj2,iμi,j2,ciζj1,iμi,j1−γ2i,ζj1,iμi,j1+biζj2,iμi,j2+ciγ2i} | (27) |
are orthogonal bases for
Proof. By Proposition 5.1, we have that the images of
If
The rest follows from direct computations.
Lemma 5.14. Assume that there are three arrows
1. There are
biζj1,iμi,j1+ζj2,iμi,j2,b′iζj1,iμi,j1+ζj3,iμi,j3∈I(Ξ). |
2. If
3. We have
{biζj1,iμi,j1−ζj2,iμi,j2,b′iζj1,iμi,j1−ζj3,iμi,j3,γ2i−ciζj1,iμi,j1,ciγ2i+ζj1,iμi,j1+biζj2,iμi,j2+b′iζj3,iμi,j3}and{γ2,biζj1,iμi,j1−ζj2,iμi,j2,b′iζj1,iμi,j1−ζj3,iμi,j3,ζj1,iμi,j1+biζj2,iμi,j2+b′iζj3,iμi,j3,γ2} | (28) |
are orthogonal bases of
Proof. The lemma follows from Proposition 5.1, similarly to the two lemmas above.
Denote by
Ci={{ci}⊂k∗i∈˜Q01(Ξ),{ci,bi}⊂k∗i∈˜Q02(Ξ),{ci,bi,b′i}⊂k∗i∈˜Q03(Ξ), | (29) |
and set
C′i={∅i∈˜Q01(Ξ),{bi}⊂k∗i∈˜Q02(Ξ),{bi,b′i}⊂k∗i∈˜Q03(Ξ). | (30) |
Let
Ui(Ci)={{ciγ2i+ζj,iμi,j}i∈˜Q01(Ξ),{ζj1,iμi,j1−biζj2,iμi,j2,γ2i−ciζj1,iμi,j1}i∈˜Q02(Ξ),{biζj1,iμi,j1−ζj2,iμi,j2,b′iζj1,iμi,j1−ζj3,iμi,j3,γ2i−ciζj1,iμi,j1}i∈˜Q03(Ξ). | (31) |
and let
U−i(C′i)={{ζj,iμi,j}i∈˜Q01(Ξ),{γ2,ζj1,iμi,j1−biζj2,iμi,j2}i∈˜Q02(Ξ),{γ2i,biζj1,iμi,j1−ζj2,iμi,j2,b′iζj1,iμi,j1−ζj3,iμi,j3}i∈˜Q03(Ξ). | (32) |
Lemma 5.15. A basis of the orthogonal subspace of
U⊥i(Ci)={{ζj,iμi,j+ciγ2i}i∈˜Q01(Ξ),{ζj1,iμi,j1+b−1iζj2,iμi,j2+c−1iγ2i}i∈˜Q02(Ξ),{ζj1,iμi,j1+b−1iζj2,iμi,j2+b′−1iζj3,iμi,j3+ciγ2i}i∈˜Q03(Ξ). | (33) |
and let
U−,⊥i(C′i)={{ζj,iμi,j}i∈˜Q01(Ξ),{biζj1,iμi,j1+ζj2,iμi,j2}i∈˜Q02(Ξ),{ζj1,iμi,j1+b−1iζj2,iμi,j2+b′−1iζj3,iμi,j3}i∈˜Q03(Ξ). | (34) |
Proof. This follows immediately from (2) of Lemma 5.12, (3) of Lemma 5.13 and (3) of Lemma 5.14.
Fix
CJ=Ca∪⋃i∈Q0∖JCi∪⋃i∈JC′i, | (35) |
for
˜ρ(Ξ,J,CJ)=˜ρp(Ξ)∪˜ρa(Ξ,Ca)∪⋃i∈˜Q0∖JUi(Ci)∪⋃i∈JU−i(C′i). | (36) |
By Lemma 5.11 and Lemma 5.15, we have the following.
Proposition 5.16.
˜ρ⊥(Ξ,J,CJ)=˜ρa⊥(Ξ,Ca)∪⋃i∈˜Q0∖JU⊥i(Ci)∪⋃i∈JU−,⊥i(Ci). | (37) |
Proposition 5.17. Let
Proof. Take
Let
It is also immediate that
By Proposition 5.16, a quadratic dual relation of
Proposition 5.18. Let
Now construct the nicely-graded quiver
ρZ⋄˜Q(Ξ)=ρZ⋄˜Q(Ξ)(J,CJ)={z[m]|z∈˜ρ(Ξ,J,CJ),t∈Z} |
for some parameter set
Λ(Ξ,J,C)≃kZ⋄˜Q(Ξ)/(ρZ⋄˜Q(Ξ)(J,CJ)), |
if
By taking the complete
[Two quiver diagrams; images not recovered.]
They are all nicely-graded quivers. We get
ρ(Ξ,J,C)={z[0]|z∈˜ρ(Ξ,J,C)}. | (38) |
For any parameter set
Write
μi,j,t={αi,j,ti<j,t=0,1,βi,j,ti>j,t=0,1, and ζj,i={βj,i,ti<j,t=0,1,αj,i,ti>j,t=0,1. |
Let
Ui(Ci)={{γi,1γi,0+ζj,i,1μi,j,0}i∈˜Q01(Ξ),{ζj1,i,1μi,j1,0−ζj2,i,1μi,j2,0,γi,1γi,0−ζj1,i,1μi,j1,0}i∈˜Q02(Ξ),{ζj1,i,1μi,j1,0−ζj2,i,1μi,j2,0,ζj1,i,1μi,j1,0−ζj3,i,1μi,j3,0,γi,1γi,0−ζj1,i,1μi,j1,0}i∈˜Q03(Ξ), | (39) |
and let
U−i(C′i)={{ζj,i,1μi,j,0}i∈˜Q01(Ξ),{γ2,ζj1,i,1μi,j1,0−ζj2,i,1μi,j2,0}i∈˜Q02(Ξ),{ζj1,i,1μi,j1,0+ζj2,i,1μi,j2,0+ζj3,i,1μi,j3,0}i∈˜Q03(Ξ), | (40) |
U⊥i(Ci)={{ζj,i,1μi,j,0+γi,1γi,0}i∈˜Q01(Ξ),{ζj1,i,1μi,j1,0+ζj2,i,1μi,j2,0+γi,1γi,0}i∈˜Q02(Ξ),{ζj1,i,1μi,j1,0+b−1iζj2,i,1μi,j2,0+ζj3,i,1μi,j3,0+γi,1γi,0}i∈˜Q03(Ξ), | (41) |
and let
U−,⊥i(C′i)={{ζj,i,1μi,j,0}i∈˜Q01(Ξ),{ζj1,i,1μi,j1,0+ζj2,i,1μi,j2,0}i∈˜Q02(Ξ),{ζj1,i,1μi,j1,0+ζj2,i,1μi,j2,0+ζj3,i,1μi,j3,0}i∈˜Q03(Ξ). | (42) |
Take
ρp(Ξ)={αi,j,1αj,h,0|i,h,j∈˜Q0(Ξ),i≠h}∪{βi,j,1βj,h,0|i,h,j∈˜Q0(Ξ),i≠h}∪{βi,j,1αj,h,0|i,h,j∈˜Q0(Ξ),i≠h}∪{βi,j,1αj,h,0|i,h,j∈˜Q0(Ξ),i≠h}, |
and
ρa(Ξ)={αi,j,1γi,0−γj,1αi,j,0,βj,i,1γj,0−γi,1βj,i,0|αi,j∈˜Q1,i<j}. |
Write
ρa⊥(Ξ)={αi,j,1γi,0+γj,1αi,j,0,βj,i,1γj,0+γi,1βj,i,0|αi,j∈˜Q1,i<j}. |
They are subsets of the space
Take a subset
ρ(Ξ,J)=ρp(Ξ)∪ρa(Ξ)∪⋃i∈˜Q0∖JUi,0∪⋃i∈JU−i,0, | (43) |
and
ρ⊥(Ξ,J)=ρa⊥(Ξ)∪⋃i∈˜Q0∖JU⊥i,0∪⋃i∈JU−,⊥i,0. | (44) |
We have the following descriptions of the relation sets for the
Proposition 5.19. Let
Proposition 5.20. Let
1. By constructing
2. The complete
3. Though we need the field
We would like to thank the referees for reading the manuscript carefully and for their suggestions and comments on revising and improving the paper. We also thank the referee for bringing [24] to our attention.
[1] | S. Van Mulken, E. André, J. Müller, The Persona Effect: How Substantial Is It?, in People and Computers XⅢ : Proceedings of HCI'98, Springer London, (1998), 53–66. https://doi.org/10.1007/978-1-4471-3605-7_4 |
[2] | J. Cassell, D. McNeill, K. E. McCullough, Speech-gesture mismatches: Evidence for one underlying representation of linguistic and nonlinguistic information, Pragmatics Cognit., 7 (1999), 1–34. https://doi.org/10.1075/pc.7.1.03cas |
[3] | T. Kucherenko, P. Jonell, Y. Yoon, P. Wolfert, A large, crowdsourced evaluation of gesture generation systems on common data: The GENEA challenge 2020, in 26th International Conference on Intelligent User Interfaces (IUI), (2021), 11–21. https://doi.org/10.1145/3397481.3450692 |
[4] | C. M. Huang, B. Mutlu, Robot behavior toolkit: generating effective social behaviors for robots, in Proceedings of the Seventh Annual ACM/IEEE International Conference on Human-Robot Interaction (HRI), (2012), 25–32. https://doi.org/10.1145/2157689.2157694 |
[5] | M. Salem, S. Kopp, I. Wachsmuth, K. Rohlfing, F. Joublin, Generation and evaluation of communicative robot gesture, Int. J. Social Rob., 4 (2012), 201–217. https://doi.org/10.1007/s12369-011-0124-9 |
[6] | A. Kranstedt, S. Kopp, I. Wachsmuth, Murml: A multimodal utterance representation markup language for conversational agents, in AAMAS'02 Workshop Embodied Conversational Agents-Let's Specify and Evaluate Them!, 2002. |
[7] | J. Cassell, C. Pelachaud, N. Badler, M. Steedman, B. Achorn, T. Becket, et al., Animated conversation: rule-based generation of facial expression, gesture & spoken intonation for multiple conversational agents, in Proceedings of the 21st Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), (1994), 413–420. https://doi.org/10.1145/192161.192272 |
[8] | Y. Yoon, B. Cha, J. H. Lee, M. Jang, J. Lee, J. Kim, et al., Speech gesture generation from the trimodal context of text, audio, and speaker identity, ACM Trans. Graphics, 39 (2020), 1–16. https://doi.org/10.1145/3414685.3417838 |
[9] | U. Bhattacharya, E. Childs, N. Rewkowski, D. Manocha, Speech2affectivegestures: Synthesizing co-speech gestures with generative adversarial affective expression learning, in Proceedings of the 29th ACM International Conference on Multimedia (MM), (2021), 2027–2036. https://doi.org/10.1145/3474085.3475223 |
[10] | X. Liu, Q. Wu, H. Zhou, Y. Xu, R. Qian, X. Lin, et al., Learning hierarchical cross-modal association for co-speech gesture generation, in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2022), 10452–10462. https://doi.org/10.1109/CVPR52688.2022.01021 |
[11] | S. Ginosar, A. Bar, G. Kohavi, C. Chan, A. Owens, J. Malik, Learning individual styles of conversational gesture, in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2019), 3492–3501. https://doi.org/10.1109/CVPR.2019.00361 |
[12] | I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, et al., Generative adversarial nets, Commun. ACM, 63 (2020), 139–144. https://doi.org/10.1145/3422622 |
[13] | J. Ho, A. Jain, P. Abbeel, Denoising diffusion probabilistic models, in Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS), (2020), 6840–6851. |
[14] | S. Alexanderson, G. E. Henter, T. Kucherenko, J. Beskow, Style-controllable speech-driven gesture synthesis using normalising flows, Comput. Graphics Forum, 39 (2020), 487–496. https://doi.org/10.1111/cgf.13946 |
[15] | T. Kucherenko, P. Jonell, S. Van Waveren, G. E. Henter, S. Alexandersson, I. Leite, et al., Gesticulator: A framework for semantically-aware speech-driven gesture generation, in Proceedings of the 2020 International Conference on Multimodal Interaction (ICMI), (2020), 242–250. https://doi.org/10.1145/3382507.3418815 |
[16] | S. Qian, Z. Tu, Y. Zhi, W. Liu, S. Gao, Speech drives templates: Co-speech gesture synthesis with learned templates, in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), (2021), 11057–11066. https://doi.org/10.1109/ICCV48922.2021.01089 |
[17] | Y. Yoon, W. R. Ko, M. Jang, J. Lee, J. Kim, G. Lee, Robots learn social skills: End-to-end learning of co-speech gesture generation for humanoid robots, in 2019 International Conference on Robotics and Automation (ICRA), (2019), 4303–4309. https://doi.org/10.1109/ICRA.2019.8793720 |
[18] | C. Ahuja, L. P. Morency, Language2pose: Natural language grounded pose forecasting, in 2019 International Conference on 3D Vision (3DV), (2019), 719–728. https://doi.org/10.1109/3DV.2019.00084 |
[19] | A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, et al., Attention is all you need, in Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS), (2017), 6000–6010. https://doi.org/10.48550/arXiv.1706.03762 |
[20] | S. Nyatsanga, T. Kucherenko, C. Ahuja, G. E. Henter, M. Neff, A comprehensive review of data-driven co-speech gesture generation, Comput. Graphics Forum, 42 (2023), 569–596. https://doi.org/10.1111/cgf.14776 |
[21] | D. Hasegawa, N. Kaneko, S. Shirakawa, H. Sakuta, K. Sumi, Evaluation of speech-to-gesture generation using Bi-directional LSTM network, in Proceedings of the 18th International Conference on Intelligent Virtual Agents (IVA), (2018), 79–86. https://doi.org/10.1145/3267851.3267878 |
[22] | T. Kucherenko, D. Hasegawa, G. E. Henter, N. Kaneko, H. Kjellström, Analyzing input and output representations for speech-driven gesture generation, in Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents (IVA), (2019), 97–104. https://doi.org/10.1145/3308532.3329472 |
[23] | T. Ao, Q. Gao, Y. Lou, B. Chen, L. Liu, Rhythmic gesticulator: Rhythm-aware co-speech gesture synthesis with hierarchical neural embeddings, ACM Trans. Graphics, 41 (2022), 1–19. https://doi.org/10.1145/3550454.3555435 |
[24] | S. Ye, Y. H. Wen, Y. Sun, Y. He, Z. Zhang, Y. Wang, et al., Audio-driven stylized gesture generation with flow-based model, in European Conference on Computer Vision, 13665 (2022), 712–728. https://doi.org/10.1007/978-3-031-20065-6_41 |
[25] | H. Liu, Z. Zhu, N. Iwamoto, Y. Peng, Z. Li, Y. Zhou, et al., BEAT: A large-scale semantic and emotional multi-modal dataset for conversational gestures synthesis, in European Conference on Computer Vision, 13667 (2022), 612–630. https://doi.org/10.1007/978-3-031-20071-7_36 |
[26] | H. Yi, H. Liang, Y. Liu, Q. Cao, Y. Wen, T. Bolkart, et al., Generating holistic 3D human motion from speech, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2023), 469–480. |
[27] | R. Rombach, A. Blattmann, D. Lorenz, P. Esser, B. Ommer, High-resolution image synthesis with latent diffusion models, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2022), 10684–10695. https://doi.org/10.48550/arXiv.2112.10752 |
[28] | P. Dhariwal, A. Nichol, Diffusion models beat GANs on image synthesis, in Proceedings of the 35th International Conference on Neural Information Processing Systems (NIPS), (2021), 8780–8794. https://doi.org/10.48550/arXiv.2105.05233 |
[29] | M. Zhang, Z. Cai, L. Pan, F. Hong, X. Guo, L. Yang, et al., MotionDiffuse: Text-driven human motion generation with diffusion model, IEEE Trans. Pattern Anal. Mach. Intell., 46 (2024), 4115–4128. https://doi.org/10.1109/TPAMI.2024.3355414 |
[30] | X. Chen, B. Jiang, W. Liu, Z. Huang, B. Fu, T. Chen, et al., Executing your commands via motion diffusion in latent space, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2023), 18000–18010. https://doi.org/10.1109/CVPR52729.2023.01726 |
[31] | S. Alexanderson, R. Nagy, J. Beskow, G. E. Henter, Listen, denoise, action! Audio-driven motion synthesis with diffusion models, ACM Trans. Graphics, 42 (2023). https://doi.org/10.1145/3592458 |
[32] | L. Zhu, X. Liu, X. Liu, R. Qian, Z. Liu, L. Yu, Taming diffusion models for audio-driven co-speech gesture generation, in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2023), 10544–10553. https://doi.org/10.1109/CVPR52729.2023.01016 |
[33] | S. Yang, Z. Wu, M. Li, Z. Zhang, L. Hao, W. Bao, et al., DiffuseStyleGesture: stylized audio-driven co-speech gesture generation with diffusion models, in Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI), 650 (2023), 5860–5868. https://doi.org/10.24963/ijcai.2023/650 |
[34] | Y. Yuan, J. Song, U. Iqbal, A. Vahdat, J. Kautz, PhysDiff: Physics-guided human motion diffusion model, in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), (2023), 16010–16021. https://doi.org/10.48550/arXiv.2212.02500 |
[35] | T. Ao, Z. Zhang, L. Liu, GestureDiffuCLIP: Gesture diffusion model with CLIP latents, ACM Trans. Graphics, 42 (2023), 1–18. https://doi.org/10.1145/3550454.3555435 |
[36] | Z. Cao, T. Simon, S. Wei, Y. Sheikh, Realtime multi-person 2D pose estimation using part affinity fields, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), 1302–1310. https://doi.org/10.1109/CVPR.2017.143 |
[37] | V. Choutas, G. Pavlakos, T. Bolkart, D. Tzionas, M. J. Black, Monocular expressive body regression through body-driven attention, in 16th European Conference Computer Vision (ECCV), 12355 (2020), 20–40. https://doi.org/10.1007/978-3-030-58607-2_2 |
[38] | Y. Chen, Y. Kalantidis, J. Li, S. Yan, J. Feng, A²-Nets: Double attention networks, in Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS), (2018), 350–359. https://doi.org/10.48550/arXiv.1810.11579 |
[39] | T. Y. Lin, A. RoyChowdhury, S. Maji, Bilinear CNN models for fine-grained visual recognition, in Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), (2015), 1449–1457. https://doi.org/10.1109/ICCV.2015.170 |
[40] | A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, et al., An image is worth 16×16 words: Transformers for image recognition at scale, preprint, arXiv: 2010.11929v2. https://doi.org/10.48550/arXiv.2010.11929 |
[41] | D. Misra, T. Nalamada, A. U. Arasanipalai, Q. Hou, Rotate to attend: Convolutional triplet attention module, in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), (2021), 3139–3148. https://doi.org/10.1109/WACV48630.2021.00318 |
[42] | M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, S. Hochreiter, GANs trained by a two time-scale update rule converge to a local Nash equilibrium, in Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS), (2017), 6629–6640. https://doi.org/10.48550/arXiv.1706.08500 |
[43] | C. Ionescu, D. Papava, V. Olaru, C. Sminchisescu, Human3.6M: Large scale datasets and predictive methods for 3D human sensing in natural environments, IEEE Trans. Pattern Anal. Mach. Intell., 36 (2013), 1325–1339. https://doi.org/10.1109/TPAMI.2013.248 |
[44] | R. Li, S. Yang, D. A. Ross, A. Kanazawa, AI choreographer: Music conditioned 3D dance generation with AIST++, in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), (2021), 13401–13412. https://doi.org/10.1109/ICCV48922.2021.01315 |
[45] | H. Y. Lee, X. Yang, M. Y. Liu, T. C. Wang, Y. D. Lu, M. H. Yang, et al., Dancing to music, in Proceedings of the 33rd International Conference on Neural Information Processing Systems, (2019), 3586–3596. https://doi.org/10.48550/arXiv.1911.02001 |
[46] | T. Kucherenko, P. Wolfert, Y. Yoon, C. Viegas, T. Nikolov, M. Tsakov, et al., Evaluating gesture generation in a large-scale open challenge: The GENEA Challenge 2022, ACM Trans. Graphics, 43 (2024). https://doi.org/10.1145/3656374 |