
On the symmetries in the dynamics of wide two-layer neural networks

  • Received: 05 December 2022; Revised: 18 January 2023; Accepted: 01 February 2023; Published: 22 February 2023
  • We consider the idealized setting of gradient flow on the population risk for infinitely wide two-layer ReLU neural networks (without bias), and study the effect of symmetries on the learned parameters and predictors. We first describe a general class of symmetries which, when satisfied by the target function f and the input distribution, are preserved by the dynamics. We then study more specific cases. When f is odd, we show that the dynamics of the predictor reduces to that of a (non-linearly parameterized) linear predictor, and its exponential convergence can be guaranteed. When f has a low-dimensional structure, we prove that the gradient flow PDE reduces to a lower-dimensional PDE. Furthermore, we present informal and numerical arguments that suggest that the input neurons align with the lower-dimensional structure of the problem.

    Citation: Karl Hajjar, Lénaïc Chizat. On the symmetries in the dynamics of wide two-layer neural networks[J]. Electronic Research Archive, 2023, 31(4): 2175-2212. doi: 10.3934/era.2023112




    We consider a family of scalar conservation laws defined on an oriented graph $\Gamma$ consisting of $m$ incoming and $n$ outgoing edges $\Omega_\ell$, $\ell = 1, \dots, m+n$, joining at a single vertex. Incoming edges are parametrized by $x \in (-\infty, 0]$ and outgoing edges by $x \in [0, \infty)$, in such a way that the junction is always located at $x = 0$. We use the index $i$, $i = 1, \dots, m$, to refer to incoming edges and $j$, $j = m+1, \dots, m+n$, for the outgoing ones.
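    This bookkeeping can be mirrored in a small data structure. The following Python sketch (a hypothetical helper, not from the paper) records the orientation and indexing conventions just described: edges $0,\dots,m-1$ are incoming, the remaining $n$ are outgoing, and the junction sits at $x = 0$.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class StarGraph:
    """A single junction with m incoming and n outgoing edges.

    Incoming edges are parametrized by x in (-inf, 0], outgoing ones by
    x in [0, +inf), so the junction always sits at x = 0.  Edge ell
    carries its own flux; indices 0..m-1 are incoming, m..m+n-1 outgoing.
    """
    m: int                                                  # incoming edges
    n: int                                                  # outgoing edges
    fluxes: List[Callable[[float], float]] = field(default_factory=list)

    def is_incoming(self, ell: int) -> bool:
        return ell < self.m

# Example: a 2-in / 3-out junction where every edge carries f(r) = r(1 - r).
junction = StarGraph(m=2, n=3, fluxes=[lambda r: r * (1.0 - r)] * 5)
assert junction.is_incoming(0) and not junction.is_incoming(3)
```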

    On each edge $\Omega_\ell$ we introduce a scalar conservation law describing the evolution of a density $\rho_\ell$. On the incoming edges we have

    $$\partial_t \rho_i + \partial_x f_i(\rho_i) = 0, \qquad t > 0,\; x < 0,\; i = 1, \dots, m, \tag{1}$$

    and on the outgoing ones

    $$\partial_t \rho_j + \partial_x f_j(\rho_j) = 0, \qquad t > 0,\; x > 0,\; j = m+1, \dots, m+n. \tag{2}$$

    The fluxes $f_1, \dots, f_{m+n}$ differ in general; however, we assume that they are bell-shaped (unimodal), Lipschitz continuous and non-degenerately nonlinear, i.e.

    (H.1) for each $\ell \in \{1, \dots, m+n\}$, $f_\ell \in C^2([0,1])$, $f_\ell(0) = f_\ell(1) = 0$, $f_\ell \ge 0$, and there exists $\bar\rho_\ell \in (0,1)$ such that $f_\ell'(\rho)\,(\bar\rho_\ell - \rho) > 0$ for every $\rho \in [0,1] \setminus \{\bar\rho_\ell\}$;

    (H.2) for any $\ell \in \{1, \dots, m+n\}$, $|\{\rho : f_\ell''(\rho) = 0\}| = 0$.
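    A concrete flux satisfying (H.1)-(H.2) is the LWR-type choice $f(\rho) = \rho(1-\rho)$, for which $\bar\rho = 1/2$ and $f'' \equiv -2$. The short numerical spot check below (a sketch; flux and grid are our own illustrative choices, not from the paper) verifies both hypotheses on a grid.

```python
import numpy as np

# Bell-shaped flux f(r) = r(1 - r): f(0) = f(1) = 0, maximum at rbar = 1/2.
f = lambda r: r * (1.0 - r)
fp = lambda r: 1.0 - 2.0 * r                 # f'
fpp = lambda r: -2.0 * np.ones_like(r)       # f'' never vanishes -> (H.2)

rbar = 0.5
r = np.linspace(0.0, 1.0, 1001)

assert abs(f(0.0)) < 1e-14 and abs(f(1.0)) < 1e-14    # f(0) = f(1) = 0
assert np.all(f(r) >= 0.0)                            # f >= 0 on [0, 1]
mask = np.abs(r - rbar) > 1e-12
assert np.all(fp(r[mask]) * (rbar - r[mask]) > 0.0)   # (H.1): unimodality
assert np.all(np.abs(fpp(r)) > 0.0)                   # (H.2): nondegeneracy
```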

    We augment (1) and (2) with the initial conditions

    $$\begin{cases} \rho_i(0,x) = \rho_{i,0}(x), & x < 0,\; i = 1, \dots, m,\\ \rho_j(0,x) = \rho_{j,0}(x), & x > 0,\; j = m+1, \dots, m+n, \end{cases} \tag{3}$$

    assuming that

    (H.3) $\rho_{1,0}, \dots, \rho_{m,0} \in L^1(-\infty,0) \cap BV(-\infty,0)$, $\rho_{m+1,0}, \dots, \rho_{m+n,0} \in L^1(0,\infty) \cap BV(0,\infty)$, and $0 \le \rho_{1,0}, \dots, \rho_{m+n,0} \le 1$.

    Finally, we introduce the necessary conservation assumption at the node, which transforms our family of independent equations into a single problem

    $$\sum_{i=1}^{m} f_i(\rho_i(t,0-)) = \sum_{j=m+1}^{m+n} f_j(\rho_j(t,0+)) \qquad \text{for a.e. } t \ge 0.$$

    Questions related to the existence, uniqueness and stability of solutions for problems of this kind have been extensively investigated in recent years, mainly in connection with traffic modeling; the interested reader can refer to [7,13] for an overview of the subject. Here our point of view is different, as we do not focus on a specific model. We consider a parabolic regularization of the problem, similarly to what has been done in [11,10], but instead of enforcing a continuity condition at the node for the regularized solutions, we introduce a more general set of transmission conditions on the parabolic fluxes.

    In this work we adopt the following definition of weak solution for the problem (1), (2), and (3). We stress that this definition is certainly not sufficient to ensure uniqueness; rather, it fixes a minimal set of properties that any reasonable solution is expected to satisfy. See [3] and the references therein for a more detailed discussion of this point.

    Definition 1.1. Let $\rho_1, \dots, \rho_m \colon [0,\infty) \times (-\infty,0] \to \mathbb{R}$ and $\rho_{m+1}, \dots, \rho_{m+n} \colon [0,\infty) \times [0,\infty) \to \mathbb{R}$ be functions. We say that $(\rho_1, \dots, \rho_{m+n})$ is a weak solution of (1), (2), and (3) if

    (D.1) $f_1(\rho_1), \dots, f_m(\rho_m) \in BV_{loc}((0,\infty) \times (-\infty,0))$ and $f_{m+1}(\rho_{m+1}), \dots, f_{m+n}(\rho_{m+n}) \in BV_{loc}((0,\infty) \times (0,\infty))$;

    (D.2) for every $i \in \{1,\dots,m\}$, every $c \in \mathbb{R}$ and every nonnegative test function $\varphi \in C^\infty(\mathbb{R} \times (-\infty,0))$ with compact support,

    $$\int_0^\infty\!\!\int_{-\infty}^0 \Big( |\rho_i - c|\, \partial_t \varphi + \operatorname{sign}(\rho_i - c)\big(f_i(\rho_i) - f_i(c)\big)\, \partial_x \varphi \Big)\, dx\, dt + \int_{-\infty}^0 |\rho_{i,0}(x) - c|\, \varphi(0,x)\, dx \ge 0;$$

    (D.3) for every $j \in \{m+1,\dots,m+n\}$, every $c \in \mathbb{R}$ and every nonnegative test function $\varphi \in C^\infty(\mathbb{R} \times (0,\infty))$ with compact support,

    $$\int_0^\infty\!\!\int_0^\infty \Big( |\rho_j - c|\, \partial_t \varphi + \operatorname{sign}(\rho_j - c)\big(f_j(\rho_j) - f_j(c)\big)\, \partial_x \varphi \Big)\, dx\, dt + \int_0^\infty |\rho_{j,0}(x) - c|\, \varphi(0,x)\, dx \ge 0;$$

    (D.4) $\displaystyle\sum_{i=1}^{m} f_i(\rho_i(t,0-)) = \sum_{j=m+1}^{m+n} f_j(\rho_j(t,0+))$ for a.e. $t \ge 0$.

    Figure 1.  A junction consisting of m incoming and n outgoing edges.

    In [10] the authors approximated (1), (2), and (3) in the following way

    $$\begin{cases}
    \partial_t\rho_{i,\varepsilon} + \partial_x f_i(\rho_{i,\varepsilon}) = \varepsilon\,\partial^2_{xx}\rho_{i,\varepsilon}, & t>0,\ x<0,\ \forall i,\\
    \partial_t\rho_{j,\varepsilon} + \partial_x f_j(\rho_{j,\varepsilon}) = \varepsilon\,\partial^2_{xx}\rho_{j,\varepsilon}, & t>0,\ x>0,\ \forall j,\\
    \rho_{i,\varepsilon}(t,0) = \rho_{j,\varepsilon}(t,0), & t>0,\ \forall i,j,\\
    \displaystyle\sum_{i=1}^{m}\big(f_i(\rho_{i,\varepsilon}(t,0)) - \varepsilon\,\partial_x\rho_{i,\varepsilon}(t,0)\big) = \sum_{j=m+1}^{m+n}\big(f_j(\rho_{j,\varepsilon}(t,0)) - \varepsilon\,\partial_x\rho_{j,\varepsilon}(t,0)\big), & t>0,\\
    \rho_{i,\varepsilon}(0,x) = \rho_{i,0,\varepsilon}(x), & x<0,\ \forall i,\\
    \rho_{j,\varepsilon}(0,x) = \rho_{j,0,\varepsilon}(x), & x>0,\ \forall j,
    \end{cases}\tag{4}$$

    where $i \in \{1,\dots,m\}$, $j \in \{m+1,\dots,m+n\}$, and $\rho_{i,0,\varepsilon}, \rho_{j,0,\varepsilon}$ are smooth approximations of $\rho_{i,0}, \rho_{j,0}$. In this setting they showed that

    $$\begin{gathered}
    \rho_{i,\varepsilon} \to \rho_i \quad \text{a.e. in } (0,\infty)\times(-\infty,0) \text{ and in } L^p_{loc}((0,\infty)\times(-\infty,0)),\ 1 \le p < \infty, \text{ as } \varepsilon \to 0, \text{ for every } i,\\
    \rho_{j,\varepsilon} \to \rho_j \quad \text{a.e. in } (0,\infty)\times(0,\infty) \text{ and in } L^p_{loc}((0,\infty)\times(0,\infty)),\ 1 \le p < \infty, \text{ as } \varepsilon \to 0, \text{ for every } j,
    \end{gathered}$$

    where $(\rho_1, \dots, \rho_{m+n})$ is a weak solution of (1), (2), and (3) in the sense of Definition 1.1.
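    For $m = n = 1$ the continuity condition in (4) lets one glue the two half-lines into a single interval carrying the $x$-dependent flux $f(x,\cdot) = f_1$ for $x < 0$ and $f_2$ for $x > 0$, which is straightforward to discretize. The following Python sketch (truncated domain, explicit Rusanov flux plus central diffusion; all fluxes and parameters are hypothetical choices, not the scheme of [10]) illustrates this viscous approximation.

```python
import numpy as np

# Viscous approximation in the spirit of (4) for m = n = 1: the continuity
# condition at x = 0 lets us glue the two half-lines into one interval
# [-L, L] carrying the flux f(x, r) = f1(r) for x < 0 and f2(r) for x > 0.
f1 = lambda r: r * (1.0 - r)
f2 = lambda r: 0.5 * r * (1.0 - r)

L, N, eps, T, a = 2.0, 400, 0.05, 0.5, 1.0   # a >= max |f1'|, |f2'| on [0,1]
x = np.linspace(-L, L, N)
dx = x[1] - x[0]
dt = 0.2 * min(dx / a, dx**2 / (2.0 * eps))  # CFL for advection + diffusion
rho = np.where(x < 0.0, 0.8, 0.2)            # Riemann-type initial data

t = 0.0
while t < T:
    F = np.where(x < 0.0, f1(rho), f2(rho))
    # Rusanov numerical flux plus viscous flux at the N-1 cell interfaces
    Fh = 0.5 * (F[:-1] + F[1:]) - (0.5 * a + eps / dx) * (rho[1:] - rho[:-1])
    rho[1:-1] -= dt / dx * (Fh[1:] - Fh[:-1])
    rho[0], rho[-1] = rho[1], rho[-2]        # crude outflow boundaries
    t += dt

print("value at the junction (x = 0):", rho[N // 2])
```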

    In this paper we modify the transmission condition of (4): inspired by [14], we consider the following viscous approximation of (1), (2), and (3):

    $$\begin{cases}
    \partial_t\rho_{i,\varepsilon} + \partial_x f_i(\rho_{i,\varepsilon}) = \varepsilon\,\partial^2_{xx}\rho_{i,\varepsilon}, & t>0,\ x<0,\ \forall i,\\
    \partial_t\rho_{j,\varepsilon} + \partial_x f_j(\rho_{j,\varepsilon}) = \varepsilon\,\partial^2_{xx}\rho_{j,\varepsilon}, & t>0,\ x>0,\ \forall j,\\
    f_i(\rho_{i,\varepsilon}(t,0)) - \varepsilon\,\partial_x\rho_{i,\varepsilon}(t,0) = \beta_i(\rho_{1,\varepsilon}(t,0),\dots,\rho_{m+n,\varepsilon}(t,0)), & t>0,\ \forall i,\\
    f_j(\rho_{j,\varepsilon}(t,0)) - \varepsilon\,\partial_x\rho_{j,\varepsilon}(t,0) = \beta_j(\rho_{1,\varepsilon}(t,0),\dots,\rho_{m+n,\varepsilon}(t,0)), & t>0,\ \forall j,\\
    \rho_{i,\varepsilon}(0,x) = \rho_{i,0,\varepsilon}(x), & x<0,\ \forall i,\\
    \rho_{j,\varepsilon}(0,x) = \rho_{j,0,\varepsilon}(x), & x>0,\ \forall j,
    \end{cases}\tag{5}$$

    where, of course,

    $$\sum_{i=1}^{m} \beta_i(\rho_{1,\varepsilon}(t,0), \dots, \rho_{m+n,\varepsilon}(t,0)) = \sum_{j=m+1}^{m+n} \beta_j(\rho_{1,\varepsilon}(t,0), \dots, \rho_{m+n,\varepsilon}(t,0)). \tag{6}$$

    The additional assumptions we make on the functions $\beta_\ell$ and on the initial conditions $\rho_{\ell,0,\varepsilon}$ are postponed to the next section.

    The main result of the paper is the following.

    Theorem 1.2. Assume (H.1), (H.2), and (H.3). There exist a sequence $\{\varepsilon_k\}_{k\in\mathbb{N}} \subset (0,\infty)$, $\varepsilon_k \to 0$, and a solution $(\rho_1, \dots, \rho_{m+n})$ of (1), (2), and (3), in the sense of Definition 1.1, such that

    $$\rho_{i,\varepsilon_k} \to \rho_i \quad \text{a.e. and in } L^p_{loc}((0,\infty)\times(-\infty,0)), \tag{7}$$
    $$\rho_{j,\varepsilon_k} \to \rho_j \quad \text{a.e. and in } L^p_{loc}((0,\infty)\times(0,\infty)), \tag{8}$$
    $$f_1(\rho_1), \dots, f_m(\rho_m) \in BV((0,\infty)\times(-\infty,0)), \qquad f_{m+1}(\rho_{m+1}), \dots, f_{m+n}(\rho_{m+n}) \in BV((0,\infty)\times(0,\infty)), \tag{9}$$

    for every $1 \le p < \infty$, $i \in \{1,\dots,m\}$, $j \in \{m+1,\dots,m+n\}$, where $(\rho_{1,\varepsilon_k}, \dots, \rho_{m+n,\varepsilon_k})$ is the corresponding solution of (5).

    It is worth mentioning that a complete characterization of the limit solution obtained from (4) as $\varepsilon \to 0$ is given in [3], where the authors adapt to the star-shaped graph setting some ideas and techniques originally developed for conservation laws with discontinuous flux; see in particular [2,4,5].

    At the moment we are not able to formulate a similar characterization of the limit of (5). In general, however, the limits obtained from parabolic regularizations subject to the two different kinds of transmission conditions differ.

    To show this, consider the simple case of a junction with one incoming and one outgoing edge. We then have the conservation law

    $$\partial_t \rho_1 + \partial_x f_1(\rho_1) = 0, \qquad t > 0,\; x < 0, \tag{10}$$

    on the incoming edge and

    $$\partial_t \rho_2 + \partial_x f_2(\rho_2) = 0, \qquad t > 0,\; x > 0, \tag{11}$$

    on the outgoing one. Assume that

    $$\begin{gathered}
    f_1(0) = f_1(1) = f_2(0) = f_2(1) = 0, \qquad f_1'',\ f_2'' < 0,\\
    \text{there exist } 0 < \check\rho < \hat\rho < 1 \text{ and } G > 0 \text{ such that } f_1(\hat\rho) = f_2(\check\rho) = G(\hat\rho - \check\rho).
    \end{gathered}\tag{12}$$

    Consider the simplified version of (5)

    $$\begin{cases}
    \partial_t\rho_{1,\varepsilon} + \partial_x f_1(\rho_{1,\varepsilon}) = \varepsilon\,\partial^2_{xx}\rho_{1,\varepsilon}, & t>0,\ x<0,\\
    \partial_t\rho_{2,\varepsilon} + \partial_x f_2(\rho_{2,\varepsilon}) = \varepsilon\,\partial^2_{xx}\rho_{2,\varepsilon}, & t>0,\ x>0,\\
    f_1(\rho_{1,\varepsilon}(t,0)) - \varepsilon\,\partial_x\rho_{1,\varepsilon}(t,0) = f_2(\rho_{2,\varepsilon}(t,0)) - \varepsilon\,\partial_x\rho_{2,\varepsilon}(t,0) = G\big(\rho_{1,\varepsilon}(t,0) - \rho_{2,\varepsilon}(t,0)\big), & t>0,\\
    \rho_{1,\varepsilon}(0,x) = \hat\rho, & x<0,\\
    \rho_{2,\varepsilon}(0,x) = \check\rho, & x>0.
    \end{cases}\tag{13}$$

    The unique solution of (13) is

    $$\rho_{1,\varepsilon}(\cdot,\cdot) \equiv \hat\rho, \qquad \rho_{2,\varepsilon}(\cdot,\cdot) \equiv \check\rho, \qquad \varepsilon > 0. \tag{14}$$

    Therefore, as $\varepsilon \to 0$ we get the following solution of (10)-(11):

    $$\rho_1(\cdot,\cdot) \equiv \hat\rho, \qquad \rho_2(\cdot,\cdot) \equiv \check\rho. \tag{15}$$

    This stationary solution is not admissible in the sense of the classical vanishing viscosity germ, see [5, Sec. 5], as it consists of a nonclassical shock. However, when dealing with conservation laws with discontinuous flux, it is well known that infinitely many $L^1$-contractive semigroups of solutions exist, also in relation with different physical applications. In particular, when the right and left fluxes are bell-shaped, as we assume in condition (H.1), each of those notions of admissible solution is uniquely determined by the choice of an $(A,B)$-connection, see [1,5,9,12] for precise definitions and examples. In the example above the couple $(\hat\rho, \check\rho)$ is a connection.
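    For instance, taking $f_1 = f_2 = \rho(1-\rho)$ and $(\hat\rho, \check\rho) = (0.8, 0.2)$ (hypothetical values chosen purely for illustration), the compatibility conditions in (12) and the stationarity of (14) can be checked directly:

```python
# Spot check of (12)-(14): with f1 = f2 = r(1-r), the pair
# (rho_hat, rho_check) = (0.8, 0.2) is a connection.
f1 = lambda r: r * (1.0 - r)
f2 = lambda r: r * (1.0 - r)

rho_hat, rho_check = 0.8, 0.2
G = f1(rho_hat) / (rho_hat - rho_check)      # G = 0.16 / 0.6 > 0

assert 0.0 < rho_check < rho_hat < 1.0
assert abs(f1(rho_hat) - f2(rho_check)) < 1e-14
assert abs(f1(rho_hat) - G * (rho_hat - rho_check)) < 1e-14
# The constant states solve (13): every x-derivative vanishes and the
# transmission condition f1(rho_hat) = f2(rho_check) = G(rho_hat - rho_check)
# holds, so (14) is a stationary solution for every eps > 0.
print("connection:", (rho_hat, rho_check), "with G =", G)
```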

    It is worth noticing that entropy solutions admissible in the sense of an $(A,B)$-connection can be obtained as limits of a sequence of parabolic approximations with adapted viscosities but with a classical continuity condition at the interface, see [5, Sec. 6.2] for a general result, and also [2,15] for an application to the Buckley-Leverett equation.

    It is difficult, however, to establish a direct equivalence between the aforementioned results and the one we put forward in this paper. In particular, in the present case we lack information on the boundary layers at the parabolic level, and we do not know how the transmission conditions we impose on the parabolic fluxes translate into a condition for the hyperbolic problem.

    This means in particular that we have little information on the germ associated with the family of limit solutions obtained in Theorem 1.2 and, so far, we have not been able to prove that this germ is $L^1$-dissipative. We conjecture, however, that this is due to a technical obstruction and that uniqueness of the limit solutions holds.

    The paper is organized as follows: Section 2 contains the precise list of assumptions on the initial and transmission conditions in the parabolic problem (5). In Section 3 we present the proofs of all necessary a priori estimates on (5). Finally, in Section 4 we detail the proof of Theorem 1.2.

    The initial conditions $\rho_{\ell,0}$, $\ell = 1, \dots, m+n$, of the hyperbolic problem (1), (2), and (3) satisfy (H.3).

    Once the functions $\rho_{\ell,0}$ are fixed, we impose on (5) initial conditions $\rho_{\ell,0,\varepsilon}$ such that

    $$\begin{gathered}
    \rho_{i,0,\varepsilon}\in C^\infty((-\infty,0])\cap L^1(-\infty,0),\qquad \rho_{j,0,\varepsilon}\in C^\infty([0,\infty))\cap L^1(0,\infty),\qquad \varepsilon>0,\\
    \rho_{i,0,\varepsilon}\to\rho_{i,0}\ \text{a.e. in }(-\infty,0)\ \text{and in }L^p_{loc}(-\infty,0),\ 1\le p<\infty,\ \text{as }\varepsilon\to 0,\\
    \rho_{j,0,\varepsilon}\to\rho_{j,0}\ \text{a.e. in }(0,\infty)\ \text{and in }L^p_{loc}(0,\infty),\ 1\le p<\infty,\ \text{as }\varepsilon\to 0,\\
    0\le\rho_{i,0,\varepsilon},\ \rho_{j,0,\varepsilon}\le 1,\qquad \varepsilon>0,\\
    \|\rho_{i,0,\varepsilon}\|_{L^1(-\infty,0)}\le\|\rho_{i,0}\|_{L^1(-\infty,0)},\qquad \|\rho_{j,0,\varepsilon}\|_{L^1(0,\infty)}\le\|\rho_{j,0}\|_{L^1(0,\infty)},\qquad \varepsilon>0,\\
    \|\rho_{i,0,\varepsilon}\|_{L^2(-\infty,0)}\le\|\rho_{i,0}\|_{L^2(-\infty,0)},\qquad \|\rho_{j,0,\varepsilon}\|_{L^2(0,\infty)}\le\|\rho_{j,0}\|_{L^2(0,\infty)},\qquad \varepsilon>0,\\
    \|\partial_x\rho_{i,0,\varepsilon}\|_{L^1(-\infty,0)}\le TV(\rho_{i,0}),\qquad \|\partial_x\rho_{j,0,\varepsilon}\|_{L^1(0,\infty)}\le TV(\rho_{j,0}),\qquad \varepsilon>0,\\
    \varepsilon\,\|\partial^2_{xx}\rho_{i,0,\varepsilon}\|_{L^1(-\infty,0)},\ \varepsilon\,\|\partial^2_{xx}\rho_{j,0,\varepsilon}\|_{L^1(0,\infty)}\le C,\qquad \varepsilon>0,
    \end{gathered}\tag{16}$$

    for some constant $C > 0$ independent of $\varepsilon$.

    The functions $\beta_\ell$ appearing in the transmission conditions in (5) take the form

    $$\begin{aligned}
    \beta_i(\rho_{1,\varepsilon}(t,0),\dots,\rho_{m+n,\varepsilon}(t,0))&=\sum_{j=m+1}^{m+n}G_{i,j}\big(\rho_{i,\varepsilon}(t,0),\rho_{j,\varepsilon}(t,0)\big)\\
    &\quad+\varepsilon\Big(\sum_{h=1}^{m}K_{i,h}\big(\rho_{i,\varepsilon}(t,0),\rho_{h,\varepsilon}(t,0)\big)-\sum_{h=1}^{m+n}K_{h,i}\big(\rho_{h,\varepsilon}(t,0),\rho_{i,\varepsilon}(t,0)\big)\Big);
    \end{aligned}\tag{17}$$

    for $i \in \{1, \dots, m\}$, and, for $j \in \{m+1, \dots, m+n\}$,

    $$\begin{aligned}
    \beta_j(\rho_{1,\varepsilon}(t,0),\dots,\rho_{m+n,\varepsilon}(t,0))&=\sum_{i=1}^{m}G_{i,j}\big(\rho_{i,\varepsilon}(t,0),\rho_{j,\varepsilon}(t,0)\big)\\
    &\quad+\varepsilon\Big(\sum_{h=m+1}^{m+n}K_{h,j}\big(\rho_{h,\varepsilon}(t,0),\rho_{j,\varepsilon}(t,0)\big)-\sum_{h=1}^{m+n}K_{j,h}\big(\rho_{j,\varepsilon}(t,0),\rho_{h,\varepsilon}(t,0)\big)\Big).
    \end{aligned}\tag{18}$$

    The functions $G_{i,j} \in C^1(\mathbb{R}^2)$, $i \in \{1,\dots,m\}$, $j \in \{m+1,\dots,m+n\}$, and $K_{h,\ell} \in C^1(\mathbb{R}^2)$, $h, \ell \in \{1,\dots,m+n\}$, satisfy

    $$\begin{gathered}
    \partial_v G_{i,j}(\cdot,\cdot)\le 0\le\partial_u G_{i,j}(\cdot,\cdot),\qquad G_{i,j}(0,0)=G_{i,j}(1,1)=0,\\
    \partial_u K_{h,\ell}(\cdot,\cdot)\ge 0\ge\partial_v K_{h,\ell}(\cdot,\cdot),\qquad K_{h,\ell}(0,0)=K_{h,\ell}(1,1)=0.
    \end{gathered}\tag{19}$$

    In particular, (19) implies

    $$\begin{gathered}
    \big(\operatorname{sign}(u)-\operatorname{sign}(v)\big)\,\nabla G_{i,j}(\cdot,\cdot)\cdot(u,v)\ge 0,\qquad u,v\in\mathbb{R},\\
    \big(\operatorname{sign}(u)-\operatorname{sign}(v)\big)\,\nabla K_{h,\ell}(\cdot,\cdot)\cdot(u,v)\ge 0,\qquad u,v\in\mathbb{R},\\
    \big(\operatorname{sign}(u-u')-\operatorname{sign}(v-v')\big)\,\big(G_{i,j}(u,v)-G_{i,j}(u',v')\big)\ge 0,\qquad u,u',v,v'\in\mathbb{R},\\
    \big(\operatorname{sign}(u-u')-\operatorname{sign}(v-v')\big)\,\big(K_{h,\ell}(u,v)-K_{h,\ell}(u',v')\big)\ge 0,\qquad u,u',v,v'\in\mathbb{R},\\
    \big(\chi_{(-\infty,0)}(u)-\chi_{(-\infty,0)}(v)\big)\,G_{i,j}(u,v)\le 0,\qquad u,v\in\mathbb{R},\\
    \big(\chi_{(-\infty,0)}(u)-\chi_{(-\infty,0)}(v)\big)\,K_{h,\ell}(u,v)\le 0,\qquad u,v\in\mathbb{R},
    \end{gathered}\tag{20}$$

    where $\chi_{(-\infty,0)}$ is the characteristic function of the set $(-\infty,0)$.
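    For a concrete pair satisfying (19), such as the linear choices $G_{i,j}(u,v) = K_{h,\ell}(u,v) = u - v$ (our own illustrative example, not from the paper), the sign conditions in (20) can be spot-checked numerically:

```python
import numpy as np

# Hypothetical kernels satisfying (19): both increasing in u, decreasing
# in v, and vanishing at (0,0) and (1,1).
G = lambda u, v: u - v
dG = lambda u, v: (np.ones_like(u), -np.ones_like(v))   # (dG/du, dG/dv)
K = lambda u, v: u - v
dK = lambda u, v: (np.ones_like(u), -np.ones_like(v))
chi = lambda z: (z < 0.0).astype(float)  # characteristic fn of (-inf, 0)

rng = np.random.default_rng(1)
u, v, up, vp = rng.uniform(-1.0, 1.0, (4, 10_000))
s = np.sign

Gu, Gv = dG(u, v); Ku, Kv = dK(u, v)
assert np.all((s(u) - s(v)) * (Gu * u + Gv * v) >= 0.0)               # line 1
assert np.all((s(u) - s(v)) * (Ku * u + Kv * v) >= 0.0)               # line 2
assert np.all((s(u - up) - s(v - vp)) * (G(u, v) - G(up, vp)) >= 0.0) # line 3
assert np.all((s(u - up) - s(v - vp)) * (K(u, v) - K(up, vp)) >= 0.0) # line 4
assert np.all((chi(u) - chi(v)) * G(u, v) <= 0.0)                     # line 5
assert np.all((chi(u) - chi(v)) * K(u, v) <= 0.0)                     # line 6
```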

    This specific form of transmission conditions is reminiscent of the parabolic transmission conditions considered in [14,8], which were originally inspired by the Kedem-Katchalsky conditions for membrane permeability introduced in [16]:

    $$G_{h,\ell}(u,v) = c_{h,\ell}\,(u - v), \tag{21}$$

    for some constants $c_{h,\ell} > 0$. Our conditions are more general; in particular, we can notice that the function $G_{h,\ell}$ above satisfies

    $$G_{h,\ell}(u,v)\,(u - v) \ge 0, \tag{22}$$

    which allows the authors in [14] to obtain the $L^2$ estimate (see Lemma 3.3 below).

    We can observe that the equality (6) holds as

    $$\begin{aligned}
    \sum_{i=1}^{m}\beta_i(\rho_{1,\varepsilon}(t,0),\dots,\rho_{m+n,\varepsilon}(t,0))
    &=\sum_{i=1}^{m}\sum_{j=m+1}^{m+n}G_{i,j}\big(\rho_{i,\varepsilon}(t,0),\rho_{j,\varepsilon}(t,0)\big)+\varepsilon\sum_{i=1}^{m}\Big(\sum_{h=1}^{m}K_{i,h}\big(\rho_{i,\varepsilon}(t,0),\rho_{h,\varepsilon}(t,0)\big)-\sum_{h=1}^{m+n}K_{h,i}\big(\rho_{h,\varepsilon}(t,0),\rho_{i,\varepsilon}(t,0)\big)\Big)\\
    &=\sum_{i=1}^{m}\sum_{j=m+1}^{m+n}\Big(G_{i,j}\big(\rho_{i,\varepsilon}(t,0),\rho_{j,\varepsilon}(t,0)\big)-\varepsilon\,K_{j,i}\big(\rho_{j,\varepsilon}(t,0),\rho_{i,\varepsilon}(t,0)\big)\Big)
    \end{aligned}\tag{23}$$

    and analogously

    $$\sum_{j=m+1}^{m+n}\beta_j(\rho_{1,\varepsilon}(t,0),\dots,\rho_{m+n,\varepsilon}(t,0))=\sum_{j=m+1}^{m+n}\sum_{i=1}^{m}\Big(G_{i,j}\big(\rho_{i,\varepsilon}(t,0),\rho_{j,\varepsilon}(t,0)\big)-\varepsilon\,K_{j,i}\big(\rho_{j,\varepsilon}(t,0),\rho_{i,\varepsilon}(t,0)\big)\Big).\tag{24}$$
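    The cancellation expressed by (23)-(24) holds for any choice of the kernels, so it is easy to verify numerically. The sketch below instantiates (17)-(18) for a 2-in/3-out junction with the illustrative choices $G_{i,j}(u,v) = K_{h,\ell}(u,v) = u - v$ and checks (6) on random junction states.

```python
import numpy as np

m, n, eps = 2, 3, 0.1
G = lambda u, v: u - v   # increasing in u, decreasing in v; G(0,0)=G(1,1)=0
K = lambda u, v: u - v   # same monotonicity; K(0,0)=K(1,1)=0

rng = np.random.default_rng(0)
rho = rng.uniform(0.0, 1.0, m + n)   # traces rho_ell(t, 0) at the junction

def beta_in(i):   # transmission flux (17) on incoming edge i
    g = sum(G(rho[i], rho[j]) for j in range(m, m + n))
    k = sum(K(rho[i], rho[h]) for h in range(m)) \
        - sum(K(rho[h], rho[i]) for h in range(m + n))
    return g + eps * k

def beta_out(j):  # transmission flux (18) on outgoing edge j
    g = sum(G(rho[i], rho[j]) for i in range(m))
    k = sum(K(rho[h], rho[j]) for h in range(m, m + n)) \
        - sum(K(rho[j], rho[h]) for h in range(m + n))
    return g + eps * k

lhs = sum(beta_in(i) for i in range(m))
rhs = sum(beta_out(j) for j in range(m, m + n))
assert abs(lhs - rhs) < 1e-12        # conservation (6) at the node
```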

    This section is devoted to establishing a priori estimates, uniform with respect to $\varepsilon$, which are necessary for the proof of our main convergence result in the next section.

    For every $\varepsilon > 0$, let $(\rho_{1,\varepsilon}, \dots, \rho_{m+n,\varepsilon})$ be a solution of (5) satisfying (16).

    Lemma 3.1 ($L^\infty$ estimate). We have that

    $$0 \le \rho_{i,\varepsilon},\ \rho_{j,\varepsilon} \le 1, \qquad \forall\, i, j. \tag{25}$$

    Proof. Consider the function $\eta(\xi) = -\xi\,\chi_{(-\infty,0)}(\xi)$. Since $\eta'(\xi) = -\chi_{(-\infty,0)}(\xi)$, using (19) we obtain

    $$\begin{aligned}
    &\frac{d}{dt}\Big(\sum_{i=1}^{m}\int_{-\infty}^{0}\eta(\rho_{i,\varepsilon})\,dx+\sum_{j=m+1}^{m+n}\int_{0}^{\infty}\eta(\rho_{j,\varepsilon})\,dx\Big)
    =\sum_{i}\int_{-\infty}^{0}\eta'(\rho_{i,\varepsilon})\,\partial_t\rho_{i,\varepsilon}\,dx+\sum_{j}\int_{0}^{\infty}\eta'(\rho_{j,\varepsilon})\,\partial_t\rho_{j,\varepsilon}\,dx\\
    &\quad=-\sum_{i}\int_{-\infty}^{0}\chi_{(-\infty,0)}(\rho_{i,\varepsilon})\,\partial_t\rho_{i,\varepsilon}\,dx-\sum_{j}\int_{0}^{\infty}\chi_{(-\infty,0)}(\rho_{j,\varepsilon})\,\partial_t\rho_{j,\varepsilon}\,dx\\
    &\quad=\sum_{i}\int_{-\infty}^{0}\chi_{(-\infty,0)}(\rho_{i,\varepsilon})\,\partial_x\big(f_i(\rho_{i,\varepsilon})-\varepsilon\,\partial_x\rho_{i,\varepsilon}\big)\,dx+\sum_{j}\int_{0}^{\infty}\chi_{(-\infty,0)}(\rho_{j,\varepsilon})\,\partial_x\big(f_j(\rho_{j,\varepsilon})-\varepsilon\,\partial_x\rho_{j,\varepsilon}\big)\,dx\\
    &\quad=\sum_{i}\chi_{(-\infty,0)}(\rho_{i,\varepsilon}(t,0))\big(f_i(\rho_{i,\varepsilon}(t,0))-\varepsilon\,\partial_x\rho_{i,\varepsilon}(t,0)\big)-\sum_{j}\chi_{(-\infty,0)}(\rho_{j,\varepsilon}(t,0))\big(f_j(\rho_{j,\varepsilon}(t,0))-\varepsilon\,\partial_x\rho_{j,\varepsilon}(t,0)\big)\\
    &\qquad+\underbrace{\sum_{i}\int_{-\infty}^{0}\partial_x\rho_{i,\varepsilon}\big(f_i(\rho_{i,\varepsilon})-\varepsilon\,\partial_x\rho_{i,\varepsilon}\big)\,d\delta_{\{\rho_{i,\varepsilon}=0\}}}_{\le 0}+\underbrace{\sum_{j}\int_{0}^{\infty}\partial_x\rho_{j,\varepsilon}\big(f_j(\rho_{j,\varepsilon})-\varepsilon\,\partial_x\rho_{j,\varepsilon}\big)\,d\delta_{\{\rho_{j,\varepsilon}=0\}}}_{\le 0}\\
    &\quad\le\sum_{j=m+1}^{m+n}\sum_{i=1}^{m}\big(\chi_{(-\infty,0)}(\rho_{i,\varepsilon}(t,0))-\chi_{(-\infty,0)}(\rho_{j,\varepsilon}(t,0))\big)\Big(G_{i,j}\big(\rho_{i,\varepsilon}(t,0),\rho_{j,\varepsilon}(t,0)\big)-\varepsilon\,K_{j,i}\big(\rho_{j,\varepsilon}(t,0),\rho_{i,\varepsilon}(t,0)\big)\Big)\le 0,
    \end{aligned}$$

    where $\delta_{\{\rho_{i,\varepsilon}=0\}}$ and $\delta_{\{\rho_{j,\varepsilon}=0\}}$ are the Dirac deltas concentrated on the sets $\{\rho_{i,\varepsilon}=0\}$ and $\{\rho_{j,\varepsilon}=0\}$, respectively, and we apply [6, Lemma 2] to compute the value of the integrals as a limit. Integrating over $(0,t)$ and using (16) we get

    $$0\le\sum_{i=1}^{m}\int_{-\infty}^{0}\eta(\rho_{i,\varepsilon}(t,x))\,dx+\sum_{j=m+1}^{m+n}\int_{0}^{\infty}\eta(\rho_{j,\varepsilon}(t,x))\,dx\le\sum_{i=1}^{m}\int_{-\infty}^{0}\eta(\rho_{i,0,\varepsilon})\,dx+\sum_{j=m+1}^{m+n}\int_{0}^{\infty}\eta(\rho_{j,0,\varepsilon})\,dx=0$$

    and then

    $$\rho_{i,\varepsilon},\ \rho_{j,\varepsilon}\ge 0,\qquad \forall\, i,j,$$

    which proves the lower bounds in (25). The upper bounds in (25) can be proved in the same way using the function $\xi \mapsto (\xi-1)\,\chi_{(1,\infty)}(\xi)$.

    Lemma 3.2 ($L^1$ estimate). We have that

    $$\sum_{i=1}^{m}\|\rho_{i,\varepsilon}(t,\cdot)\|_{L^1(-\infty,0)}+\sum_{j=m+1}^{m+n}\|\rho_{j,\varepsilon}(t,\cdot)\|_{L^1(0,\infty)}\le\sum_{i=1}^{m}\|\rho_{i,0}\|_{L^1(-\infty,0)}+\sum_{j=m+1}^{m+n}\|\rho_{j,0}\|_{L^1(0,\infty)},\qquad t\ge 0.\tag{26}$$

    Proof. Thanks to (5), (23), (24), and (25), we have that

    $$\begin{aligned}
    \frac{d}{dt}\Big(\sum_{i=1}^{m}\int_{-\infty}^{0}|\rho_{i,\varepsilon}|\,dx+\sum_{j=m+1}^{m+n}\int_{0}^{\infty}|\rho_{j,\varepsilon}|\,dx\Big)
    &=\frac{d}{dt}\Big(\sum_{i}\int_{-\infty}^{0}\rho_{i,\varepsilon}\,dx+\sum_{j}\int_{0}^{\infty}\rho_{j,\varepsilon}\,dx\Big)
    =\sum_{i}\int_{-\infty}^{0}\partial_t\rho_{i,\varepsilon}\,dx+\sum_{j}\int_{0}^{\infty}\partial_t\rho_{j,\varepsilon}\,dx\\
    &=-\sum_{i}\int_{-\infty}^{0}\partial_x\big(f_i(\rho_{i,\varepsilon})-\varepsilon\,\partial_x\rho_{i,\varepsilon}\big)\,dx-\sum_{j}\int_{0}^{\infty}\partial_x\big(f_j(\rho_{j,\varepsilon})-\varepsilon\,\partial_x\rho_{j,\varepsilon}\big)\,dx\\
    &=-\sum_{i}\beta_i(\rho_{1,\varepsilon}(t,0),\dots,\rho_{m+n,\varepsilon}(t,0))+\sum_{j}\beta_j(\rho_{1,\varepsilon}(t,0),\dots,\rho_{m+n,\varepsilon}(t,0))=0.
    \end{aligned}$$

    Integrating over (0,t) and using (16) we get (26).
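    Both the $L^\infty$ bound (25) and the $L^1$ bound (26) have discrete counterparts for monotone schemes. The sketch below (a single truncated edge with zero-flux boundaries standing in for the full network; all parameters are illustrative assumptions, not from the paper) checks them numerically.

```python
import numpy as np

# Monotone (Rusanov + central diffusion) scheme on one truncated edge,
# with zero total flux at the boundaries so that mass cannot enter.
f = lambda r: r * (1.0 - r)
L, N, eps, T, a = 2.0, 400, 0.05, 0.5, 1.0   # a >= max |f'| on [0, 1]
x = np.linspace(-L, 0.0, N)
dx = x[1] - x[0]
dt = 0.2 * min(dx / a, dx**2 / (2.0 * eps))  # CFL keeps the scheme monotone

rho = np.clip(0.5 + 0.5 * np.sin(4.0 * np.pi * x) ** 2 * np.exp(-x**2), 0.0, 1.0)
mass0 = dx * rho.sum()

t = 0.0
while t < T:
    F = f(rho)
    Fh = 0.5 * (F[:-1] + F[1:]) - (0.5 * a + eps / dx) * (rho[1:] - rho[:-1])
    Ftot = np.concatenate(([0.0], Fh, [0.0]))   # zero flux at both ends
    rho -= dt / dx * (Ftot[1:] - Ftot[:-1])
    t += dt

assert rho.min() >= -1e-9 and rho.max() <= 1.0 + 1e-9  # discrete (25)
assert dx * rho.sum() <= mass0 + 1e-9                  # discrete (26)
print("discrete maximum principle and L1 bound verified")
```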

    Lemma 3.3 ($L^2$ estimate). We have that

    $$\begin{aligned}
    &\sum_{i=1}^{m}\|\rho_{i,\varepsilon}(t,\cdot)\|^2_{L^2(-\infty,0)}+\sum_{j=m+1}^{m+n}\|\rho_{j,\varepsilon}(t,\cdot)\|^2_{L^2(0,\infty)}+2\varepsilon\int_0^t\Big(\sum_{i=1}^{m}\|\partial_x\rho_{i,\varepsilon}(s,\cdot)\|^2_{L^2(-\infty,0)}+\sum_{j=m+1}^{m+n}\|\partial_x\rho_{j,\varepsilon}(s,\cdot)\|^2_{L^2(0,\infty)}\Big)\,ds\\
    &\qquad\le\sum_{i=1}^{m}\|\rho_{i,0}\|^2_{L^2(-\infty,0)}+\sum_{j=m+1}^{m+n}\|\rho_{j,0}\|^2_{L^2(0,\infty)}+2\Big(\sum_{\ell=1}^{m+n}\|\beta_\ell\|_{L^\infty((0,1)^{m+n})}+\sum_{i=1}^{m}\|f_i\|_{L^1(0,1)}\Big)t,
    \end{aligned}\tag{27}$$

    for every t0.

    Proof. Thanks to (5), we have that

    $$\begin{aligned}
    &\frac{d}{dt}\Big(\sum_{i=1}^{m}\int_{-\infty}^{0}\frac{\rho_{i,\varepsilon}^2}{2}\,dx+\sum_{j=m+1}^{m+n}\int_{0}^{\infty}\frac{\rho_{j,\varepsilon}^2}{2}\,dx\Big)
    =\sum_{i}\int_{-\infty}^{0}\rho_{i,\varepsilon}\,\partial_t\rho_{i,\varepsilon}\,dx+\sum_{j}\int_{0}^{\infty}\rho_{j,\varepsilon}\,\partial_t\rho_{j,\varepsilon}\,dx\\
    &\quad=-\sum_{i}\int_{-\infty}^{0}\rho_{i,\varepsilon}\,\partial_x\big(f_i(\rho_{i,\varepsilon})-\varepsilon\,\partial_x\rho_{i,\varepsilon}\big)\,dx-\sum_{j}\int_{0}^{\infty}\rho_{j,\varepsilon}\,\partial_x\big(f_j(\rho_{j,\varepsilon})-\varepsilon\,\partial_x\rho_{j,\varepsilon}\big)\,dx\\
    &\quad=-\sum_{i}\rho_{i,\varepsilon}(t,0)\big(f_i(\rho_{i,\varepsilon}(t,0))-\varepsilon\,\partial_x\rho_{i,\varepsilon}(t,0)\big)+\sum_{j}\rho_{j,\varepsilon}(t,0)\big(f_j(\rho_{j,\varepsilon}(t,0))-\varepsilon\,\partial_x\rho_{j,\varepsilon}(t,0)\big)\\
    &\qquad+\sum_{i}\int_{-\infty}^{0}\partial_x\Big(\int_0^{\rho_{i,\varepsilon}(t,x)}f_i(\xi)\,d\xi\Big)\,dx+\sum_{j}\int_{0}^{\infty}\partial_x\Big(\int_0^{\rho_{j,\varepsilon}(t,x)}f_j(\xi)\,d\xi\Big)\,dx-\varepsilon\sum_{i}\int_{-\infty}^{0}(\partial_x\rho_{i,\varepsilon})^2\,dx-\varepsilon\sum_{j}\int_{0}^{\infty}(\partial_x\rho_{j,\varepsilon})^2\,dx\\
    &\quad=-\sum_{i}\rho_{i,\varepsilon}(t,0)\,\beta_i(\rho_{1,\varepsilon}(t,0),\dots,\rho_{m+n,\varepsilon}(t,0))+\sum_{j}\rho_{j,\varepsilon}(t,0)\,\beta_j(\rho_{1,\varepsilon}(t,0),\dots,\rho_{m+n,\varepsilon}(t,0))\\
    &\qquad+\sum_{i}\int_0^{\rho_{i,\varepsilon}(t,0)}f_i(\xi)\,d\xi\underbrace{-\sum_{j}\int_0^{\rho_{j,\varepsilon}(t,0)}f_j(\xi)\,d\xi}_{\le 0}-\varepsilon\sum_{i}\int_{-\infty}^{0}(\partial_x\rho_{i,\varepsilon})^2\,dx-\varepsilon\sum_{j}\int_{0}^{\infty}(\partial_x\rho_{j,\varepsilon})^2\,dx\\
    &\quad\le\sum_{\ell=1}^{m+n}\|\beta_\ell\|_{L^\infty((0,1)^{m+n})}+\sum_{i=1}^{m}\|f_i\|_{L^1(0,1)}-\varepsilon\sum_{i}\int_{-\infty}^{0}(\partial_x\rho_{i,\varepsilon})^2\,dx-\varepsilon\sum_{j}\int_{0}^{\infty}(\partial_x\rho_{j,\varepsilon})^2\,dx.
    \end{aligned}$$

    Integrating over (0,t) and using (16) we get (27).

    Lemma 3.4 ($BV$ estimate). We have that

    $$\sum_{i=1}^{m}\|\partial_t\rho_{i,\varepsilon}(t,\cdot)\|_{L^1(-\infty,0)}+\sum_{j=m+1}^{m+n}\|\partial_t\rho_{j,\varepsilon}(t,\cdot)\|_{L^1(0,\infty)}\le(m+n)\,C+\sum_{i=1}^{m}\|f_i'\|_{L^\infty(0,1)}\,TV(\rho_{i,0})+\sum_{j=m+1}^{m+n}\|f_j'\|_{L^\infty(0,1)}\,TV(\rho_{j,0}),\tag{28}$$

    for every t0.

    Proof. From (5) we get

    $$\begin{aligned}
    &\partial^2_{tt}\rho_{i,\varepsilon}+\partial_x\big(f_i'(\rho_{i,\varepsilon})\,\partial_t\rho_{i,\varepsilon}\big)=\varepsilon\,\partial^3_{txx}\rho_{i,\varepsilon},\qquad
    \partial^2_{tt}\rho_{j,\varepsilon}+\partial_x\big(f_j'(\rho_{j,\varepsilon})\,\partial_t\rho_{j,\varepsilon}\big)=\varepsilon\,\partial^3_{txx}\rho_{j,\varepsilon},\\
    &f_i'(\rho_{i,\varepsilon}(t,0))\,\partial_t\rho_{i,\varepsilon}(t,0)-\varepsilon\,\partial^2_{tx}\rho_{i,\varepsilon}(t,0)=\sum_{j=m+1}^{m+n}\nabla G_{i,j}\big(\rho_{i,\varepsilon}(t,0),\rho_{j,\varepsilon}(t,0)\big)\cdot\big(\partial_t\rho_{i,\varepsilon}(t,0),\partial_t\rho_{j,\varepsilon}(t,0)\big)\\
    &\qquad+\varepsilon\sum_{h=1}^{m}\nabla K_{i,h}\big(\rho_{i,\varepsilon}(t,0),\rho_{h,\varepsilon}(t,0)\big)\cdot\big(\partial_t\rho_{i,\varepsilon}(t,0),\partial_t\rho_{h,\varepsilon}(t,0)\big)-\varepsilon\sum_{h=1}^{m+n}\nabla K_{h,i}\big(\rho_{h,\varepsilon}(t,0),\rho_{i,\varepsilon}(t,0)\big)\cdot\big(\partial_t\rho_{h,\varepsilon}(t,0),\partial_t\rho_{i,\varepsilon}(t,0)\big),\\
    &f_j'(\rho_{j,\varepsilon}(t,0))\,\partial_t\rho_{j,\varepsilon}(t,0)-\varepsilon\,\partial^2_{tx}\rho_{j,\varepsilon}(t,0)=\sum_{i=1}^{m}\nabla G_{i,j}\big(\rho_{i,\varepsilon}(t,0),\rho_{j,\varepsilon}(t,0)\big)\cdot\big(\partial_t\rho_{i,\varepsilon}(t,0),\partial_t\rho_{j,\varepsilon}(t,0)\big)\\
    &\qquad+\varepsilon\sum_{h=m+1}^{m+n}\nabla K_{h,j}\big(\rho_{h,\varepsilon}(t,0),\rho_{j,\varepsilon}(t,0)\big)\cdot\big(\partial_t\rho_{h,\varepsilon}(t,0),\partial_t\rho_{j,\varepsilon}(t,0)\big)-\varepsilon\sum_{h=1}^{m+n}\nabla K_{j,h}\big(\rho_{j,\varepsilon}(t,0),\rho_{h,\varepsilon}(t,0)\big)\cdot\big(\partial_t\rho_{j,\varepsilon}(t,0),\partial_t\rho_{h,\varepsilon}(t,0)\big).
    \end{aligned}$$

    Thanks to (20), we have that

    $$\begin{aligned}
    &\frac{d}{dt}\Big(\sum_{i=1}^{m}\int_{-\infty}^{0}|\partial_t\rho_{i,\varepsilon}|\,dx+\sum_{j=m+1}^{m+n}\int_{0}^{\infty}|\partial_t\rho_{j,\varepsilon}|\,dx\Big)
    =\sum_{i}\int_{-\infty}^{0}\partial^2_{tt}\rho_{i,\varepsilon}\,\operatorname{sign}(\partial_t\rho_{i,\varepsilon})\,dx+\sum_{j}\int_{0}^{\infty}\partial^2_{tt}\rho_{j,\varepsilon}\,\operatorname{sign}(\partial_t\rho_{j,\varepsilon})\,dx\\
    &\quad=-\sum_{i}\int_{-\infty}^{0}\operatorname{sign}(\partial_t\rho_{i,\varepsilon})\,\partial_x\big(f_i'(\rho_{i,\varepsilon})\,\partial_t\rho_{i,\varepsilon}-\varepsilon\,\partial^2_{tx}\rho_{i,\varepsilon}\big)\,dx-\sum_{j}\int_{0}^{\infty}\operatorname{sign}(\partial_t\rho_{j,\varepsilon})\,\partial_x\big(f_j'(\rho_{j,\varepsilon})\,\partial_t\rho_{j,\varepsilon}-\varepsilon\,\partial^2_{tx}\rho_{j,\varepsilon}\big)\,dx\\
    &\quad=-\sum_{i}\operatorname{sign}(\partial_t\rho_{i,\varepsilon}(t,0))\big(f_i'(\rho_{i,\varepsilon}(t,0))\,\partial_t\rho_{i,\varepsilon}(t,0)-\varepsilon\,\partial^2_{tx}\rho_{i,\varepsilon}(t,0)\big)+\sum_{j}\operatorname{sign}(\partial_t\rho_{j,\varepsilon}(t,0))\big(f_j'(\rho_{j,\varepsilon}(t,0))\,\partial_t\rho_{j,\varepsilon}(t,0)-\varepsilon\,\partial^2_{tx}\rho_{j,\varepsilon}(t,0)\big)\\
    &\qquad+\underbrace{\sum_{i}\int_{-\infty}^{0}\partial^2_{tx}\rho_{i,\varepsilon}\big(f_i'(\rho_{i,\varepsilon})\,\partial_t\rho_{i,\varepsilon}-\varepsilon\,\partial^2_{tx}\rho_{i,\varepsilon}\big)\,d\delta_{\{\partial_t\rho_{i,\varepsilon}=0\}}}_{\le 0}+\underbrace{\sum_{j}\int_{0}^{\infty}\partial^2_{tx}\rho_{j,\varepsilon}\big(f_j'(\rho_{j,\varepsilon})\,\partial_t\rho_{j,\varepsilon}-\varepsilon\,\partial^2_{tx}\rho_{j,\varepsilon}\big)\,d\delta_{\{\partial_t\rho_{j,\varepsilon}=0\}}}_{\le 0}\\
    &\quad\le-\sum_{i=1}^{m}\sum_{j=m+1}^{m+n}\big(\operatorname{sign}(\partial_t\rho_{i,\varepsilon}(t,0))-\operatorname{sign}(\partial_t\rho_{j,\varepsilon}(t,0))\big)\,\nabla G_{i,j}\big(\rho_{i,\varepsilon}(t,0),\rho_{j,\varepsilon}(t,0)\big)\cdot\big(\partial_t\rho_{i,\varepsilon}(t,0),\partial_t\rho_{j,\varepsilon}(t,0)\big)\\
    &\qquad+\varepsilon\sum_{i=1}^{m}\sum_{j=m+1}^{m+n}\big(\operatorname{sign}(\partial_t\rho_{i,\varepsilon}(t,0))-\operatorname{sign}(\partial_t\rho_{j,\varepsilon}(t,0))\big)\,\nabla K_{j,i}\big(\rho_{j,\varepsilon}(t,0),\rho_{i,\varepsilon}(t,0)\big)\cdot\big(\partial_t\rho_{j,\varepsilon}(t,0),\partial_t\rho_{i,\varepsilon}(t,0)\big)\le 0,
    \end{aligned}$$

    where $\delta_{\{\partial_t\rho_{i,\varepsilon}=0\}}$ and $\delta_{\{\partial_t\rho_{j,\varepsilon}=0\}}$ are the Dirac deltas concentrated on the sets $\{\partial_t\rho_{i,\varepsilon}=0\}$ and $\{\partial_t\rho_{j,\varepsilon}=0\}$, respectively, and we apply [6, Lemma 2].

    Integrating over (0,t) and using (16), (25) we get

    $$\begin{aligned}
    &\sum_{i=1}^{m}\|\partial_t\rho_{i,\varepsilon}(t,\cdot)\|_{L^1(-\infty,0)}+\sum_{j=m+1}^{m+n}\|\partial_t\rho_{j,\varepsilon}(t,\cdot)\|_{L^1(0,\infty)}
    \le\sum_{i}\|\partial_t\rho_{i,\varepsilon}(0,\cdot)\|_{L^1(-\infty,0)}+\sum_{j}\|\partial_t\rho_{j,\varepsilon}(0,\cdot)\|_{L^1(0,\infty)}\\
    &\quad=\sum_{i}\big\|\varepsilon\,\partial^2_{xx}\rho_{i,0,\varepsilon}-\partial_x f_i(\rho_{i,0,\varepsilon})\big\|_{L^1(-\infty,0)}+\sum_{j}\big\|\varepsilon\,\partial^2_{xx}\rho_{j,0,\varepsilon}-\partial_x f_j(\rho_{j,0,\varepsilon})\big\|_{L^1(0,\infty)}\\
    &\quad\le\sum_{i}\Big(\varepsilon\,\|\partial^2_{xx}\rho_{i,0,\varepsilon}\|_{L^1(-\infty,0)}+\|f_i'(\rho_{i,0,\varepsilon})\|_{L^\infty(-\infty,0)}\,\|\partial_x\rho_{i,0,\varepsilon}\|_{L^1(-\infty,0)}\Big)+\sum_{j}\Big(\varepsilon\,\|\partial^2_{xx}\rho_{j,0,\varepsilon}\|_{L^1(0,\infty)}+\|f_j'(\rho_{j,0,\varepsilon})\|_{L^\infty(0,\infty)}\,\|\partial_x\rho_{j,0,\varepsilon}\|_{L^1(0,\infty)}\Big)\\
    &\quad\le(m+n)\,C+\sum_{i=1}^{m}\|f_i'\|_{L^\infty(0,1)}\,TV(\rho_{i,0})+\sum_{j=m+1}^{m+n}\|f_j'\|_{L^\infty(0,1)}\,TV(\rho_{j,0}),
    \end{aligned}$$

    that is (28).

    Lemma 3.5 (Stability estimate). Let $(\rho_{1,\varepsilon},\dots,\rho_{m+n,\varepsilon})$ and $(\bar\rho_{1,\varepsilon},\dots,\bar\rho_{m+n,\varepsilon})$ be two solutions of (5). The following estimate holds:

    $$\sum_{i=1}^{m}\|\rho_{i,\varepsilon}(t,\cdot)-\bar\rho_{i,\varepsilon}(t,\cdot)\|_{L^1(-\infty,0)}+\sum_{j=m+1}^{m+n}\|\rho_{j,\varepsilon}(t,\cdot)-\bar\rho_{j,\varepsilon}(t,\cdot)\|_{L^1(0,\infty)}\le\sum_{i=1}^{m}\|\rho_{i,0,\varepsilon}-\bar\rho_{i,0,\varepsilon}\|_{L^1(-\infty,0)}+\sum_{j=m+1}^{m+n}\|\rho_{j,0,\varepsilon}-\bar\rho_{j,0,\varepsilon}\|_{L^1(0,\infty)},\qquad t\ge 0.\tag{29}$$

    Proof. From (5) we get

    $$\begin{aligned}
    \partial_t(\rho_{i,\varepsilon}-\bar\rho_{i,\varepsilon})+\partial_x\big(f_i(\rho_{i,\varepsilon})-f_i(\bar\rho_{i,\varepsilon})\big)&=\varepsilon\,\partial^2_{xx}(\rho_{i,\varepsilon}-\bar\rho_{i,\varepsilon}),\\
    \partial_t(\rho_{j,\varepsilon}-\bar\rho_{j,\varepsilon})+\partial_x\big(f_j(\rho_{j,\varepsilon})-f_j(\bar\rho_{j,\varepsilon})\big)&=\varepsilon\,\partial^2_{xx}(\rho_{j,\varepsilon}-\bar\rho_{j,\varepsilon}).
    \end{aligned}$$

    Thanks to (5), (20), and (25), we have that

    $$\begin{aligned}
    &\frac{d}{dt}\Big(\sum_{i=1}^{m}\int_{-\infty}^{0}|\rho_{i,\varepsilon}-\bar\rho_{i,\varepsilon}|\,dx+\sum_{j=m+1}^{m+n}\int_{0}^{\infty}|\rho_{j,\varepsilon}-\bar\rho_{j,\varepsilon}|\,dx\Big)\\
    &\quad=\sum_{i}\int_{-\infty}^{0}\operatorname{sign}(\rho_{i,\varepsilon}-\bar\rho_{i,\varepsilon})\,\partial_t(\rho_{i,\varepsilon}-\bar\rho_{i,\varepsilon})\,dx+\sum_{j}\int_{0}^{\infty}\operatorname{sign}(\rho_{j,\varepsilon}-\bar\rho_{j,\varepsilon})\,\partial_t(\rho_{j,\varepsilon}-\bar\rho_{j,\varepsilon})\,dx\\
    &\quad=-\sum_{i}\int_{-\infty}^{0}\operatorname{sign}(\rho_{i,\varepsilon}-\bar\rho_{i,\varepsilon})\,\partial_x\Big(\big(f_i(\rho_{i,\varepsilon})-f_i(\bar\rho_{i,\varepsilon})\big)-\varepsilon\,\partial_x(\rho_{i,\varepsilon}-\bar\rho_{i,\varepsilon})\Big)\,dx\\
    &\qquad-\sum_{j}\int_{0}^{\infty}\operatorname{sign}(\rho_{j,\varepsilon}-\bar\rho_{j,\varepsilon})\,\partial_x\Big(\big(f_j(\rho_{j,\varepsilon})-f_j(\bar\rho_{j,\varepsilon})\big)-\varepsilon\,\partial_x(\rho_{j,\varepsilon}-\bar\rho_{j,\varepsilon})\Big)\,dx\\
    &\quad=-\sum_{i=1}^{m}\sum_{j=m+1}^{m+n}\big(\operatorname{sign}(\rho_{i,\varepsilon}(t,0)-\bar\rho_{i,\varepsilon}(t,0))-\operatorname{sign}(\rho_{j,\varepsilon}(t,0)-\bar\rho_{j,\varepsilon}(t,0))\big)\Big(G_{i,j}\big(\rho_{i,\varepsilon}(t,0),\rho_{j,\varepsilon}(t,0)\big)-G_{i,j}\big(\bar\rho_{i,\varepsilon}(t,0),\bar\rho_{j,\varepsilon}(t,0)\big)\Big)\\
    &\qquad+\varepsilon\sum_{i=1}^{m}\sum_{j=m+1}^{m+n}\big(\operatorname{sign}(\rho_{i,\varepsilon}(t,0)-\bar\rho_{i,\varepsilon}(t,0))-\operatorname{sign}(\rho_{j,\varepsilon}(t,0)-\bar\rho_{j,\varepsilon}(t,0))\big)\Big(K_{j,i}\big(\rho_{j,\varepsilon}(t,0),\rho_{i,\varepsilon}(t,0)\big)-K_{j,i}\big(\bar\rho_{j,\varepsilon}(t,0),\bar\rho_{i,\varepsilon}(t,0)\big)\Big)\\
    &\qquad+\underbrace{\sum_{i}\int_{-\infty}^{0}\partial_x(\rho_{i,\varepsilon}-\bar\rho_{i,\varepsilon})\Big(\big(f_i(\rho_{i,\varepsilon})-f_i(\bar\rho_{i,\varepsilon})\big)-\varepsilon\,\partial_x(\rho_{i,\varepsilon}-\bar\rho_{i,\varepsilon})\Big)\,d\delta_{\{\rho_{i,\varepsilon}=\bar\rho_{i,\varepsilon}\}}}_{\le 0}\\
    &\qquad+\underbrace{\sum_{j}\int_{0}^{\infty}\partial_x(\rho_{j,\varepsilon}-\bar\rho_{j,\varepsilon})\Big(\big(f_j(\rho_{j,\varepsilon})-f_j(\bar\rho_{j,\varepsilon})\big)-\varepsilon\,\partial_x(\rho_{j,\varepsilon}-\bar\rho_{j,\varepsilon})\Big)\,d\delta_{\{\rho_{j,\varepsilon}=\bar\rho_{j,\varepsilon}\}}}_{\le 0}\ \le 0,
    \end{aligned}$$

    where we use [6, Lemma 2] and denote by $\delta_{\{\rho_{i,\varepsilon}=\bar\rho_{i,\varepsilon}\}}$ and $\delta_{\{\rho_{j,\varepsilon}=\bar\rho_{j,\varepsilon}\}}$ the Dirac deltas concentrated on the sets $\{\rho_{i,\varepsilon}=\bar\rho_{i,\varepsilon}\}$ and $\{\rho_{j,\varepsilon}=\bar\rho_{j,\varepsilon}\}$, respectively.

    Integrating over (0,t) we get (29).
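    The contraction property (29) likewise holds, step by step, for monotone conservative discretizations (by the Crandall-Tartar lemma). The following sketch evolves two initial data with the same single-edge scheme used earlier (again with illustrative parameters) and checks that their discrete $L^1$ distance never increases.

```python
import numpy as np

f = lambda r: r * (1.0 - r)
L, N, eps, a, steps = 2.0, 300, 0.05, 1.0, 1500
x = np.linspace(-L, 0.0, N)
dx = x[1] - x[0]
dt = 0.2 * min(dx / a, dx**2 / (2.0 * eps))

def step(rho):
    """One step of the monotone Rusanov + diffusion scheme, zero-flux ends."""
    F = f(rho)
    Fh = 0.5 * (F[:-1] + F[1:]) - (0.5 * a + eps / dx) * (rho[1:] - rho[:-1])
    Ftot = np.concatenate(([0.0], Fh, [0.0]))
    return rho - dt / dx * (Ftot[1:] - Ftot[:-1])

rho = np.clip(0.5 + 0.4 * np.sin(2 * np.pi * x), 0.0, 1.0)
rho_bar = np.clip(0.5 + 0.4 * np.cos(3 * np.pi * x), 0.0, 1.0)

dist = dx * np.abs(rho - rho_bar).sum()
for _ in range(steps):
    rho, rho_bar = step(rho), step(rho_bar)
    new_dist = dx * np.abs(rho - rho_bar).sum()
    assert new_dist <= dist + 1e-10     # discrete analogue of (29)
    dist = new_dist
print("final L1 distance:", dist)
```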

    The well-posedness of smooth solutions for (5) can be proved following the argument used in [10, Theorem 1.2] to establish the well-posedness of smooth solutions for (4). Indeed, the existence of a linear semigroup of solutions in the linear case (i.e., when $f_\ell \equiv 0$) is shown in [14]. Then the Duhamel formula, estimates similar to those in the previous section, and a fixed-point argument lead to the result.

    The main result of this section is the following.

    Lemma 4.1. Let $(\rho_{1,\varepsilon}, \dots, \rho_{m+n,\varepsilon})$ be the solution of (5). There exist a sequence $\{\varepsilon_k\}_{k\in\mathbb{N}} \subset (0,\infty)$, $\varepsilon_k \to 0$, and $m+n$ maps $\rho_1, \dots, \rho_{m+n}$ such that

    $$\rho_1, \dots, \rho_m \in L^1((0,\infty)\times(-\infty,0)) \cap L^\infty((0,\infty)\times(-\infty,0)), \tag{30}$$
    $$\rho_{m+1}, \dots, \rho_{m+n} \in L^1((0,\infty)\times(0,\infty)) \cap L^\infty((0,\infty)\times(0,\infty)), \tag{31}$$
    $$0 \le \rho_\ell \le 1, \qquad \ell \in \{1, \dots, m+n\}, \tag{32}$$
    $$\rho_{i,\varepsilon_k} \to \rho_i \quad \text{a.e. and in } L^p_{loc}((0,\infty)\times(-\infty,0)), \tag{33}$$
    $$\rho_{j,\varepsilon_k} \to \rho_j \quad \text{a.e. and in } L^p_{loc}((0,\infty)\times(0,\infty)), \tag{34}$$

    for every $1 \le p < \infty$, $i \in \{1,\dots,m\}$, $j \in \{m+1,\dots,m+n\}$. Moreover, we have that

    $$\sum_{i=1}^{m}\|\rho_i(t,\cdot)\|_{L^1(-\infty,0)}+\sum_{j=m+1}^{m+n}\|\rho_j(t,\cdot)\|_{L^1(0,\infty)}\le\sum_{i=1}^{m}\|\rho_{i,0}\|_{L^1(-\infty,0)}+\sum_{j=m+1}^{m+n}\|\rho_{j,0}\|_{L^1(0,\infty)},\tag{35}$$
    $$\sum_{i=1}^{m}\|\rho_i(t,\cdot)\|^2_{L^2(-\infty,0)}+\sum_{j=m+1}^{m+n}\|\rho_j(t,\cdot)\|^2_{L^2(0,\infty)}\le\sum_{i=1}^{m}\|\rho_{i,0}\|^2_{L^2(-\infty,0)}+\sum_{j=m+1}^{m+n}\|\rho_{j,0}\|^2_{L^2(0,\infty)}+2\Big(\sum_{\ell=1}^{m+n}\|\beta_\ell\|_{L^\infty((0,1)^{m+n})}+\sum_{i=1}^{m}\|f_i\|_{L^1(0,1)}\Big)t,\tag{36}$$
    $$\begin{aligned}
    \sum_{i=1}^{m}TV\big(f_i(\rho_i(t,\cdot))\big)+\sum_{j=m+1}^{m+n}TV\big(f_j(\rho_j(t,\cdot))\big)&=\sum_{i=1}^{m}\|\partial_t\rho_i(t,\cdot)\|_{\mathcal{M}(-\infty,0)}+\sum_{j=m+1}^{m+n}\|\partial_t\rho_j(t,\cdot)\|_{\mathcal{M}(0,\infty)}\\
    &\le(m+n)\,C+\sum_{i=1}^{m}\|f_i'\|_{L^\infty(0,1)}\,TV(\rho_{i,0})+\sum_{j=m+1}^{m+n}\|f_j'\|_{L^\infty(0,1)}\,TV(\rho_{j,0}).
    \end{aligned}\tag{37}$$

    Thanks to the genuine nonlinearity of $f_1, \dots, f_{m+n}$ guaranteed by (H.2), we can use Tartar's compensated compactness method [18] to obtain strong convergence of a subsequence of the viscosity approximations. In the statements below, $R$ can stand for $(0,\infty)$ or $(-\infty,0)$.

    Theorem 4.2 (Tartar). Let $\{v_\nu\}_{\nu>0}$ be a family of functions defined on $(0,\infty)\times R$ such that

    $$\|v_\nu\|_{L^\infty((0,T)\times R)} \le M_T, \qquad T, \nu > 0,$$

    and such that the family

    $$\{\partial_t \eta(v_\nu) + \partial_x q(v_\nu)\}_{\nu>0}$$

    is compact in $H^{-1}_{loc}((0,\infty)\times R)$ for every convex $\eta \in C^2(\mathbb{R})$, where $q' = f'\eta'$. Then there exist a sequence $\{\nu_n\}_{n\in\mathbb{N}} \subset (0,\infty)$, $\nu_n \to 0$, and a map $v \in L^\infty((0,T)\times R)$, $T > 0$, such that

    $$v_{\nu_n} \to v \quad \text{a.e. and in } L^p_{loc}((0,\infty)\times R),\ 1 \le p < \infty.$$

    The following compact embedding result of Murat [17] is also useful.

    Theorem 4.3 (Murat). Let $\Omega$ be a bounded open subset of $\mathbb{R}^N$, $N \ge 2$. Suppose the sequence $\{\mathcal{L}_n\}_{n\in\mathbb{N}}$ of distributions is bounded in $W^{-1,\infty}(\Omega)$ and can be decomposed as

    $$\mathcal{L}_n = \mathcal{L}_{1,n} + \mathcal{L}_{2,n},$$

    where $\{\mathcal{L}_{1,n}\}_{n\in\mathbb{N}}$ lies in a compact subset of $H^{-1}_{loc}(\Omega)$ and $\{\mathcal{L}_{2,n}\}_{n\in\mathbb{N}}$ lies in a bounded subset of $L^1_{loc}(\Omega)$. Then $\{\mathcal{L}_n\}_{n\in\mathbb{N}}$ lies in a compact subset of $H^{-1}_{loc}(\Omega)$.

    Proof of Lemma 4.1. Let us fix $i \in \{1,\dots,m\}$ and prove the lemma for the incoming edges; the proof for the outgoing ones is analogous.

    Let $\eta \colon \mathbb{R} \to \mathbb{R}$ be any convex $C^2$ entropy function, and let $q_i \colon \mathbb{R} \to \mathbb{R}$ be the corresponding entropy flux defined by $q_i' = \eta' f_i'$. Multiplying the $i$-th equation in (5) by $\eta'(\rho_{i,\varepsilon})$ and using the chain rule, we get

    $$\partial_t \eta(\rho_{i,\varepsilon}) + \partial_x q_i(\rho_{i,\varepsilon}) = \underbrace{\varepsilon\,\partial^2_{xx}\eta(\rho_{i,\varepsilon})}_{=:\,\mathcal{L}_{1,\varepsilon}}\ \underbrace{-\ \varepsilon\,\eta''(\rho_{i,\varepsilon})\,(\partial_x\rho_{i,\varepsilon})^2}_{=:\,\mathcal{L}_{2,\varepsilon}}. \tag{38}$$

    We claim that

    $$\begin{gathered}
    \mathcal{L}_{1,\varepsilon} \to 0 \ \text{in } H^{-1}((0,T)\times(-\infty,0)),\ T > 0,\ \text{as } \varepsilon \to 0,\\
    \{\mathcal{L}_{2,\varepsilon}\}_{\varepsilon>0} \ \text{is uniformly bounded in } L^1((0,T)\times(-\infty,0)),\ T > 0.
    \end{gathered}\tag{39}$$

    Indeed, (25) and (27) imply

    Indeed, denoting by $C_T$ the right-hand side of (27) with $t = T$ and with $\rho_{\ell,0}$ replaced by $\rho_{\ell,0,\varepsilon}$,

    $$\begin{aligned}
    \|\varepsilon\,\partial_x\eta(\rho_{i,\varepsilon})\|_{L^2((0,T)\times(-\infty,0))}&\le\|\eta'\|_{L^\infty(0,1)}\,\|\varepsilon\,\partial_x\rho_{i,\varepsilon}\|_{L^2((0,T)\times(-\infty,0))}\le\sqrt{\tfrac{\varepsilon}{2}}\,\|\eta'\|_{L^\infty(0,1)}\,C_T^{1/2}\longrightarrow 0,\\
    \|\varepsilon\,\eta''(\rho_{i,\varepsilon})\,(\partial_x\rho_{i,\varepsilon})^2\|_{L^1((0,T)\times(-\infty,0))}&\le\tfrac{1}{2}\,\|\eta''\|_{L^\infty(0,1)}\,C_T.
    \end{aligned}$$

    Due to (16), (39) follows. Therefore, Theorems 4.3 and 4.2 give the existence of a subsequence $\{\rho_{i,\varepsilon_k}\}_{k\in\mathbb{N}}$ and a limit function $\rho_i$ satisfying (30) such that, as $k \to \infty$,

    $$\rho_{i,\varepsilon_k} \to \rho_i \ \text{in } L^p_{loc}((0,\infty)\times(-\infty,0)) \ \text{for any } p \in [1,\infty), \qquad \rho_{i,\varepsilon_k} \to \rho_i \ \text{a.e. in } (0,\infty)\times(-\infty,0), \tag{40}$$

    which guarantees (32) and (33).

    Finally, thanks to Lemmas 3.2, 3.3, and 3.4 we have (35), (36), and (37).

    Proof of Theorem 1.2. The first part of the statement, concerning the convergence of the vanishing viscosity approximations, has been proved in Lemma 4.1.

    Let us fix $i \in \{1,\dots,m\}$ and prove (9) for the incoming edges; the case of the outgoing ones is analogous.

    Thanks to (28) and (33), for all $\varphi \in C^\infty((0,\infty)\times(-\infty,0))$ with compact support we have

    $$\Big|\int_0^\infty\!\!\int_{-\infty}^{0}\rho_i\,\partial_t\varphi\,dx\,dt\Big|=\lim_{k\to\infty}\Big|\int_0^\infty\!\!\int_{-\infty}^{0}\rho_{i,\varepsilon_k}\,\partial_t\varphi\,dx\,dt\Big|=\lim_{k\to\infty}\Big|\int_0^\infty\!\!\int_{-\infty}^{0}\partial_t\rho_{i,\varepsilon_k}\,\varphi\,dx\,dt\Big|\le\|\varphi\|_{L^\infty((0,\infty)\times(-\infty,0))}\Big((m+n)\,C+\sum_{i=1}^{m}\|f_i'\|_{L^\infty(0,1)}\,TV(\rho_{i,0})+\sum_{j=m+1}^{m+n}\|f_j'\|_{L^\infty(0,1)}\,TV(\rho_{j,0})\Big),$$

    therefore

    $$\partial_t\rho_i \in \mathcal{M}((0,\infty)\times(-\infty,0)), \tag{41}$$

    where $\mathcal{M}((0,\infty)\times(-\infty,0))$ is the set of all Radon measures on $(0,\infty)\times(-\infty,0)$. Moreover, from the equations in (1) and (2) we also have

    $$\partial_x f_i(\rho_i) \in \mathcal{M}((0,\infty)\times(-\infty,0)). \tag{42}$$

    Clearly, (41) and (42) give (9), and so the trace at the junction $f_i(\rho_i(t,0-))$ exists for a.e. $t > 0$.

    We now prove that the identity

    $$\sum_{i=1}^{m} f_i(\rho_i(t,0-)) = \sum_{j=m+1}^{m+n} f_j(\rho_j(t,0+)) \tag{43}$$

    holds for a.e. $t > 0$; consequently, the functions $\rho_1, \dots, \rho_{m+n}$ provide a solution of (1), (2), and (3) in the sense of Definition 1.1.

    Let $\varphi \in C^1([0,\infty))$ be a function with compact support such that $\varphi(0) = 0$. Consider a sequence $\{r_\nu\}_{\nu\in\mathbb{N}\setminus\{0\}} \subset C^2([0,\infty))$ of cut-off functions satisfying

    $$0 \le r_\nu(x) \le 1, \qquad r_\nu(0) = 1, \qquad \operatorname{supp}(r_\nu) \subset [0, 1/\nu], \tag{44}$$

    for every $x \ge 0$ and $\nu \ge 1$. Moreover, for every $\nu \ge 1$, we define $\tilde r_\nu \in C^2((-\infty,0])$ by setting $\tilde r_\nu(x) = r_\nu(-x)$ for every $x \le 0$.

    From (5) we have that

    $$\begin{aligned}
    0&=\sum_{i=1}^{m}\int_0^\infty\!\!\int_{-\infty}^{0}\big(\partial_t\rho_{i,\varepsilon_k}+\partial_x f_i(\rho_{i,\varepsilon_k})-\varepsilon_k\,\partial^2_{xx}\rho_{i,\varepsilon_k}\big)\,\varphi(t)\,\tilde r_\nu(x)\,dx\,dt+\sum_{j=m+1}^{m+n}\int_0^\infty\!\!\int_0^\infty\big(\partial_t\rho_{j,\varepsilon_k}+\partial_x f_j(\rho_{j,\varepsilon_k})-\varepsilon_k\,\partial^2_{xx}\rho_{j,\varepsilon_k}\big)\,\varphi(t)\,r_\nu(x)\,dx\,dt\\
    &=-\sum_{i}\int_0^\infty\!\!\int_{-\infty}^{0}\big(\rho_{i,\varepsilon_k}\,\varphi'(t)\,\tilde r_\nu(x)+f_i(\rho_{i,\varepsilon_k})\,\varphi(t)\,\tilde r_\nu'(x)-\varepsilon_k\,\partial_x\rho_{i,\varepsilon_k}\,\varphi(t)\,\tilde r_\nu'(x)\big)\,dx\,dt\\
    &\qquad-\sum_{j}\int_0^\infty\!\!\int_0^\infty\big(\rho_{j,\varepsilon_k}\,\varphi'(t)\,r_\nu(x)+f_j(\rho_{j,\varepsilon_k})\,\varphi(t)\,r_\nu'(x)-\varepsilon_k\,\partial_x\rho_{j,\varepsilon_k}\,\varphi(t)\,r_\nu'(x)\big)\,dx\,dt\\
    &\qquad+\sum_{i}\int_0^\infty\big(f_i(\rho_{i,\varepsilon_k}(t,0))-\varepsilon_k\,\partial_x\rho_{i,\varepsilon_k}(t,0)\big)\,\varphi(t)\,dt-\sum_{j}\int_0^\infty\big(f_j(\rho_{j,\varepsilon_k}(t,0))-\varepsilon_k\,\partial_x\rho_{j,\varepsilon_k}(t,0)\big)\,\varphi(t)\,dt\\
    &=-\sum_{i}\int_0^\infty\!\!\int_{-\infty}^{0}\big(\rho_{i,\varepsilon_k}\,\varphi'(t)\,\tilde r_\nu(x)+f_i(\rho_{i,\varepsilon_k})\,\varphi(t)\,\tilde r_\nu'(x)-\varepsilon_k\,\partial_x\rho_{i,\varepsilon_k}\,\varphi(t)\,\tilde r_\nu'(x)\big)\,dx\,dt\\
    &\qquad-\sum_{j}\int_0^\infty\!\!\int_0^\infty\big(\rho_{j,\varepsilon_k}\,\varphi'(t)\,r_\nu(x)+f_j(\rho_{j,\varepsilon_k})\,\varphi(t)\,r_\nu'(x)-\varepsilon_k\,\partial_x\rho_{j,\varepsilon_k}\,\varphi(t)\,r_\nu'(x)\big)\,dx\,dt,
    \end{aligned}$$

    where the boundary terms cancel because, by the transmission conditions in (5), they equal $\int_0^\infty\big(\sum_i\beta_i-\sum_j\beta_j\big)\,\varphi(t)\,dt$, which vanishes thanks to (6).

    As $k \to \infty$, due to (27), (33), and (34),

    $$0=-\sum_{i=1}^{m}\int_0^\infty\!\!\int_{-\infty}^{0}\big(\rho_i\,\varphi'(t)\,\tilde r_\nu(x)+f_i(\rho_i)\,\varphi(t)\,\tilde r_\nu'(x)\big)\,dx\,dt-\sum_{j=m+1}^{m+n}\int_0^\infty\!\!\int_0^\infty\big(\rho_j\,\varphi'(t)\,r_\nu(x)+f_j(\rho_j)\,\varphi(t)\,r_\nu'(x)\big)\,dx\,dt.$$

    Finally, sending $\nu \to \infty$,

    $$0=-\sum_{i=1}^{m}\int_0^\infty f_i(\rho_i(t,0-))\,\varphi(t)\,dt+\sum_{j=m+1}^{m+n}\int_0^\infty f_j(\rho_j(t,0+))\,\varphi(t)\,dt,$$

    which gives (43) and concludes the proof.



  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)