Citation: Radi P. Romansky, Irina S. Noninska. Challenges of the digital age for privacy and personal data protection[J]. Mathematical Biosciences and Engineering, 2020, 17(5): 5288-5303. doi: 10.3934/mbe.2020286
For a continuous risk outcome
Given fixed effects
In this paper, we assume that the risk outcome
y = \mathrm{\Phi }\left({a}_{0}+{a}_{1}{x}_{1}+\cdots +{a}_{k}{x}_{k}+bs\right), | (1.1) |
where
Given random effect model (1.1), the expected value
We introduce a family of interval distributions based on variable transformations. Probability densities for these distributions are provided (Proposition 2.1). Parameters of model (1.1) can then be estimated by maximum likelihood, assuming an interval distribution. In some cases, these parameters have an analytical solution, so no numerical model fitting is needed (Proposition 4.1). We call a model with a random effect, where parameters are estimated by maximum likelihood assuming an interval distribution, an interval distribution model.
In its simplest form, the interval distribution model
The paper is organized as follows: in section 2, we introduce a family of interval distributions. A measure for tail fatness is defined. In section 3, we show examples of interval distributions and investigate their tail behaviours. We propose in section 4 an algorithm for estimating the parameters in model (1.1).
Interval distributions introduced in this section are defined for a risk outcome over a finite open interval
Let
Let
\mathrm{\Phi }: D\to \left({c}_{0}, {c}_{1}\right) | (2.1) |
be a transformation with continuous and positive derivatives
Given a continuous random variable
y=Φ(a+bs), | (2.2) |
where we assume that the range of variable
Proposition 2.1. Given
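g\left(y, a, b\right) = {U}_{1}/\left(b{U}_{2}\right) | (2.3) |
G\left(y, a, b\right) = F\left[\frac{{\mathrm{\Phi }}^{-1}\left(y\right)-a}{b}\right]. | (2.4) |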
where
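{U}_{1} = f\left\{\left[{\mathrm{\Phi }}^{-1}\left(y\right)-a\right]/b\right\}, \ \ {U}_{2} = \phi \left[{\mathrm{\Phi }}^{-1}\left(y\right)\right] | (2.5) |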
Proof. A proof for the case when
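G\left(y, a, b\right) = P\left[\mathrm{\Phi }\left(a+bs\right)\le y\right] |
= P\left\{s\le \left[{\mathrm{\Phi }}^{-1}\left(y\right)-a\right]/b\right\} |
= F\left\{\left[{\mathrm{\Phi }}^{-1}\left(y\right)-a\right]/b\right\}. |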
By chain rule and the relationship
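\frac{\partial {\mathrm{\Phi }}^{-1}\left(y\right)}{\partial y} = \frac{1}{\phi \left[{\mathrm{\Phi }}^{-1}\left(y\right)\right]}. | (2.6) |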
Taking the derivative of
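\frac{\partial G\left(y, a, b\right)}{\partial y} = \frac{f\left\{\left[{\mathrm{\Phi }}^{-1}\left(y\right)-a\right]/b\right\}}{b\phi \left[{\mathrm{\Phi }}^{-1}\left(y\right)\right]} = \frac{{U}_{1}}{b{U}_{2}}. |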
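As a numerical illustration of Proposition 2.1, the following minimal Python sketch evaluates (2.3) and (2.4) and compares the density with a finite-difference derivative of the distribution function. It assumes, purely for illustration, that Φ is the standard normal distribution function and that s is standard normal, so that f and ϕ coincide; the function names are hypothetical and not part of the paper.

```python
from statistics import NormalDist

# Assumption for this sketch: Phi is the standard normal CDF and s is
# standard normal, so f and phi are both the standard normal density.
STD = NormalDist()


def interval_density(y, a, b):
    """Density (2.3): g = U1 / (b * U2), with U1 and U2 as in (2.5)."""
    z = STD.inv_cdf(y)             # z = Phi^{-1}(y)
    u1 = STD.pdf((z - a) / b)      # U1 = f{[Phi^{-1}(y) - a] / b}
    u2 = STD.pdf(z)                # U2 = phi[Phi^{-1}(y)]
    return u1 / (b * u2)


def interval_cdf(y, a, b):
    """Distribution function (2.4): G = F{[Phi^{-1}(y) - a] / b}."""
    return STD.cdf((STD.inv_cdf(y) - a) / b)


def finite_difference_of_cdf(y, a, b, h=1e-6):
    """Central difference approximation of dG/dy, for comparison with g."""
    return (interval_cdf(y + h, a, b) - interval_cdf(y - h, a, b)) / (2 * h)
```

For any y in the open interval (0, 1) and b > 0, `finite_difference_of_cdf(y, a, b)` should agree closely with `interval_density(y, a, b)`, which is the content of the last display in the proof.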
One can explore the shapes of these interval distributions, including skewness and modality. For stress testing purposes, we are more interested in the tail risk behaviour of these distributions.
Recall that, for a variable X over (−
For a risk outcome over a finite interval
We say that an interval distribution has a fat right tail if the limit
Given
Recall that, for a Beta distribution with parameters
Next, because the derivative of
z = {\mathrm{\Phi }}^{-1}\left(y\right) | (2.7) |
Then
Lemma 2.2. Given
(ⅰ)
(ⅱ) If
(ⅲ) If
Proof. The first statement follows from the relationship
{\left[g\left(y, a, b\right){\left({y}_{1}-y\right)}^{\beta }\right]}^{-1/\beta } = \frac{{\left[g\left(y, a, b\right)\right]}^{-1/\beta }}{{y}_{1}-y} = \frac{{\left[g\left(\mathrm{\Phi }\left(\mathrm{z}\right), a, b\right)\right]}^{-1/\beta }}{{y}_{1}-\mathrm{\Phi }\left(\mathrm{z}\right)}. | (2.8) |
By L’Hospital’s rule and taking the derivatives of the numerator and the denominator of (2.8) with respect to
For tail convexity, we say that the right tail of an interval distribution is convex if
Again, write
h\left(z, a, b\right) = \mathrm{log}\left[g\left(\mathrm{\Phi }\left(z\right), a, b\right)\right], | (2.9) |
where
g\left(y, a, b\right) = \mathrm{exp}\left[h\left(z, a, b\right)\right]. | (2.10) |
By (2.9), (2.10), using (2.6) and the relationship
{g}_{y}^{'} = \left[{h}_{z}^{'}\left(z\right)/\phi \left(z\right)\right]\mathrm{exp}\left[h\left({\mathrm{\Phi }}^{-1}\left(y\right), a, b\right)\right], \\ {g}_{yy}^{''} = \left[\frac{{h}_{zz}^{''}\left(z\right)}{{\phi }^{2}\left(z\right)}-\frac{{h}_{z}^{'}\left(z\right){\phi }_{z}^{'}\left(z\right)}{{\phi }^{3}\left(z\right)}+\frac{{h}_{z}^{'}\left(z\right){h}_{z}^{'}\left(z\right)}{{\phi }^{2}\left(z\right)}\right]\mathrm{exp}\left[h\left({\mathrm{\Phi }}^{-1}\left(y\right), a, b\right)\right]. | (2.11) |
The following lemma, which follows from (2.11), is useful for checking tail convexity.
Lemma 2.3. Suppose
In this section, we focus on the case where
One can explore a wide range of densities with different choices for
A.
B.
C.
D.
Densities for cases A, B, C, and D are given respectively in (3.3) (section 3.1), (A.1), (A.3), and (A.5) (Appendix A). The tail behaviour study is summarized in Propositions 3.3, 3.5, and Remark 3.6. Sketches of density plots are provided in Appendix B for distributions A, B, and C.
Using the notations of section 2, we have
By (2.5), we have
\mathrm{log}\left(\frac{{U}_{1}}{{U}_{2}}\right) = \frac{-{z}^{2}+2az-{a}^{2}+{b}^{2}{z}^{2}}{2{b}^{2}} | (3.1) |
= \frac{-\left(1-{b}^{2}\right){\left(z-\frac{a}{1-{b}^{2}}\right)}^{2}+\frac{{b}^{2}}{1-{b}^{2}}{a}^{2}}{2{b}^{2}}\text{.} | (3.2) |
Therefore, we have
g\left(y, a, b\right) = \frac{1}{b}\mathrm{exp}\left\{\frac{-\left(1-{b}^{2}\right){\left(z-\frac{a}{1-{b}^{2}}\right)}^{2}+\frac{{b}^{2}}{1-{b}^{2}}{a}^{2}}{2{b}^{2}}\right\}\text{.} | (3.3) |
Again, using the notations of section 2, we have
g\left(y, p, \rho \right) = \sqrt{\frac{1-\rho }{\rho }}\mathrm{exp}\left\{-\frac{1}{2\rho }{\left[\sqrt{1-\rho }{\mathrm{\Phi }}^{-1}\left(y\right)-{\mathrm{\Phi }}^{-1}\left(p\right)\right]}^{2}+\frac{1}{2}{\left[{\mathrm{\Phi }}^{-1}\left(y\right)\right]}^{2}\right\}\text{, } | (3.4) |
where
Proposition 3.1. Density (3.3) is equivalent to (3.4) under the relationships:
a = \frac{{\Phi }^{-1}\left(p\right)}{\sqrt{1-\rho }} \ \ \text{and}\ \ b = \sqrt{\frac{\rho }{1-\rho }}. | (3.5) |
Proof. A similar proof can be found in [19]. By (3.4), we have
g\left(y, p, \rho \right) = \sqrt{\frac{1-\rho }{\rho }}\mathrm{exp}\left\{-\frac{1-\rho }{2\rho }{\left[{\mathrm{\Phi }}^{-1}\left(y\right)-{\mathrm{\Phi }}^{-1}\left(p\right)/\sqrt{1-\rho }\right]}^{2}+\frac{1}{2}{\left[{\mathrm{\Phi }}^{-1}\left(y\right)\right]}^{2}\right\} |
= \frac{1}{b}\mathrm{exp}\left\{-\frac{1}{2}{\left[\frac{{\mathrm{\Phi }}^{-1}\left(y\right)-a}{b}\right]}^{2}\right\}\mathrm{exp}\left\{\frac{1}{2}{\left[{\mathrm{\Phi }}^{-1}\left(y\right)\right]}^{2}\right\} |
= {U}_{1}/\left(b{U}_{2}\right) = g\left(y, a, b\right)\text{.} |
The following relationships are implied by (3.5):
\rho = \frac{{b}^{2}}{1+{b}^{2}}, | (3.6) |
a = {\Phi }^{-1}\left(p\right)\sqrt{1+{b}^{2}}\text{.} | (3.7) |
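The reparameterization in Proposition 3.1 can be checked numerically. The sketch below is a hedged illustration only: it assumes Φ is the standard normal distribution function, evaluates the (a, b) density through the exponent in (3.1) (which avoids dividing by 1 − b² when b = 1), and uses hypothetical function names.

```python
from math import exp, sqrt
from statistics import NormalDist

# Assumption for this sketch: Phi is the standard normal CDF.
STD = NormalDist()


def density_ab(y, a, b):
    """Density g(y, a, b) = (1/b) * exp{log(U1/U2)}, using exponent (3.1)."""
    z = STD.inv_cdf(y)
    exponent = (-z * z + 2.0 * a * z - a * a + b * b * z * z) / (2.0 * b * b)
    return exp(exponent) / b


def density_p_rho(y, p, rho):
    """Density g(y, p, rho) as written in (3.4)."""
    z = STD.inv_cdf(y)
    core = sqrt(1.0 - rho) * z - STD.inv_cdf(p)
    return sqrt((1.0 - rho) / rho) * exp(-core * core / (2.0 * rho) + 0.5 * z * z)


def to_ab(p, rho):
    """Map (p, rho) to (a, b) using (3.5)."""
    return STD.inv_cdf(p) / sqrt(1.0 - rho), sqrt(rho / (1.0 - rho))


def to_p_rho(a, b):
    """Map (a, b) back to (p, rho) using (3.6) and (3.7)."""
    return STD.cdf(a / sqrt(1.0 + b * b)), b * b / (1.0 + b * b)
```

With `a, b = to_ab(p, rho)`, the two density functions should return the same value for any y in (0, 1), and `to_p_rho(a, b)` should recover the original pair up to rounding.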
Remark 3.2. The mode of
\frac{\sqrt{1-\rho }}{1-2\rho }{\mathrm{\Phi }}^{-1}\left(p\right) = \frac{\sqrt{1+{b}^{2}}}{1-{b}^{2}}{\mathrm{\Phi }}^{-1}\left(p\right) = \frac{a}{1-{b}^{2}}. |
This means
Proposition 3.3. The following statements hold for
(ⅰ)
(ⅱ)
(ⅲ) If
Proof. For statement (ⅰ), we have
Consider statement (ⅱ). First by (3.3), if
{\left[g\left(\mathrm{\Phi }\left(z\right), a, b\right)\right]}^{-1/\beta } = {b}^{1/\beta }\mathrm{exp}\left(-\frac{\left({b}^{2}-1\right){z}^{2}+2az-{a}^{2}}{2\beta {b}^{2}}\right) | (3.8) |
By taking the derivative of (3.8) with respect to
-\left\{\partial {\left[g\left(\mathrm{\Phi }\left(z\right), a, b\right)\right]}^{-\frac{1}{\beta }}/\partial z\right\}/\phi \left(z\right) = \sqrt{2\pi }{b}^{\frac{1}{\beta }}\frac{\left({b}^{2}-1\right)z+a}{\beta {b}^{2}}\mathrm{exp}\left(-\frac{\left({b}^{2}-1\right){z}^{2}+2az-{a}^{2}}{2\beta {b}^{2}}+\frac{{z}^{2}}{2}\right)\text{.} | (3.9) |
Thus
\left\{\partial {\left[g\left(\mathrm{\Phi }\left(z\right), a, b\right)\right]}^{-\frac{1}{\beta }}/\partial z\right\}/\phi \left(z\right) = -\sqrt{2\pi }{b}^{\frac{1}{\beta }}\frac{\left({b}^{2}-1\right)z+a}{\beta {b}^{2}}\mathrm{exp}\left(-\frac{\left({b}^{2}-1\right){z}^{2}+2az-{a}^{2}}{2\beta {b}^{2}}+\frac{{z}^{2}}{2}\right)\text{.} | (3.10) |
Thus
For statement (ⅲ), we use Lemma 2.3. By (2.9) and using (3.2), we have
h\left(z, a, b\right) = \mathrm{log}\left(\frac{{U}_{1}}{b{U}_{2}}\right) = \frac{-\left(1-{b}^{2}\right){\left(z-\frac{a}{1-{b}^{2}}\right)}^{2}+\frac{{b}^{2}}{1-{b}^{2}}{a}^{2}}{2{b}^{2}}-\mathrm{log}\left(b\right)\text{.} |
When
Remark 3.4. Assume
\lim_{z\to +\infty }-\left\{\partial {\left[g\left(\mathrm{\Phi }\left(z\right), a, b\right)\right]}^{-\frac{1}{\beta }}/\partial z\right\}/\phi \left(z\right) |
is
For these distributions, we again focus on their tail behaviours. A proof for the next proposition can be found in Appendix A.
Proposition 3.5. The following statements hold:
(a) Density
(b) The tailed index of
Remark 3.6. Among distributions A, B, C, and Beta distribution, distribution B gets the highest tailed index of 1, independent of the choices of
In this section, we assume that
First, we consider a simple case, where risk outcome
y = \mathrm{\Phi }\left(v+bs\right), | (4.1) |
where
Given a sample
LL = \sum _{i = 1}^{n}\left\{\mathrm{log}f\left(\frac{{z}_{i}-{v}_{i}}{b}\right)-\mathrm{log}\phi \left({z}_{i}\right)-\mathrm{log}b\right\}\text{, } | (4.2) |
where
Recall that the least squares estimators of
SS = \sum _{i = 1}^{n}{({z}_{i}-{v}_{i})}^{2} | (4.3) |
has a closed form solution given by the transpose of
\mathrm{X} = \left[\begin{array}{cccc} 1 & {x}_{11} & \ldots & {x}_{k1}\\ 1 & {x}_{12} & \ldots & {x}_{k2}\\ \vdots & \vdots & & \vdots \\ 1 & {x}_{1n} & \ldots & {x}_{kn}\end{array}\right], \ \ \mathrm{Z} = \left[\begin{array}{c} {z}_{1}\\ {z}_{2}\\ \vdots \\ {z}_{n}\end{array}\right]. |
The next proposition shows there exists an analytical solution for the parameters of model (4.1).
Proposition 4.1. Given a sample
Proof. Dropping the constant term from (4.2) and noting
LL = -\frac{1}{2{b}^{2}}\sum _{i = 1}^{n}{\left({z}_{i}-{v}_{i}\right)}^{2}-n\mathrm{log}b. | (4.4) |
Hence the maximum likelihood estimates
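To illustrate how (4.4) leads to estimates without iterative fitting, here is a minimal Python sketch. It reads (4.4) directly: for any fixed b, maximizing LL over the fixed-effect coefficients is the same as minimizing SS in (4.3), and setting the derivative of (4.4) with respect to b to zero gives b² = SS/n. For brevity the sketch assumes a single covariate, so that v_i = a0 + a1·x_i, and takes Φ to be the standard normal distribution function; both are assumptions of this example, and the function names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist, fmean

# Assumptions for this sketch: Phi is the standard normal CDF, s is
# standard normal, and there is one covariate, v_i = a0 + a1 * x_i.
STD = NormalDist()


def fit_simple_model(xs, ys):
    """Closed-form estimates (a0, a1, b) for y = Phi(a0 + a1 * x + b * s)."""
    zs = [STD.inv_cdf(y) for y in ys]          # z_i = Phi^{-1}(y_i)
    x_bar, z_bar = fmean(xs), fmean(zs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxz = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs))
    a1 = sxz / sxx                              # least squares slope
    a0 = z_bar - a1 * x_bar                     # least squares intercept
    ss = sum((z - a0 - a1 * x) ** 2 for x, z in zip(xs, zs))
    b = sqrt(ss / len(xs))                      # from d(LL)/db = 0 in (4.4)
    return a0, a1, b


def log_likelihood(xs, ys, a0, a1, b):
    """Log-likelihood (4.4), constant term dropped."""
    zs = [STD.inv_cdf(y) for y in ys]
    ss = sum((z - a0 - a1 * x) ** 2 for x, z in zip(xs, zs))
    return -ss / (2.0 * b * b) - len(xs) * log(b)
```

Evaluating `log_likelihood` at the fitted values and at nearby perturbed values is a quick way to confirm that the closed-form estimates sit at the maximum of (4.4).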
Next, we consider the general case of model (1.1), where the risk outcome
y = \mathrm{\Phi }[v+ws], | (4.5) |
where parameter
(a)
(b)
Given a sample
LL = \sum _{i = 1}^{n}-\frac{1}{2}\left[{\left({z}_{i}-{v}_{i}\right)}^{2}/{w}_{i}^{2}-{u}_{i}\right], | (4.6) |
LL = \sum _{i = 1}^{n}\left\{-\left({z}_{i}-{v}_{i}\right)/{w}_{i}-2\mathrm{log}\left[1+\mathrm{exp}\left[-\left({z}_{i}-{v}_{i}\right)/{w}_{i}\right]\right]-{u}_{i}\right\}, | (4.7) |
Recall that a function is log-concave if its logarithm is concave. If a function is concave, a local maximum is a global maximum, and the function is unimodal. This property is useful when searching for maximum likelihood estimates.
Proposition 4.2. The functions (4.6) and (4.7) are concave as a function of
Proof. It is well-known that, if
For (4.7), the linear part
In general, parameters
Algorithm 4.3. Follow the steps below to estimate parameters of model (4.5):
(a) Given
(b) Given
(c) Iterate (a) and (b) until convergence is reached.
With the interval distributions introduced in this paper, models with a random effect can be fitted for a continuous risk outcome by maximum likelihood, assuming an interval distribution. These models provide an alternative regression tool to the Beta regression model and the fractional response model, as well as a tool for tail risk assessment.
The authors are very grateful to the third reviewer for many constructive comments. The first author is grateful to Biao Wu for many valuable conversations. Thanks also go to Clovis Sukam for his critical reading of the manuscript.
The views expressed in this article are not necessarily those of Royal Bank of Canada and Scotiabank or any of their affiliates. Please direct any comments to Bill Huajian Yang at h_y02@yahoo.ca.
[1] | E. J. Bloustein, N. J. Pallone, Individual and Group Privacy, Routledge, New York, 2017. |
[2] | M. Oostveen, U. Irion, The golden age of personal data: How to regulate an enabling fundamental right?, in Personal Data in Competition, Consumer Protection and Intellectual Property Law (eds. M. Bakhoum, B. Conde Gallego, M. O. Mackenrodt, G. Surblytė-Namavičienė), Springer, (2018), 7-26. Available from: https://link.springer.com/chapter/10.1007/978-3-662-57646-5_2. |
[3] | R. Romansky, A survey of digital world opportunities and challenges for user's privacy, Int. J. Inform. Technol. Secur., 9 (2017), 97-112. |
[4] | Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation), European Commission, 2016. Available from: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A32016R0679. |
[5] | J. J. Hanus, H. G. Relyea, A policy assessment of the privacy act of 1974, Am. Univ. Law Rev., 25 (1976), 555. |
[6] | M. Shabani, P. Borry, Rules for processing genetic data for research purposes in view of the new EU general data protection regulation, Eur. J. Human Genet., 26 (2018), 149-156. |
[7] | A. V. Tsaregorodtsev, O. Ja. Kravets, O. N. Choporov, A. N. Zelenina, Information security risk estimation for cloud infrastructure, Int. J. Inform. Technol. Secur., 10 (2018), 67-76. |
[8] | O. Yu. Zaslavskaya, l. A. Zaslavskiy, V. E. Bolnokin, O. Ja. Kravets, Features of ensuring information security when using cloud technologies in educational institutions, Int. J. Inform. Technol. Secur., 10 (2018), 93-102. |
[9] | P. Wandra, H. Jie, DeepProfile: Finding fake profile in online social network using dynamic CNN, J. Inform. Secur. Appl., 52 (2020), article 102465. Available from: https://www.sciencedirect.com/science/article/abs/pii/S2214212619303801. |
[10] | V. Kharchenko, Big Data and Internet of Things for safety critical applications: Challenges, methodology and industry cases, Int. J. Inform. Technol. Secur., 10 (2018), 3-16. |
[11] | I. Alsmadi, R. Burdwell, A. Aleroud, A. Wahbeh, M. Al-Qudah, A. Al-Omari, Introduction to information security, in Practical Information Security (eds. I. Alsmadi, R. Burdwell, A. Aleroud, A. Wahbeh, M. Al-Qudah, A. Al-Omari), Springer, (2018), 1-16. Available from: https://www.springer.com/gp/book/9783319721187. |
[12] | H. Paanen, M. Lapke, M. Siponen, State of the art in information security policy development. Comp. Secur., 88 (2020), article 101608. Available from: https://www.sciencedirect.com/science/article/pii/S0167404818313002. |
[13] | M. A. Ferrag, H. Janicke, Deep learning for cyber security intrusion detection: Approaches, datasets, and comparative study, J. Inform. Secur. Appl., 50 (2020), article 102418. Available from: https://www.sciencedirect.com/science/article/pii/S2214212619305046. |
[14] | A. R. Mahlous, SSR: A framework for a secure software reuse, Int. J. Inform. Technol. Secur., 10 (2018), 87-98. |
[15] | Y. A. Ivanova, Assessment of the probability of cyberattacks on Transport Management Systems, Int. J. Inform. Technol. Secur., 10 (2018), 99-106. |
[16] | M. A. P. Chamikara, P. Bertok, D. Liu, S. Camtepe, I. Khalil, An efficient and scalable privacy preserving algorithm for big data and data streams. Comp. & Security, Special issue "Security and Privacy in Smart Cyber-physical Systems" (2019), article 101570. Available from: https://www.sciencedirect.com/journal/computers-and-security/special-issue/109XHWZ5JSX. |
[17] | Tz. Tzolov, Data model in the context of the general data protection regulation, Int. J. Inform. Technol. Secur., 9 (2017), 113-122. |
[18] | R. Romansky, I. Noninska, Principles of secure access and privacy in combined e-learning environment: Architecture, formalization and modelling, in Multidisciplinary Perspectives on Human Capital and Information Technology Professionals (eds. V. Ahuja, S. Rathore), IGI Global Publ., USA (2018), 152-178. |
[19] | M. Aminzade, Confidentiality, integrity and availability—finding a balanced IT framework, Netw. Secur., 50 (2018), 9-11. Available from: https://www.sciencedirect.com/science/article/pii/S1353485818300436. |
[20] | Thales, 2020 Data Threat Report - Global Edition. Survey and Analysis from IDC, 2020. Available from: https://cpl.thalesgroup.com/data-threat-report. |
[21] | Guidelines on the Use of Cloud Computing Services by the European Institutions and Bodies, European Data Protection Supervisor, 2018. Available from: https://edps.europa.eu/data-protection/our-ork/publications/guidelines/guidelines-use-cloud-computing-services-european_en. |
[22] | Maximizing the value of your data privacy investments - data privacy benchmark study, CISCO Cybersecurity Series, 2019. Available from: https://www.cisco.com/c/dam/en_us/about/doing_business/trust-center/docs/dpbs-2019.pdf. |
[23] | Casey Crane, 20 surprising IoT statistics you don't already know, Security Boulevard, 5 Sep 2019. Available from: https://securityboulevard.com/2019/09/20-surprising-iot-statistics-you-dont-already-know/. |
[24] | A. Azmoodeh, A. Dehghantanha. Big data and privacy: Challenges and opportunities, in Handbook of Big Data Privacy (ed. K-K. R. Choo, A. Dehghantanha), Springer-Cham, Switzerland, (2020), 1-6. |