Issues, Problems and Potential Solutions when Simulating Continuous, Non-normal Data in the Social Sciences

Computer simulations have become one of the most prominent tools for methodologists in the social sciences to evaluate the properties of their statistical techniques and to offer best-practice recommendations. Amongst the many uses of computer simulations, evaluating the robustness of methods to their assumptions, particularly univariate or multivariate normality, is crucial to ensure the appropriateness of data analysis. In order to accomplish this, quantitative researchers need to be able to generate data over whose non-normal properties they have a degree of control. Even though great advances have been achieved in statistical theory and computational power, the task of simulating multivariate, non-normal data is not straightforward. There are inherent conceptual and mathematical complexities implied by the phrase “non-normality” which are not always reflected in the simulation studies conducted by social scientists. The present article offers a summary of some of the issues concerning the simulation of multivariate, non-normal data in the social sciences. An overview of common algorithms is presented, as well as some of the characteristics and idiosyncrasies implied by them which may exert undue influence on the results of simulation studies. A call is made to encourage the meta-scientific study of computer simulations in the social sciences in order to understand how simulation designs frame the teaching, usage and practice of statistical techniques within the social sciences.


Introduction
The method of Monte Carlo simulation has become the workhorse of the modern quantitative methodologist, enabling researchers to overcome a wide range of issues, from handling intractable estimation problems to helping guide and evaluate the development of new mathematical and statistical theory (Beisbart & Norton, 2012). From its inception at Los Alamos National Laboratory, Monte Carlo simulation has provided insights to mathematicians, physicists, statisticians and almost any researcher who relies on quantitative analyses to further their field. I posit that computer simulations can address three broad classes of issues, depending on the ultimate goal of the simulation itself: issues of estimation and mathematical tractability, issues of data modelling and issues of robustness evaluation. The first issue is perhaps best exemplified in the development of Markov Chain Monte Carlo (MCMC) techniques to estimate parameters for Bayesian analysis or to approximate the solution of complex integrals. The second is more often seen within areas such as mathematical biology or financial mathematics, where the behaviour of chaotic systems can be approximated as if they were random processes (e.g., Hoover & Hoover, 2015). In this case, computer simulations are designed to answer "what if" type questions, where slight alterations to the initial conditions of the system may yield widely divergent results. The final issue (and the one that will concern the rest of this article) is a hybrid of the previous two and particularly popular within psychology and the social sciences: the evaluation of the robustness of statistical methods (Carsey & Harden, 2013).
Whether it is testing for violations of distributional assumptions, the presence of outliers, model misspecification or finite-sample studies where the asymptotic properties of estimators are evaluated under realistic sample sizes, the vast majority of quantitative research published within the social sciences is either exclusively based on computer simulations or presents a new theoretical development which is also evaluated or justified through simulations. Just by looking at the table of contents of three leading journals in quantitative psychology for the present year, Multivariate Behavioural Research, the British Journal of Mathematical and Statistical Psychology and Psychometrika, one can see that every article makes use of computer simulations in one way or another. This type of simulation study can be described in four general steps:
(1) Decide the models and conditions from which the data will be generated (i.e., what "holds" in the population).
(2) Generate random data according to the models and conditions specified in Step (1).
(3) Estimate the quantities of interest for the models being studied in Step (1).
(4) Save the parameter estimates, standard errors, goodness-of-fit indices, etc. for later analyses and go back to Step (2).
Steps (2)-(4) constitute a replication within the framework of a Monte Carlo simulation; repeating them a large number of times reveals the patterns of behaviour of the statistical methods under investigation, which in turn inform recommendations for users of these methods.
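The four steps can be sketched in a few lines of R. This is a minimal illustration, not a design from the article: the research question (the Type I error rate of the one-sample t-test when the population is a shifted, skewed exponential), the sample size of 30 and the 2000 replications are all arbitrary choices made here for concreteness.

```r
## Minimal sketch of a robustness simulation: Steps (1)-(4).
set.seed(1)
n_reps <- 2000                              # number of replications
n      <- 30                                # sample size per replication
pvals  <- numeric(n_reps)                   # Step (1): design fixed above
for (r in 1:n_reps) {
  x <- rexp(n, rate = 1) - 1                # Step (2): H0 true, mean = 0, skewed
  pvals[r] <- t.test(x, mu = 0)$p.value     # Step (3): estimate/test
}                                           # Step (4): results stored, repeat
mean(pvals < 0.05)                          # empirical Type I error rate
```

Comparing the final proportion against the nominal 0.05 level is precisely the kind of robustness evidence such studies report.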
Robustness simulation studies emphasize the decisions made in Step (1) because the selection of statistical methods to test and of data conditions will guide the recommendations that subsequently inform data practice. For the case of non-normality, the level of skewness or kurtosis, the presence or absence of outliers, etc. would be encoded here. Most of the time, Steps (2) through (4) are assumed to operate seamlessly, either because the researcher has sufficient technical expertise to program them or because it is simply assumed that the subroutines and algorithms employed carry out what the researcher requested. A crucial aspect of the implementation of these algorithms, and of the performance of the simulation in general, is the ability of the researcher to ensure that the simulation design and its actual computer implementation are consistent with one another. If this consistency is absent, then Step (2) is brought into question and one, either as a producer or consumer of simulation research, needs to wonder whether the conclusions obtained from the Monte Carlo studies are reliable. This issue constitutes the central message of this article as it pertains to how one simulates multivariate, non-normal data, the types of approaches that exist to do this and what researchers should be on the lookout for.

Non-normal data simulation in the social sciences
Investigating possible violations of distributional assumptions is one of the most prevalent types of robustness studies within the quantitative social sciences. Monte Carlo simulations have been used for such investigations on the general linear model (e.g., Beasley & Zumbo, 2003; Finch, 2005), multilevel modelling (e.g., Shieh, 2000), logistic regression (e.g., Hess, Olejnik, & Huberty, 2001), structural equation modelling (e.g., Curran, West & Finch, 1996) and many more. When univariate properties are of interest (such as, for example, the impact that non-normality has on the t-test or ANOVA), researchers have a plethora of distribution types to choose from. Distributions such as the exponential, log-normal and uniform are usually employed to test for non-zero skewness or excess kurtosis (e.g., Oshima & Algina, 1992; Wiedermann & Alexandrowicz, 2007; Zimmerman & Zumbo, 1990). However, when the violation of assumptions implies a multivariate, non-normal structure, the data-generating process becomes considerably more complex because, for the continuous case, many candidate densities can be called the "multivariate" generalization of a well-known univariate distribution (Kotz, Balakrishnan & Johnson, 2004). Consider, for instance, the case of a multivariate distribution closely related to the normal: the multivariate t distribution. Kotz and Nadarajah (2004) list fourteen different representations of distributions that could be considered "multivariate t", with the most popular representation being used primarily out of convenience, due to its connection with elliptical distributions. In general, there can be many mathematical objects which could be considered the multivariate generalization or "version" of a well-known univariate probability distribution, and choosing among the potential candidates is not always a straightforward task.

From multivariate normal to multivariate nonnormal distributions: What works and what does not work
The normal distribution possesses a property that eases the generalization from univariate to multivariate spaces in a relatively straightforward fashion: it is closed under convolutions (i.e., closed under linear combinations), such that adding normal random variables and multiplying them by constants results in a random variable which is itself normally distributed. Let A_1, A_2, A_3, . . . , A_n be independent, normally distributed random variables such that A_i ∼ N(µ_i, σ_i²) for i = 1, 2, . . . , n. If k_1, k_2, k_3, . . . , k_n are real-valued constants, then it follows that:

k_1A_1 + k_2A_2 + · · · + k_nA_n ∼ N( Σ_i k_iµ_i , Σ_i k_i²σ_i² )    (1)

Now let Z be a vector of independent, standard normal random variables. For any real-valued matrix B of proper dimensions (besides the null matrix), define Y = BZ. Then, by using the property presented in Equation (1), the random vector Y + µ follows a multivariate normal distribution with mean vector µ and covariance matrix Σ = BB′. Obtaining B from Σ is known as the Cholesky decomposition or Cholesky factorization: a matrix (in this case the covariance matrix Σ) is "decomposed" or "factorized" into a lower-triangular matrix (B in the example) and its transpose. It is important to point out that other matrix-decomposition approaches (such as Principal Component Analysis or Factor Analysis) could serve a similar role.
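The Cholesky construction can be sketched in R for a bivariate case. The target covariance matrix below is illustrative; note that R's `chol()` returns the upper-triangular factor, so its transpose gives the lower-triangular B described above.

```r
## Sketch: generate bivariate normal data as Y = BZ, where B is the
## lower-triangular Cholesky factor of the target matrix Sigma, so that
## Cov(Y) = BB' = Sigma.
set.seed(2)
Sigma <- matrix(c(1, 0.5,
                  0.5, 1), nrow = 2)        # target covariance matrix
B     <- t(chol(Sigma))                     # chol() returns the upper factor
Z     <- matrix(rnorm(2 * 100000), nrow = 2)  # rows are independent N(0, 1)
Y     <- B %*% Z
cov(t(Y))                                   # approximately Sigma
```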
Although it would be tempting to follow the same general approach to construct a multivariate, non-normal distribution (i.e., select a covariance/correlation matrix, decompose it into its factors BB′, and multiply a factor by a matrix of uncorrelated, non-normal distributions of choice), and it has been done in the past (see, for instance, Hittner, May & Silver, 2003; Silver, Hittner & May, 2004; Wilcox & Tian, 2008), it is of utmost importance to highlight that this procedure would only guarantee that the population correlation or covariance matrix is the one intended by the researcher. This property holds irrespective of whether the distributions to be correlated are normal or non-normal. The univariate marginal distributions, however, lose their unique structures and, by the Central Limit Theorem, become more and more normally distributed as more one-dimensional marginals are added. Figure 1 highlights this fact by reproducing the simulation conditions described in Silver, Hittner and May (2004) for the uniform case. Consider four independent, identically-distributed random variables (X_1, X_2, X_3, X_4) which follow a standard uniform distribution, U(0, 1), and a population correlation matrix R_{4×4} with equal correlations of 0.5 in the off-diagonals. Multiplying the matrix of standard-uniform random variables by the Cholesky-decomposed R_{4×4} to induce the correlational structure (matrix B in the paragraph above) ends up altering the univariate distributions such that they no longer follow the non-normal distributions intended by the researchers (Vale & Maurelli, 1983). In order to truly generate multivariate non-normal structures with a degree of control over the marginal distributions and the correlation structure simultaneously, more complex simulation techniques are needed. The R code below exemplifies this process.
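A sketch consistent with the setup just described (four U(0, 1) variables, equicorrelated at 0.5) shows both effects at once: the correlation structure is induced correctly, but the transformed marginals stop being uniform.

```r
## Sketch of the Silver, Hittner and May (2004) uniform condition: four
## independent U(0, 1) variables pushed through the Cholesky factor of an
## equicorrelated correlation matrix R.
set.seed(3)
R <- matrix(0.5, nrow = 4, ncol = 4); diag(R) <- 1
B <- t(chol(R))                            # lower-triangular Cholesky factor
X <- matrix(runif(4 * 100000), nrow = 4)   # rows are independent U(0, 1)
Y <- B %*% X
cor(t(Y))[1, 2]  # approx. 0.5: the correlation structure is induced...
range(Y[2, ])    # ...but this marginal extends beyond [0, 1]: no longer U(0, 1)
```

The second row, for instance, is 0.5·X_1 + √0.75·X_2, a sum of uniforms whose shape is already drifting towards the bell curve.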
The NORTA ("NORmal To Anything") algorithm encompasses a variety of methods where transformations of the one-dimensional marginal distributions are applied to a multivariate normal structure in an attempt to induce the desired non-normality in the marginals. It can be schematized in three general steps:
(1) Simulate from a multivariate normal distribution with a chosen correlation matrix.
(2) Apply the probability integral transformation by using the normal cumulative distribution function, so that the resulting multivariate random variable U = (u_1, u_2, ..., u_d)′ has standard uniform marginals.
(3) Choose a suitable inverse CDF, F⁻¹(·), to apply to each marginal density of U so that the new multivariate random variable is defined as X = (F_1⁻¹(u_1), F_2⁻¹(u_2), ..., F_d⁻¹(u_d))′. Notice that the F_i⁻¹(·) need not be the same for each i.
Specific applications of the NORTA framework to simulate non-normal data have different computational limitations, depending on how the method is implemented. The first one can be found in Equation (1) of Cario and Nelson (2007), which pertains to the way in which the correlation matrix of the distribution is assembled. Applying the transformations in Steps (2) and (3) above will almost always change the final value of the Pearson correlation between the marginal distributions. Nevertheless, if Equation (1) in Cario and Nelson (2007) is solved for the correlation values intended by the researcher, then the solution can be used in Step (1) above so that the final joint distribution has the population value originally desired. This approach, however, requires the correlation matrix to be created entry-by-entry, and the estimation may result in a final correlation matrix for the non-normal density which is not positive definite. Care needs to be taken when implementing the NORTA approach to ensure both the non-normal and correlational structures are feasible.
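For some marginals the required adjustment has a closed form. A classical result (not derived in this article) states that two uniforms obtained by applying the normal CDF to a bivariate normal pair with correlation ρ_Z have Pearson correlation (6/π)·arcsin(ρ_Z/2); inverting it gives the input correlation needed to hit a target. A sketch:

```r
## Sketch, assuming U(0, 1) marginals: solve for the input ("intermediate")
## correlation that yields a target Pearson correlation of 0.5 after the
## probability integral transformation.
target_rho <- 0.5
rho_Z <- 2 * sin(pi * target_rho / 6)   # closed-form inverse, approx. 0.518
## Check by simulation:
set.seed(4)
n  <- 200000
z1 <- rnorm(n)
z2 <- rho_Z * z1 + sqrt(1 - rho_Z^2) * rnorm(n)  # bivariate normal pair
u1 <- pnorm(z1); u2 <- pnorm(z2)                 # uniform marginals
cor(u1, u2)                                      # approx. the target 0.5
```

For marginals without such a closed form, the same mapping is typically inverted numerically, entry-by-entry, which is where the positive-definiteness problems mentioned above arise.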

The 3rd order polynomial transformation
The 3 rd order polynomial transformation, or the Fleishman method, deserves a special mention because of the influence it has exerted on evaluating the robustness of statistical methods within psychology and the social sciences. Its original univariate characterization proposed by Fleishman (1978) and the multivariate extension developed by Vale and Maurelli (1983) are, by far, the most widely used algorithms to evaluate violations to the normality assumption in the social sciences. With upwards of 1100 citations combined between both articles, no other method to simulate non-normal data within these fields has been as extensively used as this approach.
The Fleishman method begins by defining a non-normal random variable as follows:

E = a + bZ + cZ² + dZ³    (3)

where Z ∼ N(0, 1) and {a, b, c, d} are real-valued constants that act as polynomial coefficients to define the non-normally distributed variable E. The coefficients are obtained by finding the solution to the following system of equations:

a + c = 0
b² + 6bd + 2c² + 15d² = 1
2c(b² + 24bd + 105d² + 2) = γ_1
24[bd + c²(1 + b² + 28bd) + d²(12 + 48bd + 141c² + 225d²)] = γ_2    (4)

where γ_1 is the population skewness and γ_2 is the population excess kurtosis defined by the user. Fleishman-generated random variables are assumed to be standardized (at least initially), so the mean is fixed at 0 and the variance is fixed at 1. By solving this system of equations, given the user's input for (γ_1, γ_2), the resulting polynomial coefficients can be plugged into Equation (3) so that the non-normal random variable E has its first four moments determined by the user.
As an example, pretend a user is interested in obtaining a distribution with a population skewness of 1 and a population excess kurtosis of 15. The following R code generates the non-normal distribution E with the user-specified non-normality:

fleishman <- function(sk, krt) {
  fl_syst <- function(x, skew, kurt) {
    b <- x[1]; c <- x[2]; d <- x[3]
    eqn <- c(b^2 + 6*b*d + 2*c^2 + 15*d^2 - 1,
             2*c*(b^2 + 24*b*d + 105*d^2 + 2) - skew,
             24*(b*d + c^2*(1 + b^2 + 28*b*d) +
                 d^2*(12 + 48*b*d + 141*c^2 + 225*d^2)) - kurt)
    sum(eqn * eqn)
  }
  sol <- nlminb(start = c(0, 0, 1), objective = fl_syst,
                skew = sk, kurt = krt)
  sol
}
## Solves the Fleishman system for skewness = 1, kurtosis = 15
fleishman(1, 15)$par
[1]  1.534711  0.170095 -0.306848

Although the original system contains four equations, notice that knowing c fully determines a (it simply switches sign), so only {b, c, d} need to be solved for numerically. Once the polynomial coefficients are obtained, one simply needs to substitute them back into Equation (3):

E = −0.170095 + 1.534711Z + 0.170095Z² − 0.306848Z³

Notice that, by construction, E is standardized, which can be seen in the system described in Equation (4): the first equation is set to 0 (the mean of E) and the second to 1 (the variance of E). These target values appear with a negative sign in the R code because each equation is rearranged to equal zero before being minimized. Vale and Maurelli (1983) extended the Fleishman method by acknowledging the same limitation that was described in Section 2.2: if one Fleishman-transforms standard normal random variables and induces a correlation structure through a covariance matrix decomposition approach (as described in Section 2.1), the resulting marginal distributions no longer have the (γ_1, γ_2) values intended by the researcher. If, on the other hand, one begins with multivariate normal data and then applies the Fleishman transformation to each marginal distribution, then the resulting correlation structure is no longer the same as the one originally intended by the researcher.
Their solution, which is very much in line with the procedure described in Section 2.1, consisted of proposing a step between the final correlation structure and the 3rd order polynomial transformation called the "intermediate correlation matrix". With this added step, one would simulate multivariate normal data where the intermediate correlation matrix holds in the population. Then one proceeds to apply the 3rd order polynomial transformation to each marginal and, as they are transformed, the population correlations are altered to result in the final correlation matrix originally intended by the researcher. The intermediate correlation is calculated as follows:

ρ_E1E2 = ρ_Z1Z2(b_1b_2 + 3b_1d_2 + 3d_1b_2 + 9d_1d_2) + ρ_Z1Z2²(2c_1c_2) + ρ_Z1Z2³(6d_1d_2)    (5)

where ρ_E1E2 is the intended correlation between the non-normal variables, {a_i, b_i, c_i, d_i} are the polynomial coefficients needed to implement the Fleishman transformation as described above, and ρ_Z1Z2 is the intermediate correlation coefficient. Solving for this correlation coefficient gives the user control over univariate skewness, excess kurtosis and the correlation/covariance matrix. The 3rd order polynomial transformation proposed by Fleishman (1978) (and extended by Vale and Maurelli (1983)) has several limitations. Tadikamalla (1980) and Headrick (2010) have commented on the fact that the combinations of skewness and excess kurtosis that can be simulated by this approach are limited when compared to other methods, such as the 5th order polynomial transformation. The correlation matrix implied by Equation (5) is defined bivariately, so the intermediate correlation matrix has to be assembled one coefficient at a time, increasing the probability that it may not be positive definite. The range of correlations that can be simulated is also restricted and contingent on the values of the intermediate correlation coefficients (Headrick, 2002).
For instance, if one were to correlate a standard normal variable and a Fleishman-defined variable with (γ_1 = 1, γ_2 = 15), the researcher can only choose correlations in the approximate [−0.614, 0.614] range. Correlations outside that range would make Equation (5) yield either non-real solutions or solutions outside [−1, 1], so that the correlation matrix becomes unviable. Continuing with the example above, the normal case implies b_1 = 1 and a_1 = c_1 = d_1 = 0, such that E_1 = 0 + (1)Z + (0)Z² + (0)Z³, while E_2 = −0.170095 + 1.534711Z + 0.170095Z² − 0.306848Z³. Substituting these coefficients into Equation (5) yields:

ρ_E1E2 = ρ_Z1Z2(b_2 + 3d_2) = 0.614167 ρ_Z1Z2

By setting ρ_Z1Z2 = ±1 one can see that the maximum possible correlation between the normal E_1 and non-normal E_2 is 0.614167. For ρ_E1E2 to be greater than that, ρ_Z1Z2 would have to be greater than 1, which would make the intermediate correlation matrix non-positive definite.
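Both claims can be verified by direct substitution, with no simulation needed. The sketch below plugs the coefficients quoted above into the standard Fleishman moment expressions (with a = −c) and recovers the maximum attainable correlation with a standard normal variable:

```r
## Check that the coefficients quoted above reproduce variance 1, skewness 1
## and excess kurtosis 15, and recover the 0.614 correlation bound.
b <- 1.534711; c <- 0.170095; d <- -0.306848
v  <- b^2 + 6*b*d + 2*c^2 + 15*d^2                 # variance, approx. 1
g1 <- 2*c*(b^2 + 24*b*d + 105*d^2 + 2)             # skewness, approx. 1
g2 <- 24*(b*d + c^2*(1 + b^2 + 28*b*d) +
          d^2*(12 + 48*b*d + 141*c^2 + 225*d^2))   # excess kurtosis, approx. 15
max_rho <- b + 3*d    # Equation (5) with b1 = 1, c1 = d1 = 0 and rho_Z = 1
round(c(v, g1, g2, max_rho), 4)
```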
Although the previous limitations can be somewhat attenuated depending on the choice of γ_1, γ_2 and ρ_E1E2, there is one aspect of the Fleishman-Vale-Maurelli method that cannot be avoided because it is implicit in the theoretical framework within which it was developed. The system described in Equation (4) and the intermediate correlation in Equation (5) are polynomials of high degree. As such, they have multiple solutions, and there is little indication as to which solution should be preferred over the others. Astivia and Zumbo (2018, 2019) have studied this issue and documented that the idiosyncrasies of the data generated by each solution can be disparate enough to alter the conclusions of previously published simulation studies. Although the 3rd order polynomial method has been used extensively to investigate the properties of statistical methods, more research is needed to understand the properties and uses of the method itself, and to clarify to what extent the results from published simulation studies are contingent on the type of data that can be generated through this particular method. For instance, would the results from previously published simulation studies generalize if other data-generating algorithms were implemented? Can the data that applied researchers collect be modelled through the 3rd order polynomial method, or does it follow other distribution types? The 3rd order polynomial method offers control over the first four moments of a distribution (mean, variance, skewness and kurtosis). Is this sufficient to characterize the data, or would methods that allow control over even higher moments be needed?

Copula distributions
Recently, copula distribution theory has begun to make incursions into psychometric modelling and the behavioural sciences (e.g., Jones, Mair, Kuppens & Weisz, 2019). Although the methods and techniques associated with it are well known within the fields of financial mathematics and actuarial science, the flexibility and power of this framework are only now becoming popular in other areas as a way to enhance the modelling of multivariate, non-normal data.
In its simplest conceptualization, a copula is a type of multivariate density where the marginal distributions are uniform and the dependence structure is specified through a copula function (Joe, 2014). Two important mathematical results power the flexibility of this theoretical framework: the probability integral transform and Sklar's theorem. The probability integral transform allows one to convert any random variable with a well-defined CDF into a uniformly distributed random variable (or vice versa if there is an inverse CDF). Sklar's theorem proves that any multivariate cumulative distribution function can be broken down into two independent parts: its unidimensional marginals and the copula function that relates them. Because of the generality of Sklar's theorem, one is guaranteed that, given some mild regularity conditions (the interested reader can consult Durante, Fernández-Sánchez & Sempi, 2013), for any multivariate distribution there exists a copula function that parameterizes its joint CDF.
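The probability integral transform is easy to see in action. The sketch below uses an Exp(2) variable purely as an illustration: applying its own CDF yields a U(0, 1) variable, and the inverse CDF maps it back.

```r
## Sketch of the probability integral transform and its inverse.
set.seed(5)
x <- rexp(100000, rate = 2)   # any continuous variable with a known CDF
u <- pexp(x, rate = 2)        # probability integral transform
c(mean(u), var(u))            # approx. 1/2 and 1/12, as for U(0, 1)
y <- qexp(u, rate = 2)        # inverse CDF recovers the original values
all.equal(x, y)
```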

Introduction to Gaussian copulas
Gaussian copulas comprise perhaps the most commonly used copula family for the analysis and simulation of data. They inherit many of the properties of the multivariate normal distribution that make them both analytically tractable and easy to interpret for researchers. In particular, Gaussian copulas rely on the covariance/correlation matrix to model the dependencies among its one-dimensional marginal distributions, so that the same covariance modelling that social scientists are familiar with generally translates into this type of copula modelling.
The Gaussian copula can be defined as follows:

C(u_1, u_2, ..., u_d) = Φ_R(Φ⁻¹(u_1), Φ⁻¹(u_2), ..., Φ⁻¹(u_d))    (6)

where u_i, i = 1, 2, . . . , d are realizations of standard, uniformly distributed random variables, Φ⁻¹ is the inverse CDF of the univariate normal distribution, and Φ_R is the CDF of the d-variate Gaussian distribution with correlation matrix R_{d×d}. Similar to the NORTA approach, the process of building a Gaussian copula can be schematized in the following series of steps:
(1) Simulate from a multivariate normal distribution with the desired correlation matrix.
(2) Apply the normal CDF to the newly-generated multivariate normal vector so that the columns are now standard uniform bounded between 0 and 1.
(3) If the inverse CDF (the quantile function) of the desired non-normal distribution exists, apply said inverse CDF to each column of data. The resulting distribution is a Gaussian copula with its one-dimensional marginal distributions selected by the researcher.
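The three steps can be sketched in R. The G(1, 1) and U(0, 1) marginals below match the pair discussed in the correlation-shrinkage section that follows; the initial correlation of 0.5 is illustrative. Note that the correlation among the transformed variables comes out smaller than the one used to initialize the copula.

```r
## Sketch of Steps (1)-(3) for a Gaussian copula with a G(1, 1) and a
## U(0, 1) marginal.
set.seed(6)
n   <- 100000
rho <- 0.5
z1  <- rnorm(n)
z2  <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)  # Step (1): bivariate normal
u1  <- pnorm(z1); u2 <- pnorm(z2)             # Step (2): uniform marginals
x1  <- qgamma(u1, shape = 1, rate = 1)        # Step (3): G(1, 1) marginal
x2  <- qunif(u2)                              # Step (3): U(0, 1) marginal
cor(z1, z2)   # approx. 0.50
cor(x1, x2)   # noticeably smaller: the correlation has shrunk
```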

Correlation shrinkage
Although the Gaussian copula induces a relationship between the Gamma and Uniform marginals, it is important to highlight that the original correlation of ρ = 0.5 has now changed. When calculating the Pearson correlation of the original sample from the bivariate normal distribution against the one from the Gaussian copula, the difference becomes apparent: there is a shrinkage of about 0.05-0.06 units in the correlation metric, with the shrinkage contingent on the size of the initial correlation. Figure 3 further clarifies this issue by relating the initial correlation of the bivariate normal distribution to the final correlation of the Gaussian copula. In other words, there is a downward bias of approximately 0.15 units in the correlation metric for the theoretically maximum correlation of this copula. There are two important reasons why this happens, even though they are not always acknowledged within the simulation literature in the social sciences.
First, both the probability integral transform (U <- pnorm(Z)) and the quantile functions (or inverse CDFs) needed to obtain the non-normal marginal distributions (qgamma and qunif) are not linear transformations. The Pearson correlation is only invariant under linear transformations, so if non-linear transformations are applied there is no expectation that the correlation will remain the same. Second, there exists a result from copula distribution theory that places further restrictions on the range of the Pearson correlation, referred to as the Fréchet-Hoeffding bounds. Hoeffding (1940) showed that the covariance between two random variables (S, T) can be expressed as:

Cov(S, T) = ∫∫ [H(s, t) − F(s)G(t)] ds dt    (7)

where H(s, t) is their joint cumulative distribution function and F(s), G(t) are the marginal distribution functions of the random variables S and T respectively. Fréchet (1951) and Hoeffding (1940) independently proved that:

max{F(s) + G(t) − 1, 0} ≤ H(s, t) ≤ min{F(s), G(t)}    (8)

and, by using these bounds in Equation (7) above, it can be shown that ρ_min ≤ ρ ≤ ρ_max, where ρ is the linear correlation between the marginal distributions. The implication of this inequality is that, given the distributional shapes defined by F(s) and G(t), the correlation coefficient may not fully span the theoretical [−1, 1] range (Joe, 2014; Nelsen, 2010); Astivia and Zumbo (2017) discuss this restriction for several combinations of marginal distributions. Although nothing can expand the Fréchet-Hoeffding bounds to the full [−1, 1] range if the marginals are fixed, the intermediate correlation matrix approach described in Section 2.2 can be used to find the proper value of the correlation coefficient needed to initialize the Gaussian copula, if a specific population value is desired. As long as this intermediate correlation is within the bounds specified by the non-normal marginals, the final correlation after the marginal transformation is completed will match the population parameter intended by the researcher.

Figure 3. Relationship between the original correlation for the bivariate normal distribution and the final correlation for the Gaussian copula with G(1, 1) and U(0, 1) univariate marginals. The horizontal axis includes values for the correlation for the bivariate normal distribution and the vertical axis presents the transformed correlation after the copula is constructed. The identity function (straight line) is included as reference.
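The bounds can be made concrete with a sketch not taken from the article: for two Exp(1) marginals, the countermonotonic pair (S, T) = (F⁻¹(U), F⁻¹(1 − U)), which corresponds to the lower Fréchet-Hoeffding bound, attains the minimum Pearson correlation, a known value of 1 − π²/6 ≈ −0.645 rather than −1.

```r
## Sketch: two Exp(1) marginals cannot reach a correlation of -1.
set.seed(7)
u <- runif(500000)
s <- qexp(u)       # Exp(1) marginal
t <- qexp(1 - u)   # Exp(1) marginal, maximally negatively dependent
cor(s, t)          # approx. -0.645, not -1
1 - pi^2 / 6       # theoretical minimum for this pair of marginals
```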

Relationship of the NORTA method, the Vale-Maurelli algorithm and Gaussian copulas.
Gaussian copulas have other important connections to the simulation work done within the social sciences, notably the fact that the NORTA method can be parameterized as a Gaussian copula (Qing, 2017). By extension, the Vale-Maurelli algorithm has also been proved to be a special case of the Gaussian copula, so that the majority of simulation work conducted in the social sciences has really only considered the Gaussian copula as its test case (Foldnes & Grønneberg, 2015; Grønneberg & Foldnes, 2019). The issues and limitations presented in Sections 2.2 and 2.3 therefore carry over from one method to the other, given that the data-generating methods considered in both cases share the same essential properties.

Distributions closed under linear transformation and their connection to simulating multivariate, non-normality
As presented in Section 2.1, one of the many attractive properties of the normal distribution is that the sum of independent, normal random variables is itself normally distributed. This property is known as being "closed under convolutions" (i.e., when one combines or "convolves" random variables in a certain way, the resulting random variable belongs to the same family as its original components). Through the use of this property, one can define the multivariate normal distribution by finding linear combinations (i.e., convolutions) of the one-dimensional normal marginals that result in X_{n×d} ∼ N(µ_{d×1}, Σ_{d×d}). Although not very common, this property is shared by some other probability distributions, making it the preferred starting point for defining their multivariate generalizations. Continuous distributions such as the Cauchy and gamma share this property, and their multivariate extensions depend on it. The gamma distribution is a particularly relevant case given its connection to other well-known probability distributions such as the exponential and the chi-square. If X_1 ∼ G(α_1, β) and X_2 ∼ G(α_2, β) then X_1 + X_2 ∼ G(α_1 + α_2, β). Notice how the property of closedness only holds when the rate parameter β is the same: the sum of two generic gamma-distributed variables is not necessarily gamma-distributed (see Moschopoulos, 1985). For the interested reader, an introduction to the theory of gamma distributions can be found in Chapter 15 of Krishnamoorthy (2016).
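The gamma closure property can be checked by simulation. The shape parameters below are illustrative: if X_1 ∼ G(2, 1) and X_2 ∼ G(3, 1), the sum should behave as G(5, 1), which has mean 5/1 = 5 and variance 5/1² = 5.

```r
## Sketch: sums of independent gammas with a common rate stay gamma.
set.seed(8)
x1 <- rgamma(200000, shape = 2, rate = 1)
x2 <- rgamma(200000, shape = 3, rate = 1)
s  <- x1 + x2             # in theory G(5, 1)
c(mean(s), var(s))        # both approx. 5, matching G(5, 1)
```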
As a motivating example of a multivariate distribution that is not Gaussian, yet closed under convolutions, consider P and Q to be independently distributed Poisson random variables with parameters (λ_P, λ_Q) respectively. If W = P + Q then W ∼ Poisson(λ_W = λ_P + λ_Q). By using this property, one can generalize the univariate Poisson distribution to multivariate spaces. Consider P, Q and V to be independent, Poisson-distributed random variables with respective parameters λ_P, λ_Q and λ_V. Define two new random variables P* and Q* as follows:

P* = P + V
Q* = Q + V

Because P* and Q* share V in common, (P*, Q*) exhibits Poisson-distributed, univariate marginal distributions with a covariance equal to λ_V. Notice that this construction only allows for the case where the covariance between P* and Q* is positive because, by definition, the parameter λ of a Poisson distribution must be positive. Figure 4 shows the bivariate histogram of a simulated example with P ∼ Poisson(1), Q ∼ Poisson(2) and V ∼ Poisson(3). In R code:

#Block 4
set.seed(124)
## Simulates independent Poisson random variables
P <- rpois(100000, lambda = 1)
Q <- rpois(100000, lambda = 2)
V <- rpois(100000, lambda = 3)
## Creates joint distribution with Poisson marginal distributions
Pstar <- P + V
Qstar <- Q + V
cov(Pstar, Qstar)  # close to the population value, lambda_V = 3

With the exception of the multivariate normal distribution, relying on the property of being closed under convolutions is not a widely used method to simulate non-normal data for the social sciences. Very few distributions have this property and, even among those that do, there may be further restrictions in place that limit either the type of distributions that can be generated or the dependency structures that can be modelled (Florescu, 2014). As shown in the example above, although one could use this method recursively to generate a multivariate Poisson distribution with a pre-specified covariance matrix, one is restricted to only positive covariances, given the limitation that all λ parameters must be positive. In spite of the lack of attention given to this simulation approach, the present section is intended to remind the reader that the properties of univariate distributions rarely generalize to multivariate spaces unaltered. If similar approaches are used without first showing closedness under convolutions, there will be a discrepancy between the simulation design and the actual implementation of the simulation method, so that one can no longer be sure which exact distributions are being generated.

Alternative Approaches
There exists a variety of alternative approaches that were not considered in this overview but which are quickly gaining prominence within the simulation literature in the social sciences. Although not all of these methods share a common theoretical framework, and the details of many of them are beyond the scope of the present manuscript, they are worth mentioning for the interested reader, particularly because they can be used to explore simulation conditions beyond the skewness and kurtosis of the univariate distributions, which is the general approach that permeates the simulation literature in the social sciences. Grønneberg and Foldnes (2017) and Mair, Satorra and Bentler (2012) have extended copula-based approaches beyond the Gaussian copula to create more flexible structures that induce different types of non-normality, such as tail dependence or multivariate skew. Qu, Liu and Zhang (2019) have recently extended the Fleishman approach to multivariate higher moments and, to this day, it is one of the few methods available that allows researchers to set population values of Mardia's multivariate skewness and kurtosis, granting a certain degree of control over both univariate and multivariate non-normality. Ruscio and Kaczetow (2008) developed a sample-and-iterate algorithm that allows one to simulate from any arbitrary number of distributions while keeping control of the correlational structure. Although not much is known about the theoretical properties of this method, it offers the advantage of allowing users to induce any level of correlation in an empirical dataset that they may have collected. The user is therefore given a choice to simulate either from theoretical distributions or from a particular dataset of interest.
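To give a flavour of the copula-based family of methods, the core of a Gaussian-copula (NORTA-type) generator can be sketched in base R. The margins and correlation below are arbitrary illustrative choices, not values from any of the studies cited above.

```r
## Hand-rolled Gaussian copula (NORTA-type) generator -- illustrative only.
## Step 1: correlated standard normals via a Cholesky factor.
## Step 2: probability integral transform to uniform margins.
## Step 3: inverse-CDF transform to the desired non-normal margins.
set.seed(1)
n   <- 100000
rho <- 0.6
Z   <- matrix(rnorm(2 * n), ncol = 2) %*% chol(matrix(c(1, rho, rho, 1), 2, 2))
U   <- pnorm(Z)                    # uniform margins, Gaussian dependence
X1  <- qgamma(U[, 1], shape = 2)   # skewed gamma margin
X2  <- qexp(U[, 2])                # exponential margin
cor(X1, X2)   # close to, but not exactly, rho
```

The final comment hints at the intermediate-correlation problem discussed elsewhere in this article: the Pearson correlation of the transformed variables is attenuated relative to the correlation of the underlying normals.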
Auerswald and Moshagen (2015) as well as Mattson (1997) have considered the problem by restating it under a latent variable framework and inducing the non-normality through the latent variables. Methods like these would allow users to more precisely control the distributional assumptions placed on latent variables, which could be of interest to researchers in Structural Equation Modelling or Item Response Theory. The work of Kowalchuk and Headrick (2010), Pant and Headrick (2013) and Koran, Headrick and Kuo (2015) has extended the properties of univariate, non-normal distributions, such as the g-and-h family or the Burr family of distributions, to multivariate spaces, in an attempt to allow researchers the flexibility to select from a wider collection of non-normal structures. These methods share some similarities with the NORTA approaches in that they generate multivariate non-normal data by manipulating the unidimensional marginals of the joint distribution. Nevertheless, most of this work uses L-moments as the coefficients that control the non-normality of the data, not the conventional 3rd and 4th order moments (i.e., skewness and kurtosis) familiar to most researchers. The creators of these methods offer readily available software implementations, although not all are available in the same programming languages.
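As a rough illustration of the latent variable route (in the spirit of Mattson, 1997, though not his exact algorithm), one can replace the normal factor in a one-factor model with a standardized chi-square variable; the loadings below are hypothetical.

```r
## One-factor model with a non-normal (standardized chi-square) factor.
## Loadings are hypothetical; the indicators inherit skewness from the
## factor, with larger loadings producing more skewed indicators.
set.seed(2)
n      <- 100000
xi     <- (rchisq(n, df = 1) - 1) / sqrt(2)    # mean 0, variance 1, skewed
lambda <- c(0.8, 0.7, 0.6)                     # hypothetical loadings
eps    <- sapply(sqrt(1 - lambda^2), function(s) rnorm(n, sd = s))
Y      <- outer(xi, lambda) + eps              # 100000 x 3 indicator matrix
## Univariate skewness of each indicator
apply(Y, 2, function(y) mean(((y - mean(y)) / sd(y))^3))
```

Because each indicator is a loading-weighted sum of a skewed factor and a normal error, its skewness equals the cube of the loading times the factor's skewness, which is one concrete sense in which this approach controls non-normality "from the latent side".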
In spite of these modern advances, the NORTA approaches in general (and the 3rd order polynomial transformation in particular) have dominated the robustness literature in simulation studies within psychology and the social sciences. As such, I present three recommendations that I believe would aid in the design, planning and reporting of simulation studies.

Recommendations for the use of the 3rd order polynomial method
If the Fleishman (1978) method (for the univariate case) or the Vale and Maurelli (1983) multivariate extension is used in simulations, it would be beneficial to report both the transformation coefficients, {a, b, c, d} in Equation (4), and the intermediate correlation matrix assembled from Equation (5). For instance, Astivia and Zumbo (2015) conjectured, and Foldnes and Grønneberg (2017) proved, that the asymptotic covariance matrix used in the robust corrections for non-normality within SEM depends on these polynomial coefficients and the intermediate correlation matrix. Different sets of solutions create different asymptotic covariance matrices, so that small-sample recommendations based on this simulation technique may be highly contingent on the type of data generated through this approach. By reporting which coefficients were used, one can at least provide a clearer, more reproducible simulation for other researchers to interpret. For a concrete example, see the "Study Design" section of Sheng and Sheng (2012), where the polynomial coefficients for each type of non-normality are listed.
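For readers who wish to report these quantities, the coefficients {a, b, c, d} can be recovered by numerically solving Fleishman's (1978) moment equations (with a = -c). The sketch below does so in base R for a target of skewness 2 and excess kurtosis 7; the starting values and two-stage optimization are illustrative choices, not part of any published algorithm.

```r
## Numerically solving Fleishman's (1978) moment equations so that the
## {a, b, c, d} coefficients can be reported. Targets: skewness g1 = 2,
## excess kurtosis g2 = 7. Starting values are an illustrative guess.
fleishman_resid <- function(p, g1, g2) {
  b <- p[1]; c <- p[2]; d <- p[3]
  e1 <- b^2 + 6 * b * d + 2 * c^2 + 15 * d^2 - 1
  e2 <- 2 * c * (b^2 + 24 * b * d + 105 * d^2 + 2) - g1
  e3 <- 24 * (b * d + c^2 * (1 + b^2 + 28 * b * d) +
              d^2 * (12 + 48 * b * d + 141 * c^2 + 225 * d^2)) - g2
  e1^2 + e2^2 + e3^2   # sum of squared residuals; zero at a solution
}
sol <- optim(c(0.9, 0.2, 0.05), fleishman_resid, g1 = 2, g2 = 7)  # Nelder-Mead
sol <- optim(sol$par, fleishman_resid, g1 = 2, g2 = 7,
             method = "BFGS")                                     # polish
round(c(a = -sol$par[2], b = sol$par[1], c = sol$par[2], d = sol$par[3]), 4)
```

Reporting the resulting vector (together with the intermediate correlation matrix in the multivariate case) is all that is needed for another researcher to reproduce the data-generating mechanism exactly.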

Accounting for different types of multivariate nonnormality
Since there are infinitely many ways for multivariate distributions to deviate from the normality assumption (yet only one way for data to be multivariate normal), attempting different simulation methods may offer a more comprehensive view of which robustness properties the methods under investigation are sensitive to. Consider, for instance, Falk's (2018) simulation study investigating the performance of robust corrections when constructing confidence intervals within SEM. Three data-generating mechanisms were used for the non-normal data: the Vale-Maurelli approach, a contaminated normal distribution and a Gumbel copula distribution. The contaminated normal and Gumbel copula cases had univariate distributions that were very close to normal, yet the coverage of the confidence intervals was very poor, calling into question recommendations (such as those in Finney & DiStefano, 2006) which argue that the use of robust corrections for SEM models should be based on the distributions of the indicators. Although these recommendations make sense in light of previous simulation work (which relied almost entirely on the 3rd order polynomial approach), the results in Falk (2018) highlight that non-normality in higher dimensions can still impact data analysis, even if researchers are unaware of these properties.
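To make this idea concrete, a mixture-of-normals mechanism can produce data whose univariate margins are exactly standard normal while the joint distribution is not bivariate normal. The sketch below is a generic illustration of this phenomenon with arbitrary mixing values, not a reproduction of Falk's (2018) exact conditions.

```r
## Mixture of two bivariate normals: with probability 1 - eps draw from a
## component with correlation rho1, with probability eps from one with
## correlation rho2. Both components have unit variances, so every
## univariate margin is exactly N(0, 1) -- the non-normality lives only
## in the joint distribution. (eps, rho1, rho2 are illustrative values.)
set.seed(3)
n    <- 100000
eps  <- 0.1
rho1 <- 0.3; rho2 <- -0.9
rZ <- function(n, rho) matrix(rnorm(2 * n), ncol = 2) %*%
        chol(matrix(c(1, rho, rho, 1), 2, 2))
X <- rZ(n, rho1)
contam <- runif(n) < eps
X[contam, ] <- rZ(sum(contam), rho2)
## Margins look normal (excess kurtosis near 0)...
apply(X, 2, function(x) mean(((x - mean(x)) / sd(x))^4) - 3)
## ...but the joint distribution is not: for a true bivariate normal,
## cor(X1^2, X2^2) would equal cor(X1, X2)^2.
c(cor(X[, 1], X[, 2])^2, cor(X[, 1]^2, X[, 2]^2))
```

Any diagnostic that only inspects the univariate indicators would declare these data "normal", which is precisely why indicator-based rules for applying robust corrections can fail.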

Justify and tailor your type of non-normality
It is crucial for methodologists and quantitative social scientists to be able to tailor their simulation results to the kind of scientific audience and research field they wish to inform. One of the main objections to any simulation study concerns its generalizability and how the findings presented and recommendations offered would fare if the simulation conditions were altered. As it stands today, whenever a simulation study exploring the issue of non-normality is conducted, the simulation design is usually informed by previous simulations and, for lack of a better term, "tradition". Take, for instance, the population values of skewness and excess kurtosis γ1 = 2, γ2 = 7 used to denote "moderate" non-normality and γ1 = 3, γ2 = 21 for "extreme" non-normality. These values were originally used in Curran, West and Finch's (1996) simulation study on the influence that non-normality exerts on the chi-squared test of fit for SEM. Since then, these (absolute) values of (γ1, γ2) have appeared in Berkovits, Hancock and Nevitt (2000) and in many subsequent studies. With few exceptions, there has not been much interest within the published literature in documenting the types of univariate and multivariate distributions that are commonly found in our areas of research. At this point in time, I would argue that we, as methodologists, do not have a good sense of whether the type of data we simulate in our studies is reflective of the type of data that exists in the real world. An important step towards addressing this problem is the movement for open science, reproducibility and open data. Having access to raw data grants methodologists and quantitative researchers the ability to actually mimic the idiosyncrasies that applied researchers face every day and to offer recommendations that address them directly. Becoming familiar with alternative modes of data simulation would also help improve the generalizability of simulation results.
As commented on in Section 2.2.1, the 3rd order polynomial approach to non-normal simulation enjoys considerable predilection amongst quantitative researchers, to the detriment of other approaches. Considering even just one other algorithm when conducting simulations would help alleviate this limitation and encompass a wider class of non-normalities than what can be found in the 3rd order polynomial method. Finally, quantitative methodologists may benefit from using population parameters found in the applied literature. Relying on previously published simulations to choose effect sizes is the current "standard", which may limit recommendations that do not necessarily match different areas of research. A simulation should, at the very least, attempt to emulate real life, not other simulations.
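As a practical first step in that direction, one can profile the univariate skewness and kurtosis of an available dataset and use the observed values as simulation conditions. The helper below is an illustrative sketch: the function name is mine, and R's built-in faithful (Old Faithful geyser) data stands in for a researcher's own dataset.

```r
## Profile the sample skewness and excess kurtosis of every variable in a
## dataset, so simulated conditions can mimic observed data rather than
## values inherited from prior simulation studies.
moments_profile <- function(data) {
  t(sapply(as.data.frame(data), function(x) {
    z <- (x - mean(x)) / sd(x)   # standardize, then take 3rd/4th moments
    c(skewness = mean(z^3), excess.kurtosis = mean(z^4) - 3)
  }))
}
moments_profile(faithful)   # one row per variable, one column per moment
```

The resulting table can be reported alongside the simulation design as the empirical justification for the chosen (γ1, γ2) conditions.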

The meta-scientific investigation of simulation studies
The goal of a considerable amount of simulation research in the social sciences is to provide guidelines and best practice recommendations for data analysis under violation of distributional assumptions. Because of this, it is of the utmost importance that applied researchers also become familiar (to a certain degree) with how quantitative methodologists conduct their research, so as to understand whether or not those recommendations are relevant to the analyses they may conduct. Ideally, if an applied researcher is unfamiliar with the methodological literature yet seeks a better understanding of how simulation results may aid their analyses, they should consider consulting a quantitative expert. Much like in the case of applied research, every simulation study can have its own idiosyncrasies and design peculiarities that require a more nuanced understanding of what the original authors presented, and spotting potential gaps in a simulation design usually requires a degree of mathematical sophistication that, although desirable, is not usually a required skill amongst applied researchers. Perhaps the easiest way these issues can be conceptualized for applied researchers is by considering simulation studies as the analogue of empirical, experimental work, as opposed to formal mathematical argumentation.
Although simulation studies would ideally go hand-in-hand with the statistical and mathematical theory that lends legitimacy to their results, enough examples and case studies have been provided in the previous sections of this article to highlight that this is often not the case. Just as "method effects" can introduce unnecessary noise and uncertainty when conducting experiments, operating from a "black box" perspective when conducting simulation studies can create a disconnect between the theory and the design of a simulation study. The fact of the matter is that almost no research exists that attempts to analyze and validate the current practices of computer simulation studies. Whereas the movement towards open, reproducible science has yielded important insights into questionable research practices such as p-hacking or researcher degrees of freedom, the quantitative fields themselves have remained virtually unexplored, due perhaps to their technical nature and the fact that a more solid theoretical foundation in statistics is needed to recognize the issues presented. A meta-scientific study of the theory and practice of simulation studies is desperately needed in order to begin to understand the types of questions and answers that are presented within the quantitative fields of the social sciences. It is my sincere hope that, by offering this overview, researchers can begin to familiarize themselves with some of the methodological and epistemological issues that computer simulations pose, and open a dialogue between methodological and applied researchers in an area that is usually restricted to the most technically minded amongst us.