Optimal stopping of Gauss–Markov bridges

Azze, A., D'Auria, B., and García-Portugués, E. (2025).
Journal paper Advances in Applied Probability, to appear.

Abstract

We solve the non-discounted, finite-horizon optimal stopping problem of a Gauss–Markov bridge by using a time-space transformation approach. The associated optimal stopping boundary is proved to be Lipschitz continuous and differentiable anywhere away from the horizon, and it is characterized by the unique solution of an integral equation. A Picard iteration algorithm is discussed and implemented to exemplify the numerical computation and geometry of the optimal stopping boundary for some illustrative cases.

Optimal stopping of an Ornstein–Uhlenbeck bridge

Azze, A., D'Auria, B., and García-Portugués, E. (2024).
Journal paper Stochastic Processes and their Applications, 172:104342. doi:10.1016/j.spa.2024.104342.

Abstract

In this paper we make a rigorous analysis of the existence and characterization of the free boundary related to the optimal stopping problem that maximizes the mean of an Ornstein–Uhlenbeck bridge. The result includes the Brownian bridge problem as a limit case. The methodology hereby presented relies on a time-space transformation that casts the original problem into a more tractable one with an infinite horizon and a Brownian motion underneath. We comment on two different numerical algorithms to compute the free-boundary equation and discuss illustrative cases that shed light on the boundary’s shape. In particular, the free boundary generally does not share the monotonicity of the Brownian bridge case.

Nonparametric tests of independence for circular data based on trigonometric moments

García-Portugués, E., Lafaye de Micheaux, P., Meintanis, S. G., and Verdebout, T. (2024).
Journal paper Statistica Sinica, 34(2):567–588. doi:10.5705/ss.202021.0416.

Abstract

We introduce nonparametric tests of independence for bivariate circular data based on trigonometric moments. Our contributions lie in (i) proposing nonparametric tests that are locally and asymptotically optimal against bivariate cosine von Mises alternatives and (ii) extending these tests, via the empirical characteristic function, to obtain consistent tests against broader sets of alternatives, eventually being omnibus. We thus provide a collection of trigonometric-based tests of varying generality and known optimalities. The large-sample behaviours of the tests under the null and alternative hypotheses are obtained, while simulations show that the new tests are competitive against previous proposals. Two data applications in astronomy and forest science illustrate the usage of the tests.

Testing for linearity in scalar on function regression with responses missing at random

Febrero-Bande, M., Galeano, P., García-Portugués, E., and González-Manteiga, W. (2024).
Journal paper Computational Statistics, to appear. doi:10.1007/s00180-023-01445-2.

Abstract

A goodness-of-fit test for the Functional Linear Model with Scalar Response (FLMSR) with responses Missing at Random (MAR) is proposed in this paper. The test statistic relies on a marked empirical process indexed by the projected functional covariate and its distribution under the null hypothesis is calibrated using a wild bootstrap procedure. The computation and performance of the test rely on having an accurate estimator of the functional slope of the FLMSR when the sample has MAR responses. Three estimation methods based on the Functional Principal Components (FPCs) of the covariate are considered. First, the simplified method estimates the functional slope by simply discarding observations with missing responses. Second, the imputed method estimates the functional slope by imputing the missing responses using the simplified estimator. Third, the inverse probability weighted method incorporates the missing response generation mechanism when imputing. Furthermore, both cross-validation and LASSO regression are used to select the FPCs used by each estimator. Several Monte Carlo experiments are conducted to analyze the behavior of the testing procedure in combination with the functional slope estimators. Results indicate that estimators performing missing-response imputation achieve the highest power. The testing procedure is applied to check for linear dependence between the average number of sunny days per year and the mean curve of daily temperatures at weather stations in Spain.

Optimal exercise of American options under time-dependent Ornstein–Uhlenbeck processes

Azze, A., D'Auria, B., and García-Portugués, E. (2024).
Journal paper Stochastics, 96(1):921–946. doi:10.1080/17442508.2024.2325402.

Abstract

We study the barrier that gives the optimal time to exercise an American option written on a time-dependent Ornstein–Uhlenbeck process, a diffusion often adopted by practitioners to model commodity prices and interest rates. By framing the optimal exercise of the American option as a problem of optimal stopping and relying on probabilistic arguments, we provide a non-linear Volterra-type integral equation characterizing the exercise boundary, develop a novel comparison argument to derive upper and lower bounds for such a boundary, and prove its Lipschitz continuity in any closed interval that excludes the expiration date and, thus, its differentiability almost everywhere. We implement a Picard iteration algorithm to solve the Volterra integral equation and show illustrative examples that shed light on the boundary’s dependence on the process’s drift and volatility.

High expectations on phase locking: Better quantifying the concentration of circular data

Andrzejak, R. G., Espinoso, A., García-Portugués, E., Pewsey, A., Epifanio, J., Leguia, M. G., and Schindler, K. (2023).
Journal paper Chaos, 33(9):091106. doi:10.1063/5.0166468.

Abstract

The degree to which unimodal circular data are concentrated around the mean direction can be quantified using the mean resultant length, a measure known under many alternative names, such as the phase locking value or the Kuramoto order parameter. For maximal concentration, achieved when all of the data take the same value, the mean resultant length attains its upper bound of one. However, for a random sample drawn from the circular uniform distribution, the expected value of the mean resultant length achieves its lower bound of zero only as the sample size tends to infinity. Moreover, as the expected value of the mean resultant length depends on the sample size, bias is induced when comparing the mean resultant lengths of samples of different sizes. In order to ameliorate this problem, here, we introduce a re-normalized version of the mean resultant length. Regardless of the sample size, the re-normalized measure has an expected value that is essentially zero for a random sample from the circular uniform distribution, takes intermediate values for partially concentrated unimodal data, and attains its upper bound of one for maximal concentration. The re-normalized measure retains the simplicity of the original mean resultant length and is, therefore, easy to implement and compute. We illustrate the relevance and effectiveness of the proposed re-normalized measure for mathematical models and electroencephalographic recordings of an epileptic seizure.

On new omnibus tests of uniformity on the hypersphere

Fernández-de-Marcos, A. and García-Portugués, E. (2023).
Journal paper Test, 32(4):1508–1529. doi:10.1007/s11749-023-00882-x.

Abstract

Two new omnibus tests of uniformity for data on the hypersphere are proposed. The new test statistics exploit closed-form expressions for orthogonal polynomials, feature tuning parameters, and are related to a "smooth maximum" function and the Poisson kernel. We obtain exact moments of the test statistics under uniformity and rotationally symmetric alternatives, and give their null asymptotic distributions. We consider approximate oracle tuning parameters that maximize the power of the tests against known generic alternatives and provide tests that estimate oracle parameters through cross-validated procedures while maintaining the significance level. Numerical experiments explore the effectiveness of null asymptotic distributions and the accuracy of inexpensive approximations of exact null distributions. A simulation study compares the powers of the new tests with other tests of the Sobolev class, showing the benefits of the former. The proposed tests are applied to the study of the (seemingly uniform) nursing times of wild polar bears.

Toroidal PCA via density ridges

García-Portugués, E. and Prieto-Tirado, A. (2023).
Journal paper Statistics and Computing, 33(5):107. doi:10.1007/s11222-023-10273-9.

Abstract

Principal Component Analysis (PCA) is a well-known linear dimension-reduction technique designed for Euclidean data. In a wide spectrum of applied fields, however, it is common to observe multivariate circular data (also known as toroidal data), rendering spurious the use of PCA on it due to the periodicity of its support. This paper introduces Toroidal Ridge PCA (TR-PCA), a novel construction of PCA for bivariate circular data that leverages the concept of density ridges as a flexible first principal component analog. Two reference bivariate circular distributions, the bivariate sine von Mises and the bivariate wrapped Cauchy, are employed as the parametric distributional basis of TR-PCA. Efficient algorithms are presented to compute density ridges for these two distribution models. A complete PCA methodology adapted to toroidal data (including scores, variance decomposition, and resolution of edge cases) is introduced and implemented in the companion R package ridgetorus. The usefulness of TR-PCA is showcased with a novel case study involving the analysis of ocean currents on the coast of Santa Barbara.

Data-driven stabilizations of goodness-of-fit tests

Fernández-de-Marcos, A. and García-Portugués, E. (2023).
Journal paper Computational Statistics and Data Analysis, 179:107653. doi:10.1016/j.csda.2022.107653.

Abstract

Exact null distributions of goodness-of-fit test statistics are generally challenging to obtain in tractable forms. Practitioners are therefore usually obliged to rely on asymptotic null distributions or Monte Carlo methods, either in the form of a lookup table or carried out on demand, to apply a goodness-of-fit test. There exist simple and useful transformations of several classic goodness-of-fit test statistics that stabilize their exact-n critical values for varying sample sizes n. However, detail on the accuracy of these and subsequent transformations in yielding exact p-values, or even deep understanding on the derivation of several transformations, is still scarce nowadays. The latter stabilization approach is explained and automated to (i) expand its scope of applicability and (ii) yield upper-tail exact p-values, as opposed to exact critical values for fixed significance levels. Improvements on the stabilization accuracy of the exact null distributions of the Kolmogorov–Smirnov, Cramér–von Mises, Anderson–Darling, Kuiper, and Watson test statistics are shown. In addition, a parameter-dependent exact-n stabilization for several novel statistics for testing uniformity on the hypersphere of arbitrary dimension is provided. A data application in astronomy illustrates the benefits of the advocated stabilization for quickly analyzing small-to-moderate sequentially-measured samples.

Scaled torus principal component analysis

Zoubouloglou, P., García-Portugués, E., and Marron, J. S. (2023).
Journal paper Journal of Computational and Graphical Statistics, 32(3):1024–1035. doi:10.1080/10618600.2022.2119985.

Abstract

A particularly challenging context for dimensionality reduction is multivariate circular data, i.e., data supported on a torus. This kind of data appears, e.g., in the analysis of various phenomena in ecology and astronomy, as well as in molecular structures. This paper introduces Scaled Torus Principal Component Analysis (ST-PCA), a novel approach to perform dimensionality reduction with toroidal data. ST-PCA finds a data-driven map from a torus to a sphere of the same dimension and a certain radius. The map is constructed with multidimensional scaling to minimize the discrepancy between pairwise geodesic distances in both spaces. ST-PCA then resorts to principal nested spheres to obtain a nested sequence of subspheres that best fits the data, which can afterwards be inverted back to the torus. Numerical experiments illustrate how ST-PCA can be used to achieve meaningful dimensionality reduction on low-dimensional tori, particularly with the purpose of cluster separation, while two data applications in astronomy (on a three-dimensional torus) and molecular biology (on a seven-dimensional torus) show that ST-PCA outperforms existing methods for the investigated datasets.

On a projection-based class of uniformity tests on the hypersphere

García-Portugués, E., Navarro-Esteban, P., and Cuesta-Albertos, J. A. (2023).
Journal paper Bernoulli, 29(1):181–204. doi:10.3150/21-BEJ1454.

Abstract

We propose a projection-based class of uniformity tests on the hypersphere using statistics that integrate, along all possible directions, the weighted quadratic discrepancy between the empirical cumulative distribution function of the projected data and the projected uniform distribution. Simple expressions for several test statistics are obtained for the circle and sphere, and relatively tractable forms for higher dimensions. Despite its different origin, the proposed class is shown to be related to the well-studied Sobolev class of uniformity tests. Our new class proves itself advantageous by allowing the derivation of new tests for hyperspherical data that neatly extend the circular tests by Watson, Ajne, and Rothman, and by introducing the first instance of an Anderson–Darling-like test for such data. The asymptotic distributions and the local optimality against certain alternatives of the new tests are obtained. A simulation study evaluates the theoretical findings and evidences that, for certain scenarios, the new tests are competitive against previous proposals. The new tests are employed in three astronomical applications.

Recent advances in directional statistics

Pewsey, A. and García-Portugués, E. (2021).
Journal paper Test, 30(1):1–58. doi:10.1007/s11749-021-00759-x. This is an invited paper with a discussion, rejoinder, and a companion BibTeX file that collects virtually all the contributions related to Directional Statistics up to the publication date.

Abstract

Mainstream statistical methodology is generally applicable to data observed in Euclidean space. There are, however, numerous contexts of considerable scientific interest in which the natural supports for the data under consideration are Riemannian manifolds like the unit circle, torus, sphere and their extensions. Typically, such data can be represented using one or more directions, and directional statistics is the branch of statistics that deals with their analysis. In this paper we provide a review of the many recent developments in the field since the publication of Mardia and Jupp (1999), still the most comprehensive text on directional statistics. Many of those developments have been stimulated by interesting applications in fields as diverse as astronomy, medicine, genetics, neurology, space situational awareness, acoustics, image analysis, text mining, environmetrics, and machine learning. We begin by considering developments for the exploratory analysis of directional data before progressing to distributional models, general approaches to inference, hypothesis testing, regression, nonparametric curve estimation, methods for dimension reduction, classification and clustering, and the modelling of time series, spatial and spatio-temporal data. An overview of currently available software for analysing directional data is also provided, and potential future developments discussed.

A goodness-of-fit test for the functional linear model with functional response

García-Portugués, E., Álvarez-Liébana, J., Álvarez-Pérez, G., and González-Manteiga, W. (2021).
Journal paper Scandinavian Journal of Statistics, 48(2):502–528. doi:10.1111/sjos.12486.

Abstract

The Functional Linear Model with Functional Response (FLMFR) is one of the most fundamental models to assess the relation between two functional random variables. In this paper, we propose a novel goodness-of-fit test for the FLMFR against a general, unspecified, alternative. The test statistic is formulated in terms of a Cramér–von Mises norm over a doubly-projected empirical process which, using geometrical arguments, yields an easy-to-compute weighted quadratic norm. A resampling procedure calibrates the test through a wild bootstrap on the residuals and the use of convenient computational procedures. As a sideways contribution, and since the statistic requires a reliable estimator of the FLMFR, we discuss and compare several regularized estimators, providing a new one specifically convenient for our test. The finite sample behavior of the test is illustrated via a simulation study. Also, the new proposal is compared with previous significance tests. Two novel real datasets illustrate the application of the new test.

On optimal tests for rotational symmetry against new classes of hyperspherical distributions

García-Portugués, E., Paindaveine, D., and Verdebout, T. (2020).
Journal paper Journal of the American Statistical Association, 115(532):1873–1887. doi:10.1080/01621459.2019.1665527.

Abstract

Motivated by the central role played by rotationally symmetric distributions in directional statistics, we consider the problem of testing rotational symmetry on the hypersphere. We adopt a semiparametric approach and tackle problems where the location of the symmetry axis is either specified or unspecified. For each problem, we define two tests and study their asymptotic properties under very mild conditions. We introduce two new classes of directional distributions that extend the rotationally symmetric class and are of independent interest. We prove that each test is locally asymptotically maximin, in the Le Cam sense, for one kind of the alternatives given by the new classes of distributions, both for specified and unspecified symmetry axis. The tests, aimed to detect location-like and scatter-like alternatives, are combined into convenient hybrid tests that are consistent against both alternatives. We perform Monte Carlo experiments that illustrate the finite-sample performances of the proposed tests and their agreement with the asymptotic results. Finally, the practical relevance of our tests is illustrated on a real data application from astronomy. The R package rotasym implements the proposed tests and allows practitioners to reproduce the data application.

Discounted optimal stopping of a Brownian bridge, with application to American options under pinning

D'Auria, B., García-Portugués, E., and Guada-Azze, A. (2020).
Journal paper Mathematics, 8(7):1159. doi:10.3390/math8071159.

Abstract

Mathematically, the execution of an American-style financial derivative is commonly reduced to solving an optimal stopping problem. Breaking the general assumption that the knowledge of the holder is restricted to the price history of the underlying asset, we allow for the disclosure of future information about the terminal price of the asset by modeling it as a Brownian bridge. This model may be used under special market conditions; in particular, we focus on what is known in the literature as the "pinning effect", that is, when the price of the asset approaches the strike price of a highly-traded option close to its expiration date. Our main mathematical contribution is in characterizing the solution to the optimal stopping problem when the gain function includes the discount factor. We show how to numerically compute the solution and we analyze the effect of the volatility estimation on the strategy by computing the confidence curves around the optimal stopping boundary. Finally, we compare our method with the optimal exercise time based on a geometric Brownian motion by using real data exhibiting pinning.

Goodness-of-fit tests for the functional linear model based on randomly projected empirical processes

Cuesta-Albertos, J. A., García-Portugués, E., Febrero-Bande, M., and González-Manteiga, W. (2019).
Journal paper The Annals of Statistics, 47(1):439–467. doi:10.1214/18-AOS1693.

Abstract

We consider marked empirical processes indexed by a randomly projected functional covariate to construct goodness-of-fit tests for the functional linear model with scalar response. The test statistics are built from continuous functionals over the projected process, resulting in computationally efficient tests that exhibit root-n convergence rates and circumvent the curse of dimensionality. The weak convergence of the empirical process is obtained conditionally on a random direction, whilst the almost sure equivalence between the testing for significance expressed on the original and on the projected functional covariate is proved. The computation of the test in practice involves calibration by wild bootstrap resampling and the combination of several p-values, arising from different projections, by means of the false discovery rate method. The finite sample properties of the tests are illustrated in a simulation study for a variety of linear models, underlying processes, and alternatives. The software provided implements the tests and allows the replication of simulations and data applications.

Langevin diffusions on the torus: estimation and applications

García-Portugués, E., Sørensen, M., Mardia, K. V., and Hamelryck, T. (2019).
Journal paper Statistics and Computing, 29(1):1–22. doi:10.1007/s11222-017-9790-2.

Abstract

We introduce stochastic models for continuous-time evolution of angles and develop their estimation. We focus on studying Langevin diffusions with stationary distributions equal to well-known distributions from directional statistics, since such diffusions can be regarded as toroidal analogues of the Ornstein-Uhlenbeck process. Their likelihood function is a product of transition densities with no analytical expression, but that can be calculated by solving the Fokker-Planck equation numerically through adequate schemes. We propose three approximate likelihoods that are computationally tractable: (i) a likelihood based on the stationary distribution; (ii) toroidal adaptations of the Euler and Shoji-Ozaki pseudo-likelihoods; (iii) a likelihood based on a specific approximation to the transition density of the wrapped normal process. A simulation study compares, in dimensions one and two, the approximate transition densities to the exact ones, and investigates the empirical performance of the approximate likelihoods. Finally, two diffusions are used to model the evolution of the backbone angles of the protein G (PDB identifier 1GB1) during a molecular dynamics simulation. The software package sdetorus implements the estimation methods and applications presented in the paper.

Distance weighted discrimination of face images for gender classification

Benito, M., García-Portugués, E., Marron, J. S., and Peña, D. (2017).
Journal paper Stat, 6(1):2049–1573. doi:10.1002/sta4.151.

Abstract

We illustrate the advantages of distance weighted discrimination for classification and feature extraction in a High Dimension Low Sample Size (HDLSS) situation. The HDLSS context is a gender classification problem of face images in which the dimension of the data is several orders of magnitude larger than the sample size. We compare distance weighted discrimination with Fisher's linear discriminant, support vector machines, and principal component analysis by exploring their classification interpretation through insightful visuanimations and by examining the classifiers' discriminant errors. This analysis enables us to make new contributions to the understanding of the drivers of human discrimination between males and females.

A generative angular model of protein structure evolution

Golden, M., García-Portugués, E., Sørensen, M., Mardia, K. V., Hamelryck, T., and Hein, J. (2017).
Journal paper Molecular Biology and Evolution, 34(8):2085–2100. doi:10.1093/molbev/msx137. Corrigendum.

Abstract

Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a generative, evolutionary model of protein structure and sequence that is valid on a local length scale. The model concerns the local dependencies between sequence and structure evolution in a pair of homologous proteins. The evolutionary trajectory between the two structures in the protein pair is treated as a random walk in dihedral angle space, which is modeled using a novel angular diffusion process on the two-dimensional torus. Coupling sequence and structure evolution in our model allows for modeling both "smooth" conformational changes and "catastrophic" conformational jumps, conditioned on the amino acid changes. The model has interpretable parameters and is comparatively more realistic than previous stochastic models, providing new insights into the relationship between sequence and structure evolution. For example, using the trained model we were able to identify an apparent sequence–structure evolutionary motif present in a large number of homologous protein pairs. The generative nature of our model enables us to evaluate its validity and its ability to simulate aspects of protein evolution conditioned on an amino acid sequence, a related amino acid sequence, a related structure or any combination thereof.

Testing parametric models in linear-directional regression

García-Portugués, E., Van Keilegom, I., Crujeiras, R. M., and González-Manteiga, W. (2016).
Journal paper Scandinavian Journal of Statistics, 43(4):1178–1191. doi:10.1111/sjos.12236.

Abstract

This paper presents a goodness-of-fit test for parametric regression models with scalar response and directional predictor, that is, a vector on a sphere of arbitrary dimension. The testing procedure is based on the weighted squared distance between a smooth and a parametric regression estimator, where the smooth regression estimator is obtained by a projected local approach. Asymptotic behavior of the test statistic under the null hypothesis and local alternatives is provided, jointly with a consistent bootstrap algorithm for application in practice. A simulation study illustrates the performance of the test in finite samples. The procedure is applied to test a linear model in text mining.

Central limit theorems for directional and linear random variables with applications

García-Portugués, E., Crujeiras, R. M., and González-Manteiga, W. (2015).
Journal paper Statistica Sinica, 25(3):1207–1229. doi:10.5705/ss.2014.153.

Abstract

A central limit theorem for the integrated squared error of the directional-linear kernel density estimator is established. The result enables the construction and analysis of two testing procedures based on the squared loss: a nonparametric independence test for directional and linear random variables and a goodness-of-fit test for parametric families of directional-linear densities. Limit distributions for both test statistics, and a consistent bootstrap strategy for the goodness-of-fit test, are developed for the directional-linear case and adapted to the directional-directional setting. Finite sample performance for the goodness-of-fit test is illustrated in a simulation study. This test is also applied to real datasets from biology and environmental sciences.

A test for directional-linear independence, with applications to wildfire orientation and size

García-Portugués, E., Barros, A. M. G., Crujeiras, R. M., González-Manteiga, W., and Pereira, J. M. C. (2014).
Journal paper Stochastic Environmental Research and Risk Assessment, 28(5):1261–1275. doi:10.1007/s00477-013-0819-6.

Abstract

The relation between wildfire orientation and size is analyzed by means of a nonparametric test for directional-linear independence. The test statistic is designed for assessing the independence between two random variables of different nature, specifically directional (fire orientation, circular or spherical, as particular cases) and linear (fire size measured as burnt area, scalar), based on a directional-linear nonparametric kernel density estimator. In order to apply the proposed methodology in practice, a resampling procedure based on permutations and bootstrap is provided. The finite sample performance of the test is assessed by a simulation study, comparing its behavior with other classical tests for the circular-linear case. Finally, the test is applied to analyze wildfire data from Portugal.

A goodness-of-fit test for the functional linear model with scalar response

García-Portugués, E., González-Manteiga, W., and Febrero-Bande, M. (2014).
Journal paper Journal of Computational and Graphical Statistics, 23(3):761–778. doi:10.1016/j.jmva.2013.06.009.

Abstract

In this work, a goodness-of-fit test for the null hypothesis of a functional linear model with scalar response is proposed. The test is based on a generalization to the functional framework of a previous one, designed for the goodness-of-fit of regression models with multivariate covariates using random projections. The test statistic is easy to compute using geometrical and matrix arguments, and simple to calibrate in its distribution by a wild bootstrap on the residuals. The finite sample properties of the test are illustrated by a simulation study for several types of basis and under different alternatives. Finally, the test is applied to two datasets for checking the assumption of the functional linear model and a graphical tool is introduced. Supplementary materials are available online.

Exact risk improvement of bandwidth selectors for kernel density estimation with directional data

García-Portugués, E. (2013).
Journal paper Electronic Journal of Statistics, 7:1655–1685. doi:10.1214/13-EJS821. Corrigendum.

Abstract

New bandwidth selectors for kernel density estimation with directional data are presented in this work. These selectors are based on asymptotic and exact error expressions for the kernel density estimator combined with mixtures of von Mises distributions. The performance of the proposed selectors is investigated in a simulation study and compared with other existing rules for a large variety of directional scenarios, sample sizes and dimensions. The selector based on the exact error expression turns out to have the best behaviour of the studied selectors for almost all the situations. This selector is illustrated with real data for the circular and spherical cases.

Kernel density estimation for directional-linear data

García-Portugués, E., Crujeiras, R. M., and González-Manteiga, W. (2013).
Journal paper Journal of Multivariate Analysis, 121:152–175. doi:10.1016/j.jmva.2013.06.009. Corrigendum.

Abstract

A nonparametric kernel density estimator for directional-linear data is introduced. The proposal is based on a product kernel accounting for the different nature of both (directional and linear) components of the random vector. Expressions for the bias, variance, and Mean Integrated Squared Error (MISE) are derived, jointly with an asymptotic normality result for the proposed estimator. For some particular distributions, an explicit formula for the MISE is obtained and compared with its asymptotic version, both for directional and directional-linear kernel density estimators. In this same setting, a closed expression for the bootstrap MISE is also derived.

Exploring wind direction and SO2 concentration by circular-linear density estimation

García-Portugués, E., Crujeiras, R. M., and González-Manteiga, W. (2013).
Journal paper Stochastic Environmental Research and Risk Assessment, 27(5):1055–1067. doi:10.1007/s00477-012-0642-5.

Abstract

The study of environmental problems usually requires the description of variables with different nature and the assessment of relations between them. In this work, an algorithm for flexible estimation of the joint density for a circular–linear variable is proposed. The method is applied for exploring the relation between wind direction and SO2 concentration in a monitoring station close to a power plant located in Galicia (NW–Spain), in order to compare the effectiveness of precautionary measures for pollutants reduction in two different years.

Hippocampus shape analysis via skeletal models and kernel smoothing

García-Portugués, E. and Meilán-Vila, A. (2023).
Book chapter In Larriba, Y. (Ed.), Statistical Methods at the Forefront of Biomedical Advances, 63–82. Springer, Cham. doi:10.1007/978-3-031-32729-2_4.

Abstract

Skeletal representations (s-reps) have been successfully adopted to parsimoniously parametrize the shape of three-dimensional objects, and have been particularly employed in analyzing hippocampus shape variation. Within this context, we provide a fully-nonparametric dimension-reduction tool based on kernel smoothing for determining the main source of variability of hippocampus shapes parametrized by s-reps. The methodology introduces the so-called density ridges for data on the polysphere and involves addressing high-dimensional computational challenges. For the analyzed dataset, our model-free indexing of shape variability reveals that the spokes defining the sharpness of the elongated extremes of hippocampi concentrate the most variation among subjects.

A review of goodness-of-fit tests for models involving functional data

González-Manteiga, W., Crujeiras, R. M., and García-Portugués, E. (2023).
Book chapter In Balakrishnan, N., Gil, M. Á., Martín, N., Morales, D., and Pardo, M. C. (Eds.), Trends in Mathematical, Information and Data Sciences, 349–358. Springer, Cham. doi:10.1007/978-3-031-04137-2_29.

Abstract

A sizable amount of goodness-of-fit tests involving functional data have appeared in the last decade. We provide a relatively compact revision of most of these contributions, within the independent and identically distributed framework, by reviewing goodness-of-fit tests for distribution and regression models with functional predictor and either scalar or functional response.

A Cramér–von Mises test of uniformity on the hypersphere

García-Portugués, E., Navarro-Esteban, P., and Cuesta-Albertos, J. A. (2021).
Book chapter In Balzano, S., Porzio, G. C., Salvatore, R., Vistocco, D., and Vichi, M. (Eds.), Statistical Learning and Modeling in Data Analysis, 107–116. Springer, Cham. doi:10.1007/978-3-030-69944-4_12.

Abstract

Testing uniformity of a sample supported on the hypersphere is one of the first steps when analysing multivariate data for which only the directions (and not the magnitudes) are of interest. In this work, a projection-based Cramér–von Mises test of uniformity on the hypersphere is introduced. This test can be regarded as an extension of the well-known Watson test of circular uniformity to the hypersphere. The null asymptotic distribution of the test statistic is obtained and, via numerical experiments, shown to be tractable and practical. A novel study on the uniformity of the distribution of craters on Venus illustrates the usage of the test.

Goodness-of-fit tests for functional linear models based on integrated projections

García-Portugués, E., Álvarez-Liébana, J., Álvarez-Pérez, G., and González-Manteiga, W. (2020).
Book chapter In Aneiros, G., Horová, I., Hušková, M., and Vieu, P. (Eds.), Functional and High-Dimensional Statistics and Related Fields, 107–114. Springer, Cham. doi:10.1007/978-3-030-47756-1_15.

Abstract

Functional linear models are one of the most fundamental tools to assess the relation between two random variables of a functional or scalar nature. This contribution proposes a goodness-of-fit test for the functional linear model with functional response that neatly adapts to functional/scalar responses/predictors. In particular, the new goodness-of-fit test extends a previous proposal for scalar response. The test statistic is based on a convenient regularized estimator, is easy to compute, and is calibrated through an efficient bootstrap resampling. A graphical diagnostic tool, useful to visualize the deviations from the model, is introduced and illustrated with a novel data application. The R package goffda implements the proposed methods and allows for the reproducibility of the data application.

Toroidal diffusions and protein structure evolution

García-Portugués, E., Golden, M., Sørensen, M., Mardia, K. V., Hamelryck, T., and Hein, J. (2018).
Book chapter In Ley, C. and Verdebout, T. (Eds.), Applied Directional Statistics, 61–93. CRC Press, Boca Raton. doi:10.1201/9781315228570-12.

Abstract

This chapter shows how toroidal diffusions are convenient methodological tools for modelling protein evolution in a probabilistic framework. The chapter addresses the construction of ergodic diffusions with stationary distributions equal to well-known directional distributions, which can be regarded as toroidal analogues of the Ornstein–Uhlenbeck process. The important challenges that arise in the estimation of the diffusion parameters require the consideration of tractable approximate likelihoods and, among the several approaches introduced, the one yielding a specific approximation to the transition density of the wrapped normal process is shown to give the best empirical performance on average. This provides the methodological building block for Evolutionary Torus Dynamic Bayesian Network (ETDBN), a hidden Markov model for protein evolution that emits a wrapped normal process and two continuous-time Markov chains per hidden state. The chapter describes the main features of ETDBN, which allows for both "smooth" conformational changes and "catastrophic" conformational jumps, and several empirical benchmarks. The insights into the relationship between sequence and structure evolution that ETDBN provides are illustrated in a case study.

Smoothing-based tests with directional random variables

García-Portugués, E., Crujeiras, R. M., and González-Manteiga, W. (2018).
Book chapter In Gil, E., Gil, E., Gil, J., and Gil, M. A. (Eds.), The Mathematics of the Uncertain, 175–184. Springer, Cham. doi:10.1007/978-3-319-73848-2_17.

Abstract

Testing procedures for assessing specific parametric model forms, or for checking the plausibility of simplifying assumptions, play a central role in the mathematical treatment of the uncertain. No certain answers are obtained by testing methods, but at least the uncertainty of these answers is properly quantified. This is the case for tests designed on the two most general data generating mechanisms in practice: distribution/density and regression models. Testing proposals are usually formulated on the Euclidean space, but important challenges arise in non-Euclidean settings, such as when directional variables (i.e., random vectors on the hypersphere) are involved. This work reviews some of the smoothing-based testing procedures for density and regression models that comprise directional variables. The asymptotic distributions of the revised proposals are presented, jointly with some numerical illustrations justifying the need of employing resampling mechanisms for effective test calibration.

sphunif: Uniformity Tests on the Circle, Sphere, and Hypersphere

García-Portugués, E. and Verdebout, T. (2024).

Abstract

Implementation of uniformity tests on the circle and (hyper)sphere. The main function of the package is unif_test(), which conveniently collects more than 35 tests for assessing uniformity on S^{p-1} = {x in R^p : ||x|| = 1}, p ≥ 2. The test statistics are implemented in the unif_stat() function, which allows computing several statistics for different samples within a single call, thus facilitating Monte Carlo experiments. Furthermore, the unif_stat_MC() function allows parallelizing them in a simple way. The asymptotic null distributions of the statistics are available through the function unif_stat_distr(). The core of 'sphunif' is coded in C++ by relying on the 'Rcpp' package. The package also provides several novel datasets and gives the replicability for the data applications/simulations in García-Portugués et al. (2021) <doi:10.1007/978-3-030-69944-4_12>, García-Portugués et al. (2023) <doi:10.3150/21-BEJ1454>, García-Portugués et al. (2024) <arXiv:2108.09874v2>, and Fernández-de-Marcos and García-Portugués (2024) <arXiv:2405.13531>.

ridgetorus: PCA on the Torus via Density Ridges

García-Portugués, E. and Prieto-Tirado, A. (2023).

Abstract

Implementation of a Principal Component Analysis (PCA) in the torus via density ridge estimation. The main function, ridge_pca(), obtains the relevant density ridge for bivariate sine von Mises and bivariate wrapped Cauchy distribution models and provides the associated scores and variance decomposition. Auxiliary functions for evaluating, fitting, and sampling these models are also provided. The package provides replicability to García-Portugués and Prieto-Tirado (2023) <doi:10.1007/s11222-023-10273-9>.

rotasym: Tests for Rotational Symmetry on the Hypersphere

García-Portugués, E., Paindaveine, D., and Verdebout, T. (2021).

Abstract

Implementation of the tests for rotational symmetry on the hypersphere proposed in García-Portugués, Paindaveine and Verdebout (2020) <doi:10.1080/01621459.2019.1665527>. The package also implements the proposed distributions on the hypersphere, based on the tangent-normal decomposition, and allows for the replication of the data application considered in the paper.

goffda: Goodness-of-Fit Tests for Functional Data

García-Portugués, E. and Álvarez-Liébana, J. (2021).

Abstract

Implementation of several goodness-of-fit tests for functional data. Currently, the tests mostly relate to the functional linear model with functional/scalar response and functional/scalar predictor. The package allows for the replication of the data applications considered in García-Portugués, Álvarez-Liébana, Álvarez-Pérez and González-Manteiga (2019) <arXiv:1909.07686>.

DirStats: Nonparametric Methods for Directional Data

García-Portugués, E. (2021).

Abstract

Nonparametric kernel density estimation, bandwidth selection, and other utilities for analyzing directional data. Implements the estimator in Bai, Rao and Zhao (1987) <doi:10.1016/0047-259X(88)90113-3>, the cross-validation bandwidth selectors in Hall, Watson and Cabrera (1987) <doi:10.1093/biomet/74.4.751> and the plug-in bandwidth selectors in García-Portugués (2013) <doi:10.1214/13-ejs821>.

sdetorus: Statistical Tools for Toroidal Diffusions

García-Portugués, E. (2021).

Abstract

Implementation of statistical methods for the estimation of toroidal diffusions. Several diffusive models are provided, most of them belonging to the Langevin family of diffusions on the torus. Specifically, the wrapped normal and von Mises processes are included, which can be seen as toroidal analogues of the Ornstein-Uhlenbeck diffusion. A collection of methods for approximate maximum likelihood estimation, organized in four blocks, is given: (i) based on the exact transition probability density, obtained as the numerical solution to the Fokker-Planck equation; (ii) based on wrapped pseudo-likelihoods; (iii) based on specific analytic approximations by wrapped processes; (iv) based on maximum likelihood of the stationary densities. The package allows the reproducibility of the results in García-Portugués et al. (2019) <doi:10.1007/s11222-017-9790-2>.

rp.flm.test: Goodness-of-fit Test for the Functional Linear Model with Scalar Response Based on Random Projections

García-Portugués, E. and Febrero-Bande, M. (2019).

Abstract

Goodness-of-fit test for the functional linear model with scalar response based on random projections. The package implements the method described in Cuesta-Albertos et al. (2019) <doi:10.1214/18-AOS1693>.

Kernel density estimation with polyspherical data and its applications

García-Portugués, E. and Meilán-Vila, A. (2024).
Preprint

Abstract

A kernel density estimator for data on the polysphere S^(d_1) × ··· × S^(d_r), with r,d_1,...,d_r ≥ 1, is presented in this paper. We derive the main asymptotic properties of the estimator, including mean square error, normality, and optimal bandwidths. We address the kernel theory of the estimator beyond the von Mises–Fisher kernel, introducing new kernels that are more efficient and investigating normalizing constants, moments, and sampling methods thereof. Plug-in and cross-validated bandwidth selectors are also obtained. As a spin-off of the kernel density estimator, we propose a nonparametric k-sample test based on the Jensen–Shannon divergence. Numerical experiments illuminate the asymptotic theory of the kernel density estimator and demonstrate the superior performance of the k-sample test with respect to parametric alternatives in certain scenarios. Our smoothing methodology is applied to the analysis of the morphology of a sample of hippocampi of infants embedded on the high-dimensional polysphere (S^2)^168 via skeletal representations (s-reps).

A stereographic test of spherical uniformity

Fernández-de-Marcos, A. and García-Portugués, E. (2024).
Preprint

Abstract

We introduce a test of uniformity for (hyper)spherical data motivated by the stereographic projection. The closed-form expression of the test statistic and its null asymptotic distribution are derived using Gegenbauer polynomials. The power against rotationally symmetric local alternatives is provided, and simulations illustrate the non-null asymptotic results. The stereographic test outperforms other tests in a testing scenario with antipodal dependence.

On a class of Sobolev tests for symmetry of directions, their detection thresholds, and asymptotic powers

García-Portugués, E., Paindaveine, D., and Verdebout, T. (2024).
Preprint

Abstract

We consider a class of symmetry hypothesis testing problems including testing isotropy on R^d and testing rotational symmetry on the hypersphere S^(d-1). For this class, we study the null and non-null behaviors of Sobolev tests, with emphasis on their consistency rates. Our main results show that: (i) Sobolev tests exhibit a detection threshold (see Bhattacharya, 2019, 2020) that does not only depend on the coefficients defining these tests; and (ii) tests with non-zero coefficients at odd (respectively, even) ranks only are blind to alternatives with angular functions whose kth-order derivatives at zero vanish for any k odd (even). Our non-standard asymptotic results are illustrated with Monte Carlo exercises. A case study in astronomy applies the testing toolbox to evaluate the symmetry of orbits of long- and short-period comets.

An overview of uniformity tests on the hypersphere

García-Portugués, E. and Verdebout, T. (2018).
Preprint

Abstract

When modeling directional data, that is, unit-norm multivariate vectors, a first natural question is to ask whether the directions are uniformly distributed or, on the contrary, whether there exist modes of variation significantly different from uniformity. We review in this article a reasonably exhaustive collection of uniformity tests for assessing uniformity in the hypersphere. Specifically, we review the classical circular-specific tests, the large class of Sobolev tests with its many notable particular cases, some recent alternative tests, and novel results in the high-dimensional low-sample size case. A reasonably comprehensive bibliography on the topic is provided.
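
To make the notion of a uniformity test concrete, the sketch below implements the classical Rayleigh test on the circle, one of the simplest members of the Sobolev class mentioned in the abstract above. It is an illustrative Python sketch only: the function name `rayleigh_test_circle` is hypothetical, and the p-value relies on the asymptotic chi-squared approximation with 2 degrees of freedom, so it should be read as approximate for small samples.

```python
import math


def rayleigh_test_circle(angles):
    """Rayleigh test of uniformity for angles (in radians) on the circle.

    Returns the statistic 2 * n * ||mean direction||^2 and an asymptotic
    p-value from the chi-squared distribution with 2 degrees of freedom,
    whose survival function is exp(-x / 2).
    """
    n = len(angles)
    c_bar = sum(math.cos(a) for a in angles) / n  # mean of the cosines
    s_bar = sum(math.sin(a) for a in angles) / n  # mean of the sines
    stat = 2.0 * n * (c_bar ** 2 + s_bar ** 2)
    p_value = math.exp(-stat / 2.0)
    return stat, p_value
```

Large values of the statistic indicate that the sample mean direction is far from the origin, which is evidence against uniformity. The test is blind to alternatives whose mean direction is zero (for instance, antipodally symmetric ones), which is one reason reviews such as this one cover broader families of tests.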

Bootstrap independence test for functional linear models

González-Manteiga, W., González-Rodríguez, G., Martínez-Calvo, A., and García-Portugués, E. (2012).
Preprint

Abstract

Functional data have been the subject of many research works over the last years. Functional regression is one of the most discussed issues. Specifically, significant advances have been made for functional linear regression models with scalar response. Let (H, ⟨·, ·⟩) be a separable Hilbert space. We focus on the model Y = ⟨Θ, X⟩ + b + ε, where Y and ε are real random variables, X is an H-valued random element, and the model parameters b and Θ are in R and H, respectively. Furthermore, the error satisfies that E(ε|X) = 0 and E(ε²|X) = σ² < ∞. A consistent bootstrap method to calibrate the distribution of statistics for testing H0: Θ = 0 versus H1: Θ ≠ 0 is developed. The asymptotic theory, as well as a simulation study and a real data application illustrating the usefulness of our proposed bootstrap in practice, is presented.

Nonparametric inference with directional and linear data

García-Portugués, E. Supervised by Wenceslao González Manteiga and Rosa M. Crujeiras.
PhD dissertation Defended on 12 December 2014. Grade: Sobresaliente cum laude (highest).

Jury

Ricardo Cao Abad (UDC), Arthur Richard Pewsey (UEX), Juan Carlos Pardo Fernández (UVIGO), Irène Gijbels (KU), and Alberto Rodríguez Casal (USC).

Reviewers

Ingrid van Keilegom (KU) and Christophe Ley (ULB).

A goodness-of-fit test for the functional linear model with scalar response

García-Portugués, E. Supervised by Wenceslao González Manteiga and Manuel Febrero Bande.
MSc dissertation Defended on 12 June 2012. Grade: Matrícula de Honor (highest).

Jury

Carmen M. Cadarso Suárez (USC), César A. Sánchez Sellero (USC), and M. Estela Sánchez Rodríguez (UVIGO).

Funciones cópula en probabilidad y estadística. Aplicaciones

García-Portugués, E. Supervised by Wenceslao González Manteiga.
BSc dissertation Defended on 1 December 2010. Grade: Sobresaliente (highest).

Jury

Wenceslao González Manteiga (USC), Juan José Nieto Roig (USC), and Rosa M. Crujeiras (USC).