mvmapper: Interactive spatial mapping of genetic structures

Characterizing genetic structure across geographic space is a fundamental challenge in population genetics. Multivariate statistical analyses are powerful tools for summarizing genetic variability, but geographic information and accompanying metadata are not always easily integrated into these methods in a user-friendly fashion. Here, we present a deployable Python-based web tool, mvmapper, for visualizing and exploring results of multivariate analyses in geographic space. This tool can be used to map results of virtually any multivariate analysis of georeferenced data, and routines for exporting results from a number of standard methods have been integrated in the R package adegenet, including principal components analysis (PCA), spatial PCA, discriminant analysis of principal components, principal coordinates analysis, nonmetric multidimensional scaling and correspondence analysis. mvmapper's greatest strength is facilitating dynamic and interactive exploration of the statistical and geographic frameworks side by side, a task that is difficult and time-consuming with currently available tools. Source code and deployment instructions, as well as a link to a hosted instance of mvmapper, can be found at https://popphylotools.github.io/mvMapper/.

Multivariate analyses stand out as powerful tools for summarizing genetic variability (Jombart, Pontier, & Dufour, 2009). A wide diversity of such methods exists, each with its own particular applications (reviewed in Jombart et al., 2009). As a whole, these statistics provide many analytical advantages for population genetics, including, but not limited to, few overarching assumptions regarding the data (e.g., Hardy-Weinberg expectations and linkage equilibria, which can mask subtle clinal population structure), low computational requirements for the analysis of large data sets (e.g., thousands of markers and individuals (Jombart & Ahmed, 2011; Patterson, Price, & Reich, 2006)) and the statistical flexibility to address complex population genetic questions (Jombart et al., 2009 and references therein).
While some methods explicitly incorporate geographic information (e.g., spatial principal components analysis (sPCA) and spatial correspondence analysis (Dray, Saïd, & Débias, 2008)) and provide valuable geographic context to population genetic data, nonspatial analyses also benefit from visualization in geographic space (Cavalli-Sforza, Menozzi, & Piazza, 1994; Wang, Zollner, & Rosenberg, 2012). However, incorporating geographic context into multivariate analyses often requires the laborious comparison of ordination plots to maps of sampling localities, or technical expertise in map-making or geographic information systems (GIS) that may be beyond the comfort zone of the average researcher. While some streamlined tools exist for specific geographic visualizations (e.g., the Geography of Genetic Variants browser (Marcus & Novembre, 2017)), generalized tools for straightforward visualization are lacking.
Here, we present a tool for the visualization and exploration of multivariate analyses in geographic space. MVMAPPER is a Python-based, deployable web tool that can process the outputs of virtually any multivariate analysis together with sample locality information, and allows users to interactively explore the statistical framework of the multivariate analysis in both ordination and geographic space (Figure 1). The input format is a simple comma-delimited tabular file that can either be assembled manually or generated using MVMAPPER's input generation function in the ADEGENET library (Jombart, 2008) in R (R Core Team 2016), giving access to a wide range of commonly used methods. Links to MVMAPPER's source code, documentation, a ready-to-deploy Docker container (Merkel, 2014, see https://www.docker.com/) and a hosted instance of the web application can be found on our project page at https://popphylotools.github.io/mvMapper/. Although deploying a standalone instance of MVMAPPER provides a great deal of flexibility through the customization of the configuration file (default displayed statistical parameters, data set, etc.), here, we generally refer to the default configuration available on our hosted instance. All modern desktop web browsers support MVMAPPER.

| Data input
The primary input for MVMAPPER is a comma-delimited tabular file that contains individuals in rows and information about those individuals in columns. A typical file contains columns such as specimen identification code (we refer to this unique identifier as key), collection locality information (latitude and longitude, or lat and lon, using the WGS84 (EPSG:4326) geographic coordinate system), a population identifier, results of the multivariate analysis (specimen coordinates across multiple dimensions of an analysis, e.g., principal components) and any other metadata related to the specimens (sex, host, morphological characteristics, etc.). Given that many of these analyses are conducted in R (R Core Team 2016), we have incorporated a data preparation function into the widely used R library ADEGENET (Jombart, 2008). This function, export_to_mvmapper, combines an active R object from a multivariate analysis with locality information for each specimen. Currently, multivariate analyses conducted in ADEGENET and those based on the duality diagram (dudi.* functions) in ADE4 (Dray & Dufour, 2007) are supported, including sPCA and discriminant analysis of principal components (DAPC: Jombart, Devillard, and Balloux (2010)) in ADEGENET, and principal components analysis (PCA), principal coordinates analysis (PCoA), nonmetric multidimensional scaling (NMDS), correspondence analysis (CA) and others in ADE4. Locality information is then incorporated into the multivariate analysis through another R object. This is most easily done by preparing an additional file with at least three columns, key, lat and lon, where key matches the unique individual identifiers used in the multivariate analysis. After reading this locality file into R, export_to_mvmapper will combine the two R objects (the multivariate analysis and the locality information) into MVMAPPER input format and automatically write the output to a comma-delimited file.
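As an illustration of the input format described above, the following minimal Python sketch checks a table for the three columns named in the text (key, lat and lon), for unique keys and for coordinates within the valid WGS84 range. It is not part of MVMAPPER or ADEGENET; the function name validate_rows, the sample rows and the extra columns pop, PC1 and PC2 are hypothetical and chosen only for demonstration.

```python
import csv
import io

# Column names taken from the text; everything else here is illustrative.
REQUIRED = ("key", "lat", "lon")


def validate_rows(rows):
    """Return a list of human-readable problems found in the given rows."""
    problems = []
    seen = set()
    for i, row in enumerate(rows, start=1):
        missing = [c for c in REQUIRED if not row.get(c)]
        if missing:
            problems.append(f"row {i}: missing {', '.join(missing)}")
            continue
        if row["key"] in seen:
            problems.append(f"row {i}: duplicate key {row['key']!r}")
        seen.add(row["key"])
        try:
            lat, lon = float(row["lat"]), float(row["lon"])
        except ValueError:
            problems.append(f"row {i}: non-numeric coordinates")
            continue
        # WGS84 latitude spans -90..90 and longitude -180..180 degrees.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            problems.append(f"row {i}: coordinates outside WGS84 range")
    return problems


# Hypothetical sample: row 2 has an impossible latitude, row 3 reuses a key.
sample = io.StringIO(
    "key,lat,lon,pop,PC1,PC2\n"
    "ind01,21.3,-157.8,A,0.12,-0.40\n"
    "ind02,95.0,10.0,B,-0.33,0.08\n"
    "ind01,19.7,-155.1,A,0.10,-0.35\n"
)
problems = validate_rows(csv.DictReader(sample))
for p in problems:
    print(p)
```

A check of this kind can be run on any manually assembled file before upload, since mismatched identifiers and swapped coordinates are easy to introduce when tables are combined by hand.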
Locality information can be incorporated via other means (e.g., when latitude and longitude are already part of a genind object); however, the advantage of creating an additional file, as described here, is that any additional specimen-based information can be included in that file (named localities.csv in the following example), such as specimen sex, host information, and morphological or ecological characters. Alternatively, rather than using export_to_mvmapper, the input data file can be generated manually from results of multivariate analyses in different programs or R libraries, as the tabular format is general and user-friendly.
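For readers assembling the input file outside R, the manual route amounts to joining a table of ordination scores to a locality table on the shared key column. The sketch below shows one way to do this with the Python standard library. It is an assumption-laden illustration rather than a reimplementation of export_to_mvmapper: the helper name merge_scores_with_localities, the sample values and the sex, PC1 and PC2 columns are hypothetical, and individuals lacking locality information are simply dropped.

```python
import csv
import io


def merge_scores_with_localities(scores, localities):
    """Join ordination scores to locality rows on the shared 'key' column."""
    by_key = {row["key"]: row for row in localities}
    merged = []
    for row in scores:
        loc = by_key.get(row["key"])
        if loc is None:
            continue  # assumption: skip individuals without a locality record
        merged.append({**loc, **row})  # locality columns first, then scores
    return merged


# Hypothetical stand-ins for an exported score table and a localities.csv file.
scores = list(csv.DictReader(io.StringIO(
    "key,PC1,PC2\n"
    "ind01,0.12,-0.40\n"
    "ind02,-0.33,0.08\n"
    "ind03,0.50,0.21\n"
)))
localities = list(csv.DictReader(io.StringIO(
    "key,lat,lon,sex\n"
    "ind01,21.3,-157.8,F\n"
    "ind02,19.7,-155.1,M\n"
)))

merged = merge_scores_with_localities(scores, localities)

# Write the combined table as comma-delimited text.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(merged[0].keys()), lineterminator="\n")
writer.writeheader()
writer.writerows(merged)
csv_text = out.getvalue()
print(csv_text)
```

In practice the two tables would be read from files rather than from inline strings, and the result written to disk with the same csv.DictWriter calls.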
Below we provide an example of data preparation from a DAPC, which in addition to standard multivariate analysis results (distribution of individuals along principal components) provides additional components recognized by MVMAPPER, such as membership to a priori-assigned and DAPC-assigned groups, and the posterior probabilities of the DAPC-assigned groups. By default, MVMAPPER is configured to display the microsatellite data set of Rosenberg et al. (2005) from the example above. Users can upload their own data sets through the upload tab linked in the navigation bar at the top of the page (Figure 1, top). Files uploaded in this manner are named using a random alphanumeric string that is integrated into the web address used to select that data set; users can return to a previously uploaded data set using its unique web address until it expires after 14 days.
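The upload behaviour described here (a random alphanumeric name and a 14-day lifetime) can be sketched in a few lines of Python. This is not MVMAPPER's actual implementation: the identifier length, the character set, the function names new_dataset_id and is_expired, and the choice to measure expiry from the upload time are all assumptions made for illustration. Only the 14-day retention period comes from the text.

```python
import secrets
import string
from datetime import datetime, timedelta, timezone

ALPHABET = string.ascii_lowercase + string.digits  # assumed character set
RETENTION = timedelta(days=14)  # retention period stated in the text


def new_dataset_id(length=12):
    """Generate a random alphanumeric identifier (length is an assumption)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def is_expired(uploaded_at, now):
    """Return True once more than the retention period has elapsed."""
    return now - uploaded_at > RETENTION


uploaded = datetime(2018, 1, 1, tzinfo=timezone.utc)
dataset_id = new_dataset_id()
print(dataset_id)
print(is_expired(uploaded, uploaded + timedelta(days=13)))
print(is_expired(uploaded, uploaded + timedelta(days=15)))
```

Using the secrets module rather than random makes identifiers hard to guess, which matters when the web address is the only thing protecting an uploaded data set.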

| Interface and functionality
The main interface of MVMAPPER consists of three components: a statistical panel, a mapping panel and a metadata panel (Figure 1).
Aspects of these panels are linked, so that, for example, selecting points in one panel highlights the same individuals in the other. In the mapping panel, overlapping points can be separated with a jitter function, and the zoom tool is dynamic: zooming in or out will access finer-scale or coarser-scale map tiles with more or less detail, respectively (e.g., labelling countries, cities, roads or other scale-appropriate geographic features).
This allows MVMAPPER to function at both global and local geographic scales (Figure 2c). Selecting individuals in either the statistical or mapping panel displays their metadata in the lower panel, which can be sorted by clicking on column headers. Selected data can also be downloaded (as a comma-delimited file) to facilitate downstream analysis, for example re-analysis of individual population groups or hierarchical analysis (Vähä, Erkinaro, Niemelä, & Primmer, 2007).
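The idea behind a jitter function is simple: individuals sampled at the same locality share identical coordinates and therefore plot on top of one another, so a small random offset is added to each point to make them individually visible. The sketch below illustrates the principle only. MVMAPPER's own jitter may be implemented differently, and the function name jitter, the uniform offsets, the default amount of 0.01 degrees and the fixed seed are assumptions.

```python
import random


def jitter(points, amount=0.01, seed=0):
    """Offset each (lat, lon) pair by a uniform random value in [-amount, amount]."""
    rng = random.Random(seed)  # fixed seed so the display is reproducible
    return [
        (lat + rng.uniform(-amount, amount), lon + rng.uniform(-amount, amount))
        for lat, lon in points
    ]


# Three hypothetical individuals collected at exactly the same locality.
points = [(21.3, -157.8)] * 3
jittered = jitter(points)
for lat, lon in jittered:
    print(f"{lat:.4f}, {lon:.4f}")
```

Jitter changes only how points are displayed. Any downstream analysis should use the original coordinates, and the offset should stay small relative to the distance between sampling localities.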

| DISCUSSION
Visualizing population structure across geographic space is fundamental to most population genetic studies. However, combining multiple "data wrangling" tools (Kandel et al., 2011), including population genetic data processing, multivariate analysis and particularly mapmaking or GIS, is a time-consuming, error-prone and generally daunting task (e.g., Fletcher-Lartey & Caprarelli, 2016; Rickles & Ellul, 2014; Sipe & Dale, 2003). MVMAPPER greatly facilitates this process by providing an accessible, open-access, user-friendly interface for exploring and visualizing results of multivariate analysis in geographic space and, perhaps most importantly, by enabling dynamic and interactive exploration of these spaces. Interactivity, in particular, is key to enabling users to quickly assess the geographic patterns of any combination of principal components, population groupings, additional statistical parameters (assignments to groups based on discriminant functions in DAPC or lag-vectors of principal components in sPCA) and any other specimen-based metadata with a few mouse clicks in the dropdown menus to the left of the statistical panel. Given these characteristics, we envision MVMAPPER being of interest to a broad range of researchers as well as for teaching and training purposes. Additionally, MVMAPPER's highly generalized and modular approach allows it to be modified for more specific uses; for example, including metadata corresponding to whether specimens of an invasive species were collected in its native vs. introduced range allows MVMAPPER to become a tool for source determination of intercepted material (Roderick, 2004).