tidyLPA update

Rationale for the update

tidyLPA has received a major update. Your old function calls might not work as expected anymore. The new version of tidyLPA aims to make Latent Profile Analysis (LPA) even more accessible to a broad audience, by means of a user-friendly interface, and to fit more seamlessly into a ‘tidyverse’ workflow.

As before, tidyLPA offers a parallel workflow for the open-source package Mclust, and the commercial package Mplus.

These changes necessitated some simplification of the existing functions. Advanced options are still available, but require slightly more work on the user’s part. The upside of this is that, for most users, the tidyLPA workflow and documentation are substantially simplified.

Major changes

There have been major changes to the two primary functions, estimate_profiles() and compare_solutions()

  • estimate_profiles() now estimates any number of profiles, for any number of models, using any algorithm. It has a simplified user interface. This makes it a more general function, of which the former version (which could only estimate one number of profiles for one model) is a special, restricted case.

  • compare_solutions() now compares solutions estimated by estimate_profiles(), suggesting the optimal model and number of classes. This replaces its dual role of estimating AND comparing solutions. *compare_solutions()gives you the same set of fit indices for Mclust and Mplus. It provides a broader range of fit indices than before *compare_solutions()callsAHP()``, a function which uses the Automatic Hierarchical Process to determine the optimal number of classes, based on several reputable fit indices

  • plot_profiles() now incorporates best practices for plotting latent profile analyses: Visualizing cluster centroids, standard errors, cluster variances, and classification uncertainty by means of weighted raw data

  • impute_missing() offers two easy-to-use single imputation options, for use with the Mclust package (which does not natively handle missing data). Mplus handles partially missing data by default.

The best way to understand these changes to tidyLPA, though, is by explaining the new workflow.

Example analysis using estimate_profiles()

A full analysis might look like the following:

## tidyLPA is intended for academic use. We do not make any money on this and only ask that you please cite this in publications when you use the results. You can use the function citation('tidyLPA') to create a citation.Mplus is not installed. Use only package = 'mclust' when calling estimate_profiles().
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##     filter, lag
## The following objects are masked from 'package:base':
##     intersect, setdiff, setequal, union
pisaUSA15[1:100, ] %>%
    select(broad_interest, enjoyment, self_efficacy) %>%
    single_imputation() %>%
## tidyLPA analysis using mclust: 
##  Model Classes AIC    BIC    Entropy prob_min prob_max n_min n_max BLRT_p
##  1     3       637.97 674.45 0.82    0.86     0.95     0.05  0.66  0.01

You can use Mplus simply by changing the package argument for estimate_profiles():

pisaUSA15[1:100, ] %>%
    select(broad_interest, enjoyment, self_efficacy) %>%
    single_imputation() %>%
    estimate_profiles(3, package = "MplusAutomation")

A simple summary of the analysis is printed to the console (and its posterior probability). The resulting object can be further passed down a pipeline to other functions, such as plot, compare_solutions, get_data, get_fit, etc. This is the “tidy” part, in that the function can be embedded in a tidy analysis pipeline.

If you have Mplus installed, you can call the version of this function that uses MPlus in the same way, by adding the argument package = "MplusAutomation.

We can plot the profiles by piping the output to plot_profiles().

pisaUSA15[1:100, ] %>%
    select(broad_interest, enjoyment, self_efficacy) %>%
    single_imputation() %>%
    scale() %>%
    estimate_profiles(3, package = "MplusAutomation") %>% 

Example analysis using compare_solutions()

The function compare_solutions() compares the fit of several estimated models, with varying numbers of profiles and model specifications:

pisaUSA15[1:100, ] %>%
    select(broad_interest, enjoyment, self_efficacy) %>%
    single_imputation() %>%
                      variances = c("equal", "varying"),
                      covariances = c("zero", "varying")) %>%
    compare_solutions(statistics = c("AIC", "BIC"))

Deprecated functionality:

  • compare_solutions() no longer estimates profiles. It merely compares solutions estimated by estimate_profiles()

  • all Mplus-specific functions are deprecated. estimate_profiles() offers a unified interface to Mclust and Mplus. These Mplus-specific functions are deprecated:

  • estimate_profiles and compare_solutions no longer select variables, passed as unquoted variable names, from ‘df’. Instead, they use all variables in df. To select variables prior to analysis, use:

    • The dplyr function select(df, Your, Selected, Variable, Names)
      • The base R function df[, c(“Your”, “Selected”, “Variable”, “Names”)]
  • estimate_profiles and compare_solutions no longer center and scale the raw data. To center and scale variables prior to analysis, use scale(df, center = TRUE, scale = TRUE)

  • estimate_profiles no longer accepts arguments specific to Mplus or Mclust. Instead, you can specify additional arguments directly to the core functions mplusModeler() and Mclust(). See the help file for these functions for more details. Deprecated arguments, whose functionality is supplanted by passing arguments directly to these core functions, are:

For Mclust:

* prior_control

For Mplus:

* starts
* m_iterations
* st_iterations
* convergence_criterion
* optseed
* cluster_ID
* include_VLMR
* include_BLRT (but BLRT is returned by default now)