An application of statistical learning to optimize rewards programs
Pay equity analysis relies on regression models that predict pay from legitimate factors for different workforce segments. The segments are intended to reflect differences in rewards philosophies, that is, differences in how factors such as experience and performance are expected to affect pay. This study looks across 54 companies, and 817 segments within those companies, to reveal the actual variation in these philosophies. In particular, the work characterizes philosophy “clusters,” describes the salient differences between the segments that fall into these clusters, and examines how these clusters vary within industries.
The study also provides guidance on how to examine such regression models critically in order to classify the philosophies that actually play out “on the ground” and to identify opportunities to optimize practices. At a baseline level, the models are confirmatory and can be relied upon to support statements about the employee value proposition (e.g., “we pay for performance”). But the models can also reveal misalignments: places where, for example, employees who should be paid for performance are instead rewarded for company experience.
The study relies on data from pay equity projects conducted by Mercer during the past two years, representing more than 2 million employees. The data include, for each employee in the analysis, information on pay (base pay and/or an appropriate total compensation construct) and potential “drivers of pay.” These potential drivers include the following: experience (proxied by age, tenure and time in job); performance (above-average performance rating); career level or pay grade; job function or job family; the number of direct reports; business unit or department; and work city.
To calculate the impact of these drivers on pay, we run a linear regression for each workforce segment. The dependent variable is total compensation where incentive data are available and base pay otherwise. Generally, where incentive data are not available, incentives are not a substantial part of the rewards package. All models rely on the same set of drivers described earlier. To ensure comparability, we limit the models to full-time employees.
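As a rough illustration of the per-segment regression, the sketch below fits ordinary least squares to one made-up segment using only the Python standard library. The function name `ols`, the data, and the choice to model pay in raw currency units (rather than, say, log pay) are assumptions made for the example; the study does not specify its software or functional form, and the full driver set (career level, job function, location and so on) is omitted for brevity.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    n, k = len(X), len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical segment: columns are intercept, age, tenure, high-rating flag.
X = [[1, 30, 2, 0], [1, 40, 10, 1], [1, 50, 5, 0],
     [1, 35, 7, 1], [1, 45, 20, 0], [1, 28, 1, 1]]
# Pay is constructed from known effects so the fit can be checked by eye.
y = [50000 + 300 * age + 500 * ten + 4000 * hi for _, age, ten, hi in X]

intercept, age_effect, tenure_effect, rating_effect = ols(X, y)
```

Each fitted coefficient is the "all else equal" effect of its driver within the segment; repeating the fit segment by segment would yield one row of driver effects per segment.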
Our analysis database for clustering includes the impacts of these drivers for each segment. In particular, we examine the impacts of age, tenure, time in job and high rating. In addition, a separate model that accounts only for career levels or grades, entered as separate indicator variables, provides a single summary measure of the importance of position: its R-squared statistic, which represents the percentage of variance in pay explained by position alone.
Table 1 shows illustrative driver effects for a set of three segments. For age, tenure and time in job, the table reports the average incremental impact on pay of one additional year of each, all else equal. The effect of high rating is the difference in pay between those with a high rating and those without, again all else equal. For career level, the table reports R-squared as defined above. In the first segment, all forms of experience are rewarded equally, whereas age (general experience) and time in job are rewarded in the second, and tenure is rewarded in the third. Tenure is notably penalized in the second segment, which suggests market pressure reflected in higher pay for new hires. The second segment heavily rewards performance, and in the third segment pay is strongly tied to role.
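The position-only measure can be sketched as follows. When a regression contains nothing but a full set of career-level indicators, its fitted value for each employee is simply the average pay of that employee's level, so R-squared can be computed from level means directly. The function name `r_squared_by_level` and the four-employee data set are hypothetical.

```python
from collections import defaultdict

def r_squared_by_level(levels, pay):
    """Share of pay variance explained by career level alone."""
    groups = defaultdict(list)
    for lvl, p in zip(levels, pay):
        groups[lvl].append(p)
    # Fitted values of an indicator-only regression are the level means.
    level_mean = {lvl: sum(v) / len(v) for lvl, v in groups.items()}
    grand_mean = sum(pay) / len(pay)
    sse = sum((p - level_mean[lvl]) ** 2 for lvl, p in zip(levels, pay))
    sst = sum((p - grand_mean) ** 2 for p in pay)
    return 1 - sse / sst

# Hypothetical segment with two career levels (pay in thousands).
levels = ["A", "A", "B", "B"]
pay = [50, 60, 90, 100]
r2 = r_squared_by_level(levels, pay)  # 1 - 100/1700, about 0.94
```

A value near 1 indicates that pay is determined almost entirely by level or grade; a low value leaves more room for experience and performance to differentiate pay within a level.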
To support the clustering of the segments, the five drivers are statistically standardized so that no single driver dominates the others. The analysis database contains 906 global segments with broad industry coverage (367 segments represent non-U.S. employees). Results are robust to the inclusion or exclusion of 89 outlier segments with effects below the first percentile or above the 99th percentile; the results presented here exclude those cases. Our analysis, therefore, covers 817 segments across 54 global companies.
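The text does not say which standardization is used. A minimal sketch, assuming the common z-score (subtract the mean across segments and divide by the standard deviation, driver by driver), might look like this; the function name `standardize` and the sample values are illustrative only.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column (driver) across rows (segments)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]  # population SD; an assumption
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

# Hypothetical effects for three segments: [age effect, R-squared of level].
raw = [[0.001, 0.60], [0.002, 0.70], [0.003, 0.80]]
z = standardize(raw)
```

After this step every driver has mean 0 and standard deviation 1 across segments, so a driver measured on a large scale (such as R-squared) no longer outweighs one measured on a small scale (such as a per-year age effect).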
The core contribution of this study is to cluster the segments according to similarity in pay practices, using cluster analysis. Cluster analysis is a branch of machine learning that partitions observations across groups based on their similarity on a set of factors, which in this case are the drivers.
Agglomerative hierarchical clustering (AHC), a popular clustering approach, is the chosen technique. AHC uses a tree-based algorithm that, at each successive step, groups together the most similar “leaf” with growing “branches” or clusters (James et al. 2013). This analysis produces a plot, called a dendrogram. A dendrogram looks like an upside-down tree, with leaves at the bottom and the trunk at the top.
Figure 1 shows a dendrogram produced after running the algorithm on the analysis dataset. There are 817 leaves at the bottom of the graph, one for each segment. Moving up the tree, leaves merge to form branches. The order in which these segments, or clusters of segments, fuse is based on their similarity. At each step of the algorithm, the correlation across the drivers is evaluated between every pair of remaining clusters (using the average drivers in each cluster), and the two “closest” clusters are combined. (The metric chosen to measure similarity between observations is correlation-based distance, as opposed to the more common Euclidean distance. The latter weighs similarity factor by factor and can show similarity based on just a small number of factors being close together, while correlation-based distance weighs similarity across all the factors, which is consistent with our intent of having clusters reflect distinct pay patterns.) On the first iteration, each segment makes up its own cluster. On the last iteration, all the segments come together into one cluster.
To align on a set of pay philosophies, one must define the clusters so that they are suitably different from one another, with distinct observable differences in the drivers rather than nuances, while also minimizing differences within each cluster. The blue line drawn on Figure 1 intersects five branches and so produces five clusters below it; based on review of the resulting clusters, this cut appears to best achieve the trade-off described.
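The merging procedure described above can be sketched in a few lines. This is an illustrative reading of the text, not the study's actual code: it takes correlation-based distance to be one minus the Pearson correlation between two driver profiles, represents each cluster by the average profile of its members, and stops when a chosen number of clusters remains (equivalent to cutting the dendrogram at that height). The names `corr_distance` and `agglomerate` and the four sample profiles are hypothetical, and a profile with no variation across drivers would need special handling that is omitted here.

```python
from statistics import mean

def corr_distance(a, b):
    """One minus the Pearson correlation between two driver profiles."""
    ma, mb = mean(a), mean(b)
    da = [x - ma for x in a]
    db = [x - mb for x in b]
    num = sum(x * y for x, y in zip(da, db))
    den = (sum(x * x for x in da) * sum(y * y for y in db)) ** 0.5
    return 1 - num / den

def agglomerate(profiles, n_clusters):
    """Repeatedly merge the two clusters whose average profiles are closest."""
    clusters = [[i] for i in range(len(profiles))]  # start: one per segment

    def centroid(members):
        return [mean(col) for col in zip(*(profiles[i] for i in members))]

    while len(clusters) > n_clusters:
        cents = [centroid(c) for c in clusters]
        pairs = [(corr_distance(cents[i], cents[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)            # closest pair of clusters
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Four hypothetical segments, five drivers each.
profiles = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 11],
            [5, 4, 3, 2, 1], [10, 8, 6, 4, 1]]
groups = agglomerate(profiles, n_clusters=2)
```

In this toy data the first two segments share a rising pattern and the last two a falling one, so they pair up even though their magnitudes differ. That is the property the correlation-based metric is chosen for: segments are grouped by the shape of their pay pattern rather than by the size of individual effects.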
The five identified clusters each represent what we think is a common pay philosophy. To represent these philosophies, we computed the average effects of the key fields across all segments in each cluster. The average effects and the number of segments in each cluster are represented in Table 2. For comparison purposes, the average effects for the whole population of segments are represented at the top of the table.
The clusters can be characterized as follows:
Pay philosophies vary within a company. For one global financial services company, for example, we found that front-office segments were more often associated with the performance cluster, but IT employees were more often associated with the specialist cluster. Similarly, we found North American and European segments associated with the performance cluster and Asian segments associated with generalist and specialist clusters.
Even allowing for the regional and job function-based variation within companies, distinct philosophies are associated with industries (as seen in Table 3). Specifically:
While pay philosophies reflect differences in pay practices, they may also reflect differences in compensation review processes and in unexplained pay gaps, meaning pay differences between groups (e.g., genders) that remain after accounting for legitimate factors (Levine and Greenfield 2015). While Figure 2 shows that unexplained gender pay gaps vary within every cluster, the average gap is lower and the spread of gaps narrower for the role-based and commitment-based clusters. It is likely that these clusters are associated with lower levels of pay discretion, and that discretion itself drives gaps.
Statistical models from pay equity analysis can drive significant insights about compensation philosophies that materialize in different areas of an organization. Indeed, analysis conducted over more than 800 employee segments across more than 50 organizations shows that at least five different compensation philosophies can be uncovered through review of such models, and there is value in assessing the appropriateness of the revealed philosophy for a segment. It is the authors’ view that, within an organization, such models can be reviewed against the typology derived in this study with potential to optimize compensation practices. The authors also provide evidence that discretion in rewards, while providing important benefits, can lead to inequities. Such inequities can be reduced to the degree that procedural guardrails can be applied.