How does PayScale maintain the proprietary taxonomy?
We maintain our proprietary taxonomy through a series of steps summarized below. These steps ensure usable data that is both accurate and granular, which allows us to make pay predictions.
To normalize the crowdsourced data that we collect from our online salary survey, we maintain proprietary group classifications based on reputable third-party structures that are industry standard for the United States. An automated rules engine applies this taxonomy to the factors in our database that impact pay. This greatly improves the accuracy of our salary predictions because similarity between individual classes of answers is built in. We save the most specific answer, but once the taxonomy has been applied we can easily move up and down the hierarchy for things like job titles and locations when data is sparse.
We maintain internal classifications for Job Titles (groupings loosely related to O*NET codes), Industries (groupings related to NAICS), Locations (city metro definitions), Education, Employer Names, Skills, and Certifications. These answers are maintained using a combination of automation and human validation.
We review our taxonomy once a quarter to check for new answers and to review client feedback in order to make updates. Depending on their nature, taxonomy updates can take time to process because all of our algorithms must be updated to reflect the changes before they are available via the API or in product. Newly added answers are reflected in results once we have captured enough data via our survey to model them.
Why does the taxonomy matter?
On a basic level, applying a taxonomy makes the data usable. Without it, we would have a collection of disparate strings that look similar ("Analyst, Data", "Data Analyst", "Data Analyst II") but would not be treated as the same entity.
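As a rough illustration of why normalization matters, the toy sketch below collapses the three example strings into one key. It is not PayScale's rules engine: `canonical_key` and `LEVEL_TOKENS` are hypothetical names, and the rule (ignore word order and level suffixes) is an assumption made for this example.

```python
import re

# Hypothetical level markers to ignore when grouping titles (assumption).
LEVEL_TOKENS = {"i", "ii", "iii", "iv", "v", "1", "2", "3", "4", "5"}

def canonical_key(raw_title: str) -> str:
    """Reduce a free-text title to an order-insensitive grouping key."""
    tokens = re.findall(r"[a-z0-9]+", raw_title.lower())
    kept = sorted(t for t in tokens if t not in LEVEL_TOKENS)
    return " ".join(kept)

titles = ["Analyst, Data", "Data Analyst", "Data Analyst II"]
# All three raw strings map to the same key, so they can be grouped.
keys = {canonical_key(t) for t in titles}
```

Grouping by such a key does not discard the original string, so the most specific answer can still be stored alongside it.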
How is PayScale able to predict pay in places where you have light or no data?
PayScale can pull in similar entities when data is sparse for a given query. On some platforms, data might be limited to a specific city or a specific job title, but with our taxonomy we generally know which cities fall in which metros, so we can pull in more data at a granular level to make a prediction. A job title, in particular, might not exist in sufficient numbers in a certain geographic location, but we know how similar jobs are paid in that area.
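The fallback from city to metro described above can be sketched as follows. This is a minimal illustration only: the mapping, the sample profiles, the `min_count` threshold, and the function name `profiles_for` are all hypothetical, and the actual model is not documented here.

```python
from statistics import median

# Hypothetical city-to-metro mapping and survey profiles (illustration only).
CITY_TO_METRO = {
    "Seattle": "Seattle Metro",
    "Bellevue": "Seattle Metro",
    "Tacoma": "Seattle Metro",
    "Portland": "Portland Metro",
}
PROFILES = [
    {"job": "Data Analyst", "city": "Seattle", "pay": 82000},
    {"job": "Data Analyst", "city": "Seattle", "pay": 90000},
    {"job": "Data Analyst", "city": "Bellevue", "pay": 95000},
    {"job": "Data Analyst", "city": "Tacoma", "pay": 78000},
    {"job": "Data Analyst", "city": "Portland", "pay": 70000},
]

def profiles_for(job, city, min_count=3):
    """Return matching profiles, widening from city to metro if too few."""
    exact = [p for p in PROFILES if p["job"] == job and p["city"] == city]
    metro = CITY_TO_METRO.get(city)
    if len(exact) >= min_count or metro is None:
        return exact, "city"
    widened = [
        p for p in PROFILES
        if p["job"] == job and CITY_TO_METRO.get(p["city"]) == metro
    ]
    return widened, "metro"

# Bellevue alone has one profile, so the query widens to its metro.
matched, level = profiles_for("Data Analyst", "Bellevue")
estimate = median(p["pay"] for p in matched)
```

The same pattern would apply to job titles, moving from a specific title to a broader grouping when a location has too few matching profiles.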
What is the country/geographic coverage of the queries?
PayScale has a compensation model built for only 8 countries: the US, Canada, Australia, New Zealand, South Africa, the UK, Ireland, and India. Because our survey is in English, our data is strongest in English-speaking countries (and in India, where English is not the primary language but which has 125 million English speakers). As a result, data we collect from non-English-speaking countries tends to be biased rather than representative of the labor force, because it comes from people who have learned English or from expatriates living and working in a foreign country.
How do you maintain and update the data and taxonomy/ontologies that power the API?
In terms of our promise of fresh data, a market report can include relevant profiles collected as recently as the previous day. Our model places a higher emphasis on more recent profiles when finding the 45 best-matched profiles, but it will extend the time horizon to ensure we get the profiles most relevant to the position described, while never going back more than 2 years. For more information on why 45 profiles are used, see the "Why 45 Profiles?" question.
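One way to picture the "prefer recent, extend if needed, cap at 2 years" behavior is the sketch below. It is an assumption-laden illustration, not the Compensation Model: the widening steps in `HORIZONS`, the function name `select_profiles`, and the use of collection date as the only ranking signal are hypothetical (the real model also weighs how well each profile matches the position).

```python
from datetime import date, timedelta

TARGET = 45          # number of profiles used per report
MAX_AGE_DAYS = 730   # never go back more than 2 years
HORIZONS = (90, 180, 365, MAX_AGE_DAYS)  # hypothetical widening steps

def select_profiles(profiles, today, target=TARGET):
    """profiles: list of (profile_id, collected_on) tuples."""
    chosen = []
    for horizon in HORIZONS:
        cutoff = today - timedelta(days=horizon)
        chosen = [p for p in profiles if p[1] >= cutoff]
        if len(chosen) >= target:
            break  # enough recent profiles, stop widening
    # Newest first, then keep at most `target` profiles.
    chosen.sort(key=lambda p: p[1], reverse=True)
    return chosen[:target]

today = date(2024, 1, 1)
sample = (
    [(f"recent-{i}", today - timedelta(days=60)) for i in range(30)]
    + [(f"mid-{i}", today - timedelta(days=200)) for i in range(30)]
    + [(f"old-{i}", today - timedelta(days=800)) for i in range(30)]
)
selected = select_profiles(sample, today)
```

In this sample the 90-day and 180-day windows hold only 30 profiles, so the window widens to one year; profiles older than two years are never considered.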
How does a submitted Answer influence the Reports I receive from the API?
Answers submitted in the Request are used by the Compensation Model to select the best 45 Profiles which are used to generate the pay distributions that make up the values in the Reports. While not every Answer is used by every Report (for instance, a multi-request to ["pay", "yoe"] with a YearsExperience value of 5 will influence the PayReport, but not the YearsExperienceReport), the majority are cross-compatible.
What does <field> from my Report mean?
For a detailed description of every response in every Report, please check out the Reports Page.
What is JobTitleRating?
In short, the JobTitleRating is a score from 0 to 100 that represents the confidence our algorithm has that the submitted JobTitle is accurately described by the MatchedJobTitle for which the Report was run.
For a more detailed description, please check out the JobTitleRating page.
What is "Base Pay" in a Base Pay Report?
Base Pay, or Salary, is defined as the salary reported by a set of Profiles from our crowdsourced data. In general, Salary is defined as the recurring annual pay received by a worker without additional Bonuses or Incentives.
Why 45 Profiles?
The 45 most relevant Profiles offer a balance between statistical power for accurate reporting, coverage when data is sparse, and signal for a particular subset of the data. At one extreme, if only one Profile were required, getting a distribution would be impossible. At the other, if, say, two hundred Profiles were required, sparse data would be overpowered by similar, but less relevant, data. The value of 45, an intermediate value calibrated during the creation of the Compensation Model, achieves a good balance between signal, noise, and coverage.
How Do I Know What My Job Title Auto-Resolved To?
Any Report which has the optional argument AutoResolveJobTitle will return a Context object in the response if AutoResolveJobTitle is True. This Context object, described on the Reports Page, contains a SubmittedJobTitle (the JobTitle submitted in the request, which may or may not be a PayScale Confirmed JobTitle), and MatchedJobTitle (a PayScale Confirmed JobTitle which is our best guess at mapping the SubmittedJobTitle to our internal taxonomy).
If this guess is poor, human intervention may be required to search for a more fitting JobTitle to submit. Two endpoints can help with this: JobTitleMatch, which hits the same service as AutoResolveJobTitle, and AutoComplete, which exposes a more direct, syntactic search method.
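A client might flag poorly resolved titles for human review along the lines below. This is a hedged sketch: the dictionary layout of the Context object, the helper name `needs_review`, and the threshold of 70 are assumptions for illustration; only the field names SubmittedJobTitle, MatchedJobTitle, and JobTitleRating come from this page. Check the Reports Page for the actual response shape.

```python
REVIEW_THRESHOLD = 70  # hypothetical cut-off, not a PayScale recommendation

def needs_review(context, job_title_rating, threshold=REVIEW_THRESHOLD):
    """Return True when an auto-resolved title looks like a weak match.

    context: dict with "SubmittedJobTitle" and "MatchedJobTitle" keys
             (assumed layout), or None if no Context was returned.
    job_title_rating: the 0 to 100 JobTitleRating for the report.
    """
    if context is None:
        # No Context object, so no auto-resolution took place.
        return False
    changed = context["SubmittedJobTitle"] != context["MatchedJobTitle"]
    return changed and job_title_rating < threshold

# Hypothetical values for illustration.
weak = {"SubmittedJobTitle": "Data Ninja", "MatchedJobTitle": "Data Analyst"}
exact = {"SubmittedJobTitle": "Data Analyst", "MatchedJobTitle": "Data Analyst"}
flag_weak = needs_review(weak, 40)
flag_exact = needs_review(exact, 100)
```

Titles flagged this way could then be looked up manually through JobTitleMatch or AutoComplete.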
Why Can A Skill Differential Be Negative?
Skill differentials, or Skill Impacts, are calculated against the median Base Pay from a cohort of Profiles associated with the given JobTitle. Because they are calculated against the median of the entire cohort, and because Skills are not randomly distributed, some Skills will naturally be associated with a positive impact and others with a negative impact. A negative impact arises when the median Base Pay of the Profiles that list a Skill is below the median Base Pay of all Profiles in the cohort.
When this lower median associated with a Skill is compared to the higher median associated with the entire cohort, the resulting differential will be negative.
Other differentials (including Certifications) are calculated the same way.
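The comparison of medians described for Skill differentials can be worked through on a small example. The cohort data and the function name `skill_differential` are hypothetical, and expressing the differential as a fraction of the cohort median is an assumption; the page only states that it is calculated against the cohort median.

```python
from statistics import median

# Hypothetical cohort of Profiles for one JobTitle (illustration only).
cohort = [
    {"pay": 60000, "skills": {"Excel"}},
    {"pay": 65000, "skills": {"Excel", "SQL"}},
    {"pay": 80000, "skills": {"SQL", "Python"}},
    {"pay": 90000, "skills": {"Python"}},
    {"pay": 100000, "skills": {"Python", "SQL"}},
]

def skill_differential(profiles, skill):
    """Relative gap between the with-skill median and the cohort median.

    Raises statistics.StatisticsError if no profile lists the skill.
    """
    cohort_median = median(p["pay"] for p in profiles)
    with_skill = [p["pay"] for p in profiles if skill in p["skills"]]
    return (median(with_skill) - cohort_median) / cohort_median

excel = skill_differential(cohort, "Excel")    # with-skill median 62500 vs 80000
python = skill_differential(cohort, "Python")  # with-skill median 90000 vs 80000
```

Here the Excel differential is negative because Profiles listing Excel cluster at the lower end of the cohort, while the Python differential is positive.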