STAGE 4 - Data Validation
Thomson Reuters is committed to collecting high-quality data. Institutional data is provided and confirmed by each academic institution to ensure authenticity, and our first objective is to facilitate a smooth submission process that limits errors. We have provided university representatives with a strong support structure and, where possible, pre-populated profiles with authoritative third-party data that institutions can confirm or revise (see Stage 3 for details).
While these processes have helped prevent most data errors, there are inevitably cases where data requests are misinterpreted or values are simply entered incorrectly. It is therefore important that Thomson Reuters perform quality control to ensure an accurate representation of an institution’s activities. This data validation process can be broken down into three steps:
Identification of logical data errors. A series of algorithms can easily spot logical data errors for universities to rectify. For example, if the number of international students exceeds the total student body, the data is flagged for follow-up. Such errors are mostly caused by incorrectly entered data or misinterpretation of the data requirements.
Data comparisons. By comparing newly submitted data to reliable third-party sources, most notably government agencies, we can spot potential problems. Although our data definitions are not always the same as those of third-party sources, we can adjust the source data to make meaningful comparisons. For example, where we ask for “Number of Academic Staff FTE” (full-time equivalent), we can create comparative third-party estimates by adding full-time staff to part-time staff, with part-time staff counted at 50%. The comparisons are not perfect, and we allow for reasonable variance, but such estimates have been instrumental in encouraging universities to revisit their submitted data.
Data anomalies. For data errors not identified in the previous steps, it is still possible to compare data to expected values and search for outliers. For example, if a university has a funding-per-faculty ratio that is far higher than the ratio for the university’s peer group, the data is flagged for follow-up.
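The three checks above can be illustrated with a minimal Python sketch. It is not the actual validation system: the field names, the helper functions, the 25% tolerance and the "three times the peer median" outlier rule are all assumptions chosen for illustration. The FTE estimate counts part-time staff at 50%, which is one reading of the comparison described above.

```python
from statistics import median

# All field names, thresholds and helper names are hypothetical;
# the source does not specify them.

def logical_errors(record):
    """Step 1: return descriptions of internally inconsistent values."""
    errors = []
    if record["international_students"] > record["total_students"]:
        errors.append("international students exceed total students")
    return errors

def estimated_fte(full_time, part_time, part_time_weight=0.5):
    """Step 2: estimate FTE from headcounts, part-time counted at 50%."""
    return full_time + part_time_weight * part_time

def differs_from_reference(submitted, reference, tolerance=0.25):
    """True when the relative gap to the reference exceeds the tolerance."""
    if reference == 0:
        return submitted != 0
    return abs(submitted - reference) / reference > tolerance

def is_outlier(value, peer_values, max_ratio=3.0):
    """Step 3: True when value is more than max_ratio times the peer median."""
    return value > max_ratio * median(peer_values)

def validate(record, third_party, peer_funding_ratios):
    """Run all three checks and return a list of flags for follow-up."""
    flags = logical_errors(record)

    reference_fte = estimated_fte(
        third_party["full_time_staff"], third_party["part_time_staff"]
    )
    if differs_from_reference(record["academic_staff_fte"], reference_fte):
        flags.append("academic staff FTE differs from third-party estimate")

    if record["academic_staff_fte"] > 0:
        funding_per_faculty = (
            record["research_funding"] / record["academic_staff_fte"]
        )
        if is_outlier(funding_per_faculty, peer_funding_ratios):
            flags.append("funding per faculty far above peer group")

    return flags

# Example submission with made-up numbers.
submission = {
    "international_students": 1200,
    "total_students": 1000,
    "academic_staff_fte": 500,
    "research_funding": 50_000_000,
}
government_data = {"full_time_staff": 400, "part_time_staff": 200}
peer_ratios = [20_000, 25_000, 30_000]

for flag in validate(submission, government_data, peer_ratios):
    print(flag)
```

In this sketch a flag does not mean the submission is wrong; it only marks a value for follow-up with the institution, and the thresholds would need tuning to allow for reasonable variance.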
Once an error or anomaly has been identified, we contact the university for an explanation or correction. There are sometimes perfectly sound reasons for variance, but we prefer to hear from the source. In most cases, universities are more than willing to help qualify and correct their data.
