Technical Approaches to Sampling Fraud Detection for Surveys


A good fraud detection process requires a careful understanding of the tactics that might be used by those with nefarious intent. Many of these quality checks should be built into the process from the moment of registration and then continue as part of long-term panel quality control.

What does that mean? It means carefully building a system that protects the integrity of the entire panel against fraud. Here are some of the most important safeguards that should be implemented.

Device Fingerprinting

A device fingerprint is information collected from the device a respondent is using that serves as a specific identifier for validation. Ideally, every device has its own fingerprint and can be recognized in future engagements. Per a 2016 Fraud Report, this technology has been implemented by 32% of companies, with another 17% indicating plans to implement it in the future. TrueSample verification places a digital fingerprint on each user when they enter a survey, allowing us to assess for potential bot activity, blacklisted IP addresses, and other fraud markers.
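As a rough illustration of the idea, the sketch below hashes a set of device attributes into a single identifier and reports fingerprints shared by more than one respondent. It is a minimal sketch under stated assumptions, not a description of how TrueSample or any other product works: the attribute names and the helpers `device_fingerprint` and `find_repeat_devices` are hypothetical.

```python
import hashlib
import json


def device_fingerprint(attributes: dict) -> str:
    """Return a stable SHA-256 hex digest for a dict of device attributes.

    The attribute names used by callers (user agent, screen size, and so on)
    are illustrative assumptions, not a fixed schema.
    """
    # Sort keys so the same attributes always serialize to the same string.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_repeat_devices(sessions):
    """Map each fingerprint seen for more than one respondent to those respondent IDs.

    `sessions` is an iterable of (respondent_id, attributes) pairs.
    """
    seen = {}
    for respondent_id, attributes in sessions:
        seen.setdefault(device_fingerprint(attributes), set()).add(respondent_id)
    return {fp: ids for fp, ids in seen.items() if len(ids) > 1}
```

A fingerprint shared by several respondent IDs is a signal worth reviewing rather than proof of fraud, since household members can legitimately share a device.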

IP Geolocation Information

IP geolocation information has been implemented by more than half of all companies, with another 13% indicating plans to implement it. A user's location can be estimated from the IP address for most devices, though the estimate becomes unreliable when the IP is masked by anonymous browsers, VPNs, or other privacy tools.

Identity Address Validation

By validating real-world identity via name, postal address, email address, and other personal information against existing consumer databases, each respondent can be matched to a real name. This also allows for effective removal of duplicate entries.
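The duplicate-removal step can be sketched as follows. This example covers only the deduplication part: it normalizes the submitted name, email, and postal address and reports registrations that collapse to the same key. Matching against a consumer database is not shown, and the function names and normalization rules are assumptions made for illustration.

```python
import re


def normalize_identity(name, email, postal_address):
    """Build a comparison key from lightly normalized identity fields."""
    # Lowercase and collapse repeated whitespace in the name.
    name_key = " ".join(name.lower().split())
    email_key = email.strip().lower()
    # Drop punctuation from the address, then collapse whitespace.
    address_key = re.sub(r"[^a-z0-9 ]", "", postal_address.lower())
    address_key = " ".join(address_key.split())
    return (name_key, email_key, address_key)


def find_duplicates(registrations):
    """Return (duplicate_id, original_id) pairs.

    `registrations` is an iterable of (reg_id, name, email, postal_address).
    """
    first_seen = {}
    duplicates = []
    for reg_id, name, email, address in registrations:
        key = normalize_identity(name, email, address)
        if key in first_seen:
            duplicates.append((reg_id, first_seen[key]))
        else:
            first_seen[key] = reg_id
    return duplicates
```

Exact matching on a normalized key is deliberately strict. A production system would likely add fuzzy matching for abbreviations and misspellings, which this sketch does not attempt.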

Geolocation Distance Check

By combining the two previous checks, IP geolocation and identity address validation, we can compare where someone says they are with the location of the IP address logged when they are active, to determine whether they are where they say they are (or are using a secondary device).
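One way to express this comparison is a great-circle distance between the coordinates of the declared address and the coordinates derived from the IP address. The sketch below uses the haversine formula. It assumes both locations have already been resolved to latitude and longitude by some other service, and the 500 km threshold is an arbitrary placeholder, not a recommended value.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def location_mismatch(declared, ip_derived, max_km=500.0):
    """Return True when the declared and IP-derived (lat, lon) pairs are
    farther apart than max_km. The threshold is a hypothetical default."""
    return haversine_km(*declared, *ip_derived) > max_km
```

Because IP geolocation is often accurate only to the city or region level, a generous threshold avoids flagging respondents who are simply routed through a nearby provider.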

While fraud on a larger scale may be able to fool several of these factors, the multi-step process can help identify those who are using device emulation, spoofing location, farming survey responses, or who are otherwise bypassing frontend quality checks during registration and survey activities.

Response Validation

A good response validation tool leverages real-time Bayesian statistical models and analysis to determine the engagement of users. In short, are they engaging in a way that matches typical human behavior?

Using these statistical models, a respondent can be flagged as unengaged in a survey if they speed through a certain percentage of the pages they saw during the survey. Norms and standard deviations are established for each page and can be calculated and updated in real time as page submissions from this and other respondents are received by the platform. As a result, outliers are flagged almost immediately and can be evaluated for engagement levels.
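The per-page norms described above can be illustrated with a simple running mean and standard deviation. Note that this sketch swaps in a plain z-score over Welford's online algorithm rather than a Bayesian model, purely to keep the example short. The class and function names, the z-score cutoff, the minimum sample size, and the 50% share are all hypothetical.

```python
import math


class PageTimingStats:
    """Running mean and standard deviation of time spent on one page,
    updated one submission at a time (Welford's online algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, seconds):
        self.n += 1
        delta = seconds - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (seconds - self.mean)

    def std(self):
        # Sample standard deviation; undefined for fewer than two samples.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def is_speeding(stats, seconds, z_cutoff=-1.5, min_samples=5):
    """True when a page time is far below the norm for that page."""
    if stats.n < min_samples or stats.std() == 0.0:
        return False  # not enough data to judge
    return (seconds - stats.mean) / stats.std() < z_cutoff


def flag_unengaged(page_times, norms, max_speeding_share=0.5):
    """Flag a respondent who sped on at least the given share of pages seen.

    `page_times` maps page ID to seconds; `norms` maps page ID to PageTimingStats.
    """
    if not page_times:
        return False
    speeding = sum(
        is_speeding(norms[page], seconds)
        for page, seconds in page_times.items()
        if page in norms
    )
    return speeding / len(page_times) >= max_speeding_share
```

In use, each incoming page submission would first be checked against the current norm and then passed to `update`, so the norms keep tracking the respondent population as the survey runs.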

Another important factor that this technology can measure is response pattern. A respondent who shows undesirable response patterns on a certain percentage of pages can also be considered unengaged with the survey and flagged as such.
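The text does not say which patterns count as undesirable. As one possible example, the sketch below checks for straight-lining, where a respondent picks the same answer for every item in a grid question, and flags the respondent when that happens on a set share of grid pages. The choice of pattern, the function names, and the thresholds are assumptions for illustration only.

```python
def is_straightlined(answers, min_items=4):
    """True when a grid page has at least min_items answers and all are identical."""
    return len(answers) >= min_items and len(set(answers)) == 1


def flag_pattern_responder(grid_pages, max_pattern_share=0.5, min_items=4):
    """Flag a respondent who straight-lined at least the given share of grid pages.

    `grid_pages` is a list of answer lists, one list per grid page.
    Pages with fewer than min_items answers are ignored as too short to judge.
    """
    eligible = [answers for answers in grid_pages if len(answers) >= min_items]
    if not eligible:
        return False
    patterned = sum(is_straightlined(answers, min_items) for answers in eligible)
    return patterned / len(eligible) >= max_pattern_share
```

Identical answers can be a sincere response on a single page, which is why the check looks at the share of pages rather than any one page in isolation.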

The quality of your data is vital, so it’s important to understand how sample is sourced, what safeguards are put in place to protect against fraud, and how respondents are measured over time to keep that quality level high.

Learn more about our approach to data quality and additional actions that can and should be taken to protect it in our new eBook, Defining Quality in Sample.