What is data sampling in Google Analytics?
When you open a report in Google Analytics, it takes time (and resources) to calculate the results and present them in your reports. In some cases, Google Analytics will take a portion of the data and use this to estimate the total.
For example, let’s say you look after a popular website, you’ve created a custom report in Google Analytics, and you select a date range that includes 800,000 sessions. Instead of creating the custom report based on all of the sessions, Google Analytics might use half of those sessions and then provide an estimated total for the report. Now Google Analytics only needs to calculate figures based on half of the data and the report is quicker to load.
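Estimating a total from part of the data can be illustrated with a short Python sketch. It assumes simple uniform random sampling of sessions, scaled up by the inverse of the sampling rate. This shows the general technique only, not Google Analytics’ documented algorithm, and the session data is randomly generated.

```python
import random


def estimate_total(sessions: list[int], sample_rate: float, seed: int = 0) -> float:
    """Estimate the total of a per-session metric from a random sample.

    Assumption: sessions are sampled uniformly at random and the sampled sum
    is scaled up by 1 / sample_rate (an illustration, not GA's algorithm).
    """
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    rng = random.Random(seed)
    sample_size = round(len(sessions) * sample_rate)
    sample = rng.sample(sessions, sample_size)
    # Scale the sampled sum back up to the size of the full data set.
    return sum(sample) / sample_rate


# Hypothetical data: 800,000 sessions, roughly 5% of which convert (1 = conversion).
data_rng = random.Random(42)
conversions = [1 if data_rng.random() < 0.05 else 0 for _ in range(800_000)]

actual = sum(conversions)
estimated = estimate_total(conversions, sample_rate=0.5)
print(f"actual={actual}, estimated={estimated:.0f}")
```

Using half of the sessions, the estimate usually lands close to the actual total but is rarely exact, which is why a sampled report can differ from your raw data.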
When does Google Analytics sample data?
If you’re using the standard (free) version of Google Analytics, then data sampling can occur when you have more than 500,000 sessions included in the selected date range.
Since Google Analytics pre-processes the data for your standard reports, including the Audience, Acquisition, Behavior and Conversions reports, these reports use unsampled data. You will only see data sampling if you modify a standard report and the date range includes more than 500,000 sessions, for example, if you apply a segment to a standard report or add a secondary dimension. Data can also be sampled when you create a custom report (depending on the metrics and dimensions you choose).
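The rules above can be summarized as a rough Python check. The `PlannedReport` class and `may_be_sampled` function are hypothetical helpers written for illustration, and the check is only a rule of thumb: a custom report over the threshold may still come back unsampled, depending on its metrics and dimensions.

```python
from dataclasses import dataclass

# Session threshold for the standard (free) version, as described above.
STANDARD_SESSION_THRESHOLD = 500_000


@dataclass
class PlannedReport:
    """Hypothetical description of a report you are about to open."""

    sessions_in_date_range: int
    is_custom_report: bool = False
    has_segment: bool = False
    has_secondary_dimension: bool = False


def may_be_sampled(report: PlannedReport, threshold: int = STANDARD_SESSION_THRESHOLD) -> bool:
    """Rule of thumb: True if the report could be sampled, False if it should not be."""
    # Unmodified standard reports are built from pre-processed, unsampled data.
    modified = report.is_custom_report or report.has_segment or report.has_secondary_dimension
    # Modified or custom reports can be sampled once the date range exceeds the threshold.
    return modified and report.sessions_in_date_range > threshold


print(may_be_sampled(PlannedReport(800_000)))                         # False
print(may_be_sampled(PlannedReport(800_000, has_segment=True)))       # True
print(may_be_sampled(PlannedReport(400_000, is_custom_report=True)))  # False
```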
Is data sampling good or bad?
If you are trying to analyze data for something that needs to be precise, like your website’s conversion rate or total revenue, then yes, data sampling can cause problems.
Let’s say you want to report on the total number of goal conversions coming from organic traffic over 12 months. You have over 750,000 sessions, which means you’re over the 500,000 limit. You create a custom report that includes a few different metrics and apply a segment.
The report says that there were 51,541 goal conversions for the date range. The data is sampled and is based on 65.71% of the available sessions.
Your raw data from your website says that you received 47,533 goal conversions for the same period. This means the sampled report shows 8.43% more goal conversions than you actually received. That isn’t too bad, but if your report is based on a smaller sample (say 50% or 40% instead of 65%), the accuracy is likely to decrease further.
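The 8.43% figure is the difference between the two totals, expressed as a percentage of the raw count. A minimal Python sketch of the calculation (the `percent_difference` helper is a hypothetical name):

```python
def percent_difference(reported: float, actual: float) -> float:
    """Return how far `reported` is from `actual`, as a percentage of `actual`."""
    return (reported - actual) / actual * 100


sampled_conversions = 51_541  # total shown in the sampled report
raw_conversions = 47_533      # total from your website's own records

print(f"{percent_difference(sampled_conversions, raw_conversions):.2f}%")  # 8.43%
```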
So if you need accurate figures for a subset of your data, data sampling can cause problems. However, if you’re looking to establish overall trends for, say, your total number of users, you can still see the trend even if the figures aren’t 100% accurate. It comes down to how accurate your report needs to be for your analysis.
How can you tell if data is sampled?
The shield icon at the top of your reports tells you if the report uses sampled data or not. If the shield is green, then you’re looking at unsampled data.
If the shield is yellow, then you’re looking at sampled data. Hovering over the shield gives you more details.
For example, the details might show that a report is based on 40.87% of sessions.
Can you adjust data sampling?
Yes and no.
You can switch between ‘Greater Precision’ and ‘Faster Response’ for the data sampling used in your reports. The default is ‘Greater Precision’, which automatically uses the largest sample possible to create your report. ‘Faster Response’ speeds up reporting by using fewer sessions for the sample, so unless speed matters more to you than accuracy, you will probably want to stick with ‘Greater Precision’.
To switch between ‘Greater Precision’ and ‘Faster Response’, click the shield icon at the top of your report.
After switching, it’s a good idea to check the percentage of sessions used for the sample.
How can you avoid data sampling?
There are different approaches you can take to avoid data sampling in Google Analytics. The one you use will depend on the type of analysis you’re trying to perform, the amount of time you have to create the report and your budget. Let’s take a look.
#1 Reduce the date range
The quickest way to avoid data sampling is to reduce the date range you are using for your report. When you reduce the number of days in your report, you also reduce the number of sessions. Once you’re under 500,000 sessions, you will be looking at unsampled data.
For example, if you have three months selected for your date range and you see sampled data, then try reducing the date range to one or two months. Even if data is still sampled, Google Analytics will be using a larger portion of sessions for the report.
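If you need figures for a longer period, one approach is to run the report for several shorter date ranges and combine the results. Below is a hypothetical Python helper that splits an inclusive date range into chunks of a fixed number of days. Adding up chunk totals works for additive metrics such as sessions or goal conversions, but not for users (the same user can appear in several chunks) or for rates such as conversion rate, which need to be recalculated from their components.

```python
from datetime import date, timedelta


def split_date_range(start: date, end: date, days_per_chunk: int) -> list[tuple[date, date]]:
    """Split an inclusive date range into consecutive chunks of at most `days_per_chunk` days."""
    if days_per_chunk < 1:
        raise ValueError("days_per_chunk must be at least 1")
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        # The last chunk is cut short so it never goes past `end`.
        chunk_end = min(chunk_start + timedelta(days=days_per_chunk - 1), end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
    return chunks


# Example: a three-month range split into chunks of up to 31 days.
chunks = split_date_range(date(2020, 1, 1), date(2020, 3, 31), 31)
for chunk_start, chunk_end in chunks:
    print(chunk_start.isoformat(), "to", chunk_end.isoformat())
```

Each chunk can then be run as its own report, which keeps the number of sessions per query lower.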
#2 Simplify your reporting
If you’re able to simplify your report (and the data you’re requesting), then you can reduce data sampling in Google Analytics. For example, if you remove the segment you have applied or the secondary dimension, then this should improve data accuracy.
Also check the standard reports to see if they meet your needs (since these include pre-processed data, which isn’t sampled).
#3 Use Supermetrics
Paid third-party tools like Supermetrics let you export your data from Google Analytics while reducing or avoiding data sampling. Supermetrics includes a handy option that automatically tries to reduce data sampling.
#4 Start using an App + Web Property
This article explained data sampling in a standard website-only property. However, you also have the option of setting up an App + Web Property in Google Analytics. This is a new type of property that has different features and limits compared to a standard property. For example, an App + Web Property is currently limited to 14 months of historical data, while a standard property isn’t. App + Web Properties also have a limited number of standard reports. That being said, one of the benefits of an App + Web Property is higher limits for data sampling.
When you create a custom report in an App + Web Property (under ‘Analysis’), data is sampled when you go over 10 million events. Depending on what you’re tracking, this means you should encounter less data sampling in your reports.
#5 Use Google BigQuery
You can send data from Google Analytics to Google BigQuery, which gives you access to raw, unsampled data. If you’ve set up an App + Web Property, you can currently send data to BigQuery for free. If you’re paying for Google Analytics 360, there is also an option to export your data to BigQuery. Google BigQuery is a separate product, and you will need to be comfortable with SQL to query the data.
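As a minimal sketch of what querying the export can look like, the Python below builds SQL that counts events by name over a date range and runs it with the `google-cloud-bigquery` client library. It assumes the App + Web export layout (one table per day named `events_YYYYMMDD` in a dataset such as `analytics_<property_id>`); the project and dataset names are hypothetical placeholders.

```python
import re


def build_event_count_query(project: str, dataset: str, start: str, end: str) -> str:
    """Build SQL that counts events by name between two YYYYMMDD dates (inclusive).

    Assumes the App + Web export layout: one table per day named
    events_YYYYMMDD inside a dataset such as analytics_<property_id>.
    """
    for value in (start, end):
        # The dates are inserted into the SQL text, so only accept 8-digit values.
        if not re.fullmatch(r"[0-9]{8}", value):
            raise ValueError(f"expected a YYYYMMDD date, got {value!r}")
    return (
        "SELECT event_name, COUNT(*) AS event_count\n"
        f"FROM `{project}.{dataset}.events_*`\n"
        f"WHERE _TABLE_SUFFIX BETWEEN '{start}' AND '{end}'\n"
        "GROUP BY event_name\n"
        "ORDER BY event_count DESC"
    )


def run_query(sql: str) -> list:
    """Run the SQL with the google-cloud-bigquery client (needs credentials)."""
    from google.cloud import bigquery  # imported here so building the SQL needs no install

    client = bigquery.Client()
    return list(client.query(sql).result())


# Hypothetical project and dataset names; replace them with your own.
sql = build_event_count_query("my-project", "analytics_123456789", "20200101", "20201231")
print(sql)
```

The project and dataset names are placed into the SQL as-is, so they should come from your own configuration rather than user input. A Google Analytics 360 export uses a different schema (daily `ga_sessions_YYYYMMDD` tables), so the query would need adjusting there.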
#6 Pay for Google Analytics
If you’re using the paid version of Google Analytics, then data sampling won’t kick in until you have over 100 million sessions in the selected date range for the reporting view. So if you’re looking after a larger-scale website and you have the budget, then upgrading to Google Analytics 360 will unlock higher limits and the ability to export unsampled data.
Data sampling is designed to speed up reporting in Google Analytics, and depending on the report, sampling may (or may not) be an issue. Sticking with the standard reports means you’re looking at unsampled data. If you do modify your reports, remember that the quickest and easiest way to reduce data sampling is to select a shorter date range.
PS. You can also use the Google Analytics API to extract your data, then store and analyze it yourself. It’s a more advanced but also more reliable solution.
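A minimal sketch, assuming you use the Analytics Reporting API v4 with a standard (Universal Analytics) property. The view ID, metric and dimension are placeholders, and the request body would be sent to the API’s `reports.batchGet` method (for example through the Google API client library), which is not shown here. API queries can be sampled too, so the second function checks the sampling fields in a report from the response.

```python
from typing import Optional


def build_report_request(view_id: str, start_date: str, end_date: str) -> dict:
    """Request body for the Analytics Reporting API v4 reports.batchGet method."""
    return {
        "reportRequests": [
            {
                "viewId": view_id,
                "dateRanges": [{"startDate": start_date, "endDate": end_date}],
                "metrics": [{"expression": "ga:goalCompletionsAll"}],
                "dimensions": [{"name": "ga:channelGrouping"}],
                # LARGE asks for the largest sample size available; it does not
                # switch sampling off.
                "samplingLevel": "LARGE",
            }
        ]
    }


def sampled_percentages(report: dict) -> Optional[list[float]]:
    """Return the percentage of sessions read for each date range, or None if unsampled.

    Sampled v4 reports include samplesReadCounts and samplingSpaceSizes in
    report["data"]; the API serializes these int64 values as strings.
    """
    data = report.get("data", {})
    read_counts = data.get("samplesReadCounts")
    space_sizes = data.get("samplingSpaceSizes")
    if not read_counts or not space_sizes:
        return None
    return [int(read) / int(space) * 100 for read, space in zip(read_counts, space_sizes)]


body = build_report_request("123456789", "2020-01-01", "2020-12-31")  # hypothetical view ID

# A made-up report fragment, shaped like one entry of response["reports"].
example_report = {"data": {"samplesReadCounts": ["408700"], "samplingSpaceSizes": ["1000000"]}}
print(sampled_percentages(example_report))
```

If the percentage comes back below 100, querying shorter date ranges and combining the results is one way to reduce sampling before you store the data.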