Google Analytics Spam: Why Your Data Is Wrong and How to Fix It
If you're at all interested in your website's performance (how well it converts traffic into sales leads or purchases, how people are finding your site and which pages they're looking at), you probably check Google Analytics monthly, if not weekly, as your source of truth for this critical business information.
With more than 50% of websites running Google's free website statistics package, you could be forgiven for assuming the data it presents is accurate. After all, it's Google, right?
At Hallway, we look after several websites for clients where having the right performance data is vital for measuring Return on Investment (ROI). One morning we got rather excited when we checked a client's stats and saw a huge spike in visits.
Being the ever-diligent professionals we are, and since we hadn't run any campaigns on the date of the spike, we thought we'd better double-check before trumpeting this fab news to our client.
After some digging, we noticed that the source of all this traffic was a bit suspect. In fact, very suspect. It turned out to be coming from a handful of "referring websites", similar to the ones below:
These websites have absolutely nothing to do with our client's business, and when we visited them, there was nothing to suggest they were linking to our client's site. We checked a few other Google Analytics profiles (including our own) and found similar traffic in every single one, albeit at different levels of severity, ranging from less than 1% of total sessions to as much as 40%.
Clearly, this was spam traffic. It wasn't something we expected to see, and further investigation showed that the share of traffic from these spam sources had been increasing for quite some time across many different profiles.
This is clearly a growing problem for Google, and not one they've been especially vocal about. It is no doubt misleading many website owners, because polluted Google Analytics data isn't obvious to the untrained eye, especially if you only check the 'Overview' report every few weeks and take the numbers at face value, as many website owners do.
Why is this happening?
Polluting Google Analytics accounts with spam data is designed to do one thing: get you to visit a website that, in one way or another, earns somebody money or something else of value.
When you see a huge spike in traffic, the first thing most Google Analytics users will do is check the source of that traffic, just like we did. If a lot of traffic is being referred to the website in question by one or two third-party websites, the next thing most people will do is visit those websites.
These websites (run by spammers) typically offer something that makes them money. It may be as simple as driving traffic to a site so that visitors click on adverts (such as AdSense ads), since site owners are paid for each click. In the case of software download websites, the software may be free, but it may well contain malware that lets a remote party steal data from your computer to sell to hackers, or enlist your computer in a botnet used to orchestrate DDoS attacks. Stolen data and botnet capacity can both be sold, hence the motivation.
How is this happening?
The spam data is getting into Analytics through the Google Analytics Measurement Protocol. This essentially allows anyone to submit traffic data to Google Analytics, so long as the Google Analytics Property ID is known for the website. This can’t be turned off because it’s how the legitimate traffic on your website gets recorded.
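To make this concrete, here's a minimal Python sketch that builds (but deliberately doesn't send) a Measurement Protocol pageview payload for a Universal Analytics property. The parameter names (`v`, `tid`, `cid`, `t`, `dp`, `dr`, `dh`) come from version 1 of the protocol; the function name, property ID, client ID and referrer are placeholders of our own. Notice that nothing in the payload proves the hit came from your site: the referrer is whatever the sender claims, and the hostname (`dh`) is optional.

```python
from urllib.parse import parse_qs, urlencode


def build_pageview_hit(property_id, client_id, page, referrer=None, hostname=None):
    """Build, but do not send, a Measurement Protocol v1 pageview payload.

    In Universal Analytics a payload like this is sent to
    https://www.google-analytics.com/collect; this function only returns the string.
    """
    params = {
        "v": "1",            # protocol version
        "tid": property_id,  # property ID: all a sender needs to target a property
        "cid": client_id,    # anonymous client ID, chosen by the sender
        "t": "pageview",     # hit type
        "dp": page,          # document path
    }
    if referrer is not None:
        params["dr"] = referrer  # document referrer: whatever the sender claims
    if hostname is not None:
        params["dh"] = hostname  # document hostname: optional, so easily left out
    return urlencode(params)


# A placeholder "spam" hit with a fake referrer and no hostname at all.
payload = build_pageview_hit("UA-000000-1", "555", "/", referrer="http://spam-referrer.example/")
print(parse_qs(payload))
```

A hit built without `dh`, like this one, has no hostname for Google Analytics to record, which is why the Hostname report is such a useful spam test.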
Since Google Analytics Property IDs are sequential, it's trivial for a spammer to write a program that simply loops through every ID, starting from 000001, and sends fake traffic to each one.
Another possibility is that spammers are scanning the web and reading the public HTML source code of websites that use Google Analytics. Because the Property ID appears in the HTML, spammers can record it alongside your website address and send hits that carry both your Property ID and your real hostname, which makes the spam traffic harder to detect.
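As an illustration of how little effort that loop takes, here's a Python generator that counts through IDs in the "UA-000001-1" style. The zero-padded six-digit account number and the "-1" suffix (the first property in an account) are simplifying assumptions for this example; real IDs vary in length. It only produces strings and sends nothing.

```python
from itertools import islice


def candidate_property_ids(start=1):
    """Yield property IDs of the form UA-NNNNNN-1, counting up from `start`."""
    account = start
    while True:
        # Zero-pad to six digits to match the "000001" style (an assumption).
        yield f"UA-{account:06d}-1"
        account += 1


first_three = list(islice(candidate_property_ids(), 3))
print(first_three)
```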
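If you want to see what a scraper would see, a quick check is to search your own page source for the ID. This Python sketch does that with a regular expression. The pattern (4 to 10 account digits, then 1 to 4 property digits) is our assumption about the usual shape of Universal Analytics IDs, and the sample HTML uses the standard `ga('create', ...)` snippet with a made-up ID.

```python
import re

# Assumed shape of a Universal Analytics property ID: UA-<account>-<property>.
UA_ID_PATTERN = re.compile(r"\bUA-\d{4,10}-\d{1,4}\b")


def find_property_ids(html):
    """Return the distinct property IDs found in an HTML string, in order of appearance."""
    ids = []
    for match in UA_ID_PATTERN.findall(html):
        if match not in ids:
            ids.append(match)
    return ids


# Hypothetical page source using the analytics.js snippet.
page = """<script>
  ga('create', 'UA-1234567-1', 'auto');
  ga('send', 'pageview');
</script>"""
print(find_property_ids(page))
```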
How to identify if your Google Analytics data is affected
Since the easiest way to send fake traffic to Google Analytics is by guessing Property IDs without knowing the website address the ID is associated with, we can use the Hostname report in Analytics to identify much of it.
Legitimate traffic from normal visitors sets the hostname, which will contain your website's domain name (e.g. "hallway.agency" would be the hostname for the Hallway site).
If the hostname isn't set, or is set to something you don't recognise as your own, the traffic is very likely to be spam.
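That rule of thumb is easy to express in code. Here's a minimal Python sketch, assuming you've written down the set of hostnames you consider your own; `example.com` stands in for your domain, and the function name is our own.

```python
def looks_like_spam(hostname, allowed_hostnames):
    """Return True if a Hostname report value is unset or isn't one of our hostnames."""
    value = (hostname or "").strip().lower()
    # Missing hostnames are reported as "(not set)".
    if value in ("", "(not set)"):
        return True
    return value not in {h.lower() for h in allowed_hostnames}


ALLOWED = {"example.com", "www.example.com"}
print(looks_like_spam("(not set)", ALLOWED))
print(looks_like_spam("www.example.com", ALLOWED))
```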
The first place to look is the Hostname report in Google Analytics. Access this by first viewing Audience > Technology > Network, then clicking the “Hostname” link next to “Primary Dimension” (this appears underneath the graph).
If you see rows recording sessions under "(not set)", or under a hostname that has nothing to do with your website, you almost certainly have spam data in your account.
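To put a number on how badly a profile is affected (we saw anything from under 1% to 40%), you can total up the sessions in the Hostname report. This Python sketch assumes you've copied the report into (hostname, sessions) pairs; the hostnames and figures in the usage example are invented for illustration.

```python
def spam_session_share(rows, allowed_hostnames):
    """Return the percentage of sessions whose hostname isn't one of ours.

    `rows` is a list of (hostname, sessions) pairs copied from the Hostname report.
    """
    allowed = {h.lower() for h in allowed_hostnames}
    total = sum(sessions for _, sessions in rows)
    if total == 0:
        return 0.0
    # "(not set)" is never in the allowed set, so it is counted as suspect too.
    suspect = sum(sessions for hostname, sessions in rows if hostname.lower() not in allowed)
    return 100.0 * suspect / total


# Invented example figures.
rows = [
    ("example.com", 700),
    ("www.example.com", 200),
    ("(not set)", 80),
    ("spam-site.example", 20),
]
share = spam_session_share(rows, {"example.com", "www.example.com"})
print(f"{share:.1f}% of sessions look like spam")
```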
How to fix spam data in Google Analytics
It's possible to create multiple Views in Google Analytics. These are designed to give you different perspectives on your website traffic data using Filters and other customisations.
We can create a new View in Analytics and add Filters to it, so that only traffic from allowed hostnames is included. Unfortunately, Filters don't apply retrospectively, although we have a workaround for that too.
First, let’s set up a new View to use going forward.
In Google Analytics, click 'Admin'.
In the third column, labelled 'View', click 'View Settings' (make sure the selected view is the default one, normally labelled "All Website Data", although the name may vary).
Click the 'Copy view' button.
Enter a sensible name for this View. We suggest 'All Data - Spam Filtered', then click the 'Copy view' button again.
You should now be returned to the previous page, with your new View selected in the right-hand column. Click 'Filters', then 'Add Filter'.
Choose 'Create new Filter' and enter a filter name (we suggest 'Include hostname X', where 'X' is the exact hostname you enter). Under 'Filter Type', choose 'Predefined', then 'Include only', 'traffic to the hostname', 'that are equal to', enter the exact hostname you want to match, and click 'Save'.
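Remember that if your domain name is example.com, your site may use both "example.com" and "www.example.com" as hostnames. Don't create a separate Include filter for each one: Google Analytics applies a View's filters one after another, and a hit must match every Include filter to be kept, so two Include filters for different hostnames would discard all of your data. Instead, choose 'Custom', then 'Include', set 'Filter Field' to 'Hostname', and enter a regular expression that matches all of your hostnames, such as ^(www\.)?example\.com$.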
Take care to include every hostname that could legitimately send traffic to your site. This depends on how your site has been set up, so it may be a good idea to ask a web developer for help here. For instance, Shopify sites use a hosted checkout on the checkout.shopify.com domain name, so you'll want to allow this hostname as well as your own domain name.
Make sure you keep your original View in place as a base profile that collects all data without any Filters. Filters permanently discard any data they exclude, so you can't get it back later. Keeping an unfiltered View means that if there's ever a problem with the way your filters are set up, you can still access the complete data.
Finally, tick the 'Exclude all hits from known bots and spiders' box on all your Views. You'll find it in 'View Settings', under the 'Bot Filtering' heading. It does help, but you'll still need to set up Filters to block the majority of spam visits.
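If you'd rather cover several hostnames with a single Custom Include filter on the Hostname field, you'll need one regular expression that matches all of them and nothing else. This Python sketch builds such a pattern; the hostnames are examples, and it assumes hostnames contain only letters, digits, hyphens and dots, so escaping the dots is enough. The ^ and $ anchors matter because a regex filter matches anywhere in the value unless anchored, so without them a hostname like example.com.spam-site.example would slip through.

```python
import re


def hostname_include_pattern(hostnames):
    """Build one anchored regex that matches exactly the given hostnames.

    Assumes hostnames contain only letters, digits, hyphens and dots,
    so the dots are the only characters that need escaping.
    """
    alternatives = "|".join(h.replace(".", r"\.") for h in sorted(set(hostnames)))
    return f"^({alternatives})$"


pattern = hostname_include_pattern(["example.com", "www.example.com", "checkout.shopify.com"])
print(pattern)

# Check the pattern against a few hostnames before pasting it into Analytics.
checker = re.compile(pattern)
for host in ["www.example.com", "checkout.shopify.com", "example.com.spam-site.example", "(not set)"]:
    print(host, bool(checker.search(host)))
```

Sorting just keeps the output stable between runs; the order of the alternatives makes no difference to what the pattern matches.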
How to view historical Google Analytics data without spam traffic data
As mentioned above, Filters don't work retrospectively. If you want to view historical data without the spam traffic, you can use Advanced Segments instead. Follow the steps below:
Open the Reporting tab for the original View (the one without your new Filters).
Click 'Add Segment', then 'New Segment'.
Enter a name for the segment. We suggest 'Authorised hostnames'.
Select 'Conditions' under the 'Advanced' heading.
In the dropdowns, select 'Hostname', 'exactly matches'.
Enter a legitimate hostname your site uses. As you type, Google Analytics should suggest matching hostnames from your data to help you pick the right one.
If you need to add more than one hostname, click 'OR' and repeat until all hostnames are covered. (Each hit has only one hostname, so joining hostnames with 'AND' would exclude sessions you want to keep.)
Click the blue 'Save' button. The new hostname filtered segment should now be applied. If the 'All Users' segment is still showing, you might want to disable that to avoid confusion.
All data you view with this segment, including historical data, should now include only sessions matching the hostnames you specified.
Keep an eye on the segments each time you load a new report page, as Analytics sometimes reverts to the default All Users segment between screens and sessions.
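It can help to think of a segment as a filter applied when a report is run, rather than when data is collected: the stored data isn't changed, which is why a segment can clean up historical reports when a View Filter can't. This toy Python model illustrates the idea; the session records, field names and dates are hypothetical, and a session is kept if its hostname matches any one of the allowed hostnames.

```python
def apply_hostname_segment(sessions, allowed_hostnames):
    """Return the sessions whose hostname is any one of the allowed hostnames.

    The input list is left untouched, just as a segment leaves stored data alone.
    """
    allowed = {h.lower() for h in allowed_hostnames}
    return [s for s in sessions if s["hostname"].lower() in allowed]


# Hypothetical historical sessions, including two spam entries.
history = [
    {"date": "2015-03-01", "hostname": "example.com"},
    {"date": "2015-03-01", "hostname": "(not set)"},
    {"date": "2015-03-02", "hostname": "www.example.com"},
    {"date": "2015-03-02", "hostname": "spam-site.example"},
]

clean = apply_hostname_segment(history, {"example.com", "www.example.com"})
print(len(history), "stored sessions,", len(clean), "shown with the segment")
```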
With this information, you should have a better understanding of where your traffic is coming from, and how much of it is legitimate, allowing you to better plan and track the success of your digital marketing campaigns.
If you have any questions or would like help with your website analytics, get in touch.