Author: Jamie Hoyle
Date: June 27, 2025
🧠 Summary / Purpose
This guide outlines how to configure Google Tag Manager (GTM) and Google Analytics 4 (GA4) to identify and exclude visits from specific crawler user-agents, such as MirrorWeb's. This ensures crawler traffic does not skew your analytics reporting.
🔍 Use Case / Scenario
Organizations that use crawlers (like MirrorWeb) to archive or scan their websites may see inflated or misleading metrics in GA4 if these visits are not filtered out. This process allows you to label that traffic as "crawler" and permanently exclude it from GA4 reporting.
🛠️ Step-by-Step Instructions
🔐 Prerequisites
- Editor access to your GA4 property
- Access and publishing rights for the GTM container that handles GA4 tagging
1. Create a Custom JavaScript Variable in GTM
- In GTM, go to Variables ▸ User-defined variables ▸ New.
- Select Custom JavaScript and paste the following code:
function () { var ua = navigator.userAgent || ''; return /mirrorweb/i.test(ua) ? 'crawler' : undefined; }
- Name the variable: cjs – traffic_type (crawler)
- Click Save
This function returns the string crawler only when the user-agent contains "mirrorweb" (the match is case-insensitive). For every other visitor it returns undefined.
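To check the matching logic outside GTM, or to cover more than one crawler, the same test can be written as a standalone function. This is only an illustrative sketch: `classifyTraffic` and `CRAWLER_PATTERNS` are hypothetical names, the second pattern is a placeholder rather than a real crawler token, and the sample user-agent strings are made up.

```javascript
// Hypothetical list of user-agent patterns to label as crawler traffic.
// Only /mirrorweb/i comes from this guide; the second entry is a placeholder.
var CRAWLER_PATTERNS = [/mirrorweb/i, /example-archiver/i];

// Returns 'crawler' when any pattern matches, otherwise undefined,
// mirroring the behavior of the Custom JavaScript variable.
function classifyTraffic(userAgent) {
  var ua = userAgent || '';
  for (var i = 0; i < CRAWLER_PATTERNS.length; i++) {
    if (CRAWLER_PATTERNS[i].test(ua)) {
      return 'crawler';
    }
  }
  return undefined;
}

// Made-up user-agent strings, for illustration only.
console.log(classifyTraffic('Mozilla/5.0 (compatible; MirrorWeb)')); // crawler
console.log(classifyTraffic('Mozilla/5.0 (Windows NT 10.0) Chrome/126.0')); // undefined
```

Inside GTM, the body would still need to be wrapped in an anonymous function that reads navigator.userAgent, as in the variable defined in this step.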
2. Pass the Variable to GA4 as a Parameter
- Open your GA4 Configuration tag in GTM
- Under Configuration Parameters, click Add Parameter:
- Field / Parameter Name: traffic_type
- Value: {{cjs – traffic_type (crawler)}}
- Save the tag, test it in GTM Preview mode, then publish the container
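Conceptually, this step adds a traffic_type field to the GA4 configuration only for crawler visits. The sketch below expresses the same idea in gtag.js terms for readers who want to see the resulting parameters. `buildConfigParams` is a hypothetical helper, 'G-XXXXXXXXXX' is a placeholder measurement ID, and the user-agent strings are invented.

```javascript
// Builds the extra configuration fields for a given user agent.
// Crawler visits get traffic_type; everyone else gets no extra field.
function buildConfigParams(userAgent) {
  var params = {};
  if (/mirrorweb/i.test(userAgent || '')) {
    params.traffic_type = 'crawler';
  }
  return params;
}

// On a page that loads gtag.js, the result could be passed like this
// ('G-XXXXXXXXXX' is a placeholder, not a real measurement ID):
//   gtag('config', 'G-XXXXXXXXXX', buildConfigParams(navigator.userAgent));

console.log(JSON.stringify(buildConfigParams('MirrorWeb-Crawler/1.0'))); // {"traffic_type":"crawler"}
console.log(JSON.stringify(buildConfigParams('Mozilla/5.0 Firefox/127.0'))); // {}
```

Leaving the field out entirely for normal visitors, rather than sending an empty value, keeps regular traffic from matching any traffic_type filter.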
3. (Optional) Define a Rule in GA4 for Organization
- In Google Analytics, navigate to Admin ▸ Data Streams ▸ Web stream ▸ Configure tag settings ▸ Show all ▸ Define internal traffic ▸ Create
- Set:
- Rule name: Crawler traffic
- traffic_type value: crawler
- Leave IP conditions empty
- Click Create
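This step is not mandatory, because the traffic_type value is already set in GTM. It only keeps the crawler label listed alongside your other internal traffic definitions.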
4. Create a Data Filter in GA4 to Exclude Crawler Traffic
- Navigate to Admin ▸ Data Filters ▸ Create Filter
- Set the following:
- Filter type: Internal traffic
- Filter name: Exclude crawler traffic
- Filter operation: Exclude
- traffic_type equals: crawler
- Filter state: Start with Testing, verify results, then change to Active
- Click Create
⚠️ Once the filter is Active, GA4 permanently excludes traffic labeled with traffic_type=crawler. Excluded data is not processed and cannot be recovered later.
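The difference between the Testing and Active states can be pictured with a toy model. This is a simplified sketch of the behavior described above, not GA4's actual implementation: `applyExcludeFilter`, the event objects, and the `testDataFilterName` field are hypothetical stand-ins (the last one for GA4's "Test data filter name" dimension).

```javascript
// Toy model of an exclude filter on traffic_type.
// state 'testing': matching events are kept but labeled with the filter name.
// state 'active':  matching events are dropped.
// any other state: events pass through unchanged.
function applyExcludeFilter(events, filter) {
  var kept = [];
  events.forEach(function (event) {
    var matches = event.traffic_type === filter.value;
    if (matches && filter.state === 'active') {
      return; // excluded, never reaches reporting
    }
    var copy = Object.assign({}, event);
    if (matches && filter.state === 'testing') {
      copy.testDataFilterName = filter.name;
    }
    kept.push(copy);
  });
  return kept;
}

var sampleEvents = [
  { name: 'page_view', traffic_type: 'crawler' },
  { name: 'page_view' }
];
var crawlerFilter = { name: 'Exclude crawler traffic', value: 'crawler', state: 'testing' };

console.log(applyExcludeFilter(sampleEvents, crawlerFilter).length); // 2
crawlerFilter.state = 'active';
console.log(applyExcludeFilter(sampleEvents, crawlerFilter).length); // 1
```

In this model, Testing lets you confirm that only crawler events carry the label before switching to Active, which is the reason to verify results first.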
5. Validate and Monitor
- Publish your GTM container
- Wait for the crawler to revisit the site (MirrorWeb crawlers usually scan daily)
- Use DebugView in GA4 to verify that the traffic_type=crawler parameter is being sent
- Allow 24–36 hours for the exclusion filter to take full effect