Universally Unique Identifier generated per event using UUID v4. Guarantees global uniqueness across all records. Use as a deduplication key when joining datasets.
Full English country name resolved from the anonymized network signal. Follows ISO 3166-1 country naming conventions.
Two-letter ISO 3166-1 Alpha-2 country code. Consistent, machine-readable identifier suitable for geo-segmentation and JOIN operations with external country dimension tables.
State, province, or administrative region within the country. Useful for sub-national geographic segmentation in retail and consumer trend analysis.
City-level geographic resolution. Derived from coarse IP geolocation lookup. Accuracy is city-level; never street or postal-code precision. No precise location data stored.
Postal or ZIP code approximated from coarse geolocation data. Provided on a best-effort basis; may be null in regions where postal code inference is unreliable at city-level resolution.
Coarse latitude coordinate in decimal degrees (WGS 84). Resolution is intentionally limited to ~10–50km radius to prevent re-identification. Never GPS-precise. Suitable for regional heatmaps.
Coarse longitude coordinate in decimal degrees (WGS 84). Always paired with LOCATION_LATITUDE. Apply the same resolution caveat: city-block or GPS precision is never available.
Unix epoch timestamp in milliseconds (ms) since 1970-01-01T00:00:00Z. High-resolution event time for session reconstruction, journey velocity analysis, and time-series modelling. Divide by 1000 for seconds.
Calendar date of the event in ISO 8601 format (YYYY-MM-DD), normalized to UTC. Derived from TIMESTAMP for convenience. Use for daily aggregation, date-range partitioning, and S3 prefix-based partitioning.
Time-of-day component in UTC, formatted as HH:MM:SS. Always UTC — apply timezone offsets using COUNTRY or CITY fields if local time analysis is required. Millisecond component is not preserved in this field.
HTTP request method associated with the event. In the context of clickstream data, this is almost always GET. Other values (POST, HEAD) indicate non-navigation requests such as form submissions or prefetch requests.
Raw HTTP User-Agent string as reported by the client browser. Contains browser engine, version, operating system, and device tokens. The parsed, categorical fields (BROWSER, OS, DEVICE_NAME) are derived from this value. Max length 512 characters.
Structured object containing browser name (e.g. Chrome, Safari, Firefox, Edge) and major version number parsed from the User-Agent string. Minor and patch versions are omitted to reduce cardinality.
Operating system name and version derived from the User-Agent. Name values include Windows, macOS, iOS, Android, Linux. Version is the major OS version only. Useful for platform split analysis in retail and media clients.
Commercial device model name inferred from the User-Agent string or client hints API. Examples: iPhone 15 Pro, Galaxy S24, MacBook Pro. Null for unrecognized or generic desktop browser signatures where model is not disclosed.
Manufacturer brand name of the device. Values include Apple, Samsung, Google, Microsoft, Huawei, Xiaomi, and others. Normalized to consistent capitalization. Distinct from DEVICE_NAME which carries the model; this field carries only the brand.
Normalized device form-factor classification. One of three values:
Desktop
Mobile
Tablet.
Current panel distribution: Desktop 92%, Mobile 8%. Tablets are classified as Mobile in the default aggregation.
The full URL of the page that linked to the current destination, as reported by the HTTP Referer header. Null when the navigation was direct (typed URL, bookmark, or new tab). Critical for attribution modeling and traffic-source analysis. Query parameters are preserved but sanitized to remove tokens that match PII patterns.
Complete destination URL of the event including protocol, subdomain, domain, path, and query string. Query string parameters that match session tokens, user IDs, or email patterns are redacted to [REDACTED] before storage.
Registered domain extracted from the URL (e.g. amazon.com). Subdomains are stripped. This is the primary field for brand-level aggregation, domain visit counting, and competitive share-of-traffic analysis. Pre-indexed for fast GROUP BY queries.
Subdomain component of the URL if present (e.g. www, m, app). Null when the URL has no subdomain. Useful for distinguishing desktop vs mobile subdomain traffic and identifying product sub-sections within large platforms.
URL scheme component. In practice almost always
https.
http records indicate legacy or misconfigured origins and may warrant quality filtering before use in consumer brand analysis.
URL path component after the domain (e.g. /dp/B09X7). Enables page-level traffic analysis, funnel stage classification, and product category detection. Path segments that match UUID or session token patterns are redacted.
The query string portion of the URL (everything after ?). Parameters identified as PII vectors (email=, user_id=, token=, auth=) are automatically redacted. Preserved parameters include search terms (q=), category (cat=), and UTM tags for attribution.
{ "OBJECT_ID": "f47ac10b-58cc-4372-a567-0e02b2c3d479", "COUNTRY": "United States", "COUNTRY_CODE": "US", "REGION": "California", "CITY": "San Francisco", "POSTAL_CODE": "94105", "LOCATION_LATITUDE": 37.77, "LOCATION_LONGITUDE":-122.41, "TIMESTAMP": 1735905600000, "DATE": "2026-01-03", "TIME": "14:23:07", "METHOD": "GET", "USER_AGENT": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36", "BROWSER": { "name": "Chrome", "version": "120" }, "OS": { "name": "macOS", "version": "14" }, "DEVICE_NAME": "MacBook Pro", "DEVICE_BRAND": "Apple", "PLATFORM": "Desktop", "REFERRER": "https://google.com/search?q=enterprise+data+platform", "URL": "https://amazon.com/dp/B09X7MFLJZ", "DOMAIN_NAME": "amazon.com", "SUB_DOMAIN": "www", "PROTOCOL": "https", "PATH": "/dp/B09X7MFLJZ", "QUERY_STRING": "ref=nav_bestseller" }
Ready to integrate?
Request a free sample dataset and receive a Parquet file with 10,000 real records matching this schema.