Safe Browsing

The Safe Browsing dataset provides hash-based threat intelligence from the Google Safe Browsing API, enabling URL safety checks directly in BigQuery.

Overview

The dataset is sourced from the Google Safe Browsing API v5. It lets analysts check URLs against Google’s global threat lists directly in BigQuery, with no API key required.

The dataset is refreshed regularly and includes threat coverage for malware, phishing, unwanted software, and potentially harmful applications.

Available at:

  • Public Dataset on BigQuery Analytics Hub

Provider: Google

Usage

After subscribing to the dataset, check URLs for threats using the bundled stored procedure:

CALL `safebrowsing.check_urls`([
  'https://example.com',
  'http://suspicious-site.example/'
]);

Results include the matched URL, hash prefix, list name, threat types, and a description of the threat category.
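
To make the shape of a result row concrete, the Python sketch below imitates a prefix lookup in memory. It is an illustration under assumptions, not the procedure's actual implementation: the function `match_prefix`, the result keys, the placeholder prefix `00000000`, and the placeholder description are all hypothetical. Only the list name `mw-4b` and the threat type `MALWARE` come from the schema examples on this page.

```python
# Hypothetical stand-ins for rows of the hash_lists and hash_entries tables.
HASH_LISTS = [
    {
        "name": "mw-4b",
        "metadata": {
            "threat_types": ["MALWARE"],
            "description": "placeholder description",  # not real data
        },
    },
]
HASH_ENTRIES = [
    {"hash_prefix": "00000000", "list_name": "mw-4b"},  # placeholder prefix
]


def match_prefix(url: str, hash_prefix: str) -> list[dict]:
    """Return one result dict per threat-list entry whose prefix matches."""
    lists_by_name = {row["name"]: row for row in HASH_LISTS}
    results = []
    for entry in HASH_ENTRIES:
        if entry["hash_prefix"] != hash_prefix:
            continue
        # Join the entry to its list metadata via list_name = name.
        metadata = lists_by_name[entry["list_name"]]["metadata"]
        results.append(
            {
                "url": url,
                "hash_prefix": entry["hash_prefix"],
                "list_name": entry["list_name"],
                "threat_types": metadata["threat_types"],
                "description": metadata["description"],
            }
        )
    return results


if __name__ == "__main__":
    print(match_prefix("http://suspicious-site.example/", "00000000"))
```

In the Safe Browsing protocol a prefix match on its own can be a false positive, because many full hashes share the same 4-byte prefix. A match is best treated as a signal for further verification.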

Schema

hash_lists

Column                  Description
name                    Threat list identifier, e.g. mw-4b
metadata.threat_types   Array of threat type strings, e.g. ["MALWARE"]
metadata.description    Human-readable description of the threat list
ingested_at             Timestamp of last ingestion

hash_entries

Column              Description
hash_prefix         4-byte SHA256 prefix as hex string
list_name           Threat list this entry belongs to
hash_length_bytes   Length of the hash prefix (always 4 for this dataset)
version             List version token
partial_update      Whether this was a partial update
ingested_at         Timestamp of last ingestion
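
As a rough illustration of how a `hash_prefix` value relates to a URL, the Python sketch below takes the first 4 bytes of a SHA256 digest and renders them as hex. It rests on several assumptions: the helper name `sha256_prefix_hex` is hypothetical, the input `example.com/` stands in for an already canonicalized URL expression (Safe Browsing hashes host and path expressions rather than raw URLs, and that expansion step is omitted here), and lowercase hex is a guess about how the dataset encodes prefixes.

```python
import hashlib


def sha256_prefix_hex(expression: str, length_bytes: int = 4) -> str:
    """Return the first `length_bytes` bytes of SHA256(expression) as hex."""
    digest = hashlib.sha256(expression.encode("utf-8")).digest()
    return digest[:length_bytes].hex()


if __name__ == "__main__":
    prefix = sha256_prefix_hex("example.com/")
    # A 4-byte prefix is always 8 hex characters long.
    print(prefix, len(prefix))
```

Since `hash_length_bytes` is always 4 in this dataset, every `hash_prefix` value should be 8 hex characters long.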