fix(get_url_parameter): only match complete parameter names #1080

Open
toseekandfind wants to merge 1 commit into dbt-labs:main from toseekandfind:fix/get-url-parameter-substring-match

resolves #980
(also resolves the duplicate report in #1014)

Problem

dbt_utils.get_url_parameter silently returns the wrong value when the
requested parameter name is a suffix of a different parameter name that
actually exists in the URL.

The macro splits the URL on <url_parameter>=, so any parameter whose name
ends with the search token matches. Two examples from the linked issues:

  • get_url_parameter('url', 'g') against http://example.com/?msg=myvalue
    returns myvalue (because msg= contains g=).
  • get_url_parameter('url', 'ku') against
    http://example.com/?sku=EXAMPLE_SKU&u=... returns EXAMPLE_SKU.

The expected behaviour is to return NULL whenever the URL does not contain
a parameter whose full name matches the search token.
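
The failure mode can be reproduced outside a warehouse with a small Python model of the pre-fix logic. This is only a sketch: `split_part` and `old_get_url_parameter` are hypothetical helper names, the second URL is a simplified stand-in for the one in the issue, and mapping an empty result to `None` (standing in for SQL NULL) is an assumption about how the macro finishes.

```python
def split_part(text, delimiter, part_number):
    """Model of SQL split_part: 1-indexed, empty string when the part is missing."""
    parts = text.split(delimiter)
    return parts[part_number - 1] if part_number <= len(parts) else ""


def old_get_url_parameter(url, param):
    """Pre-fix behaviour as described above: split on '<param>=', then cut at '&'."""
    value = split_part(split_part(url, param + "=", 2), "&", 1)
    # Assumption: an empty result is surfaced as NULL (None here).
    return value or None


# 'g=' is the tail of 'msg=', so the split lands inside the longer name.
print(old_get_url_parameter("http://example.com/?msg=myvalue", "g"))       # myvalue
print(old_get_url_parameter("http://example.com/?sku=EXAMPLE_SKU", "ku"))  # EXAMPLE_SKU
```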

Solution

Normalize the URL so every parameter is preceded by &, then split on
&<url_parameter>=. Requiring the leading & in the search token means
only complete parameter names match.

The normalization uses cross-database primitives that dbt-utils already
relies on elsewhere (dbt.replace, dbt.concat, dbt.split_part), so the
change stays dispatch-friendly and adapter-neutral:

{%- set formatted_url_parameter = "'&" + url_parameter + "='" -%}
{%- set normalized_field = dbt.concat(["'&'", dbt.replace(field, "'?'", "'&'")]) -%}
{%- set split = dbt.split_part(dbt.split_part(normalized_field, formatted_url_parameter, 2), "'&'", 1) -%}
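
For readers who want to trace the three lines by hand, the same steps can be mirrored in plain Python. This is a rough model rather than the macro itself: `split_part` and `new_get_url_parameter` are hypothetical names, the example URLs are illustrative, and returning `None` for an empty result is an assumption standing in for SQL NULL.

```python
def split_part(text, delimiter, part_number):
    """Model of SQL split_part: 1-indexed, empty string when the part is missing."""
    parts = text.split(delimiter)
    return parts[part_number - 1] if part_number <= len(parts) else ""


def new_get_url_parameter(url, param):
    """Mirror of the normalized lookup: '?' becomes '&', and '&' is prepended."""
    normalized = "&" + url.replace("?", "&")
    value = split_part(split_part(normalized, "&" + param + "=", 2), "&", 1)
    # Assumption: an empty result is surfaced as NULL (None here).
    return value or None


url = "http://example.com/?msg=myvalue&m=1&s=2"
print(new_get_url_parameter(url, "g"))    # None: '&g=' never occurs
print(new_get_url_parameter(url, "msg"))  # myvalue
print(new_get_url_parameter(url, "m"))    # 1
print(new_get_url_parameter(url, "s"))    # 2
```

Because the delimiter now starts with `&`, a match can only begin at a parameter boundary, which is what rules out the suffix matches.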

Alternatives considered:

  • Regex-based extraction (regexp_substr / regexp_match): cleanest
    semantically, but each warehouse uses a different regex function and
    argument order, so it would require per-adapter dispatch. Sticking with
    split_part / replace / concat keeps the macro single-implementation.
  • Prepend only inside the search token (&<param>=) without also
    prepending to field: fails when the requested parameter is the first
    one in the query string after normalization, because there is no leading
    & in the URL itself.
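
For comparison, the regex alternative would look roughly like the following in Python. It is shown only to illustrate the semantics that were rejected for portability reasons; `regex_get_url_parameter` is a hypothetical name, and the pattern is an assumption about what a warehouse-specific regex implementation might use, not code from this PR.

```python
import re


def regex_get_url_parameter(url, param):
    """Match '<param>=' only at a parameter boundary ('?' or '&')."""
    pattern = r"[?&]" + re.escape(param) + r"=([^&]*)"
    match = re.search(pattern, url)
    # An absent parameter or an empty value is reported as None (SQL NULL).
    return (match.group(1) or None) if match else None


print(regex_get_url_parameter("http://example.com/?msg=myvalue", "g"))    # None
print(regex_get_url_parameter("http://example.com/?msg=myvalue", "msg"))  # myvalue
```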

Tests

Added a singular test at
integration_tests/tests/web/test_get_url_parameter_substring.sql that
covers both regression cases and the legitimate short-name cases:

  • Short-name suffix regressions (g against ?msg=..., ku against
    ?sku=...) must return NULL.
  • Single-character parameter names (?m=..., ?s=...) must still resolve
    correctly.
  • Pre-existing behaviour (utm_medium, utm_source) is unchanged.

The test compares actual and expected with NULL-safe equality so
NULL = NULL is treated as a pass rather than "unknown" (which would
otherwise let regressions slip through the existing != comparison).
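
The difference between the two comparisons can be seen in a small SQLite session driven from Python. This is an illustration under stated assumptions, not the test added in this PR: the `results` table and its rows are made up, and SQLite's `IS NOT` is used as the NULL-safe operator (Postgres spells the same idea `IS DISTINCT FROM`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table results (label text, actual text, expected text)")
conn.executemany(
    "insert into results values (?, ?, ?)",
    [
        ("regression", "myvalue", None),    # buggy output where NULL is expected
        ("correct_null", None, None),       # fixed output, NULL as expected
        ("correct_value", "email", "email"),
    ],
)

# Plain inequality: any comparison with NULL is unknown, so no row is returned.
plain = [row[0] for row in conn.execute(
    "select label from results where actual != expected")]

# NULL-safe inequality: NULL vs NULL is equal, NULL vs a value is different.
null_safe = [row[0] for row in conn.execute(
    "select label from results where actual is not expected")]

print(plain)      # []
print(null_safe)  # ['regression']
```

A dbt singular test fails when its query returns rows, so only the NULL-safe form reports the regression row.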

Verified locally against dbt-postgres (make setup-db + dbt build):

  • The new test fails on main with 2 failing rows (confirming it
    exercises the bug).
  • The new test passes on this branch.
  • Existing test_urls, test_url_host, and test_url_path tests all
    still pass.

Checklist

  • This code is associated with an issue which has been triaged and accepted for development.
  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the README.md (if applicable) — not applicable; public API unchanged

Previously, `get_url_parameter` split the URL on `<param>=`, which also
matched when the search token appeared as a *suffix* of a longer
parameter name. For example, calling `get_url_parameter(url, 'g')`
against `http://example.com/?msg=myvalue` incorrectly returned
`myvalue` because the `msg=` substring contains `g=`.

The fix normalizes the URL so every parameter is preceded by `&`
(using `dbt.replace` to turn `?` into `&`, and prepending `&` with
`dbt.concat` so the first parameter is also delimited), then splits
on `&<param>=`. Requiring the leading `&` in the search token means
only complete parameter names match.

Resolves dbt-labs#980 (also resolves the duplicate report in dbt-labs#1014).

@toseekandfind requested a review from a team as a code owner on April 18, 2026, 22:54

Development

Successfully merging this pull request may close these issues.

dbt_utils.get_url_parameter will incorrectly match fields if they end with url_parameter