
Add hashed_endpoint to decrease duplicates over time #11

@bonrow

Description


Add a hashed_endpoint column to prevent multiple shortened URLs with the same configuration from pointing to the same endpoint.

Since each endpoint is encrypted individually with a randomly generated per-URL IV (seed), the same endpoint never produces the same ciphertext twice. The current implementation therefore cannot check for duplicate endpoints without decrypting every row, which would be a very bad idea and very slow.
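To illustrate why a plain equality check on the encrypted column cannot work, here is a minimal Java sketch. It assumes AES-GCM with a fresh random 12-byte IV per call; the issue does not say which cipher the UrlCryptography interface actually uses, so the cipher choice and the RandomIvDemo class are assumptions for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Illustrative only: AES-GCM with a random IV per call (assumed; the project's real setup may differ). */
class RandomIvDemo {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int IV_LENGTH_BYTES = 12;   // 96-bit IV, the usual size for GCM
    private static final int TAG_LENGTH_BITS = 128;  // full-length authentication tag

    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns IV || ciphertext, where the IV is freshly random on every call. */
    static byte[] encrypt(SecretKey key, String endpoint) {
        try {
            byte[] iv = new byte[IV_LENGTH_BYTES];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_LENGTH_BITS, iv));
            byte[] ciphertext = cipher.doFinal(endpoint.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[iv.length + ciphertext.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(ciphertext, 0, out, iv.length, ciphertext.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        byte[] first = encrypt(key, "https://google.com");
        byte[] second = encrypt(key, "https://google.com");
        // Same key and same endpoint, yet the stored bytes differ, so comparing them finds no duplicate.
        System.out.println(Arrays.equals(first, second));
    }
}
```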

Solution:
Add a hashed_endpoint column to the database, computed with the same hashing algorithm used in the UrlCryptography interface and a salt unique to the project. Because the salt is shared across all rows, equal endpoints produce equal hashes. The column stores the endpoint only in hashed form, which keeps it private while still allowing fast duplicate checks (this requires an index on the column).
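A minimal sketch of computing the hashed_endpoint value, assuming SHA-256 over the project salt followed by the UTF-8 bytes of the endpoint. The algorithm actually used by UrlCryptography is not stated in the issue, and EndpointHasher is a hypothetical name. Unlike a per-URL random IV, the salt is fixed for the whole project, so the output is deterministic and can be indexed and compared directly.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

/** Hypothetical helper: computes the value stored in the hashed_endpoint column (SHA-256 assumed). */
final class EndpointHasher {
    private final byte[] projectSalt;

    EndpointHasher(byte[] projectSalt) {
        this.projectSalt = projectSalt.clone();  // one salt for the whole project, not one per row
    }

    /** Deterministic: the same endpoint always yields the same 64-character hex digest. */
    String hash(String endpoint) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(projectSalt);
            digest.update(endpoint.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        EndpointHasher hasher = new EndpointHasher("example-project-salt".getBytes(StandardCharsets.UTF_8));
        // Both calls print the same digest, which is what makes an indexed equality lookup possible.
        System.out.println(hasher.hash("https://google.com"));
        System.out.println(hasher.hash("https://google.com"));
    }
}
```

If the salt is meant to stay secret, a keyed construction such as HMAC-SHA256 (javax.crypto.Mac with "HmacSHA256", keyed by the salt) is a common alternative to plain salted hashing.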

User experience:
When a user shortens a URL whose endpoint has already been shortened with the same configuration, no new shortened URL is generated; the existing one is returned to the user instead.

When are two shortened URLs equal to each other?
Two shortened URLs are equal if their configurations (e.g. expiration date, one-time use, password) and their hashed endpoints are equal. Optionally, a unique index spanning all of these columns can enforce this behaviour at the database level in case the backend services fail (though this is not necessarily recommended).
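The equality rule maps naturally onto a value type. In this Java 16+ sketch, ShortUrlKey and its fields are hypothetical and cover only the config options named in the issue; the compiler-generated equals and hashCode of a record compare every component, so two keys are equal exactly when the hashed endpoint and every config value match.

```java
import java.time.Instant;
import java.util.Objects;

/**
 * Hypothetical duplicate key: two shortened URLs are equal when all components match.
 * Fields are assumptions based on the examples in the issue (expiration, one-time use, password).
 */
record ShortUrlKey(String hashedEndpoint, Instant expiresAt, boolean oneTimeUse, String passwordHash) {
    ShortUrlKey {
        Objects.requireNonNull(hashedEndpoint, "hashedEndpoint");
        // expiresAt and passwordHash may be null, meaning "no expiration" and "no password".
        // Comparing passwordHash assumes passwords are stored in a directly comparable form.
    }

    public static void main(String[] args) {
        Instant expiry = Instant.parse("2030-01-01T00:00:00Z");
        ShortUrlKey a = new ShortUrlKey("ab12", expiry, false, null);
        ShortUrlKey b = new ShortUrlKey("ab12", expiry, false, null);
        ShortUrlKey noExpiry = new ShortUrlKey("ab12", null, false, null);
        System.out.println(a.equals(b));         // same endpoint hash and same config
        System.out.println(a.equals(noExpiry));  // same endpoint hash, different config
    }
}
```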

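What if two URLs are equal, but both are one-time use?
If shortened URLs A and B are otherwise equal and both are one-time use, they are not treated as duplicates: a one-time URL is consumed on its first use, so sharing it would hand one user a link that another user may already have used up.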
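Putting these rules together, a reuse-or-create flow might look like the following sketch. The in-memory map stands in for the database and its index; ShortUrlService, Config and the 8-character code generator are hypothetical placeholders, and one-time-use requests deliberately bypass the lookup so they always receive a fresh code.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;

/** Sketch of "return the existing short URL or create a new one"; all names are hypothetical. */
final class ShortUrlService {

    /** Config fields taken from the issue's examples; passwordHash is assumed to be directly comparable. */
    record Config(Instant expiresAt, boolean oneTimeUse, String passwordHash) {}

    /** Duplicate key: hashed endpoint plus the full configuration. */
    private record Key(String hashedEndpoint, Config config) {}

    // Stand-in for the database table and its index on the duplicate-key columns.
    private final Map<Key, String> codesByKey = new HashMap<>();

    /** Returns the existing code for an equal URL, or creates and stores a new one. */
    String shorten(String hashedEndpoint, Config config) {
        Objects.requireNonNull(hashedEndpoint, "hashedEndpoint");
        Objects.requireNonNull(config, "config");
        if (config.oneTimeUse()) {
            // One-time URLs are never shared, so every request gets its own code.
            return newCode();
        }
        return codesByKey.computeIfAbsent(new Key(hashedEndpoint, config), k -> newCode());
    }

    private static String newCode() {
        return UUID.randomUUID().toString().substring(0, 8);  // placeholder code generator
    }

    public static void main(String[] args) {
        ShortUrlService service = new ShortUrlService();
        Config plain = new Config(null, false, null);
        System.out.println(service.shorten("ab12", plain));  // creates a code
        System.out.println(service.shorten("ab12", plain));  // reuses the same code
    }
}
```

In a real deployment the lookup and the insert would need to be atomic (for example, inside a transaction or backed by the optional unique index); otherwise two concurrent requests could both miss the lookup and create separate rows.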

Why would we implement this?

  • Saves storage: over time, this avoids a lot of wasted storage, since most users probably shorten a URL without configuring it further (no password, one-time use, etc.).
  • Smaller indexes and faster lookups: commonly shortened URLs (such as https://google.com) will not create many index entries for the same endpoint.
  • Helps prevent spam: repeatedly shortening the same URL no longer creates a new row each time.

Metadata

Labels: enhancement (New feature or request), wontfix (This will not be worked on)
Projects: none
Milestone: none
Relationships: none yet
Development: no branches or pull requests
