[GH-ISSUE #4] Filter out invalid characters server side #3

Closed
opened 2026-02-27 23:19:56 +03:00 by kerem · 11 comments
Owner

Originally created by @freekmurze on GitHub (Oct 27, 2017).
Original GitHub issue: https://github.com/spatie/dnsrecords.io/issues/4

Originally assigned to: @freekmurze on GitHub.

Only alphanumeric characters and a . should be accepted. Let's just filter those out.

Originally created by @freekmurze on GitHub (Oct 27, 2017). Original GitHub issue: https://github.com/spatie/dnsrecords.io/issues/4 Originally assigned to: @freekmurze on GitHub. Only alphanumeric characters and a `.` should be accepted. Let's just filter those out.
kerem 2026-02-27 23:19:56 +03:00
Author
Owner

@brendt commented on GitHub (Oct 27, 2017):

We're going to need a little more than just alphanumeric characters and a ., eg. - and _.

This post (which is based on the URI spec) lists a lot more: https://stackoverflow.com/questions/1547899/which-characters-make-a-url-invalid/1547940#1547940

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=`.

All of these are valid URI characters. ? eg. doesn't make sense in a domain name, but people could still paste a whole URL, which should be able to be parsed. I think PHP's parse_url might actually be a good solution, because than you shouldn't worry about parsing the input at all. If it's a valid URL, parse_url($url, PHP_URL_HOST) will return the host, otherwise it will return null. http:// or https:// should be present though.

<!-- gh-comment-id:339958375 --> @brendt commented on GitHub (Oct 27, 2017): We're going to need a little more than just alphanumeric characters and a `.`, eg. `-` and `_`. This post (which is based on the URI spec) lists a lot more: https://stackoverflow.com/questions/1547899/which-characters-make-a-url-invalid/1547940#1547940 ``` ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=`. ``` All of these are valid URI characters. `?` eg. doesn't make sense in a domain name, but people could still paste a whole URL, which should be able to be parsed. I think PHP's `parse_url` might actually be a good solution, because than you shouldn't worry about parsing the input at all. If it's a valid URL, `parse_url($url, PHP_URL_HOST)` will return the host, otherwise it will return null. `http://` or `https://` should be present though.
Author
Owner

@bbashy commented on GitHub (Oct 27, 2017):

@brendt It removes those https://github.com/spatie/dnsrecords.io/blob/master/app/Http/Controllers/HomeController.php#L81

<!-- gh-comment-id:339958504 --> @bbashy commented on GitHub (Oct 27, 2017): @brendt It removes those https://github.com/spatie/dnsrecords.io/blob/master/app/Http/Controllers/HomeController.php#L81
Author
Owner

@willemvb commented on GitHub (Oct 27, 2017):

Side note: what with top level domains?
Eg. . as first character might need some extra attention:

  • be currently works
  • . currently works
  • .be fails
<!-- gh-comment-id:339958690 --> @willemvb commented on GitHub (Oct 27, 2017): Side note: what with top level domains? Eg. `.` as first character might need some extra attention: - `be` currently works - `.` currently works - `.be` fails
Author
Owner

@brendt commented on GitHub (Oct 27, 2017):

@bbashy that's perfect, no?

protected function sanitizeInput(string $input = ''): string
{
    $input = str_replace(['http://', 'https://'], '', $input);

    $input = parse_url("http://{$input}", PHP_URL_HOST);

    return strtolower($input);
}
<!-- gh-comment-id:339959181 --> @brendt commented on GitHub (Oct 27, 2017): @bbashy that's perfect, no? ```php protected function sanitizeInput(string $input = ''): string { $input = str_replace(['http://', 'https://'], '', $input); $input = parse_url("http://{$input}", PHP_URL_HOST); return strtolower($input); } ```
Author
Owner

@bbashy commented on GitHub (Oct 27, 2017):

@brendt Yeah something like that is fine apart from you need to remove http(s):// since dig doesn't accept it.

<!-- gh-comment-id:339959756 --> @bbashy commented on GitHub (Oct 27, 2017): @brendt Yeah something like that is fine apart from you need to remove http(s):// since `dig` doesn't accept it.
Author
Owner

@bbashy commented on GitHub (Oct 27, 2017):

Not like #9

<!-- gh-comment-id:339960276 --> @bbashy commented on GitHub (Oct 27, 2017): Not like #9
Author
Owner

@brendt commented on GitHub (Oct 27, 2017):

@bbashy parse_url also automatically filters out http:// and https://. You just have to make sure the protocol is there to be able to parse it: https://3v4l.org/8nqnA

<!-- gh-comment-id:339960623 --> @brendt commented on GitHub (Oct 27, 2017): @bbashy `parse_url` also automatically filters out `http://` and `https://`. You just have to make sure the protocol is there to be able to parse it: https://3v4l.org/8nqnA
Author
Owner

@brendt commented on GitHub (Oct 27, 2017):

Also @willemvb parse_url addresses that issue: https://3v4l.org/CI9pp

<!-- gh-comment-id:339961490 --> @brendt commented on GitHub (Oct 27, 2017): Also @willemvb `parse_url` addresses that issue: https://3v4l.org/CI9pp
Author
Owner

@bbashy commented on GitHub (Oct 27, 2017):

Ah yeah, of course!

<!-- gh-comment-id:339961734 --> @bbashy commented on GitHub (Oct 27, 2017): Ah yeah, of course!
Author
Owner

@brendt commented on GitHub (Oct 27, 2017):

@willemvb I was wrong about parse_url addressing the be issue.

To be complete:

  • .be is not a valid domain, but for better user experience, it should work and be filtered.
  • be. is a valid domain, but doesn't work with parse_url.
  • . also doesn't work, but is a valid domain (the root domain). This exception should also work.

So these three exceptions should be handled before parse_url; but maybe than parse_url might not be needed at all.

<!-- gh-comment-id:339963997 --> @brendt commented on GitHub (Oct 27, 2017): @willemvb I was wrong about `parse_url` addressing the `be` issue. To be complete: - `.be` is not a valid domain, but for better user experience, it should work and be filtered. - `be.` is a valid domain, but doesn't work with `parse_url`. - `.` also doesn't work, but is a valid domain (the root domain). This exception should also work. So these three exceptions should be handled before `parse_url`; but maybe than `parse_url` might not be needed at all.
Author
Owner

@freekmurze commented on GitHub (Oct 27, 2017):

Fixed in #9

<!-- gh-comment-id:339969977 --> @freekmurze commented on GitHub (Oct 27, 2017): Fixed in #9
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/dnsrecords.io#3
No description provided.