- Application Engineering
- November 25, 2024
Gaurav Shukla
Introduction
Hello readers, and welcome! Let's learn how to check URL validity using `curl`. In today's interconnected world, URLs (Uniform Resource Locators) form the backbone of web interactions. Whether it's a web application retrieving data from an API or a system admin maintaining a list of external resources, ensuring that a URL is both well-formed and reachable is crucial.
Manually checking URLs can be time-consuming and error-prone, but with the power of shell scripting in Linux, we can automate this task. In this guide, I’ll walk you through creating a shell script that checks both the validity and reachability of a URL.
By the end of this post, you’ll have a deeper understanding of:
- How URL validation works.
- How to check a URL’s availability using `curl`.
- How to create a fully functioning URL validation script in Linux.
Why Check URL Validity?
Before diving into the implementation, let’s understand why it’s important to check URLs:
- Data Integrity: Ensuring that the URLs you work with are well-formed and lead to accessible resources prevents broken links and data integrity issues.
- Automation: Many tools and services rely on URLs, whether it’s fetching remote files, communicating with APIs, or even monitoring services. Broken or incorrect URLs can lead to failed processes.
- System Monitoring: In production environments, system admins frequently need to monitor multiple URLs for uptime. A script that automates this process helps ensure services are running smoothly without manual checks.
Step-by-Step Guide:
Step 1: Validating the URL Format
A URL has a specific format that consists of several components: the protocol, domain name, optional port, and the path. A malformed URL can lead to errors even before a network request is made. Therefore, it’s essential to validate the format of the URL first.
A typical URL looks like this:
`https://example.com:8080/path/to/resource`

- Protocol: `https://`
- Domain Name: `example.com`
- Port: `8080` (optional; default ports are 80 for HTTP and 443 for HTTPS)
- Path: `/path/to/resource` (optional)
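To make these components concrete, here is a minimal sketch that splits a URL into its parts using bash's built-in regex engine; the pattern mirrors the validation regex used later in this post, and the variable names (`proto`, `domain`, `port`, `url_path`) are introduced here purely for illustration:

```shell
#!/bin/bash
# Sketch: split a URL into its components using bash's =~ operator.
# Capture groups land in the BASH_REMATCH array.

url="https://example.com:8080/path/to/resource"

if [[ $url =~ ^(https?://)?([a-zA-Z0-9.-]+)(:[0-9]{1,5})?(/.*)?$ ]]; then
    proto="${BASH_REMATCH[1]}"
    domain="${BASH_REMATCH[2]}"
    port="${BASH_REMATCH[3]#:}"    # strip the leading ':'
    url_path="${BASH_REMATCH[4]}"
    echo "Protocol: $proto  Domain: $domain  Port: $port  Path: $url_path"
fi
```

Running this against the example URL above prints each component on one line.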
We use regular expressions (regex) to check if a URL is formatted correctly. Regex allows us to define patterns and match strings against these patterns.
Regex Breakdown for URL Validation
Here’s the regex pattern used in the script to validate URLs:
`^(https?://)?([a-zA-Z0-9.-]+)(:[0-9]{1,5})?(/.*)?$`
Let’s break this down:
- `^` : Anchors the regex at the start of the string.
- `(https?://)?` : Matches the protocol, making `http://` or `https://` optional (the `?` means optional).
- `([a-zA-Z0-9.-]+)` : Matches the domain name, allowing alphanumeric characters, dots, and hyphens.
- `(:[0-9]{1,5})?` : Optionally matches the port number (1 to 5 digits).
- `(/.*)?$` : Matches the optional path, where `.*` allows any characters after a `/`, and `$` anchors the end of the string.
Example Regex Matches:
- `https://example.com` (valid)
- `http://example.com:8080/path/to/file` (valid)
- `ftp://example.com` (invalid, wrong protocol)
- `example@com` (invalid domain)
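A quick way to confirm these outcomes is to run the regex against each sample URL in a loop; this is a small sketch, with the `results` variable introduced here just to collect the verdicts:

```shell
#!/bin/bash
# Sketch: exercise the validation regex against the four sample URLs above.

regex='^(https?://)?([a-zA-Z0-9.-]+)(:[0-9]{1,5})?(/.*)?$'
results=""

for url in "https://example.com" "http://example.com:8080/path/to/file" \
           "ftp://example.com" "example@com"; do
    if [[ $url =~ $regex ]]; then
        results+="valid "
    else
        results+="invalid "
    fi
done

echo "$results"   # the first two URLs match, the last two do not
```

Note that `ftp://example.com` fails because the port group only accepts digits after the colon, so the `://` after `ftp` has nothing left to match.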
Step 2: Checking URL Reachability Using curl
Once we’ve validated the URL format, the next step is to check whether the URL is reachable. This is where the `curl` command comes into play.
`curl` is a powerful Linux tool for transferring data to and from URLs. For our purpose, we’ll use it to send a HEAD request to the URL. A HEAD request retrieves only the HTTP headers, which lets us check the status of the URL without downloading the entire content.
Using `curl` for Reachability
The script uses the following command to check the URL’s HTTP status:
```shell
response=$(curl -o /dev/null --silent --head --write-out '%{http_code}' "$1")
```
Here’s what this command does:
- `-o /dev/null` : Discards the output since we only need the HTTP headers.
- `--silent` : Suppresses the progress bar and error messages to keep the output clean.
- `--head` : Sends a HEAD request (retrieves only headers, not the body).
- `--write-out '%{http_code}'` : Prints the HTTP status code (e.g., 200 for success, 404 for not found).
- `"$1"` : Refers to the URL passed as a script argument.
Example Status Codes:
- `200 OK` : The URL is valid and reachable.
- `301 Moved Permanently` : The URL has been redirected.
- `404 Not Found` : The resource does not exist at this URL.
- `500 Internal Server Error` : The server encountered an error.
Step 3: The Complete Shell Script
Now that we’ve covered the basics, let’s put it all together in the final script:
```shell
#!/bin/bash

# Function to validate the URL format
validate_url() {
    if [[ $1 =~ ^(https?://)?([a-zA-Z0-9.-]+)(:[0-9]{1,5})?(/.*)?$ ]]; then
        echo "Valid URL format."
    else
        echo "Invalid URL format."
        exit 1
    fi
}

# Function to check if the URL is reachable
check_url_reachability() {
    response=$(curl -o /dev/null --silent --head --write-out '%{http_code}' "$1")
    if [[ "$response" -eq 200 ]]; then
        echo "URL is reachable (Status Code: 200)."
    else
        echo "URL is not reachable. Status Code: $response"
    fi
}

# Main script execution
if [[ -z $1 ]]; then
    echo "Usage: ./check_url_validity.sh <URL>"
    exit 1
fi

URL=$1

# Validate the URL format
validate_url "$URL"

# Check the URL reachability
check_url_reachability "$URL"
```
Step 4: Running the Script
- Save the script: Save the above code in a file named `check_url_validity.sh`.
- Make it executable: Run the following command to make the script executable.
- Run the script: Pass any URL to check its validity and reachability.
```shell
# Make it executable
chmod +x check_url_validity.sh

# Run the script
./check_url_validity.sh https://www.example.com
```
The script will output either:
- Valid URL format and reachable, or
- Error messages depending on the validation and reachability checks.
Example: running `./check_url_validity.sh https://www.example.com` first prints `Valid URL format.`, then the reachability result, e.g. `URL is reachable (Status Code: 200).` if the site responds successfully.
Conclusion
By combining simple but powerful Linux tools like regex and `curl`, we’ve created a robust shell script that can check both the validity and reachability of URLs. This kind of automation can be extremely useful for system administrators, developers, or anyone managing large numbers of URLs.
This script can be a foundation for further automation, allowing you to manage URLs efficiently, prevent errors in production environments, and even integrate with monitoring tools. Try it out and see how it simplifies your workflow! Happy Reading!!!
Gaurav Shukla
Gaurav Shukla is a Software Consultant specializing in DevOps at NashTech, with over 2 years of hands-on experience in the field. Passionate about streamlining development pipelines and optimizing cloud infrastructure, he has worked extensively on Azure migration projects, Kubernetes orchestration, and CI/CD implementations. His proficiency in tools like Jenkins, Azure DevOps, and Terraform ensures that he delivers efficient, reliable software development workflows, contributing to seamless operational efficiency.