Table of Contents
- On-Page SEO Analyzer
- Overview
- Features
- Fetch and Parse HTML
- Meta Data Extraction
- Heading Analysis
- Readability Check
- Image Analysis
- Link Analysis
- Call-to-Action (CTA) Suggestions
- Canonical and Alternate URLs
- Schema Markup Extraction
- Content Data Extraction
- Open Graph Data
- Social Tags Extraction
- Page Speed Check
- Mobile Usability Check
- Alt Text Check
- Fetch SEO Data
- CSV Download
- Analyze On-Page SEO
- Usage
- Detailed Function Descriptions
- fetch_and_parse_html(url)
- extract_meta_data(soup)
- analyze_headings(soup)
- check_readability(text)
- analyze_images(soup, url)
- analyze_links(soup)
- suggest_ctas(soup)
- extract_alternates_and_canonicals(soup)
- extract_schema_markup(soup)
- extract_content_data(soup, url)
- extract_open_graph(soup)
- extract_social_tags(soup)
- check_page_speed(url)
- check_mobile_usability(soup)
- check_alt_text(soup)
- fetch_seo_data(url)
- download_csv(data, filename='seo_data.csv')
- analyze_onpage_seo()
- License
- Contributing
On-Page SEO Analyzer
Overview
The on_page_seo_analyzer.py module analyzes the on-page SEO of a webpage. It uses libraries such as requests, streamlit, bs4 (BeautifulSoup), and cloudscraper to fetch, parse, and analyze a page's content and report detailed SEO insights.
Features
Fetch and Parse HTML
fetch_and_parse_html(url): Fetches HTML content from the given URL using CloudScraper and parses it with BeautifulSoup.
Meta Data Extraction
extract_meta_data(soup): Extracts metadata such as title, description, robots directives, viewport, charset, and language from the parsed HTML.
Heading Analysis
analyze_headings(soup): Analyzes the headings (H1 to H6) on the webpage.
Readability Check
check_readability(text): Checks the readability score of the text using the textstat library.
Image Analysis
analyze_images(soup, url): Analyzes the images on the webpage, including their src and alt text.
Link Analysis
analyze_links(soup): Identifies broken internal and external links on the webpage.
Call-to-Action (CTA) Suggestions
suggest_ctas(soup): Suggests call-to-action phrases present on the webpage.
Canonical and Alternate URLs
extract_alternates_and_canonicals(soup): Extracts canonical URL, hreflangs, and mobile alternate links from the parsed HTML.
Schema Markup Extraction
extract_schema_markup(soup): Extracts schema markup data from the parsed HTML.
Content Data Extraction
extract_content_data(soup, url): Extracts content data such as text length, headers, and insights about images and links.
Open Graph Data
extract_open_graph(soup): Extracts Open Graph data from the parsed HTML.
Social Tags Extraction
extract_social_tags(soup): Extracts Twitter Card and Facebook Open Graph data from the parsed HTML.
Page Speed Check
check_page_speed(url): Fetches and analyzes page speed metrics using the Google PageSpeed Insights API.
Mobile Usability Check
check_mobile_usability(soup): Checks if the website is mobile-friendly based on viewport and other elements.
Alt Text Check
check_alt_text(soup): Checks if all images have alt text.
Fetch SEO Data
fetch_seo_data(url): Fetches SEO-related data from the provided URL and returns a dictionary with results.
CSV Download
download_csv(data, filename='seo_data.csv'): Downloads the SEO data as a CSV file.
Analyze On-Page SEO
analyze_onpage_seo(): Main function to analyze on-page SEO using Streamlit.
Usage
Installation
To use this module, you need to have the following Python packages installed:
- requests
- streamlit
- beautifulsoup4
- cloudscraper
- pandas
- plotly
- tenacity
- validators
- readability
- textstat
- Pillow
You can install these packages using pip:
```shell
pip install requests streamlit beautifulsoup4 cloudscraper pandas plotly tenacity validators readability textstat Pillow
```
Example
```python
# Launch this script with: streamlit run <your_script>.py
from on_page_seo_analyzer import analyze_onpage_seo

if __name__ == "__main__":
    analyze_onpage_seo()
```
Detailed Function Descriptions
fetch_and_parse_html(url)
Fetches HTML content from the given URL using CloudScraper and parses it with BeautifulSoup.
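A minimal sketch of the retry and URL-validation logic that typically surrounds this fetch step, assuming nothing about the module's internals. It uses only the standard library: `urlparse` stands in for the validators package and a hand-written exponential backoff loop stands in for tenacity. `fetch_with_retries`, `is_valid_http_url`, and the `fetch` callable are hypothetical names.

```python
import time
from typing import Callable
from urllib.parse import urlparse


def is_valid_http_url(url: str) -> bool:
    """Hypothetical helper: accept only absolute http(s) URLs that have a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def fetch_with_retries(fetch: Callable[[str], str], url: str,
                       attempts: int = 3, base_delay: float = 1.0) -> str:
    """Hypothetical helper: call fetch(url), retrying with exponential backoff on any exception."""
    if not is_valid_http_url(url):
        raise ValueError(f"Not a valid http(s) URL: {url!r}")
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * 2 ** attempt)  # waits base, 2*base, 4*base, ...
    raise AssertionError("unreachable")
```

In the module itself, `fetch` could plausibly wrap something like `cloudscraper.create_scraper().get(u, timeout=10)`, with the returned HTML then passed to `BeautifulSoup(html, "html.parser")`; a fetcher that calls `raise_for_status()` would also retry on HTTP error codes. Taking the fetcher as a parameter keeps the retry logic testable without network access.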
extract_meta_data(soup)
Extracts metadata such as the title, description, and robots directives from the parsed HTML.
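The following sketch shows one way to pull these fields in a single pass. It uses the standard library's `html.parser` as a dependency-free stand-in for BeautifulSoup and takes a raw HTML string rather than a `soup` object; `MetaParser` and `meta_from_html` are hypothetical names.

```python
from html.parser import HTMLParser


class MetaParser(HTMLParser):
    """Hypothetical helper: collects title, common <meta> tags, charset and <html lang>."""

    def __init__(self):
        super().__init__()
        self.meta = {"title": None, "description": None, "robots": None,
                     "viewport": None, "charset": None, "lang": None}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "html":
            self.meta["lang"] = a.get("lang")
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (a.get("name") or "").lower()
            if name in ("description", "robots", "viewport"):
                self.meta[name] = a.get("content")
            if "charset" in a:
                self.meta["charset"] = a["charset"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.meta["title"] = (self.meta["title"] or "") + data


def meta_from_html(html: str) -> dict:
    parser = MetaParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    if parser.meta["title"]:
        parser.meta["title"] = parser.meta["title"].strip()
    return parser.meta
```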
analyze_headings(soup)
Analyzes the headings on the webpage.
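A sketch of heading analysis, again with `html.parser` standing in for BeautifulSoup. It records each heading's level and whitespace-normalized text, counts headings per level, and flags whether the page has exactly one H1 (a widely used SEO guideline rather than a hard requirement). `HeadingParser` and `heading_summary` are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingParser(HTMLParser):
    """Hypothetical helper: records (tag, text) for every <h1>..<h6> in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []     # list of (tag, text)
        self._current = None   # heading tag we are inside, if any
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._current, self._buf = tag, []

    def handle_endtag(self, tag):
        if tag == self._current:
            text = " ".join("".join(self._buf).split())  # collapse whitespace
            self.headings.append((tag, text))
            self._current = None

    def handle_data(self, data):
        if self._current:
            self._buf.append(data)  # includes text from nested tags like <em>


def heading_summary(html: str) -> dict:
    parser = HeadingParser()
    parser.feed(html)
    parser.close()
    counts = Counter(tag for tag, _ in parser.headings)
    return {
        "counts": {t: counts.get(t, 0) for t in sorted(HEADING_TAGS)},
        "headings": parser.headings,
        "single_h1": counts.get("h1", 0) == 1,
    }
```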
check_readability(text)
Checks the readability score of the text.
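One common readability score is Flesch Reading Ease, which combines average sentence length and average syllables per word; textstat exposes it as `textstat.flesch_reading_ease(text)`. Whether the module reports this particular score is an assumption. The sketch below reimplements the formula with a crude vowel-group syllable counter, so its results will differ somewhat from textstat's; the function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Hypothetical heuristic: count vowel groups, drop a silent trailing 'e', minimum one."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def score_text(text: str) -> float:
    """Hypothetical helper: naive sentence and word splitting, then the Flesch formula."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    syllables = sum(count_syllables(w) for w in words)
    return flesch_reading_ease(len(words), sentences, syllables)
```

Higher scores mean easier text; short sentences of one-syllable words can push the score above 100.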
analyze_images(soup, url)
Analyzes the images on the webpage, including their src and alt text.
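A sketch of collecting image records, using `html.parser` instead of BeautifulSoup. The `url` parameter matters because `src` values are often relative, so each one is resolved against the page URL with `urljoin`. `ImageParser` and `list_images` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageParser(HTMLParser):
    """Hypothetical helper: stores the attribute dict of every <img>."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images.append(dict(attrs))


def list_images(html: str, page_url: str) -> list:
    """Return one record per <img> with an absolute src and its alt text (None if absent)."""
    parser = ImageParser()
    parser.feed(html)
    parser.close()
    records = []
    for attrs in parser.images:
        src = attrs.get("src")
        records.append({
            # resolve relative paths against the page URL, as a browser would
            "src": urljoin(page_url, src) if src else None,
            "alt": attrs.get("alt"),
        })
    return records
```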
analyze_links(soup)
Identifies broken internal and external links on the webpage.
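Detecting broken links involves two steps: classifying each `href` as internal or external by comparing hosts, then requesting each target. The sketch below uses `html.parser` and `urllib` in place of BeautifulSoup and requests; `classify_links` and `is_broken` are hypothetical names, and treating any error status or network failure as "broken" is an assumed policy.

```python
import http.client
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href.strip())


def classify_links(html: str, page_url: str) -> dict:
    """Split <a href> targets into absolute internal and external http(s) URLs."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    host = urlparse(page_url).netloc.lower()
    result = {"internal": [], "external": []}
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript: and similar
        key = "internal" if parts.netloc.lower() == host else "external"
        result[key].append(absolute)
    return result


def is_broken(url: str, timeout: float = 10.0) -> bool:
    """Hypothetical check: send a HEAD request; an error status or network failure counts as broken."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return False  # urlopen raises HTTPError for 4xx/5xx responses
    except (OSError, http.client.HTTPException):
        return True  # HTTPError, URLError and timeouts are all OSError subclasses
```

Some servers answer HEAD with 405, so a production checker may need to fall back to GET. Comparing exact hosts also means `www.example.com` and `example.com` count as different sites here.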
suggest_ctas(soup)
Suggests call-to-action phrases present on the webpage.
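The module's actual CTA vocabulary isn't documented here, so the phrase list below is purely illustrative. The sketch extracts visible text (skipping `<script>` and `<style>`) with `html.parser`, normalizes whitespace and case, and matches phrases on word boundaries. `CTA_PHRASES`, `TextParser`, and `find_ctas` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical phrase list; not the module's actual vocabulary.
CTA_PHRASES = ["buy now", "sign up", "get started", "contact us",
               "learn more", "subscribe", "download", "free trial"]


class TextParser(HTMLParser):
    """Hypothetical helper: collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def find_ctas(html: str, phrases=CTA_PHRASES) -> list:
    parser = TextParser()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).lower().split())
    # word boundaries stop "download" from matching inside "downloads"
    return [p for p in phrases if re.search(r"\b" + re.escape(p) + r"\b", text)]
```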
extract_alternates_and_canonicals(soup)
Extracts canonical URL, hreflangs, and mobile alternate links from the parsed HTML.
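A sketch of reading canonical and alternate links from `<link>` tags, using `html.parser` as a stand-in for BeautifulSoup. It assumes hreflang alternates carry an `hreflang` attribute and mobile alternates carry a `media` attribute, the common markup pattern for separate mobile URLs. `LinkRelParser` and `alternates_and_canonical` are hypothetical.

```python
from html.parser import HTMLParser


class LinkRelParser(HTMLParser):
    """Hypothetical helper: reads canonical, hreflang and mobile-alternate <link> tags."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.hreflangs = {}           # hreflang code -> URL
        self.mobile_alternates = []   # [{"media": ..., "href": ...}]

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rel = (a.get("rel") or "").lower().split()  # rel may hold several tokens
        href = a.get("href")
        if not href:
            return
        if "canonical" in rel:
            if self.canonical is None:  # keep the first; several canonicals is itself a problem
                self.canonical = href
        elif "alternate" in rel and a.get("hreflang"):
            self.hreflangs[a["hreflang"]] = href
        elif "alternate" in rel and a.get("media"):
            self.mobile_alternates.append({"media": a["media"], "href": href})


def alternates_and_canonical(html: str) -> dict:
    parser = LinkRelParser()
    parser.feed(html)
    parser.close()
    return {"canonical": parser.canonical, "hreflangs": parser.hreflangs,
            "mobile_alternates": parser.mobile_alternates}
```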
extract_schema_markup(soup)
Extracts schema markup data from the parsed HTML.
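Schema markup is most often embedded as JSON-LD inside `<script type="application/ld+json">` blocks. The sketch below collects and parses only that form (microdata and RDFa are not covered) and reports malformed JSON instead of raising. `JsonLdParser` and `extract_json_ld` are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Hypothetical helper: collects the text of every JSON-LD <script> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_ld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and (dict(attrs).get("type") or "").lower() == "application/ld+json":
            self._in_ld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_ld = False

    def handle_data(self, data):
        if self._in_ld:
            self.blocks[-1] += data  # data may arrive in several pieces


def extract_json_ld(html: str) -> list:
    """Parse each JSON-LD block; invalid JSON becomes an error record."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    results = []
    for raw in parser.blocks:
        try:
            results.append(json.loads(raw))
        except json.JSONDecodeError as exc:
            results.append({"error": f"invalid JSON-LD: {exc.msg}"})
    return results
```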
extract_content_data(soup, url)
Extracts content data such as text length, headers, and insights about images and links.
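A sketch of basic content statistics: visible word count, text length after collapsing whitespace, and counts of images, links, and H1 to H6 headers. It uses `html.parser` in place of BeautifulSoup and ignores text inside `<script>`, `<style>`, and `<noscript>`; the exact fields the module reports are an assumption, and `ContentParser` and `content_stats` are hypothetical.

```python
from html.parser import HTMLParser


class ContentParser(HTMLParser):
    """Hypothetical helper: gathers visible text plus simple element counts in one pass."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.counts = {"img": 0, "a": 0, "headers": 0}
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in ("img", "a"):
            self.counts[tag] += 1
        elif tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.counts["headers"] += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)


def content_stats(html: str) -> dict:
    parser = ContentParser()
    parser.feed(html)
    parser.close()
    words = " ".join(parser.text_parts).split()
    return {
        "word_count": len(words),
        "text_length": len(" ".join(words)),  # characters after whitespace collapse
        "images": parser.counts["img"],
        "links": parser.counts["a"],
        "headers": parser.counts["headers"],
    }
```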
extract_open_graph(soup)
Extracts Open Graph data from the parsed HTML.
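Open Graph tags are `<meta property="og:..." content="...">` elements. The sketch collects them with `html.parser` and lists which of the four properties the Open Graph protocol marks as required (`og:title`, `og:type`, `og:image`, `og:url`) are absent. `OpenGraphParser`, `REQUIRED_OG`, and `open_graph` are hypothetical names.

```python
from html.parser import HTMLParser

REQUIRED_OG = ("og:title", "og:type", "og:image", "og:url")


class OpenGraphParser(HTMLParser):
    """Hypothetical helper: maps each og:* property to its content."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        prop = a.get("property") or ""
        if prop.startswith("og:") and a.get("content") is not None:
            self.og.setdefault(prop, a["content"])  # keep the first value per property


def open_graph(html: str) -> dict:
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    missing = [p for p in REQUIRED_OG if p not in parser.og]
    return {"tags": parser.og, "missing_required": missing}
```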
extract_social_tags(soup)
Extracts Twitter Card and Facebook Open Graph data from the parsed HTML.
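Twitter Card tags conventionally use `name="twitter:..."`, while Facebook reads `og:*` and `fb:*` tags from `property`. The sketch below groups `<meta>` tags by those prefixes, checking either attribute. This grouping is an assumption about how the module organizes its output, and `SocialTagParser` and `social_tags` are hypothetical.

```python
from html.parser import HTMLParser


class SocialTagParser(HTMLParser):
    """Hypothetical helper: groups twitter:* and og:*/fb:* <meta> tags."""

    def __init__(self):
        super().__init__()
        self.tags = {"twitter": {}, "facebook": {}}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        key = a.get("name") or a.get("property") or ""
        content = a.get("content")
        if content is None:
            return
        if key.startswith("twitter:"):
            self.tags["twitter"][key] = content
        elif key.startswith(("og:", "fb:")):
            self.tags["facebook"][key] = content


def social_tags(html: str) -> dict:
    parser = SocialTagParser()
    parser.feed(html)
    parser.close()
    return parser.tags
```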
check_page_speed(url)
Fetches and analyzes page speed metrics using the Google PageSpeed Insights API.
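A sketch of calling the PageSpeed Insights v5 `runPagespeed` endpoint with `urllib` and summarizing the Lighthouse result: the performance category score (reported from 0 to 1, scaled to 0 to 100 here) and the display values of a few lab metrics. The helper names and the chosen audit IDs are assumptions. The response parsing is kept separate from the request so it can be tested against a saved JSON response.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"
# Assumed selection of Lighthouse audit IDs to report.
AUDITS = ("first-contentful-paint", "largest-contentful-paint",
          "cumulative-layout-shift", "total-blocking-time", "speed-index")


def psi_request_url(page_url: str, strategy: str = "mobile",
                    api_key: Optional[str] = None) -> str:
    """Hypothetical helper: build the request URL (strategy is 'mobile' or 'desktop')."""
    params = {"url": page_url, "strategy": strategy}
    if api_key:
        params["key"] = api_key
    return f"{PSI_ENDPOINT}?{urlencode(params)}"


def summarize_psi(response: dict) -> dict:
    """Hypothetical helper: pull the performance score and a few audit display values."""
    lighthouse = response.get("lighthouseResult", {})
    score = lighthouse.get("categories", {}).get("performance", {}).get("score")
    audits = lighthouse.get("audits", {})
    return {
        "performance_score": None if score is None else round(score * 100),
        "metrics": {a: audits[a].get("displayValue") for a in AUDITS if a in audits},
    }


def check_page_speed_sketch(page_url: str, api_key: Optional[str] = None) -> dict:
    """Hypothetical helper: performs the live request (needs network access)."""
    with urllib.request.urlopen(psi_request_url(page_url, api_key=api_key), timeout=60) as resp:
        return summarize_psi(json.load(resp))
```

Occasional requests can be made without the `key` parameter, but regular automated use generally calls for an API key.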
check_mobile_usability(soup)
Checks if the website is mobile-friendly based on viewport and other elements.
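One simple mobile-friendliness heuristic, sketched below, inspects the viewport meta tag: it should exist, set `width=device-width`, and not disable pinch-zoom with `user-scalable=no`. The module also mentions "other elements", so treat this as a partial, assumed rule set; `ViewportParser` and `mobile_checks` are hypothetical.

```python
from html.parser import HTMLParser


class ViewportParser(HTMLParser):
    """Hypothetical helper: captures the content of <meta name="viewport">."""

    def __init__(self):
        super().__init__()
        self.viewport = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "viewport":
            self.viewport = a.get("content") or ""


def mobile_checks(html: str) -> dict:
    parser = ViewportParser()
    parser.feed(html)
    parser.close()
    vp = parser.viewport
    # "width=device-width, initial-scale=1" -> {"width": "device-width", "initial-scale": "1"}
    settings = {}
    for part in (vp or "").split(","):
        if "=" in part:
            k, v = part.split("=", 1)
            settings[k.strip().lower()] = v.strip().lower()
    issues = []
    if vp is None:
        issues.append("missing viewport meta tag")
    elif settings.get("width") != "device-width":
        issues.append("viewport width is not device-width")
    if settings.get("user-scalable") in ("no", "0"):
        issues.append("pinch-zoom disabled (user-scalable=no)")
    return {"viewport": vp, "mobile_friendly": not issues, "issues": issues}
```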
check_alt_text(soup)
Checks if all images have alt text.
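A sketch that distinguishes images with no `alt` attribute from images with an empty one. The split matters because `alt=""` is the standard way to mark a decorative image, so only a missing attribute is treated as a failure here. It uses `html.parser`; `AltParser` and `alt_text_report` are hypothetical.

```python
from html.parser import HTMLParser


class AltParser(HTMLParser):
    """Hypothetical helper: sorts <img> tags by the state of their alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing, self.empty, self.total = [], [], 0

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        self.total += 1
        a = dict(attrs)
        src = a.get("src") or "(no src)"
        if "alt" not in a:
            self.missing.append(src)   # no alt attribute at all
        elif not (a["alt"] or "").strip():
            self.empty.append(src)     # alt="" marks a decorative image


def alt_text_report(html: str) -> dict:
    parser = AltParser()
    parser.feed(html)
    parser.close()
    return {"total": parser.total, "missing_alt": parser.missing,
            "empty_alt": parser.empty, "all_have_alt": not parser.missing}
```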
fetch_seo_data(url)
Fetches SEO-related data from the provided URL and returns a dictionary with results.
download_csv(data, filename='seo_data.csv')
Downloads the data as a CSV file.
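Results dictionaries are often nested, so the sketch below flattens them into `field,value` rows with dotted keys, joins list values with semicolons, and writes CSV with the standard `csv` module. The flattening scheme and `seo_data_to_csv` are assumptions, not the module's documented format.

```python
import csv
import io


def seo_data_to_csv(data: dict) -> str:
    """Hypothetical helper: flatten a (possibly nested) results dict into two-column CSV text."""
    rows = []

    def walk(prefix, value):
        if isinstance(value, dict):
            for k, v in value.items():
                walk(f"{prefix}.{k}" if prefix else str(k), v)
        elif isinstance(value, (list, tuple)):
            rows.append((prefix, "; ".join(map(str, value))))
        else:
            rows.append((prefix, "" if value is None else str(value)))

    walk("", data)
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # quotes fields containing commas or quotes
    writer.writerow(["field", "value"])
    writer.writerows(rows)
    return buffer.getvalue()
```

In a Streamlit app, the resulting text can be offered to the user with `st.download_button(label="Download CSV", data=csv_text, file_name="seo_data.csv", mime="text/csv")`.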
analyze_onpage_seo()
Main function to analyze on-page SEO using Streamlit.
License
This project is licensed under the MIT License. See the LICENSE file for more details.
Contributing
Contributions are welcome! Please open an issue or submit a pull request to contribute to this project.
