Yirabot

Latest version: v1.0.9.2



1.0.9

We are thrilled to announce the release of YiraBot 1.0.9, the most significant update to date. This version marks a monumental leap in the evolution of YiraBot, featuring a complete code rewrite and introducing a plethora of enhancements and new features. Upgrade to the latest version to experience the unparalleled speed and efficiency of YiraBot.

Upgrade YiraBot using pip:

bash
pip install --upgrade yirabot

Overview
YiraBot 1.0.9 sets a new standard in web crawling and SEO analysis with an astounding 86% increase in performance speed. This update is not just about speed; it introduces a new Python module, enabling users to integrate YiraBot's powerful features directly into their Python scripts. With this release, we've streamlined the installation process by reducing the required packages from 10 to just 4, making it lighter and more efficient.

New Python Module
In response to community feedback, YiraBot 1.0.9 introduces a Python module, allowing for more versatile use of YiraBot in programming projects. Here’s how you can get started:

python
from yirabot import YiraBot

bot = YiraBot()

For detailed documentation on using the new Python module, please refer to the README file.

CLI Enhancements
New `-mobile` Flag
A new `-mobile` flag has been added, enabling YiraBot to use a mobile user agent while crawling. This feature is crucial for testing mobile responsiveness and SEO performance from a mobile perspective.
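
As a rough illustration of what crawling with a mobile user agent involves, the sketch below builds a request with either a desktop or a mobile `User-Agent` header using only the standard library. The user-agent strings and the `build_request` helper are assumptions made for this example; they are not YiraBot's actual values or code.

```python
from urllib.request import Request

# Illustrative user-agent strings; the values YiraBot actually sends are not documented here.
DESKTOP_UA = "Mozilla/5.0 (X11; Linux x86_64) ExampleCrawler/1.0"
MOBILE_UA = "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) ExampleCrawler/1.0"


def build_request(url: str, mobile: bool = False) -> Request:
    """Build (but do not send) a request with a desktop or mobile User-Agent."""
    user_agent = MOBILE_UA if mobile else DESKTOP_UA
    return Request(url, headers={"User-Agent": user_agent})


request = build_request("https://example.com", mobile=True)
# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))
```

Servers that serve different markup to phones will typically key off this header, which is why a mobile user agent matters for mobile SEO checks.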

Multiple Commands
YiraBot now supports the execution of multiple commands in a single run, further enhancing its flexibility and usability for comprehensive SEO analysis and web crawling.

Overall Improvements
- **Performance**: Complete code rewrite leading to an 86% increase in processing speed.
- **Reduced Dependencies**: The number of required packages has been dramatically reduced from 10 to 4, streamlining the installation process.
- **License Change**: The software license has changed from MIT to GPL-3.0. Developers integrating YiraBot into their projects should consider the implications: modifications and derivative works that are distributed must also be released under GPL-3.0, which ensures that improvements to YiraBot are shared with the community.

This update is a testament to our commitment to providing a state-of-the-art tool that meets the evolving needs of developers, SEO specialists, and content creators. We are excited to see how you leverage YiraBot 1.0.9 in your projects and workflows.

For any questions or feedback, please refer to the documentation or reach out!

Happy crawling!

1.0.8

Upgrade YiraBot using pip:

bash
pip install --upgrade yirabot

Overview
YiraBot 1.0.8 introduces a significant focus on SEO analysis. The enhanced SEO command incorporates comprehensive features to streamline and optimize your website's SEO performance.

Enhanced SEO Analysis Command
The updated `seo` command (formerly `check`) now includes extensive features for a thorough SEO assessment:

- **Title and Meta Description Length Analysis**: Analyzes and compares the length of titles and meta descriptions against SEO standards, offering tailored feedback.
- **Keyword Extraction**: Identifies the most prominent keywords used on the page, aiding in content optimization strategies.
- **Heading Hierarchy Analysis**: Evaluates the heading structure and provides feedback on any header usage errors or inconsistencies.
- **Mobile Responsiveness Check**: Assesses the website's mobile compatibility, a crucial aspect of modern SEO.
- **Social Media Integration Discovery**: Identifies and displays the website's connections with various social media platforms.
- **Website Language Identification**: Detects the primary language of the website, essential for targeted SEO tactics.
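
To make a few of the checks above concrete, here is a minimal standard-library sketch of title and meta description length feedback, heading hierarchy checks, and simple keyword counting. The length ranges, the `SeoParser` class, and the helper names are assumptions for illustration; the `seo` command's real thresholds and implementation may differ.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Commonly cited guideline ranges (assumed here, not taken from YiraBot).
TITLE_RANGE = (30, 60)
DESCRIPTION_RANGE = (70, 160)


class SeoParser(HTMLParser):
    """Collects the title, meta description, heading levels, and body words."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.heading_levels = []
        self.words = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""
        elif re.fullmatch(r"h[1-6]", tag):
            self.heading_levels.append(int(tag[1]))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        else:
            # Keep lowercase words of four or more letters as keyword candidates.
            self.words.extend(re.findall(r"[a-z]{4,}", data.lower()))


def length_feedback(text, low, high):
    """Classify a text length against an inclusive [low, high] range."""
    if len(text) < low:
        return "too short"
    if len(text) > high:
        return "too long"
    return "ok"


def heading_issues(levels):
    """Flag a missing or repeated h1 and skipped levels (for example h1 to h3)."""
    issues = []
    if levels.count(1) != 1:
        issues.append(f"expected exactly one h1, found {levels.count(1)}")
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            issues.append(f"h{prev} is followed by h{cur}")
    return issues


html = """<html><head><title>Short</title>
<meta name="description" content="A tiny page."></head>
<body><h1>Crawling</h1><h3>Crawling basics</h3>
<p>Crawling pages politely keeps servers happy.</p></body></html>"""

parser = SeoParser()
parser.feed(html)
print(length_feedback(parser.title, *TITLE_RANGE))
print(length_feedback(parser.description, *DESCRIPTION_RANGE))
print(heading_issues(parser.heading_levels))
print(Counter(parser.words).most_common(1))
```

The sketch does not filter script or style content or stop words, which a production keyword extractor would likely need to do.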

![Screenshot 2024-02-02 at 12 39 22 PM](https://github.com/OwenOrcan/YiraBot-Crawler/assets/144565916/8a3e32d1-3aea-41e3-a6ef-3709b0fd5d81)

Command Renaming for Improved User Experience
For better clarity and user experience, we have renamed the following commands:

- `check` is now `seo`
- `crawl-content` has been updated to `scrape`

Example Usage
For SEO analysis using the new `seo` command:
bash
yirabot seo <url>

Developer Side Changes
To enhance YiraBot's performance and code maintainability:

- **Code Organization**: The codebase has been restructured, segregating different functionalities into multiple files for improved organization and scalability.

1.0.7.3

Upgrade YiraBot using pip:
bash
pip install --upgrade yirabot

Overview
YiraBot 1.0.7.3 brings a pivotal update to enhance web crawling capabilities, especially for accessing protected pages. This release focuses on enabling users to effectively extract data from websites requiring login credentials, broadening the scope of YiraBot as a comprehensive Python library for web crawling.

New Features

Introducing the Session Command
- **Crawl Protected Pages**: The new 'session' command empowers YiraBot to navigate and crawl pages that necessitate user authentication.
- **User-Friendly Interaction**: Designed to simplify the process of accessing and crawling content behind login screens.

How to Use the Session Command
1. **Gathering Form Input Details**: Retrieve the names of the login form input fields (usually 'username' and 'password') by inspecting the HTML of the login page.
2. **Understanding and Obtaining the Success Redirect URL**:
   - **What Is It**: The success redirect URL is the page you are directed to after successfully logging in. It's where you land after entering your credentials on the website.
   - **How to Get It**: To obtain this URL, manually log into the website and note the URL of the page you land on after the login process. This is the success redirect URL needed by YiraBot.
   - **Why It's Needed**: YiraBot uses this URL to verify successful login by comparing the post-login landing page with the provided success redirect URL.
3. **Limitations with Advanced Authentication Methods**: The session command may not work with websites using two-factor authentication, CAPTCHAs, or dynamic forms relying on JavaScript.
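
The success check described in step 2 can be pictured as a URL comparison. The sketch below is one plausible way to do it, assuming the comparison ignores letter case in the host, trailing slashes, and query strings; `normalize` and `login_succeeded` are hypothetical helpers, and the notes above do not specify how YiraBot performs the comparison.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> tuple:
    """Reduce a URL to (scheme, host, path), ignoring case and trailing slashes."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return (parts.scheme.lower(), parts.netloc.lower(), path)


def login_succeeded(final_url: str, expected_url: str) -> bool:
    """Treat the login as successful if the landing page matches the expected one."""
    return normalize(final_url) == normalize(expected_url)


print(login_succeeded("https://example.com/dashboard/", "https://Example.com/dashboard"))
print(login_succeeded("https://example.com/login?error=1", "https://example.com/dashboard"))
```

A failed login usually redirects back to the login form, so the landing URL differs from the expected one and the check returns `False`.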

Session Command Usage
- Begin a session for crawling protected pages: `yirabot session`
- Follow the prompts to input the login URL, expected success redirect URL, and the input field names for username and password.

End-User Benefits
- **Broader Access to Web Content**: Users can now extract data from websites that require login, including subscription-based platforms and private applications.
- **Streamlined Data Collection**: Enhances the ability to collect and analyze data from a variety of online sources.

Example Usage
bash
yirabot session

- Input the requested URLs and credentials as prompted.
- Select your preferred type of crawl for the authenticated page.

With Version 1.0.7.3, YiraBot continues to advance, making web data extraction more accessible and accommodating the diverse needs of its user base.

1.0.7

Overview
YiraBot 1.0.7 focuses on refining the command-line interface (CLI) of the tool. With an emphasis on user experience, efficiency, and versatility, it introduces new commands and features that cover a wide range of web crawling needs, including data analysis, SEO audits, and content aggregation. The update also streamlines existing functionality and improves the tool's adaptability to different web environments and user requirements.

New Features

Dynamic Delay for Server Load Management
- *Dynamic Delay*: Introduces a dynamic delay mechanism in the crawling process. It adjusts the crawling speed based on the server's response time to avoid overwhelming the server.
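
One common way to implement such a delay is to pause for a multiple of the last response time, clamped between a floor and a ceiling. The following is a minimal sketch under that assumption; the `dynamic_delay` function, its factor, and its bounds are illustrative and are not taken from YiraBot's source.

```python
def dynamic_delay(response_time: float, factor: float = 2.0,
                  minimum: float = 0.5, maximum: float = 10.0) -> float:
    """Scale the pause with the last response time, clamped to [minimum, maximum]."""
    return max(minimum, min(maximum, response_time * factor))


# Simulated response times in seconds; a real crawler would measure each request
# (for example with time.monotonic()) and call time.sleep(delay) before the next one.
for response_time in (0.1, 1.5, 30.0):
    print(f"response {response_time:>5.1f}s -> wait {dynamic_delay(response_time):.1f}s")
```

A slow server therefore gets longer pauses between requests, while a fast one is still protected by the minimum delay.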

Enhanced Data Extraction and Storage
- *JSON Data Extraction*: Added functionality to extract crawl data into JSON format. This can be activated using the `-json` flag during the crawl command.
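
For readers unfamiliar with JSON export, the sketch below shows the general idea of serializing crawl results with Python's `json` module. The field names in `crawl_data` and the `to_json` helper are hypothetical; the structure that `-json` actually produces is not described in these notes.

```python
import json

# Hypothetical crawl result; the real keys written by YiraBot may differ.
crawl_data = {
    "url": "https://example.com",
    "title": "Example Domain",
    "links": ["https://www.iana.org/domains/example"],
}


def to_json(data: dict) -> str:
    """Serialize crawl data as indented, human-readable JSON."""
    return json.dumps(data, indent=2, ensure_ascii=False)


print(to_json(crawl_data))
```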

Advanced Crawling Commands
- *Check Command*: A new `check` command is implemented, enabling YiraBot to crawl through a website and identify any broken links or potential issues.
- *Get-HTML Command*: The `get-html` command is introduced to create an exact HTML copy of a website, which is then saved as an HTML file.
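
Conceptually, a broken-link check requests each discovered link and reports those that return an error status. The sketch below illustrates that idea with an injectable status lookup so it runs offline; `find_broken_links` and the sample data are assumptions for this example, not the `check` command's implementation.

```python
from typing import Callable, Dict, Iterable


def find_broken_links(links: Iterable[str],
                      get_status: Callable[[str], int]) -> Dict[str, int]:
    """Return {url: status} for links whose HTTP status code is 400 or higher."""
    broken = {}
    for link in links:
        status = get_status(link)
        if status >= 400:
            broken[link] = status
    return broken


# Stand-in for a real HTTP client so the sketch runs without network access.
fake_statuses = {"https://example.com/": 200, "https://example.com/old": 404}
print(find_broken_links(fake_statuses, fake_statuses.get))
```

In a real crawler, `get_status` would issue a request (often a `HEAD` request) and return the response code.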

Performance Improvements
- *Increased Speed*: YiraBot's overall performance and crawling speed have been significantly improved, offering a faster and more efficient web crawling experience.

Example Usage
bash
yirabot check example.com

bash
yirabot get-html example.com

bash
yirabot crawl example.com -json

1.0.6

Introduction
YiraBot transitions from a command-line tool to a versatile Python library, enabling integration into various projects.

New Features

Python Library Integration
- YiraBot is now available as a Python module, allowing for seamless incorporation into scripts.

Enhanced Web Crawling Methods
(All of these methods are for Python module usage.)
- `get_html(url)`: Retrieves the HTML content of a webpage.
- `crawl(url)`: Performs comprehensive crawling of a webpage.
- `crawl_content(url)`: Extracts detailed content like paragraphs, headings, and lists.
- `is_allowed_by_robots_txt(url)`: Checks if crawling a URL is allowed by the site's robots.txt.
- `parse_sitemap(url)`: Parses the sitemap of a website for URL discovery.
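
To show what the last two methods involve, here is a standard-library sketch of a robots.txt permission check and sitemap URL extraction, run against inline sample data. The helpers `is_allowed` and `sitemap_urls` and the `ExampleBot` user agent are illustrative assumptions; they are not YiraBot's own functions and say nothing about how YiraBot implements these methods.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""

SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules supplied as text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def sitemap_urls(sitemap_xml: str) -> list:
    """Extract every <loc> entry from a sitemap document."""
    namespace = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", namespace)]


print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/private/data"))
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/about"))
print(sitemap_urls(SITEMAP_XML))
```

A crawler would normally download `/robots.txt` and the sitemap first, then apply checks like these before fetching each page.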

Usage Examples

- Import and initialize YiraBot in your script:
python
from yirabot import Yirabot
bot = Yirabot()

content = bot.crawl("https://example.com")

# Get all data in key-value format.
for item in content:
    print(item, content[item])

1.0.5

Release Highlights:

New Command Implementation
- **Get Content Command**: A new command has been introduced, enabling YiraBot to retrieve specific web content more efficiently.

Modernization and Design Improvements
- **Modernized Interface**: The interface of YiraBot has been upgraded to offer a more modern and user-friendly experience.
- **Enhanced Progress Bar**: The progress bar is now equipped with additional functionality, providing clearer and more detailed feedback during operations.
- **Design Overhaul**: A significant design upgrade has been implemented, enhancing both the visual appeal and usability.

Code and Functionality Enhancements
- **Code Organization**: The codebase has been reorganized, separating the code into different files for better clarity and maintenance.
- **File Writing Style**: File writing style has been improved for enhanced readability and structure.
- **Error Handling**: Error handling mechanisms have been strengthened for increased robustness and reliability of operations.

User Experience and Performance Fixes
- **HTTPS Simplification**: The requirement to type 'https' in URLs has been removed, streamlining the web crawling process.
- **Sitemap Parser Fix**: Corrections have been made to the sitemap parser, ensuring more accurate and efficient crawling of websites.
- **File Writing Check**: Redundant checks during file writing have been identified and removed.
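
The HTTPS simplification above amounts to adding a default scheme when the user omits one. A minimal sketch of that idea follows; `ensure_scheme` is a hypothetical helper, and YiraBot's actual handling may differ.

```python
def ensure_scheme(url: str, default_scheme: str = "https") -> str:
    """Prefix the URL with a default scheme if it does not already have one."""
    url = url.strip()
    if "://" in url:
        return url
    return f"{default_scheme}://{url}"


print(ensure_scheme("example.com"))
print(ensure_scheme("http://example.com"))
```

URLs that already include a scheme are left unchanged, so an explicit `http://` is still respected.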

The latest update aims to significantly elevate the performance, usability, and dependability of YiraBot, ensuring a more seamless and efficient web crawling experience. Your continued support and feedback are much appreciated.
