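Web scraping is very easy from the Linux command line, and from Python as well. The first examples use curl together with htmlq, a command-line tool that extracts content from HTML using CSS selectors. For example, see the output below.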
    ┏jcartwright@localhost┋ ~━━┓
    ┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛[08:51] ▓▒░┋ curl -s https://www.securitronlinux.com | htmlq h2.entry-title --text
    How to list all Youtube thumbnail images on a Youtube channel.
    How to use a userScript to remove query parameters from a website URL.
    A very useful and colorful bash prompt for a Linux system.
    Read the news headlines from the Daily Telegraph using Powershell.
    Enable the use of NTFS filesystems on Alma Linux very easily.
    Another good way to get CPU information on Linux and see all cores.
    Abyss of the Titanic. Movie pitch.
    Get a nice weather report with Powershell on Windows.
    A very useful .vimrc file to make it much more usable.
    Get processor information with Powershell easily.
    How to get a listing of all news items from ABC News easily.
    Good quality gaming accessories for the dedicated gamer in your family.
    A very powerful CPU for gaming and development. Intel 13th Gen Core i9-13900K.
    Nice program for Linux to generate a random password.
    Get information about your computer with Powershell.
    How the Linux directories such as /usr and /bin came to be.
    A very nice free VPN option for using overseas websites easily.
    AI in the workplace could replace HR.
    Upcoming Stalker 2 game to have all previous monsters and classic weapons.
    Stalker 2 dev build screenshots. These are amazing.
The selector h2.entry-title matches every H2 tag with the CSS class entry-title, and the --text option prints only the text inside each matched tag, one entry per line.
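The same selection can be sketched in Python using only the standard library. This is a minimal illustration and not how htmlq itself works: the class name EntryTitleParser, the function entry_titles and the sample HTML are made up for the example, and it handles only the simple "tag plus class" case instead of full CSS selectors.

```python
from html.parser import HTMLParser


class EntryTitleParser(HTMLParser):
    """Collect the text of every <h2> whose class list contains css_class."""

    def __init__(self, css_class="entry-title"):
        super().__init__()
        self.css_class = css_class
        self.titles = []
        self._inside = False  # True while we are inside a matching <h2>
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # The class attribute may hold several space-separated class names.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h2" and self.css_class in classes:
            self._inside = True
            self._chunks = []

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._inside:
            self.titles.append("".join(self._chunks).strip())
            self._inside = False


def entry_titles(html, css_class="entry-title"):
    """Return the text of each matching <h2>, in document order."""
    parser = EntryTitleParser(css_class)
    parser.feed(html)
    parser.close()
    return parser.titles


# Hypothetical sample markup, standing in for a downloaded page.
sample = """
<h2 class="entry-title"><a href="/a">First post</a></h2>
<h2 class="widget-title">Sidebar</h2>
<h2 class="entry-title post">Second post</h2>
"""
print(entry_titles(sample))
```

To run this against a live page, the HTML string could come from urllib.request.urlopen(...).read().decode() in place of the sample.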
And another example.
    ┏jcartwright@localhost┋ ~━━┓
    ┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛[08:51] ▓▒░┋ curl --silent https://www.dailyadvertiser.com.au/ | htmlq a.break-words --text | uniq
    Wagga man calls for stronger sister city ties after years abroad
    How life has changed for a Local Hero
    Skates on and beanies out as crowds flock to Wagga's winter fest
    Fire crews battle house blaze in Riverina Highlands
    Don't get caught with your pants down: Australia's best dunnies revealed
    Three Lord's members suspended after Ashes abuse
    Kyrgios out of Wimbledon with torn ligament in wrist
    Uncle Tobys a real family affair for generations
    Drovers strife: Poor fencing, water pain make for hard yards driving stock
    Thick fog enshrouds city as winter conditions really set in
    Uncle Tobys a real family affair for generations
    Thick fog enshrouds city as winter conditions really set in
    Drovers strife: Poor fencing, water pain make for hard yards driving stock
    Club which experienced road tragedy promotes safety message
    Letters: It's time we all took pleasure in the simple things
    Impact Wrestling thrills crowd at night one of Tour Down Under
    Popular first time light display a result of collaboration
    Kazarian enjoying fifth trip to Australia
    Former mayor defends legitimacy of Wagga funding amid ICAC findings
    Popular first time light display a result of collaboration
    Former mayor defends legitimacy of Wagga funding amid ICAC findings
    Kazarian enjoying fifth trip to Australia
    Community to have a say after key highway bridge works delayed
    Truck, van collide on the Olympic Highway south of Wagga
    Men's club closes with a bang, giving remaining funds to charity
    Liberal MP says Maguire's 'damning actions' have reverberated widely
    Letters: New surface of Lake Albert Road is 'like tissue paper'
    How these women hope to uncover Sussan Ley's next challenger
    Liberal MP says Maguire's 'damning actions' have reverberated widely
    How these women hope to uncover Sussan Ley's next challenger
    Letters: New surface of Lake Albert Road is 'like tissue paper'
This looks for all A tags with the CSS class break-words. This is an easy way to get text from a website.
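Note that uniq only collapses repeated lines when they are adjacent, which is why some headlines in the output above still appear more than once. A small Python sketch of removing every repeat while keeping the original order follows. The function name unique_in_order is made up for this example, and the list simply reuses a few of the headlines shown above.

```python
def unique_in_order(lines):
    """Drop repeated lines wherever they occur, keeping the first occurrence."""
    # dict preserves insertion order, so the keys come back in first-seen order.
    return list(dict.fromkeys(lines))


headlines = [
    "Uncle Tobys a real family affair for generations",
    "Thick fog enshrouds city as winter conditions really set in",
    "Uncle Tobys a real family affair for generations",
    "Thick fog enshrouds city as winter conditions really set in",
    "Club which experienced road tragedy promotes safety message",
]

for line in unique_in_order(headlines):
    print(line)
```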
If cargo is installed on your Linux PC, then use cargo to install htmlq.
    cargo install htmlq
htmlq is a most useful command-line tool (it is written in Rust and is not a Python library) for getting data from a website with a bit of experimentation.
Using yt-dlp with Python is also very useful. This may be used to get information from a Youtube URL without downloading the video.
    from yt_dlp import YoutubeDL

    # Ask yt-dlp for the metadata only; download=False skips the video file.
    with YoutubeDL() as ydl:
        info_dict = ydl.extract_info('https://www.youtube.com/watch?v=SwcUIH7-Nb4', download=False)

    video_id = info_dict.get('id')
    video_title = info_dict.get('title')
    video_description = info_dict.get('description')

    # f-strings avoid a TypeError if a field comes back as None.
    print(f"Title: {video_title}")  # Video Title.
    print(f"Description: {video_description}")  # Video Description.
    print(f"Url: https://www.youtube.com/watch?v={video_id}")  # Video URL.
This script prints the video title, the description, and the URL of the video's watch page.
This is the kind of output the script will give you. The run shown here was made against a different video.
    ┏jcartwright@localhost┋ ~/Documents━━┓
    ┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛[08:51] ▓▒░┋ python3 vid.py
    [youtube] Extracting URL: https://www.youtube.com/watch?v=hknp58jxki0
    [youtube] hknp58jxki0: Downloading webpage
    [youtube] hknp58jxki0: Downloading android player API JSON
    Title: Star Trek Next Generation - Rogue Comet
    Description: Star Trek Next Generation "Masks"
    Url: https://www.youtube.com/watch?v=hknp58jxki0
Possibly a very useful script. This gets the URL of the Youtube video as well.
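For reuse, the same idea can be wrapped in functions that cope with missing fields. This is only a sketch under some assumptions: the names summarize and fetch_summary and the placeholder strings are made up for the example, the "quiet" option is used to silence the [youtube] progress lines, and "webpage_url" is assumed to be present in the dictionary yt-dlp returns (the code falls back to the id if it is not).

```python
def summarize(info):
    """Build printable lines from a yt-dlp info dictionary, tolerating missing fields."""
    title = info.get("title") or "(no title)"
    description = info.get("description") or "(no description)"
    # Assumption: yt-dlp normally fills in "webpage_url"; otherwise build it from the id.
    url = info.get("webpage_url")
    if not url and info.get("id"):
        url = "https://www.youtube.com/watch?v=" + info["id"]
    return [
        f"Title: {title}",
        f"Description: {description}",
        f"Url: {url or '(unknown)'}",
    ]


def fetch_summary(video_url):
    """Query the site through yt-dlp without downloading the video."""
    # Imported here so summarize() can be used even when yt-dlp is not installed.
    from yt_dlp import YoutubeDL

    with YoutubeDL({"quiet": True}) as ydl:
        info = ydl.extract_info(video_url, download=False)
    return summarize(info)


# Offline demonstration with a hand-written dictionary (not real yt-dlp output).
demo = {"id": "hknp58jxki0", "title": "Star Trek Next Generation - Rogue Comet"}
for line in summarize(demo):
    print(line)
```

Calling fetch_summary() with a watch page URL performs the real lookup and needs network access.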
Yet another useful example: getting a list of recent video titles from a YouTube channel, using the channel's feed URL. The first line of the output is the title of the feed itself.
    ┏jcartwright@localhost┋ ~━━┓
    ┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛[08:46] ▓▒░┋ curl -s -L 'https://www.youtube.com/feeds/videos.xml?channel_id=UCCjyq_K1Xwfg8Lndy7lKMpA' | htmlq title --text
    TechCrunch
    Thing Translator by Dan Motzenbecker | TryTech | TechCrunch
    Robosen’s Hasbro-licensed Optimus Prime robot | TryTech | TechCrunch
    How to get companies to spend when spending is down
    Vegas Loop by The Boring Company | TryTech | TechCrunch
    Arcimoto Fun Utility Vehicle | TryTech | TechCrunch
    Autonomous delivery drone from Wing | TechCrunch
    TC City Spotlight: Atlanta
    Apple Messages Stickers | WWDC23 | TechCrunch
    Apple's Check In iPhone Feature | WWDC23 | TechCrunch
    Atlanta investors are bullish on where the city's startup scene is headed -- TechCrunch Live Atlanta
    Why the economics of equality is key to Atlanta's growth
    Atlanta Mayor Andre Dickens explains why tech companies are moving to the city on TechCrunch Live
    Journal app from Apple | WWDC 2023 | TechCrunch
    visionOS | Apple Vision Pro | WWDC23 | TechCrunch
    Eyesight feature on Apple Vision Pro | WWDC 2023 | TechCrunch
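The feed is an Atom XML document, so the titles can also be pulled out in Python with the standard library. The sketch below assumes the feed uses the standard Atom namespace. The function name feed_titles and the sample feed are made up for the example. Unlike the htmlq command above, it returns only the entry titles and skips the title of the feed itself.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace, written in ElementTree's {uri}tag form.
ATOM = "{http://www.w3.org/2005/Atom}"


def feed_titles(xml_text):
    """Return the title of each <entry> in an Atom feed, in document order."""
    root = ET.fromstring(xml_text)
    return [
        entry.findtext(ATOM + "title", default="")
        for entry in root.iter(ATOM + "entry")
    ]


# Hypothetical miniature feed, standing in for the downloaded XML.
sample = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example channel</title>
  <entry><title>First video</title></entry>
  <entry><title>Second video</title></entry>
</feed>"""

print(feed_titles(sample))
```

For a live feed, the bytes returned by urllib.request.urlopen(feed_url).read() can be passed straight to feed_titles().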