Posted: . At: 10:14 AM. This was 10 months ago. Post ID: 18213


Another example of web scraping using htmlq, a command-line HTML extraction tool written in Rust.


Web scraping is very easy from the command line. Fetch a page with curl, pipe it to htmlq, and htmlq filters the HTML with a CSS selector. For example, see the output below.

┏jcartwright@localhost┋ ~━━┓
┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛[08:51] ▓▒░┋ curl -s https://www.securitronlinux.com | htmlq h2.entry-title --text
How to list all Youtube thumbnail images on a Youtube channel.
How to use a userScript to remove query parameters from a website URL.
A very useful and colorful bash prompt for a Linux system.
Read the news headlines from the Daily Telegraph using Powershell.
Enable the use of NTFS filesystems on Alma Linux very easily.
Another good way to get CPU information on Linux and see all cores.
Abyss of the Titanic. Movie pitch.
Get a nice weather report with Powershell on Windows.
A very useful .vimrc file to make it much more usable.
Get processor information with Powershell easily.
How to get a listing of all news items from ABC News easily.
Good quality gaming accessories for the dedicated gamer in your family.
A very powerful CPU for gaming and development. Intel 13th Gen Core i9-13900K.
Nice program for Linux to generate a random password.
Get information about your computer with Powershell.
How the Linux directories such as /usr and /bin came to be.
A very nice free VPN option for using overseas websites easily.
AI in the workplace could replace HR.
Upcoming Stalker 2 game to have all previous monsters and classic weapons.
Stalker 2 dev build screenshots. These are amazing.

The selector h2.entry-title matches every H2 tag with the CSS class entry-title, and the --text option prints only the text inside each match, one entry per line.
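For readers who would rather do the same selection inside a Python script, here is a minimal sketch using only the standard library's html.parser module. The class name EntryTitleParser, the function entry_titles, and the sample markup are made up for illustration; this is not how htmlq works internally, just one way to pick out H2 tags carrying the entry-title class.

```python
from html.parser import HTMLParser


class EntryTitleParser(HTMLParser):
    """Collect the text of every <h2> whose class list includes 'entry-title'."""

    def __init__(self):
        super().__init__()
        self.titles = []      # finished titles, in document order
        self._buffer = None   # text pieces of the <h2> being read, or None

    def handle_starttag(self, tag, attrs):
        # The class attribute may hold several space-separated class names.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h2" and "entry-title" in classes:
            self._buffer = []

    def handle_data(self, data):
        # Only keep text while we are inside a matching <h2>.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._buffer is not None:
            self.titles.append("".join(self._buffer).strip())
            self._buffer = None


def entry_titles(html):
    """Return the text of all h2.entry-title elements found in html."""
    parser = EntryTitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles


# Made-up sample markup, not copied from any real page.
SAMPLE = """
<h2 class="entry-title"><a href="/a">First post</a></h2>
<h2 class="widget-title">Sidebar</h2>
<h2 class="entry-title post">Second post</h2>
"""

print(entry_titles(SAMPLE))
```

The page itself would still have to be fetched first, for example with curl as above and then read from standard input.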

And another example.

┏jcartwright@localhost┋ ~━━┓
┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛[08:51] ▓▒░┋ curl --silent https://www.dailyadvertiser.com.au/ | htmlq a.break-words --text | uniq
Wagga man calls for stronger sister city ties after years abroad
How life has changed for a Local Hero
Skates on and beanies out as crowds flock to Wagga's winter fest
Fire crews battle house blaze in Riverina Highlands
Don't get caught with your pants down: Australia's best dunnies revealed
Three Lord's members suspended after Ashes abuse
Kyrgios out of Wimbledon with torn ligament in wrist
Uncle Tobys a real family affair for generations
Drovers strife: Poor fencing, water pain make for hard yards driving stock
Thick fog enshrouds city as winter conditions really set in
Uncle Tobys a real family affair for generations
Thick fog enshrouds city as winter conditions really set in
Drovers strife: Poor fencing, water pain make for hard yards driving stock
Club which experienced road tragedy promotes safety message
Letters: It's time we all took pleasure in the simple things
Impact Wrestling thrills crowd at night one of Tour Down Under
Popular first time light display a result of collaboration
Kazarian enjoying fifth trip to Australia
Former mayor defends legitimacy of Wagga funding amid ICAC findings
Popular first time light display a result of collaboration
Former mayor defends legitimacy of Wagga funding amid ICAC findings
Kazarian enjoying fifth trip to Australia
Community to have a say after key highway bridge works delayed
Truck, van collide on the Olympic Highway south of Wagga
Men's club closes with a bang, giving remaining funds to charity
Liberal MP says Maguire's 'damning actions' have reverberated widely
Letters: New surface of Lake Albert Road is 'like tissue paper'
How these women hope to uncover Sussan Ley's next challenger
Liberal MP says Maguire's 'damning actions' have reverberated widely
How these women hope to uncover Sussan Ley's next challenger
Letters: New surface of Lake Albert Road is 'like tissue paper'

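This looks for all A tags with the CSS class break-words. This is an easy way to get text from a website. Note that uniq only removes duplicate lines that are next to each other, which is why some headlines still appear more than once above. Piping through sort -u instead removes every duplicate, at the cost of the original order.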
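The headline list above contains repeats that are not adjacent, so they survive the uniq step. If the output is being post-processed in Python anyway, a small sketch like the following drops every repeat while keeping the first occurrence in place. The function name unique_in_order is hypothetical; the sample lines are taken from the output above.

```python
def unique_in_order(lines):
    """Drop repeated lines, keeping the first occurrence of each."""
    # dict keys are unique and remember insertion order (Python 3.7+).
    return list(dict.fromkeys(lines))


headlines = [
    "Uncle Tobys a real family affair for generations",
    "Thick fog enshrouds city as winter conditions really set in",
    "Uncle Tobys a real family affair for generations",
    "Kazarian enjoying fifth trip to Australia",
    "Thick fog enshrouds city as winter conditions really set in",
]

for line in unique_in_order(headlines):
    print(line)
```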

If cargo is installed on your Linux PC, then use cargo to install htmlq.

cargo install htmlq

htmlq is a most useful command-line tool for getting data from a website with a bit of experimentation.

Using yt-dlp with Python is also very useful. It can get information about a YouTube URL without downloading the video.

from yt_dlp import YoutubeDL

# download=False fetches only the metadata, not the video itself.
with YoutubeDL() as ydl:
    info_dict = ydl.extract_info('https://www.youtube.com/watch?v=SwcUIH7-Nb4', download=False)

# Fall back to an empty string so the concatenation below cannot fail on None.
video_id = info_dict.get('id') or ''
video_title = info_dict.get('title') or ''
video_description = info_dict.get('description') or ''

print("Title: " + video_title)  # Video title.
print("Description: " + video_description)  # Video description.
print("Url: https://www.youtube.com/watch?v=" + video_id)  # Watch-page URL.

This script prints the video title, description, and watch-page URL.

This is the output this script will give you. The run shown here used a different video URL from the one in the listing.

┏jcartwright@localhost┋ ~/Documents━━┓
┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛[08:51] ▓▒░┋ python3 vid.py 
[youtube] Extracting URL: https://www.youtube.com/watch?v=hknp58jxki0
[youtube] hknp58jxki0: Downloading webpage
[youtube] hknp58jxki0: Downloading android player API JSON
Title: Star Trek Next Generation - Rogue Comet
Description: Star Trek Next Generation
"Masks"
Url: https://www.youtube.com/watch?v=hknp58jxki0

This can be a very useful script, since it also rebuilds the URL of the YouTube video from its ID.
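Metadata dictionaries do not always contain every key, so it can help to keep the formatting step separate and tolerant of missing values. The sketch below is one way to do that. The function name describe_video and the placeholder strings are hypothetical, and the sample dictionary is made up rather than real yt-dlp output; only the keys title, description, and id from the script above are assumed.

```python
def describe_video(info):
    """Build printable lines from a metadata dictionary.

    info is expected to look like the dictionary used in the script above,
    but any of its keys may be missing or set to None.
    """
    title = info.get("title") or "(no title)"
    description = info.get("description") or "(no description)"
    video_id = info.get("id")

    lines = ["Title: " + title, "Description: " + description]
    if video_id:
        # Only build the watch-page URL when an ID is actually present.
        lines.append("Url: https://www.youtube.com/watch?v=" + video_id)
    return lines


# Made-up sample data for illustration only.
sample_info = {"id": "abc123", "title": "Example video", "description": None}

for line in describe_video(sample_info):
    print(line)
```

Because the function takes a plain dictionary, it can be tried out without any network access, and the real result of extract_info can be passed to it in the same way.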

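Yet another useful example: getting a list of recent YouTube video titles from a YouTube channel's RSS feed. The feed only includes the channel's most recent uploads (15 here), and the first line of the output is the channel name.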

┏jcartwright@localhost┋ ~━━┓
┗━━━━━━━━━━━━━━━━━━━━━━━━━━┛[08:46] ▓▒░┋ curl -s -L 'https://www.youtube.com/feeds/videos.xml?channel_id=UCCjyq_K1Xwfg8Lndy7lKMpA' | htmlq title --text
TechCrunch
Thing Translator by Dan Motzenbecker | TryTech | TechCrunch
Robosen’s Hasbro-licensed Optimus Prime robot | TryTech | TechCrunch
How to get companies to spend when spending is down
Vegas Loop by The Boring Company | TryTech | TechCrunch
Arcimoto Fun Utility Vehicle | TryTech | TechCrunch
Autonomous delivery drone from Wing | TechCrunch
TC City Spotlight: Atlanta
Apple Messages Stickers | WWDC23 | TechCrunch
Apple's Check In iPhone Feature | WWDC23 | TechCrunch
Atlanta investors are bullish on where the city's startup scene is headed -- TechCrunch Live Atlanta
Why the economics of equality is key to Atlanta's growth
Atlanta Mayor Andre Dickens explains why tech companies are moving to the city on TechCrunch Live
Journal app from Apple | WWDC 2023 | TechCrunch
visionOS | Apple Vision Pro | WWDC23 | TechCrunch
Eyesight feature on Apple Vision Pro | WWDC 2023 | TechCrunch
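The channel feed is an Atom XML document, so the same titles can also be read with Python's standard xml.etree.ElementTree module. The following is a rough sketch under that assumption. The function name feed_titles and the sample feed are made up for illustration, and a real feed would first need to be downloaded, for example with the curl command above.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace.
ATOM = "{http://www.w3.org/2005/Atom}"


def feed_titles(xml_text):
    """Return (feed_title, [entry titles]) from an Atom feed string."""
    root = ET.fromstring(xml_text)
    # The feed's own <title> is a direct child of the root element.
    feed_title = root.findtext(ATOM + "title")
    # Each video is an <entry> with its own <title> child.
    entries = [entry.findtext(ATOM + "title") for entry in root.iter(ATOM + "entry")]
    return feed_title, entries


# Made-up sample feed, much smaller than a real channel feed.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Channel</title>
  <entry><title>First video</title></entry>
  <entry><title>Second video</title></entry>
</feed>"""

channel, videos = feed_titles(SAMPLE_FEED)
print(channel)
for title in videos:
    print(title)
```

Separating the feed title from the entry titles avoids having the channel name mixed in with the video list, as happens in the htmlq output above.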
