This small Bash web scraper reads a 4chan board page and prints a listing of every thread on that page. It is a handy starting point to expand into a larger script.
    #!/usr/bin/env bash
    set -e

    # Fetch the page given on the command line and collect its thread links.
    links=( $( wget "$@" -qo /dev/null -SO - |
        grep -oE '</span><a href=\"thread\/[0-9]+\"' |
        grep -oE 'thread\/[0-9]+\"' |
        sed 's/^/\"boards.4chan.org\/g\//' ) )

    # Print one link per line.
    for i in "${links[@]}"
    do
        echo "$i"
    done
This is the output of this script when run on Linux.
    jason@hoshi:~/Docum$ ./scraper.sh https://boards.4channel.org/g
    "boards.4chan.org/g/thread/51971506"
    "boards.4chan.org/g/thread/70373121"
    "boards.4chan.org/g/thread/70375043"
    "boards.4chan.org/g/thread/70373042"
    "boards.4chan.org/g/thread/70369732"
    "boards.4chan.org/g/thread/70370890"
    "boards.4chan.org/g/thread/70368516"
    "boards.4chan.org/g/thread/70376650"
    "boards.4chan.org/g/thread/70378082"
    "boards.4chan.org/g/thread/70377467"
    "boards.4chan.org/g/thread/70376099"
    "boards.4chan.org/g/thread/70356384"
    "boards.4chan.org/g/thread/70347662"
    "boards.4chan.org/g/thread/70373097"
    "boards.4chan.org/g/thread/70376867"
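The script above always prefixes links with the /g/ board, even if you point it at another board. As a rough sketch of how it might be expanded, the function below reads the page HTML on standard input and takes the board name as an argument. The name thread_links and the https:// prefix are my own assumptions, and the pattern relies on the same page markup the original grep expects.

```shell
# Hypothetical helper, not part of the original script.
# Reads board-page HTML on stdin and prints one full thread URL per line.
# $1 is the board name, e.g. "g".
thread_links() {
    # Same two-step match as the script above: find the thread anchors,
    # then keep only the "thread/<number>" part (without the quotes).
    grep -oE '</span><a href="thread/[0-9]+"' |
        grep -oE 'thread/[0-9]+' |
        sed "s|^|https://boards.4chan.org/$1/|"
}
```

It could be fed from wget like this: wget -qO - https://boards.4channel.org/g | thread_links g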
This is a good example of a useful web scraper. The same approach can be used to get a listing of news reports from a website.
To get your daily news fix fast, just use the one-liner below in a script. The output here is cropped for brevity; the full command returns quite a long listing of news stories.
    jason@hoshi:~/Docum$ curl -s http://feeds.bbci.co.uk/news/rss.xml | grep "<title>" | sed "s/ <title><\!\[CDATA\[//g;s/\]\]><\/title>//;" | grep -v "BBC News"
    Brexit: PM cannot 'ignore' soft Brexit MPs, says minister
    Edmonton stabbings: Four people hurt in 'random attacks'
    Ukraine election: Comedian leads presidential contest - exit poll
    Eurostar protest: Man charged with obstructing railway
    IS defeat: British fighters emerge after fall of Baghuz
    Alex Jones hosted The One Show after miscarriage
    Brexit fine: Ex-Vote Leave chairwoman does not apologise over spend
    Nazanin Zaghari-Ratcliffe: Mother's Day card delivered to embassy
    Boys charged over Birmingham Grindr date robberies
    Knife crime: More stop and search powers for police
    Model with alopecia wants people to embrace differences
    Labour plans national bank using Post Office network
    Saudi Arabia 'hacked Amazon boss's phone', says investigator
This shows how easy it is to get information off the web with the command line.
Run the one-liner like this to get only the top 10 stories.
    jason@hoshi:~/Docum$ curl -s http://feeds.bbci.co.uk/news/rss.xml | grep "<title>" | sed "s/ <title><\!\[CDATA\[//g;s/\]\]><\/title>//;" | grep -v "BBC News" | head -n 10
    Brexit: PM cannot 'ignore' soft Brexit MPs, says minister
    Edmonton stabbings: Four people hurt in 'random attacks'
    School LGBT teaching row: What is in the No Outsiders books?
    Ukraine election: Comedian leads presidential contest - exit poll
    Eurostar protest: Man charged with obstructing railway
    IS defeat: British fighters emerge after fall of Baghuz
    Alex Jones hosted The One Show after miscarriage
    Brexit fine: Ex-Vote Leave chairwoman does not apologise over spend
    Nazanin Zaghari-Ratcliffe: Mother's Day card delivered to embassy
    Boys charged over Birmingham Grindr date robberies
This would be very useful to have in your .bashrc, so that the latest news is shown whenever you open a terminal.
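One way to do that, as a minimal sketch, is to wrap the one-liner above in a pair of shell functions. The names rss_titles and news are hypothetical, the sed expression is a slightly tightened variant of the one above (it strips any leading whitespace rather than a fixed run of spaces), and the 5 second timeout is my own assumption so that a slow network does not hold up a new terminal.

```shell
# Hypothetical helpers for ~/.bashrc; the feed URL and filters follow the one-liner above.

# Read RSS XML on stdin and print one headline per line.
rss_titles() {
    grep "<title>" |
        sed 's/^[[:space:]]*<title><!\[CDATA\[//; s/]]><\/title>.*$//' |
        grep -v "BBC News"
}

# Print the top N headlines (default 10); give up after 5 seconds.
news() {
    curl -s --max-time 5 http://feeds.bbci.co.uk/news/rss.xml |
        rss_titles |
        head -n "${1:-10}"
}
```

With these in .bashrc, adding a line such as news 5 after the definitions should print five headlines each time a terminal opens, and news can still be run by hand whenever you want the longer list.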