Posted: . At: 10:40 AM. This was 3 years ago. Post ID: 14940


Searching for words in random data. This is very interesting.


I have found a nice method of searching randomly generated data for certain words. This takes a while but can bear fruit in the end.

Here is an example. It reads from /dev/urandom, keeps only the lowercase letters, cuts them into 6-character lines, and stops at the first line containing the word “linux”.

jason@jason-Lenovo-H50-55:~$ time tr -cd '[:lower:]' < /dev/urandom | fold -w 6 | nl | grep -m1 "linux"
14875225        llinux
 
real    0m5.554s
user    0m6.689s
sys     0m4.276s
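
To see what each stage of that pipeline does, it can help to run it over a small fixed string instead of /dev/urandom. This is only a sketch: the sample string is made up for illustration, and LC_ALL=C is my own addition to keep tr working on plain bytes. It also shows one quirk of the method, which is that a word split across two folded lines is not found.

```shell
# Made-up sample input standing in for /dev/urandom (an assumption for this demo).
sample='ab1cdeXYlinuxq9zlinux'

# Stage 1: tr -cd deletes every byte that is not a lowercase letter.
letters=$(printf '%s' "$sample" | LC_ALL=C tr -cd '[:lower:]')
echo "$letters"

# Stages 2 and 3: fold cuts the stream into 6-character lines, nl numbers them.
printf '%s\n' "$letters" | fold -w 6 | nl

# Stage 4: grep -m1 prints the first numbered line containing the word and stops.
# The first "linux" in the sample straddles lines 1 and 2, so it is missed;
# the match reported is the later one that sits inside a single line.
printf '%s\n' "$letters" | fold -w 6 | nl | grep -m1 "linux"
```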

This is like having a million monkeys at typewriters. The longer the word you are looking for, the longer it takes to find: a ten-letter word like “moderation” would take far longer than “linux”. I am not sure what use this could be put to, but it is interesting nonetheless.

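Finding four-letter words is very fast, but the expected search time grows exponentially with word length: each extra letter multiplies it by about 26, because there are 26 lowercase letters to choose from.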
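
As a rough check on that growth, the expected number of folded lines before the first hit can be estimated as 26^n divided by the number of positions the word can occupy in a line. This is a back-of-the-envelope sketch, assuming every letter is equally likely and ignoring overlapping matches. The function name expected_lines is hypothetical, not part of any tool.

```shell
# expected_lines WORD_LENGTH FOLD_WIDTH
# Rough estimate of how many folded lines are read before the first match.
# Assumes uniformly random lowercase letters (hypothetical helper).
expected_lines() {
    n=$1
    w=$2
    combos=1
    i=0
    # combos = 26^n, the number of possible n-letter strings.
    while [ "$i" -lt "$n" ]; do
        combos=$(( combos * 26 ))
        i=$(( i + 1 ))
    done
    # A word of length n fits at (w - n + 1) offsets in a line of width w.
    echo $(( combos / (w - n + 1) ))
}

expected_lines 4 6    # "unix" with fold -w 6
expected_lines 4 15   # "unix" with fold -w 15
expected_lines 5 6    # "linux" with fold -w 6
expected_lines 6 6    # a six-letter word with fold -w 6
```

The actual line count of any single run can land well above or below these estimates, since the first match is a matter of luck.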

jason@jason-Lenovo-H50-55:~$ time tr -cd '[:lower:]' < /dev/urandom | fold -w 15 | nl | grep -m1 "unix"
 77595  kpunixxylldvaxk
 
real    0m0.084s
user    0m0.047s
sys     0m0.089s
jason@jason-Lenovo-H50-55:~$ time tr -cd '[:lower:]' < /dev/urandom | fold -w 15 | nl | grep -m1 "unix"
 10175  xounixvzdapwafx
 
real    0m0.022s
user    0m0.030s
sys     0m0.008s
jason@jason-Lenovo-H50-55:~$ time tr -cd '[:lower:]' < /dev/urandom | fold -w 6 | nl | grep -m1 "unix"
  4783  ununix
 
real    0m0.011s
user    0m0.012s
sys     0m0.011s
jason@jason-Lenovo-H50-55:~$ time tr -cd '[:lower:]' < /dev/urandom | fold -w 6 | nl | grep -m1 "unix"
 27357  zdunix
 
real    0m0.023s
user    0m0.023s
sys     0m0.022s

This six-letter example took over a minute, but it did work in the end.

jason@jason-Lenovo-H50-55:~$ WORD="multix"; time tr -cd '[:lower:]' < /dev/urandom | fold -w 6 | nl | grep -m1 "$WORD"
183394705       multix
 
real    1m8.419s
user    1m22.998s
sys     0m52.960s

This is another, improved version. It sets the fold width from the length of the word automatically.

jason@jason-Lenovo-H50-55:~$ WORD="apple"; time tr -cd '[:lower:]' < /dev/urandom | fold -w ${#WORD} | nl | grep -m1 "$WORD"
36504080        apple
 
real    0m11.320s
user    0m14.958s
sys     0m8.519s

In the fold -w ${#WORD} part of the one-liner, ${#WORD} expands to the length of the $WORD variable in characters. Passing that to fold -w makes every output line exactly as long as the word being searched for. That is a very useful trick in bash scripting.
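
Here is a small sketch of that expansion on its own, using a fixed alphabet string instead of random data so the effect is easy to see. The input string is just an illustration.

```shell
WORD="apple"

# ${#WORD} expands to the number of characters in $WORD.
echo "${#WORD}"

# Used as the fold width, it cuts the input into lines as long as the word.
printf 'abcdefghijklmno\n' | fold -w "${#WORD}"
```

Changing WORD to a longer or shorter word changes the line length with it, so the fold width never has to be edited by hand.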

