Thursday, September 08, 2005 

Optomizer.pl

# Author: Randy Flood
#
# This program is based on an idea posted by someone else to the penetration testing mailing list.
# It seemed like a good idea, so I wrote my own script to do it, which is probably like 1/1000th as good as theirs...
# But, anyway, I just thought I would play around with it...
#
# Usage: optomizer.pl dictionary.txt
#
# This program takes a list of words and tells you how frequently each character was used in each position in the words overall.
# This could be useful information if you were going to try to brute force passwords.
# For example, does it really make sense to start with aaaaaaaa then go to aaaaaaab, etc?
# Wouldn't it be cooler if there was some way to know which passwords were more likely based on some statistic or something, and start guessing there?
#
# Well, a good place to start might be to take a bunch of dictionaries and run them through a program like this. Then you could come up with how common each symbol is in each position of the password.
# Then, you can see the frequencies of characters in each position.
# How to best use this information to write a brute force cracker is an area for further research. You could easily write Perl scripts that generate passwords that you can then use with programs like Hydra.


$n=0;
$chars{a}=0;

open (WORDS,$ARGV[0]) or die "Can't open dictionary file: $!\n"; # Stop here if the file is missing or unreadable

# You would think that you could just do a while (<>)
# and let people call the program with a filename as the first argument, wouldn't you?
# In my experimentation on Windows, I found that my version of Perl seemed to be loading the *entire file into memory* when I did that.
# That surprised me, since while (<>) normally reads one line at a time. There are a couple of other silly issues that I have noticed with trying to write Perl programs in Windows.
# Anyway, I will spare you the details of why I can't get back to my Linux box at the moment.

$position=0;

while (<WORDS>) # Filehandle names are case-sensitive, so this must match the WORDS handle opened above
{

chomp;
s/[^A-Za-z0-9@!\#"';:{}\[\]+=\-_|`~\*\/<>\?]//g; # This says to substitute nothing for anything that is not
# one of the characters in there.
# So, if you only wanted numbers and letters a-zA-Z0-9
# for example, you can edit that line to look like this:
#s/[^A-Za-z0-9]//g;

if ($position>$n) # $n is the maximum word length of any word that we see
{
$n= $position; # $position is our current position in the word
}
$position=0; # we reset it because we are at the start of a new word


foreach $c (split(//, $_)) # Divide each word into characters and process one at a time
{


$chars{$c}++; # The characters hash just has a total count of each character
# It's probably redundant, but I like it...

$freq{$position}->{$c}++; # This says that we have seen the character $c at position $position
# So, we mark that down by increasing its count in the %freq hash.
$position++; # $position will be 0 on the first character
# will increment until it is equal to the number of
# characters in the word because on the last character
# we process it still gets incremented.
# This is of minor note because it gets assigned to $n...
# So, $n contains the actual number of characters in the
# longest word.
}

}

if ($position>$n) # The check at the top of the loop never sees the last word,
{ # so compare its length against $n one more time here.
$n= $position;
}
print "\n---------results----------\n\n";

# $n is the length of the longest word, so positions run from 0 to $n-1
# $m is the position in the word
# $ch is the character
# $freq{$m}->{$ch} is the frequency of the character $ch at position $m
# %chars is a hash containing the number of times each character was seen

print "n=$n\n";
for ($m=0; $m<$n; $m++)
{

# OK, we know that the positions run from 0 to $n-1
# The characters are all in the keys of the %chars hash
# The frequency with which $ch was used at position $m is given by:
# $freq{$m}->{$ch}


foreach $ch (sort keys %chars)
{
if ($freq{$m}->{$ch})
{
print "$m $ch $freq{$m}->{$ch} \n";
}
}

}
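
The header comment suggests turning these per-position counts into password guesses. Here is one rough way that might look, as a sketch only and not a tested cracker. It assumes the "position character count" lines printed above have been saved somewhere. The @sample numbers are made up for illustration, and the sub names parse_freq, ranked_chars and candidates are my own, not part of optomizer.pl. It keeps the top few characters for each position and builds every combination, with the most common choices first. That is not a strict most-likely-first ordering, just a cheap approximation of one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample of optomizer.pl output ("position char count").
# These counts are invented for the example, not real measurements.
my @sample = (
    "0 s 40", "0 p 25", "0 a 10",
    "1 a 50", "1 e 30", "1 o 5",
    "2 s 20", "2 t 15",
);

# Rebuild the nested hash: $freq->{position}{char} = count
sub parse_freq {
    my @lines = @_;
    my %freq;
    for my $line (@lines) {
        my ($pos, $ch, $count) = split ' ', $line;
        next unless defined $count;      # skip blank or malformed lines
        $freq{$pos}{$ch} = $count;
    }
    return \%freq;
}

# Return up to $top characters for one position, most frequent first.
# Ties are broken alphabetically so the order is repeatable.
sub ranked_chars {
    my ($freq, $pos, $top) = @_;
    my $h = $freq->{$pos} || {};
    my @ranked = sort { $h->{$b} <=> $h->{$a} or $a cmp $b } keys %$h;
    $#ranked = $top - 1 if @ranked > $top;   # keep only the top entries
    return @ranked;
}

# Build every word of the given length from the top characters per position
sub candidates {
    my ($freq, $length, $top) = @_;
    my @words = ('');
    for my $pos (0 .. $length - 1) {
        my @chars = ranked_chars($freq, $pos, $top);
        @words = map { my $w = $_; map { $w . $_ } @chars } @words;
    }
    return @words;
}

my $freq = parse_freq(@sample);
print "$_\n" for candidates($freq, 3, 2);
```

Redirecting the output to a file gives a word list that a program like Hydra could read. The list grows as top ** length, so small values of $top are the only practical ones for longer passwords.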