Wed Sep 8, 10:32:52 UTC 2010



Archive for the ‘programming’ Category

A quick Perl script for used car research on craigslist

Wednesday, April 22nd, 2009

So my 2001 VW Jetta is getting a bit up there in miles: it’s at about 95k at the moment. That isn’t too much for a VW, but I’ve been wanting a new car, and I figure I should get rid of this one while I can still feel good about selling it to someone else. Besides, I want a convertible. I live in southern California, and if this isn’t the appropriate climate for one, I don’t know where is.

Initially I went to CarMax and they offered me $2000. Yeeps! I was shocked. Could it really be worth that little? Kelley Blue Book said it was worth at least $4k. So I thought I would test the open market and write a quick Perl script to give me the lowest, highest, and average asking price for a search on craigslist. Here’s how it works, with the code further down. It should be really easy to modify for anyone who could use something like this:


./cl_get_prices.pl 2001+jetta

http://losangeles.craigslist.org/search/cta?query=2001+jetta

lowest:         1200
highest:        9999
Average:        6134.48214285714

Another example, where I search for a Porsche Boxster:


nick:~$ ./cl_get_prices.pl porsche+boxster

http://losangeles.craigslist.org/search/cta?query=porsche+boxster

lowest:         7600
highest:        33100
Average:        16865.6341463415

Anyway, the code is below, and I’ll put a link to the actual Perl script after it:

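#!/usr/bin/perl
use strict;
use warnings;

# Search terms come from the command line, e.g. 2001+jetta
my $query = shift @ARGV;
die "usage: $0 search+terms\n" unless defined $query;
my $url = "http://losangeles.craigslist.org/search/cta?query=$query";
print "$url\n";

# Fetch the results page; the list form of open keeps the shell out of it
open(my $fh, '-|', 'wget', '-q', '-O', '-', $url)
    or die "cannot run wget: $!\n";
my $html = do { local $/; <$fh> };
close $fh;

# Track the lowest, highest, and running total of every price on the page
my ($lowest, $highest, $amt, $count) = (undef, undef, 0, 0);
foreach my $word (split ' ', $html) {
    # Only tokens that start with "$" can be prices, e.g. $4,500
    next unless $word =~ m/^\$/;
    $word =~ s/(\$|,)//g;
    next unless $word =~ m/^\d+$/;
    $lowest  = $word if !defined $lowest  || $word < $lowest;
    $highest = $word if !defined $highest || $word > $highest;
    $amt += $word;
    $count++;
}

die "no prices found\n" unless $count;    # avoid dividing by zero
my $average = $amt / $count;
print "lowest:\t\t$lowest\nhighest:\t$highest\n";
print "Average:\t$average\n";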

Here’s a link to the actual script, which you can download. If you find it useful, let me know.
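If you want to tweak the parsing without hitting craigslist every time, one option is to pull the price logic into its own sub and feed it a canned string. This is only a sketch: `price_stats` and the sample text are made up for illustration and are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect tokens that look like prices ($1,200 etc.)
# and return (lowest, highest, average), or an empty list if none are found.
sub price_stats {
    my ($text) = @_;
    my @prices;
    foreach my $word (split ' ', $text) {
        next unless $word =~ /^\$/;      # must start with a dollar sign
        $word =~ s/[\$,]//g;             # drop "$" and thousands separators
        push @prices, $word if $word =~ /^\d+$/;
    }
    return unless @prices;

    my ($lowest, $highest, $total) = ($prices[0], $prices[0], 0);
    foreach my $p (@prices) {
        $lowest  = $p if $p < $lowest;
        $highest = $p if $p > $highest;
        $total  += $p;
    }
    return ($lowest, $highest, $total / @prices);
}

# Made-up sample text standing in for a downloaded results page
my $sample = 'jetta $1,200 clean title $4500 obo $3300 call';
my ($low, $high, $avg) = price_stats($sample);
print "lowest:\t\t$low\nhighest:\t$high\nAverage:\t$avg\n";
```

Keeping the parsing separate from the download means you can change the regexes and check the result right away, then swap the sample string for the real page once it looks right.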

Playing with Bayes

Saturday, April 4th, 2009

For the last day or so I’ve been playing with moving my simple word-count analysis of blogs over to a database of manually ranked training data and extrapolating from that. There were some hiccups and I’ve still got to go back and replace a lot of code, but it’s effectively categorizing new blog entries based on previous rankings. YAY! I’ve been using the Perl Algorithm::NaiveBayes CPAN module, and it’s pretty great, although the documentation is really poor. The main thing to get is that it returns a hash reference, which means you end up referring to your result as something like:

<pre>print "Sports ranking: \t ${$result}{sports} \n";</pre>

Which, let’s face it, is kinda ugly, but that’s what it takes to read a value through a hash reference. It should really be better documented, though. Aside from that, as long as you get the back-end math, you’re pretty OK. Just because you’re doing AI stuff doesn’t mean that you’re automatically familiar with how Perl handles references to hashes, though. One of my to-do items is to go back and update the perldoc on it. Anyway, with that in place I’ve updated the database and I’m now able to track positivity over time. This means I’m actually getting closer to building an internet happiness index and predicting how “happy” the internet is as a whole. The next steps are:

  • incorporate new Bayes functions into existing codebase
  • add more sources to the RSS feedlist scripts
  • optimize the blogparser
  • put a nice (fusioncharts?) front end together
  • add more hardware for doing the categorization
  • get more people to do more training data
  • ???
  • profit!

Actually, the “???” is pretty well defined, but honestly, this project will have been fun even if it doesn’t make a dime. Anyway, one step closer.
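For anyone who hasn’t dealt with hash references in Perl before, here is a minimal sketch of two ways to read a value out of one. The `$result` below is a hand-built stand-in with invented categories and scores, not real output from the Bayes module, so it runs without the module installed. `best_category` is likewise a hypothetical helper, not something the module provides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-built stand-in for a prediction result: category => score.
# The categories and numbers are invented for illustration.
my $result = { sports => 0.82, politics => 0.18 };

# Block-style dereference, as in the print line earlier in this post
print "Sports ranking: \t ${$result}{sports} \n";

# Arrow-style dereference; same value, different spelling
print "Sports ranking: \t $result->{sports} \n";

# Hypothetical helper: return the category with the highest score
sub best_category {
    my ($scores) = @_;
    my ($best) = sort { $scores->{$b} <=> $scores->{$a} } keys %$scores;
    return $best;
}

print "Best guess: ", best_category($result), "\n";
```

Both print lines show the same score. The arrow form tends to be easier to read once the structure gets nested.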