Would you participate in this kind of contest?

I’m thinking, along w/ some people I’m working on a start-up with, of running a contest to help us “train” the back-end “artificial intelligence engine” that our software uses.

Here’s the gist: you would log in to a website and be presented w/ an “article” – this would be a blog post, web page, etc. You would then rate it as positive, negative, & so forth. Anyone who rated 1000 articles in a month (each takes about 1-2 seconds) would be eligible to win a prize, which would be either an xbox 360 or a playstation 3.

So: Three questions:

  • Would you do something like this?
  • Would you be more inclined to do it for an xbox or a ps3?
  • If you would not be inclined, what could we change to make you more inclined to do it?

    Thanks so much!

  • Posted in entrenza, startup, tech | 2 Comments

    Making me reach a bit

    I wrote a little while back about some of the things my startup has gotten me to work on, but today, after being grumpy and frustrated for a good chunk of the day – the firewall died (hardware), and the parser’s got a memory leak and is crashing the parse server – it occurred to me how much working on this project has made me a better programmer, technologist, and possibly even a better person.

    Tomorrow, after going out to brunch with some friends, and then cleaning the apartment, I’m going to come back and start doing some code profiling to look for memory leaks. I’ve done standard debugging stuff before, and learned the basics of code optimization and such in school back in the day, but the fact of the matter is, until I started working on the startup project, I mainly wrote smaller programs – utility scripts, small apps, stuff like that. I never really got into situations where I had to think about memory leaks, or applications which would basically run constantly. It’s other stuff too – if you write a script to automate the creation of this or that, or a .net app to generate Foundry ServerIron configs, you don’t need to put it in an architectural perspective. Now I have to think about that kind of stuff all the time. “What happens when this breaks?” – “How can I write this so I can add another server and scale out horizontally?” – I actually think about this kind of stuff when I’m coding now. I’ll scrap whole bits of things that worked because they’ll cause grief down the line.

    I think it’s also made me more disciplined. There’s a big difference between doing your job because you know that if you slack you’ll eventually get grief about it, and if you do it well you’ll get rewarded for it – all by someone else – and setting goals and following through on your own. My home page on my work browser, over the years, has been slashdot, popurls, rootprompt.org and a myriad of other websites. Now, it’s (thank you firefox and chrome for having multiple tabs) a google doc spreadsheet of my goals, with columns representing dates and actions. Each day I list what I did to achieve those goals. Another tab contains our ticketing system. Another tab contains our intranet site, in which there are a bunch of daily actions that I try to go through. I would never have approached a job like this if I were being paid by someone else. It took doing this on my own to realize the type of mindset and tools I would have to give myself to accomplish these things.

    Another thing that I think has made me a better person is our twice weekly conference call – we have a very loose structure for the company – there’s no office, and we use email, ticketing, and IM to communicate and conf. calls to go over progress and complete goals. We basically cover what we’ve done, and what’s next. It is also a good opportunity for me to talk on the phone with friends, who all, at this point, live in different parts of the world. Yes, it’s about a shared goal, and a project, but it’s also about keeping in touch with friends, and doing so regularly. I have traditionally been terrible at keeping in touch with people, and I think that this helps me in that regard.

    Tomorrow is going to be frustrating as hell. Don’t get me wrong, I’ll have a nice lunch, and put the work out of my mind during that part, and I’ll enjoy the call – tomorrow’s sunday, one of the days we do it – but when I start getting into phase II of the code profiling stuff, and looking for circular references and objects that aren’t being collected, I’m going to get seriously frustrated and stressed out. I’m going to hate it. But I’ll make strides towards getting it fixed. And the idea that taking on a project of this scope, and how hard it is, is making me a better coder will give me some solace. The idea that I’ve had to change my thinking in regards to where it fits in the architecture, I think has made me a better technologist, and the discipline and keeping in touch with friends has made me a better person, I hope. Yes, I’ll definitely be incredibly frustrated when half the stuff I’m trying to do ends up breaking things temporarily, but I think, in the end, it’ll be worth it.
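Since circular references came up: in perl they’re the classic reason an object never gets collected, because the reference count of each object in the loop never drops to zero. Here’s a minimal sketch of the problem and the usual fix, weakening the back-reference with Scalar::Util. The Node class and build_pair are made up for illustration; this is not the actual parser code.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

our $destroyed = 0;          # counts how many Node objects have been freed

package Node;
sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
sub DESTROY { $main::destroyed++ }

package main;

# Build a parent and child that point at each other, then let both
# lexical variables go out of scope when the sub returns.
sub build_pair {
    my ($use_weaken) = @_;
    my $parent = Node->new('parent');
    my $child  = Node->new('child');
    $parent->{child} = $child;
    $child->{parent} = $parent;                 # this back-reference closes the cycle
    weaken($child->{parent}) if $use_weaken;    # a weak ref does not keep the parent alive
    return;
}

build_pair(0);
print "freed after plain cycle:  $destroyed\n";
build_pair(1);
print "freed after weakened one: $destroyed\n";
```

The plain cycle frees nothing until the program exits, while the weakened pair is freed as soon as build_pair returns. In a process that runs constantly, the first case is exactly the kind of slow leak a profiler ends up pointing at.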

    Posted in Uncategorized | Leave a comment

    Cool things I’ve gotten to play with at my startup

    For the last year or so I’ve been working on a tech startup with some friends. In doing so, I’ve gotten to work with some pretty cool stuff, and I thought I’d make a list of some of it. Basically, I wanted to extol the virtues of working on a startup as a great way to get real-life practice projects to work on – I have every expectation that we will have at least some success, but even if it ends up being a failure, here are some of the projects that I’ve gotten to work on:

    • wrote a spidering application
    • setup a mysql cluster
    • setup zimbra
    • researched several virtualization options
    • setup apt-proxies
    • wrote a custom smtp daemon / parser
    • learned a boatload about bayesian analysis and other pattern recognition and predictive tools
    • setup joomla
    • setup drupal
    • setup linux natting/routing firewall
    • wrote project plans
    • managed & motivated
    • learned how to incorporate
    • setup dnsmasq for dhcp/dns masquerading
    • setup bind dns & replication
    • learned how to motivate people and lead weekly conference calls
    • worked on project management
    • marketing
    • sales
    • setup ldap authentication
    • setup openNAS

    Now – I’ve done a bunch of these things before in previous jobs, but it’s still good practice and there were several I hadn’t played with before. It’s a great opportunity to learn, and even if it doesn’t end up succeeding, the time I’ve put in will not have been wasted. I’ve become a better programmer, a better leader, and I’ve had a lot of fun doing it.

    If you are interested in keeping track of the morale at your company, project, or keeping track of how positive people are about your brand, or a search term, we’re looking for beta customers. Feel free to ping me @nickbernstein on twitter if you think you might be interested.

    Posted in tech, Uncategorized | Leave a comment

    A letter to my congresswoman

    I thought I would put up a letter I recently wrote to my congresswoman, Maxine Waters, regarding the recent information that has come to light about torture under the Bush administration.


    I am writing you today in regards to the recent information about the torture which has taken place in the name of the American people. The atrocities, a word I do not use lightly, which have taken place *must* be investigated, and this needs to be done in a manner that is as fast and impartial as possible. I believe that the investigation needs to be done not by Americans but by an international third party such as the UN. This cannot be perceived as a partisan attack against republicans, it is too important, but an investigation and prosecution of this torture must be undertaken. Our country is nothing without the ideals and principles under which it was formed, and this must be dealt with.

    Thank you for your time,
    Nicholas Bernstein

    If you are not familiar, here is some background information:

    to balance it out, here is a video of baby pigs being cute: http://www.youtube.com/watch?v=FIWf_hc1_TM

    Posted in Uncategorized | Leave a comment

    A quick perl script for used car research on craigslist

    So my 2001 VW jetta is getting a bit up there in the miles – it’s about 95k at the moment, and while this isn’t too much for a VW, I’ve been wanting to get a new car and I figure I should get rid of it while I can still feel good about selling it to someone else. Besides, I want to get a convertible – I live in southern California, and if it’s not the appropriate climate for one, I don’t know where is.

    Initially I went to carmax and they offered me $2000. Yeeps! I was shocked. Could it really be worth that little? Kelley Blue Book said it was worth at least $4k. So I thought I would test the open market and write a quick perl script to give me the average price for an item on craigslist. Here’s how it works; the code is below. It should be really easy to modify for anyone who could use something like this:


    ./cl_get_prices.pl 2001+jetta
    http://losangeles.craigslist.org/search/cta?query=2001+jetta
    lowest:         1200
    highest:        9999
    Average:        6134.48214285714

    another example where I search for porsche boxster:


    nick:~$ ./cl_get_prices.pl porsche+boxster
    http://losangeles.craigslist.org/search/cta?query=porsche+boxster
    lowest:         7600
    highest:        33100
    Average:        16865.6341463415

    Anyway the code is below, and I’ll put a link to the actual perl script:

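    #!/usr/bin/perl
    use strict;
    use warnings;

    # Usage: ./cl_get_prices.pl 2001+jetta
    my $query = shift @ARGV;
    die "usage: $0 search+terms\n" unless defined $query;
    my $url = "http://losangeles.craigslist.org/search/cta?query=" . $query;
    print "$url\n";

    # Fetch the results page; the list form of open keeps the URL away from the shell.
    open(my $wget, '-|', 'wget', '-q', '-O', '-', $url) or die "cannot run wget: $!\n";
    my $html = do { local $/; <$wget> };
    close($wget);
    $html = '' unless defined $html;

    my ($lowest, $highest, $amt, $count) = (undef, undef, 0, 0);

    # Any whitespace-separated token that starts with "$" is treated as a price.
    foreach my $word (split ' ', $html) {
        next unless $word =~ m/^\$/;
        $word =~ s/[\$,]//g;               # drop the dollar sign and thousands separators
        next unless $word =~ m/^\d+$/;     # skip anything that is not a plain number

        $lowest  = $word if !defined $lowest  || $word < $lowest;
        $highest = $word if !defined $highest || $word > $highest;
        $amt += $word;
        $count++;
    }

    die "no prices found\n" unless $count;  # avoid dividing by zero

    my $average = $amt / $count;
    print "lowest:\t\t$lowest\nhighest:\t$highest\n";
    print "Average:\t$average\n";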

    Here’s a link to the actual script you can download. If you find it useful, let me know.
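If you want to play with the parsing without hitting craigslist every time, the price-extraction part can be pulled out into a function and fed any chunk of text. This is just a sketch: price_stats is a name I made up, the sample string is invented, and List::Util (which ships with perl) does the min/max/sum work.

```perl
use strict;
use warnings;
use List::Util qw(min max sum);

# Pull every "$1,234"-style token out of a chunk of text and summarize it.
# Returns nothing (undef in scalar context) when no prices are found.
sub price_stats {
    my ($text) = @_;
    my @prices;
    foreach my $word (split ' ', $text) {
        next unless $word =~ m/^\$/;          # only tokens that start with "$"
        (my $num = $word) =~ s/[\$,]//g;      # strip "$" and thousands separators
        push @prices, $num if $num =~ m/^\d+$/;
    }
    return unless @prices;
    return {
        lowest  => min(@prices),
        highest => max(@prices),
        average => sum(@prices) / @prices,
        count   => scalar @prices,
    };
}

# Made-up sample text standing in for a fetched results page.
my $sample = 'Jetta GLS $4,500 clean title $3200 obo $call 2001 $6,000';
my $stats  = price_stats($sample);
printf "lowest: %d highest: %d average: %.2f\n",
    @{$stats}{qw(lowest highest average)};
```

Tokens like “$call” are skipped because they aren’t plain numbers once the dollar sign is stripped, which is the same rule the script uses.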

    Posted in Life, programming, Uncategorized | 6 Comments

    Playing with Bayes

    For the last day or so I’ve been playing with moving my simple word-count analysis of blogs over to actually creating a database with manually ranked training data and extrapolating from that. There were some hiccups and I’ve still got to go back and replace a lot of code, but it’s effectively categorizing new blog entries based on previous rankings. YAY! I’ve been using the perl Algorithm::NaiveBayes cpan module, and it’s pretty great – although the documentation is really poor. The main thing to get is that it returns a hash reference, which means you end up referring to your result as something like:

    print "Sports ranking: \t ${$result}{sports} \n";

    Which, let’s face it, is kinda ugly (the arrow form, $result->{sports}, does the same thing and reads a little better). It should really be better documented, though. Aside from that, as long as you get the back-end math, you’re pretty OK. Just because you’re doing AI stuff doesn’t mean that you’re automatically familiar with how perl handles references to hashes, though. One of my to-do items is to go back and update the perldoc on it. Anyway, with that in effect I’ve gone and updated the database and I’m now able to get positivity over time. This means I’m actually getting closer to building an internet happiness index, and predicting how “happy” the internet is as a whole. The next steps are:

    • incorporate new bayes functions into existing codebase
    • add more sources to the rss feedlist scripts
    • optimize the blogparser
    • put a nice (fusioncharts?) front end together
    • add more hardware for doing the categorization
    • get more people to do more training data
    • ???
    • profit!

    Actually, the “???” is pretty well defined, but honestly, this project will have been fun even if it doesn’t make a dime. Anyway, one step closer.
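For anyone tripped up by the hash reference thing, here’s a small sketch of the ways to get at the values. The scores are made up and stand in for whatever the classifier hands back (the labels and numbers depend entirely on your training data), and ranking_line is just a helper name I picked.

```perl
use strict;
use warnings;

# Made-up scores standing in for a classifier result; not real output.
my $result = { sports => 0.82, politics => 0.11, tech => 0.07 };

# Format one label's score using the arrow dereference.
sub ranking_line {
    my ($res, $label) = @_;
    return "$label ranking: \t $res->{$label} \n";
}

print "Sports ranking: \t ${$result}{sports} \n";   # block dereference
print "Sports ranking: \t $result->{sports} \n";    # arrow dereference, same value

# Walk every category, highest score first.
my @labels = sort { $result->{$b} <=> $result->{$a} } keys %$result;
print ranking_line($result, $_) for @labels;
```

Both forms interpolate fine inside a double-quoted string, so which one to use is mostly a matter of taste.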

    Posted in programming, tech | 1 Comment

    Dear Bestbuy, why don’t you value me as a customer?

    On my birthday, I drove from one store to another in the boston area trying to find a dell mini 9 netbook. I eventually did, the last one they had, which was an open box return, mid wipe. I purchased it @ full price, and even had to sign something saying I was OK with it not having the OS/Drivers installed.

    I was fine with this, I was planning on installing ubuntu on it anyway. The problem was that w/in a few days the “p” and “o” keys had gone dead. No problem, I think, I’ll just take it back. I had left the box at my GF’s house, so I asked her to ship it to my apartment. Once I got home to california and had gotten the box, I took it back to my closest best buy in el segundo, ca. and they refused to do an exchange. An exchange. I didn’t want my money back, or anything fancy – the keys don’t work, I just wanted to swap it out, no data transfer even, I wiped it for them. I cajoled, I bargained, I tried to explain what an example of bad customer service this was, but to no avail. The manager wouldn’t even come out to see me. ( !! )

    In the last year I have purchased (off the top of my head):

    • a laptop (alum. macbook)
    • several mice
    • multiple laptop cases
    • two network routers
    • several video adaptors
    • a printer
    • two 160g usb hd drives
    • 1 terabyte usb hard drive
    • several video games
    • ethernet cables out the wazoo
    • a gigabit switch
    • a non-gigabit switch
    • extended power strips

    I’m not going to enumerate everything I’ve purchased, but I buy a *lot* of computer equipment. I can’t imagine why bestbuy would have a policy in place so strict that it would willingly lose a customer over an *exchange* – I mean, this is exactly the type of service that drove compusa out of business. I spent a bunch of money, value me as a customer. If anyone from bestbuy reads this, this is the resolution I would like: contact the bestbuy in el segundo, and ask them to exchange my netbook. That’s it. Like for like. Otherwise, there’s a fry’s electronics a block away, and I’m sure they’d be happy to take my money.

    -Nick Bernstein.

    Posted in Uncategorized | 1 Comment

    An alternative approach to snaprestore rollbacks for virus outbreaks

    overview

    Figure 1

    Netapp’s snap restore product is a fantastic tool in a storage admin’s arsenal. It works well. It’s fast, and it doesn’t need to “restore” data, it just makes a previous snapshot the active file system. That said, I keep seeing the same scenario put forth by netapp and various other folks as a use case for snaprestore, and it’s one where snaprestore would be my second choice. The scenario is this: We’re taking hourly snapshots over the course of a day. A virus breaks out. The files on our cifs shares have been compromised. We’re sure that our current data is infected, and we know that at least some of the data was infected an hour ago. Two hours ago is uncertain, but we are sure that the virus hadn’t broken out three hours ago, so we know the file system at this point is clean. The recommendation you always hear is to use snap restore to roll back the file system to three hours ago, and you’re now clean and not infected. Unfortunately, as a side effect, we’ve lost all the data after that point – which, I should mention, is the intended result: if we hadn’t our files would still be infected, right? My solution is to instead clone the volume using a really neat tool called flexclone.

    some snapshot basics

    I’d like to suggest an alternate method, but before I get into it, it probably makes sense to talk about how snapshots work. The idea is very simple. Each file on a file system has an inode. An inode contains information about a file – the owner, when it was created, etc. If you’re on a windows box, and you view the “properties” of a file, you’re seeing the kind of information contained in an inode. A hard drive is made up of blocks of data, and another thing that an inode does is point to the blocks that make up the data for a given file. If a file is big, the inode won’t actually point to the data blocks themselves, it will point to indirect blocks which then point to data blocks ( inode -> indirect level 1 -> data ). If a file is really big we can have multiple levels of indirection, like in figure 1. A snapshot is, for all intents and purposes, a copy of just that top level inode block. This block then contains pointers to the indirect blocks, and those indirect blocks to the data blocks. ( Is this fun yet? ) Once we’ve made a copy of the inode (parent), the indirect and data blocks (children) are effectively frozen, since a block cannot be changed unless its parent, or parents (original inode and now the snapshot), agree. The way we handle modifications to files in the netapp world is to write those changes to a special reserved section of the volume called the “snapshot reserve” (apt) and to update the original inode to point to those blocks. It’s pretty slick, really.
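If it helps to see the pointer idea in code, here’s a toy model. To be clear, this is only a sketch of the concept (a snapshot is a copy of pointers, and changes go to new blocks), not how WAFL actually lays things out; all of the names here are made up.

```perl
use strict;
use warnings;

# Toy model: a "volume" is a pool of data blocks plus a root list of
# pointers into it. Real inodes and indirect blocks are far more involved.
sub new_volume {
    my (@data) = @_;
    my $vol = { blocks => [], root => [], snaps => {} };
    write_block($vol, $_, $data[$_]) for 0 .. $#data;
    return $vol;
}

# Never overwrite in place: store the data in a fresh block and repoint the
# active root at it. Blocks a snapshot still points to are left alone.
sub write_block {
    my ($vol, $slot, $data) = @_;
    push @{ $vol->{blocks} }, $data;
    $vol->{root}[$slot] = $#{ $vol->{blocks} };
    return;
}

# A snapshot is just a copy of the root pointers, not of the data.
sub snapshot {
    my ($vol, $name) = @_;
    $vol->{snaps}{$name} = [ @{ $vol->{root} } ];
    return;
}

# Read through the active root, or through a named snapshot's pointers.
sub read_blocks {
    my ($vol, $snap) = @_;
    my $root = defined $snap ? $vol->{snaps}{$snap} : $vol->{root};
    return join '', map { $vol->{blocks}[$_] } @$root;
}

my $vol = new_volume('AAA', 'BBB');
snapshot($vol, 'hourly.0');
write_block($vol, 1, 'bbb');                  # change the second block afterwards
print read_blocks($vol), "\n";                # active file system sees the change
print read_blocks($vol, 'hourly.0'), "\n";    # snapshot still sees the old data
```

Taking the snapshot copied two pointers and zero data blocks, which is why snapshots are fast and initially take up almost no space.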

    flexclone

    In order to understand how this alternative solution works, we need to understand how flexclone works. Flexclone – to oversimplify things – is a writeable snapshot. Actually, it’s not that much of an oversimplification. If we go back to our snapshot basics where we talked about how each file has an inode which points to data blocks, I should mention that under the hood, everything on a filesystem is actually just, well, a file… even directories. A directory is just a file that contains a list of file names and where the inodes for those files are; that’s the data in that “file’s” data blocks. The top of any volume (think of this as a drive) is a directory, and that directory, in turn, has an inode. This one we call the “root inode”. When we take a snapshot of a volume, this is the inode we’re basing our snapshot off of. So, what we end up with is: [ root inode ] [ snapshot copy of root inode ]. What we can do is add one more copy to make a “clone” of the volume. This leaves us with: [ root inode ] [ snapshot copy of inode ] [ clone copy of inode ]. Now we can still write to our normal volume by writing to that snapshot reserve that we talked about, and what we do when we create a flexclone is associate a *new* snapshot reserve with the clone. The cool thing is that initially the clone takes up no space. Ok, that’s a lie, it takes up a couple of kilobytes, but we’re talking storage systems here, a couple of kilobytes is nothing. This is really useful for things like backing up a database, or doing QA – there’s a whole bunch of use cases, such as, oh, avoiding the unnecessary data loss of using snap restore in the event of a virus outbreak.
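The “writeable snapshot” idea can be sketched the same way. Again, this is a toy model with made-up names and data, not how Data ONTAP implements clones: there is one shared pool of blocks, and a clone is just another list of pointers that is allowed to change.

```perl
use strict;
use warnings;

# Toy model: one shared pool of blocks; a snapshot is a frozen list of
# pointers into that pool.
my @pool     = ('report v1', 'budget v1');
my %snapshot = ('hourly.2' => [0, 1]);

# A clone starts as a copy of a snapshot's pointers, so it owns no blocks yet.
sub clone_from {
    my ($snap) = @_;
    return { root => [ @$snap ], owned => 0 };
}

# Writes to the clone land in new blocks that only the clone points to.
sub clone_write {
    my ($clone, $pool, $slot, $data) = @_;
    push @$pool, $data;
    $clone->{root}[$slot] = $#$pool;
    $clone->{owned}++;
    return;
}

my $clone = clone_from($snapshot{'hourly.2'});
print "blocks owned by new clone: $clone->{owned}\n";

clone_write($clone, \@pool, 0, 'report v2');
print "clone sees:    ", $pool[ $clone->{root}[0] ], "\n";
print "snapshot sees: ", $pool[ $snapshot{'hourly.2'}[0] ], "\n";
```

The clone only starts consuming space as it diverges from the snapshot it was based on, one changed block at a time.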

    the actual solution

    Ok, now that we’ve brushed up on our netapp basics, lets review our problem:

    • we’ve taken hourly snapshots
    • our active file system is infected with a virus, we’ll assume the volume name is vol1
    • so is our previous hourly snapshot and probably the one before that
    • the normal recommended solution would be to snap restore to hourly.2 (three hours previous) before the virus broke out

    Using this method would mean that we would lose three hours worth of data. That’s not good. Preventing data loss is the whole job of a storage admin, as far as I’m concerned. So, an alternate method:

    1. take cifs offline, using the cifs terminate command
    2. on your admin host copy the file /vol/vol0/etc/cifsconfig_share.cfg -> cifsconfig_share.cfg.YYMMDD.pre-virus.bak
    3. make a clone of vol1 called “vol1_clone” based on the clean snapshot, with the command: vol clone create vol1_clone -b vol1 hourly.2
    4. open /vol/vol0/etc/cifsconfig_share.cfg and do a find and replace, changing every instance of vol1 to vol1_clone
    5. run the command cifs restart
    6. run the command cifs shares -add oldvol1 /vol/vol1
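    7. lock it down with cifs access oldvol1 <your user> "Full Control" (if the new share still carries the default everyone entry, remove it with cifs access -delete oldvol1 everyone)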
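Steps 2 and 4 are easy to fat-finger under pressure, so here is a rough perl sketch of doing the backup and the find and replace in one go from the admin host. The function names and the sample share lines are made up for illustration (check them against what your own cifsconfig_share.cfg looks like), and the match is anchored so that vol1 doesn’t also rewrite vol10.

```perl
use strict;
use warnings;
use File::Copy qw(copy);

# Rewrite share paths from one volume to another. The lookahead only lets
# "/vol/NAME" match when NAME is not followed by another name character,
# which keeps vol1 from also matching vol10 or vol1_old.
sub retarget_shares {
    my ($text, $old, $new) = @_;
    $text =~ s{/vol/\Q$old\E(?![A-Za-z0-9_])}{/vol/$new}g;
    return $text;
}

# Copy the config to a backup file first, then rewrite it in place.
sub retarget_file {
    my ($path, $old, $new, $backup) = @_;
    copy($path, $backup) or die "backup of $path failed: $!\n";
    open(my $in, '<', $path) or die "cannot read $path: $!\n";
    my $text = do { local $/; <$in> };
    close($in);
    open(my $out, '>', $path) or die "cannot write $path: $!\n";
    print {$out} retarget_shares($text, $old, $new);
    close($out) or die "cannot close $path: $!\n";
    return;
}

# Illustrative share definitions only, not copied from a real filer.
my $sample = <<'END';
cifs shares -add "home" "/vol/vol1/home" -comment "Home directories"
cifs shares -add "archive" "/vol/vol10/archive"
cifs shares -add "root1" "/vol/vol1"
END
print retarget_shares($sample, 'vol1', 'vol1_clone');
```

For the real file you would call something like retarget_file with the path to cifsconfig_share.cfg as mounted on your admin host, 'vol1', 'vol1_clone', and the dated backup name from step 2.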

    So what did this do?

    We quarantined the file system by stopping cifs and removing client access. Then we created a new clone of the volume that had been infected, based on the snapshot we knew was clean. We updated all of our shares using a quick find and replace, giving our users access to their data so they can get back to work. We also exposed our old volume via a new cifs share, which we’ve restricted to only our user, who theoretically knows better than to muck with infected files. So why did we do all this work? Later, when new virus definitions come out, we can scan that volume overnight and have our anti-virus program go clean up our mucky infected files. Once we do, we can open up the share, send out an email, and give our users the ability to recover any important documents that had been created during those three hours, saving extra work, and potentially saving compliance headaches.

    There is a final part of this, in which we split the clone off from its parent using the vol clone split command (vol clone split start vol1_clone). I would note – you will (at least temporarily) need to have enough free space on your aggregate to contain both of the volumes, w/o any space savings. Once your split is finished, you can do a vol offline vol1 ; vol destroy vol1 to get rid of the old volume and free up that space.

    I think this is a better solution to the problem, and one that’s much more elegant than using a snap restore. I’d love to hear any feedback or improvements to this process, so if you found this useful, or can think of a better way, please let me know.

    Posted in tech, Uncategorized | 1 Comment

    Teachin’ ain’t easy.

    I am very, very jet lagged. I’m not entirely sure what it is, but there’s something that knocks you on your ass when you travel in the middle of the week. I think it’s that everyone’s a *little* tired on Mondays so the world feels like it’s running at a slightly slower pace. Oh well. Short week anyway, and I’m off to Boston on Friday, which is a much shorter flight than heading back to LA. 

    I think part of it is the fact that this is a new course I’m teaching this week – I taught the previous version, but the slides have all been changed and the content moved around. I try to get into a flow with the classes I teach, and build up entertaining stories and adages and whatnot – try to be the kind of teacher I liked when I was in school – and it takes a while to get that going with new materials, I think. Maybe I’m just hypersensitive, but the first time I teach a new class, I never feel like I’m doing it well enough, even if it seems to be going well. I’ve kept myself up re-reading the slides and my previous notes and trying to see if there are any issues by re-doing the labs, but I probably won’t like where it’s at until right before this class gets retired and a new version comes out. Oh well – I guess if it felt perfect the first time through, I’d be bored out of my mind. I really do like teaching, it’s probably one of the most fun jobs I’ve had.

    It’s also fun coming up with notes for the classes. I tend to go a little nuts with supplemental materials, but I think it’s a fun thing to do. Right now I’m on a career development kick – I try to find supplemental things that tie into the classes but also help the students do things like put together Business Continuance Plans and pull together requirements for other groups – get visibility w/in an organization and whatnot.

    Anyway, time for sleep. Hopefully tomorrow I’ll be awake enough to go try and find some cuban food. mmm…. plantains… 🙂

    Posted in Uncategorized | Leave a comment

    Heading to Ft. Lauderdale

    I’m blatantly ignoring the fact that I need to get up early tomorrow to fly to Ft. Lauderdale, and I’m not packed. I’m looking forward to it. Depending on how busy I am, there’s a chance I may get to see my grandmother, who’s awesome, and makes delicious pasta, which tastes unlike any other “gravy” I’ve ever had. SO good.

    They re-did all the courseware for Data ONTAP 7.3, and while it’s better than the previous versions, it’s a HUGE hassle to try and memorize the slide decks and labs for the new courses. Especially since I’ve been studying like a mad-man for the LPI certification so I could pass it by the time the Ubuntu train the trainer sessions get going. FYI – I see almost no value in the LPI certification. It’s a requirement to be a ubuntu trainer, and I love the idea of teaching people linux, so it’s a necessary evil, but in my ten or so years of system administration, I don’t think I’ve ever said, shit – someone tell me what the IRQ for ttyS0 is – and no looking at google! Google exists – this idea of memorizing things for the sake of passing tests is just silly. I’d much rather have lab based testing. Install the OS. Better yet, set up a tftp boot server, and get an install going that way. Troubleshoot this card not working… use whatever fricken resource you want, hell, call support… just get it done in 30 minutes. Someone needs to make a certification which is 100% lab based and takes like 40 hours, but you can do it all from home via the web. Aaaaanyway. I digress.

    It will be nice to be in Ft. Lauderdale though. I really like that town, and I’ve had a couple students from there in other classes – all of whom I’ve liked, oddly, so maybe I’ll see some familiar faces. Plus – CUBAN food. Cuban food is fucking delicious, and outside of cuba, you can’t get better cuban food than florida. Feh. I should really pack though, I’m off to Boston the week afterwards, and then to DC (?) for the ubuntu TTT session, which means flipflops and overcoats. 🙂

    Posted in Uncategorized | Leave a comment