Seeking to procrastinate in any way possible, I decided it was time to get rid of the crutches of OS X and become a terminal warrior. Picking Ubuntu was easy: if I get lazy, the crutches are still semi-there, but I can do all the tinkering that is only possible on Linux.
Surprisingly, it works great. The installation was a bit of a pain (if you don’t update the sensor drivers, you can cook your computer), but in the end it wasn’t too bad, and it now gives me the combination of great hardware from Apple and anything I could imagine software-wise from the open source community.
Warning: Don’t run the driver script in its entirety or you will create a driver conflict with the wireless cards, which is a pain to fix. Pick and choose whatever pieces you want.
Links to installation instructions:
Finally, the years of being in the trenches of discrete math classes, data structures and algorithms, and general programming are being tested by a project I began working on this past week. Essentially, the project is to build a web crawler + search engine, with the only constraint being that it is to be built in Java. Thus, there is a lot of freedom in both algorithms used and system operations.
At the expense of my other coursework, this project has more or less taken over my life, with my days consisting of a constant flurry of new ideas and tweaks to squeeze some extra performance out of different components. I’ve completed an initial version of the web crawler that is able to fetch, parse, and insert + rank URLs at a rate of ~340 URLs / minute on a Mid-2011 MacBook Air (2 cores, 4 GB RAM).
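For a rough idea of what the parse and existence-check steps involve, here is a minimal sketch. It is not the project's actual code: the class name CrawlStep, the methods extractLinks and enqueueNew, and the regex-based link matching are all assumptions made for illustration, and an in-memory set stands in for the database lookup. A real crawler would more likely use a proper HTML parser.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: parse links out of a page and queue the unseen ones.
final class CrawlStep {
    // Naive href matcher (an assumption); stops at a quote or a '#' fragment.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    // Returns absolute http(s) links found in html, resolved against baseUrl.
    static List<String> extractLinks(String baseUrl, String html) {
        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                URI resolved = base.resolve(m.group(1).trim());
                String scheme = resolved.getScheme();
                if ("http".equals(scheme) || "https".equals(scheme)) {
                    links.add(resolved.toString());
                }
            } catch (IllegalArgumentException e) {
                // Malformed link: skip it rather than abort the whole page.
            }
        }
        return links;
    }

    // Existence check + insertion: only URLs not seen before join the frontier.
    static int enqueueNew(List<String> links, Set<String> seen, Queue<String> frontier) {
        int added = 0;
        for (String link : links) {
            if (seen.add(link)) {   // add() returns false for duplicates
                frontier.add(link);
                added++;
            }
        }
        return added;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/about\">About</a> <a href='post.html#top'>Post</a>";
        Set<String> seen = new HashSet<>();
        Queue<String> frontier = new ArrayDeque<>();
        List<String> links = extractLinks("http://example.com/index.html", html);
        System.out.println(enqueueNew(links, seen, frontier) + " new URLs: " + frontier);
    }
}
```

The set lookup is the cheap part here; once the existence check goes to a database, that round trip is a likely place for time to go.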
In the coming week, I hope to introduce threading to the crawler and see if I can gain some efficiency by overlapping the stages of fetching a URL, parsing the document for URLs and words, and the existence check / database insertion. The goal is to get to 1000 URLs / minute on this hardware before jumping into searching + JSP page creation. The ultimate goal is to implement a version of PageRank with some sort of live search (character by character, like Google does), but we’ll see how well the crawler goes on this hardware first. Time to brush up on automata and grammars!
A side note about Java: I never took seriously a friend who used to harp on about using Java for applets / programs such as this. But no matter how many of the newer solutions developed to replace Java I research, Java always seems to come through as the preferable choice, whether for security, scalability, or just plain efficiency (HR or complexity).
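One way the threading idea could look, as a sketch under stated assumptions rather than a finished design: a fixed thread pool runs the slow fetch-and-parse work for a batch of URLs concurrently, while a single coordinating thread does the existence check. The class ThreadedCrawl, the crawl method, and the fetchAndParse function are hypothetical names; fetchAndParse is passed in so the example runs without touching the network.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: crawl in rounds, fetching each round's URLs in parallel.
final class ThreadedCrawl {
    // fetchAndParse maps a URL to the links found on that page (an assumed interface).
    static Set<String> crawl(List<String> seeds,
                             Function<String, List<String>> fetchAndParse,
                             int threads, int maxRounds) {
        Set<String> seen = new LinkedHashSet<>(seeds); // touched only by this thread
        List<String> batch = new ArrayList<>(seen);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (int round = 0; round < maxRounds && !batch.isEmpty(); round++) {
                // Submit every URL in the batch; workers fetch and parse concurrently.
                List<Future<List<String>>> pending = new ArrayList<>();
                for (String url : batch) {
                    pending.add(pool.submit(() -> fetchAndParse.apply(url)));
                }
                // Collect results in order and keep only URLs not seen before.
                List<String> next = new ArrayList<>();
                for (Future<List<String>> result : pending) {
                    try {
                        for (String link : result.get()) {
                            if (seen.add(link)) {
                                next.add(link);
                            }
                        }
                    } catch (ExecutionException e) {
                        // A failed fetch contributes no links; the crawl continues.
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("crawl interrupted", e);
                    }
                }
                batch = next;
            }
        } finally {
            pool.shutdown();
        }
        return seen;
    }

    public static void main(String[] args) {
        // A tiny in-memory "web" standing in for real HTTP fetches.
        Map<String, List<String>> web = Map.of(
            "a", List.of("b", "c"),
            "b", List.of("c", "d"),
            "d", List.of("a"));
        Set<String> found = crawl(List.of("a"), u -> web.getOrDefault(u, List.of()), 4, 10);
        System.out.println(found);
    }
}
```

Keeping the seen set on one thread avoids locking entirely. Whether this helps on a 2-core machine depends on how much of each URL's time is spent waiting on the network versus parsing, which is something to measure rather than assume.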