Most recent posts: page 1 of 6
July 24, 2008
Farm Fountain - edible eco-sculpture

Equal parts hydroponic garden, aquarium, and interactive art, the Farm Fountain is an experiment in self-contained, indoor ecosystem design created by artists Ken Rinaldo and Amy Youngs. The idea is that you can raise edible fish and cycle their waste nutrients through a hanging garden which filters the water before returning it to the aquarium.
Their 4th generation Farm Fountain is currently on display at the Te Papa Museum in New Zealand until January 2009. From the Farm Fountain website:
This project is an experiment in local, sustainable agriculture and recycling. It utilizes 2-liter plastic soda bottles as planters and continuously recycles the water in the system to create a symbiotic relationship between edible plants, fish and humans. The work creates an indoor healthy environment that also provides oxygen and light to the humans working and moving through the space. The sound of water trickling through the plant containers creates a peaceful, relaxing waterfall. The Koi and Tilapia fish that are part of this project also provide a focus for relaxed viewing.
The plants we are currently growing include lettuces, cilantro, mint, basil, tomatoes, chives, parsley, mizuna, watercress and tatsoi. The Tilapia fish in this work are also edible and are a variety that have been farmed for thousands of years in the Nile delta.
A BASIC Stamp program controls the pump mechanism, allowing the plants to be watered at regular intervals for a set period of time. Depending on available natural light, supplemental lighting can be provided by a combination of fluorescent and grow-spectrum LED lighting, switched from a standard light timer. Ken and Amy worked out a lot of the details during the construction of their 3rd Farm Fountain design (pictured above), and they've assembled a how-to instructional gallery which you can use to design your own Farm Fountain system.
There are a lot of external inputs required to keep the ecosystem healthy for a long period of time, including fish food, pH and nitrate monitoring, and general gardening tasks. Once you've gotten accustomed to it, though, it's probably not much more work than maintaining a lawn, and a lawn can't give you tomatoes in the middle of winter.
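We haven't seen the BASIC Stamp code itself, but the schedule it implements (water at regular intervals for a set period) boils down to a little modular arithmetic. Here's a minimal Python sketch of that timing logic. The function name and the 30-minute/5-minute numbers are hypothetical, not taken from the Farm Fountain project.

```python
def pump_should_run(elapsed_s: float, interval_s: float, on_s: float) -> bool:
    """Return True if the pump should be running `elapsed_s` seconds after start.

    The pump switches on at the start of every `interval_s`-second cycle and
    stays on for `on_s` seconds.
    """
    if interval_s <= 0 or not 0 <= on_s <= interval_s:
        raise ValueError("need interval_s > 0 and 0 <= on_s <= interval_s")
    # Where are we within the current watering cycle?
    return (elapsed_s % interval_s) < on_s


# Hypothetical schedule: water for 5 minutes at the start of every 30-minute cycle.
INTERVAL_S = 30 * 60
ON_S = 5 * 60
minutes = (0, 4, 5, 29, 30, 34, 35)
print([pump_should_run(m * 60, INTERVAL_S, ON_S) for m in minutes])
# [True, True, False, False, True, True, False]
```

On a microcontroller the same check would run in a loop against a tick counter, switching the pump relay whenever the result changes.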
Farm Fountain - a sculptural ecosystem you can eat
Posted by Jason Striegel |
Jul 24, 2008 11:08 PM
Design, Food, Home, Life, Science, Survival |
July 23, 2008
NTFS Alternate Data Streams - hide files inside other files
The NTFS file system supports attaching additional data, called Alternate Data Streams (ADS), to any file. Normally this is used by the operating system and file explorer to bind extra data to a file, such as searchable metadata like keywords, comments and revision history, or a marker flagging the file as having been downloaded from the internet. Because this extra information is bound to the file at the filesystem level, you can move the file from one folder to another and all of the attached metadata stays with the file.
The interesting thing is that any file or directory can have zero or more ADS forks attached to it. While some ADS identifiers are used by the OS, there's nothing stopping you from adding your own. You can do this directly from the command line, using a simple colon (":") notation.
Let's say you have a file called test.txt. You can store a secret message in the file like this:
echo "This is a secret" > test.txt:secretdata
If you view the contents of the file, you won't see anything peculiar. If you know about the existence of the secretdata ADS entry, however, you can easily extract the hidden information with the following command:
more < test.txt:secretdata > output.txt
When you now open output.txt, you'll find your secret data inside.
Because it's a lower-level OS feature, you can even trick most programs into loading the data. In the scenario above, you could actually load and edit the secretdata stream inside Notepad by running "notepad test.txt:secretdata". You can even store and execute binary data of any size in an ADS fork. For instance, maybe you want to shove Solitaire inside one of your text file's ADS entries:
type %SystemRoot%\system32\sol.exe > test.txt:timewaster.exe
Running the file is as simple as "start .\test.txt:timewaster.exe". Wild, no?
The odd thing is that all of these hidden streams are floating around your filesystem, and until the /R flag was added to the DIR command in Vista, there wasn't a good built-in way to detect them. To fill the gap, Frank Heyne created LADS, an excellent command line utility that scans a directory and prints the stream names and sizes for the files within it.
There was also a tool released in an MSDN article about file streams that adds an extra tab to the file properties dialog in Windows Explorer. I've linked to a FAQ that Frank maintains about ADS that walks you through setting up the DLL and registry entries to make this work. When it's activated, the Streams tab in the properties panel will let you create, view, edit or delete the stream data that's attached to any file, right in Explorer.
I can see how this file system feature could be useful, but it's a little odd that it's so hidden from the user and there seem to be a few problems with the concept. Obviously, because of ADS's hidden nature, there are a number of malicious uses that can be employed by jerk-o's who write virii and that sort of thing. Even ignoring that, there are also data interchange issues—moving a file between NTFS and another file system causes the loss of all this attached information. Call me old fashioned, but I like my files the way they used to be, with a start, an end, and some bytes in between.
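If you'd rather script stream detection than eyeball it, one approach is to capture the output of Vista's `dir /r` and pull out the stream lines. The Python sketch below assumes stream entries appear as an indented size followed by `file:stream:$DATA`, which is how we understand the format; the sample text is illustrative, not captured output. Check it against your own system (locale settings change the thousands separator) before relying on it.

```python
import re
from typing import NamedTuple


class StreamEntry(NamedTuple):
    file_name: str
    stream_name: str
    size: int


# Assumed shape of a stream line in `dir /r` output: indented size, then
# "file:stream:$DATA". Regular file lines start with a date, so they don't match.
_STREAM_LINE = re.compile(r"^\s+([\d,.]+)\s+(.+):([^:]+):\$DATA\s*$")


def parse_dir_r(output: str) -> list[StreamEntry]:
    """Extract named alternate data streams from captured `dir /r` text."""
    entries = []
    for line in output.splitlines():
        match = _STREAM_LINE.match(line)
        if match:
            size_text, file_name, stream_name = match.groups()
            # Drop locale-dependent thousands separators before converting.
            size = int(size_text.replace(",", "").replace(".", ""))
            entries.append(StreamEntry(file_name, stream_name, size))
    return entries


sample = """\
07/23/2008  10:30 PM                 0 test.txt
                                    18 test.txt:secretdata:$DATA
                                56,832 test.txt:timewaster.exe:$DATA
               1 File(s)              0 bytes
"""
for entry in parse_dir_r(sample):
    print(entry)
```

On a real machine you'd feed it the text of `dir /r` run in the directory you want to audit; LADS remains the more thorough option on pre-Vista systems.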
Frank Heyne - Alternate Data Streams in NTFS FAQ
LADS - NTFS alternate data stream list utility
The Dark Side of NTFS
MSDN: A Programmer's Perspective on NTFS Streams and Hard Links
Posted by Jason Striegel |
Jul 23, 2008 10:30 PM
Cryptography, Data, Windows, Windows Server |
July 22, 2008
PocketMod and Mapufacture: the anti-iPhone
Here's a clever way to fold an 8.5x11 sheet of paper into a small book. The way it's folded, all of the book's 8 outward-facing pages are from the same side of the sheet of paper. This allows you to easily construct a handy little daily planner by printing a single sheet of paper. When you're done folding, the first and third leaf will have a little pouch that you can shove a business card or two inside.

The PocketMod website has a Flash application that lets you quickly build a layout for your planner. You can drag calendars, to-do lists, grids, conversion tables, and even RSS feed articles onto the page and print it directly from your browser.
I love it. It's the iPhone for the mobile Luddite.
You're probably thinking: this PocketMod thing is awesome and all, but what about maps? Well, PocketMod does maps too. Or rather, a cool Web 2.0 mapping service does PocketMods.
At mapufacture.com, you can create and manage custom maps and import data layers from news sources, geo blogging services, and Google My Maps. In addition to all the normal embedding and sharing tools that you'd expect, they also have a PocketMod export, allowing you to convert your map into a handy format that you can put in your back pocket.
You can't make phone calls on your PocketMod and it doesn't hold any songs you can't sing or whistle yourself. On the other hand, it's crazy slim, 3rd party application writing is a cinch, the data plan is affordable, and you won't believe the battery life.
PocketMod
Mapufacture - create custom multilayer maps (with pocketmod output support)
Posted by Jason Striegel |
Jul 22, 2008 10:24 PM
Google Maps, Life, Lifehacker, Mapping, Productivity |
July 21, 2008
Tether your iPhone 3G
Your iPhone can connect you to the web from just about anywhere, but sometimes browsing on a tiny screen isn't enough. With a jailbroken 3G and some free software, it's pretty easy to bring that internet-anywhere access to your laptop.
Nate True put together a howto that will guide you through the steps for configuring your iPhone 3G as a web proxy using the 3Proxy software. The laptop connects to the iPhone over an ad-hoc WiFi connection, the iPhone connects to the internet on its 3G connection, and 3Proxy sits in the middle, shuttling HTTP requests and responses from your laptop to the world wide internets.
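To make the proxy's role concrete, here's a minimal Python sketch of what a forwarding HTTP proxy does in the middle of that chain. It isn't 3Proxy: it handles only plain-HTTP GET requests (no HTTPS CONNECT tunnelling, no authentication), and the class name and port 3128 default are our own choices.

```python
import http.server
import urllib.error
import urllib.request

# Per-connection headers that a proxy must not pass along.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "proxy-connection", "te", "trailer", "transfer-encoding", "upgrade",
}

# Upstream fetcher that ignores any proxy settings in the environment.
UPSTREAM = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class ForwardProxyHandler(http.server.BaseHTTPRequestHandler):
    """Relays plain-HTTP GET requests (no CONNECT, so no HTTPS tunnelling)."""

    def do_GET(self):
        # A client configured to use a proxy sends the absolute URL in the
        # request line, e.g. "GET http://example.com/ HTTP/1.1".
        if not self.path.startswith("http://"):
            self.send_error(400, "expected an absolute http:// URL")
            return
        # Forward end-to-end headers; urllib sets its own Host for the target.
        skip = HOP_BY_HOP | {"host"}
        headers = {k: v for k, v in self.headers.items() if k.lower() not in skip}
        try:
            request = urllib.request.Request(self.path, headers=headers)
            with UPSTREAM.open(request, timeout=10) as resp:
                status, resp_headers, body = resp.status, resp.getheaders(), resp.read()
        except urllib.error.HTTPError as err:  # 4xx/5xx: relay them unchanged
            status, resp_headers, body = err.code, list(err.headers.items()), err.read()
        except OSError as err:  # DNS failure, refused connection, timeout...
            self.send_error(502, "upstream error", str(err))
            return
        self.send_response(status)  # also emits our own Server and Date headers
        skip = HOP_BY_HOP | {"server", "date", "content-length"}
        for name, value in resp_headers:
            if name.lower() not in skip:
                self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


def serve(host: str = "0.0.0.0", port: int = 3128) -> None:
    """Listen on every interface; 3128 is just a conventional proxy port."""
    with http.server.ThreadingHTTPServer((host, port), ForwardProxyHandler) as server:
        server.serve_forever()
```

The laptop side then only needs its browser's HTTP proxy setting pointed at the phone's ad-hoc address and port. A production proxy like 3Proxy also tunnels HTTPS, streams large bodies instead of buffering them, and controls who may connect.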
There are a number of steps involved if you include the whole jailbreaking process. If you get this out of the way, though, you'll be prepared to jack in in an emergency (or in a lame-o airport with pay wifi).
How to tether your iPhone 3G
3Proxy
PwnageTool 2.0.1 (for jailbreaking your iPhone 3G)
Posted by Jason Striegel |
Jul 21, 2008 10:12 PM
Mobile Phones, Wireless, iPhone |
July 20, 2008
LEGO NXT Rubik's Cube solver

Hans Andersson's Tilted Twister is a LEGO robot that can solve a scrambled Rubik's Cube in about 6 minutes. I've seen LEGO cube solvers before, but there's something a bit different about this one, which you can see in the video below:
If you didn't catch the difference, this robot is solving the puzzle on its own with no attached PC!
A light sensor is used to scan all six faces of the cube. The robot then calculates a solution and executes an average of 60 turns to complete the puzzle.
The robot is built from the parts available in a retail NXT kit—no extra or custom pieces are necessary. You can build one yourself using the LEGO Digital Designer CAD file and NXC source code that are available from Hans' site.
Tilted Twister - a Lego Mindstorms robot that solves Rubik's cube [via Hacked Gadgets]
Posted by Jason Striegel |
Jul 20, 2008 08:04 PM
LEGO |
July 19, 2008
Citizen Engineer 01 - SIM card and payphone hacks
Ladyada and PT have kicked off the first episode of their Citizen Engineer video series in style. This episode explores GSM SIM card technology and the more retro tech found inside a retired Bell payphone. Ladyada shows how to create a SIM reader which you can use to do things like read deleted SMS messages or brute-force the card's secret key. In the second part, the team dismantles an old Bell payphone and hacks it to function as a home telephone, require quarters for use, and make Skype calls.
Posted by Jason Striegel |
Jul 19, 2008 07:12 PM
Electronics, Mobile Phones, Screencasts, Skype |
July 18, 2008
Origami Wall-E
Brian Chan figured out how to make this origami Wall-E from a single uncut square of paper. It looks like a two-hour project for someone with decent folding skill. My mind is officially blown. Images and the Wall-E folding pattern are available on Brian's site. I found the above time-lapse video on MIT's TechTV video site. The site looks like a YouTube for hackers and is also well worth checking out.
Brian Chan's Origami Wall-E
Wall-E Folding @ MIT TechTV
Posted by Jason Striegel |
Jul 18, 2008 10:32 PM
Design |
July 17, 2008
Binary Arduino clock
Check out Daniel Andrade's binary LED clock built using the Arduino. It's well thought out, including controls for setting the time and hibernating with the LEDs off. Each hour and minute digit is represented in binary form, so it's actually fairly easy to read once you get accustomed to it.
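If the bit layout is what's throwing you, here's a small Python sketch of how an HH:MM time splits into per-digit binary (BCD) columns. It assumes a 24-hour clock with column heights of 2, 4, 3 and 4 LEDs, a common binary-clock layout; Daniel's wiring may differ, so treat the function names and layout as hypothetical.

```python
# Bits needed for each digit position of HH:MM:
# hour tens 0-2, hour ones 0-9, minute tens 0-5, minute ones 0-9.
COLUMN_WIDTHS = (2, 4, 3, 4)
LED = {"1": "*", "0": ".", " ": " "}  # '*' = lit, '.' = dark, ' ' = no LED there


def bcd_columns(hours: int, minutes: int) -> list[str]:
    """Binary pattern for each decimal digit of a 24-hour HH:MM time."""
    if not (0 <= hours <= 23 and 0 <= minutes <= 59):
        raise ValueError("expected a 24-hour time")
    digits = (hours // 10, hours % 10, minutes // 10, minutes % 10)
    return [format(d, f"0{w}b") for d, w in zip(digits, COLUMN_WIDTHS)]


def render(columns: list[str]) -> str:
    """Draw the columns as an LED grid, most significant bit on top."""
    height = max(len(c) for c in columns)
    padded = [c.rjust(height) for c in columns]  # short columns have no top LEDs
    return "\n".join(" ".join(LED[col[row]] for col in padded) for row in range(height))


print(bcd_columns(13, 47))  # ['01', '0011', '100', '0111']
print(render(bcd_columns(13, 47)))
```

Reading it back is just adding up the lit rows in each column (8, 4, 2, 1 from the top), one decimal digit at a time.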
The circuit and source are available from Daniel's site. If you're ahead of the game and already thinking about what to do this Saturday afternoon, this might be a fun option to add to the list.
DIY: Binary Clock with Arduino
Posted by Jason Striegel |
Jul 17, 2008 08:57 PM
Electronics |
July 16, 2008
Improve Linux laptop performance with Ramlog
One of the most power-hungry components in a traditional laptop is its hard disk, and time between charges can be greatly improved by keeping the disk in sleep mode. On machines like the OLPC that have solid-state disks, keeping disk writes to a minimum improves the life of the drive by minimizing the number of sectors that wear out and become unwritable. Depending on how your machine is configured, log activity from kernel events and running daemons like sshd, a DNS cache, or a local copy of Apache can force your disk to make tiny writes every few minutes, shortening flash drive lifetime and ensuring that a mechanical drive never sleeps.
One solution to the problem is to disable syslogd entirely. An alternative is Ramlog, which offers a bit of a compromise. With Ramlog installed, log data is stored in RAM until shutdown, when it's copied back to disk in one big write. You will lose your logs if you have a system crash, but in the more usual scenario where you're trying to track down a wireless problem or an Apache error on your development laptop, the logs are there for you to examine.
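The underlying trade-off (buffer log lines in memory, then flush them to disk in one big write) is easy to demonstrate at the application level. This Python sketch only illustrates that idea; it is not how Ramlog works, since Ramlog operates at the system level on your log files, and the class name is hypothetical.

```python
import atexit
import os


class RamBufferedLog:
    """Keep log lines in memory and append them to disk in one write."""

    def __init__(self, path: str, flush_at_exit: bool = True):
        self.path = path
        self._lines: list[str] = []
        if flush_at_exit:
            # Like Ramlog, buffered lines are lost if the process dies
            # without this hook running (e.g. a crash or SIGKILL).
            atexit.register(self.flush)

    def log(self, message: str) -> None:
        self._lines.append(message.rstrip("\n") + "\n")

    def flush(self) -> None:
        if not self._lines:
            return
        # One open/append/fsync instead of a tiny write per event.
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write("".join(self._lines))
            fh.flush()
            os.fsync(fh.fileno())
        self._lines.clear()
```

Usage is just `log = RamBufferedLog("/tmp/dev.log")` followed by `log.log("...")` calls; nothing touches the disk until `flush()` runs or the interpreter exits normally.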
Installing Ramlog [linux.com]
Ramlog downloads
Posted by Jason Striegel |
Jul 16, 2008 09:11 PM
Linux, Ubuntu, olpc |
July 15, 2008
When to denormalize
There's been a bit of a database religious war on Dare Obasanjo and Jeff Atwood's blogs, all on the subject of database normalization: when to normalize, when not to, and the performance and data integrity issues that underlie the decision.
Here's the root of the argument. What we've all been taught regarding database design is irrelevant if the design can't deliver the necessary performance results.
Third normal form (3NF) helps to ensure that the relationships in your DB reflect reality, that you don't have duplicate data, that the zero-to-many relationships in your system can accommodate any potential scenario, and that space isn't wasted and reserved for data that isn't explicitly being used. The downside is that a single object within the system may span many tables and, as your dataset grows large, the joins and/or multiple selects required to extract entities from the system begin to impact the system's performance.
By denormalizing, you can compromise and pull some of those relationships back into the parent table. You might decide, for instance, that a user can have only 3 phone numbers, 1 work address, and 1 home address. In doing so, you've met the requirements of the common scenario and removed the need to join to separate address or contact number tables. This isn't an uncommon compromise. Just look at the contacts table in your average cell phone to see it in action.
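Here's that compromise side by side, as a minimal sketch using Python's built-in sqlite3 module. The table and column names are hypothetical, and the example sticks to phone numbers to keep it short.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Normalized: any number of phone numbers per user, one row each.
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE phone_numbers (
        user_id INTEGER NOT NULL REFERENCES users(id),
        kind    TEXT NOT NULL,
        number  TEXT NOT NULL
    );

    -- Denormalized: at most three numbers, stored in the parent row.
    CREATE TABLE users_flat (
        id INTEGER PRIMARY KEY, name TEXT NOT NULL,
        phone1 TEXT, phone2 TEXT, phone3 TEXT
    );

    INSERT INTO users VALUES (1, 'Ada');
    INSERT INTO phone_numbers VALUES (1, 'home', '555-0100'), (1, 'work', '555-0101');
    INSERT INTO users_flat VALUES (1, 'Ada', '555-0100', '555-0101', NULL);
""")

# Normalized read: a join (or a second query) to assemble the user.
normalized = db.execute("""
    SELECT u.name, p.number FROM users u
    JOIN phone_numbers p ON p.user_id = u.id
    WHERE u.id = 1 ORDER BY p.number
""").fetchall()

# Denormalized read: a single-row lookup, no join.
flat = db.execute(
    "SELECT name, phone1, phone2, phone3 FROM users_flat WHERE id = 1"
).fetchone()

print(normalized)  # [('Ada', '555-0100'), ('Ada', '555-0101')]
print(flat)        # ('Ada', '555-0100', '555-0101', None)
```

The flat version is cheaper to read but can't hold a fourth number and wastes columns for users with just one, which is exactly the trade-off described above.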
Jeff writes:
Both solutions have their pros and cons. So let me put the question to you: which is better -- a normalized database, or a denormalized database? Trick question! The answer is that it doesn't matter! Until you have millions and millions of rows of data, that is. Everything is fast for small n.
So for large n, what's the solution? In my personal experience, you can usually have it both ways.
Design your database to 3NF from the beginning to ensure data integrity and to allow room for growth, additional relationships, and the sanity of future querying and indexing. Only when you find there are performance problems do you need to think about optimizing. Usually this can be accomplished through smarter querying. When it cannot, you derive a denormalized data set from the normalized source. This can be as simple as an extra field in the parent table that derives sort information on inserts, or it can be a full-blown object cache table that's updated from the official source at some regular interval or when an important event occurs.
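The "extra field in the parent table" approach can be sketched with a database trigger, so the denormalized value is derived from the normalized source on every insert. This Python/sqlite3 example uses made-up posts and comments tables; a real schema would also need triggers for updates and deletes.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        -- Derived (denormalized) columns, kept in sync by the trigger below.
        comment_count INTEGER NOT NULL DEFAULT 0,
        last_comment_at TEXT
    );
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES posts(id),
        created_at TEXT NOT NULL,
        body TEXT
    );

    -- The normalized comments table stays the source of truth; the trigger
    -- derives the sort/summary columns on every insert.
    CREATE TRIGGER comments_after_insert AFTER INSERT ON comments
    BEGIN
        UPDATE posts
           SET comment_count = comment_count + 1,
               last_comment_at = MAX(COALESCE(last_comment_at, ''), NEW.created_at)
         WHERE id = NEW.post_id;
    END;
""")

db.executemany("INSERT INTO posts (id, title) VALUES (?, ?)",
               [(1, "Normalize"), (2, "Denormalize")])
db.executemany("INSERT INTO comments (post_id, created_at, body) VALUES (?, ?, ?)",
               [(1, "2008-07-15 20:47", "first"),
                (2, "2008-07-15 21:00", "second"),
                (1, "2008-07-16 09:00", "third")])

# "Most recently discussed" listing without joining or aggregating comments.
rows = db.execute("""
    SELECT title, comment_count, last_comment_at FROM posts
    ORDER BY last_comment_at DESC
""").fetchall()
print(rows)
```

Because the trigger runs inside the same transaction as the insert, the derived columns can't drift from the comments table the way an application-maintained copy might.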
Read the discussions and share your comments. To me, the big takeaway is that there's no one solution that will fit every real world problem. Ultimately, your final design has to reflect the unique needs of the problem that is being solved.
When Not to Normalize your SQL Database
Maybe Normalizing Isn't Normal
Posted by Jason Striegel |
Jul 15, 2008 08:47 PM
Data, Software Engineering |
July 14, 2008
LEGO Wall-E

NXT Mindstorms hacker BlueToothKiwi created a working Wall-E trash collector robot and uploaded build instructions to the NXTLog.
To mark the release of the film, the official web site has a 'Build your own robot' section where you get to choose the looks / behavior / mobility etc. And of course, if you got a NXT - you don't need to go to a web site to design a virtual robot. You can of course build your own real Wall-e! Well almost!!
BlueToothKiwi's bot may have won an NXTLog building challenge, but there are some other great Wall-E robots on the NXTLog site worth checking out. From the look-and-feel department, Joe Meno's bot, shown in his Flickr photos, bears a striking resemblance to Earth's last robot.
Last is this humble little Wall-E, based on the original Mindstorms.
I'm pretty sure I need to build about 10 of these things and scatter them around the office.
Making your own Wall-e with NXT [thanks, Patti]
Instructions on NXTLog
Other Wall-e bots at the LEGO NXTLog
Joe Meno's LEGO Wall-e on Flickr
Posted by Jason Striegel |
Jul 14, 2008 11:28 PM
LEGO, Parenting |
July 13, 2008
Find and Grep 101
Find and Grep are perhaps the most used command line tools for the Linux user or administrator. Terse but powerful, these two commands will allow you to search through files and their contents by almost any imaginable attribute or filter criteria: file name, date modified, occurrence of some specific word in a file, etc. Combined with a couple of other standard Unix utilities, they let you automate modifications across any number of files that match your search.
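To make the combination concrete, here's a rough Python equivalent of `find ROOT -type f -name GLOB -exec grep -l PATTERN {} +`, which lists the files whose names match a glob and whose contents match a pattern. It's a sketch for illustration only: Python's `re` syntax is closer to `grep -E` than to plain `grep`, and it skips unreadable files silently where find and grep would print errors.

```python
import fnmatch
import os
import re


def find_grep(root: str, name_glob: str, pattern: str) -> list[str]:
    """Roughly `find ROOT -type f -name GLOB -exec grep -l PATTERN {} +`."""
    regex = re.compile(pattern)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in fnmatch.filter(filenames, name_glob):  # find's -name test
            path = os.path.join(dirpath, filename)
            if not os.path.isfile(path) or os.path.islink(path):
                continue  # -type f: regular files only, symlinks not followed
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    # grep -l: stop reading at the first matching line.
                    if any(regex.search(line) for line in fh):
                        hits.append(path)
            except OSError:
                continue  # unreadable file
    return sorted(hits)


print(find_grep(".", "*.txt", r"secret"))
```

The real tools are still the better choice on the command line; the sketch just spells out what each flag contributes.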
Here are two blog posts by Eric Wendelin which nicely illustrate the basics of these two commands:
Find is a Beautiful Tool
Grep is a Beautiful Tool
There are a number of other great unix utilities for file search, but knowing how to use find and grep is fundamental, as these two utilities can be found on the most basic build of every unix-like machine you come across.
Got a favorite command line hack that uses find or grep? Drop it on us in the comments.
Posted by Jason Striegel |
Jul 13, 2008 09:19 PM
Linux, Linux Server, Ubuntu |
July 12, 2008
Cruel Super Mario World hack
Kaizo Mario is a homebrew level for Super Mario World that's equal parts evil and genius. My tolerance for frustration isn't nearly high enough to be able to handle this, but for those of you looking for a gaming challenge, there's a link to the ROM below.
You can make your own custom levels for Super Mario World with a graphical level editor called Lunar Magic. You'll need a Windows PC and the original SNES Super Mario World ROM to use it. If you come up with anything you'd like to share, please add it to the comments.
Kaizo Mario Download
Lunar Magic: Super Mario World Level Editor
Posted by Jason Striegel |
Jul 12, 2008 09:59 PM
Retro Gaming |
July 11, 2008
Reverse autocomplete
Traditional autocomplete is such a powerful tool that it's managed to work its way into most desktop and a significant number of web applications. Type a URL into your browser, and the address bar will offer suggestions for all of the URLs in your browsing history that begin with the text that precedes your cursor. Autocomplete works well, but it could be better.
László Kozma brought up a problem that tends to crop up in a number of scenarios: if you move the cursor back to correct or change part of the entry string, traditional autocomplete completely ignores any context to the right of the cursor. In some applications, this means everything to the right of the cursor is ignored and overwritten instead of being part of the search. In Safari, if you type anything between "www." and ".com", autocomplete fails entirely, offering no results unless you clear out everything to the right of the cursor.
One solution, which László termed "reverse autocomplete", is to split the string at the cursor position and find the entries that start with the first half and the entries that end with the second half. Any entries that show up in both sets are the final autocomplete suggestions. The result is that if I type "www.h|.com" (where the "|" character represents my cursor position), the smarter autocomplete might suggest "www.hackszine.com" but omit "www.h-is-aitch.org".
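Here's a minimal Python sketch of the split-at-the-cursor idea. Rather than building the two sets and intersecting them, it tests both conditions on each candidate in one pass, which gives the same result; the function name and history list are hypothetical.

```python
def reverse_autocomplete(text: str, cursor: int, candidates: list[str]) -> list[str]:
    """Suggest candidates that start with the text before the cursor
    and end with the text after it."""
    before, after = text[:cursor], text[cursor:]
    return [
        c for c in candidates
        if c.startswith(before) and c.endswith(after)
        # The two halves must not overlap inside the candidate.
        and len(c) >= len(before) + len(after)
    ]


history = ["www.hackszine.com", "www.h-is-aitch.org", "www.example.com"]
# The user typed "www.h.com" with the cursor after "www.h" (index 5).
print(reverse_autocomplete("www.h.com", 5, history))  # ['www.hackszine.com']
```

With the cursor at the end of the string, `after` is empty and this reduces to ordinary prefix autocomplete.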
You can take it one step further and also match against the beginnings and ends of the entire string. This solves a really common problem that I run into regularly when searching through a large contact list, say, in a typical corporate email system. If you don't know the correct spelling of someone's name, or if you can only recall initials, you can fill in the parts you know. The system would be smart enough to turn "J|S" into "Jason Striegel" instead of forcing you to page through a huge list of J names.
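The "J|S" example works if each typed fragment only has to match the beginning of a word, in order. Here's a small Python sketch of that reading; it's our interpretation of the idea rather than László's design, and the function names and directory entries are made up.

```python
def fragments_match(parts: list[str], candidate: str) -> bool:
    """True if each typed fragment is a prefix of a word in `candidate`,
    with fragments matching words in left-to-right order."""
    words = candidate.lower().split()
    i = 0
    for part in (p.lower() for p in parts if p):
        # Advance to the next word that starts with this fragment.
        while i < len(words) and not words[i].startswith(part):
            i += 1
        if i == len(words):
            return False
        i += 1  # each word may satisfy only one fragment
    return True


def suggest_names(text: str, cursor: int, directory: list[str]) -> list[str]:
    """Split the entry at the cursor (and on spaces) and filter the directory."""
    parts = text[:cursor].split() + text[cursor:].split()
    return [name for name in directory if fragments_match(parts, name)]


people = ["Jason Striegel", "Jane Smith", "John Doe", "Sarah Jones"]
# "J|S": typed "JS" with the cursor after the "J".
print(suggest_names("JS", 1, people))  # ['Jason Striegel', 'Jane Smith']
```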
Posted by Jason Striegel |
Jul 11, 2008 10:30 PM
Ajax, Software Engineering |
July 10, 2008
Mapstraction - map abstraction API for Javascript

Mapstraction is an abstracted Javascript mapping API that can make use of Google Maps, Microsoft Virtual Earth, Yahoo Maps and Mapquest. Instead of deciding on a particular mapping provider, you can build your web application with Mapstraction and easily switch to a different service by changing a single line of code. From the Mapstraction site:
Mapstraction additionally fills some holes in each provider's current offerings (taking advantage of existing open source solutions where possible) to normalise the feature set across platforms. In the future, Mapstraction will also talk to OpenStreetMap for people who want to build maps without restrictions on derived works.
Features
- Support for 9 major mapping providers
- Point, Line, Polygon support
- Image overlay
- GeoRSS and KML feed import
- Geocoding of addresses
- Driving directions
There's an introductory walkthrough on Webmonkey that shows you how to do the basics like instantiating a map with various providers and adding markers with the abstracted API. The Mapstraction web site also has demos for geocoding, drawing polygons, and swapping map tiles. The API appears to cover all the bases. I can't think of any reason to directly use a specific map provider instead of this.
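The single-line switch works because your application talks to one neutral interface while per-provider adapters translate the calls. The Python sketch below illustrates that adapter pattern only; it is not Mapstraction's API (which is Javascript), and the provider classes and their call strings are invented stand-ins.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    lat: float
    lon: float
    label: str


class MapProvider(ABC):
    """Provider-neutral interface; the application only talks to this."""

    @abstractmethod
    def render_marker(self, marker: Marker) -> str: ...


class ProviderA(MapProvider):
    def render_marker(self, marker: Marker) -> str:
        # Stand-in for provider A's own marker call.
        return f"A.addPin({marker.lat}, {marker.lon}, {marker.label!r})"


class ProviderB(MapProvider):
    def render_marker(self, marker: Marker) -> str:
        # Provider B takes (lon, lat) order; the adapter hides that difference.
        return f"B.marker([{marker.lon}, {marker.lat}]).title({marker.label!r})"


PROVIDERS = {"a": ProviderA, "b": ProviderB}
ACTIVE_PROVIDER = "a"  # the single line you change to switch services


def draw(markers: list[Marker]) -> list[str]:
    provider = PROVIDERS[ACTIVE_PROVIDER]()
    return [provider.render_marker(m) for m in markers]


print(draw([Marker(44.98, -93.27, "Minneapolis")]))
```

The cost of this design is that the neutral interface can only expose what every provider supports, which is why Mapstraction also fills feature gaps itself.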
Mapstraction
WebMonkey Mapstraction Tutorial
Posted by Jason Striegel |
Jul 10, 2008 11:42 PM
Google Maps, Mapping, Yahoo! |
July 9, 2008
Maglite LASER burnination
Desertfoxx sent us a tip to a classic KipKay howto video. It's pretty easy to swap out the diode in a laser pointer or similar laser housing with the diode from an old DVD burner. Put the whole deal into a mini Maglite and you've got yourself a nice little handheld fire starter, perfect for lighting matches or popping dark colored balloons from across the room. Like all lasers, it's also excellent for quickly blinding people, so watch where you point the thing.
If you haven't caught KipKay before, he's been doing some fun weekend project video podcasts for MAKE.
How To Make A Burning Laser Flashlight
Posted by Jason Striegel |
Jul 9, 2008 09:44 PM
Electronics |
July 8, 2008
3D Studio Max motion capture with a Wii Nunchuck
By passing Nunchuck data to a PC via an Arduino, Melka figured out a way to convert the accelerometer output into a MIDI stream that can be read directly by 3D Studio Max's motion capture engine:
Here's my setup under Windows:
- Arduino using a WiiChuck adapter from todbot (thanks kurt ^^) and the WiiChuck library from Tim Hirzel
- Data sent to Processing via serial connection and translated to MIDI CC messages using the proMIDI library by Christian Riekoff
- MIDI output from processing sent to midiYoke
- midiYoke sends this data to Ableton Live
- Ableton re-sends the CC messages to midiYoke
- Using Float Motion Capture controllers on 3D Studio Max to rotate the objects according to the pitch and roll of the wiichuck
It's a little complicated, but from the looks of the video the payoff is worth it. You could adapt this to use data from a number of accelerometers, or turn other measurement data into a MIDI stream that can be used by any application, 3DS or otherwise.
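The heart of the chain is turning an angle into a MIDI Control Change message: a status byte of 0xB0 plus the channel number, a controller number, and a 7-bit value. Here's a minimal Python sketch of that conversion. The ±90° input range and controller numbers 20 and 21 are arbitrary assumptions, and it leaves out the serial, Processing, and midiYoke plumbing from Melka's setup.

```python
def angle_to_cc_value(angle_deg: float, lo: float = -90.0, hi: float = 90.0) -> int:
    """Scale an angle in [lo, hi] degrees to a 7-bit MIDI value (0-127), clamping outliers."""
    clamped = min(max(angle_deg, lo), hi)
    return round((clamped - lo) / (hi - lo) * 127)


def control_change(channel: int, controller: int, value: int) -> bytes:
    """Build a 3-byte MIDI Control Change message (channel is 0-based)."""
    if not (0 <= channel <= 15 and 0 <= controller <= 127 and 0 <= value <= 127):
        raise ValueError("channel 0-15, controller and value 0-127")
    return bytes([0xB0 | channel, controller, value])


def nunchuck_to_midi(pitch_deg: float, roll_deg: float) -> bytes:
    """Hypothetical mapping: pitch on CC 20, roll on CC 21, first MIDI channel."""
    return (control_change(0, 20, angle_to_cc_value(pitch_deg))
            + control_change(0, 21, angle_to_cc_value(roll_deg)))


print(nunchuck_to_midi(90, -90).hex(" "))  # b0 14 7f b0 15 00
```

Seven bits per controller means each axis gets only 128 steps across its range, which is worth keeping in mind if the animation looks steppy.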
Arduino to 3D Studio Max [via MAKE]
Posted by Jason Striegel |
Jul 8, 2008 11:01 PM
Electronics |
July 7, 2008
PlaceSpotting - Google Maps geo quiz

Martin Fussen tipped us off to PlaceSpotting, a user-contributed geo quiz map mashup. The idea is to create puzzles for your friends to solve by picking an obscure landmark and supplying them with a few hints. Your friends can then zoom around on the map to find the location. If they position the map at the right zoom level and over the location, the puzzle is solved. I have to say this is a pretty fun way to learn a bit about the world, especially if you're into puzzles and treasure hunts.
You can search and browse a large library of entries that other people have created. There are a significant number of entries in German, and many of the landmarks are within Europe, but there's nothing stopping you from dropping a few landmarks near the place you call home.
Posted by Jason Striegel |
Jul 7, 2008 08:17 PM
Education, Google Maps, World |
July 6, 2008
KidWash sprinkler toy

Just because it's hot doesn't mean the kids have to stay indoors in the A/C. There are a number of worthwhile summer projects, but the KidWash looks like it has a particularly high fun/effort ratio. A trip to the hardware store for some PVC and mister jets and you can give the Wii a run for its money next weekend.
I headed down to the PVC section of the local home improvement store to pick up supplies. While browsing the adjacent sections for interesting stuff I noticed the micro-irrigation section and inspiration struck: KidWash with mister jets! The modification worked great. We turned it on and kids from up and down the block started showing up to help with the testing. It's a lot of fun on foot, but my kids also get a blast out of riding their bikes through it.
This would be great to combine with a DIY visqueen slip and slide.
KidWash 2 : PVC Sprinkler Water Toy
Posted by Jason Striegel |
Jul 6, 2008 09:23 PM
Home, Life, Parenting |
July 5, 2008
Crawling AJAX
Traditionally, a web spider system is tasked with connecting to a server, pulling down the HTML document, scanning the document for anchor links to other HTTP URLs and repeating the same process on all of the discovered URLs. Each URL represents a different state of the traditional web site. In an AJAX application, much of the page content isn't contained in the HTML document, but is dynamically inserted by Javascript during page load. Furthermore, anchor links can trigger Javascript events instead of pointing to other documents. The state of the application is defined by the series of Javascript events that were triggered after page load. The result is that the traditional spider is only able to see a small fraction of the site's content and is unable to index any of the application's state information.
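To see why the traditional spider misses so much, here's a minimal Python sketch of protocol-driven link discovery using the standard library's HTMLParser. It follows ordinary http(s) hrefs and can only set aside `javascript:` links, since it never runs any script; the class name is our own.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect crawlable hrefs the way a traditional, protocol-driven spider would."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self.skipped: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return  # onclick-only anchors are invisible to this spider
        absolute = urljoin(self.base_url, href)
        if urlparse(absolute).scheme in ("http", "https"):
            # A #fragment doesn't name a new document, so strip it.
            self.links.append(urldefrag(absolute).url)
        else:
            # e.g. "javascript:showTab(2)": the state it leads to is never seen.
            self.skipped.append(href)


page = ('<a href="/about">About</a> <a href="javascript:showTab(2)">Tab 2</a> '
        '<a href="news.html#top">News</a> <a onclick="load()">More</a>')
parser = LinkExtractor("http://example.com/app/")
parser.feed(page)
parser.close()
print(parser.links)    # ['http://example.com/about', 'http://example.com/app/news.html']
print(parser.skipped)  # ['javascript:showTab(2)']
```

Everything that content loaded by `showTab(2)` or `load()` would reveal stays out of reach, which is the gap the event-driven approach tries to close.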
So how do we go about fixing the problem?
Crawl AJAX Like A Human Would
To crawl AJAX, the spider needs to understand more about a page than just its HTML. It needs to be able to understand the structure of the document as well as the Javascript that manipulates it. To be able to investigate the deeper state of an application, the crawling process also needs to be able to recognize and execute events within the document to simulate the paths that might be taken by a real user.
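One way to picture event-driven crawling is as a breadth-first search over application states, where each edge means "fire this event". The Python sketch below assumes two hypothetical hooks: `events_for` (which events a state exposes) and `fire` (the state you land in after firing one). In a real crawler a browser driver such as Watir would provide them; here a toy dictionary of tabs stands in for the page.

```python
from collections import deque
from typing import Callable, Hashable, Iterable


def crawl_states(start: Hashable,
                 events_for: Callable[[Hashable], Iterable[str]],
                 fire: Callable[[Hashable, str], Hashable],
                 max_depth: int = 3) -> dict:
    """Breadth-first search over the states reachable by firing events.

    Returns {state: shortest sequence of events that reaches it}.
    """
    paths = {start: ()}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        path = paths[state]
        if len(path) >= max_depth:
            continue  # don't click deeper than max_depth events
        for event in events_for(state):
            new_state = fire(state, event)
            if new_state not in paths:  # an identical DOM counts as an already-seen state
                paths[new_state] = path + (event,)
                queue.append(new_state)
    return paths


# Toy single-page app: each state is the name of the visible tab.
TRANSITIONS = {
    "home":       {"click#news": "news", "click#about": "about"},
    "news":       {"click#home": "home", "click#more": "news-page2"},
    "about":      {"click#home": "home"},
    "news-page2": {"click#home": "home"},
}
states = crawl_states("home", lambda s: TRANSITIONS[s], lambda s, e: TRANSITIONS[s][e])
for state, path in states.items():
    print(state, path)
```

The recorded event paths double as "URLs" for application state: replaying one from a fresh page load gets you back to that content. The depth limit matters because real applications can expose an effectively unbounded number of event sequences.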
Shreeraj Shah's paper, Crawling Ajax-driven Web 2.0 Applications, does a nice job of describing the "event-driven" approach to web crawling. It's about creating a smarter class of web crawling software which is able to retrieve, execute, and parse dynamic, Javascript-driven DOM content, much like a human would operate a full-featured web browser.
The "protocol-driven" approach does not work when the crawler comes across an Ajax embedded page. This is because all target resources are part of JavaScript code and are embedded in the DOM context. It is important to both understand and trigger this DOM-based activity. In the process, this has led to another approach called "event-driven" crawling. It has the following three key components:
- Javascript analysis and interpretation with linking to Ajax
- DOM event handling and dispatching
- Dynamic DOM content extraction
The Necessary Tools
The easiest way to implement an AJAX-enabled, event-driven crawler is to use a modern browser as the underlying platform. There are a couple of tools available, namely Watir and Crowbar, that will allow you to control Firefox or IE from code, allowing you to extract page data after it has processed any Javascript.
Watir is a library that enables browser automation using Ruby. It was originally built for IE, but it's been ported to both Firefox and Safari as well. The Watir API allows you to launch a browser process and then directly extract and click on anchor links from your Ruby application. This application alone makes me want to get more familiar with Ruby.
Crowbar is another interesting tool which uses a headless version of Firefox to render and parse web content. What's cool is that it provides a web server interface to the browser, so you can issue simple GET or POST requests from any language and then scrape the results as needed. This lets you interact with the browser from even simple command line scripts, using curl or wget.
Which tool you use depends on the needs of your crawler. Crowbar has the benefit of being language agnostic and simple to integrate into a traditional crawler design to extract page information that would only be present after a page has completed loading. Watir, on the other hand, gives you deeper, interactive access to the browser, allowing you to trigger subsequent Javascript events. The downside is that the logic behind a crawler that can dig deep into application state is quite a bit more complicated, and with Watir you are tied to Ruby which may or may not be your cup of tea.
Crowbar - server-side headless Firefox
Watir - browser remote control in Ruby
Crawling Ajax-driven Web 2.0 Applications (PDF)
Posted by Jason Striegel |
Jul 5, 2008 12:57 PM
Ajax, Data, Web |
Welcome to the Hacks Blog!