nextwebgen.com

The Next Generation Web Now

YQL execute now allows you to convert scraped data with server side JavaScript

Filed under: Uncategorized — Chris Heilmann at 9:28 am on Thursday, April 30, 2009

I am a big fan of YQL, an easy and fuss-free way to access APIs and mix the data retrieved from them in a simple, SQL-style language. Say, for example, you want photos of Paris, France from Flickr that are licensed under Creative Commons Attribution. You can get them with a single command:

CODE:

  select * from flickr.photos.info where photo_id in (select id from flickr.photos.search where woe_id in (select woeid from geo.places where text='paris,france') and license=4)

Try it out here and you will see what I mean.
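To make the "single command" concrete, here is a minimal sketch of how such a statement could be turned into a request URL. It assumes the public YQL REST endpoint at query.yahooapis.com, which takes the statement in a `q` parameter; the helper name `buildYqlUrl` is hypothetical and not part of YQL itself.

```javascript
// Assumption: the public YQL REST endpoint, which takes the statement in "q".
const YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql";

// Hypothetical helper: turn a YQL statement into a request URL.
function buildYqlUrl(statement, format = "json") {
  return YQL_ENDPOINT +
    "?q=" + encodeURIComponent(statement) +
    "&format=" + encodeURIComponent(format);
}

// The Flickr query for Creative Commons photos of Paris, split for readability.
const statement =
  "select * from flickr.photos.info where photo_id in " +
  "(select id from flickr.photos.search where woe_id in " +
  "(select woeid from geo.places where text='paris,france') and license=4)";

console.log(buildYqlUrl(statement));
```

The whole nested select travels as one URL-encoded parameter, so the joining of the three tables happens on the YQL server rather than in the client.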

The next step for this interface was to open it up to the public: you can define an “Open Table” in a simple XML schema and use it to bring your own API into YQL.

One thing I have been itching to tell the world about has finally been released: YQL execute. Instead of making the YQL language itself much more complex (and thus running in circles), we now allow you to embed JavaScript in the Open Table XML. That JavaScript runs on the YQL server and lets you access other web services, authenticate, and scrape HTML with JavaScript and E4X. As Simon Willison put it:

This is nuts (in a good way). Yahoo!’s intriguing universal SQL-style XML/JSONP web service interface now supports JavaScript as a kind of stored procedure language, meaning you can use JavaScript and E4X to screen-scrape web pages, then query the results with YQL.

Using this, you can extend the original functionality of YQL to do whatever you need. For example, you could already scrape HTML with YQL using XPath, but there was no way to use CSS selectors. With an open table that invokes James Padolsey’s css2xpath JavaScript on the server side, this is now possible:

CODE:

  use 'http://yqlblog.net/samples/data.html.cssselect.xml' as data.html.cssselect; select * from data.html.cssselect where url="www.yahoo.com" and css="#news a"

Run this query in YQL
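To give a feel for what a CSS-to-XPath conversion involves, here is a deliberately tiny sketch. It is not James Padolsey's css2xpath code, and the real library may produce different (and far more complete) output; the function name `simpleCssToXPath` is hypothetical. It only understands tag names, `#id` and `.class`, joined by descendant (whitespace) combinators.

```javascript
// Hypothetical, deliberately tiny converter: handles only tag, #id and .class
// compound selectors joined by descendant (whitespace) combinators.
function simpleCssToXPath(css) {
  return css.trim().split(/\s+/).map(function (part) {
    // Group 1: optional tag name or "*". Group 2: any run of #id / .class tokens.
    const match = /^([a-zA-Z][\w-]*|\*)?((?:[#.][\w-]+)*)$/.exec(part);
    if (!match) {
      throw new Error("Unsupported selector part: " + part);
    }
    const tag = match[1] || "*";
    const predicates = (match[2].match(/[#.][\w-]+/g) || []).map(function (token) {
      const name = token.slice(1);
      return token[0] === "#"
        ? "[@id='" + name + "']"
        : "[contains(concat(' ', normalize-space(@class), ' '), ' " + name + " ')]";
    });
    // "//" expresses the descendant relationship between consecutive parts.
    return "//" + tag + predicates.join("");
  }).join("");
}

console.log(simpleCssToXPath("#news a")); // //*[@id='news']//a
```

The selector from the query, `#news a`, becomes "any element with the id news, then any a element below it", which is the kind of expression the html table already accepts in its xpath key.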

The open table definition is pretty simple:

XML:

  <?xml version="1.0" encoding="UTF-8" ?>
  <table xmlns="http://query.yahooapis.com/v1/schema/table.xsd">
    <meta>
      <sampleQuery>select * from {table} where url="www.yahoo.com" and css="#news a"</sampleQuery>
    </meta>
    <bindings>
      <select itemPath="" produces="XML">
        <urls>
          <url></url>
        </urls>
        <inputs>
          <!-- url is required; css is optional -->
          <key id="url" type="xs:string" paramType="variable" required="true" />
          <key id="css" type="xs:string" paramType="variable" />
        </inputs>
        <execute><![CDATA[
          // include the CSS to XPath conversion function
          y.include("http://james.padolsey.com/scripts/javascript/css2xpath.js");
          var query = null;
          if (css) {
            // a CSS selector was given: convert it and filter the page with XPath
            var xpath = CSS2XPATH(css);
            y.log("xpath " + xpath);
            query = y.query("select * from html where url=@url and xpath=\"" + xpath + "\"", {url: url});
          } else {
            // no selector: return the whole page
            query = y.query("select * from html where url=@url", {url: url});
          }
          // whatever is assigned to response.object becomes the table's result
          response.object = query.results;
        ]]></execute>
      </select>
    </bindings>
  </table>
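The branching inside the execute block can be tried outside the YQL server with stand-ins for the objects YQL provides. The sketch below is only an illustration under stated assumptions: `makeFakeY`, `runExecute` and `fakeConvert` are hypothetical names, the fake `y.query` records the statement instead of calling a web service, and the converter is a stub that returns a fixed XPath rather than the real css2xpath script.

```javascript
// Hypothetical stand-in for the "y" object the YQL server provides to <execute>.
function makeFakeY(log) {
  return {
    log: function (message) { log.push("log " + message); },
    // Records the statement and parameters instead of calling a web service.
    query: function (statement, params) {
      return { results: { statement: statement, params: params || {} } };
    }
  };
}

// The logic of the execute block, with its implicit inputs made explicit.
function runExecute(y, inputs, cssToXPath) {
  const response = {};
  let query;
  if (inputs.css) {
    const xpath = cssToXPath(inputs.css);
    y.log("xpath " + xpath);
    query = y.query(
      'select * from html where url=@url and xpath="' + xpath + '"',
      { url: inputs.url }
    );
  } else {
    query = y.query("select * from html where url=@url", { url: inputs.url });
  }
  response.object = query.results;
  return response;
}

const log = [];
const y = makeFakeY(log);
// Stub converter (an assumption): always returns the same XPath.
const fakeConvert = function (css) { return "//*[@id='news']//a"; };

const out = runExecute(y, { url: "www.yahoo.com", css: "#news a" }, fakeConvert);
console.log(out.object.statement);
```

Passing the url as the `@url` parameter keeps it out of the statement text, while the XPath is concatenated into the statement directly, so an XPath that itself contains double quotes would need extra care.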

Check the official Yahoo Developer Network blog post on YQL execute for more examples, including authentication examples for Flickr and Netflix.
