You edited to include the link to the JS, but here's my reading of it:
1) It pulls out content in <p> tags (presumably those hold the data you want).
2) It rewrites double line breaks as paragraph breaks (in case the site isn't as semantic as it should be).
3) To pick the "main content" container, it looks for the container element with the most <p>s inside.
4) It filters the "main content" to remove stuff that looks like trash: blocks with too much non-<p> content, too few commas, or too few words.
5) It rips out all of the page's HTML and swaps in its own, which also pulls in the user's selected style settings. This is the sketchy step. I think an overlay might've been more appropriate here, but the comments imply the author had some difficulty there.
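If I'm reading step 2 right, the double-line-break rewrite is basically a regex pass over the markup before the <p> counting happens. A rough sketch (not the actual readability.js code; the real regexes and tag handling differ, and the function name here is my own):

```javascript
// Sketch of step 2: collapse pairs of <br>s into paragraph breaks so
// non-semantic pages still yield countable <p> blocks.
// Hypothetical helper; readability.js does this inline with its own regexes.
function brsToParagraphs(html) {
  // two <br> variants, optionally separated by whitespace,
  // become a paragraph boundary
  return html.replace(/<br\s*\/?>\s*<br\s*\/?>/gi, '</p><p>');
}

brsToParagraphs('one<br><br>two'); // → 'one</p><p>two'
```

That way a page that fakes paragraphs with line breaks still gets scored by the <p>-counting step that follows.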
Edit: doh, source here: http://lab.arc90.com/experiments/readability/js/readability-...
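For steps 3 and 4, the gist is a score-and-filter pass. Here's a toy version of the heuristics, with the DOM walking factored out; the thresholds are my guesses, not the library's actual numbers:

```javascript
// Toy sketch of steps 3–4: pick the container with the most <p>s,
// then drop paragraphs that look like trash (too few commas/words).
// Thresholds and shapes here are illustrative, not readability.js's.

// Step 3: given candidates as { el, pCount }, take the one with the
// most <p> descendants.
function pickMainContainer(candidates) {
  return candidates.reduce((best, c) => (c.pCount > best.pCount ? c : best));
}

// Step 4 filters: keep a paragraph only if it reads like prose —
// either it has commas or it's long enough to be a real sentence run.
function looksLikeProse(text) {
  const commas = (text.match(/,/g) || []).length;
  const words = text.trim() ? text.trim().split(/\s+/).length : 0;
  return commas >= 1 || words >= 25;
}
```

In the real script the candidates come from walking the DOM (grabbing every <p> and scoring its parent), and the filters also weigh how much non-<p> markup a block carries; this just shows the shape of the heuristic.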