Google Releases mod_pagespeed for Apache (googlewebmastercentral.blogspot.com)
290 points by mike-cardwell on Nov 3, 2010 | 73 comments


There are some very interesting filters:

http://code.google.com/speed/page-speed/docs/filters.html

It will automatically strip inline CSS and JS out of an HTML page and then create links to an external version. It will automatically set very long cache times. It minifies CSS and JavaScript on the fly, and strips whitespace and comments from the HTML.

Then for images, it automatically rescales them, strips out metadata, and recompresses them. It even converts img tags to use data: URIs when it's safe and efficient to do so.

I love it.


That's pretty cool, though the biggest slowdown for most sites is all of the third-party scripts being loaded from remote servers. If people really realized how much page speed matters for usability, they would be loath to add in all the scripts from Facebook and Digg and Google and 12 different analytics services.


Ad networks are a particular culprit. For many (banner) ad networks, HTTP requests get bounced around so many servers that it can really slow things down.


Might be obvious, but this is why you should use some asynchronous code to load your ads.
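A minimal sketch of the usual async-injection pattern being suggested here (the URL is a placeholder, not a real ad network endpoint, and `doc` is passed in only so the snippet can be exercised outside a browser; in real use you'd pass the global `document`):

```javascript
// Inject a third-party ad/widget script without blocking page rendering.
function loadScriptAsync(doc, src) {
  var s = doc.createElement('script');
  s.type = 'text/javascript';
  s.async = true; // hint to the browser: fetch without blocking the HTML parser
  s.src = src;
  // Insert before the first existing script tag, a spot guaranteed to exist.
  var first = doc.getElementsByTagName('script')[0];
  first.parentNode.insertBefore(s, first);
  return s;
}
```

In a browser this is just `loadScriptAsync(document, 'http://ads.example.com/show.js');`. The catch, as noted below, is that networks relying on document.write can't be loaded this way.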


If someone knows how to do this with OpenX, let me know. I've read that using iframes is a solution, but I don't know the disadvantages of that approach.


Could you please give us some links or info on loading the ads asynchronously?

Is it compatible with Adsense policy?


Here are some links. I don't have a rock solid answer regarding Google's TOS for this, my apologies.

Async Adsense: http://webmasters.stackexchange.com/questions/2506/is-it-pos...

Async Analytics: http://www.ditii.com/2009/12/01/google-analytics-asynchronou...


From the first link: "this practice might not be welcome by Google".

I don't know what the second link is for; isn't Analytics asynchronous already?

EDIT: Judging by the date on the second URL, it's very probably a hack for an older version of Analytics.

EDIT 2: The first link is an unaccepted answer on a Stack Overflow-like site about using a hack that's against Google's rules. And Google doesn't need much of an excuse to close your account and keep your money.


Re: Analytics; I'm not sure if the default code Google gives you for Analytics is async _now_, but based on this http://code.google.com/apis/analytics/docs/tracking/asyncTra... it certainly seems that, at least at one point, it was not. Again, that may have changed.

Here is another link for async AdSense with code included. The author has also posted his e-mail conversation with a rep from the Google AdSense team. There's nothing fancy going on here, just iframes, really.

http://geoland.org/2007/01/adsense-in-iframe/

You could just reach out to adsense yourself and report back what you find regarding violation of the Google Adsense TOS.


Google Ad Manager -really- does not like you doing anything asynchronously, unfortunately.


Yeah, it's because many of the ad networks that you may load into it simply cannot work asynchronously (they use document.write). So they have to enforce synchronous loading in the ad manager code.


Not always possible with all banner networks, I'm afraid!


Those banner networks deserve to lose customers to the banner networks that do support asynchronous loading. Does anybody have a list of which banner networks allow it and which don't?


It is if you stick their ads in an iframe, à la Grooveshark.


Not if they are contextual and require the host script to be in the top-level document (and this is against the terms of use of many advertising services).


You lose ad revenue if you do that. The longer it takes the ad to show up on the page, the less likely people are to click on it.


The longer it takes me to load your page, the less likely I will ever see it (because I back out to my previous page) or will ever come back to it (because of the bad experience).


My AdSense scripts almost always freeze my sites while waiting for them to load.

Do you fellow HNers know of any way to speed them up, or at least load them in the background?


Put script tags at the end of the body, not in the header. If you still have a problem, dynamically add the script tags after the page has already loaded and displayed.
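A sketch of the second suggestion: wait for the page's own load event before attaching the ad script at all. (Function name and URL are made up for illustration; `win` and `doc` are injected so the snippet can run outside a browser, where they would simply be `window` and `document`.)

```javascript
// Defer a third-party script until after the page has loaded and displayed.
function deferScript(win, doc, src) {
  function inject() {
    var s = doc.createElement('script');
    s.src = src;
    doc.body.appendChild(s); // added at the end of body, well after render
  }
  if (doc.readyState === 'complete') {
    inject(); // page already loaded, attach immediately
  } else {
    win.addEventListener('load', inject, false);
  }
}
```

The trade-off mentioned elsewhere in this thread applies: the later the ad appears, the lower the click-through tends to be.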


???

Could you share a sample of AdSense with scripts in the header? Also, do you know of any way to load the scripts dynamically?

Google searches turn up plenty of people trying to do this, to no avail.



Did you read the parent comment? He asked about the AdSense script, not the Analytics tracking code.


You could inject the Analytics script after the page has loaded instead of having it be part of the page.


I'm talking about ads, not analytics.


It's been a while, but I think you can choose between IFRAME and DOM-injection methods. Researching that might get you started.


AddThis is a particularly guilty culprit. PageSpeed is always giving me hassle about their CSS. Grumpy webmaster here. You listening, AddThis?


You're right, but it seems irrelevant here. mod_pagespeed likely has a very different audience.


Regarding 'Outlining CSS', there's also a risk of breaking sites that depend on the specificity rules of CSS[1].

I think this should be added to the risks list for this filter[2]. Does anyone know how to ping Google about such things?

[1] http://www.molly.com/2005/10/06/css2-and-css21-specificity-c... [2] http://code.google.com/speed/page-speed/docs/filter-css-outl...




Ask HN:

I'm sure there are going to be tons of positives to this from the community, many I can identify myself, but can someone list some NEGATIVES that may arise from using this?


Each of the filters mentions risks:

http://code.google.com/speed/page-speed/docs/filters.html

This is a tool that mindlessly applies optimisations that are lossy or change the behavior of the page. Sometimes it may break things (e.g. I use an `input[type=text]` selector in my CSS, and removal of the "redundant" `type="text"` attribute from the source would break it).

It also wastes some processing time applying on-the-fly code changes you could do yourself in the source.

If you follow performance best practices, you may optimize pages better yourself (you need to judge what is better to inline and what can be made async; the simple heuristics this module uses may be too crude).
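The `input[type=text]` breakage can be sketched in a few lines of markup: the filter sees the attribute as redundant (text is the default input type), but a CSS attribute selector only matches elements that actually carry the attribute.

```html
<style>input[type=text] { border: 2px solid red; }</style>

<!-- Before the elide-attributes filter: the selector matches. -->
<input type="text" name="q">

<!-- After: the attribute is gone and the selector no longer matches,
     even though the input still behaves as a text field. -->
<input name="q">
```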


I use Smarty as a templating engine and sometimes I have to generate a few lines of javascript or css dynamically on the page based on php variables at runtime. I wouldn't want these to be stripped out and cached.

Usually I do a lot of optimization in the templates. For example, Smarty lets you wrap your output in {strip} tags to remove whitespace and newlines. And I have a script to concatenate and minify JavaScript and CSS. Plus mod_deflate on the server. I serve images from S3/CloudFront, and there is a script that will use smush.it to compress everything in an S3 folder and set the Expires headers.


Would it? I thought the type defaulted to "text" if missing. I'd be surprised if CSS selectors didn't recognise that...


CSS sees DOM attributes (what you get from element.getAttribute('type')), and not DOM properties (element.type).

These are different things kept roughly in sync by setters/getters (called "reflected properties" in HTML spec).

The `type` attribute is also supposed to have default value implied by the DTD, but that could only work in "validating" SGML/XML parsers (which browsers aren't).


I didn't know that. I tested it myself and you're right. The "Elide Attributes" filter seems fairly pointless anyway. I can't imagine it making that much of a difference in performance even if it was safe.


It looks like the 'Optimize Images' filter rewrites a different file name for the optimized image:

Before: <img src="images/BikeCrashIcn.png" alt="Bike Crash">

After: <img src="images/images/ic.HASH.x,BikeCrashIcn,p.png" alt="Bike Crash">

If you get significant search traffic from Google or Bing Image Search you'll probably want to disable this filter.


What in particular about the rewritten filename would be detrimental to image search traffic?

The filter retains the original filename as well as the alt text.


Given two image filenames: "pogo-sticks.png" and "ic.HASH.x,pogo-sticks,p.png", I'm willing to bet the former will rank higher and lead to more pogo stick traffic and sales because of the straightforward filename.

From the horse's mouth: "optimizing your image filenames and alt text makes it easier for image search projects like Google Image Search to better understand your images."

http://static.googleusercontent.com/external_content/untrust...

Anything that makes it harder for Google to understand your site is a losing business proposition.

I'd rather use PageSpeed for Firefox (optimize in dev, set the appropriate filenames, then deploy to production) to avoid any possible ranking penalties caused by this filter in production.


The search engines will spider/cache the resized version, so people looking for a higher-resolution version may skip over your site?


I'm surprised it doesn't convert to (and if possible serve in) WebP, the image format created from WebM.

It was these guys (the "speed" folks) within Google that came up with the idea.

http://code.google.com/speed/webp/

Maybe in a later version?


If you click on the individual filters to read their descriptions, they all have a "Risks" section at the bottom of the page. For example, the inline css removal filter has this:

The 'Inline CSS' filter is low to moderate risk. It should be safe for most pages, but it could potentially break scripts that walk the DOM looking for and examining <link> or <style> tags.


The biggest negative is that people will implement it without understanding it and just enable all the filters by default.

It's also another Apache module. Personally, I try to work without DSOs.
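A conservative configuration addresses the "all filters on by default" worry: switch the rewrite level to pass-through and opt in to individual filters one by one. A sketch (directive names as documented for mod_pagespeed at launch; the module path is illustrative, and you should check the current docs before copying this):

```apache
# Load the module, then disable the default filter set and
# enable only the filters you have actually tested on your site.
LoadModule pagespeed_module /usr/lib/httpd/modules/mod_pagespeed.so

ModPagespeed on
ModPagespeedRewriteLevel PassThrough
ModPagespeedEnableFilters combine_css,extend_cache
```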


I agree with this comment. Interestingly, they're trying to implement it on Go Daddy's systems for 8.5 million customers. This suggests they must be fairly confident that the optimisations are safe for the vast majority of sites.

I wonder which filters Go Daddy will enable and how many problems it will cause.


It sounds a bit paranoid (and surely technically unimportant), but the first negative aspect that I think of is:

this is putting Google on our servers, too.


I would use this to quickly test any possible enhancements that could be made to a site to improve performance, but later turn it off and implement everything in code.


I love the idea of this, but these days I always have something else sitting in front of Apache (usually nginx), and it seems like that is where I'd want a module like this to be running.

If Apache rewrites the HTML to reference things like temporary optimized versions of files, nginx presumably won't know anything about serving them, so it'll break (data: URLs being an exception here, I guess).

For things like media, I really want to keep Apache out of the loop completely.


Doesn't matter. I ditched Apache completely for nginx and haven't looked back since. Every app gets proxied through nginx: PHP scripts run as FastCGI (php-fpm), Rails and Sinatra apps run on Unicorn...


This made me realize just how strongly Google is tied to the success of 'the internet' vs the traditional desktop. There isn't much in their strategy that doesn't make sense once you stop trying to figure out how it ties into search/advertising and see how it just enhances the internet as a whole.

edit: of course, enhancing the internet increases the number and frequency of people interacting with their search, AdWords on pages, and feeling comfortable with web apps (helping create leads for Google Apps B2B)


Would love something like this for nginx - Apache's just using too much memory.


Put nginx in front of Apache and allow it to buffer responses. That way Apache can move on to rendering the next request while nginx empties the buffer. Each Apache process still uses just as much memory, but they can get more work done because they aren't tying up memory while they push the buffered response out.
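A minimal sketch of that setup, assuming Apache is listening on port 8080 on the same host (the buffer sizes are illustrative, not tuned recommendations):

```nginx
# nginx in front of Apache: nginx accepts the client connection, proxies
# to Apache, and buffers the response so the Apache worker is freed quickly.
server {
    listen 80;

    location / {
        proxy_pass http://127.0.0.1:8080;  # backend Apache
        proxy_buffering on;                # let nginx soak up the response
        proxy_buffers 16 16k;              # buffer pool per connection
        proxy_busy_buffers_size 32k;
    }
}
```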


True that. But it's just a wish from my perspective since I'm running on a VPS with limited memory so I can't really afford to have both running.


Most of these are micro-optimizations that matter most on a larger site. Most large sites do not use Apache for static content. I agree, nginx would be a better first choice for a module.

One thing that may not be obvious about minification is that it does not yield much size reduction when coupled with gzip. You may see a 30% decrease in size from minifying the plain file, but when you gzip both the plain and minified versions you may only see a difference of a couple hundred bytes. If you are worried about speed, you should be using gzip for static content.


I've found minification gives up to 40% smaller files even after gzip. Even so, combining multiple JS or CSS files into a single one and using image sprites will probably be your biggest win.


I've seen similar results. It's nice to see it do everything on the fly, but I'm guessing doing all these optimizations on a large website will take up more processor cycles?


Has anyone found/done any benchmarks showing what performance impact running mod_pagespeed will have on your /server/?

I'm all for optimizing the crap out of page serving, but if this is going to double our Apache load, a) I'd like to know about it ahead of time and b) it might be worth just applying optimizations by hand.


Somewhat ironically, one of the biggest slowdowns on my websites has been using Google Analytics. I only realised this after I switched completely to my own installation of Piwik - give it a try, the load times can be quite surprising.


Were you using the asynchronous Analytics code?


Nah, and to be fair, this was a while ago - when the async stuff had barely been released in beta. I hope the situation's improved somewhat now because of it :)


The async code makes a world of difference, it definitely improved things for me and my sites.


As another option - you can cache the GA JS code locally and combine it with the rest of the site's JS code. That's what the Google Analytics module for Drupal does. Result: no third-party requests at all.


Free software luminaries have accused Google of respecting the letter of the GPL but not the spirit of the license, since it does not share its optimized versions of the Linux kernel, the Apache server, and much other GPL software. Of course, as it's GPL, not AGPL, they are not legally in the wrong. This nice piece of infrastructure is a step forward; let's hope they release more of their custom infrastructure.


Does Google use this internally? I got the impression that it was written specifically for external use.


I would think they almost certainly don't use this internally given that they don't run Apache for any of their user-facing applications.


For those worried that it might break something on their site, a word of advice. You shouldn't expect anything to "just work," and should, like any technology that you make use of, evaluate and test it out to see if it's right for you.

There are no details on it, but I wonder if GoDaddy has some plan to provide a way to turn specific things off. If not, this sounds like it could be a disaster.


This looks very promising! Microsoft has similar features (although not officially supported yet) for ASP.NET: http://aspnet.codeplex.com/releases/view/50869 http://aspnet.codeplex.com/releases/view/35893


All great ideas. I'm in a bit of a knowledge-base limbo where I spend just enough time setting up my server so that it survives a load test. I use nginx as my daily driver; I think I'll try to level up some admin skill and try to apply some of these principles to my setup. Who knows, maybe I'll do a write up.


Some concerns about this module here: http://blog.mostof.it/mod_pagespeed/

Don't let developers be blinded by an automatic optimizer.


Does it cache optimized images, or regenerate them for every request?

Edit: Yes, it does:

In order to rewrite resources, mod_pagespeed must cache them on the server.


Has anyone been able to get this working with a web server running cPanel?


It's sad/embarrassing that they don't have a Chrome plugin for this but do offer one for Firefox.


Screw that, they've got their priorities straight.

It's awesome that they don't treat 'uncaptured synergy' as a release blocker like Microsoft would have.


Firefox has a more powerful plugin system than Chrome, and I would bet that most web developers have Firefox installed, even if only for testing and for Firebug.



