I guess then I'm not the only one to think the interpretation of the data in these slides is consistently poor.
e.g. The slide with the correlation scatter plots - who devised the model to draw those trend lines? That's just awful statistics.
e.g.2. The slide that judges the important pages in the deck based on the length of time they're looked at (important? why? why not 'complex' or 'difficult to grok'?)
e.g.3. Observed slide order actually says 'team is never in the middle', but then averages the team to be in the middle of the deck.
Hmmm. What an awful piece of data analysis on potentially interesting data.
At the risk of outing myself as massively non-hipster, I truly don't see the point of making obviously incorrect statements. What's the point of this game? If you're trying to suggest that the original poster made a mistake, grow up and just say so. Sometimes there is a benefit to demonstrating a point obliquely, when doing so makes people think about things in a new way and reach a burst of enlightenment. That's not the case here. This is just smug hipsterness.
As pg advised before Demo Day, a pitch has only 2 goals:
- Capture the investors' attention -- they'll have to remember your company among dozens
- Make them curious to hear more -- you want them to reach out and ask for a meeting
Neither of these requires a large number of slides or, for that matter, much time to go through the pitch. Most entrepreneurs make a simple mistake: they try to put everything that's relevant to their project in the pitch.
Average time could be very misleading here. I would assume that it's a bimodal distribution (or something similar), with many decks that get almost no time and some that get a lot of time.
So assuming a VC will spend ~4 minutes on your deck would be a bad idea. They'll probably either toss it out more quickly than that or take a decent amount of time to go through it properly.
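To make the point about averages concrete, here is a small sketch with made-up numbers (not DocSend's data), assuming the bimodal shape guessed above: most decks are tossed almost immediately and the rest get a careful read. The mean lands around 4 minutes even though no deck is actually viewed for anywhere near that long.

```python
# Hypothetical viewing times in minutes (illustrative only, not DocSend data):
# 60 decks tossed after about 30 seconds, 40 decks read properly for 9 minutes.
from statistics import mean, median

times = [0.5] * 60 + [9.0] * 40

avg = mean(times)  # 0.6 * 0.5 + 0.4 * 9.0 = 3.9 minutes
mid = median(times)  # the middle of the sorted list falls in the "tossed" group
near_avg = sum(1 for t in times if abs(t - avg) <= 1.0)  # decks within 1 minute of the mean

print(f"mean = {avg:.1f} min, median = {mid:.1f} min, "
      f"decks within 1 min of the mean: {near_avg}")
```

With a distribution like this, the median and a histogram say far more than the mean does.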
A natural distribution in this case is the exponential distribution. In the first minute 80% gets thrown out. In the second minute 80% of the remainder gets thrown out, etc.
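A quick sketch of what that 80% rule implies, assuming it holds exactly: per minute it is a geometric decay, and the continuous version is an exponential distribution with rate ln 5 per minute. Under that assumption the mean viewing time comes out well under a minute.

```python
import math

# Assumption from the comment above: 80% of the remaining decks are discarded
# in each minute, so 20% survive each minute.
SURVIVE_PER_MINUTE = 0.2

# Continuous-time equivalent: S(t) = exp(-rate * t) with exp(-rate) = 0.2.
rate = -math.log(SURVIVE_PER_MINUTE)  # = ln 5, about 1.61 per minute


def still_reading(t_minutes: float) -> float:
    """Fraction of decks still being read after t minutes."""
    return math.exp(-rate * t_minutes)


mean_time = 1 / rate                # mean of an exponential distribution
median_time = math.log(2) / rate    # half the decks are gone by this point

for t in (1, 2, 4):
    print(f"after {t} min: {still_reading(t):.4f} still open")
print(f"mean = {mean_time:.2f} min, median = {median_time:.2f} min")
```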
The bit about financials slides mattering most because investors spend more time reading them seems misleading, especially in light of the evidence that "only 57 percent of successful decks have this section."
Investors spend more time on financials slides because they're information-dense (there's literally just more to read), because they often receive less design consideration than the "softer" slides (slap a table on there because it looks like something an accountant would make), and because investors are calculating runway and evaluating business savvy, not because those slides necessarily "matter more," whatever that even means.
I suspect it also has to do with traditions in finance. Due diligence, audits of accounts, etc., are an integral part of how education in finance is conducted. Also, it is easy to do the basic arithmetic and see whether proposals "add up". So people are more used to examining financial documents in detail.
When it comes to markets, people, sales, etc., people are more used to going with their "domain knowledge" or "gut instinct", which takes only a few seconds.
The data presented is a sample, the "true" distribution might be different from what is shown, and if you start "cleaning" the data based on subjective judgements, you might never discover the true range.
The thing is that the three dots that lie outside the range where the other dots are (not the "true" range, which of course we don't know) have a bigger impact on the conclusion than their number justifies. That's why taking them out should help rather than harm. This is not the same as removing patients from a medical study because they got sick from the new medicine. E.g., look at the left diagram: the upward swing in the red curve is probably driven by all the points on the left (199 points), but the downward trend is, as far as I can see, the result of 1 point on the right. So can we trust the downward trend as much as the upward one? Probably not. Additionally, removing the two dots at the 150-meetings mark might give a curve with a less steep incline, but one that is backed by 196 points.
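Here is a small sketch of how much one far-right point can move a fit, using synthetic numbers (not the data from the slides) and an ordinary least-squares straight line rather than whatever model produced the red curve. The 100 clustered points lie exactly on a line with slope 0.1; adding a single point at x = 300 that sits below the trend pulls the fitted slope down substantially.

```python
# Synthetic, illustrative data (not DocSend's): 100 founders whose meetings
# grow linearly with investors contacted, plus one founder far to the right.
def ols_slope(points):
    """Ordinary least-squares slope of y on x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    return sxy / sxx


bulk = [(x, 0.1 * x) for x in range(1, 101)]  # meetings = 0.1 * investors contacted
lone_point = (300, 5.0)                       # far right, well below the trend

print(f"slope without the lone point: {ols_slope(bulk):.3f}")
print(f"slope with the lone point:    {ols_slope(bulk + [lone_point]):.3f}")
```

One point out of 101 changes the answer noticeably, which is the sense in which its impact is larger than its number justifies.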
I really don't have more than a basic education in statistics, but to me it looks like a slight but constant upward trend up to 100 investors contacted, and after that we simply don't have enough data. If you think about what the diagram measures, a constant upward line, or even more likely a square-root-shaped curve, is what you'd expect. A bell curve is unlikely just from thinking about the topic: why would contacting more than X investors result in fewer meetings?
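A square-root model is also easy to fit and can never predict fewer meetings for more contacts. A minimal sketch, assuming the hypothetical model meetings ≈ a·√(investors contacted) with one free parameter; the data below are synthetic points placed exactly on y = 2√x just to check the fit.

```python
import math

# Hypothetical one-parameter model: meetings ~= a * sqrt(investors_contacted).
# Least squares for y = a * g(x) has the closed form a = sum(y * g) / sum(g * g);
# with g(x) = sqrt(x), sum(g * g) is simply sum(x).
def fit_sqrt_model(points):
    num = sum(y * math.sqrt(x) for x, y in points)
    den = sum(x for x, _ in points)
    return num / den


# Synthetic data lying exactly on y = 2 * sqrt(x), for illustration only.
data = [(x, 2 * math.sqrt(x)) for x in (4, 16, 25, 64, 100)]
a = fit_sqrt_model(data)
print(f"a = {a:.3f}; predicted meetings at 150 contacts: {a * math.sqrt(150):.1f}")
```

The design choice here is deliberate: with a > 0 the curve keeps rising with diminishing returns, so a downturn on the right could only come from the data, not from the model's shape.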
In the second diagram, removing the two dots at $4.5m and the two dots above 200 investors met leaves us with 196 data points that show a fairly even spread, suggesting that the number of investors contacted might not have much influence on how much money you raise in the end. That's also good, because you'd expect a company's valuation to depend on the company's value (customers, market, team, etc.) and not on how many investors you contact.
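A sketch of how just two extreme points can manufacture a correlation, using synthetic numbers (not the slide's data). The 30 "typical" raises form a fully crossed grid, so amount raised is exactly uncorrelated with investors contacted; adding two large raises far to the right produces a clearly positive Pearson correlation.

```python
import math

# Synthetic data: 30 "typical" raises where the amount raised ($m) is unrelated
# to investors contacted (every x paired with every y), plus two large outliers.
def pearson_r(points):
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)


typical = [(x, y) for x in range(10, 101, 10) for y in (1.0, 2.0, 3.0)]
outliers = [(250, 4.5), (260, 4.5)]

r_clean = pearson_r(typical)
r_all = pearson_r(typical + outliers)
print(f"r without outliers: {r_clean:.2f}, r with outliers: {r_all:.2f}")
```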
You make the data more conclusive by removing data points that have a big impact on your results without the backup of other data points. That's what handling outliers is.
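"A big impact without the backup of other data points" has a standard name in regression: leverage. A minimal sketch for a straight-line fit, using synthetic x values: each point's leverage is h_i = 1/n + (x_i - mean_x)² / Σ(x_j - mean_x)², the leverages sum to 2 (one per fitted parameter), and an isolated point far to the right gets many times the average. A common rule of thumb flags points above roughly 2p/n, where p is the number of parameters.

```python
# Leverage of each point in a simple linear regression (intercept + slope):
#   h_i = 1/n + (x_i - mean_x)^2 / sum_j (x_j - mean_x)^2
# Synthetic x values for illustration only.
def leverages(xs):
    n = len(xs)
    mx = sum(xs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    return [1 / n + (x - mx) ** 2 / sxx for x in xs]


xs = list(range(1, 101)) + [300]  # 100 clustered points plus one far-right point
h = leverages(xs)

avg_h = sum(h) / len(h)  # leverages sum to 2 here, so the average is 2/n
print(f"average leverage: {avg_h:.4f}")
print(f"far-right point:  {h[-1]:.4f} ({h[-1] / avg_h:.0f}x the average)")
```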
As to the investors contacted vs meetings plot, the red line is evidently bogus. It is likely caused by trying to fit the data to some polynomial for which there is no a priori justification. A straight-line fit would look more reasonable and also be less affected by one "outlier" who contacted 300 investors.
I would agree that cutting off the curve fitting at 100-150 investors would look prettier; the thing to do in that case would be either not to fit a curve at all and just show the data, or alternatively to fit only the earlier part (so the red line stops halfway across the x-axis) but show all the data.
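The second option might look like the sketch below, assuming a plain least-squares line (the original curve's model is unknown) and a hypothetical cutoff parameter `x_max`. The line is fitted only where the data are dense, every point is kept for plotting, and the predictor simply returns nothing beyond the fitted range instead of extrapolating.

```python
# Synthetic data (illustrative only): a dense linear cloud up to x = 100,
# plus two sparse points further right.
def fit_line(points):
    """Ordinary least-squares intercept and slope of y on x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_within(points, x_max):
    """Fit only the points with x <= x_max; the returned predictor declines
    to extrapolate beyond x_max (returns None there)."""
    intercept, slope = fit_line([(x, y) for x, y in points if x <= x_max])

    def predict(x):
        if x > x_max:
            return None  # outside the fitted range: draw no line here
        return intercept + slope * x

    return predict


all_points = [(x, 1 + 0.1 * x) for x in range(1, 101)] + [(150, 2.0), (300, 5.0)]
predict = fit_within(all_points, x_max=100)  # all points still plotted; only the fit is cut off

print(predict(50), predict(200))
```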
The only reason for doing a curve fit on this kind of data is to show that the data fit some prior model, showing that the model is likely correct, or to show where some statistical value lies.
> The data presented is a sample, the "true" distribution might be different from what is shown, and if you start "cleaning" the data based on subjective judgements, you might never discover the true range.
Note that if the sample is large enough and selected uniformly, analyzing the entire population is unnecessary.
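How large is "large enough"? A rough sketch using the normal-approximation 95% margin of error for a proportion, assuming simple random sampling (an assumption that may not hold here) and taking n = 200 for the "more than 200" decks and p = 0.57 for the "57 percent" figure quoted in the thread.

```python
import math

# Normal-approximation 95% margin of error for a proportion p estimated from
# n samples. Assumes simple random sampling.
def margin_of_error(p, n, z=1.96):
    return z * math.sqrt(p * (1 - p) / n)


# Figures borrowed from the thread: "57 percent" of successful decks,
# "more than 200" decks (n = 200 assumed here).
moe = margin_of_error(0.57, 200)
print(f"57% +/- {moe * 100:.1f} percentage points")
```

Even under ideal sampling, that is roughly plus or minus 7 points, and quadrupling the sample only halves it.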
The sample isn't particularly large, and we don't know the extent of possible selection biases. For example, one obvious bias is that the sample is restricted to decks sent through DocSend, so it likely does not represent a uniform sampling of all pitches.
In any case, there is no prior reason to exclude outliers; any large enough representative sampling of a Gaussian distribution would include "outliers".
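To put a number on that, here is a sketch assuming the values really are Gaussian and n = 200 (roughly the study's size). It uses P(|Z| > k) = erfc(k/√2) for a standard normal Z: with 200 samples you should expect about 9 points beyond 2 sigma, and the chance of seeing at least one point beyond 3 sigma is roughly 40%.

```python
import math


def tail_prob(k):
    """P(|Z| > k) for a standard normal Z."""
    return math.erfc(k / math.sqrt(2))


n = 200  # roughly the number of decks in the study (assumed)

# For each threshold k: (expected number of points beyond k sigma,
#                        probability of at least one such point)
summary = {k: (n * tail_prob(k), 1 - (1 - tail_prob(k)) ** n) for k in (2, 3)}

for k, (expected, at_least_one) in summary.items():
    print(f"beyond {k} sigma: expect {expected:.1f} points, "
          f"P(at least one) = {at_least_one:.2f}")
```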
"DocSend, a startup that provides people with a secure and private way of sharing files like offer letters or legal agreements, studied more than 200 pitchdecks ..."
If it's private, how were they able to study them?
When I was younger and still living with my mother, she bought a condo for us to move into without my even getting to see it.
Before we moved in, she said, "You'll even have a private bathroom."
When I saw the apartment, it turned out she had an en-suite bathroom, and I had the other one: the guest bathroom. To this day she still claims it was my private bathroom, since she never went into it.
Does anybody see how the red lines in the "Strive for Quality, not Quantity" diagrams were generated? Looking at the data, I don't get the impression that either one follows the data; either line would seem to fit about as well in either diagram. I'm no expert, but given that there are so many dots on the left side and nearly none on the right, I'd say the data is inconclusive. What do the data analysts say?