It's interesting that, even though their mathematical formulations differ, the Jaccard index and the Tanimoto similarity capture the same notion of similarity between sets: the ratio of the size of the intersection to the size of the union. Applied to sets (or, equivalently, to binary vectors), they yield the same values, which is why the two names are often used interchangeably.
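To make the equivalence concrete, here is a minimal Python sketch. The function names (`jaccard`, `tanimoto`, `to_bits`) and the sample sets are illustrative assumptions, not anything from FeatureBase. It computes the Jaccard index on sets and the Tanimoto coefficient on the same data packed into bit vectors.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index: |A & B| / |A | B| on Python sets."""
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(union)


def to_bits(s: set) -> int:
    """Pack a set of small non-negative ints into a bit vector (a Python int)."""
    bits = 0
    for i in s:
        bits |= 1 << i
    return bits


def tanimoto(x: int, y: int) -> float:
    """Tanimoto coefficient on bit vectors: c / (a + b - c)."""
    a = bin(x).count("1")      # bits set in x
    b = bin(y).count("1")      # bits set in y
    c = bin(x & y).count("1")  # bits set in both
    denom = a + b - c          # equals the size of the union
    if denom == 0:
        return 1.0  # same assumed convention as above
    return c / denom


A = {0, 2, 3, 7}
B = {2, 3, 4}

print(jaccard(A, B))                     # 0.4
print(tanimoto(to_bits(A), to_bits(B)))  # 0.4
```

The denominator `a + b - c` is just inclusion-exclusion for the size of the union, so on binary data the two formulas are the same quantity written two ways.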
Does FeatureBase compute on one or the other better?
Running dynamically generated Python code can be inherently risky and could cause serious harm to your local environment. The potential is amazing though. One must just tread carefully in this new world...
Tremor Video is processing data on 20B devices with more than 30k attributes, updating them at more than 1 million records per second using FeatureBase. Not only is the ingest throughput amazing, but the hardware footprint of the main table dropped to ~12TB from ~200TB in other, much slower and much more complex databases like Cloudera, Druid, Vertica, ArangoDB, Redis, and Aerospike.
This post discusses how FeatureBase uses bit-sliced indexes to significantly reduce the number of bitmaps needed to represent a range of integer values. By applying range encoding to those indexes, it can perform lightning-fast range queries.
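As a rough illustration of the idea, here is a minimal Python sketch of a base-2 bit-sliced index. It is not FeatureBase's implementation: the class name `BitSlicedIndex`, its methods, and the use of Python ints as bitmaps are all assumptions made for the example, and it uses plain binary slices rather than reproducing FeatureBase's range-encoded layout. It does show the two points above: values in 0..15 need 4 slice bitmaps instead of 16 per-value bitmaps, and a range query touches each slice only once.

```python
class BitSlicedIndex:
    """Hypothetical base-2 bit-sliced index; bit r of a bitmap is record r."""

    def __init__(self, bit_depth: int):
        self.bit_depth = bit_depth
        self.slices = [0] * bit_depth  # slices[k]: records whose value has bit k set
        self.exists = 0                # records that have any value at all

    def set(self, record: int, value: int) -> None:
        if not 0 <= value < (1 << self.bit_depth):
            raise ValueError("value does not fit in bit_depth bits")
        mask = 1 << record
        self.exists |= mask
        for k in range(self.bit_depth):
            if (value >> k) & 1:
                self.slices[k] |= mask
            else:
                self.slices[k] &= ~mask

    def less_than(self, c: int) -> int:
        """Bitmap of records with value < c, scanning slices high bit to low."""
        if c <= 0:
            return 0
        if c >= (1 << self.bit_depth):
            return self.exists
        lt = 0            # records already known to be smaller than c
        eq = self.exists  # records still equal to c on all bits seen so far
        for k in reversed(range(self.bit_depth)):
            if (c >> k) & 1:
                lt |= eq & ~self.slices[k]  # c has 1, record has 0: smaller
                eq &= self.slices[k]
            else:
                eq &= ~self.slices[k]       # c has 0, record has 1: larger
        return lt

    def between(self, lo: int, hi: int) -> int:
        """Bitmap of records with lo <= value <= hi."""
        return self.less_than(hi + 1) & ~self.less_than(lo)


def records(bitmap: int) -> list:
    """List the record ids whose bit is set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


idx = BitSlicedIndex(bit_depth=4)
for rec, val in {0: 3, 1: 9, 2: 0, 3: 12, 4: 7}.items():
    idx.set(rec, val)

print(records(idx.less_than(8)))   # [0, 2, 4]
print(records(idx.between(3, 9)))  # [0, 1, 4]
```

Each query costs a handful of bitwise operations per slice, regardless of how many distinct values the column holds, which is where the speed of range queries comes from.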
To get to true real-time analytics, we can't just get better at scaling up or scaling out. Current technologies now excel at allowing users to spend more in order to get better performance, but the next generation of technologies needs to break efficiency barriers and open up new use cases. Scaling out works to a point, but the more machines you have, the higher the floor is on the latencies you can practically achieve. A more efficient technology is needed.
Bitmaps are amazing and we believe that they will replace all other data formats for analytics.
Today we open sourced about $25m in R&D (more to come)! Bitmaps take a bit of harnessing, but once you harness them the results are amazing. We have released an OLAP database called FeatureBase, built entirely on bitmaps (well, bitmaps in a B-tree)...