I helped stand up MLS and worked on the Android "stumbler" submission app. The original motivation for MLS was to provide a free location service for carriers that would potentially ship Firefox OS devices but not want to pay for Google's or Qualcomm's location services.
We always wished we could release the Wi-Fi location data, but there were privacy and safety concerns. Google was sued in Germany for recording Wi-Fi access points' locations and some data packets. And we were considering publishing similar location data, which would be an even bigger liability. The Google lawsuit led to a lame opt-out system: to opt out of Google's Wi-Fi mapping, you're supposed to append "_nomap" to your SSID. To opt out of Microsoft's Wi-Fi mapping, you're supposed to append "_optout" to your SSID. I'm not sure how you're supposed to do both. MLS honored both "_nomap" and "_optout" anywhere in the SSID string, filtering out those access points from submissions and lookup requests on both Mozilla's clients and server side.
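For illustration, the opt-out filtering could look roughly like the sketch below. This is not the actual MLS code: the names `OPT_OUT_MARKERS`, `is_opted_out`, and `filter_access_points` are made up, access points are assumed to be dicts with an "ssid" key, and the match is assumed to be case-sensitive.

```python
# Hypothetical sketch of the opt-out filter, not the real MLS implementation.
OPT_OUT_MARKERS = ("_nomap", "_optout")


def is_opted_out(ssid: str) -> bool:
    """Return True if the SSID contains an opt-out marker anywhere in the string."""
    return any(marker in ssid for marker in OPT_OUT_MARKERS)


def filter_access_points(access_points):
    """Drop opted-out access points before they are submitted or looked up.

    Each access point is assumed to be a dict with an "ssid" key.
    """
    return [ap for ap in access_points if not is_opted_out(ap.get("ssid", ""))]
```

Matching the marker anywhere in the string is what lets a single SSID such as "home_optout_nomap" carry both opt-outs.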
A potential safety concern for releasing the MLS location data was the "stalker scenario": if you knew someone's access point MAC address and they moved, you might be able to find their new location by looking up their MAC address in the location database. This scenario was less of a concern for lookup requests to the MLS server because lookups include a list of MAC addresses that the client sees. The server returns an average of those neighboring MAC addresses' locations, but shouldn't return the location of an individual MAC address. (I don't know if MLS currently implements this restriction.)
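A minimal sketch of that restriction, assuming a threshold of two known access points (`MIN_KNOWN_APS`) and a plain arithmetic mean of latitude and longitude. Both are my assumptions for illustration, and the `lookup` function and its data shapes are hypothetical. A naive mean also misbehaves near the antimeridian, which this ignores.

```python
from typing import Dict, Iterable, Optional, Tuple

Location = Tuple[float, float]  # (latitude, longitude)

# Assumed threshold: never answer a lookup that matches fewer known APs than this.
MIN_KNOWN_APS = 2


def lookup(known: Dict[str, Location], seen_macs: Iterable[str]) -> Optional[Location]:
    """Return the mean location of the known MACs, or None if too few are known."""
    # De-duplicate so repeating one MAC cannot satisfy the threshold.
    matches = [known[mac] for mac in set(seen_macs) if mac in known]
    if len(matches) < MIN_KNOWN_APS:
        return None  # refuse to reveal the location of an individual MAC address
    lat = sum(point[0] for point in matches) / len(matches)
    lon = sum(point[1] for point in matches) / len(matches)
    return (lat, lon)
```

A stalker who only knows one MAC address gets nothing back, while a real client that sees several neighbors still gets an answer.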
Another protection (not implemented, AFAIK) could be to require lookups to include the SSID that matches a known MAC address and SSID pair. This would allow an access point's owner to change their SSID so lookups using the old SSID don't return the new location.
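As a rough sketch of that SSID check (again, not something MLS implements as far as I know), the server would only count an access point if the submitted SSID equals the one currently on record. The `matching_locations` name and the record layout are hypothetical.

```python
def matching_locations(known, seen):
    """Return locations only for access points whose submitted SSID matches the record.

    known: dict mapping MAC -> (current_ssid, (lat, lon))
    seen:  iterable of (mac, ssid) pairs reported by the client
    """
    locations = []
    for mac, ssid in seen:
        record = known.get(mac)
        # An outdated SSID is treated the same as an unknown MAC address.
        if record is not None and record[0] == ssid:
            locations.append(record[1])
    return locations
```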
I thought we might be able to release the location database leveraging that restriction of requiring multiple MAC addresses. Instead of releasing a database mapping raw MAC addresses to locations, the database would map hashes of MAC1 + SSID1 + MAC2 + SSID2 to locations. Offline database lookups would need to know the MAC addresses and current SSIDs of two access points that had previously been seen together. I'm not a cryptographer, so there's probably some hole in this idea. :) It would also significantly increase the size of the database.
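Here is a sketch of what that hashed database could look like. The choice of SHA-256, the length-prefixed encoding, and the function names `pair_key`, `build_database`, and `offline_lookup` are all my assumptions for illustration. A plain fast hash like this does nothing to stop someone from brute-forcing guessable MAC and SSID combinations, which may be one of the holes.

```python
import hashlib
from itertools import combinations


def pair_key(ap_a, ap_b) -> str:
    """Hash two (mac, ssid) tuples into one key, independent of their order."""
    # Normalize MAC case, then sort so (A, B) and (B, A) give the same key.
    first, second = sorted((mac.lower(), ssid) for mac, ssid in (ap_a, ap_b))
    digest = hashlib.sha256()
    for mac, ssid in (first, second):
        for field in (mac, ssid):
            data = field.encode("utf-8")
            # Length prefix keeps "ab" + "c" distinct from "a" + "bc".
            digest.update(len(data).to_bytes(2, "big"))
            digest.update(data)
    return digest.hexdigest()


def build_database(observations):
    """observations: iterable of (ap_a, ap_b, location) for APs seen together."""
    return {pair_key(ap_a, ap_b): location for ap_a, ap_b, location in observations}


def offline_lookup(database, visible_aps):
    """Try every pair of visible (mac, ssid) tuples; return the first known location."""
    for ap_a, ap_b in combinations(visible_aps, 2):
        location = database.get(pair_key(ap_a, ap_b))
        if location is not None:
            return location
    return None
```

The size problem is visible here too: a cluster of n neighboring access points produces on the order of n squared pair entries instead of n.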
Requiring lookups to include multiple MAC addresses and matching SSIDs could also reduce the impact of poisoned location submissions because lookups (online or offline) would only see poisoned data if they included multiple poisoned MAC addresses and valid matching SSID pairs that had previously been submitted as neighbors. Poisoned data in the database doesn't affect clients that don't fetch it. :)
Another way to reduce the impact of poisoned location submissions could be to filter out submitting clients' new access point locations outside the submitting client's GeoIP region.
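A sketch of that GeoIP filter, assuming the GeoIP lookup has already been reduced to a center point and a radius in kilometers. Real GeoIP regions are not circles, so that simplification and the names `haversine_km` and `filter_new_aps` are mine. The distance is the standard haversine great-circle formula.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def filter_new_aps(new_aps, geoip_center, geoip_radius_km):
    """Keep only new access points reported inside the client's assumed GeoIP region.

    new_aps: list of dicts with a "location" key holding (lat, lon).
    """
    return [
        ap for ap in new_aps
        if haversine_km(ap["location"], geoip_center) <= geoip_radius_km
    ]
```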
A neat trick from lookups including multiple access points is that the service can learn about new access points and their locations. If the service has seen MAC1 and MAC2 before and a client's lookup says they see MAC1, MAC2, and MAC3, the service can return the location average of MAC1 and MAC2 and tentatively record that new MAC3 exists at that average location and is a neighbor of MAC1 and MAC2.
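Sketched in code, the learning step might look like this. The `lookup_and_learn` name, the separate `tentative` table, and the two-AP threshold are assumptions on my part, not a description of how MLS stores anything.

```python
def lookup_and_learn(known, tentative, seen_macs):
    """Answer a lookup and tentatively record any MACs the service has not seen.

    known:     dict mapping MAC -> (lat, lon) for established access points
    tentative: dict that receives newly observed MACs (mutated in place)
    seen_macs: iterable of MAC addresses reported by the client
    """
    seen = set(seen_macs)
    matched = sorted(mac for mac in seen if mac in known)
    if len(matched) < 2:
        return None  # not enough known neighbors to answer or to learn from
    lat = sum(known[mac][0] for mac in matched) / len(matched)
    lon = sum(known[mac][1] for mac in matched) / len(matched)
    estimate = (lat, lon)
    for mac in seen - set(matched):
        # The new AP inherits the averaged location and remembers its neighbors.
        tentative[mac] = {"location": estimate, "neighbors": matched}
    return estimate
```

Keeping the new records in a separate tentative table means a single lookup can't immediately change what other clients get back.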