Libre Geolocation
This project aims to be an alternative to Mozilla Location Services that offers public domain dumps of its WiFi database.
Please see the work-in-progress documentation
When Mozilla Location Services shut down, it was not able to publish the massive number of access points its users had collected, due to legal and privacy concerns. Libreloc instead obfuscates the data it releases so that it is not possible to reasonably estimate the location of a single device.
Note: The documentation is work-in-progress and at an early stage.
Libreloc is FOSS and community-run.
Overview and FAQ
Libreloc provides an API compatible with Mozilla Location Services that can be used on various mobile devices:
- Android (including GrapheneOS, microG, LineageOS)
- iOS
- Linux mobile devices
- Linux, Apple and Windows laptops & more
Will Libreloc publish WiFi MAC addresses and SSIDs around me?
No. Data will be obfuscated before release and/or published in aggregated formats.
Libreloc aims to be GDPR compliant and avoid privacy leaks like in this research paper.
Are you storing MAC addresses and SSIDs?
No, such data is hashed in a non-reversible manner before it touches the database.
Can hashed data be bruteforced using powerful GPUs?
Probably not. We are planning to use short enough hashes so that each individual datapoint would not lead to significant privacy loss.
Do you know my location?
When using the MLS-compatible geolocation API /v1/geolocate your location is calculated by the server, so yes.
We are also building a privacy-preserving API that will let clients calculate the precise location locally.
What if the server goes down or runs out of capacity?
We are planning to support geographical and logical sharding and failover.
Can I host an instance for my organization?
Of course!
Contributing
When contributing to the codebase, update the licensing data in .reuse/dep5 and use a commit message style compatible with Git Cliff.
Running integration tests
Integration tests use a local test database.
cargo test
Running in development mode
Build and run locally with CONF_FN=testbed.toml cargo run
Monitor with:
sudo journalctl -f --identifier libreloc
The service emits metrics locally using the StatsD protocol. To collect them, run a StatsD receiver such as Netdata listening on UDP port 8125.
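To check that a local receiver is listening, a test datagram can be sent by hand. The following is a minimal sketch using only the Rust standard library; the metric name `libreloc.requests` and both function names are made up for illustration and are not taken from the Libreloc codebase.

```rust
use std::net::UdpSocket;

/// Format a StatsD counter increment in the plain-text wire format
/// `<name>:<value>|c`. The metric name is supplied by the caller.
pub fn statsd_counter(name: &str, value: u64) -> String {
    format!("{}:{}|c", name, value)
}

/// Send one StatsD datagram to a receiver on localhost, UDP port 8125.
/// UDP is fire-and-forget: success here does not prove it was received.
pub fn send_metric(payload: &str) -> std::io::Result<usize> {
    // Port 0 lets the OS pick any free local port for the sender.
    let socket = UdpSocket::bind("127.0.0.1:0")?;
    socket.send_to(payload.as_bytes(), "127.0.0.1:8125")
}
```

Calling `send_metric(&statsd_counter("libreloc.requests", 1))` should make the counter show up in the receiver if it is listening on the expected port.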
Building a Debian package for testing or deployment
The service is started and managed by a Systemd unit.
make debian_install_build_deps
make debian_build_deb
Roadmap
- ❏ Basic CI
- ❏ Research lookup maps
- ❏ Metrics
- ❏ Generate docs from CI
- ❏ Benchmark databases
- ❏ Deployment tools and documentation
- ❏ Public metrics dashboard
- ❏ Full CI
- ❏ Privacy-aware API and caching
- ❏ Data backup
Goals
- Provide geolocation for a diverse family of devices across Android, Linux, etc.
- Manage privacy issues; do not breach GDPR
- Keep server requirements (CPU/memory/storage) reasonably low
- Where possible, limit single points of failure at the technical and organizational levels
Difficult use-cases
IoT or laptop: a device without GSM and GPS. Relies only on WiFi/BT, therefore depends on the quality of data captured by GPS-enabled devices.
Traveller: a mobile device with limited or no access to the Internet where pre-caching phone/wifi/bt maps is possible.
Mobile access point: a mobile router or phone can create a privacy breach and be used to track the location of the owner. See https://www.cs.umd.edu/~dml/papers/wifi-surveillance-sp24.pdf
Other constraints
Devices cannot store billions of hashed WiFi/BT datapoints locally. However, data for the local area can be fetched and cached, and users can accept an initial cache warmup of 10 to 30 seconds. Most users have a home/work/school routine where useful data is highly local.
Design ideas
- Require multiple clients to update the same AP before releasing it in the dataset, to reduce the traceability of an update to a single client. This is also useful for verifying the AP's submitted location.
- Support both an MLS-compatible API, for as long as needed, and a privacy-friendly API (PFAPI)
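The first idea above (releasing an AP only after several distinct clients have reported it) can be sketched as a simple threshold check. This is an illustration only: `ReleaseGate`, `min_contributors`, and the use of opaque `u64` contributor identifiers are assumptions, not part of Libreloc.

```rust
use std::collections::{HashMap, HashSet};

/// Tracks which distinct contributors have reported each hashed AP.
/// `min_contributors` is a hypothetical policy knob.
pub struct ReleaseGate {
    min_contributors: usize,
    reports: HashMap<u64, HashSet<u64>>, // AP hash -> contributor ids
}

impl ReleaseGate {
    pub fn new(min_contributors: usize) -> Self {
        ReleaseGate { min_contributors, reports: HashMap::new() }
    }

    /// Record that `contributor` observed the AP identified by `ap_hash`.
    /// Repeated reports from the same contributor count only once.
    pub fn record(&mut self, ap_hash: u64, contributor: u64) {
        self.reports.entry(ap_hash).or_default().insert(contributor);
    }

    /// An AP may be released once enough distinct contributors reported it.
    pub fn is_releasable(&self, ap_hash: u64) -> bool {
        self.reports
            .get(&ap_hash)
            .map_or(false, |c| c.len() >= self.min_contributors)
    }
}
```

With a threshold of 3, an AP reported twice by one client and once by another stays unpublished until a third client reports it.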
Client side lookup process
This section is related to the PFAPI and not currently implemented.
Clients can have a variable amount of logic, ranging from relatively simple as in MLS, where most of the work is delegated to the central service, to slightly smarter.
The latter can be beneficial for:
- limiting the upload of GDPR-sensitive data such as full AP MAC addresses, i.e. uploading hashed values instead
- providing fallbacks where Internet access, GSM, or GPS is not available
- allowing sharding/load-balancing of servers and failover
Location discovery resources
| Type | Accuracy | Availability | Inet | GSM | DP | GPS |
|---|---|---|---|---|---|---|
| Previous loc | Variable | High | | | | |
| GPS | 5 m | Low | | | | y |
| Phone Cells | 5 km | Low-medium | | y | | |
| GSM country | Country | Low-medium | | y | | |
| Wifi/BT nodes | Meters | Low-medium | | | | |
| GeoIP | City/Country | Medium | y | | y? | |
| DNS Anycast | Continent | Medium | y | | y? | |
| RTT | Continent | Medium | y | | y? | |
Table description: the last 4 columns flag whether Internet access, GSM/LTE, a paid data plan, or a GPS receiver is required.
Previous loc: last known location, stored with a timestamp and accuracy. When used, the accuracy value is decreased based on the elapsed time.
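One possible way to age a previous location is to widen its uncertainty radius linearly with elapsed time. The sketch below assumes a worst-case travel speed; the constant, the struct, and the linear model are illustrative assumptions rather than Libreloc's actual rule.

```rust
/// Last known fix, with its uncertainty radius at the time it was recorded.
pub struct PreviousLocation {
    pub lat: f64,
    pub lon: f64,
    pub accuracy_m: f64,
    pub recorded_at_s: u64, // Unix timestamp, seconds
}

/// Assumed worst-case travel speed in m/s (hypothetical, about 120 km/h).
pub const ASSUMED_MAX_SPEED_MPS: f64 = 33.0;

impl PreviousLocation {
    /// Uncertainty radius in meters at time `now_s`. The radius grows with
    /// elapsed time, i.e. the fix becomes less accurate as it ages.
    pub fn accuracy_at(&self, now_s: u64) -> f64 {
        // saturating_sub guards against clock skew (now earlier than fix).
        let elapsed_s = now_s.saturating_sub(self.recorded_at_s) as f64;
        self.accuracy_m + elapsed_s * ASSUMED_MAX_SPEED_MPS
    }
}
```

Under these assumptions a fix that was good to 10 m degrades to roughly 2 km after one minute, at which point another source from the table may be preferable.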
Phone Cells: phone tower database, cached locally. Works without a dataplan.
GSM country: Mobile Country Code (MCC). Works without a dataplan.
GeoIP: lookup based on the public IP address. Usually pretty reliable at country granularity [unless VPNs are in use]. Some databases are available without significant licensing restrictions: https://archive.org/download/dbip-country-lite
DNS Anycast: many cloud providers offer inexpensive DNS anycast that can both direct clients to the closest server and discover the client's network location, both with continent granularity.
RTT: clients can ping or tcp-ping 3-4 endpoints and immediately tell if they are close to one of them using a threshold on latency. Very reliable on continent level [unless VPNs are in use].
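The RTT check can be sketched as a pure function over already-measured latencies. The region labels and the threshold are chosen by the caller and are hypothetical here; how the RTTs are measured (ICMP ping, TCP connect time) is left out.

```rust
use std::time::Duration;

/// Given measured round-trip times to a few endpoints, return the region of
/// the fastest endpoint whose RTT is within `threshold`, or `None` if no
/// endpoint is close enough to draw a conclusion.
pub fn closest_region<'a>(
    rtts: &[(&'a str, Duration)],
    threshold: Duration,
) -> Option<&'a str> {
    rtts.iter()
        .filter(|(_, rtt)| *rtt <= threshold)
        .min_by_key(|(_, rtt)| *rtt)
        .map(|(region, _)| *region)
}
```

For example, with measurements of 18 ms to an endpoint labelled "eu" and 95 ms to one labelled "us", a 40 ms threshold selects "eu"; if every endpoint exceeds the threshold the function returns `None` and the client falls back to another source.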
Clients can implement an "incremental" geolookup process where needed:
1. Attempt to use readily available data: GPS location, last known location, GSM-based location, Internet-based positioning, etc
2. If needed, download GSM tower cell data and cache it locally
3. If needed, download hashed wifi/BT data and cache it locally
4. If needed, query the closest Ichnaea server
Having discovered its location at country/continent level in step 1, the client can connect to the closest Ichnaea server. This allows sharding geographical data across three to five macro-areas and also increases reliability.
Step 4 is backward compatible with the current MLS/Ichnaea implementation.
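The incremental process above can be sketched as a chain of sources tried from cheapest to most expensive, stopping as soon as one is precise enough. `Estimate`, `incremental_lookup`, and the accuracy target are hypothetical names for illustration; since the PFAPI is not implemented yet, this is not a description of an existing client.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Estimate {
    pub lat: f64,
    pub lon: f64,
    pub accuracy_m: f64,       // uncertainty radius, smaller is better
    pub source: &'static str,  // label for the step that produced it
}

/// Try each source in order and return the first estimate at least as
/// precise as `wanted_accuracy_m`. Later (more expensive) sources are not
/// called once the target is met. If no source meets the target, return
/// the most precise estimate seen, if any.
pub fn incremental_lookup(
    sources: &[&dyn Fn() -> Option<Estimate>],
    wanted_accuracy_m: f64,
) -> Option<Estimate> {
    let mut best: Option<Estimate> = None;
    for source in sources {
        if let Some(est) = source() {
            if est.accuracy_m <= wanted_accuracy_m {
                return Some(est);
            }
            let improves = best
                .as_ref()
                .map_or(true, |b| est.accuracy_m < b.accuracy_m);
            if improves {
                best = Some(est);
            }
        }
    }
    best
}
```

Each closure would wrap one step: readily available data, the cached cell database, the cached hashed WiFi/BT data, and finally a server query.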
Preventing tracking by mapping one device to multiple locations
Inspired by [WiGLE’s m8b](https://github.com/wiglenet/m8b) format, hashes can be truncated to create collisions on purpose. This means that a client cannot look up the location of a single device. Instead, the client knows of multiple possible locations for a device, and needs to know multiple devices that are physically nearby in order to estimate which locations are most likely.
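A minimal sketch of hash truncation follows. FNV-1a is used only because it needs no external crates; it is not cryptographic, and a real deployment would presumably use a cryptographic hash. The function names and the choice of bit width are assumptions, not Libreloc's or m8b's actual parameters.

```rust
/// 64-bit FNV-1a. A dependency-free stand-in for illustration only;
/// it is NOT a cryptographic hash.
pub fn fnv1a64(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in data {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    h
}

/// Keep only the lowest `bits` bits of the hash, so that many different
/// MAC addresses deliberately fall into the same bucket.
pub fn truncated_hash(mac: &[u8; 6], bits: u32) -> u64 {
    assert!((1..=63).contains(&bits), "bits must be in 1..=63");
    fnv1a64(mac) & ((1u64 << bits) - 1)
}
```

With 16 bits there are only 65,536 buckets, so any dataset with more APs than that necessarily contains collisions, and a single bucket no longer identifies a single device.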
Making location data noisy/useless without knowing the original MAC
The hash of a MAC can be used to seed a random location offset in the database. This helps obfuscate sensitive locations, as data points that are physically nearby appear several kilometers apart in the database.
We may want the server to know the real location for the purpose of improving data quality, like removing old APs only if data has recently been submitted nearby.
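The hash-seeded offset described above can be sketched as follows. The maximum offset, the use of FNV-1a as a stand-in hash, and all function names are assumptions made for illustration; the sketch also ignores wrap-around at the poles and the antimeridian.

```rust
/// Maximum offset per axis in degrees (hypothetical; 0.05 degrees of
/// latitude is roughly 5.5 km).
pub const MAX_OFFSET_DEG: f64 = 0.05;

/// 64-bit FNV-1a, a non-cryptographic stand-in for the real hash.
fn fnv1a64(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in data {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Map a 32-bit value onto the range [-MAX_OFFSET_DEG, +MAX_OFFSET_DEG].
fn to_offset(v: u32) -> f64 {
    (f64::from(v) / f64::from(u32::MAX) * 2.0 - 1.0) * MAX_OFFSET_DEG
}

/// Deterministic (lat, lon) offset derived from the MAC address.
pub fn offset_for(mac: &[u8; 6]) -> (f64, f64) {
    let h = fnv1a64(mac);
    (to_offset((h >> 32) as u32), to_offset(h as u32))
}

/// Shift a real location before storing or publishing it.
pub fn obfuscate(lat: f64, lon: f64, mac: &[u8; 6]) -> (f64, f64) {
    let (dlat, dlon) = offset_for(mac);
    (lat + dlat, lon + dlon)
}

/// Undo the shift. Only possible for someone who knows the original MAC.
pub fn deobfuscate(lat: f64, lon: f64, mac: &[u8; 6]) -> (f64, f64) {
    let (dlat, dlon) = offset_for(mac);
    (lat - dlat, lon - dlon)
}
```

A client that can actually see the AP knows its MAC and can reverse the shift, while someone browsing the published data without the MAC only sees the displaced point.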
Protecting the privacy of contributors
- delaying updates and publishing in batches so that changes from multiple contributors in the same area are merged
- random update delays per AP to make it harder to track individual uploads
- random location offset per AP (above)
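The first two points above (batched publishing plus a per-AP delay) can be sketched as a small scheduling function. The weekly batch interval and the function name are assumptions; the random delay is supplied by the caller because the Rust standard library does not include a random number generator.

```rust
/// Interval between published batches in seconds (hypothetical: weekly).
pub const BATCH_INTERVAL_S: u64 = 7 * 24 * 3600;

/// Time at which an update becomes public: the first batch boundary at or
/// after `submitted_s + delay_s`, where `delay_s` is a per-AP random delay
/// chosen by the caller. All times are Unix timestamps in seconds.
pub fn release_time(submitted_s: u64, delay_s: u64) -> u64 {
    let ready_s = submitted_s + delay_s;
    // Round up to the next multiple of the batch interval.
    ((ready_s + BATCH_INTERVAL_S - 1) / BATCH_INTERVAL_S) * BATCH_INTERVAL_S
}
```

Updates submitted at different times within the same interval come out in the same batch, so the publication time reveals little about when any individual contributor uploaded.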
WiFi/BT geolocation
<TODO>