Libre Geolocation

This project aims to be an alternative to Mozilla Location Services that offers public domain dumps of its WiFi database.

Please see the work-in-progress documentation

When Mozilla Location Services shut down, it was unable to publish the massive number of access points its users had collected, due to legal and privacy concerns. Libreloc obfuscates the data it releases so that it is not reasonably possible to estimate the location of a single device.

Note: The documentation is work-in-progress and at an early stage.

Libreloc is FOSS and community-run.

Overview and FAQ

Libreloc provides an API compatible with Mozilla Location Services that can be used on various mobile devices:

Android (including GrapheneOS, microG, LineageOS)
iOS
Linux mobile devices
Linux, Apple and Windows laptops & more

Will Libreloc publish WiFi MAC addresses and SSIDs around me?

No. Data will be obfuscated before release and/or published in aggregated formats.

Libreloc aims to be GDPR compliant and to avoid privacy leaks like those described in this research paper.

Are you storing MAC addresses and SSIDs?

No. Such data is hashed in a non-reversible manner before it touches the database.

Can hashed data be brute-forced using powerful GPUs?

Probably not. We are planning to use hashes short enough that many different devices share the same value, so that each individual datapoint would not lead to significant privacy loss.

Do you know my location?

When using the MLS-compatible geolocation API /v1/geolocate your location is calculated by the server, so yes.

We are also building a privacy-preserving API that will let clients calculate the precise location locally.

What if the server goes down or runs out of capacity?

We are planning to support geographical and logical sharding and failover.

Can I host an instance for my organization?

Of course!

Contributing

When contributing to the codebase, update the licensing data in .reuse/dep5 and use a commit message style compatible with Git Cliff.

Running integration tests

Integration tests use a local test database.

cargo test

Running in development mode

Build and run locally with:

CONF_FN=testbed.toml cargo run

Monitor with:

sudo journalctl -f --identifier libreloc

The service emits metrics locally using the StatsD protocol. Run a StatsD receiver like Netdata on UDP port 8125.
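To check that a receiver is actually listening, it can help to send a test datagram by hand. The sketch below is independent of the Libreloc codebase: the metric name is made up, and only the plain StatsD counter line format (`name:value|c`) is assumed.

```rust
use std::io;
use std::net::UdpSocket;

/// Formats a StatsD counter increment, for example "requests:1|c".
fn statsd_counter(name: &str, value: u64) -> String {
    format!("{name}:{value}|c")
}

/// Sends one datagram to a StatsD receiver on the local machine, UDP port 8125.
fn send_metric(line: &str) -> io::Result<usize> {
    // Bind to an ephemeral local port; UDP needs no connection setup.
    let socket = UdpSocket::bind("127.0.0.1:0")?;
    socket.send_to(line.as_bytes(), "127.0.0.1:8125")
}
```

Calling `send_metric(&statsd_counter("libreloc.test.requests", 1))` should make a counter with that (hypothetical) name appear in the receiver.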

Building a Debian package for testing or deployment

The service is started and managed by a systemd unit.

make debian_install_build_deps
make debian_build_deb

Roadmap

❏ Basic CI
❏ Research lookup maps
❏ Metrics
❏ Generate docs from CI
❏ Benchmark databases
❏ Deployment tools and documentation
❏ Public metrics dashboard
❏ Full CI
❏ Privacy-aware API and caching
❏ Data backup

Goals

Provide geolocation for a diverse family of devices across Android, Linux etc
Manage privacy issues; do not breach GDPR
Keep server requirements (CPU/memory/storage) reasonably low
Where possible, limit single points of failure on technical and organizational level

Difficult use-cases

IoT or laptop: a device without GSM and GPS. Relies only on WiFi/BT, therefore depends on the quality of data captured by GPS-enabled devices.

Traveller: a mobile device with limited or no access to the Internet where pre-caching phone/wifi/bt maps is possible.

Mobile access point: a mobile router or phone can create a privacy breach and be used to track the location of the owner. See https://www.cs.umd.edu/~dml/papers/wifi-surveillance-sp24.pdf

Other constraints

Devices cannot store billions of hashed WiFi/BT datapoints locally. However, data for the local area can be fetched and cached, and we can accept that the initial cache warmup takes 10 to 30 seconds. Most users have a home/work/school routine where useful data is highly local.

Design ideas

Require multiple clients to update the same AP before releasing it in the dataset, to reduce the traceability of an update to a single client. This is also good for verifying the AP's submitted location.
Support both an MLS-compatible API, for as long as needed, and a privacy-friendly API (PFAPI)
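The first idea, requiring several clients per AP, could be sketched roughly as follows. The threshold value, the `ReleaseGate` type and the opaque contributor tokens are all hypothetical; nothing here reflects how Libreloc actually stores submissions.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical minimum number of distinct contributors before an AP is released.
const MIN_CONTRIBUTORS: usize = 3;

/// Tracks which opaque contributor tokens have reported each hashed AP.
#[derive(Default)]
struct ReleaseGate {
    reports: HashMap<u64, HashSet<u64>>,
}

impl ReleaseGate {
    /// Records that `contributor` observed the AP identified by `ap_hash`.
    /// Repeated reports from the same contributor are counted once.
    fn record(&mut self, ap_hash: u64, contributor: u64) {
        self.reports.entry(ap_hash).or_default().insert(contributor);
    }

    /// An AP is releasable only once enough distinct contributors reported it.
    fn is_releasable(&self, ap_hash: u64) -> bool {
        self.reports
            .get(&ap_hash)
            .map_or(false, |c| c.len() >= MIN_CONTRIBUTORS)
    }
}
```

Counting distinct contributors rather than raw submissions means one client cannot push an AP into the dataset by reporting it repeatedly.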

Client side lookup process

This section is related to the PFAPI and not currently implemented.

Clients can have a variable amount of logic, ranging from relatively simple as in MLS, where most of the work is delegated to the central service, to slightly smarter.

The latter can be beneficial for:

limiting the upload of GDPR-sensitive data like full AP MAC addresses, i.e. uploading hashed values instead
providing fallbacks where Internet access / GSM / GPS are not available
allowing sharding/load-balancing of servers and failover

Location discovery resources

Type	Accuracy	Availability	Inet	GSM	DP	GPS
Previous loc	Variable	High	-	-	-	-
GPS	5 m	Low	-	-	-	y
Phone Cells	5 km	Low-medium	-	y	-	-
GSM country	Country	Low-medium	-	y	-	-
Wifi/BT nodes	Meters	Low-medium	-	-	-	-
GeoIP	City/Country	Medium	y	-	y?	-
DNS Anycast	Continent	Medium	y	-	y?	-
RTT	Continent	Medium	y	-	y?	-

Table description: the last 4 columns flag whether Internet, GSM/LTE, a paid dataplan or a GPS receiver is required.

Previous loc: last known location, stored with a timestamp and accuracy. When used, the accuracy value is decreased based on the elapsed time.
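One way to age a stored location is to widen its accuracy radius by the distance the device could have travelled since. This is only a sketch: the linear model, the assumed maximum speed and the cap are illustrative choices, not something the project has specified.

```rust
use std::time::Duration;

/// Assumed worst-case travel speed in metres per second (about 120 km/h).
const ASSUMED_MAX_SPEED_MPS: f64 = 33.0;

/// Upper bound for the radius, roughly half of Earth's circumference in metres.
const MAX_RADIUS_M: f64 = 20_000_000.0;

/// Returns the accuracy radius to assume for a location that was stored with
/// radius `stored_accuracy_m` and is now `elapsed` old.
fn aged_accuracy_m(stored_accuracy_m: f64, elapsed: Duration) -> f64 {
    let grown = stored_accuracy_m + ASSUMED_MAX_SPEED_MPS * elapsed.as_secs_f64();
    grown.min(MAX_RADIUS_M)
}
```

Under these assumptions a 10 m fix that is one minute old is treated as accurate to about 2 km, so it quickly stops competing with fresher sources.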

Phone Cells: phone tower database, cached locally. Works without a dataplan.

GSM country: Mobile Country Code (MCC). Works without a dataplan.

GeoIP: lookup based on the public IP address. Usually pretty reliable at country granularity [unless VPNs are in use]. Some databases are available without significant licensing restrictions: https://archive.org/download/dbip-country-lite

DNS Anycast: many cloud providers offer inexpensive DNS anycast that can both direct clients to the closest server and reveal the client's network location, both at continent granularity.

RTT: clients can ping or TCP-ping 3-4 endpoints and immediately tell if they are close to one of them using a threshold on latency. Very reliable at continent level [unless VPNs are in use].
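The RTT check could look roughly like the sketch below. The measurement itself (ping or timing a TCP connect) is left out; the function only applies the threshold to already measured round-trip times. The 40 ms threshold and the region labels are assumptions for illustration.

```rust
use std::time::Duration;

/// Hypothetical latency threshold below which an endpoint counts as "nearby".
const NEARBY_RTT: Duration = Duration::from_millis(40);

/// Given measured (region, round-trip time) pairs, returns the region with
/// the lowest RTT, but only if that RTT is below the threshold.
fn closest_region<'a>(samples: &[(&'a str, Duration)]) -> Option<&'a str> {
    samples
        .iter()
        .min_by_key(|(_, rtt)| *rtt)
        .filter(|(_, rtt)| *rtt < NEARBY_RTT)
        .map(|(region, _)| *region)
}
```

Returning `None` when every endpoint is slow keeps a client behind a distant VPN exit or a congested link from being assigned to the wrong continent.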

Clients can implement an "incremental" geolookup process where needed:

1. Attempt to use readily available data: GPS location, last known location, GSM-based location, Internet-based positioning, etc
2. If needed, download GSM tower cell data and cache it locally
3. If needed, download hashed wifi/BT data and cache it locally
4. If needed, query the closest Ichnaea server

Having discovered its location at country/continent level in step 1, the client can connect to the closest Ichnaea server. This allows sharding geographical data across 3 to 5 macro-areas and also increases reliability.

Step 4 is backward compatible with the current MLS/ichnaea implementation.
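The incremental process above can be pictured as a chain of stages that is walked until one of them is accurate enough. Since the PFAPI is not implemented yet, everything in this sketch (the `Estimate` type, the `Stage` alias, the accuracy target) is hypothetical.

```rust
/// A position estimate with an accuracy radius in metres.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Estimate {
    lat: f64,
    lon: f64,
    accuracy_m: f64,
}

/// One lookup stage; returns `None` when it cannot produce an estimate.
type Stage<'a> = &'a dyn Fn() -> Option<Estimate>;

/// Runs the stages in order and stops at the first estimate that meets the
/// wanted accuracy, so later (more expensive) stages are never called.
/// If no stage qualifies, the most accurate estimate seen is returned.
fn incremental_lookup(stages: &[Stage<'_>], wanted_accuracy_m: f64) -> Option<Estimate> {
    let mut best: Option<Estimate> = None;
    for stage in stages {
        if let Some(est) = stage() {
            if est.accuracy_m <= wanted_accuracy_m {
                return Some(est);
            }
            if best.map_or(true, |b| est.accuracy_m < b.accuracy_m) {
                best = Some(est);
            }
        }
    }
    best
}
```

Ordering the stages from cheapest to most expensive means the server query only happens when local data and cached maps were not sufficient.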

Preventing tracking by mapping one device to multiple locations

Inspired by [WiGLE’s m8b](https://github.com/wiglenet/m8b) format, hashes can be truncated to create collisions on purpose. This means that a client cannot look up the location of a single device. Instead, the client knows of multiple possible locations for a device, and needs to know multiple devices that are physically nearby in order to estimate which locations are most likely.
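A minimal sketch of the idea, under several assumptions: FNV-1a is only a stand-in for whatever hash is eventually chosen, the bit width is arbitrary, and locations are represented as coarse grid cell ids. This is not the m8b format itself.

```rust
use std::collections::HashMap;

/// Coarse grid cell id standing in for a location (hypothetical representation).
type Cell = u32;

/// FNV-1a (64-bit), used here only as a simple, stable stand-in hash.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for b in bytes {
        h ^= u64::from(*b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Keeps only the lowest `bits` bits, so that many MACs share each bucket.
fn truncated_hash(mac: [u8; 6], bits: u32) -> u64 {
    assert!(bits >= 1 && bits < 64);
    fnv1a64(&mac) & ((1u64 << bits) - 1)
}

/// Each observed MAC votes for every candidate cell in its bucket; the cell
/// with the most votes is the most likely location.
fn most_likely_cell(
    index: &HashMap<u64, Vec<Cell>>,
    macs: &[[u8; 6]],
    bits: u32,
) -> Option<Cell> {
    let mut votes: HashMap<Cell, usize> = HashMap::new();
    for mac in macs {
        if let Some(cells) = index.get(&truncated_hash(*mac, bits)) {
            for cell in cells {
                *votes.entry(*cell).or_insert(0) += 1;
            }
        }
    }
    votes.into_iter().max_by_key(|(_, n)| *n).map(|(cell, _)| cell)
}
```

With a single MAC every candidate cell in its bucket gets the same number of votes, so the result is ambiguous; only several MACs seen together make one cell stand out.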

Making location data noisy/useless without knowing the original MAC

The hash of a MAC can be used to seed a random location offset in the database. This helps obfuscate sensitive locations, as data points that are physically nearby appear multiple kilometers away in the database.
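The offset could be derived along these lines. The sketch assumes a hash of the full MAC (FNV-1a as a non-cryptographic stand-in), a maximum offset of 0.05 degrees per axis (about 5.5 km of latitude; a degree of longitude covers less distance away from the equator), and hypothetical function names.

```rust
/// Largest offset applied along each axis, in degrees (hypothetical value).
const MAX_OFFSET_DEG: f64 = 0.05;

/// FNV-1a (64-bit), a stand-in; a real deployment would need a vetted hash.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for b in bytes {
        h ^= u64::from(*b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Maps 32 bits to a value in [-MAX_OFFSET_DEG, +MAX_OFFSET_DEG].
fn to_offset(bits: u32) -> f64 {
    let unit = f64::from(bits) / f64::from(u32::MAX); // 0.0 ..= 1.0
    (unit * 2.0 - 1.0) * MAX_OFFSET_DEG
}

/// Deterministic per-AP offset (dlat, dlon) derived from the full MAC.
fn offset_for(mac: [u8; 6]) -> (f64, f64) {
    let h = fnv1a64(&mac);
    (to_offset((h >> 32) as u32), to_offset(h as u32))
}

/// Shifts a real position before it is stored or published.
fn obfuscate(mac: [u8; 6], lat: f64, lon: f64) -> (f64, f64) {
    let (dlat, dlon) = offset_for(mac);
    (lat + dlat, lon + dlon)
}

/// Undoes the shift; only possible for someone who knows the original MAC.
fn deobfuscate(mac: [u8; 6], lat: f64, lon: f64) -> (f64, f64) {
    let (dlat, dlon) = offset_for(mac);
    (lat - dlat, lon - dlon)
}
```

For this to hide anything, the offset has to be derived from more than the truncated hash stored in the database; otherwise anyone holding the dump could recompute it.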

We may want the server to know the real location for the purpose of improving data quality, like removing old APs only if data has recently been submitted nearby.

Protecting the privacy of contributors

delaying updates and publishing in batches so that changes from multiple contributors in the same area are merged
random update delays per AP to make it harder to track individual uploads
random location offset per AP (above)

WiFi/BT geolocation

<TODO>