Grids – Comparative Analysis of Geohash, H3, and S2

Bucketing and discretization is important step in any algorithmic training or analysis, today we focus on the ways we can map the continuous spherical surface of Earth into manageable buckets. Why exactly we do this? to index different locations on earth to allow for fast geospatial queries, joins and aggregations. There are three popular geo indexing systems – Geohash, Uber’s H3 and Google’s S2, all let you map (lat, lon) -> a cell id, but they differ in what surface they discretize and therefore what kinds of distortion they introduce

Geohash

Divides the map into a standard grid, but the physical size of the grid shrinks at the poles

Structure

  • Rectangular cells
  • Recursive subdivision in lat/lon space

Distortion

  • Uniform in degrees (not in meters)
  • Physical width of cells shrinks toward the poles

Indexing

  • uses a Z-order curve for indexing
  • outputs a base-32 string
  • prefix matching – same prefix – closer – allows spatial clustering by text match

H3

Maps the Earth to an icosahedron (20 faces) covered in hexagons, developed Uber

Structure

  • Discretize the sphere using an icosahedron and mostly hexagonal cells
  • requires a small number of pentagons to fil in the gaps
  • mostly uniform cells – great for grid math (neighbors, rings, disks)
  • distance from center is identical

Distortion

  • Distortion exists (as its on a sphere), but is spread relatively evenly

Indexing

  • Uses an aperture 7 system (subdividing each cell into 7 smaller parts
  • Hexagons give strong neighborhood properties (k-rings, local adjacency)

S2

Maps the earth to a cube, then projects it onto a sphere, subdivide cube faces, then project back to the sphere, developed by Google

Structure

  • Each face is recursively subdivided into four child cells (a quadtree), resulting in quadrilateral-ish cells on the sphere

Distortion

  • Bounded and well behaved globally, with better behavior near the poles compared to planar lat-lon grids

Indexing

  • uses a Hilbert space curve
  • Great for hierarchy with predictable parent/child relationships
  • very fast because maps to 64bit integers

Size and Precision

Geohash

Resolution Levels

1 to 12 (12 characters total)

Hierarchy: base 32 string – every character is 5 bit worth of precision which means exponential scaling. Z curve indexing to divide further on each level

Level 1 starts with, 8 columns (ie 360/8 = 45 degree lon), 4 rows (ie 180/4 = 45 degree lat) = 5000x5000km blocks

Neighborhood Geometry: In a rectangular grid, the distance to vertical and horizontal neighbors is 1, but the distance to diagonal neighbors is 2^(1/2)

Precision Breakdown:

  • Level 4 (Metropolitan): ~39 km
  • Level 5 (City): ~5 km
  • Level 6 (Neighborhood): ~1.2 km
  • Level 12 (Maximum Precision): ~3.7 cm by 1.8 cm

H3

Resolution Levels

0 to 15

Hierarchy: Because hexagons cannot be perfectly subdivided into smaller hexagons (aperture 7 system – hexagons +pentagons), H3’s sub-hierarchy is not perfectly contained (child cells slightly overlap parent boundaries)

Neighborhood Geometry: The distance from the center of a hexagon to all six of its adjacent neighbors is exactly equal. This makes it mathematically superior for radial calculations and smoothing

Precision Breakdown:

  • Level 0 (Continental): ~1100 km edge length
  • Level 3 (State): ~60 km
  • Level 5 (City): ~8.5 km
  • Level 7 (Neighborhood): ~1.2 km
  • Level 15 (Maximum Precision): ~0.5 m

S2

Resolution Levels

0 to 30

Hierarchy: A perfect quadtree. Every parent cell is mathematically subdivided into exactly 4 child cells, with zero edge overlap

projects earth to cube – cube is level 0 (cell at 0 = 1/6 earth)

Neighborhood Geometry: Cells are indexed using a continuous Hilbert curve, which avoids the spatial “jumps” seen in the Geohash Z-curve. Because S2 maps directly to 64-bit integers, computational operations are incredibly fast

Precision Breakdown:

  • Level 0 (Continental): ~10,000 km edge length
  • Level 4 (Country): ~576 km
  • Level 10 (City): ~9 km
  • Level 13 (Neighborhood): ~1 km
  • Level 20 (House): ~8 m
  • Level 24 (Surveying): ~55 cm
  • Level 30 (Maximum Precision): ~0.9 cm

Grid Cells

To see how these indexing systems behave in practice, let’s look at a specific coordinate in Ottawa (Lat: 45.428, Lon: -75.718) and map it to a “City” resolution across all three systems.

We choose the following resolution / level for each grid system:

"city": {"geohash": 5, "h3": 6, "s2": 11}

which ranges from 16 – 40 km2 (City Ward / Large District)

While Geohash, H3, and S2 all have a scale that conceptually maps to a “City District” their actual geometric footprints vary wildly:

Size

  • Geohash (Precision 5)
    • Token: f244k (Parent: f244)
    • Centroid: 45.417480, -75.739746
    • Dimensions: 3433.86m x 4891.99m
    • Estimated Area: ~16.8 million m² (16.8 km²)
  • H3 (Resolution 6)
    • Token: 862b83b1fffffff (Parent: 852b83b3fffffff)
    • Centroid: 45.419360, -75.762752
    • Edge Length: 3950.16m
    • Estimated Area: ~39.3 million m² (39.3 km²)
  • S2 (Level 11)
    • Token: 4cce04c (Parent: 4cce05)
    • Centroid: 45.443463, -75.710880
    • Edge Length: 3828.80m
    • Estimated Area: ~16.1 million m² (16.1 km²)

Quantization

Quantization error – cell centroid distance – If we are putting a cell around a lat lon – how far is the cell center from the lat lon, we compare this on the smallest scale:

  • Geohash (Precision 12)
    • Token: f244kvzv68zp (Parent: f244kvzv68z)
    • Centroid: 45.428000, -75.718000
    • Dimensions: 0.03m x 0.02m
    • Estimated Area: ~0.0005 m²
  • H3 (Resolution 15)
    • Token: 8f2b83b1d90b96d (Parent: 8e2b83b1d90b96f)
    • Centroid: 45.427996, -75.718002
    • Edge Length: 0.6174m
    • Estimated Area: ~0.9731 m²
  • S2 (Level 30)
    • Token: 4cce04f4d08605cd (Parent: 4cce04f4d08605cc)
    • Centroid: 45.428000, -75.718000
    • Edge Length: 0.0073m
    • Estimated Area: ~0.0001 m²

For the given Lat: 45.428, Lon: -75.718, we’ve computed the distance of the point from the centroid (haversine distance – the details of which you can read in the next section)

the smallest possible quantization error:
Geohash(12) Off-Center Distance: 0.01 meters
H3 (15) Off-Center Distance: 0.52 meters
S2(30) Off-Center Distance: 0.00 meters

city level quantization error (for the cell size displayed in the image):
Geohash(5) Off-Center Distance: 2061.21 meters
H3 (6) Off-Center Distance: 3622.33 meters
S2(11) Off-Center Distance: 1806.92 meters

Notice the massive discrepancy in area. At the city tier, S2 and Geohash yield a similarly sized bounding box (~16 sq km), but H3’s hexagonal resolution jumps to nearly 40 sq km. Furthermore, the centroids of these cells do not perfectly align with the target coordinate. This highlights a crucial rule for geospatial engineering: you cannot blindly swap indexing systems and expect a 1:1 mapping. You must tailor the resolution level to the specific precision required by your dataset.

Distance

Planar

Flat map distance, given latitude and longitude of two points, we employ an equirectangular projection and calculate Euclidean distance.

Point 1: $(\phi_1, \lambda_1)$ , Point 2: $(\phi_2, \lambda_2)$
$\phi$ = latitude (radians), $\lambda$ = longitude (radians), $R$ = Earth radius

Reference point $(\phi_0, \lambda_0)$

Projection Center (reduce distortion):
$\phi_0 = \frac{\phi_1 + \phi_2}{2}$
$\lambda_0 = \frac{\lambda_1 + \lambda_2}{2}$

Equirectangular Projection :

east-west displacement in meters
$x = R \cdot (\lambda – \lambda_0) \cdot \cos(\phi_0)$
$\cos(\phi_0)$ term corrects for the fact that lines of longitude
get closer together toward the poles

north-south displacement in meters
$y = R \cdot (\phi – \phi_0)$

Euclidean Distance in Projected Space (m) :

$d = \sqrt{(x_2 – x_1)^2 + (y_2 – y_1)^2}$

Geodesic

Because the earth is curved, to find distance between two points specified with a latitude or longitude, we employ the haversine formula.

Point 1: $(\phi_1, \lambda_1)$ , Point 2: $(\phi_2, \lambda_2)$
$\phi$ = latitude (radians), $\lambda$ = longitude (radians), $R$ = Earth radius
$\Delta \phi = \phi_2 – \phi_1$, $\Delta \lambda = \lambda_2 – \lambda_1$

Haversine:
$a = \sin^2\left(\frac{\Delta \phi}{2}\right)+ \cos(\phi_1)\cos(\phi_2)\sin^2\left(\frac{\Delta \lambda}{2}\right)$

Angular Distance:
$c = 2 \cdot \arctan2\left(\sqrt{a}, \sqrt{1-a}\right)$

Final Distance:
$d = R \cdot c$

For a collection of points as follows:

test_pairs = {
    "local": [(45.428, -75.718), (45.47, -75.70)], #(Ottawa ~5km)
    "regional": [(45.428, -75.718), (45.5017, -73.5673)], #(Ottawa–Montreal)
    "cross_country": [(45.428, -75.718), (49.2827, -123.1207)], #(Ottawa–Vancouver)
    "equator": [(0.0, 0.0), (0.0, 10.0)], #(0°, 0° - 0°, 10°)
    "poles": [(70.0, 0.0), (70.0, 60.0)], #(70°, 0° - 70°, 60°)
}

levels = {
    # "neighborhood":{"geohash": 6,"h3": 7,"s2": 13},
    "smallest":{"geohash": 12, "h3": 15, "s2": 30},
    "largest":{"geohash": 1, "h3": 0, "s2": 0}, # geohash 0 is the whole world 
    "state": {"geohash": 3, "h3": 2, "s2": 6}, # 100,000+ km2 (State / Province Level)
    "metro": {"geohash": 4, "h3": 4, "s2": 8}, # 500+ km2 (Wide Metro Area / County Level)
    # "city" :{"geohash": 5, "h3": 5, "s2": 10},
    "city": {"geohash": 5, "h3": 6, "s2": 11}, #16 - 40 km2 (City Ward / Large District) 
    "neighborhood": {"geohash": 6, "h3": 7, "s2": 12}, #5 km2 (Neighborhood),
    "sentinel_scale": {"geohash": 8, "h3": 12, "s2": 20} #10m approx
}

we see the geodesic and planar distance are noticeably different on large scales because of earth’s curvature

Now lets compare the error between haversine distance for the actual points vs the cell centroids:

The “Local” Distance Anomaly: If you look at the “Local” test scenario (a distance of roughly 5 km) across all three systems, the error hits exactly 100.0000% at the “State” and “Largest” levels. This happens because the physical distance being measured is vastly smaller than the cell itself. If both points fall into the exact same massive cell, the calculated centroid distance is 0, resulting in a 100% error. This is a vital reminder: never use a spatial index resolution that is physically larger than the phenomena you are trying to measure.

Geohash’s Equator Extreme: At the “Largest” resolution level, Geohash struggles massively at the Equator, showing a staggering 314.0962% error. By comparison, H3 at the Equator for the same level shows a 64.7221% error. Geohash’s rectangular planar projection simply cannot handle extreme longitudinal distances reliably at a low resolution.

S2’s Rigorous Bounding: Look at S2’s “Largest” row: it hits a flat 100.0000% error across all scenarios. Because S2 Level 0 cells cover exactly one-sixth of the Earth’s surface, almost all of our test pairs snapped to the same massive cell centroid. However, as soon as you step down to the “Metro” and “City” levels, S2’s errors drop rapidly and consistently.

Convergence at the Sentinel Scale: By the time we reach the “Sentinel_Scale” (roughly 10-meter resolution, perfect for satellite imagery) and the “Smallest” scale, all three systems converge to near 0.0000% error, regardless of whether you are tracking a cross-country flight or a localized grid outage

The following is how the ‘city’ level cells look like over a regional distance

Intersect

By intersect what we mean is geofencing and point in polygon problem, in most of geospatial problems, we will constantly be checking if telemetry coordinates fall within a specific boundary, and for that we can either do the exact mathematical calculation (which is computationally expensive at scale) or check if the point belongs to a pre computed set of grid cells (which is blazing fast but introduces edge-case errors)

When determining if a point falls inside a complex polygon, we use these test cases (firmly inside the polygon, riding the borderline, and just outside) and compare the exact geometric calculation (using the Shapely library) against native grid set lookups across Geohash, H3, and S2 at the “Neighborhood/City” scale.

geofence_poly = [ #downtown ottawa - bounding box
    [45.430, -75.715],
    [45.432, -75.695],
    [45.418, -75.690],
    [45.415, -75.705],
    [45.430, -75.715] 
]

test_points = {
    "Inside": (45.422, -75.700),
    "Borderline": (45.430, -75.710),
    "Outside": (45.410, -75.720)
}
Point_Type Lat_Lon Exact_Contains GEOHASH_In_Set H3_In_Set S2_In_Set GEOHASH_Intersects H3_Intersects S2_Intersects Exact_Time_ms GEOHASH_Set_Time_ms H3_Set_Time_ms S2_Set_Time_ms GEOHASH_Intersect_Time_ms H3_Intersect_Time_ms S2_Intersect_Time_ms
0 Inside 45.422, -75.7 True True True True True True True 0.0193 0.0059 0.0031 0.0143 0.0065 0.0149 0.0054
1 Borderline 45.43, -75.71 True True False True True True True 0.0053 0.0054 0.0018 0.0142 0.0046 0.0063 0.0050
2 Outside 45.41, -75.72 False False False True False False True 0.0500 0.0065 0.0030 0.0160 0.0064 0.0068 0.0082

Where In Set is a still a mathematical intersects (using shapely) for the cell polygon and intersects is the the grid systems native intersect function.

The Inside Point: All three systems, as well as the exact calculation, correctly identified that the point was inside the polygon. No surprises here.

The Borderline Point: The exact math confirmed the point was inside. Geohash and S2 successfully flagged it via set lookup. However, H3’s native set lookup returned False. Because hexagons don’t perfectly conform to sharp, straight polygon edges, the H3 polyfill algorithm left a slight gap at the border, resulting in a false negative.

The Outside Point: The exact math correctly flagged the point as outside. Geohash and H3 agreed. However, S2 returned True for both the set lookup and dynamic intersection. Because the S2 cell at this specific resolution level was large enough to overlap the polygon boundary while also swallowing the outside point, it yielded a false positive.

Now you ask if grids introduce false positives and negatives, why use them at all? Speed!

  • Exact Geometry (Shapely): Ranged from 0.0053 ms to 0.0500 ms. Exact math is highly variable because the computational cost increases depending on how complex the polygon is and where the point lies relative to its edges. This was a simple bounding box, but geo polygons can be and usually are very detailed
  • Geohash Set Lookup: Hovered around ~0.0054 to 0.0065 ms.
  • S2 Set Lookup: Ranged from ~0.0142 to 0.0160 ms.
  • H3 Set Lookup: Blew the competition away at 0.0018 to 0.0031 ms

(and we can always solve for accuracy by adjusting resolution and/or employing hierarchal checks)

Neighborhood

In geospatial analytics, we rarely look at a single cell in isolation.

Not just optimal drive time solutions but even in Earth Observation problems we apply smoothing filters, calculate local density, or model the spread of phenomena (like a flood or a signal footprint) by radiating outward. This requires pulling a cell’s immediate neighbors.

Based on our analysis at the “Neighborhood” scale (roughly 1 to 5 km²), here is how Geohash, H3, and S2 handle spatial adjacency

Geohash

Geohash is computationally simple, but library support for adjacency can be lacking. For instance, the pygeohash library does not have an inherent function to fetch neighbors.

  • The Method: To find neighbors, we had to manually calculate the latitude and longitude error margins (the dimensions of the cell) and step outward to create a 3×3 grid around the center token

  • The Result: You get 8 neighbors (sharing edges and corners). However, because Geohash relies on a Z-order curve, cells that are physically adjacent might have wildly different string prefixes, making database range queries for neighbors highly inefficient

H3

Uber’s H3 system was made for this. Hexagons are the gold standard for neighborhood traversal because the distance from the center of a cell to all of its neighbors is identical.

  • The Method: H3 uses a built-in grid_disk(center, k=1) function, which efficiently returns the center cell plus its immediate ring of neighbors

  • The Result: You get exactly 6 perfect neighbors. If you are building a Convolutional Neural Network (CNN) or a Graph Neural Network (GNN) that requires pooling data from adjacent spatial regions, H3’s uniform k-ring expansion makes the math incredibly clean

S2

S2 handles neighbors robustly, leveraging its 64-bit integer indexing and Hilbert curve.

  • The Method: S2 provides a native get_all_neighbors(level) function

  • The Result: It returns 8 neighbors in total: 4 edge-sharing neighbors and 4 vertex-sharing (corner) neighbors. Because S2 avoids the severe spatial jumps of Geohash’s Z-curve, fetching these neighbors is computationally fast, though the variable distance to corner neighbors makes it slightly less ideal for radial smoothing than H3

Hierarchy & Coverage

we’ve already established how these systems group cells (hierarchy) and how they fill a shape (coverage) are fundamentally different, now we tackle on a problem of indexing a geographic boundary – to retrieve data inside, here we see how at different hierarchies and resolutions, different grid cells cover a geographic area

Geohash

Hierarchy: Geohash’s hierarchy is incredibly simple for databases. If your cell is f244k, its parent is f244. You just chop off the last character. It is fast for text-based NoSQL scans, but because of the Z-curve, a parent cell might contain child cells that are physically distorted or seemingly disjointed

Coverage: Geohash lacks a native, sophisticated region coverer in most standard Python libraries. To cover a polygon, we had to manually generate a bounding box and iterate through coordinates. This brute force method almost always results in high over coverage at the polygon’s edges because rectangular cells do not hug diagonal borders well

H3

Hierarchy: H3 uses an “aperture 7” hierarchy, meaning one parent hexagon roughly contains seven child hexagons. However, geometry dictates that hexagons cannot be perfectly subdivided into smaller hexagons. This means H3’s hierarchy is not perfectly nested, child cells will slightly overlap the boundaries of their parent

Coverage: H3 offers a native polygon_to_cells polyfill function. Because hexagons approximate organic, curved, and diagonal shapes much better than rectangles, H3 provides a very tight footprint with minimal over-coverage, saving compute on downstream queries

S2

Hierarchy: S2 is mathematically flawless here. It uses a pure quadtree hierarchy where exactly 4 child cells fit perfectly inside 1 parent cell with zero boundary overlap. For hierarchical data aggregations, S2 is vastly superior

Coverage: S2 is purpose-built for this. It has a native RegionCoverer() class that allows you to specify a min_level, max_level, and even a maximum cell count. It mixes large parent cells in the center of the polygon and small child cells at the complex borders, optimizing both accuracy and database storage

You can access the code here github

Share:

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *