Toronto Park Atlas
Real Toronto data — 3,273 parks · avg score 34.6 · avg confidence 60%

Methodology

The Toronto Parks Vitality Atlas is an experimental, data-informed reading of Toronto parks through Jane Jacobs-inspired ideas of urban vitality. The score is not a definitive judgment. It’s a transparent way of noticing patterns.

The lens

Jacobs argued that great urban places emerge from short blocks, mixed primary uses, dense and permeable edges, and the ordinary surveillance of “eyes on the street.” Parks live or die by the same logic: a park surrounded by cafés, schools, homes and small shops behaves differently from one bordered by parking lots, expressways, or blank institutional walls.

We translate that intuition into six measurable proxies. Each proxy is a 0 to 100 sub-score with a plain-English explanation. The overall Vitality Score is a weighted average:

  • Edge Activation: 25%
  • Connectivity: 20%
  • Amenity Diversity: 20%
  • Natural Comfort: 15%
  • Enclosure / Eyes on Park: 10%
  • Border Vacuum Risk: 10% (inverted: high risk reduces vitality)

Weights are configurable via SCORE_WEIGHT_* env vars.
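The weighted average above can be sketched as follows. Only the default weights and the SCORE_WEIGHT_* prefix come from this page; the metric key names and function names are illustrative, not the pipeline's actual identifiers.

```python
import os

# Default weights from the methodology; SCORE_WEIGHT_* env vars override them.
# The metric keys below are our own labels, not the pipeline's real names.
DEFAULT_WEIGHTS = {
    "edge_activation": 0.25,
    "connectivity": 0.20,
    "amenity_diversity": 0.20,
    "natural_comfort": 0.15,
    "enclosure": 0.10,
    "border_vacuum": 0.10,  # inverted: the sub-score arrives as 100 - risk
}

def load_weights() -> dict:
    """Read SCORE_WEIGHT_<METRIC> overrides, falling back to the defaults."""
    weights = {}
    for key, default in DEFAULT_WEIGHTS.items():
        env = os.environ.get(f"SCORE_WEIGHT_{key.upper()}")
        weights[key] = float(env) if env is not None else default
    return weights

def vitality_score(sub_scores: dict, weights: dict) -> float:
    """Weighted average of 0-100 sub-scores, renormalised by the weight sum
    so that overridden weights need not add up to exactly 1."""
    total = sum(weights.values())
    return sum(sub_scores[k] * w for k, w in weights.items()) / total
```

Renormalising by the weight sum means a user can override a single weight without having to rebalance the other five by hand.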

The metrics, in plain English

Amenity Diversity

Counts distinct types of activity inside the park rather than raw amenity counts. A small park with playground, washroom, water, garden, art and benches outscores a larger park that only repeats one use.
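The diminishing-returns shape (given as 100·d / (d + 6) in the research-grade table further down) rewards the first few distinct uses heavily and saturates after that. A minimal sketch, with a function name of our choosing:

```python
def amenity_diversity_score(distinct_types: int) -> float:
    """100*d/(d+6): six distinct amenity types already earn 50 points,
    and each further type adds progressively less."""
    d = max(0, distinct_types)
    return 100.0 * d / (d + 6)
```

So a parkette with six distinct uses scores 50, while tripling the variety to eighteen uses only lifts it to 75, which is the behaviour the prose describes.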

Connectivity

Streets within 25 m of the park edge (Toronto Centreline V2), intersection nodes within 100 m, mapped paths and sidewalks within 50 m (Centreline trails/walkways + the Pedestrian Network), transit stops within 400 m (OSM), and an estimated count of access points where mapped paths cross the polygon boundary. A superblock penalty applies when fewer than 0.5 street edges per 100 m of perimeter touch the park. Jacobs-style permeability rewards many ways in and many short trips.
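The superblock test described above reduces to a one-line threshold check. The function name is ours; the 0.5-edges-per-100-m threshold is stated in the text.

```python
def is_superblock_bound(street_edges_touching: int, perimeter_m: float) -> bool:
    """Flag a park when fewer than 0.5 street edges per 100 m of
    perimeter touch it (the methodology's superblock-penalty trigger)."""
    edges_per_100m = street_edges_touching / (perimeter_m / 100.0)
    return edges_per_100m < 0.5
```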

Edge Activation

Within 100 m of the park edge: cafés, restaurants, retail, schools, community uses, and residential frontage count as positive; parking lots, highways, rail, blank institutional walls, and industrial uses subtract from the score.
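Using the normalization given in the research-grade table below (100·p / (p + 6) − 8·n, clamped to [0, 100]), a minimal sketch:

```python
def edge_activation_score(positives: int, negatives: int) -> float:
    """Positive POIs saturate via 100*p/(p+6); each hostile edge use
    subtracts a flat 8 points; the result is clamped to [0, 100]."""
    raw = 100.0 * positives / (positives + 6) - 8.0 * negatives
    return max(0.0, min(100.0, raw))
```

The saturating numerator means a strip of six cafés is worth 50 points, while the flat −8 per hostile use means even a well-fronted park pays a visible price for an adjacent parking lot.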

Border Vacuum Risk

Adjacency (within 50 m) to highways, rail corridors, large parking lots, industrial parcels, blank institutional edges, or ravines with poor frontage. Higher risk reduces overall vitality.
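With the hostile-use weights from the research-grade table below (highway 30, rail 18, parking 12, industrial 14, blank-institutional 10) and the inversion contribution = 100 − risk, this can be sketched as follows. The zero floor for parks ringed by many hostile uses is our assumption; the source only states the subtraction.

```python
HOSTILE_WEIGHTS = {"highway": 30, "rail": 18, "parking": 12,
                   "industrial": 14, "blank_institutional": 10}

def border_vacuum_contribution(hostile_counts: dict) -> float:
    """Sum weighted hostile uses within 50 m of the edge; the inverted
    contribution to vitality is 100 - risk, floored at 0 (our assumption)."""
    risk = sum(HOSTILE_WEIGHTS[kind] * n for kind, n in hostile_counts.items())
    return max(0.0, 100.0 - risk)
```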

Natural Comfort

Stratified-grid sampling of each park polygon classifies points as canopy (Toronto Treed Area), inside ravine system (Ravine Bylaw Areas), water (Waterbodies), or open. Combines that with street-tree count + density inside the polygon and distance to the nearest waterbody. The score reads human comfort in an urban-park context (shaded, cool, varied) rather than ecological purity. A 100% canopy ravine doesn't automatically beat a well-treed neighbourhood park.

Enclosure / Eyes on Park

Toronto's 3D Massing dataset (428k building footprints with heights) is intersected with park edges to count buildings within 25 m and 50 m, compute the average height of those buildings (binned into low-rise / mid-rise / tower), and measure frontage density per 100 m of perimeter. Mid-rise (3 to 7 floor) edges score highest. Jacobs argued they are the strongest ‘eyes on the park’. Towers within 25 m incur a penalty (tower-in-the-park risk); blank/under-framed edges incur a separate penalty.
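The height binning can be sketched from the thresholds in the research-grade table (low-rise < 9 m, mid-rise 9 to 21 m, tower ≥ 40 m). The 21 to 40 m band is not named anywhere on this page, so the "high-rise" label for it below is our placeholder, not the pipeline's.

```python
def height_bin(avg_height_m: float) -> str:
    """Bin an average edge-building height using the table's thresholds.
    'high-rise' for the unnamed 21-40 m band is our placeholder label."""
    if avg_height_m < 9:
        return "low-rise"
    if avg_height_m <= 21:
        return "mid-rise"
    if avg_height_m >= 40:
        return "tower"
    return "high-rise"
```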

Data sources

  • City of Toronto Open Data — Parks (Green Space)
    Polygon boundaries, official names, types.
  • Parks & Recreation Facilities
    Inventory of in-park amenities (washrooms, fields, rinks…).
  • Toronto Pedestrian Network
    Sidewalk segments around and through parks; estimated park entrances.
  • Toronto Centreline V2
    Street segments + intersection nodes near park edges; trails and walkways.
  • Toronto 3D Massing
    Building footprints + heights for edge-building counts, frontage density, and tower-in-the-park risk.
  • Toronto Treed Area
    Tree canopy share inside park polygons via stratified-grid sampling.
  • Toronto Waterbodies & Rivers
    Water surface inside parks + nearest-water distance for cooling.
  • Ravine & Natural Feature Protection
    Ravine overlap as a cooling / natural-comfort signal.
  • Toronto Street Tree Inventory
    Tree count + density inside park polygons.
  • Neighbourhood Profiles
    (Pending) Equity context proxy.
  • OpenStreetMap (Overpass API)
    Cafés, restaurants, retail, transit stops, parking, highways, rail.

Data Confidence

Different metrics rest on different evidence. The headline score blends them all, but the per-metric confidence and the table below let you see where the model is on solid ground and where it’s holding a placeholder until more data lands.

Metric — Status — Basis

  • Amenity Diversity — direct. City Parks & Recreation Facilities. Distinct amenity types per park, spatially joined to the Green Spaces polygon.
  • Edge Activation — direct. OSM POIs (cafe / restaurant / shop / school / community / parking / highway / rail / industrial) within 100 m of park edge. Quality varies with OSM coverage.
  • Border Vacuum Risk — direct. OSM landuse, highways, rail, parking within 50 m of park edge.
  • Connectivity — direct. Toronto Centreline V2 (street segments within 25 m, intersection nodes within 100 m, trails / walkways within 50 m) + Pedestrian Network (sidewalk segments within 50 m, path-polygon crossings as estimated entrances) + OSM transit stops within 400 m. Components weighted 35 / 20 / 20 / 15 / 10 (paths / intersections / transit / entrances / superblock penalty). Confidence is tiered: high when all three sources are present near the park, medium when two are, lower when only one is.
  • Natural Comfort — partial. Toronto Treed Area (canopy % via stratified-grid sampling) + Ravine & Natural Feature Protection Area (ravine overlap %) + Waterbodies & Rivers (water % inside park, distance to nearest water) + Street Tree Inventory (tree count + density per ha). Components weighted 35 / 20 / 20 / 15 / 10 (canopy / impervious / green / ravine+water / diversity). Impervious surface is approximated: Toronto's authoritative impervious layer ships only as a GeoTIFF raster, which the pipeline can't read without GDAL.
  • Enclosure / Eyes on Park — direct. Toronto 3D Massing (428 k building polygons with footprints + heights). Counts buildings within 25 m and 50 m of park edge, avg edge height (binned into low-rise / mid-rise / tower), frontage density per 100 m of perimeter, blank-edge share, tower-in-the-park count. Components weighted 30 / 25 / 20 / 15 / 10 (frontage / human-scale / mid-rise eyes / blank-edge avoidance / tower-penalty). Held at neutral 50 with low confidence for parks with no nearby buildings (ravines, hydro corridors).
  • Equity Context — placeholder. Requires the Toronto Neighbourhood Profiles join. Surfaced as context only, not in the headline weighting.
  • direct: measured from a primary source we’ve loaded.
  • partial: some inputs are loaded; others are still placeholder.
  • placeholder: no source data loaded yet. The metric defaults to neutral 50 with low per-metric confidence, so it doesn’t silently inflate or deflate the headline.

Because some dimensions are placeholders, the headline score should be read as a Jacobs-inspired model in motion, not an official ranking. Park detail pages show a confidence value per sub-score so you can read the score at the right level of certainty.

Research-grade methods

Sub-score — Inputs — Normalization

  • Edge Activation — Inputs: OSM POIs within 100 m of park edge: positives (cafes / restaurants / retail / schools / community / transit / residential), negatives (parking / highway / rail / industrial / blank institutional). Normalization: 100·p / (p + 6) − 8·n, clamped to [0, 100].
  • Connectivity — Inputs: Centreline V2 streets ≤ 25 m, intersection nodes ≤ 100 m, paths/sidewalks ≤ 50 m, OSM transit ≤ 400 m, estimated entrances, edge density per 100 m of perimeter. Normalization: 35% paths · 20% intersections · 20% transit · 15% entrances · 10% superblock penalty.
  • Amenity Diversity — Inputs: distinct amenity types from City Parks & Recreation Facilities, spatially joined to the park polygon (with a 25 m fallback buffer). Normalization: 100·d / (d + 6), clamped.
  • Natural Comfort — Inputs: treed-area canopy %, ravine overlap %, water % + nearest-water distance, street-tree count + density. Effective canopy = max(polygon, density × 0.7). Normalization: 35% canopy · 20% impervious · 20% green · 15% ravine+water · 10% diversity.
  • Enclosure — Inputs: 3D Massing buildings ≤ 25 m / 50 m of edge, avg height bin (low-rise < 9 m / mid-rise 9 to 21 m / tower ≥ 40 m), frontage density per 100 m of perimeter. Normalization: 30% frontage · 25% human-scale · 20% mid-rise eyes · 15% blank-edge avoidance · 10% tower-penalty.
  • Border Vacuum (inverted) — Inputs: sum of weighted hostile uses within 50 m of edge: highway 30, rail 18, parking 12, industrial 14, blank-institutional 10. Normalization: overall contribution = 100 − risk.

Sampling. Cover-overlap measures (canopy %, ravine %, water %) use stratified-grid point sampling inside each park polygon at adaptive 6 to 30 m steps, capped at 400 sample points per park. Confidence is dampened on parks with fewer than 25 valid samples.
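One way the adaptive step could be chosen is to derive it from park area so that the point budget is respected. The text fixes only the 6 to 30 m bounds and the 400-point cap; the square-root rule below is our assumption.

```python
import math

def adaptive_grid_step(area_m2: float, cap: int = 400,
                       min_step: float = 6.0, max_step: float = 30.0) -> float:
    """Pick a grid spacing that yields roughly `cap` points for the park,
    clamped to the methodology's 6-30 m range. The sqrt(area/cap) rule is
    an assumption; only the bounds and the cap come from the text."""
    step = math.sqrt(area_m2 / cap)  # spacing at which ~cap points fit
    return min(max_step, max(min_step, step))
```

Under this rule a 0.36 ha parkette bottoms out at the 6 m step (dense sampling of a small polygon), while a 100 ha ravine tops out at 30 m so it never exceeds the point budget by much.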

Clustering. k-means (k = 8, k-means++ init, deterministic seed) on the five normalised sub-scores. Cluster names are derived from each cluster centroid’s most-distinctive dimensions versus the citywide mean (delta-based, not z-score), with hand-curated overrides for cleanly identifiable patterns.
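The delta-based naming step might look like the sketch below: rank dimensions by absolute delta from the citywide mean and name the cluster after the top ones. The function name and the choice of taking the top two are ours.

```python
def distinctive_dims(centroid: dict, citywide_mean: dict, top: int = 2) -> list:
    """Rank sub-score dimensions by absolute delta from the citywide mean
    (delta-based, not z-score) and return the `top` most distinctive ones."""
    deltas = {k: centroid[k] - citywide_mean[k] for k in centroid}
    return sorted(deltas, key=lambda k: abs(deltas[k]), reverse=True)[:top]
```

A centroid far above the mean on edge activation and far below on natural comfort would thus be named for those two axes, before any hand-curated override.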

Confidence tiering. Each sub-score reports its own confidence based on which source layers contributed: measured (≥ 0.7) when the canonical source layer landed and had non-empty results for the park; partial (0.4 to 0.7) when one of multiple expected sources is missing; inferred (< 0.4) when the metric falls back to a placeholder. The headline score weights its sub-scores by their stated weights but does not re-weight by confidence. That is deliberate: a low-confidence reading doesn’t silently shrink the headline; it is only visually flagged.
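A sketch of that tier mapping, assuming confidence is driven by how many of the expected source layers contributed. The exact numeric values inside each band are illustrative; only the ≥ 0.7 / 0.4–0.7 / < 0.4 bands come from the text.

```python
def source_confidence(sources_present: int, sources_expected: int) -> float:
    """Map source coverage onto the text's tiers: measured (>= 0.7) when
    every expected layer landed, partial (0.4-0.7) when some are missing,
    inferred (< 0.4) on placeholder fallback. In-band values are illustrative."""
    if sources_expected == 0 or sources_present == 0:
        return 0.25  # inferred: metric fell back to a placeholder
    if sources_present == sources_expected:
        return 0.85  # measured: canonical sources all present
    return 0.55      # partial: at least one expected source missing
```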

Human Activity Signals

The spatial-form scoring above describes parks that should work. It says nothing about parks that are actually programmed, photographed, walked through, or socially meaningful. The Activity Signals layer tries to capture that second reading; it is partial by design.

Sub-score — Direct vs proxy — Sources

  • Programming — direct (event records). Toronto Festivals & Events JSON feed; future Eventbrite / Meetup APIs (require keys).
  • Social attention — proxy (mention / photo / review counts). Wikipedia REST pageviews + manual CSV imports. Optional Flickr / Google Places APIs.
  • Temporal diversity — optional / manual. Manual Popular-Times-shaped CSV. We do not scrape Google Maps.
  • Pedestrian / cycling flow — proxy (counter at distance). Toronto Permanent Bicycle Counters; future pedestrian counters where available.
  • Cultural significance — proxy. Wikipedia article presence + sentiment + tag diversity.
  • Inferred vitality is incomplete. The spatial model says how parks should behave. The activity layer tries to say how they actually do. Both readings are useful; neither is sufficient.
  • Social media is biased. Photogenic parks over-index; everyday neighbourhood parks under-index. We label this clearly in the social-attention sub-score.
  • Event data over-represents programmed civic use. Saturation and recurrence weighting prevent Nathan Phillips Square from dwarfing the rest, but the underlying feed is still city-curated.
  • Popular Times is optional / manual / licensed only. We never scrape Google Maps. If no popular-times data has been imported for a park, the temporal-diversity sub-score is flagged as “unknown”.
  • Counters measure movement near parks, not park occupancy. Proximity confidence reflects how close the counter sits to the park edge.
  • Confidence is honest. Activity scores in sample mode are clamped to 0.25 confidence; real-data scores rise to 0.9 only when all five sub-source families contributed.
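The confidence clamping in the last bullet can be sketched as follows. Only the 0.25 sample-mode clamp, the 0.9 ceiling, and the all-five-families condition come from the text; the linear ramp between them is our assumption.

```python
def activity_confidence(sample_mode: bool, families_contributing: int) -> float:
    """Sample-mode activity scores are pinned at 0.25 confidence; real-data
    confidence reaches 0.9 only when all five sub-source families contributed.
    The linear ramp for 1-4 families is an assumption, not from the source."""
    if sample_mode:
        return 0.25
    return min(0.9, 0.25 + 0.13 * families_contributing)
```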

Privacy guarantees: see /data-ethics for what we will and will not collect.

Why scores are not bell-curved

Toronto’s parks aren’t normally distributed. Hydro corridors and ravine slivers cluster near zero; iconic neighbourhood parks cluster up high; the long middle is where most of the city lives. If we forced the raw scores onto a bell curve we’d be pretending that asymmetry away, and real structure would be hidden. So we don’t do it.

What we add instead is context. For every park we publish four numbers alongside the raw score:

  • Citywide percentile: rank against all 3,273 parks. Useful for absolute orientation.
  • Typology percentile: rank within parks of the same primary typology. Prevents unfair comparisons (a Civic Square and a Ravine aren’t doing the same job).
  • Cluster percentile: rank within the auto-detected morphological cluster.
  • Expected score: median of a similar-park cohort defined by typology + size band + ravine/waterfront status. The performance gap = raw − expected.

The gap is labelled in five buckets (strong over, modest over, typical, modest under, strong under) at ±5 and ±12 thresholds. Each park’s panel publishes its cohort size and a context-confidence (high, medium, low) so the reader can tell whether the gap is well-supported or comes from a tiny cohort.
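The five-bucket labelling at the ±5 and ±12 thresholds can be sketched like this. Whether the boundary values themselves fall in the inner or outer bucket is not specified on this page, so the inclusive/exclusive choices below are ours.

```python
def gap_bucket(raw: float, expected: float) -> str:
    """Label the performance gap (raw - expected) using the +/-5 and +/-12
    thresholds. Boundary inclusivity is our assumption."""
    gap = raw - expected
    if gap > 12:
        return "strong over"
    if gap > 5:
        return "modest over"
    if gap >= -5:
        return "typical"
    if gap >= -12:
        return "modest under"
    return "strong under"
```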

What this answers: is this park strong for what it is? A modest raw score in a beloved parkette can still rank in the 90th percentile of its typology and read as a strong overperformer. That’s a more useful sentence than “55 / 100” alone.

Why typologies matter

A Civic Square (paved, for events, surrounded by towers) and a Ravine Park (forest, no buildings, off-grid) cannot be compared directly on a single Vitality score; the comparison would be actively misleading if you tried. Each park is automatically assigned a typology and a cluster, and the headline rankings on the home page are now typology-aware: best-in-class within a family rather than a city-wide leaderboard. The classifier is rule-based and explainable, and each detail page shows exactly which thresholds fired.

Jacobs vs Wilderness: two axes, not one

Behind the headline score sit two distinct frameworks. Urban integration (the average of edge activation, connectivity, and enclosure) measures whether the park is woven into the daily city Jacobs cared about. Natural comfort measures whether the park provides ecological respite. The Insights → Jacobs vs Wilderness page plots every Toronto park on these two axes; the result is bimodal. Most parks are strong on one axis and weak on the other; genuinely balanced parks are rare. We treat that as a finding, not a bug. The city actually has different kinds of parks doing different jobs.

Limitations of algorithmic urbanism

  • Typologies are heuristics, not ground truth. A park may straddle types and the secondary read on the detail page is sometimes more accurate than the primary.
  • Clustering is unsupervised k-means on five normalised dimensions. The cluster names are descriptive labels assigned by us, not categories the data “knows” about itself.
  • Narratives are generated from real metric values via templates. They reference actual numbers but they are still pattern-matching, not understanding.
  • The validation feedback we collect is a small signal, not a verdict. It weights the model’s confidence over time but does not override scoring.

The right way to use this site is as a conversation starter about Toronto’s parks, not as a ranking that decides which is “better”.

Limitations

  • Measured ≠ truth. We measure proxies. A high-scoring park can still feel uninviting; a low-scoring park can be beloved.
  • OSM coverage varies. Café and entrance density depend on volunteer mapping and skews toward downtown.
  • No real pedestrian counts. We have no observation data, so “activity” is inferred from surrounding land use.
  • Static snapshot. Seasonality, events, and time-of-day effects are not modelled in this MVP.
  • Equity context is rough. Neighbourhood-level proxies hide block-level differences.

Privacy & ethics

We use only open, aggregated data sources. No individual movements, no proprietary location traces. Scores describe places, not people. We deliberately do not infer “safety” from policing or enforcement data; that conflation harms communities and was explicitly excluded.

Roadmap

  • Street View / computer-vision derived facade and shade scoring.
  • Real pedestrian activity (sidewalk counts, anonymised cell-network mobility).
  • Event permits and programming density.
  • Real-estate proxy data for neighbourhood vibrancy.
  • Resident-perception surveys and Jane’s Walk-style observations.
  • Seasonal usage modelling.