Data ethics

We added a Human Activity Signals layer because inferred urban-form scoring is incomplete on its own. Adding signals about real human use makes the model more honest. It also raises real ethical questions, which we want to answer up front rather than apologise for later.

What we will not do

No personal tracking. Nothing in this project is tied to an individual person.
No usernames. Public mention counts are aggregate; we never store the handle that posted.
No raw posts. We don’t store the contents of social posts, comments, or reviews: only counts, sentiment averages, and aggregated tags.
No face detection or biometric processing. We don’t look at images of people in parks. Photo activity is a count, not a content analysis.
No device identifiers. No mobility traces tied to a phone or a device. Pedestrian-counter data is venue-level only.
No scraping in violation of terms of service. We don’t scrape Instagram, TikTok, Facebook, X, or Google Maps HTML. Where Google Popular Times data is wanted, we accept only manual CSV imports or licensed API data.

What we do collect (aggregate-only)

Event metadata from official feeds (Toronto Festivals & Events JSON feed). Event names, locations, start and end times, and recurrence: all already public, all already aggregated.
Wikipedia pageview totals for parks with their own articles. Per-article, daily aggregates returned by the public REST API, totalled to a single number per park.
Aggregate counter volumes from the City of Toronto’s Permanent Bicycle Counters dataset. We use each counter’s location and total counts; we never look at individual passages.
Manual CSV imports a researcher chooses to bring in for social mention counts, photo / review counts, sentiment averages, or activity tags. Schema is fixed and aggregate; raw contents are forbidden.
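As an illustration of how a fixed, aggregate-only schema could be enforced at import time, here is a minimal sketch. The column names and the function validate_aggregate_csv are assumptions made for this example; the project's actual schema is not listed on this page.

```python
import csv
import io
from typing import Dict, List

# Hypothetical fixed schema: the real column names are not given in this document.
ALLOWED_COLUMNS = {
    "park_id", "mention_count", "photo_count",
    "review_count", "sentiment_avg", "activity_tags",
}
REQUIRED_COLUMNS = {"park_id"}


def validate_aggregate_csv(text: str) -> List[Dict[str, str]]:
    """Parse a manual import, rejecting any column outside the fixed schema."""
    reader = csv.DictReader(io.StringIO(text))
    header = set(reader.fieldnames or [])

    # Allowlist check: anything off-schema (e.g. "username", "post_text")
    # is refused outright rather than silently dropped.
    unexpected = header - ALLOWED_COLUMNS
    if unexpected:
        raise ValueError(f"columns outside aggregate schema: {sorted(unexpected)}")

    missing = REQUIRED_COLUMNS - header
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")

    return list(reader)
```

An allowlist is used here rather than a blocklist of forbidden names, so a raw-content column cannot slip through simply because nobody anticipated what it would be called.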

Stated guarantees

store_personal_content: false
store_usernames: false
store_raw_posts: false
store_face_data: false
store_device_identifiers: false
aggregation: park-level only

These flags live in config/activity-sources.json and the ETL scripts read them; if we ever needed to lift one, the change would be visible in source control.
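One way an ETL script might read these flags and refuse to run if any has been lifted is sketched below. It assumes the flags sit under a top-level "guarantees" key in config/activity-sources.json; that layout, and the function names check_guarantees and load_config, are hypothetical.

```python
import json

# Expected values, copied from the stated guarantees listed above.
EXPECTED_GUARANTEES = {
    "store_personal_content": False,
    "store_usernames": False,
    "store_raw_posts": False,
    "store_face_data": False,
    "store_device_identifiers": False,
    "aggregation": "park-level only",
}


def check_guarantees(config: dict) -> None:
    """Raise if any guarantee is missing or differs from its expected value."""
    # Assumption: the flags are stored under a top-level "guarantees" key.
    flags = config.get("guarantees", {})
    problems = [
        f"{key}: expected {expected!r}, found {flags.get(key)!r}"
        for key, expected in EXPECTED_GUARANTEES.items()
        if flags.get(key) != expected
    ]
    if problems:
        raise RuntimeError("guarantee check failed: " + "; ".join(problems))


def load_config(path: str = "config/activity-sources.json") -> dict:
    """Load the source config and verify the guarantees before any ETL work."""
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    check_guarantees(config)
    return config
```

Failing loudly at load time means a lifted flag has to be changed in two reviewed places, the config file and the expected values, before any data is processed.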

Why social media biases the picture

Social attention is a real signal but a biased one. Photogenic parks (waterfront sites, civic squares, large central destinations) get over-represented. Daily neighbourhood parks where people actually live their lives (a quick stop on the way home, a coffee in winter) get under-represented. We therefore surface social attention as one of five activity sub-scores, not as the whole story.

Why event data over-represents civic programming

The Toronto Festivals & Events feed is heavy on programmed civic events, which means Nathan Phillips Square shows up loudly while Trinity Bellwoods’s Tuesday farmers market may not. We account for this with saturating curves and recurrence weighting, but the underlying inventory is still biased toward what the City programs and permits.
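To make "saturating curves and recurrence weighting" concrete, here is a minimal sketch. The exact curve and weights are not given on this page, so everything below is an assumption: an exponential saturation of the form 1 - exp(-x / scale), and a logarithmic boost for recurring listings. The function names and constants are illustrative only.

```python
import math
from typing import List


def event_weight(occurrences: int, recurrence_bonus: float = 0.5) -> float:
    """Weight one event listing; recurring listings count more, with diminishing returns."""
    # Assumption: recurring events get a logarithmic boost. The direction and
    # size of the project's real recurrence weighting are not documented here.
    return 1.0 + recurrence_bonus * math.log1p(max(occurrences, 1) - 1)


def programming_score(occurrence_counts: List[int], scale: float = 10.0) -> float:
    """Map weighted event volume onto [0, 1) with a saturating curve."""
    weighted = sum(event_weight(n) for n in occurrence_counts)
    # 1 - exp(-x / scale) rises quickly at first and then flattens, so a venue
    # with hundreds of listings cannot dwarf one with a few dozen.
    return 1.0 - math.exp(-weighted / scale)
```

Under these assumptions, a park with one weekly market (a single listing with 52 occurrences) scores noticeably above a park with one one-off event, while going from 100 to 200 one-off listings barely moves the score.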

Why pedestrian counters measure movement, not occupancy

Counter sensors record people walking or cycling past a fixed point. They cannot tell whether those people stopped at the park. We therefore treat counter data as an activity proxy, not an occupancy measurement. The proximity-confidence sub-score reflects how close the counter is to the park.
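A proximity-confidence value could be computed along the following lines. This is a sketch under stated assumptions, not the project's formula: it measures great-circle distance from the counter to a park reference point (assumed here to be a centroid) and lets confidence fall linearly to zero at a hypothetical 500 m cutoff.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def proximity_confidence(distance_m: float, cutoff_m: float = 500.0) -> float:
    """Linear decay from 1.0 at the park to 0.0 at cutoff_m and beyond."""
    # Assumption: both the linear shape and the 500 m cutoff are illustrative.
    return max(0.0, 1.0 - distance_m / cutoff_m)
```

With these defaults, a counter 250 m from the park's reference point gets a confidence of 0.5, and a counter 1 km away contributes no confidence at all.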

Removal & correction

If you maintain a park, run programming there, or have a concern with how this site represents your park’s activity signals, please contact us via the project’s GitHub issue tracker or by email. We will engage in good faith and hold ourselves to a 30-day correction window for any requested aggregate change. (Contact placeholder: replace before public launch.)

Source transparency

Every park detail page shows which sources contributed (programming, social, popular_times, counters, sample-fixture). Sources are listed in config/activity-sources.json and documented on the methodology page.