Opinion & Analysis  ·  Open Data  ·  Civic Technology

In the Age of AI, Is Open Data Still Open?

How compulsory API keys and user registration — introduced to manage AI-driven load — are quietly but fundamentally eroding the principles on which the Open Data movement was built.

Central Question

If accessing a dataset requires user registration or an API key — can that dataset still honestly be called Open Data? Or has it quietly become something else: managed access to public information, dressed up in the language of openness?

Something is shifting underneath one of the internet's better civic promises. Open data portals operated by major cities — among them DataSF (San Francisco), NYC Open Data (New York City), Chicago's Data Portal, data.seattle.gov, and Los Angeles's open data platform — now route their data through the Socrata platform, operated by Tyler Technologies, in a way that enforces compulsory API keys. The stated reason is reasonable enough: artificial intelligence crawlers have been flooding data endpoints, degrading service for everyone. But the remedy deserves scrutiny — because it quietly changes something fundamental about what “open” actually means.

These are not marginal portals. Chicago alone publishes over 1,000 datasets. NYC Open Data covers everything from street flooding complaints to taxi trips. San Francisco's DataSF has been held up internationally as a model of civic transparency. The Socrata platform underpins hundreds of similar portals across the United States, Canada, Australia, and beyond — meaning a policy decision made at the platform level propagates across the entire ecosystem of civic data.

And Socrata is far from alone. This is a pattern playing out across the broader open data landscape — from meteorological agencies to space science — wherever public-sector data providers have encountered the combined pressures of AI-scale consumption and the institutional temptation to know who is accessing what.

The Core Tension

The traditional definitions of Open Data — particularly the Open Definition and the Open Government Data principles — rest on a small set of non-negotiable properties. Data must be accessible without barriers. It must be machine-readable. It must be available without registration or fees. And crucially, it must be non-discriminatory: anyone can access it, without having to identify themselves.

Compulsory API keys or user logins conflict with that last principle, even when access is free and registration is trivial. Requiring identification to access data is a barrier — a soft one, perhaps, but a barrier nonetheless. The Open Data movement was built on the idea that public data belongs to the public, unconditionally. A key-gated or registration-gated dataset is a managed asset, not a public commons.

A Pattern Across Providers

The Socrata situation would be concerning enough in isolation. But it is part of a wider and accelerating pattern. Across sectors and jurisdictions, datasets that are labelled “open” now routinely require some form of identity before access is granted. The examples below illustrate the breadth of the problem.

Provider: Socrata / Tyler Technologies
Dataset: US city portals (Chicago, NYC, SF, Seattle, LA, and hundreds more)
Claimed status: Open Government Data
Access barrier: Mandatory API key required for all data access.
Type: API Key

Provider: UK Met Office Weather DataHub
Dataset: Public Task weather data (NWP model output, site-specific forecasts)
Claimed status: Public Task / Open Data
Access barrier: Access requires mandatory registration and an API key; the legacy DataPoint service (which provided free, tiered access) was decommissioned on December 1, 2025, and replaced by the Weather DataHub, which uses a 'freemium' model with strict per-key rate limits and volume-based quotas.
Type: Registration, API Key

Provider: NASA Earthdata
Dataset: Earth science data from satellite missions (LP DAAC, NSIDC, ASDC, etc.)
Claimed status: Open Science / NASA Open Data
Access barrier: Earthdata Login account required for data download. The underlying data is pledged as openly available, though the login requirement long predates the current AI-scraping era and reflects a different set of institutional priorities.
Type: Registration

Provider: NOAA Climate Data Online (CDO)
Dataset: Global historical climate records, daily summaries, precipitation data
Claimed status: Open Government Data
Access barrier: API token required in the request header for all queries. Token is free but must be applied for individually.
Type: Token

Provider: NASA Open APIs (api.nasa.gov)
Dataset: APOD, Near Earth Objects, Earth imagery and more
Claimed status: NASA Open Data
Access barrier: A DEMO_KEY allows very limited exploration, but meaningful usage requires a registered API key with personal email verification.
Type: API Key

The UK Met Office case is particularly instructive. For over a decade, the DataPoint service provided free, tiered access to public weather data, widely used by hobbyists and researchers, though it always required registration for an API key. Its replacement as of December 2025, the Weather DataHub, is technically more capable but introduces a more restrictive 'freemium' model gated by mandatory account creation and volume-based quotas. While the Met Office maintains that this fulfills its 'Public Task' under the Open Government Licence, the branding has pivoted away from 'open' accessibility toward a commercial-style API gateway.

NASA's Earthdata login is a somewhat different case — the login requirement has existed for years and predates the AI-scraping problem, reflecting longer-standing institutional priorities around usage tracking rather than a direct response to AI load. It is included here not as a direct parallel to the Socrata situation, but as a reminder that registration barriers on publicly-funded data are not new, and that the Open Data community has largely tolerated them without serious challenge.

Policy vs. Technical Implementation

There is an important distinction that often gets glossed over: the difference between a rate-limiting measure and an identity mechanism. Rate limiting without keys is a technical measure that slows down excessive consumers without knowing who they are. An API key or user login is something categorically different — it is fundamentally an accountability and identity system. Even if free and easy to obtain, a key means the provider now knows who is accessing the data, when, and how much.
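The distinction can be made concrete with a minimal Python sketch of keyless rate limiting: one token bucket per network address. The class name, the limits, and the injected clock are all hypothetical choices for illustration, not any provider's actual configuration. The address serves only as a throttling bucket; nothing here registers a user or issues a credential, although an address is admittedly still a weak identifier.

```python
import time
from typing import Callable, Dict, Tuple


class KeylessRateLimiter:
    """Token bucket per client address: throttles heavy use without accounts or keys."""

    def __init__(
        self,
        capacity: float = 10.0,           # hypothetical burst size
        refill_per_second: float = 1.0,   # hypothetical sustained rate
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        # address -> (tokens remaining, time of last update)
        self.buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, client_address: str) -> bool:
        """Return True if this request may proceed, False if it should be throttled."""
        now = self.clock()
        tokens, updated_at = self.buckets.get(client_address, (self.capacity, now))
        # Refill in proportion to elapsed time, never beyond the bucket's capacity.
        tokens = min(self.capacity, tokens + (now - updated_at) * self.refill_per_second)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self.buckets[client_address] = (tokens, now)
        return allowed
```

A server would call `allow(address)` once per request and answer with HTTP 429 when it returns False. The limiter learns how much a given address is consuming, but not who is behind it, which is the property an API key gives up.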

From a CORS and browser-side consumption standpoint, this change is particularly consequential. A truly open dataset could be queried directly from a browser application with no backend required. A civic journalist, a student, a neighbourhood watchdog could build a lightweight tool in an afternoon. Once an API key is mandatory, that frictionless path closes. Embedding keys in frontend code is a well-established security anti-pattern, which means developers now need a backend proxy — introducing cost, infrastructure complexity, and a maintenance burden that effectively excludes the very people Open Data was supposed to empower.
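The extra moving part can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the endpoint URL, the `X-Api-Key` header name, and both function names are hypothetical, and real providers document their own header or parameter for keys.

```python
import urllib.parse
import urllib.request
from typing import Dict

# Hypothetical values for illustration only; not taken from any real portal.
UPSTREAM_BASE = "https://data.example.gov/resource/abcd-1234.json"
KEY_HEADER = "X-Api-Key"  # assumed header name; check the provider's documentation


def keyless_url(client_query: Dict[str, str]) -> str:
    """With no key required, this URL is all a browser needs to fetch the data."""
    return UPSTREAM_BASE + "?" + urllib.parse.urlencode(client_query)


def build_upstream_request(
    client_query: Dict[str, str], api_key: str
) -> urllib.request.Request:
    """With a mandatory key, a server-side proxy must attach the secret instead.

    The key stays on the server; the browser talks only to the proxy.
    """
    request = urllib.request.Request(keyless_url(client_query))
    request.add_header(KEY_HEADER, api_key)
    return request
```

The function itself is trivial. The burden lies in everything around it: the proxy has to be hosted somewhere, the key has to be stored as a secret, and someone has to keep the service running, none of which a static web page querying a keyless endpoint ever needed.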

The Socrata / Tyler Technologies Angle

Socrata, now owned by Tyler Technologies and rebranded as its “Data & Insights” division, is commercial infrastructure for government data publishing. It underpins open data portals for hundreds of cities, counties, and state governments. There is a reasonable concern that its enforcement of API keys serves business interests — tracking usage, managing liability, generating upsell opportunities — at least as much as it serves the genuine public interest in managing AI-driven load.

The AI flooding justification is real. But it is also, conveniently, excellent cover for tightening control over data that governments have pledged to make unconditionally open. When the infrastructure layer of Open Data is owned by a commercial entity with its own incentives, the long-term trajectory of “openness” becomes subject to that entity's commercial decisions — not to democratic or civic principles. Portals in Chicago, Seattle, Los Angeles, and many others have, in effect, outsourced a key governance question about public access to a private company.

*   *   *

The Counterargument, Fairly Stated

Defenders of the API key approach make points that deserve engagement. The data itself remains free — the key is a handshake, not a paywall. It enables sustainable infrastructure by letting providers manage load and plan capacity. And many in the Open Data community have long accepted “free registration” as compatible with openness, much as Creative Commons permits attribution requirements without abandoning the spirit of open licensing.

There is also a pragmatic point: without some throttling mechanism, AI crawlers operating at scale genuinely would degrade service for every other user. The problem is not hypothetical. Unmetered public endpoints were being overwhelmed well before AI made the situation dramatically worse. And for agencies like NASA or the Met Office, the volume of data involved (terabytes of satellite imagery or global NWP output) makes truly anonymous bulk access a genuine infrastructure challenge.

But the pragmatic case for keys does not make them neutral. It makes them a tradeoff — one that should be acknowledged honestly rather than dressed up as a purely technical measure with no implications for openness. The question is not whether providers face a real problem. They do. The question is whether the chosen solution is consistent with the principles they have publicly committed to uphold.

Were There Better Options?

The answer is almost certainly yes, though none are without cost. Aggressive caching layers could absorb repetitive AI queries without ever touching the live endpoint. Automated challenges that trigger only above bulk-access thresholds (a CAPTCHA or a proof-of-work puzzle, for example) would deter automated scrapers while leaving casual users unaffected. Tiered rate limiting, generous for individual users and strict for high-volume consumers, could impose friction proportional to the behaviour being targeted, without requiring anyone to identify themselves first.
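The caching option is the simplest of these to illustrate. The Python sketch below is a minimal time-to-live cache placed in front of an origin fetch; the class name, the five-minute default, and the `fetch_from_origin` callable are hypothetical stand-ins rather than a description of how any portal is actually built.

```python
import time
from typing import Callable, Dict, Tuple


class CachedEndpoint:
    """Serve repeated identical queries from memory instead of the live endpoint."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], str],  # hypothetical origin call
        ttl_seconds: float = 300.0,               # hypothetical freshness window
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.fetch_from_origin = fetch_from_origin
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        # query string -> (time stored, response body)
        self.entries: Dict[str, Tuple[float, str]] = {}
        self.origin_calls = 0

    def get(self, query: str) -> str:
        now = self.clock()
        cached = self.entries.get(query)
        if cached is not None and now - cached[0] < self.ttl_seconds:
            return cached[1]  # still fresh: the origin is not touched
        body = self.fetch_from_origin(query)
        self.origin_calls += 1
        self.entries[query] = (now, body)
        return body
```

Under this scheme a crawler repeating the same query thousands of times within the freshness window costs the origin a single fetch, and no caller is asked to identify themselves. The trade-off is staleness, and the cache does nothing against a crawler that issues a different query every time.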

These approaches are harder to implement and, critically, they do not give the data provider the usage visibility that API keys deliver. That visibility has genuine administrative value. But it is worth being clear that this is part of why keys were chosen — not only because they solve a technical problem, but because they also solve an institutional one.

The Honest Assessment

Let us state it plainly: a dataset that requires user registration or API key authentication to access is not, by the established definitions of the field, fully open. It may be free. It may be available. It may even be well-intentioned. But it is not open — not by the standards that the Open Data movement itself established.

The shift is from public infrastructure to managed access to public data — and those are meaningfully different things, even when the manager is a government agency and the access remains free of charge. Public infrastructure is there when you need it, unconditionally. Managed access can be revoked, throttled, repriced, or discontinued when the institutional calculus changes.

The AI scraping problem is real, and providers deserve sympathy for needing to respond to it. But the chosen response, identity-based access control, has side effects that quietly erode the original promise, particularly for CORS-based consumption, anonymous access, and the civic hackers who built things without needing to ask anyone's permission first. When the UK Met Office decommissions a long-standing free service and replaces it with a quota-gated 'freemium' one, when hundreds of city open data portals simultaneously introduce mandatory API keys, these are not isolated technical decisions. They are a pattern. And the pattern has a direction.

It is also telling that organisations like the Open Data Charter have been relatively quiet on this specific issue, while the Sunlight Foundation, once one of the movement's most prominent voices, ceased operations in 2020. The movement has not yet properly grappled with what AI-scale consumption means for the principles it was built on. That conversation is overdue, and it should happen openly, which, given the circumstances, seems like the least that could be asked.

Published March 2026  ·  Topics: Open Data  ·  Civic Technology  ·  AI  ·  CORS  ·  Socrata
Definitions referenced: Open Definition (opendefinition.org)  ·  Open Government Data Principles  ·  Open Data Charter