One of the big innovations of the Web [1] was that it unified the process of accessing a remote document
and encoded all the information needed for the (previously) multi-step process of
connecting to the right FTP server, navigating the remote filesystem to find the right file,
downloading the resource, and opening it in the right application into a single step:
loading a URL in the browser.
However, when it comes to writing data to the Web, we still need to deal with
complex, multi-step processes and disparity between APIs and data formats.
As a result, most systems that store user data do so either locally or (more frequently) on a single cloud service they control.
Users have little to no data portability, which effectively locks them into the service.
Part of this is the common need for services to control user data.
However, another component is simply that supporting user selection for data storage is really hard.
While reading data from remote sources is generally easy, persisting data remotely
is one of the big usability cliffs of the web platform.
Despite it being such a common need, implementing it requires understanding of many programming concepts
which are non-trivial for programmers and entirely out of reach for beginners.
Worse, even after investing the effort to understand and use a specific authentication and storage mechanism,
interfacing with a different service requires learning a completely new API.
Madata is a set of simple protocols and JavaScript APIs (source code and documentation are available at madata.dev)
that allows web applications to read and store data in a variety of locations,
serialized in a variety of formats,
all with the same unified API.
Programmers can support additional services without any changes to their application code,
as Madata abstracts differences between services away into a single API.
Since one of the primary goals is portability,
inspired by the usability innovations of the Web, Madata introduces the concept of a storage URL.
The storage location can be uniquely identified via a URL, from which Madata infers which service to use, the location of the data within the service,
and how to access it.
This prototypes a future where users can decide where their data is stored by simply entering a URL in the settings of the application they are using.
Most remote services require authentication, which is a complex process that requires
registering an OAuth [2, 3] application and writing code to handle the authentication flow which involves a multi-step handshake.
Madata simplifies this process by introducing the concept of a federated authentication provider or FedAP.
A FedAP is a server that securely stores API keys for specific OAuth applications in its supported services.
Instead of requiring developers to register a new application to experiment with a new API,
FedAPs allow several developers to share the same OAuth application.
Developers have the option to use their own API keys that are not shared, but they don’t need to.
This flow also provides user experience benefits to end-users:
once they have logged in to a FedAP, they can log in to any app using the same FedAP with two clicks.
Any server can become an authentication provider by implementing a simple API (Section 5.5.3).
While the Madata client library requires programming to use, it has been designed to
minimize the gulf of evaluation[4] and to maximize closeness of mapping[5]
between the user’s mental model and the API.
It is indicative that Mavo[6], which targets non-programmers, provides an HTML-based API which is
a thin abstraction layer over Madata objects and components.
While portability in terms of storage location is the core focus of Madata,
it also allows for data portability in terms of data serialization format.
Portability of format is essential for data longevity, as it allows data to be migrated to new formats as old ones become obsolete.
While defaulting to JSON, Madata seamlessly parses and serializes data in a variety of formats, including CSV, YAML, TOML, BibTeX, and more.
The European Union establishes data portability as a fundamental human right[7].
By making it easier for developers to offer end-users data portability than not to,
Madata prototypes a future where data portability (and the data ownership it begets) is not a rare exception, but the norm.
WebDAV [8–11] was a protocol with very similar goals to Madata.
It was developed in the late 1990s as an extension to HTTP to enable users to collaboratively edit and manage files on remote web servers.
Despite its promising features, WebDAV failed to achieve widespread adoption beyond certain niche domains.
Some of the reasons were related to its high complexity which created performance issues,
and its proneness to network effects, as it required web server support.
Instead of attempting to replace existing protocols, Madata is designed to pave the cowpaths
by reducing the friction of interfacing with them.
The Solid Platform [12, 13] has very similar goals as Madata.
Solid is a decentralized platform for social Web applications.
Like Madata, user data is managed independently of the applications that create and consume this data.
User data is stored in a Web-accessible personal online datastore (or pod).
Like Madata, Solid allows users to store data in many different providers, and easily switch between providers.
However, the solution Solid is proposing is more heavyweight and involves higher complexity and more cognitive burden for end-users.
Solid’s approach requires users to understand concepts such as Pods and Linked Data, which can be complex for non-technical users,
compared to Madata’s simple URL-based approach, which imposes no requirements or demands on the data being exchanged.
But most importantly, Solid requires adoption by the storage providers, thus being subject to network effects,
while Madata can work with any service that provides a Web API.
Two very relevant projects that came after Madata are Scrapir [14] and Shapir [15],
tackling similar goals of standardizing and democratizing access to disparate Web APIs.
Scrapir [14] takes an assisted collaborative approach,
with somewhat technical users adding support for new services via a GUI,
so that non-technical users can then use those services.
Shapir [15] also tackles similar goals of standardizing various Web APIs
by mapping them to schema.org[16] entities that can then be read and written by modifying regular JavaScript objects.
While there is some intersection, both of these are focused around reading and writing third-party data,
while Madata is focused on reading and writing arbitrary user data.
A backend class (or for short, backend) tells Madata how to interface with a specific type of storage location.
This is often a remote service (e.g. GitHub, Google Sheets, Dropbox),
but it can be any I/O mechanism that supports hierarchical data.
For example, there are backends like:
Local for storing data locally in the browser (localStorage object)
Element for “storing” data as another element’s content (mainly useful for debugging)
URL for “storing” small amounts of data as parameters of the current URL.
Not all backend classes provide the same capabilities.
Backend classes declare which capabilities they support (write, login, upload).
Applications using Madata can then read this information and adjust their UI accordingly
or communicate to the user that their selection of backend is unsuitable for the current operation.
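As a rough illustration of how an application might consume such declarations, the sketch below uses stand-in classes rather than Madata's actual ones. The static `capabilities` field and its shape are assumptions made for this example; only the three capability names (write, login, upload) come from the text above.

```javascript
// Stand-in backend classes; these are NOT Madata's real classes.
// The static `capabilities` field and its shape are assumptions for illustration.
class ReadOnlyBackend {
  static capabilities = { write: false, login: false, upload: false };
}

class RemoteFileBackend {
  static capabilities = { write: true, login: true, upload: true };
}

// Decide which controls the UI should offer for a given backend class.
function controlsFor(BackendClass) {
  const caps = BackendClass.capabilities;
  const controls = [];
  if (caps.login) controls.push("login-button");
  if (caps.write) controls.push("save-button");
  if (caps.upload) controls.push("upload-button");
  return controls;
}

// Return a user-facing message when the selected backend cannot perform
// an operation, or null when the operation is supported.
function checkOperation(BackendClass, operation) {
  if (BackendClass.capabilities[operation]) return null;
  return `The selected storage location does not support "${operation}".`;
}
```

An application could call `controlsFor()` once after the backend is selected to decide which buttons to render, and `checkOperation()` right before attempting an operation to explain why it is unavailable.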
Backends are organized in a hierarchy, with common patterns implemented in base classes.
For example, the Google Sheets backend and the Google Drive backend both share the same
parent backend, which defines authentication for many Google™ services (Figure 5.2).
A backend instance encapsulates a specific storage location within a specific backend class.
For example, a specific sheet in a Google Sheets spreadsheet, a specific file on Dropbox, or a specific key in the browser’s localStorage.
In some cases the boundaries of what should be an object in a hierarchical data structure
vs a separate backend object can be blurry.
For example, should a spreadsheet backend object represent the entire spreadsheet,
by returning an object with keys for every sheet, or a single sheet?
The current Google Sheets backend has opted for the latter, as many use cases only involve a single sheet,
and thus having to deal with an extra level of nesting would be cumbersome.
However, both options are defensible.
As a design principle, it should be possible to identify storage locations by specifying a URL.
While a URL should unambiguously identify the storage location, the same storage location may be described by multiple URLs.
The URL should be either easy to compose, or easy to obtain from the service itself, and ideally both.
For example, one of the supported URLs for GitHub is the URL shown in the browser when viewing a file on GitHub.
Or, the URL for a Dropbox file is the URL obtained when using its Share file UI feature.
For example, if the storage location is https://github.com/mavoweb/mavo.io/blob/main/demos/todo/tasks.json,
Madata infers the following from it:
The service to use is GitHub,
The backend to use is “GitHub File” (as opposed to e.g. GitHub Gist),
The file is located at demos/todo/tasks.json in the mavoweb/mavo.io repository in the main branch.
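The inference above can be sketched with a regular expression whose named groups capture each piece of information. This is only an approximation: the regular expression stands in for a URL pattern, and the group names (`owner`, `repo`, `branch`, `path`) and the helper function are assumptions of this sketch, not Madata's own identifiers.

```javascript
// A regular expression with named groups, used here as a stand-in for a URL pattern.
// The group names are illustrative assumptions.
const GITHUB_FILE =
  /^https:\/\/github\.com\/(?<owner>[^/]+)\/(?<repo>[^/]+)\/blob\/(?<branch>[^/]+)\/(?<path>.+)$/;

// Hypothetical helper: returns the extracted parts, or null if the URL
// does not look like a GitHub file URL.
function parseGitHubFileUrl(url) {
  const match = GITHUB_FILE.exec(url);
  return match ? { ...match.groups } : null;
}

const info = parseGitHubFileUrl(
  "https://github.com/mavoweb/mavo.io/blob/main/demos/todo/tasks.json"
);
// info.owner is "mavoweb", info.repo is "mavo.io",
// info.branch is "main", info.path is "demos/todo/tasks.json"
```

Note that this simplified expression assumes branch names without slashes; a production pattern would need to handle that case separately.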
Storage URLs are merely a portable way to represent the information needed to access a storage location.
They do not need to be URLs browsers can natively load or actually resolve to the resource,
although both of these are desirable properties, as they assist with debugging.
However, in some cases, especially with non-typical backend classes, no HTTP URL pattern is a good fit.
A last resort is using a custom protocol.
For example, to save data in the browser’s local storage using mykey as the key, the Madata URL is local:mykey.
URLs are defined as URL patterns [17],
a standardized syntax for concisely specifying a set of URLs
(for an example, see Section 5.5.1).
Each backend defines a set of zero or more test URLs and zero or more known URLs.
Test URLs are those that are used to test if a given URL should resolve to a given backend.
They need to be more precise, to avoid false positives.
Known URLs are those that the backend knows how to process, but should not necessarily cause it to be selected.
Madata stores a list of all backends topologically sorted by their inheritance.
To resolve a given storage URL to a backend, Madata iterates over the list of backends,
trying their test URLs in order, until one matches.
If none match, it falls back to the default backend, whose only capability is reading data from URLs.
These URL patterns perform double duty:
their named groups are used to extract the necessary information from the URL,
so that a second parsing step (which may be beyond the capabilities of many novice programmers) is not necessary.
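A minimal sketch of this resolution loop follows, under stated assumptions: backends are represented as plain objects rather than classes, regular expressions with named groups stand in for URL patterns, and the names `testUrls`, `resolve`, and the group names are hypothetical.

```javascript
// Stand-in backend descriptors, listed in the order they should be tried.
// Regular expressions approximate URL patterns in this sketch.
const backends = [
  {
    name: "GitHub Gist",
    testUrls: [/^https:\/\/gist\.github\.com\/(?<owner>[^/]+)\/(?<gistId>[^/]+)/],
  },
  {
    name: "Local",
    // Custom protocol, e.g. local:mykey
    testUrls: [/^local:(?<key>.+)$/],
  },
];

// Fallback backend whose only capability is reading data from URLs
const defaultBackend = { name: "Default (read-only)" };

function resolve(url) {
  for (const backend of backends) {
    for (const pattern of backend.testUrls) {
      const match = pattern.exec(url);
      if (match) {
        // The named groups double as the parsed location,
        // so no second parsing step is needed
        return { backend: backend.name, location: { ...match.groups } };
      }
    }
  }
  // No test URL matched: fall back to the default backend
  return { backend: defaultBackend.name, location: { url } };
}
```

With these definitions, `resolve("local:mykey")` selects the `Local` descriptor and yields `{ key: "mykey" }` as its location, while an unrecognized URL such as `https://example.com/data.json` falls through to the default.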
There are various authentication protocols in use today to facilitate secure access to third-party APIs.
These authentication protocols are designed to be very secure, but they are also complex and require a lot of boilerplate code to use.
As an illustrative example, consider OAuth 2.0 [3], one of the most popular authentication protocols today.
For a developer to use OAuth 2.0, they usually need to:
Register an application with the service they want to access, by describing what they intend to do and obtain a secret API key.
Obtain a server with the capability to run server-side code. Register a domain name, then provide a “callback URL” to the service they want to access.
From their client-side application, open a popup to a certain URL, so users can authenticate with the third-party service.
The popup asks the user to log in to the third-party service and authorize the application to access their data.
Then, the popup redirects to the callback URL, with a temporary code in the URL.
The server-side code residing at the callback URL sends a POST request to a special URL at the third-party service
(e.g. https://github.com/login/oauth/access_token for GitHub),
providing the temporary code as well as its secret API key (which shouldn’t be shared).
If everything went well, the response should (finally!) contain an access token. Extract that access token.
Now communicate the access token back to the client-side application, so it can use it to access the third-party service.
OAuth does support an easier implicit grant flow, but it is considered less secure and not supported by all services.
Thus, even if the rest of the API is CORS-enabled [18],
the authentication handshake often requires server-side code.
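To make the server-side portion of this handshake concrete, the sketch below builds (but does not send) the token-exchange request described in the steps above. The helper names and the placeholder credentials are hypothetical; the `client_id`, `client_secret`, and `code` parameter names follow GitHub's OAuth web application flow, and other services may differ.

```javascript
// Extracts the temporary code from the callback URL the popup was redirected to.
function codeFromCallback(callbackUrl) {
  return new URL(callbackUrl).searchParams.get("code");
}

// Builds the POST request that exchanges the temporary code for an access token.
// `buildTokenRequest` is a hypothetical helper, not part of any library.
function buildTokenRequest({ code, clientId, clientSecret }) {
  const body = new URLSearchParams({
    client_id: clientId,
    client_secret: clientSecret, // must stay on the server, never in client-side code
    code,
  });

  return {
    url: "https://github.com/login/oauth/access_token",
    options: {
      method: "POST",
      headers: { Accept: "application/json" }, // ask for a JSON response
      body,
    },
  };
}

// Placeholder values for illustration only
const request = buildTokenRequest({
  code: codeFromCallback("https://example.com/callback?code=abc123"),
  clientId: "my-client-id",
  clientSecret: "my-client-secret",
});
```

A server would then pass `request.url` and `request.options` to `fetch()`, read the access token from the JSON response, and relay it to the client-side application. Even in this reduced form, the code must run somewhere the client secret is not exposed, which is why a server is needed.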
We introduce the concept of a Federated Authentication Provider (FedAP),
to abstract all this complexity away into a single string: the FedAP’s domain name.
FedAPs are servers that store API keys for specific OAuth applications in their supported services,
take care of the authentication handshake, and communicate the resulting access token to Madata,
all with no involvement from the application developer.
Developers can change their authentication provider (from the default auth.madata.dev)
by simply setting a static property on Backend objects.
For example, to be able to take advantage of existing logins to Mavo applications,
one would need to use the Mavo authentication provider:
At the end of the authentication handshake, the FedAP presents the user with a confirmation dialog (Figure 5.4).
This step is essential for preventing malicious use.
Without a confirmation, a malicious application could trick users into visiting the page,
and then would get unfettered access to the user’s data on all services they have used the FedAP to authenticate with.
If the user confirms, the FedAP then communicates the access token to Madata, and no further server interaction with the FedAP is necessary.
Since the FedAP is only involved in the authentication handshake, this is generally a very low resource operation.
As one data point, the author has used a very early precursor of this approach on a website [19]
that served 50-100k users per month for five years (2013-2018) with no issues.
FedAPs can be used independently of Madata.
When using a FedAP (without Madata), the above process looks like this:
From their client-side application, open a popup to a certain URL, so users can authenticate with the third-party service.
The popup asks the user to log in to the third-party service and authorize the application to access their data.
Then, the popup redirects to the FedAP callback URL.
The FedAP takes care of the rest (steps 4-7 above), and communicates the access token back to the originating application
via window.postMessage()…
When using a FedAP with Madata, the process is even simpler:
From their client-side application, the developer calls backend.login(), which takes care of the rest.
The developer can await the result, which will be the user information (if the login was successful)
or just listen to the login event.
FedAPs support introspection: visiting a FedAP’s root domain displays the list of services it supports (Figure 5.5).
The same information can be obtained programmatically via /services.json which should be CORS-enabled.
To create a backend object for a specific storage location, all that is needed is to
call Backend.from() with the storage URL as the sole parameter, for example:
// Import Madata and all supported backends and formats
import Backend from "https://madata.dev/src/index.js";
let backend = Backend.from("https://github.com/leaverou");
From that point onwards, common data operations are a single function call away.
We provide a few examples below.
Creating a backend will automatically log in a previously logged in user without showing a login prompt (passively).
To show a login prompt when the user is not logged in, we can call backend.login().
Backend objects emit login and logout events so that the UI can be updated accordingly.
Note that while this may appear like a nontrivial amount of code, it is nearly all UI (DOM) code:
setting up events, updating the UI, and handling button clicks.
The actual interaction with the data layer has been reduced to a single line of code for each operation.
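The login flow described above can be sketched without a browser by replacing the real backend with a stub. Only `login()`, the `login` and `logout` events, and the fact that `login()` resolves to user information come from the text; the stub's internals, its `logout()` method, and the shape of the user object are assumptions of this sketch.

```javascript
// Stand-in for a Madata backend, so the flow can run outside a browser.
class StubBackend extends EventTarget {
  user = null;

  async login() {
    // A real backend would open the authentication popup here
    this.user = { username: "example-user" };
    this.dispatchEvent(new Event("login"));
    return this.user;
  }

  async logout() {
    this.user = null;
    this.dispatchEvent(new Event("logout"));
  }
}

// Minimal "UI" state, updated purely in response to backend events
const ui = { status: "Logged out" };
const stubBackend = new StubBackend();

stubBackend.addEventListener("login", () => {
  ui.status = `Logged in as ${stubBackend.user.username}`;
});
stubBackend.addEventListener("logout", () => {
  ui.status = "Logged out";
});
```

The event listeners are the only place where the interface is touched, while the interaction with the backend itself stays a single call such as `await stubBackend.login()`.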
Now suppose we want to show a number of upvotes for the current page,
and allow any webpage visitor to see the same number,
and any logged in user to add one or more votes.
These 6 lines of code take care of uploading the file to the right location,
fetching a URL that can be used to display it, and updating the image element with it.
For a paradigm like Madata to be successful, extensibility is key, not simply a nice-to-have.
Like Mavo, the prototype implementation of Madata supports arbitrary extension points via hooks,
but here we focus on the three core extensibility points:
adding a new backend, adding a new format, and adding a new authentication provider.
While adding support for a new backend requires writing JS,
this does not mean that superfluous complexity is acceptable.
Madata follows a class hierarchy where common patterns are implemented on base classes,
so authors need only specify the backend-specific details,
such as the specific authentication URLs, storage location URLs, or API calls needed.
The less the backend diverges from existing standards and patterns,
the more declarative the code will be.
As an illustrative example, here is the full code (with slight simplifications for readability) to add support for GitLab, a popular code hosting service:
The urls field specifies the URL patterns that should be recognized as GitLab URLs when Backend.from() is called.
While not shown here, there is also a knownUrls field that specifies additional URL patterns that are recognized,
but do not participate in the URL matching algorithm.
Adding support for a new format entails specifying which file extensions and MIME types [20]
should automatically be recognized as the format,
plus defining two methods: parse(string, options) and serialize(string, options),
which are usually wrappers around a library that does the actual parsing and serialization.
For example, here is the complete code to add support for TOML [21],
a generic configuration format language:
Note that because the format’s parse and stringify methods simply
pass their arguments directly to the library,
we did not even need to implement these two methods; we simply assigned them.
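The same idea can be illustrated with a format that needs no third-party library. The sketch below uses JSON in place of TOML so that it is self-contained; the object shape, the field names `extensions` and `mimeTypes`, and the `formatFor` helper are assumptions of this sketch rather than Madata's actual format API.

```javascript
// Stand-in format definition. JSON is used so that no parsing library is required.
const JSONFormat = {
  extensions: ["json"],
  mimeTypes: ["application/json"],
  // The built-in functions already take the input as their first argument,
  // so they are assigned directly instead of being wrapped
  parse: JSON.parse,
  serialize: JSON.stringify,
};

// Hypothetical helper: pick a format for a file name based on its extension.
function formatFor(filename, formats) {
  const extension = filename.split(".").pop().toLowerCase();
  return formats.find((format) => format.extensions.includes(extension)) ?? null;
}

const format = formatFor("tasks.json", [JSONFormat]);
const data = format.parse('{"done": false}');
const text = format.serialize({ ...data, done: true });
```

Swapping in a different format would then amount to providing another object with the same four fields, with `parse` and `serialize` pointing at the relevant library's functions.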
By definition, creating a new authentication provider requires a server.
However, all it requires is copying the code (forking) of a template repository and deploying it to a server.
Then, the server administrator would need to register OAuth applications with the services they want to support,
and add their API keys to the server’s secret configuration file (.secret.json),
and their public metadata to a public configuration file (services.json).
The Madata ecosystem has many parties: the developer, the user, other users, the FedAP, the web site hosting the Madata-using app.
An important question is, how much does each entity have to trust other entities?
And what power does a malicious version of each party have?
As described above, a malicious website that tricks users into visiting the page will not be able to access any user data,
since the user will not confirm the authentication.
However, a malicious website that tricks users into authenticating (such as a phishing attempt) can do a lot more damage.
While this is a risk, it is not unique to Madata: it is a security risk on par with the implicit (client-side) OAuth grant flow,
which Madata simply extends to the explicit (server-side) OAuth grant flow.
FedAPs need to be chosen carefully, as they require a high level of trust.
A malicious FedAP could do a lot of damage, as it can access all user data on all services the user has used it to authenticate with.
FedAPs do not need to store any user data (although nothing prevents them from doing so);
once the access token is communicated to the client application, the FedAP’s job is done.
This means that even if a FedAP is compromised, the damage is limited to the access tokens of users who authenticate while it is compromised.
Passive authentication (where the user has already previously logged in) is not affected, since the FedAP is not involved in that process.
Some of the risks could be mitigated by allowing FedAPs to register multiple OAuth applications for the same service,
and distributing them to Madata-using applications based on a one-way hash of their URLs,
so that users who have previously authenticated with one Madata application do not need to authorize more OAuth apps,
but other Madata applications using the same FedAP do not need to share the same API keys with all other applications using the same service on the same FedAP.
OAuth does provide a mechanism of scopes (see tools.ietf.org/html/rfc6749#section-3.3 for the standard, and oauth.net/2/scope for a more human-readable explanation), so that each OAuth application does not get unfettered access to user data.
Using this mechanism here is tricky, because the scopes are shared across all Madata apps using the same service on the same FedAP,
and thus currently Madata requests very broad scopes when authenticating.
However, perhaps the FedAP could start conservatively, and expand scopes as needed when users authenticate with a new Madata app.
It should be noted that Madata does not require use of FedAPs.
For increased security, app developers can use their own authentication server that handles communicating the access token to their application,
and still take advantage of Madata’s unified API for data access.
However, this requires more work, as the OAuth handshake needs to be implemented on the server side.
Perhaps Madata could make this easier by allowing users to register their own OAuth applications and securely store their API keys with the FedAP,
which would take care of authenticating only a whitelist of app URLs with these API keys.
This would still require trust in the FedAP, but it would prevent Madata applications from being prone to phishing attacks.
Another avenue is user empowerment.
Many third-party services provide a GUI for users to obtain access tokens directly from the service (see Figure 5.6 for an example).
A planned improvement is for Madata to provide an optional way for more technical, privacy-conscious users to directly enter an access token,
so that they can set their own scopes and access limits, and revoke access at any time.
It could even make this easier by storing the link to the relevant settings page for each service in the FedAP’s metadata.
Madata facilitates simplicity and ease of use, but that does not come for free.
Some tradeoffs were described in the previous section.
Another potential tradeoff is accountability.
In the traditional OAuth model, application developers register their own OAuth applications to procure API keys,
and are thus accountable for their actions.
With Madata FedAPs, they do not need to register anything — they simply start using the service.
FedAPs do have access to the app’s requesting URL and thus the ability to block bad actors,
but this is after the fact, since there is no review step involved.
However, it is questionable whether the review step in OAuth is effective at preventing bad actors,
or whether it simply adds friction to the development experience for little benefit.
Madata is purely about the data layer.
It does not construct any UI, which is left up to the application developer.
While this makes it more flexible, it can also make it tedious to use,
as a lot of interactions with the data layer are repetitive.
Mavo HTML is certainly a solution to this problem, but it is not a perfect one.
While Mavo provides a very high abstraction level,
the loss of control can be frustrating for programmers.
To bridge this gap, Madata implements a set of Web Components (madata.dev/components) that encapsulate specific UI interactions
such as authentication, or autosave with throttling.
For example, a <madata-auth> custom HTML element can be used to display user information and
authentication controls with just a single line of HTML:
When Madata began, as a Mavo component, it was a lot more imperative.
To support a new backend, authors had to implement low-level get(), put(), login(), upload(), getUser(), etc. methods.
While it did provide abstractions that made it palatable to implement these methods,
it was still a lot more boilerplate.
Over time, as more backends were added, common patterns emerged,
and their code was abstracted away into base classes that only set static class fields as inputs.
It is an open question how far this process can go.
Could we reach a point where adding a new backend can be done entirely by specifying metadata,
without requiring any imperative code?
And if so, would that bring it within reach of non-programmers?
The discerning reader will have noticed that the promise of storage URLs is not yet fully realized.
For Madata to support a given service, it needs to know about it in advance,
and a storage backend for that particular service must exist and have been imported.
For true decentralization, there should be a standard protocol to enable services to declare all the necessary information that Madata needs to know,
so that the necessary backend can be generated on the fly.
There is already such a mechanism: well-known URIs, defined in IETF RFC 8615 [22].
A well-known URI is a URI [RFC3986] whose path component begins with the characters /.well-known/.
While the registry of well-known URIs is maintained by IANA (iana.org/assignments/well-known-uris/well-known-uris.xhtml),
nothing prevents services from using well-known URIs that are not registered,
and those in widespread use would likely later become standardized.
One could imagine a URL like /.well-known/madata or /.well-known/madata.json that would contain all the necessary information for Madata to interface with the service.
Additionally, authentication providers could also provide the necessary code or metadata for Madata to interface with a new service.
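As a small illustration of the well-known URI idea, the sketch below derives the discovery URL for the service that hosts a given storage URL. The function name is hypothetical, and `/.well-known/madata.json` is the speculative path mentioned above, not a registered well-known URI.

```javascript
// Hypothetical helper: where would Madata look for a service's self-description?
function discoveryUrl(storageUrl) {
  const url = new URL(storageUrl);

  // Only HTTP(S) services can host a well-known URI;
  // custom protocols such as local: have no origin to query
  if (url.protocol !== "https:" && url.protocol !== "http:") {
    return null;
  }

  return new URL("/.well-known/madata.json", url.origin).href;
}

const discovery = discoveryUrl("https://example.com/files/data.json");
// discovery is "https://example.com/.well-known/madata.json"
```

A client could fetch this URL the first time it encounters an unknown host and, if a description is found, use it to construct a backend for that service.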
This is not without new risks.
When allowed backends are imported by the programmer, they are in control of what code runs in their application.
If backends can be generated on the fly, this control is lost.
And if URLs are also user-supplied, an adversarial user could easily host malicious code on their own server,
and then use a URL from said server.
The main way to mitigate this risk is to turn code exchange into data exchange.
This could happen either at the point of interfacing with the backend,
by only allowing entirely declarative backends from third-party services (see previous section),
or at the point of data I/O, by running all untrusted third-party code in a sandboxed environment,
where it can only access its inputs, and nothing else.
Then, the sandbox communicates with the main application by exchanging plain data,
which is generally safe.
Of these, declarativeness is a preferable approach;
not only due to the benefits of declarative languages in general, discussed in previous chapters,
but also because it is less likely to negatively affect ergonomics.
In this chapter, we presented Madata, a set of protocols and client-side APIs,
designed to facilitate data ownership by democratizing data access.
By unifying the processes of reading and writing data across diverse storage services and formats through a single API,
Madata addresses not only one of the biggest usability cliffs in modern web development,
but also suggests a model that could facilitate data portability more broadly.
Norman, D. 1986. Cognitive Engineering. User Centered System Design: New Perspectives on Human-Computer Interaction. 31–61. 10.1201/b15703-3.
Green, T.R.G. and Petre, M. 1996. Usability Analysis of Visual Programming Environments: A ‘Cognitive Dimensions’ Framework. Journal of Visual Languages & Computing. 7, (Jun. 1996), 131–174. 10.1006/jvlc.1996.0009.
Verou, L., Zhang, A.X. and Karger, D.R. 2016. Mavo: Creating interactive data-driven web applications by authoring HTML. UIST 2016 - Proceedings of the 29th Annual Symposium on User Interface Software and Technology (2016), 483–496. 10.1145/2984511.2984551.
Kuebler-Wachendorff, S., Luzsa, R., Kranz, J., Mager, S., Syrmoudis, E., Mayr, S. and Grossklags, J. 2021. The Right to Data Portability: conception, status quo, and future directions. Informatik Spektrum. 44, (Aug. 2021), 264–272. 10.1007/s00287-021-01372-w.
Hernández, L.O. and Pegah, M. 2003. WebDAV: what it is, what it does, why you need it. Proceedings of the 31st annual ACM SIGUCCS fall conference (New York, NY, USA, Sep. 2003), 249–254. 10.1145/947469.947535.
Dusseault, L. 2007. HTTP extensions for web distributed authoring and versioning (WebDAV). RFC 4918. IETF: http://tools.ietf.org/rfc/rfc4918.txt.
Mansour, E., Sambra, A.V., Hawke, S., Zereba, M., Capadisli, S., Ghanem, A., Aboulnaga, A. and Berners-Lee, T. 2016. A Demonstration of the Solid Platform for Social Web Applications. Proceedings of the 25th International Conference Companion on World Wide Web (Republic, Canton of Geneva, CHE, Apr. 2016), 223–226. 10.1145/2872518.2890529.
Sambra, A.V., Mansour, E., Hawke, S., Zereba, M., Greco, N., Ghanem, A., Zagidulin, D., Aboulnaga, A. and Berners-Lee, T. Solid: A Platform for Decentralized Social Applications Based on Linked Data.
Alrashed, T., Almahmoud, J., Zhang, A.X. and Karger, D.R. 2020. ScrAPIr: Making Web Data APIs Accessible to End Users. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (Honolulu HI USA, Apr. 2020), 1–12. 10.1145/3313831.3376691.
Alrashed, T., Verou, L. and Karger, D.R. 2021. Shapir: Standardizing and Democratizing Access to Web APIs. The 34th Annual ACM Symposium on User Interface Software and Technology (Virtual Event USA, Oct. 2021), 1282–1304. 10.1145/3472749.3474822.
Verou, L. 2013. Dabblet: A visual IDE for rapid prototyping of client-side web development (Bachelor’s thesis, Athens University of Economics & Business).
Freed, N. and Borenstein, N.S. 1996. Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types. RFC 2046. Internet Engineering Task Force: https://datatracker.ietf.org/doc/rfc2046. Accessed: 2024-08-06. 10.17487/RFC2046.