The Problem

The Quantified Self community is a group of people who track and analyze their lives to gain insight, most often to aid in decision making towards achieving goals -- think counting calories to lose weight. Data from multiple sources can lead to even greater insight.

While platforms for aggregating personal data exist, they tend to fall into one of two limiting patterns: 1.) the platform takes charge of the analytics itself and, because of limited staff, can offer only a narrow set of "one-size-fits-all" analyses; or 2.) the platform outsources the analytics to a community of developers but hands them the personal data, reducing privacy and increasing risk for the user.

The Solution

Connectrix is a platform for personal data analytics that connects personal data sources in the cloud to analytics apps created by a community of developers. As the picture below depicts, Connectrix pulls in the data and the apps via APIs and runs the apps on Connectrix servers, eliminating the need to give app developers access to a user's personal data. This addresses both problems above: 1.) Because analytics apps are created by a community of developers, users can choose from a wide range of analyses to find ones that suit their needs, and Connectrix aims to make it easy for developers to create and publish them. 2.) Privacy risk is reduced because the apps run on Connectrix servers, so the data never has to leave for a developer's machine.
[Image: connectrix graph]

Establishing Connections

When a user first creates an account and logs in, they need to establish connections to personal data sources. At the time of this writing, connections to Fitbit, Moves, and YouTube are available. The user clicks a "Connect" button, which redirects them to the provider's website, where they authorize Connectrix to access their data via the provider's API. Once they accept, Connectrix receives an access token, which is essentially the only personal data persisted in the Connectrix database. This exchange follows the OAuth 2.0 authorization flow and is implemented with the help of Passport (easily integrated since the backend is written in Node.js with the Express framework).
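The redirect behind the "Connect" button can be sketched roughly as follows. This is only an illustration in Python (the real backend uses Passport on Node.js), and it assumes the authorization code grant, which the text above does not spell out. The endpoint URL, client ID, redirect URI, and scope names are all hypothetical placeholders; only the query parameter names come from the OAuth 2.0 specification.

```python
import secrets
from urllib.parse import urlencode


def build_authorization_url(authorize_endpoint, client_id, redirect_uri, scopes):
    """Return (url, state) for redirecting a user to a provider's consent page."""
    # Random value the provider echoes back; comparing it on return guards against CSRF.
    state = secrets.token_urlsafe(16)
    query = urlencode({
        "response_type": "code",      # assumption: authorization code grant
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),    # OAuth 2.0 scopes are space-delimited
        "state": state,
    })
    return f"{authorize_endpoint}?{query}", state


if __name__ == "__main__":
    url, state = build_authorization_url(
        "https://provider.example/oauth2/authorize",   # hypothetical endpoint
        client_id="connectrix-client-id",              # hypothetical client ID
        redirect_uri="https://connectrix.example/auth/callback",
        scopes=["activity", "weight"],                 # hypothetical scope names
    )
    print(url)
```

After the user accepts, the provider redirects back with a short-lived code, which the server exchanges for the access token that gets stored.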

Using Connectrix Apps

After connections have been established, the user may begin using analyses available on the Connectrix App Store. When a user finds an app they want to use, they add it to their favorites. Then when they wish to run it (and only then), Connectrix fetches the data via APIs, fetches the program from GitHub, and runs the data through the program on the Connectrix server. The output is then passed to the client and injected into the webpage for the user to view. See "The Process of Running a Connectrix App" below for more details. The user's Dashboard lists their connections and favorite apps.

Selecting Required Connections

Since the responsibility of data aggregation is shifted from the developer to Connectrix, and the privacy-preserving data flow prevents developers from ever viewing the data, it becomes very important for them to know exactly what they're getting beforehand so they can build a program that processes that data effectively. For this reason, when an app is created, the developer must indicate which data sources are required for the app to run. Further, a developer can choose from a number of more specific endpoints available for each API; e.g., a developer can choose "Fitbit - Body Weight Time Series" in addition to "Moves - Storyline."
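One way to picture an app's declared requirements is as a small record listing provider and endpoint pairs. The sketch below is hypothetical: the class names, field names, and repository URL are invented for illustration and are not taken from the Connectrix source. It also shows how such a record could be checked against the connections a user has already established.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Endpoint:
    provider: str  # e.g. "Fitbit"
    name: str      # e.g. "Body Weight Time Series"


@dataclass
class AppDefinition:
    name: str
    description: str
    repo_url: str
    required: List[Endpoint]


def missing_providers(app, connected_providers):
    """Return the providers the app needs that the user has not connected yet."""
    needed = {endpoint.provider for endpoint in app.required}
    return sorted(needed - set(connected_providers))


if __name__ == "__main__":
    app = AppDefinition(
        name="Weight Vs. Calorie Burn",
        description="Compares body weight with calories burned.",
        repo_url="https://github.com/example/weight-vs-calories",  # hypothetical
        required=[
            Endpoint("Fitbit", "Body Weight Time Series"),
            Endpoint("Moves", "Storyline"),
        ],
    )
    print(missing_providers(app, ["Fitbit"]))
```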

Developing and Testing Connectrix Apps

The entire process of developing, testing, and publishing a "Hello, world" Connectrix app is detailed in the Developer Guide. In general, though, a developer creates a new app by specifying a name and description, selecting the connections required for the app, and pointing Connectrix to a public GitHub repository where the application is to be stored.

At this point, the developer can begin writing a Python program that accepts data, processes it, and returns HTML with embedded CSS and JavaScript. When the app is run, the user's data will be fed into the Python program, processed, and the output will be passed to the client and injected into the webpage for the user to view.
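A minimal sketch of such a program might look like the following. The shape of the input JSON (a "weight" list of date and value pairs) is an assumption made for this example; a developer would check the real format with the "View JSON" button. The function and CSS class names are likewise illustrative.

```python
import io
import json
import sys
from html import escape


def render(data):
    """Build an HTML fragment with embedded CSS from the (assumed) input shape."""
    rows = "".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(
            escape(str(point["date"])), escape(str(point["value"]))
        )
        for point in data.get("weight", [])
    )
    return (
        "<style>.cx-table td { padding: 4px 8px; }</style>"
        '<table class="cx-table"><tr><th>Date</th><th>Weight</th></tr>'
        + rows
        + "</table>"
    )


def main(stdin, stdout):
    """Read JSON from stdin, write the rendered HTML to stdout."""
    data = json.loads(stdin.read())
    stdout.write(render(data))


if __name__ == "__main__":
    # A real app would call main(sys.stdin, sys.stdout); an in-memory sample
    # is used here so the sketch runs without piped input.
    sample = io.StringIO(json.dumps({"weight": [{"date": "2015-01-01", "value": 70.5}]}))
    main(sample, sys.stdout)
```

Escaping the values before embedding them keeps unexpected characters in the data from breaking the generated markup.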

You may be wondering how a developer can test such a program with only a guess of what the user's data will look like. Essentially, they use their own data and test locally on their machine. Having specified the required data connections, they can click a "View JSON" button to see their own actual data received from the APIs, in the format that will be fed to their app. Further, Connectrix provides a test engine for running the app locally. After downloading the test engine (a Node.js app), they copy and paste their example JSON into a view.json file, and the engine runs their program with their data in the same way that Connectrix would run it in the online environment. Output is written to an HTML file that the developer can open in a browser to see exactly what the app will look like. When they're done testing, they publish the app by pushing the files to their GitHub repository.

Retrieving User Data

When a user wishes to run an app, a request is made to the Connectrix server identifying the user who made the request and the app they'd like to use. On the server, "Data Getters" are compiled according to the app's required connections and specific third-party API endpoints, and they contain the user's credentials needed to retrieve the data. These Data Getters are JavaScript functions that return Promises, and they are run concurrently with the Promise.all() method.
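The real Data Getters are JavaScript functions combined with Promise.all(). As a rough analogue only, the same pattern can be sketched in Python with asyncio.gather, which likewise runs the getters concurrently and returns results in the order the getters were supplied. The function names and the stubbed fetch below are hypothetical; no real API is called.

```python
import asyncio


async def fetch_endpoint(provider, endpoint, access_token):
    """Stand-in for an HTTP call to a third-party API (hypothetical stub)."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"provider": provider, "endpoint": endpoint, "data": []}


def compile_getters(required, tokens):
    """Build one awaitable per required (provider, endpoint) pair."""
    return [
        fetch_endpoint(provider, endpoint, tokens[provider])
        for provider, endpoint in required
    ]


async def get_all(required, tokens):
    # Like Promise.all: results keep the input order, and the first
    # exception raised by any getter propagates to the caller.
    return await asyncio.gather(*compile_getters(required, tokens))


if __name__ == "__main__":
    required = [("Fitbit", "Body Weight Time Series"), ("Moves", "Storyline")]
    tokens = {"Fitbit": "fitbit-token", "Moves": "moves-token"}  # placeholder tokens
    for result in asyncio.run(get_all(required, tokens)):
        print(result["provider"], "-", result["endpoint"])
```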

However, it is a bit more complex than that; the logic is two levels deep. The top level compiles Data Getters by third-party API provider (Fitbit, Moves, YouTube), and the lower level compiles Data Getters by third-party API endpoint (Fitbit - Body Weight Time Series, Fitbit - Calories Burned, etc.). The split exists so that errors raised by endpoint-level calls can be handled by provider-specific code. For instance, if a request is made to the Fitbit API but the access token has expired, the error is caught and handled at the Fitbit level (reading the error in the Fitbit response, refreshing the access token according to Fitbit's rules, and storing the new token in the database). The API call for data is then made again recursively (within limits) from the error handler. This way, each provider can reuse its own error-handling code, calls for data are not redundant, and everything runs asynchronously.
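The refresh-and-retry behavior can be sketched as below. This is a Python illustration of the control flow only, not the project's JavaScript: the FakeProvider class, the TokenExpired exception, and the dictionary standing in for the database are all invented for the example, and it handles a single endpoint rather than a whole provider's set.

```python
import asyncio


class TokenExpired(Exception):
    """Raised by the stub provider when the access token is no longer valid."""


class FakeProvider:
    """Hypothetical provider that accepts only one token value."""

    def __init__(self):
        self.valid_token = "fresh"
        self.calls = 0
        self.refreshes = 0

    async def call(self, endpoint, token):
        self.calls += 1
        if token != self.valid_token:
            raise TokenExpired(endpoint)
        return {"endpoint": endpoint, "data": []}

    async def refresh(self):
        self.refreshes += 1
        return self.valid_token


async def get_with_refresh(provider, endpoint, store, retries_left=1):
    """Call an endpoint; on an expired token, refresh, persist, and retry."""
    try:
        return await provider.call(endpoint, store["token"])
    except TokenExpired:
        if retries_left <= 0:
            raise  # stay "within limits": give up instead of looping forever
        store["token"] = await provider.refresh()  # dict stands in for the database
        return await get_with_refresh(provider, endpoint, store, retries_left - 1)


if __name__ == "__main__":
    provider, store = FakeProvider(), {"token": "stale"}
    result = asyncio.run(get_with_refresh(provider, "Body Weight Time Series", store))
    print(result["endpoint"], "after", provider.refreshes, "refresh")
```

Passing the retry budget as a parameter is what bounds the recursion; without it, a provider that keeps rejecting tokens would cause an endless loop.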

See the runApp function in the source code to begin the journey down the rabbit hole.

Retrieving and Building an App

Since app source code is stored in a public GitHub repository, retrieving it is done simply with the nodegit Node module. The files are cloned to the Connectrix server's filesystem. However, in order to run apps for multiple users simultaneously, these clones are indexed according to the number of clones currently on the system. (Storing the first directory as "clone-0," a second app run initiated by another user while clone-0 is still present would produce a "clone-1," and so on.) The directories are deleted once the data has been processed and a response has been returned. See the "Security" section below for information about running untrusted code on the server.
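The directory naming scheme can be sketched as follows. This is a hypothetical Python illustration, not the Node.js code. It starts from the number of clones present, as described above, but also adds one thing the text does not mention: it skips forward if that name is already taken, since a plain count can collide when an earlier clone has been deleted while a later one still exists.

```python
import os
import shutil
import tempfile


def reserve_clone_dir(base_dir):
    """Create and return the next free "clone-N" directory under base_dir."""
    existing = [name for name in os.listdir(base_dir) if name.startswith("clone-")]
    index = len(existing)  # start from the number of clones currently present
    while True:
        path = os.path.join(base_dir, "clone-{}".format(index))
        try:
            os.mkdir(path)  # raises if the name is taken, so runs never share a directory
            return path
        except FileExistsError:
            index += 1


def release_clone_dir(path):
    """Delete a clone directory once the response has been returned."""
    shutil.rmtree(path, ignore_errors=True)


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    first = reserve_clone_dir(base)
    second = reserve_clone_dir(base)
    print(os.path.basename(first), os.path.basename(second))
    release_clone_dir(base)
```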

Currently, apps may only be written in Python and use predetermined libraries that have been pre-installed on the server.

Passing Data to an App

After cloning the program to the server and retrieving the user's data, Connectrix spawns a child process (which starts a pre-named main file) with the python-shell Node module. This makes it easy to run the Python program and listen for data or errors from it. Data is passed to the Python program's stdin stream, and output is received from its stdout or stderr streams (most easily produced with a print statement).
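Connectrix does this from Node.js with python-shell. The same stdin/stdout exchange can be illustrated in Python with the standard subprocess module. The function name, the timeout value, and the choice to treat a non-zero exit code as failure are assumptions of this sketch rather than details from the source.

```python
import json
import subprocess
import sys


def run_app(main_file, data, timeout=30):
    """Run main_file as a child process, feeding it data as JSON on stdin."""
    result = subprocess.run(
        [sys.executable, main_file],
        input=json.dumps(data),  # written to the child's stdin
        capture_output=True,     # collect the child's stdout and stderr
        text=True,
        timeout=timeout,         # assumption: stop apps that never finish
    )
    if result.returncode != 0:
        # Surface the child's error output instead of returning partial HTML.
        raise RuntimeError(result.stderr.strip())
    return result.stdout
```

Calling run_app("main.py", user_data) would return whatever the app printed, ready to be sent back to the client.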

App Output

Once the output is received from the Python script, it is returned in the response to the client. Since it is HTML, it is then injected into the app view page's HTML. This HTML output from the app can contain embedded JavaScript and CSS. For example, to create a graph, the developer could use the mpld3 Python library to convert a matplotlib figure to a browser-compatible D3.js graph with full zoom, move, and reset functionality.

[Image: connectrix graph]
Screenshot of the "Calories Burned" graph in the "Weight Vs. Calorie Burn" app. See the source code for an example of converting from matplotlib to D3.js.

Currently, the injected HTML can access and use all the resources already in the page, such as Bootstrap and jQuery (and the AngularJS app itself), but this should be changed in the future. See the "Security" section below for details.

Security

One of the biggest concerns for this application is security. As a centralized trusted third party, Connectrix acts as a safe place to process data. However, at this stage, there are two vulnerabilities in particular through which data could be leaked to another party or other malicious activity could occur.

Malicious Python Script

Currently, the app software is pulled in from GitHub and built on the Connectrix server -- the same server that runs the Connectrix app. This is risky for two reasons. First, a malicious script could access any part of the filesystem that any other script could, and so disrupt the application or pull from the database. Second, it could fairly easily send the user's data across the network. To prevent this, the software could instead be built in a virtual machine with no access to the server's filesystem and, on top of that, in a sandbox with no network capabilities. I have considered creating such an environment on an Amazon EC2 instance and using the PyPy sandbox.

Malicious Output

Another vulnerability lies in the fact that the app output is injected into the client application. This means that a malicious JavaScript function could run and access the AngularJS application, or it could send the user's data across the network. To combat this, I've considered using an iframe with restrictions on network access.
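As a rough sketch of that idea (not something Connectrix currently does), the app output could be wrapped in an iframe that uses the HTML sandbox attribute together with a Content-Security-Policy meta tag. With allow-scripts but without allow-same-origin, the frame's scripts run in a separate origin and cannot reach the parent page. The policy string and function name below are assumptions for illustration, and this does not close every exfiltration channel (navigating the frame itself, for example).

```python
from html import escape

# Assumed policy: block all network loads, allow only inline scripts and styles.
CSP = "default-src 'none'; script-src 'unsafe-inline'; style-src 'unsafe-inline'"


def wrap_in_sandboxed_iframe(app_html):
    """Return iframe markup that isolates app_html from the host page."""
    document = (
        '<meta http-equiv="Content-Security-Policy" content="{}">'.format(CSP)
        + app_html
    )
    # Escaping with quote=True makes the document safe to embed in the srcdoc attribute.
    return '<iframe sandbox="allow-scripts" srcdoc="{}"></iframe>'.format(
        escape(document, quote=True)
    )


if __name__ == "__main__":
    print(wrap_in_sandboxed_iframe("<p>Hello</p><script>document.title = 'x'</script>"))
```

A policy this strict would also block libraries loaded from a CDN, so an app's scripts would need to be inlined or the policy loosened for specific hosts.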