August 25, 2013

by Joshua Sariñana

Data is embedded in our environment, in our behavior, and in our genes. Over the past two years, the world has generated 90% of all the data we have today. The information has always been there, but now we can extract and collect massive amounts of it.

Given the explosion of mobile photography, social-media-based photo sharing, and video streaming, it’s likely that a large portion of the data we collect and create comes in the form of digital images.

One way to organize and analyze massive amounts of image data is with Artificial Intelligence (AI) software. AI software is unique in that it learns from the data it is given. For example, AI software can be used to seek out and organize image files of cats on the Internet.

At first the software may be confused and categorize a dog as a cat, but the more data the software analyzes, the better it gets at correctly identifying cats. AI software is not only used to categorize animals, however. It is also used for complex scientific data analysis and digital photography computation, which I’ll give specific examples of later in the article.

AI software becomes stronger by ingesting massive amounts of data from which to learn, and what better place to find all that data than on the Internet? Troves of image files are uploaded each day, all of which can be, and are, used to train AI software. AI software gets smarter with more data, which in turn increases the power of the software to mine useful information from very large datasets.
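To make the "more data, better results" idea concrete, here is a minimal sketch in Python. It is a toy of my own, not the software discussed in this article: each "image" is reduced to two made-up feature numbers, cats and dogs are drawn around hypothetical centers, and the classifier simply remembers the average cat and the average dog (a nearest-centroid classifier). All names and values are assumptions for illustration.

```python
import random

# Hypothetical feature centers for the two classes (assumed toy values).
CAT_CENTER = (0.0, 0.0)
DOG_CENTER = (3.0, 3.0)

def make_samples(rng, n, center):
    """Draw n toy 2-D feature vectors around a class center (std. dev. 1)."""
    return [(rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)) for _ in range(n)]

def centroid(points):
    """Average of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def train(cats, dogs):
    """'Learning' here is just storing the average example of each class."""
    return {"cat": centroid(cats), "dog": centroid(dogs)}

def predict(model, x):
    """Assign x to the class whose stored average is closest."""
    return min(model, key=lambda label: sq_dist(model[label], x))

def accuracy(model, cats, dogs):
    """Fraction of held-out examples that are labeled correctly."""
    correct = sum(predict(model, x) == "cat" for x in cats)
    correct += sum(predict(model, x) == "dog" for x in dogs)
    return correct / (len(cats) + len(dogs))

def learning_curve(sizes, seed=0):
    """Train on n examples per class for each n in sizes; score on fixed test data."""
    rng = random.Random(seed)
    test_cats = make_samples(rng, 1000, CAT_CENTER)
    test_dogs = make_samples(rng, 1000, DOG_CENTER)
    curve = {}
    for n in sizes:
        model = train(make_samples(rng, n, CAT_CENTER), make_samples(rng, n, DOG_CENTER))
        curve[n] = accuracy(model, test_cats, test_dogs)
    return curve

if __name__ == "__main__":
    for n, acc in learning_curve([1, 10, 100, 1000]).items():
        print(f"{n:>5} training examples per class -> accuracy {acc:.3f}")
```

With only one or two training examples the stored averages can land far from the true centers, so accuracy varies a lot from run to run; with many examples it settles near the best this simple model can do. Real image classifiers are far more elaborate, but the dependence on training data is the same in spirit.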

In this article, I will show how AI software analyzes digital images to reconstruct and map out the architecture of the brain. I will also show how the same software uses digital photos from social media sites to stitch together 3D models of entire cities. I will end by showing how this very type of AI analysis is used to reconstruct an individual’s personal information as a way of predicting human behavior.

Analyzing Visual Images to Map the Mind

The field of optics has been pivotal in advancing neuroscience research. The advancement of microscopes brought us from hand-drawn architectural analyses of the brain to using AI software to reconstruct neurons and their connections at the nanoscopic level. Let’s start with hand-drawn brain anatomy.

In 1906, Santiago Ramón y Cajal and Camillo Golgi shared the Nobel Prize in Physiology or Medicine for their work on mapping the architecture of the brain. Golgi created a silver staining technique that made some neurons in brain slices turn black. The Golgi stain allowed Cajal (who wanted to be an artist) to produce detailed anatomical drawings of the individual neurons that made up specific brain regions.

With a microscope and a camera lucida, Cajal drew many structures of the brain in unbelievable detail. A hundred years later, neuroscientists still use Cajal’s drawings to help direct research questions. Below is Cajal’s drawing of a hippocampal slice.

Today we have incredibly advanced microscopes that allow neuroscientists to capture the brain in exceptional anatomical detail. Neuroscientists can potentially use an offshoot of current microscope technology to map out the entire human brain, which is part of President Obama’s Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative.

The goal of the BRAIN Initiative is to map out every single neuron and its connections in the human brain. A map of all the neural connections in an animal brain is referred to as a connectome. Thus, the BRAIN Initiative seeks to build the human brain connectome. Some neuroscientists believe that with the connectome data, neuroscientists can build a brain activity map to show how the brain works in healthy and unhealthy states (e.g. Alzheimer’s disease or post-traumatic stress disorder).

In terms of technology, we are nowhere near generating the human connectome. However, researchers are working to tackle this problem. One researcher is MIT physicist and neuroscientist Sebastian Seung, who is a leader in the connectome research field (I highly recommend watching his TED talk).

Another investigator is physicist and neuroscientist Winfried Denk, who developed a microscopy scanning technique that allows for nanometer image resolution of brain tissue slices. Denk’s microscope creates 2D digital images of thinly sliced brain tissue. These slices are then reconstructed into 3D images of the brain. This technique is currently used to study the mouse connectome.
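As a rough illustration of what "reconstructing 2D slices into 3D" means, here is a toy sketch in Python. It is not Denk's method or the labs' actual pipeline. It assumes each slice has already been reduced to a grid of 1s (stained tissue) and 0s (background), stacks the slices, and groups touching voxels into one object with a simple flood fill. The function name and the tiny dataset are hypothetical.

```python
from collections import deque

# The six face-adjacent neighbors of a voxel, as (dz, dy, dx) offsets.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def label_volume(slices):
    """Group touching foreground voxels across a stack of 2-D binary slices.

    slices[z][y][x] is 1 for tissue and 0 for background.
    Returns (labels, count), where labels maps (z, y, x) to an object id.
    """
    depth, height, width = len(slices), len(slices[0]), len(slices[0][0])
    labels = {}
    count = 0
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                if slices[z][y][x] != 1 or (z, y, x) in labels:
                    continue
                # Found an unlabeled voxel: start a new object and flood-fill it.
                count += 1
                labels[(z, y, x)] = count
                queue = deque([(z, y, x)])
                while queue:
                    cz, cy, cx = queue.popleft()
                    for dz, dy, dx in NEIGHBORS:
                        nz, ny, nx = cz + dz, cy + dy, cx + dx
                        inside = 0 <= nz < depth and 0 <= ny < height and 0 <= nx < width
                        if inside and slices[nz][ny][nx] == 1 and (nz, ny, nx) not in labels:
                            labels[(nz, ny, nx)] = count
                            queue.append((nz, ny, nx))
    return labels, count

# Three tiny hypothetical slices: one process drifts downward from slice to
# slice, another stays put in the top-right corner.
toy_slices = [
    [[1, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0]],
    [[1, 0, 0, 1], [1, 0, 0, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 0, 0]],
]

if __name__ == "__main__":
    labels, count = label_volume(toy_slices)
    print(f"objects found across the stack: {count}")
```

In the toy data, the left-hand structure sits in a different position in the first and last slice, yet it is recognized as one object because the middle slice connects the two. Real tissue is far messier, which is why tracing still needs human eyes or trained software.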

Just this month, the Seung and Denk labs published a research paper on reconstructing a tiny portion of the mouse connectome: the connections of 950 neurons in a piece of tissue that was ~1 cubic millimeter. Collecting this data requires tracing the neurons by hand, a very long and tedious process. To expedite data collection, an online crowdsourcing game was developed (if you want to help connect the connectome, check out EyeWire).

AI software (specifically machine learning, because machines need to learn how to see images) was used in conjunction with the user data to identify these 950 neurons. Online users created one terabyte of data from just ~1 cubic millimeter of tissue. This data was then fed into the machine learning software, which used it to learn how to identify neurons on its own. The goal is to train the machine-learning model so well (by giving it massive amounts of user-generated data) that it no longer needs human input.

Denk estimates that the mouse brain connectome could produce 60 petabytes of data, and the human connectome 200 exabytes, which is about equal to the amount of Internet data created in 2005.
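To put those two estimates on the same scale, here is a small back-of-the-envelope calculation in Python. It assumes decimal units (1 petabyte = 10^15 bytes, 1 exabyte = 10^18 bytes), which is my assumption rather than something stated in the estimate.

```python
# Decimal (SI) byte units; assumed, since the estimate does not say which convention it uses.
TERABYTE = 10**12
PETABYTE = 10**15
EXABYTE = 10**18

mouse_bytes = 60 * PETABYTE    # estimated mouse connectome
human_bytes = 200 * EXABYTE    # estimated human connectome

# How many times larger is the human estimate than the mouse estimate?
ratio = human_bytes / mouse_bytes

# How many one-terabyte datasets (the size generated for the tissue sample
# described earlier) would fit into the human estimate?
terabyte_datasets = human_bytes // TERABYTE

print(f"human / mouse: about {ratio:,.0f} times larger")
print(f"one-terabyte datasets in the human estimate: {terabyte_datasets:,}")
```

Under that assumption, the human estimate is roughly 3,333 times the mouse estimate, or 200 million datasets of one terabyte each.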

Generating the human brain connectome and activity map is going to require an enormous leap in optical, computational, and AI technologies. Still, even as the technology develops, the greatest challenge will be to optimize AI software to mine out useful information.

AI Software and Social Media

Flickr users upload 1.42 million digital images per day, Instagram 45 million, and Facebook 300 million. With this much online digital image access, there is an exceptional amount of information that can be pulled from these datasets.

Visual information is one type of data pulled from images. Another type is the attached EXIF data, a form of metadata. This metadata can include geolocation, timestamps, camera type, and even serial numbers associated with the purchaser. It provides information about both the captured scene and the photographer.
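As an example of how little work it takes to turn this metadata into something usable, here is a short Python sketch. EXIF stores a GPS coordinate as three fractions (degrees, minutes, seconds) plus a reference letter (N/S or E/W). The function below converts that into the decimal form a map expects. The function name and the sample coordinates are hypothetical, and reading the tags out of an actual image file (which needs an imaging library) is left out.

```python
from fractions import Fraction

def dms_to_decimal(dms, ref):
    """Convert an EXIF-style GPS value to signed decimal degrees.

    dms is three (numerator, denominator) pairs: degrees, minutes, seconds.
    ref is the hemisphere letter: "N", "S", "E", or "W".
    """
    degrees, minutes, seconds = (Fraction(n, d) for n, d in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative by convention.
    if ref in ("S", "W"):
        value = -value
    return float(value)

# Hypothetical values, shaped like the GPSLatitude / GPSLongitude tags.
latitude = dms_to_decimal(((42, 1), (21, 1), (36, 1)), "N")
longitude = dms_to_decimal(((71, 1), (3, 1), (36, 1)), "W")

if __name__ == "__main__":
    print(f"photo taken near ({latitude:.2f}, {longitude:.2f})")
```

Combined with the timestamp, a pair of numbers like this is enough to say where a photographer was standing and when.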

The digitization of photography has resulted in massive amounts of data. It is now possible to reconstruct large scenes from digital images by using visual information and metadata. Computer vision is a type of AI modeling software used to reconstruct massive datasets of digital images.

Researchers have shown that they can use public digital photos from social media sites to train computer vision software to reconstruct entire cities. The software learns to find and stitch together different images of the same structure to produce a 3D model. Check out some of their work — it’s quite amazing.
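The first step in stitching photos together is deciding which points in one photo correspond to points in another. The sketch below, in Python, shows one common way to do that: describe each distinctive point with a short list of numbers (a descriptor), then pair each point with its nearest neighbor in the other photo, keeping the pair only if it is clearly better than the runner-up (the "ratio test"). This is a standard technique I am using for illustration, not necessarily what the researchers used, and the three-number descriptors are made-up toy values.

```python
import math

def match_features(desc_a, desc_b, ratio=0.75):
    """Pair descriptors from photo A with descriptors from photo B.

    A pair (i, j) is kept only when B's best candidate is clearly closer
    than its second-best candidate, which filters out ambiguous matches.
    """
    matches = []
    for i, a in enumerate(desc_a):
        # Rank every descriptor in B by its distance to a.
        ranked = sorted((math.dist(a, b), j) for j, b in enumerate(desc_b))
        if len(ranked) < 2:
            continue  # the ratio test needs at least two candidates
        (best, j), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches

# Toy descriptors (hypothetical): two points that appear in both photos,
# plus one ambiguous point in A and one unrelated point in B.
photo_a = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
photo_b = [(1.0, 0.1, 0.0), (0.0, 0.1, 1.0), (0.0, 1.0, 0.0)]

if __name__ == "__main__":
    for i, j in match_features(photo_a, photo_b):
        print(f"point {i} in photo A matches point {j} in photo B")
```

Here the first two points in photo A each find a confident partner in photo B, while the third is about equally close to several candidates and is discarded. With enough confident matches across enough photos, software can work out where each camera stood and where each point sits in 3D.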

As you can see, AI software can be a powerful tool when it comes to mining and reconstructing useful information from very large datasets. AI software is used to reconstruct 3D models of neuronal connections (e.g., the connectome). It is also used to reconstruct 3D visual scenes using images from social media sites.

Some of this data is generated to advance science and technology, but some is also used to reconstruct information about individuals. More and more, the public is becoming aware that their online information is being collected into massive datasets to be analyzed. This has led to discussions surrounding the impact of this type of data analysis on personal privacy.

Social Ramifications of Large Scale Data Analysis

The AI software used for image reconstruction is the same type of AI analysis used to mine and reconstruct personal data. Most recently, there has been blowback from the NSA’s clandestine PRISM surveillance program, which was leaked to the public in June. The PRISM program conducts data mining on metadata, and although it is not exactly clear what type of analysis the agency conducts, it is highly probable that it is using AI techniques to acquire, categorize, and analyze large amounts of metadata on people around the globe.

Metadata can be used to infer (with great accuracy) your sexual orientation, political affiliation, and shopping preferences, and can even predict your behavior. Even if personal information is stripped away from the metadata, it is still possible for a person’s identity to be revealed using only a few data points. There are even programs that can predict a person’s social security number from public data.
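The claim that a few data points can single someone out is easy to demonstrate. The Python sketch below takes a handful of made-up records with names already removed and counts how many of them become unique once a few ordinary attributes are combined. The records, field names, and function are all hypothetical and are not drawn from any real dataset.

```python
from collections import Counter

def unique_fraction(records, fields):
    """Fraction of records whose combination of the given fields is one of a kind."""
    keys = [tuple(record[f] for f in fields) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(records)

# Hypothetical "anonymized" records: no names, just ordinary attributes.
records = [
    {"zip": "02139", "birth_year": 1980, "sex": "F"},
    {"zip": "02139", "birth_year": 1980, "sex": "M"},
    {"zip": "02139", "birth_year": 1980, "sex": "M"},
    {"zip": "02140", "birth_year": 1975, "sex": "F"},
    {"zip": "02140", "birth_year": 1990, "sex": "F"},
]

if __name__ == "__main__":
    for fields in (["zip"], ["zip", "birth_year"], ["zip", "birth_year", "sex"]):
        share = unique_fraction(records, fields)
        print(f"{' + '.join(fields)}: {share:.0%} of records are unique")
```

In this toy set, nobody is unique by zip code alone, 40% are unique once birth year is added, and 60% once sex is added as well. Anyone holding a second dataset that includes names alongside those same attributes could link the unique records back to a person.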

Corporations and government agencies have heavily invested in developing the next generation of data mining software and processing power. Recently, Google and NASA created the Quantum Artificial Intelligence Lab and purchased a quantum computer to boost the computational power available for analyzing large-scale datasets. Their goal is to use this advanced computing power to push through massive amounts of data to train machine learning and computer vision programs.

The amount of readily available image data is growing exponentially, and will be used to train AI software so that it may learn how to more effectively pull out information on its own.

The AI-based analysis of digital images used to reconstruct the brain and virtual cities is precisely the type of analysis being used to reconstruct information about the Internet user. This data — constantly collected and saved — allows interested parties to reconstruct meaningful information, whether that information is about the brain, or our personal lives.