When “Data” Builds Your Profile Without Context

The “personalized content curation” feature of modern marketing (read: companies focused on 100% pure data without additional context), and even the content curation on Facebook, assume we are static and unchanging.

This is particularly fun when I play with the “Who does Google think you are” feature on my different email accounts.

Just a few current profiles:

1) Work email: Female (because at some point I told them in Google+), age 25-30, interests: (list of several hundred topics ranging from bizarre histories of the United States to neuro-imaging to Python hacks)

2) Male, 45-55, interests: Python, Algorithms, Economics, Cyber Crime

3) Male, 35-45, interests: Public Health, Africa, Ebola, Vaccines, Latin America, Economics, MIT, WHO, UN

And now an existentialist question… if Google thinks I am a middle-aged man, does that make me a middle-aged man?

Love Your Data. Can I have some context with that?

You know what is sexy? Presentations where the data and algorithms presented by researchers come with a healthy dose of real life context. [Also, other researchers who read applied statistics textbooks in coffee shops early in the morning. I have been doing this a lot recently and just made friends with someone who was reading a different book by the same statistician I was reading.]

I constantly complain that we lose a lot of information when we work with big data analytics. Part of it is that many researchers are encouraged to work with data from their desks in offices tucked away inside of universities or office buildings in major cities, far away from the ecosystems they are trying to describe through numbers and algorithms.

Nate Silver spends a lot of time talking about the weakness of prediction models in his book The Signal and the Noise: Why so many predictions fail — but some don’t. He points out that economists have trouble identifying relevant variables to make predictions. This is fair… economies are constantly changing in structure and dynamic. It would be really hard to collect appropriate data on the formal economy as it shifts, and even harder to keep track of informal economic activity in a way that would lend itself well to predicting output for the future.

I’ve found the only way that I truly understand the pulse of an economic ecosystem is by living and breathing the structure and community of it. After all, economies depend on communities and trust for transactions to take place at all. But this is for another post.

But I did find someone trying to add context to big data!

I watched this talk by Anna Rosling Rönnlund from TEDxStockholm yesterday, and while the introduction is a little confusing, the center of the talk is important. The best way to watch this talk, in my opinion, is to consider the implications of using photographs to describe the spread of the distribution.

In non-jargon speak, this means, consider how your perspective on wealth disparity changes when you see how people in the richest 25% versus the middle versus the lowest 25% brush their teeth. This hits home a lot harder than quoting per capita numbers at someone would, because it also takes into account differences in pricing/living costs within the country. We can see where wages fall short and what that means in the day to day life of workers around the world. We gain perspective on data. And that’s sexy.


Survey Design Love.

Confession: I’ve been a survey lover for as long as I can remember.

I realize many people approach them with dread: a survey becomes yet another thing one must do, a task, a barrier preventing you from using the app or website or service you want to use. It’s the people milling around outside of grocery stores, ready to poll you about political candidates. It’s the signs people hang on their doors: no soliciting. All of these carry negative associations.

What I like about surveys is more the process that goes into building them and how people record and work with the data they receive. There is a lot of psychology and narrative that goes into the preliminary design and coding that comes after we have the data. Not to mention the process of interviewing… Each piece presents new and interesting challenges for the researchers and team setting out to run a survey.

Second confession I need to make: I just finished my General Assembly course on User Experience Design, and spent a lot more concentrated time than I have in a while thinking through the layers of survey design and data collection. It was truly wonderful and useful to me… and gave me a lot of time to nerd out about surveys.

For my current projects, I am thinking about the best ways to structure and develop surveys that will make the experience less awkward and forced for the data collector and for the person responding to the survey. In many ways, I think some of the weakness we encounter in data collection about difficult topics, like informal business, could be addressed through better design/systems thinking.

Some of the typical problems I encounter in the work are:

1) Low response rates, which means the group that responded gives me a strange and inaccurate data set to use when describing the community I am working with

2) Questions are sometimes unclear or worded inappropriately given the audience we are working with. I think the value of language and word association often gets overlooked. We need to account for the way the question will be received and also for the types of answers we might be able to get. Are we asking questions clearly in the right lexicon? Are we interpreting the responses with the weight and meaning they carry in the local ways people talk about business?

3) Information Capture Beyond the Page: often the most important and enlightening pieces of information (the new rabbit holes to go exploring, if you will) will not fit nicely into a question topic or predicted category. Getting researchers who are ready and able to follow these threads of thought, record them, and offer some sort of analysis is… rare and challenging. Some people are truly excellent at this task! They also, however, need to make an effort to collect and protect this information, so that it does not get lost in the coding process.

4) Plain and Simple: the surveys are extremely long and fatiguing for both parties. These surveys will not generate the information you need, and the survey administrator will not stay focused and engaged with the task beyond a few runs. If your respondent sighs, looks at the clock, or looks noticeably bored, the design is off. A wise mentor once told me that I get 10 questions, and that within those 10 questions I can often answer 70% of my questions overall.
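To make the first problem on this list a little more concrete, here is a minimal Python sketch of how I might check whether a low response rate has also skewed who is in the data. The function name, the group labels, and the counts are all made up for illustration; the only idea it shows is comparing each group's share of the people invited with its share of the people who actually answered.

```python
def response_report(invited, responded):
    """Compare who was invited with who actually answered, group by group.

    `invited` and `responded` map a group label to a head count.
    Assumes every invited group has at least one person in it.
    """
    total_invited = sum(invited.values())
    total_responded = sum(responded.get(group, 0) for group in invited)
    report = {"overall_rate": total_responded / total_invited, "groups": {}}
    for group, n_invited in invited.items():
        n_responded = responded.get(group, 0)
        report["groups"][group] = {
            "response_rate": n_responded / n_invited,
            "share_of_invited": n_invited / total_invited,
            # Guard against a survey nobody answered at all.
            "share_of_responses": (
                n_responded / total_responded if total_responded else 0.0
            ),
        }
    return report


# Hypothetical counts, invented purely for this example.
invited = {"market vendors": 50, "shop owners": 50}
responded = {"market vendors": 10, "shop owners": 40}

report = response_report(invited, responded)
for group, stats in report["groups"].items():
    print(group, stats)
```

In this invented example the two groups were invited in equal numbers, but one of them ends up supplying most of the answers. That gap between share invited and share responding is the "strange and inaccurate data set" problem in numeric form.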

These are all issues I am attempting to address in my redesign process. I apologize in advance to the friends and family members who are regularly subjected to my guerrilla field testing… but it’s for a good cause!

Happy New Year!



Assisted Contact Tracing — in Brief

I want to explain in a little more detail what I am working on now — I think the following, which I was working on for a press release, will help explain the project.

First off, ACT is software used by healthcare workers, contact tracers, field organizers, and physicians working on the ground to track and predict the spread of the Ebola virus.

There are two components to Assisted Contact Tracing (ACT).

First, ACT collects and organizes data about the Ebola outbreak through contact tracers. This data is available for use by organizations like the CDC and the WHO. Beyond that, the identities of contacts and Ebola patients are protected, allowing other organizations to look through the data for trends but not identify specific contacts.

Second, ACT helps contact tracers, healthcare workers and physicians prioritize cases that come up in the field, rather than rifling through a gigantic list of contacts generated by an Ebola patient. ACT also gives contact tracers context before they make visits: they will know before they walk into a community whether or not their contact is showing symptoms, so that they can better prepare and protect themselves from infection.
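As a rough illustration of what prioritizing a contact list could look like, here is a small Python sketch. It is not ACT's actual logic: the `Contact` fields and the ordering rule (symptomatic contacts first, then whoever has gone longest without reporting) are my own assumptions for the example.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    # All fields are hypothetical, chosen only for this sketch.
    contact_id: str              # anonymized ID rather than a name
    has_symptoms: bool           # latest self-reported status
    days_since_last_report: int  # how long since we last heard from them


def prioritize(contacts):
    """Return contacts ordered for follow-up.

    Assumed rule: anyone reporting symptoms comes first; within each
    group, the contact who has been silent longest comes first.
    """
    return sorted(
        contacts,
        key=lambda c: (not c.has_symptoms, -c.days_since_last_report),
    )


# Invented records, for illustration only.
contacts = [
    Contact("C-001", has_symptoms=False, days_since_last_report=0),
    Contact("C-002", has_symptoms=True, days_since_last_report=1),
    Contact("C-003", has_symptoms=False, days_since_last_report=3),
]

for contact in prioritize(contacts):
    print(contact.contact_id, contact.has_symptoms, contact.days_since_last_report)
```

The useful part is that the ordering rule sits in one place, so the people in the field can argue about the rule itself instead of re-sorting a list by hand.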

ACT uses contact information collected by the initial contact tracers to generate automated calls, in specific dialects, back to the contacts every day for the 21-day quarantine period to monitor for changes in health or symptoms. Once a contact reports that they are sick and describes their symptoms, ACT generates an SMS to local healthcare workers and physicians reporting the case, along with the contact information and location of the patient.
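The daily-call-then-alert flow described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the ACT codebase: the function names, the message wording, and the choice to count the first call as day one of the 21 days are all hypothetical, and nothing here places a call or sends an SMS.

```python
from datetime import date, timedelta

MONITORING_DAYS = 21  # the 21-day quarantine period


def call_schedule(first_call, days=MONITORING_DAYS):
    """One automated call per day, starting on `first_call`.

    Assumption: the first call counts as day one of the period.
    """
    return [first_call + timedelta(days=offset) for offset in range(days)]


def build_alert(contact_id, symptoms, phone, location):
    """Return the SMS text for health workers, or None if no symptoms.

    The message format is invented for this sketch.
    """
    if not symptoms:
        return None
    return (
        f"ACT alert: contact {contact_id} reports {', '.join(symptoms)}. "
        f"Phone: {phone}. Location: {location}."
    )


# Placeholder values, not real data.
schedule = call_schedule(date(2015, 1, 1))
print(len(schedule), schedule[0], schedule[-1])
print(build_alert("C-002", ["fever", "headache"], "<phone>", "<village>"))
```

Keeping the schedule and the alert text as plain functions means both can be checked on a laptop before anything is wired up to a real phone system.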

This measure cuts down on the time needed for data collection, improves accuracy, and allows field workers and healthcare workers to build better strategies for patient outreach.

ACT is a critical tool for a few reasons.

First, it would limit the level of exposure to Ebola for healthcare workers, who are currently some of the people most at risk of being infected by the virus.

Second, the automated check-ups allow a suspected patient to remain isolated at home, rather than having to stay in an isolation ward. “ACT is important for the system to allow people to pursue their own healthcare, so [potential patients] don’t have to go into isolation. It also provides information on how to care for a family member in their home with limited resources,” says Camilla Hermann, Founder and Director for Odisi | ACT. ACT does not make cold calls. These automated calls are opt-in and only start after a doctor or contact tracer has first met and spoken with an Ebola patient or contact.

While ACT is currently geared towards the Ebola crisis, we think this program will have broader implications and potential applications in public health infrastructure around the world. I will continue to update based on our progress as we start getting into better field testing and data collection work.