13/3/2024
In the Voices section of the first Vertices edition of the year, we spoke with Nicolás Velásquez, a social and data scientist and co-founder of Linterna Verde.
Nicolás shared his perspective on the importance of integrating data science into civil society organizations. We discussed the challenges of building this field into organizational culture, as well as strategies that organizations with no previous experience using data can follow to approach the area and strengthen their social impact.
The following is the full interview.
Today there is talk of the 'datification of the world' and, particularly in the social sector, there is growing pressure on organizations to incorporate data science into their decisions. Could you explain what this field involves and how it applies to civil society organizations?
We are in what we call the information technology revolution. This means that data are now abundant: they are a raw material that we can easily generate and from which we can extract added value in the public sector, the private sector and civil society.
Organizations around the world have always worked with data. Today, we call someone who works in Excel or SQL a data analyst. Ultimately, all of us who work as analysts or researchers in the social sciences, the humanities or law have always worked with some kind of data. The difference is that today we have access to a much larger volume of data at relatively low cost, whether that means obtaining the data, storing it on a computer, or using tools and techniques that let us handle that volume of information.
Data cut across the fundamental questions we have always asked ourselves in civil society: what are our rights, what are our roles, what kind of society do we want to build? So, for civil society, data analysis means examining a source that is becoming increasingly dynamic, from the economic and productive spheres to our right to claim privacy over certain data that are, in other respects, public.
What are the main challenges civil society organizations face when integrating data work?
There are three extremely common challenges. The first is drowning in data. Often we want something but don't know how to search for it properly, and we end up with more data than we need. Addressing that need involves defining two things: first, what question we can answer and, second, which is very important for an organization like Linterna Verde, how we deliver the data. Do we produce a statistical table or a story?
Different organizations have different teams, and each makes use of these inputs in its own way. At Linterna, we often deliver data as a narrative supported by statistical tables, and an organization tells us: 'We found the statistical table very useful, but we didn't really look at the narrative you provided because we decided to write our own', or the opposite has also happened: 'We were fascinated by the narrative; please help us understand this table, because no one on our team really understands this concept'. So, a prior dialogue is key to delivering a product suited to the team that will benefit from it.
The second, extremely common challenge is what is sometimes called dataism: believing that because something has passed through the magic of data science or statistics, it is more true, more authoritative, or more valuable than if it had not. The value of data science depends on the research question and the research design. The data model has to fit what we are asking and, beyond its statistical properties, has to be assessed by experts in the field. Then comes a data process that generates results. So it is not true that something is more true simply because it is statistical or comes from data science.
Finally, the third challenge relates to privacy and data rights. Depending on the research question, we will often find things that it is not necessarily wise to show, or at least to make public. This is because such disclosure could compromise the safety of already vulnerable individuals, or because we frequently come across data related to potentially illegal activities. This creates an inherent responsibility regarding what is disclosed, to whom, and how it is reported.
How can a civil society organization that does not specialize in data begin working with it, and why should it?
It should do so because it will most likely find insights, backed by a large volume of data, that answer questions essential to its work.
Now, the first thing to do when working with data is to have a dialogue. This applies to both large and small organizations. To be useful, every data science process requires a triad: statisticians, technology experts and subject-matter experts. It is the dialogue among these three areas of expertise that refines and generates research questions capable of solving relevant problems in the social sector.
This intersection is necessary for a data science exercise to be fruitful, because no single technical tool serves everyone from beginning to end. Even so, it is the subject-matter experts who lead this dialogue: they understand what the data will be used for, and they are the ones who have to validate that the information they receive from the other two groups of experts matches reality and answers the right questions.
Considering that many NGOs and civil society organizations lack the technological infrastructure and technical talent for this, what would you recommend as a first step?
Organizations can bring in human talent, either by training someone on their existing team or by hiring an external technician. What a data team needs above all is subject-matter experts. A civil society organization already has these experts, so the next step would be to bring in a data engineer or technician who knows how to use the technical tools.
Artificial intelligence can also help organizations use other data-capture tools. If I were to train in one thing for a civil society organization today, it would be prompt engineering. This is nothing more than critical research design: the questions or interactions we have with a transformer-based tool, such as ChatGPT or Copilot, that gradually make the research question more complex, discarding some results and asking for others to be added. I am sure that civil society organizations with social scientists, lawyers, humanists or people who have worked in social mobilization have the capacity and the critical spirit needed to train themselves in this.
How can predictive analytics be applied to monitor and evaluate the impact of organized civil society?
Thanks to the data available today, we can model comparisons between what happened in the past and what we want to do in the future. We now have the capacity to carry out these analyses, or statistical abstractions, to project the outcome of an intervention or public policy model implemented by civil society organizations.
Twenty years ago, when I was an undergraduate, we did this by reading three or four case studies; today, with different tools, we can build a comparison that includes an overwhelming number of cases, from a statistical rather than a qualitative perspective, with all the risks and advantages that statistics entail. Statistics let us say a little about many things, while in-depth qualitative analysis usually lets us say a lot about a few things.
Now, an important issue for civil society is deciding which data model to follow and being very clear that statistics and big data justify practically nothing on their own; they do not contribute much more than a study based on three or four in-depth cases. Both approaches have strengths and challenges, and a data scientist, using well-informed judgment, has to decide how far to trust the data. What ultimately justifies conclusions is the logic of a research design, which we must be able to defend and which rests fundamentally on political, moral and social values.
Where do you see the intersection between data science and social impact heading? What trends do you think will be important in the coming years?
There is no doubt that for any civil society organization, adopting data-driven solutions or analytical tools will be increasingly crucial, whether to implement them directly or simply to understand them. This is because they will be dealing with other organizations or institutions that are equipped with these tools and using them to generate added value. So learning how to manage and respond to these capabilities is a must.
Now, there are two issues that I think are going to be very important in the next 20 years. First, the rights and duties related to data. The way we generate and handle our data is already regulated to some extent. However, the roles of data consumers, producers and processors, especially with regard to rights such as intellectual property and privacy, are still subject to debate and definition. Several questions arise: if an organization or institution, whether public or private, derives additional benefit from the data it collects, where and from whom does that data come? By what right was it captured? For what purpose? What obligations arise once the data has been processed and insights extracted? Who is responsible for storing or disposing of such data?
Second, the need for regulation is indisputable, since we are talking about a power dynamic. In our society, power relations are often subject to regulation. Exactly what should be regulated is a more complex and deeper question, beyond what can be addressed in a single hour of study. However, it is likely that many of these issues will be regulated in one way or another, or that as a society we will choose to regulate certain aspects and leave others unregulated.