The open data movement in the area of access to public (and other) information is a relatively new but very significant, and potentially powerful, emerging force. It has now been widely endorsed by, among others, Tim Berners-Lee, generally acknowledged as the Father of the World Wide Web. The overall intention is to make local, regional and national data (and particularly publicly acquired data) available in a form that allows for direct manipulation using software tools, for example for the purposes of cross-tabulation, visualization, mapping and so on.
The underlying idea is that public (and other) data, whether collected directly as part of census collection or indirectly as a secondary output of other activities (crime or accident statistics, for example), should be available in electronic form and accessible via the web. There are significant initiatives in this area underway in the US, the UK and Canada, among many other jurisdictions.
This drive towards increased public transparency, and towards enhanced, data-enriched citizen/public engagement in policy and other analysis and assessment, is certainly a very positive outcome of public computing and online tools for data management and manipulation. However, as with the earlier discussion concerning the “digital divide”, there would, in this context, appear to be some confusion between movements to enhance citizen “access” to data and the related issues concerning enhancing citizen “use” of this data as part, for example, of interventions concerning public policies and programs.
In an earlier paper dealing with the digital divide discussion I suggested the use of the concept of “effective use” to distinguish the opportunity for digitally-enabled activity presented by ICT access from the actual realization of those opportunities in the form of “effective use”. At that time I introduced a set of layers of requirements, which can be understood as “pre-conditions” for the realization of “effective use” of digital “access”.
Efforts to extend access to “data” will perhaps inevitably create a “data divide” parallel to the oft-discussed “digital divide” between those who have access to data which could have significance in their daily lives and those who don’t. Associated with this will, one can assume, be many of the same background conditions which have been identified as likely reasons for the digital divide, that is, differences in income, education, literacy and so on. However, just as with the “digital divide”, these divisions are not simply resolved by the provision of digital (or data) “access”. What is necessary as well is that those for whom access is being provided are in a position to actually make use of the now available access (to the Internet or to data) in ways that are meaningful and beneficial for them.
The question then becomes, who is in a position to make “effective use” of this newly available data?
The suggestion implicit in most of the discussions on “open data” (and explicit in Berners-Lee’s above quoted talk) is that “everyone” has the potential to make use of the data. However, as we know from experience elsewhere, not “everyone” has access to the digital infrastructure, to the hardware or software, or to the financial or educational resources/skills which would allow for the effective use of data or any other digital resource. Thus rather than the entire range of potential users being able to translate their access into meaningful applications and uses, the lack of these foundational requirements means that the exciting new outcomes available from open data are available only to those who are already reasonably well provided for technologically and with other resources.
The example that Berners-Lee quotes concerning the role of the data mashup in the Zanesville lawsuit is an interesting case in point. In this instance, the direct creators of the mashup were the Cedar Grove Institute, a public interest consulting firm specializing in GIS applications, employing several leading Ph.D. GIS specialists and with a University of North Carolina MBA as the CEO. The lawyer who argued the case, and who presumably so effectively deployed the mashup, is a Harvard Law School graduate.
Of course, there is nothing wrong with this, nor with the outcome of their intervention and their use of open data—in fact, as with Berners-Lee, I think this is an exemplary case of the positive benefits for people that can come from open data.
However, this is a very long way from what folks like Berners-Lee seem to be asserting, which is that “open data” empowers everyone. In fact, the example indicates precisely the opposite, that is, that “open data” empowers those with access to the basic infrastructure and the background knowledge and skills to make use of the data for specific ends. Given that these resources are more likely to be found among those who already have access to, and the resources for making effective use of, digitally available information, one could suggest that a primary impact of “open data” may be to further empower and enrich the already empowered and well provided for, rather than those most in need of the benefits of such new developments (unless, of course, they have the means or the luck to find benefactors such as the Cedar Grove Institute or Harvard Law School graduates willing to work pro bono or on a contingency basis).
A very interesting and well-documented example of this empowering of the empowered can be found in the work of Solly Benjamin and his colleagues looking at the impact of the digitization of land records in Bangalore. Their findings were that newly available access to land ownership and title information in Bangalore was primarily being put to use by middle and upper income people and by corporations to gain ownership of land from the marginalized and the poor. The newly digitized and openly accessible data allowed the well-to-do to take the information provided and use it as the basis for instructions to land surveyors, lawyers and others to challenge titles, exploit gaps in title, take advantage of mistakes in documentation, and identify opportunities and targets for bribery, among other things. They were able to directly translate their enhanced access to the information, along with their already available access to capital and professional skills, into unequal contests around land titles, court actions, offers of purchase and so on, for self-benefit and to further marginalize those already marginalized.
Certainly the newly digitized information was “accessible” to all on an equal basis but the availability of resources to translate that “access” into a beneficial “effective use” was directly proportional to the already existing resources available to those to whom the access was being provided. The old story about the pauper and the millionaire having equal opportunity to purchase a printing press as a means to promote their interests can be seen as holding equally here as in the 19th century.
Benjamin’s meticulously documented paper shows how the digitization and related digital access to land title records in Bangalore had the direct effect of shifting power and wealth to those with the financial resources and skills to use this information in self-interested ways. This is not to suggest that processes of computerization inevitably lead to such outcomes but rather to say that in the absence of efforts to equalize the playing field with respect to enabling opportunities for the use of newly available data, the end result may be increased social divides rather than reduced ones particularly with respect to the already poor and marginalized.
As well, this is not to argue against “open data”, which in fact is a very significant advance and support to broad-based democratic action and empowerment, but rather to argue that, in the absence of specific efforts to ensure the widest possible availability of the pre-requisites for “effective use”, the outcome of “open data” may be quite the opposite of that which is anticipated (and presumably desired) by its strongest proponents.
An “effective use” approach to open data would thus be one that ensured that opportunities and resources for translating this open data into useful outcomes would be available (and adapted) for the widest possible range of users. Thus, to ensure the effective use of open data a range of considerations needs to be included in the open data process and as elements in the open data movement including such factors as the cost and availability of Internet access, the language in which the data is presented, the technical or professional requirements for interpreting and making use of the data, the availability of training in data use and visualization, among others.
An interesting example of how open data, with appropriate attention being given to some of these pre-conditions, can in fact provide a basis for effective use can be seen in how the UCLA Center for Health Policy Research’s California Health Interview Survey (CHIS) has been put to use by Community Advocates in Solano County. The CHPR conducts the California Health Interview Survey every two years in conjunction with the California Department of Health “to provide a snapshot of the health and healthcare of Californians”. The survey is used by a range of political authorities but, most interestingly, the CHPR provides free and widely accessible training on how to use the information “to develop appropriate and targeted policy responses” and overall “to learn how to use and apply the data to improve health and health care”. That is, the information is not only made accessible but attention is paid and resources are provided to ensure that the data is usable by those who might make effective use of it.
In this instance, the Solano County Community Advocates were trained so as to be able to take the data provided by the CHIS and plot incidences of asthma by local electoral district. They were then able to create a map showing an extremely high frequency of asthma among residents in a particular local area. The Community Advocates successfully argued against developing another truck stop along I-80 in the county, based on CHIS 2001 data estimates that showed Solano County to have the state’s highest rate of asthma symptom prevalence overall and one of the highest rates for children.
While in many respects this example parallels the earlier one from North Carolina, the difference here is that the skills required for doing the analysis of the online data were provided through training to the local community, who were then able to mount a local campaign to achieve the desired end. The key difference was the attention paid by the provider of the information, the CHPR, to ensuring that the data could be effectively used without the need for highly skilled (and expensive) professional intermediaries. This involved the development of end-user-oriented training programs.
In this instance it should be noted that Internet access, bandwidth and the language of the data, among other factors, were not issues. However, in other circumstances, for example among indigenous peoples, non-English speakers, the very poor, those living in areas with poor connectivity and so on, these issues will inhibit the use of open data, and a responsible intervener would be concerned to ensure that they were attended to as part of an open data program.
(Additionally, the difficulties and types of interventions required to ensure that effective use can be made of information by the intended clients can be found in the very interesting report from Shelter in the UK on Social Inclusion in the Digital Age. This report documents a very useful approach to providing some of the tools needed for effective use of online information by those to whom that information is being directed and who would necessarily be those who could make the most active and effective use of it: information on housing for the homeless being made available for use by the homeless themselves. Regrettably, the project being reported upon has been canceled by the UK government.)
For a more detailed discussion on “effective data use” and overcoming the “data divide” see the next blogpost at: