Contextual Design Research

Methodology

Central to providing open data to the public is understanding which datasets city departments currently use. Our group is charged with designing a sustainable process, called Data Governance Health Status Checks, that will promote continuous updating of data inventories. These inventories will be, at first, simple spreadsheets that accurately list and describe all the datasets used by the various city departments. The process should include smooth check-in procedures so designated data stewards can update and report inventories to the Data Services Team. To better understand how data is used and to discover problem areas with data management in city departments, the Data Services Team has asked that we interview the department managers who oversee data usage and data systems.

Our role in this project was to conduct research that would inform the development of Data Governance Health Status Checks. The research goals were to identify a sustainable approach to maintaining data inventories and publishing open data, to formalize and standardize data governance for accountability and for reporting and measuring progress, and to identify threats to the success of these efforts. The design of the Data Governance Health Status Checks must take into consideration that the City of Pittsburgh is a multi-faceted organization with numerous departments that perform unique functions but are interdependent in their use of data. Therefore, we employed a contextual design approach, which enabled us to gain insight into several departments’ practices and experiences around data use and management through a series of interviews designed to collect relevant qualitative data. We consolidated the data and modeled the practices and experiences to reveal possible user-driven designs for the Data Governance Health Status Checks.

To ensure the validity and reliability of this research, we developed an interview protocol so that each interviewee was asked the same questions about data use, data management, and technology. To solicit candid responses, we informed each interviewee that their responses were confidential and that they could decline to answer any question. The interview protocol provided consistency and focus; however, we conducted each interview in a semi-structured format to create a relaxed atmosphere and to foster conversation that was partially directed by the user. This enabled us to gain deeper insights into each user’s world. At least two people attended each interview, with one person serving as the designated note-taker. We held interpretation sessions to recall each interview and refine our notes.

To collect our data, we interviewed seven managers from six bureaus and departments via video conference. We organized the responses from each interview into separate files that we stored in our shared Google file system. To prevent readers from identifying interviewees from their responses, we anonymized each interviewee by assigning them a code, which consists of a letter to indicate the department and a number to indicate the interviewee. We used these codes instead of their names when we referred to them in models, diagram notes, and analyses.
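The anonymization scheme described above can be sketched as a small function; the department names and the assignment order here are our own illustration, not the actual codes used in the study:

```python
# Sketch of the interviewee anonymization scheme: one letter per department,
# one number per interviewee within that department. Departments shown are
# illustrative, not the actual study participants.

def assign_codes(interviews):
    """Map (department, interviewee) pairs to anonymous codes like 'A1'."""
    dept_letters = {}   # department -> assigned letter
    counters = {}       # department -> next interviewee number
    codes = {}
    for dept, name in interviews:
        if dept not in dept_letters:
            dept_letters[dept] = chr(ord("A") + len(dept_letters))
            counters[dept] = 0
        counters[dept] += 1
        codes[name] = f"{dept_letters[dept]}{counters[dept]}"
    return codes

codes = assign_codes([
    ("Public Works", "interviewee-1"),
    ("Public Works", "interviewee-2"),
    ("Finance", "interviewee-3"),
])
# codes -> {'interviewee-1': 'A1', 'interviewee-2': 'A2', 'interviewee-3': 'B1'}
```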

From these interviews, we gained significant insight into the ways city departments create, use, and manage their data and the difficulties they encounter. Our team met weekly to discuss interviews, analyze the data, and build models. We built a team consolidated model, Data in the Life, to illustrate the lifecycle of data within City of Pittsburgh bureaus and departments. Each of us created an individual consolidated model to illustrate other patterns revealed by the data, and these include a persona model, identity model, collaboration model, and work sequence model. Finally, as a team, we created an affinity diagram to organize all the responses into one cohesive diagram to reveal concerns or problems common to all departments and interviewees.

Our team had numerous resources available to help us analyze the data and to create the models and affinity diagram. We used Zoom or Microsoft Teams to conduct the interviews, to meet with our partner, and to meet as a team to work on the affinity diagram, team model, and corresponding analyses and presentations. We used the free versions of the online whiteboards Miro and Mural, or Microsoft Word, to create the models and affinity diagram. Finally, we used a Google shared folder to organize our files.

Through our research, we have identified three primary threats to consistent, ongoing data inventorying and open data publication. We have also identified common practices and problems around data use and management. All of this allows us to recommend with confidence several possible solutions for formalizing and standardizing data governance into regular Data Governance Health Status Checks that take the current data climate into consideration and incorporate appropriate steps to address it.

Findings

Through qualitative research involving interviews from a contextual design perspective, we gained significant insight into the ways city departments create, use, and manage their data and the difficulties they encounter. We also identified the following threats to consistent, ongoing data inventorying and open data publication:

  • Poor data management practices due to the lack of resources
  • Lack of policy governing the management of analog data
  • Lack of policy and procedures surrounding the Open Data Ordinance
  • Undefined roles and responsibilities with respect to open data publication
  • Conflicts in reporting obligations
  • Limited awareness of the collaborative role between I&P and city departments
  • Lack of understanding of the requirements of the Open Data Ordinance

Interviewees noted that their departments did not feel adequately qualified to manage data in compliance with the Open Data Ordinance. The departments also felt that their obligatory reporting to local, state, and federal agencies would take precedence over publishing open data. The research also revealed that interviewees identify as data users rather than data managers, and that lack of time or skills is among the biggest threats to the Data Governance Health Status Checks. Furthermore, interviewees admitted little knowledge of the ordinance itself and did not realize that much of the work they already perform would supply datasets for publication under the Open Data Ordinance.

Recommendations

The value of the qualitative research that we conducted is that the resulting solution(s) to the city’s problem will be user-centered. This is important because the city managers must take ownership of their data and their responsibilities with respect to open data publication. Solutions that take their current practices and struggles into consideration offer the best chance of reducing or eliminating their concerns as well as the existing threats to consistent, ongoing data inventorying and open data publication.

The contextual design interviews revealed many pain points and ineffective methods that could threaten the city’s efforts to comply with the Open Data Ordinance. The biggest threats to compliance are the lack of good data management practices and departments’ identification as data users rather than data managers. The Data Governance Health Status Checks will formalize data governance around publishing open data and provide a system of support and accountability. We recommend including one or more of the following designs in the health status checks to support departments and to monitor progress with data inventorying and open data publication:

  • Center Data Governance Health Status Checks around FOIL requests and data shared with other departments or agencies.

    • Define a set of expected datasets from past Freedom of Information Law (FOIL) requests, acquired from the Legal Department, which receives and distributes requests to the departments. Consolidate similar datasets.
    • Prioritize datasets by how frequently they are requested and by how frequently each organization requests them; this information can also be acquired from the Legal Department.
    • Name data stewards and data coordinators based on who fills FOIL requests, also acquired from the Legal Department.
    • Define additional expected datasets by what departments report to the State and Federal governments, agencies, or other organizations.
    • Require data stewards to save all datasets from FOIL requests and government reporting. After delivering the dataset to the requesting individual or organization, data stewards will save it to a shared drive. A standard filename format will be defined for consistency, and the metadata will link the dataset to its data inventory entry.
  • Create a web application to assist users in keeping up to date with data inventories. See San Francisco’s data inventory submission form: https://datasf.org/publishing/.
    • A web form will act as a data inventory and metadata wizard for departments to enter information about their datasets.
    • This information will be stored in a database and can be filtered and exported to CSV files.
    • Data inventory entries can be viewed as web page reports and edited.
    • Data inventory entries can be copied and modified to reduce the time involved in entering records that are similar to existing records.
    • Whenever possible, data inventory fields will have a defined vocabulary, and users will be able to select an option from a list.
  • Define procedures to remind and monitor progress.
    • Send regular reminder emails to complete data inventories.
    • Monitor data inventory and metadata entries, whether completed in the original Excel spreadsheet or through a web form. Someone from I&P will review the data inventory and metadata entries regularly and provide feedback.
  • Develop submission guidelines and make them available as a web page and/or poster, similar to San Francisco’s: https://datasf.org/publishing/submission-guidelines/.
    • Define data steward and data coordinator roles.
    • Define steps for publishing open data: planning, identifying sensitive data, documenting data, etc.
  • Offer incentives for publishing data. Distribute incentives at meetings that include all departments, such as VOLTRON, or start new meetings around open data. Define the criteria and the expected number of datasets for each department (based on raw count and percentage of the expected number).
  • Distribute a quarterly newsletter to highlight datasets published last quarter, projected datasets for the next quarter, and a department of the month (selected based on raw number published, percentage of expected, or growth over past quarters). Provide visualizations and tables of published datasets, expected datasets, and missing inventories, fields, or metadata from completed inventories.
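To illustrate the web-application recommendation above, a data inventory entry could be modeled as a record whose fields have defined vocabularies, support the copy-and-modify workflow, and export to CSV. A minimal sketch follows; the field names, vocabularies, and example values are our own assumptions, not a specification:

```python
# Minimal sketch of a data inventory entry with controlled-vocabulary fields
# and CSV export. Field names and vocabularies are illustrative assumptions.
import csv
import io
from dataclasses import dataclass, asdict, fields, replace

# Hypothetical controlled vocabularies for fields with defined options.
CLASSIFICATIONS = {"public", "sensitive", "restricted"}
UPDATE_FREQUENCIES = {"daily", "weekly", "monthly", "quarterly", "annually"}

@dataclass
class InventoryEntry:
    dataset_name: str
    department: str
    data_steward: str
    classification: str      # must be one of CLASSIFICATIONS
    update_frequency: str    # must be one of UPDATE_FREQUENCIES
    description: str = ""

    def __post_init__(self):
        # Enforce the defined vocabularies so users select rather than free-type.
        if self.classification not in CLASSIFICATIONS:
            raise ValueError(f"unknown classification: {self.classification}")
        if self.update_frequency not in UPDATE_FREQUENCIES:
            raise ValueError(f"unknown update frequency: {self.update_frequency}")

def export_csv(entries):
    """Export inventory entries to a CSV string, one row per dataset."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(InventoryEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buf.getvalue()

entry = InventoryEntry(
    dataset_name="Paving Schedule",
    department="Public Works",
    data_steward="A1",
    classification="public",
    update_frequency="quarterly",
)
# Copy-and-modify: reuse an existing record and change only what differs.
similar = replace(entry, dataset_name="Paving Markings")
csv_text = export_csv([entry, similar])
```

In a real deployment the records would live in a database behind the web form; the point of the sketch is that controlled vocabularies, record copying, and CSV export fall out naturally from a structured schema.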

Conclusion

A user-centered design for the Data Governance Health Status Checks is important for this project because the city managers must take ownership of their data and their responsibilities with respect to open data publication. The most impactful solutions will be those that take their current practices and struggles into consideration.

We understand that some of the recommendations for Data Governance Health Status Checks mentioned in this report could be costly. We recommend that the City of Pittsburgh consider starting with inventorying datasets that are already shared with other entities, e.g., data from government reports, Right-to-Know requests, or interdepartmental requests, and implementing a technological solution to assist departments with understanding and completing data inventories. The following solutions have moderate to high impact potential and are achievable in the short term:

  • Center Data Governance Health Status Checks around FOIL requests and data shared with other departments or agencies.
  • Create a simple web application to assist users in keeping up to date with data inventories.
  • Define procedures to remind and monitor progress.
  • Develop submission guidelines and make them available as a web page and/or poster.
  • Distribute a quarterly newsletter to highlight datasets that were published last quarter and projected datasets for the next quarter.