Tutorial

Thank you for contributing to the U.S. City Open Data Census, a joint community effort undertaken by contributors around the country, just like you.

The following tutorial will help you make your contribution to the U.S. City Open Data Census.

If you have questions that are not answered by this site, you can email usopendatacensus@gmail.com or post on Open Knowledge International's discussion forums.

Contributing a new submission to the Census

There are two main ways you can contribute to the U.S. City Open Data Census: by contributing a new submission, as described in this section, or by commenting on an existing submission, as described at the end of this tutorial.

When you contribute a new submission, you will be asked a series of questions about the dataset as well as about your knowledge of open data. This helps us understand the submitters, as well as the answers to the more subjective questions concerning data findability and usability. If you would like more detail on what a particular question is asking, click the question-mark sign next to that question to display its help text.

Additional tips for specific questions

B1. Is the data collected by the government (or an official third party on behalf of the government)?

The Census seeks to measure the publication of open government data, so it is important that your submission refers to data the government is responsible for. A submission is only valid if the government is responsible for producing, managing, or publishing the submitted dataset. Usually, the data will be found on a government website. In the rare cases where the official location is somewhere else (this may be the case for state-owned enterprises or contractors delivering public services for government), you can verify this by looking for a disclaimer on the website stating that the data is produced by government, or by calling a government official. Answer “Yes” if the data is collected by government or by a third party officially representing government, and tell us which agency collects or provides the data. If you cannot find evidence that the government collects the data, provide a comment explaining why.

B2. Is the data available online without the need to register or request access to the data?

This question measures whether the data is accessible online from the government without mandatory bureaucratic barriers. It is important that no such barriers exist because they can deter people from accessing data. Answer “Yes” if the data is made available by the government on a public website. Answer “No” if the data is not available online, or is available online only after registering, requesting the data from a civil servant via email, completing a contact form, or going through some other administrative process.

B2.2 Where did you find the data?

This question evaluates whether the dataset is stored centrally or spread across several websites and sub-pages. Where data is published matters because it affects how easily the data can be found. Tell us the different websites, and the sub-pages on each website (if applicable), where you found the dataset characteristics. If you find the same information on several websites and sub-pages, only document those that enable you to answer the following questions with “Yes.”

B3. Please confirm that the following characteristics are present in the data published online by the government.

Tell us whether you can find all required characteristics online, checking the box for each one as you verify that the dataset has it. This is important to make sure that the dataset has all of the information required. If the dataset does not have all required components, you should mark the dataset as missing and explain why with a comment.

If you are unsure whether a part is contained in the dataset or not, you can ask on the Open Knowledge discussion forum.
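The B3 check amounts to comparing the characteristics the Census requires against the ones you actually found. The following is a minimal sketch of that comparison, not part of the Census itself; the function name and the characteristic names are hypothetical and chosen only for illustration.

```python
def missing_characteristics(required, found):
    """Return the required characteristics that were not found, in order."""
    found_set = set(found)
    return [name for name in required if name not in found_set]


# Hypothetical characteristic names; the real list is shown in the
# submission form for each dataset.
REQUIRED = ["department", "amount", "fiscal year"]
FOUND = ["department", "amount"]

missing = missing_characteristics(REQUIRED, FOUND)
print(missing)  # prints ['fiscal year']
```

If the resulting list is not empty, the dataset would be marked as missing, with a comment naming the characteristics that could not be found.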

B4. Is the data available free of charge?

The data is free if you don’t have to pay for it.

B5. Is the dataset downloadable at once?

This question measures whether you can download all of the data to your computer in a few easy steps. Downloads can be organized by month or year, or broken down into sub-files for very large data files, as long as the download remains feasible in a few easy steps. Answer “No” if you can only view the data (but not download it), if you have to do many manual steps to download the data, or if you can only retrieve a small part of a large dataset at a time (for instance through a search interface).

B6. Data should be updated every [TIME INTERVAL]: Is the data up-to-date?

Often, data is only useful if it is provided in a timely manner, but different information needs to be provided at different intervals. For example, data on 311 requests should ideally be updated daily, while annual budget data is only published once a year. This question measures whether data is provided in a timely manner. Please base your answer on the date on which you answer this question. The dataset may have metadata that says how often it is updated; also check when it was most recently updated. Be careful with publication dates and check whether the published data itself is actually up-to-date. Answer “No” if you cannot determine a date, or if the data is outdated. Please also use the comment function to explain why the data is outdated.
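The B6 rule can be pictured as a simple date comparison. The sketch below is an illustration under stated assumptions, not the Census's own logic: the interval names and their lengths in days are hypothetical, and the real interval is the one given in the question for each dataset.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical interval lengths in days (assumption, not from the Census).
EXPECTED_INTERVAL_DAYS = {"daily": 1, "monthly": 31, "yearly": 366}


def is_up_to_date(last_updated: Optional[date], interval: str, today: date) -> bool:
    """Return True if the data was updated within the expected interval.

    A missing date counts as "No", as the guidance above requires.
    """
    if last_updated is None:
        return False
    max_age = timedelta(days=EXPECTED_INTERVAL_DAYS[interval])
    return today - last_updated <= max_age


# 311 requests last updated yesterday, expected daily: answer "Yes".
print(is_up_to_date(date(2024, 1, 1), "daily", date(2024, 1, 2)))
# Budget data last updated well over a year ago: answer "No".
print(is_up_to_date(date(2023, 1, 1), "yearly", date(2024, 6, 1)))
```

The `today` argument is passed in explicitly so that the answer is based on the date on which the question is answered.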

B7. Is the data openly licensed/in the public domain?

This question measures whether anyone is legally allowed to use, modify, and redistribute the data for any purpose. Only then is data considered truly "open" (see the Open Definition). Answer “Yes” if the data is openly licensed. The Open Definition provides a list of conformant licenses. Also answer “Yes” if there is no open license but there is a statement that the dataset is in the “public domain.” To count as public domain, the dataset must not be protected by copyright, patents, or similar restrictions. Statements of public domain status can be disclaimers, terms of use, or legal documents such as a national access-to-information law. If you are not sure whether a disclaimer is compliant with the Open Definition 2.1, seek feedback on the Open Knowledge discussion forum.

B8. Which of the following formats is the data in? If none (for example, if it’s a PDF), please make a note in the comments and move on to the next question.

Tell us the file formats of the data. We automatically compare them against a list of file formats that are considered machine-readable and open. A file format is called machine-readable if your computer can process, access, and modify the dataset characteristics (see B3) that you find in a data file. For example, a JPG image of a map embedded on a website is not considered machine-readable. The Census considers formats to be “open” if they can be fully processed with at least one free and open-source software tool. These formats allow more people to use the data, because people do not need to buy specific software to open it. Important: The Census uses a less rigid definition of open formats than is common. It may consider some file formats to be open even if their source code is not. What counts is that the file format can be fully processed with an open-source software tool.
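The automatic comparison described for B8 can be sketched as a lookup of the file extension in two lists. This is only an illustration: the two sets below are hypothetical stand-ins, not the list the Census actually uses, and the function name is made up for this example.

```python
# Hypothetical example lists (assumption); the Census keeps its own list.
# XLS appears only in the first set to show a format that is in one list
# but not the other; the Census's own list decides whether it counts as open.
MACHINE_READABLE = {"csv", "json", "xml", "xls"}
OPEN = {"csv", "json", "xml"}


def classify_format(filename: str) -> dict:
    """Classify a file by its extension against the two example lists."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return {
        "format": ext,
        "machine_readable": ext in MACHINE_READABLE,
        "open": ext in OPEN,
    }


print(classify_format("budget.csv"))
print(classify_format("zoning-map.JPG"))
```

A format that matches neither list, such as the JPG map image mentioned above, is the case where you would leave a note in the comments and move on.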

B9. Please provide an assessment of how easily the data are usable without human effort. Select 1 if extensive effort is required to make data usable, select 2 if some effort is required and select 3 if little to no effort is required.

Data may be in a machine-readable format such as an XLS spreadsheet but still contain unstructured information (such as notes written arbitrarily in a column). Such data often has to be cleaned before it becomes usable. Tell us how much effort it takes for you to use the data. Document any relevant feedback on usability in the comment section.
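One way to gauge the effort asked about in B9 is to look for cells that do not fit their column, such as free-text notes in a column of numbers. The sketch below assumes the data has been saved as CSV and uses a made-up sample; the function name and column names are hypothetical, and this is just one possible check rather than a Census requirement.

```python
import csv
import io


def non_numeric_cells(csv_text: str, column: str) -> list:
    """Return (line number, value) pairs for cells that are not numbers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = []
    # The header is line 1, so the first data row is line 2.
    for line_number, row in enumerate(reader, start=2):
        value = (row.get(column) or "").strip()
        try:
            float(value)
        except ValueError:
            problems.append((line_number, value))
    return problems


# Made-up sample with a note typed into the "amount" column.
SAMPLE = "department,amount\nParks,1200.50\nFire,see note below\nPolice,3400\n"

print(non_numeric_cells(SAMPLE, "amount"))  # prints [(3, 'see note below')]
```

Many flagged cells suggest that extensive cleaning is needed, which points toward a lower usability rating.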

Commenting on a submission

The Census allows only one submission per dataset. However, you can still help by commenting on a current submission and by proposing changes in a new topic on the Open Knowledge discussion forum. Leaving detailed notes in the comment field supports the review process, too.