Tutorial Page 2016-17

Thank you for contributing to the US Open Data Census, a joint community effort undertaken by contributors around the world, just like you.

The following tutorial will help you make your contribution to the US Open Data Census.

Do not be afraid to make mistakes; a global community of contributors past and present is here to help you out every step of the way in the Open Data Index forum :-)

How does the Index work?

Contributing a new submission to the Census

In the current Index you can contribute in 3 different ways: Enter a new submission for 2016: you’ll see an ‘Add’ button under the dataset. Click on it and add your submission.

You’ll be asked a series of questions about the dataset as well as your knowledge of open data. This helps us understand who the submitters are and interpret the more subjective questions concerning data findability and usability. We know that some questions might not be self-explanatory. A question mark sign appears next to a question; clicking it displays a help text. We have built some of the questionnaire logic directly into the interface. This should give you more confidence in your responses.

In the following, you will find further tips for each specific question.

B1. Are the data collected by the government (or an official third party on behalf of the government)? The Index seeks to measure the publication of open government data. It is therefore important to ensure that your submission refers to data that government is responsible for. A submission is only valid if the government is responsible for producing, managing or publishing the submitted dataset. To find evidence, you can look for a disclaimer on a website stating that the data is produced by government, call a government official, or google the government agency that is most likely to hold or collect the data. Answer “Yes” if the data are collected by government, or by a third party officially representing government. This may be the case for state-owned enterprises or contractors delivering public services for government. Tell us which agency collects or provides the data, so reviewers can understand whether it is truly government data. If you cannot find evidence that government collects the data, briefly explain why. This information is very important: it lets reviewers double-check submissions and understand the country context.

B2. Are the data available online without the need to register or request access to the data? This question measures whether the data are accessible online from government without mandatory bureaucratic barriers. It is important that no such barriers exist because they can deter people from accessing data. Answer “Yes” if the data are made available by the government on a public website. Answer “No” if the data are NOT available online, or are available online only after registering, requesting the data from a civil servant via email, completing a contact form or going through a similar administrative process. The Index will allow you to answer this question with “Yes” even if you do not find all necessary dataset characteristics (see B3) online. Your submission still provides important information about the availability of these characteristics. Your submission data will be stored and processed for a final research report. In a review phase our reviewers will verify your submission again. If not all characteristics are available online, they will answer question B2 with “No”. For more details see our methodology page.

B2.2 Where did you find the data? This question evaluates whether the dataset is stored centrally or across several websites and sub-pages. Where data is published matters for data findability. Tell us the different websites, as well as sub-pages on a website (if applicable), where you found individual dataset characteristics. If you find the same information on several websites and sub-pages, only document those that enable you to answer the following questions with “Yes”.

B3. Please confirm that the following characteristics are present in the data published online by the government: Tell us whether you can find all required characteristics online. This is important for us to see what kind of dataset you are evaluating. If you cannot find all data characteristics online, continue answering all further questions with reference to the characteristics you did find. This information will be stored for research. Reviewers will filter out all submissions that refer to partial datasets (not containing all characteristics), and only assess datasets that contain all characteristics. All following questions have to apply to these dataset characteristics. Worked example: if you look for government spending data, check if the data file contains department, money spent, date, vendor, etc. Check if all this data is openly licensed, available to download, etc. Answer “No” in cases where a question only applies to some characteristics, and use the comment section to tell us which characteristics it applies to.

If you are unsure whether a part is contained in the dataset or not, ask the Open Data Index forum.
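The B3 check from the worked example can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the set of required characteristics is an assumption taken from the spending example above (which ends in "etc.", so the real list is longer).

```python
# Illustrative only: the Index defines the real list of required
# characteristics per dataset. These four come from the worked example.
REQUIRED_SPENDING_FIELDS = {"department", "money spent", "date", "vendor"}


def missing_characteristics(found, required=REQUIRED_SPENDING_FIELDS):
    """Return the required characteristics that were not found, sorted by name."""
    normalized = {name.strip().lower() for name in found}
    return sorted(required - normalized)


def has_all_characteristics(found, required=REQUIRED_SPENDING_FIELDS):
    """True only if every required characteristic was found."""
    return not missing_characteristics(found, required)


# Hypothetical column names seen in a published data file.
found_columns = ["Department", "Vendor", "Date"]
print(missing_characteristics(found_columns))
print(has_all_characteristics(found_columns))
```

The list of missing characteristics is also a useful thing to paste into the comment section of your submission.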

B4. Is the data available free of charge? The data is free if you don’t have to pay for it.

B5. Is the dataset downloadable at once? This question measures whether you can download all information to your computer in a few easy steps. Downloads can be organised by month or year, or broken down into sub-files for very large data files. It is important that the download is feasible in a few easy steps. Answer “No” if you have to do many manual steps to download the data, or if you can only retrieve very few parts of a large dataset at a time (for instance through a search interface).

B6. Data should be updated every [TIME INTERVAL]: Is the data up-to-date? Often, data is only useful if it is provided in a timely manner. But different information needs to be provided at different time intervals. Traffic data is mainly needed in real time, election data should be accessible immediately after an election, while national census information might not change for years. This question measures whether data is provided in a timely manner. Please base your answer on the date on which you answer this question. Sometimes the data itself indicates which date it refers to (for example, weather forecasts refer to specific dates). Be careful with publication dates and check whether the published data is actually up-to-date. Answer “No” if you cannot determine a date, or if the data are outdated. Please also use the comment function to explain why the data is outdated.
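The B6 rule can be sketched as a small date comparison. This is a minimal sketch with assumed names: `answer_b6` is a hypothetical helper, and the one-year interval in the usage lines is a placeholder for the [TIME INTERVAL] that the Index sets for each dataset.

```python
from datetime import date, timedelta


def answer_b6(latest_data_date, max_age, today=None):
    """Return "Yes" if the newest data is at most max_age old, else "No".

    latest_data_date is the date the data refers to (not necessarily the
    publication date), or None if no date can be determined.
    """
    if latest_data_date is None:
        # The tutorial says to answer "No" when no date can be determined.
        return "No"
    if today is None:
        # Base the answer on the day the question is answered.
        today = date.today()
    return "Yes" if today - latest_data_date <= max_age else "No"


# Placeholder interval; the Index defines the real one per dataset.
one_year = timedelta(days=365)
print(answer_b6(date(2016, 11, 1), one_year, today=date(2016, 12, 1)))
print(answer_b6(None, one_year, today=date(2016, 12, 1)))
```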

B7. Is the data openly licensed/in public domain? This question measures whether anyone is legally allowed to use, modify and redistribute the data for any purpose. Only then is data considered truly "open" (see also the Open Definition). Answer “Yes” if the data are openly licensed. The Open Definition provides a list of conformant licenses. Also answer “Yes” if there is no open license, but there is a statement that the dataset is in the “public domain”. To count as public domain, the dataset must not be protected by copyright, patents or similar restrictions. Statements of public domain status can be disclaimers, terms of use, or legal documents such as a national access to information law. If you are not sure whether a disclaimer is compliant with the Open Definition 2.1, seek feedback on the Open Data Index discussion forum.

B8. In which formats are the data? Tell us the file formats of the data. We automatically compare them against a list of file formats that are considered machine-readable and open. A file format is called machine-readable if your computer can process, access, and modify the dataset characteristics (see B3) that you find in a data file. Worked example: if you find a .jpg image of a national map embedded on a website, it is not considered machine-readable. The Index considers formats to be “open” if they can be fully processed with at least one free and open-source software tool. Potentially these formats allow more people to use the data, because people do not need to buy specific software to open it. Important: The Index uses a less rigid definition of open formats. It may consider some file formats to be open even if their source code is not. What counts is that the file format can be fully processed with an open-source software tool.
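The automatic comparison described for B8 amounts to a lookup of the file extension in a list. The sketch below assumes a tiny placeholder list; it is not the Index's actual list, and `matches_format_list` is a hypothetical name. A format missing from this placeholder set may still be accepted by the Index.

```python
from pathlib import PurePosixPath

# Placeholder set, assumed for illustration only. The Index maintains
# its own list of formats considered machine-readable and open.
ILLUSTRATIVE_FORMATS = {"csv", "json", "xml"}


def matches_format_list(filename, accepted=ILLUSTRATIVE_FORMATS):
    """True if the file's extension appears in the accepted set."""
    extension = PurePosixPath(filename).suffix.lower().lstrip(".")
    return extension in accepted


print(matches_format_list("spending.CSV"))
print(matches_format_list("national-map.jpg"))
```

If the format you found is not one you recognise, report it as it appears and add a note in the comment section rather than guessing.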

B9. Please provide an assessment of how easily the data are usable without human effort. Select 1 if extensive effort is required to make the data usable, select 2 if some effort is required and select 3 if little to no effort is required. Data may be in a machine-readable format like an .xls spreadsheet, but still contain unstructured information (like notes randomly written in a column). Such data often has to be cleaned to become usable. Tell us the effort it takes for you to use the data. Base your assessment on whether the data are fit for your use cases. Document any relevant feedback on usability in the comment section.

Commenting on a submission

The Index allows only one submission per dataset. However, you can still help by commenting on a current submission and proposing changes by creating a new topic in our www.discuss.okfn.org forum. Leaving detailed notes in the comment field supports the review process, too.

Here are some tips for leaving helpful comments:

If you’ve determined that a particular dataset is not collected by government, let us know where you found evidence (and what else led you to believe that it isn’t collected by government). If the dataset is not available online without restrictions to access, let us know how one could get a hold of the data, if at all. If the data is only available through a freedom of information request (FOIA), give us an indication of what is involved in making that request.

Leave comments on the availability of dataset characteristics, and whether only some parts of them are open. Where do you find the data? Do you have to visit different websites? Is it hard for you to find the data? Tell us more about the parts of the dataset you can download: is all data readily downloadable? Is some data published online, but not downloadable? Do you face barriers that prevent you from easily downloading all the required data? If you are unsure whether a license is open or not, answer “No” and indicate why in the comments field. Also document which legislation tells you more about the public domain status of the data. Is the public domain status clear to you? If not, why? Include information and/or links to licenses or terms of use pages so the reviewer can quickly make a second assessment.

If you are unsure whether a file type is machine-readable, mark “No” as an answer and explain your rationale in the comment section. Let us know why you think the data is or is not made available in a timely manner. It is important for us to understand the local context as much as possible. For example, different places have different legislative and governmental spending cycles, which affects when spending data counts as up-to-date. Such information is an invaluable help for us.