Dataverse FAQ

Who may submit datasets to Dataverse?

Any employee (faculty/staff) or student of the University may submit datasets. 

Considerations:
George Mason University claims ownership rights to intellectual property, including data, generated with significant University resources. Therefore, a researcher may not be able to claim sole ownership of their data and may need to consult with the University before placing the dataset in Dataverse. Please see University Policy 4011, Ownership and Maintenance of Research Records, for more information, or consult with the Office of Research for guidance as to whether you may make your data available through Dataverse.

Please consider whether other researchers or colleagues have rights to manage the release of the data, for example others involved with the grant, research activity, or laboratory, whether at George Mason or at another institution. If so, you must obtain their permission to deposit the data in the George Mason University Dataverse.

What can be deposited in Dataverse?

Examples of appropriate data for deposit would include:

  • data already made publicly available through another repository
  • data required by a funding agency to be made publicly available (which does not include sensitive or confidential information)
  • data required by a journal to be made publicly available
  • scholarly data which does not include sensitive or confidential information

In all cases depositors should be careful to ensure that the content they submit contains no confidential or sensitive information. Sensitive and personally identifiable data is highly regulated under federal, state and University policy. Prior to supplying any data for public archiving and distribution, you must remove any confidential or sensitive information, student education records protected under FERPA, and all information that personally identifies any individual or that contains any information classified as highly sensitive under state or federal law, or George Mason policy.

For these purposes, highly sensitive data currently includes personal information that can lead to identity theft if exposed and health information that reveals an individual’s health condition and/or history of health services use. 

For details of each type, see the full text of the Removing Personally Identifiable Information checklist.

  1. Personal information that, if exposed, can lead to identity theft.
  2. Personally Identifiable Information (PII) is information that, if exposed, can be used on its own or with other information to identify, contact, or locate a single person, or to identify an individual in context.
  3. Health information that, if exposed, can reveal an individual’s health condition and/or history of health services use.
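As a rough first pass before working through the checklist by hand, a short script can flag text that looks like an identifier. The sketch below is illustrative only: the function name `find_possible_pii` and the two patterns (U.S. Social Security number format and email addresses) are assumptions chosen for the example, not part of any University tool or policy. An empty result does not mean the data is free of sensitive information.

```python
import re

# Hypothetical patterns for illustration; real PII takes many more forms
# (names, addresses, dates of birth, medical record numbers, and so on).
PATTERNS = {
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def find_possible_pii(text):
    """Return (label, matched_text) pairs for every pattern hit in text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits
```

Any hit should be reviewed manually, and columns that identify individuals should be removed or de-identified even when no pattern matches.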

For more on University policy, see:

You should review the MARS Dataverse Dataset Public Deposit License now to gain a better understanding of the types of data that can appropriately be deposited for public access through Dataverse. You will also be asked to confirm the following points before data deposit:

  • That you have read the deposit license and affirm that you have the legal right and authorization to make this data publicly available online for world-wide unrestricted access through Dataverse.
  • In preparing the data for public archiving and distribution, you have removed any confidential or sensitive information, student education records protected under FERPA, and all information that personally identifies any individual or that contains any information classified as highly sensitive under state or federal law, or George Mason University policy.
  • If the submission is based upon work that has been sponsored or supported by an agency or organization other than George Mason University such as the National Institutes of Health, the National Science Foundation, or a private sponsor or funder, you represent that you have fulfilled any right of review, confidentiality, or other obligations required by that contract or agreement.
  • You represent that you have made a reasonable effort to ensure that the data contained in your submission is accurate.
  • You represent that you have appropriately acknowledged other researchers whose work contributed to the data.

Are there restrictions on file type and size?

The limit on individual self-deposited files is 6 GB. If you need to upload a larger file, or a dataset larger than 50 GB in total, contact us.
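To check a folder against these limits before uploading, something like the following sketch may help. The function names are hypothetical, and the script assumes decimal gigabytes (10^9 bytes), since this FAQ does not say how a GB is counted.

```python
from pathlib import Path

GB = 10**9  # assumption: decimal gigabytes
MAX_FILE_BYTES = 6 * GB      # per-file self-deposit limit stated above
MAX_DATASET_BYTES = 50 * GB  # dataset size above which to contact us


def check_sizes(sizes):
    """Check a mapping of file name -> size in bytes against the limits.

    Returns (oversized_files, total_bytes, dataset_too_large).
    """
    oversized = sorted(name for name, size in sizes.items() if size > MAX_FILE_BYTES)
    total = sum(sizes.values())
    return oversized, total, total > MAX_DATASET_BYTES


def folder_sizes(folder):
    """Collect the size of every regular file under folder, recursively."""
    root = Path(folder)
    return {
        str(path.relative_to(root)): path.stat().st_size
        for path in root.rglob("*")
        if path.is_file()
    }
```

For example, `check_sizes(folder_sizes("my_dataset"))` reports which files in a hypothetical `my_dataset` folder exceed the per-file limit and whether the whole folder exceeds 50 GB.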

Any file format is accepted. Keep in mind, however, that some formats are more likely to remain readable in the future, for example plain text files or files in commonly used non-proprietary formats for which conversion utilities are more likely to be available.

How will other users understand my data?

To help other users understand your data, we recommend submitting a readme file along with it. The readme file should be a plain text or PDF file and include the following types of information, in as much detail as possible:

  • Abstract which includes a brief description of the study which generated the dataset, including methodology.
  • Information on the contents of specific files if more than one is included in the dataset.
  • Variable names which are unambiguous and consistent, especially if you are submitting multiple data files from the same study.
  • Row and column identifiers which are clearly identified and defined.
  • Explanation of codes and classification schemes used, especially an interpretation of any values that are not obvious.
  • Algorithms used to transform data.
  • File format and software (including version) used.
  • Terms of use which require end users to cite and acknowledge the data creator (you).
  • Citation reference if you have a particular one you wish others to use to acknowledge your work.
  • Citation and acknowledgement information if your data was generated by co-researchers, co-researchers at other labs, or co-researchers at other institutions.
  • Citation information if you re-purposed a data file from someone else. You should also include a link to their original data file.

Refer to these readme file templates from the University of Virginia and Cornell University. Additionally, we encourage including a link to publications (e.g., journal articles) that relate to the data being deposited, which will provide users with context and analysis for the data beyond the technical explanation of how the data was collected and how it is organized.
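If you prefer to start from a blank outline, the readme items listed above can be laid out as a plain-text skeleton. The sketch below is only one possible layout: the section headings are paraphrased from the list in this FAQ, and the function name `build_readme` and the `TODO` placeholders are assumptions for illustration, not an official template.

```python
# Section headings paraphrased from the readme items in this FAQ.
README_SECTIONS = [
    "Abstract (study description and methodology)",
    "File contents",
    "Variable names",
    "Row and column identifiers",
    "Codes and classification schemes",
    "Algorithms used to transform data",
    "File format and software (with version)",
    "Terms of use",
    "Preferred citation",
    "Co-researcher citation and acknowledgement",
    "Source of re-purposed data (with link)",
]


def build_readme(title, sections=README_SECTIONS):
    """Return a plain-text readme skeleton with one placeholder per section."""
    lines = [title, "=" * len(title), ""]
    for heading in sections:
        lines += [heading, "-" * len(heading), "TODO", ""]
    return "\n".join(lines)
```

Calling `build_readme("My dataset title")` returns the skeleton as a string, which can be saved as a plain text file and filled in section by section.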

Why do I need to agree to the deposit license, and what does it say?

While individual facts and observations are not protected by copyright and are free for all to copy and reuse, datasets include not only observed facts but also expressive content that could qualify for copyright protection. Accordingly, George Mason University needs the author(s)’ permission to store, display, and distribute these aspects of the datasets, and members of the public need permission to download and reuse them. And, because some aspects of a dataset can raise concerns about privacy, confidentiality, and other important legal interests, we need you to represent that your data is free from these potential hazards.

Specifically, the deposit license requires you to affirm that:

  • You have the legal right and authorization to make this data publicly available online worldwide through Dataverse.
  • You have removed any confidential or sensitive information, including all information that personally identifies any individual or that contains any information that should not be made public under state or federal law, or George Mason University policy.
  • You have fulfilled any right of review, confidentiality, or other obligations imposed under any sponsored research agreement.
  • You have made a reasonable effort to ensure that the data contained in your submission is accurate.
  • You have appropriately acknowledged other researchers whose work contributed to the data.

The Deposit License confirms that you wish to make the material available through Dataverse to the public, to allow certain educational, non-commercial, public uses of the material under a CC0 license (by default) or your own custom license, and to allow for preservation by the University. It is a non-exclusive agreement, meaning you retain the rights to use your data however you like. However, if you use a CC0 license, you are surrendering the relatively thin rights you may have had to limit others’ use of copyrightable aspects of the dataset.

See the full text of the Deposit License for more information.

Who can access Dataverse to search for and download items?

Dataverse is an open access repository, meaning that anyone can search, view and download content. The default license, CC0 (public domain dedication), permits unlimited reuse by anyone who accesses the repository. This is consistent with the purpose of open data repositories, namely, to encourage the widest possible reuse of data. You may choose to use a custom license to limit reuse. This will make your data less useful to researchers, but may be required by some funders or institutions. Contact datahelp@gmu.edu if you need to modify the default license.

May I delete or change a dataset that I have added to Dataverse?

Because the repository is meant for scholarly work that is as close as possible to its final or published form, items cannot be deleted once they are deposited in Dataverse. Scholars should only deposit in Dataverse the version to which they intend to provide permanent open access.

If the Library determines or is made aware that a deposited dataset contains non-public sensitive data or information, it retains the option of removing public access to the data at its discretion. If you, the depositor, become aware of a rights or privacy problem with your data deposit, please let us know immediately so that the Library can take prompt, appropriate action.

Are items in Dataverse guaranteed to remain available in perpetuity?

University Libraries and the Office of Research Computing are committed to the durability and sustainability of scholarship deposited in Dataverse. Dataverse uses standard data management practices, including security and backup procedures, to provide a reasonable assurance that files will remain retrievable over time. However, since no technology can guarantee permanent access, we urge scholars to keep personal copies of their files.

The University reserves the right to remove content that is deposited out of compliance with the standard terms set forth in the Data Deposit License, or content that is deemed inappropriate for public viewing.

How can I find more information on using and depositing in Dataverse?

How do I get started depositing in Dataverse?

The best place to start is with the Data Deposit Checklist.

If you have any questions, please don't hesitate to contact us at datahelp@gmu.edu.