What is Open Data?
July 15, 2019
The Dog Aging Project (DAP) is an Open Data project. This means that the Dog Aging Project data will be available to the public. Anyone who is interested will be allowed to analyze the data we collect and to give presentations and write papers about their discoveries.
The data we collect will include all kinds of information about each dog and about the home and surrounding environment in which the dog lives. We will tell you about all of these data in more detail elsewhere, but to give you some idea, here’s a list of relatively simple information that we will be collecting:
- Height, weight, and age of the dog
- The zip code where the dog lives
- Number of dogs and other non-human animals in the home
- Number of humans in the home
- Medical records
- What kind of food the dog eats
- What sorts of activities the dog likes
We will also collect more technical information, including things like:
- The sequence of the dog’s genetic material
- Measures of hundreds of different molecules found in the dog’s blood
- Levels of air pollutants in the dog’s neighborhood
- The diversity of microbes in the dog’s intestinal tract
- Data from activity monitors about how much time the dog spends walking, running, or lying down
As you can see from these lists, we will be collecting a LOT of data about each canine participant. Given that the Dog Aging Project will follow tens of thousands of dogs, we expect that each year the data we collect will add up to about 100 terabytes. (A single terabyte is one trillion bytes of data!) The laptop I am using to write this paragraph can store a total of 1 terabyte of data. Each year we will collect enough data to completely fill 100 laptops like mine.
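For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of the laptop comparison. It assumes the approximate figures given above (about 100 terabytes per year and a 1-terabyte laptop) and uses the decimal definition of a terabyte. The function and variable names are purely illustrative and are not part of any Dog Aging Project software.

```python
import math

BYTES_PER_TERABYTE = 10**12  # decimal terabyte: one trillion bytes


def laptops_needed(data_terabytes: float, laptop_terabytes: float = 1.0) -> int:
    """Return how many laptops of the given capacity it takes to hold the data."""
    return math.ceil(data_terabytes / laptop_terabytes)


# Assumption: roughly 100 terabytes collected per year, as estimated in the text.
yearly_terabytes = 100
yearly_bytes = yearly_terabytes * BYTES_PER_TERABYTE

print(f"{yearly_bytes:,} bytes per year")
print(f"{laptops_needed(yearly_terabytes)} laptops of 1 terabyte each")
```

Changing `laptop_terabytes` shows how the comparison shifts for a laptop with a different amount of storage.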
Almost all of these data will be accessible to anyone who is interested in learning more about dogs. I say “almost” because, while Open Data is a core value of the Dog Aging Project, so is protecting the privacy and confidentiality of the dog owners who are so generously sharing their data with us. We will not share any information that could reveal the actual identity of our human and canine participants or any protected information about these individuals.
We are not required to make all of our data publicly available, but we believe that it is the right thing to do. It will make the science we do better and make the Dog Aging Project a resource not just for our research team but for everyone.
There are many reasons why people in the scientific community have argued for the importance of Open Data. Here are two that we think are especially important:
- The way science works is that people come up with ideas, carry out studies to test these ideas, and publish their results in peer-reviewed papers. Other people can then verify that the initial studies were carefully done and that the results are valid. This process of continuing validation allows us to trust what has come before and to build upon that previous knowledge, learning more over time. The science that we do should be repeatable, and the scientists doing that work need to be accountable for it. When outsiders can access our data, they can check our work, make sure we got everything right, and help us quickly fix errors when they do arise.
- Even though the Dog Aging Project has a big team, we will collect much more data than we ourselves could analyze in our entire careers, and there are many more questions than our team has the expertise to ask. By making the data publicly available, the Dog Aging Project and the dedicated and generous dog owners who are sharing their data will benefit not just from a few dozen really smart minds but from hundreds and even thousands of smart, creative people trying to learn from the Dog Aging Project data.
The Open Data approach encourages collaboration, cooperation, and contribution to foster the spread of knowledge. It also recognizes that the dog-owning community has shown enormous generosity in sharing data about their dogs and the environment in which each dog lives with our research team. We see ourselves as stewards of these data. It’s our responsibility to learn everything we can from these data and share this knowledge with our participants and with the next generation of scientists. We believe that this approach leads to the ultimate betterment of dogs and humankind.
Daniel Promislow
Principal Investigator