We now have many options for archiving data sets online:
Dryad, KNB, Ecological Archives, Ecology Data Papers, Ecological Data, etc.
However, as far as I know, these portals largely do not communicate with one another, and there is no way to search across all of them at once. So I wonder whether these data sets would be easier to find if the different sites cloned them to a site like Infochimps, or at least linked to them from there. Infochimps already has APIs (and Drew Conway has already written an R wrapper for the Infochimps API: http://cran.r-project.org/web/packages/infochimps/index.html), and it has discussions set up as well.
Does it make sense to post data sets linked to published works on Infochimps? Probably not, now that I think about it. But it may make sense for other data sets, or subsets of data sets, that are not linked to published works, since I know that at least Dryad only accepts data sets linked to published papers.
One use case: someone tweeted recently that his students were excited about listing their data sets on their resume/CV, but did not think there was anywhere to put them that did not require the data set to be linked to a published work. This seems like a good opportunity to place those data sets on Infochimps, where they would at least be available in a place many people already search for data.
What I think would be ideal is if Dryad, KNB, etc. could link their data sets to Infochimps, where they could be found. Users could then either get the data directly from Infochimps or follow a link to the hosting site (Dryad, for example). Either way, you could at least search over all ecological data sets in one place.
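To make the idea a bit more concrete, here is a rough sketch of what such a shared index might look like: each repository contributes a small metadata record with a link back to wherever the data actually lives, and a search runs over all records at once. This is only an illustration under my own assumptions. The `DatasetRecord` class, the `search` function, and the records themselves (titles, keywords, and `example.org` URLs) are all made up, and nothing here calls the real Infochimps, Dryad, or KNB APIs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical metadata record for one data set in a shared index."""
    title: str
    source: str                      # repository hosting the data, e.g. "Dryad"
    url: str                         # link back to the hosting repository
    keywords: Tuple[str, ...] = ()


def search(index: List[DatasetRecord], term: str) -> List[DatasetRecord]:
    """Return records whose title or keywords contain term (case-insensitive)."""
    needle = term.lower()
    return [
        record
        for record in index
        if needle in record.title.lower()
        or any(needle in keyword.lower() for keyword in record.keywords)
    ]


# Placeholder records, invented purely for illustration.
index = [
    DatasetRecord("Example plant-pollinator network", "Dryad",
                  "https://example.org/dryad/1", ("pollination", "networks")),
    DatasetRecord("Example stream invertebrate survey", "KNB",
                  "https://example.org/knb/2", ("invertebrates", "streams")),
    DatasetRecord("Example unpublished student seed counts", "Infochimps",
                  "https://example.org/infochimps/3", ("seeds", "plant")),
]

# One search covers every repository; each hit links back to its host.
for record in search(index, "plant"):
    print(record.source, "|", record.title, "|", record.url)
```

The point of the design is that the index only needs titles, keywords, and links. The data sets themselves can stay with Dryad, KNB, or wherever they were deposited.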