Distributed Data
February 4, 2014 1:12 PM

Academics launch torrent site to share papers and datasets

“Sharing data is hard. Emails have size limits, and setting up servers is too much work. We’ve designed a distributed system for sharing enormous datasets – for researchers, by researchers. The result is a scalable, secure, and fault-tolerant repository for data, with blazing fast download speeds,” Cohen and Lo explain.

AcademicTorrents allows researchers to upload datasets, articles and other research material. The site runs its own tracker and supports web-seeds as well, which guarantee that files are available at all times.

The focus on easily allowing sharing of large data sets differentiates Academic Torrents from similar open research initiatives such as the arXiv and the bioRxiv (previously).

posted by eviemath (11 comments total) 65 users marked this as a favorite

The role of commercial services in the future is going to be discovery, not "article renting." It's a pity that the commercial publishers don't realize this. The internet has disaggregated the journal, so the concept barely makes sense any more. The coveted Impact Factor is just as dead. It really never made sense -- your article never had more impact just by virtue of sitting next to a well-cited article (although it might have had a mildly better chance of being read) -- but now people looking at that article don't even see yours, since they are getting there by using a discovery tool of some sort rather than leafing through the paper journal. The Impact Factor got traction half a century ago because it could be calculated; now we have much richer data, so why do faculty and (especially) administrators cling to it as a badge of quality?
posted by GenjiandProust at 3:16 PM on February 4, 2014 [1 favorite]

Oh man, this is going to be awesome for teaching statistics.
posted by Blasdelb at 3:37 PM on February 4, 2014 [2 favorites]

Anyone understand/know whether the library at UMB was involved with this at all? A lot of academic libraries are setting up massive data repositories that fill a similar niche...
posted by mostly vowels at 3:49 PM on February 4, 2014

"The role of commercial services in the future is going to be discovery, not "article renting." It's a pity that the commercial publishers don't realize this. The internet has disaggregated the journal, so the concept barely makes sense any more."

You've left out managing the peer review process, which is doable but a giant pain as just a labor of love.

"The Impact Factor got traction half a century ago because it could be calculated; now we have much richer data, so why do faculty and (especially) administrators cling to it as a badge of quality?"

Distributed editorial control. The impact factor works to quantitatively follow how well editors are doing at selecting articles for review, and how good of an editor an author could convince to publish their paper. A lot of what the journal publishing system has become is simply a formalized network of community, where distinguished, motivated, and knowledgeable editors police articles for quality, rather than just the systems of quid pro quo that grow so easily in small and intellectually isolated communities like academia. Of course, like all things it is no substitute for good judgement, but it is still a valuable aid for figuring out how valuable a research community finds a paper.
posted by Blasdelb at 3:52 PM on February 4, 2014

The impact factor works to quantitatively follow how well editors are doing at selecting articles for review [...] like all things it is no substitute for good judgement, but it is still a valuable aid for figuring out how valuable a research community finds a paper

I think this is one of those things where individual fields (and citation practices) vary way too much for a blanket judgement to be useful. The impact factor numbers that I've seen calculated in the fields that I know well are honestly worse than useless, pure noise, totally at variance with the sense of the quality/prestige of the journals you'd get from talking to active researchers. If it's even remotely useful in your field, that's great, but don't take that as an indication that it is so in wider use.

The internet has disaggregated the journal, so the concept barely makes sense any more.

This seems right to me too, but the role of the editor (or the editorial board) still makes a lot of sense on intellectual grounds — although it's becoming less and less institutionally and economically viable. It's worth working to preserve the editorial role somehow, with its incumbent sense of responsibility to a field of knowledge and a community of researchers, even if it ends up moving into a different institutional infrastructure and distribution mechanism.
posted by RogerB at 4:06 PM on February 4, 2014 [2 favorites]

join my private academic tracker must have lossless data and high first author ratio
posted by klangklangston at 5:00 PM on February 4, 2014 [4 favorites]

Distributed editorial control. The impact factor works to quantitatively follow how well editors are doing at selecting articles for review, and how good of an editor an author could convince to publish their paper.

True, to some degree, although I think that it would also hold true at the level of publisher, society, or institution -- the idea of the journal itself, as a packaging unit, is kind of silly. Especially when the physical size constraint on "issues" is gone and articles are pretty much released as they are done, it's mostly fiction to justify pricing.

And I am not sure we need commercial publishers to run the review process. That seems like something that is more properly in the hands of the knowledge producers (societies and maybe institutions, in this case).
posted by GenjiandProust at 2:21 AM on February 5, 2014

"And I am not sure we need commercial publishers to run the review process. That seems like something that is more properly in the hands of the knowledge producers (societies and maybe institutions, in this case)."

The review process for both commercial and non-profit journals really is already in the hands of knowledge producers; both contract the actual intellectually heavy work out to mostly unpaid editors and reviewers. What commercial publishers do, and do very well in ways that neither institutions nor any but the largest and most solidly established societies are capable of matching, is uncouple the success of a journal from both petty drama and the ultimately fickle passion of people who have better things to be doing. Elsevier and the like generally acquire journals by swooping in when an old editor in chief who did way too much for too long dies, or an asshole gets the wrong position and is not worth dealing with for anybody, or simply when no one in the community is willing to step forward anymore. They then make it very easy for everyone involved, provide professional staff from a centralized pool, make payroll, sort out asshole management from an outside perspective, and reap massively excessive profits for the genuinely invaluable service they provide.

We need to kick these fuckers out, or at least find some way to rein them in, but we're never going to be able to do it by ignoring what they do.
posted by Blasdelb at 3:36 AM on February 5, 2014 [1 favorite]

Blasdelb, I don't ignore what they do; I just don't believe that a commercial venue is the only or the best way to organize the work.

You illuminate a critical part of the problem when you say "people who have better things to be doing." Because, really, no. They don't have "better things to be doing." Scholarly communication is part of the job, and that isn't limited to publishing your own work. I deal with an awful lot of faculty (especially in STEM) who believe that research is the only significant part of their job, and it is an absolutely toxic attitude. The excessive authority of university administrations, the corporatization of higher education, and the deep problems in scholarly communication are a few of the ugly outcomes of the belief that researchers have "better things to do" than provide service to their institutions and disciplines (let's not even talk about teaching).
posted by GenjiandProust at 6:16 AM on February 5, 2014

My friend who is a big data scientist thought this was awesome but also noted:

"There are a couple of barriers to adoption that come with using a distributed bittorrent system instead of a central repository. A lot of scientists don't have bittorrent clients, and I also recently learned that it is hard to edit files on bittorrent once they have propagated through the system. I can imagine situations where a scientist uploads a big dataset and finds out the next day that there were a few errors in it, but is unable to easily fix the error once it is already out in the wild."
posted by bluesky43 at 9:49 AM on February 5, 2014 [2 favorites]
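
[Editorial sketch, not part of the original comment.] One way to see why published torrent data is hard to edit: a BitTorrent (v1) metainfo file records a SHA-1 hash for every fixed-size piece of the data, and the torrent's identity is derived from those hashes. The toy example below is only an illustration under simplifying assumptions (a 16-byte piece length and an in-memory byte string, both chosen for the demo; `piece_hashes` is a hypothetical helper, not part of any client). It shows that a one-byte correction changes a piece hash, so the corrected data no longer matches the torrent peers are sharing.

```python
import hashlib


def piece_hashes(data: bytes, piece_length: int) -> list[str]:
    """Return the SHA-1 hex digest of each fixed-size piece of `data`."""
    return [
        hashlib.sha1(data[i:i + piece_length]).hexdigest()
        for i in range(0, len(data), piece_length)
    ]


# A tiny stand-in for a dataset: three 16-byte pieces.
original = b"a" * 16 + b"b" * 16 + b"c" * 16

# "Fix an error" by changing a single byte at offset 20 (inside piece 1).
corrected = original[:20] + b"X" + original[21:]

before = piece_hashes(original, 16)
after = piece_hashes(corrected, 16)

# Indices of pieces whose hash no longer matches the published torrent.
changed = [i for i, (x, y) in enumerate(zip(before, after)) if x != y]
print(changed)  # [1]
```

Only one piece differs, but because the hashes are part of what identifies the torrent, the corrected dataset has to be published as a new torrent rather than patched in place.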

A lot of scientists don't have bittorrent clients

Free clients are easy to download and install.

I can imagine situations where a scientist uploads a big dataset and finds out the next day that there were a few errors in it, but is unable to easily fix the error once it is already out in the wild.

1. Before you upload it, let someone unpack and try it for you. Get extra eyes on it. Run it against an app that generates output you can verify. Then upload it as a torrent. Treat your upload as carefully as publication.

2. If it's already out there, either upload the entire dataset again as a new torrent ("My Data Corrected [date]...") and let the old torrent die, or upload part of the data as a smaller torrent ("My Data - Corrections [date]...") if possible. To help with this, don't upload a torrent as a single massive file if you can break it into a number of smaller files that the user can still conveniently combine and use.
posted by pracowity at 12:25 PM on February 5, 2014 [1 favorite]
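
[Editorial sketch, not part of the original comment.] The advice above about splitting a dataset into smaller files and shipping a separate corrections torrent could be supported by a checksum manifest. The following is a minimal sketch, assuming the dataset lives in a local directory and that SHA-256 is an acceptable checksum; the function names `sha256_of`, `build_manifest`, and `changed_files` are made up for this illustration and are not part of Academic Torrents or any BitTorrent client.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large dataset files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to `root`) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files that are new or whose contents differ from the old manifest."""
    return sorted(name for name in new if old.get(name) != new[name])
```

Building a manifest before the first upload, and again after fixing an error, lets `changed_files` name only the files that belong in a smaller "Corrections" torrent. Downloaders can also compare their own copies against the published manifest as the kind of pre-publication check suggested in point 1.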

MetaFilter
