Pijul
April 11, 2016 5:57 AM

Pijul is a distributed version control system that combines the patch-theory-based approach of Darcs with the snapshot-based approach of git, mercurial, etc. (NH)

It's written in Rust and structured as a library, avoiding several headaches from git.
posted by jeffburdges (42 comments total) 16 users marked this as a favorite
 
We have few posts on Rust, so it's worth noting that the introduction to the Rustonomicon is amusing.
posted by jeffburdges at 5:58 AM on April 11, 2016 [1 favorite]


Oh, first click:

Casual users be warned, the stuff you're about to read is not for the faint of heart

A few more posts like this take off and we'll be ready to propose the next subsite: math.metafilter.com
posted by sammyo at 6:42 AM on April 11, 2016 [2 favorites]


why is libgit2 the link for "headaches"? i have had recent headaches with git, but they had nothing to do with a particular library (i couldn't work out how to merge particular paths - seems like checkout can work with paths, but not merge. was i missing something or would the patch based approach have helped?)
posted by andrewcooke at 6:51 AM on April 11, 2016


why is libgit2 the link for "headaches"

Perhaps the headache is that separate library has to be maintained for this kind of easy integration?

In any case, this looks promising, and once it has binary support I'll be trying it out. Although I don't think anything can unseat git as the reigning king of the dvcs world.
posted by dis_integration at 6:56 AM on April 11, 2016 [1 favorite]


Darcs' badmerge example is my living dread. I work on a large codebase with ~10 coders actively committing on a 4 week release cycle. We use git and git flow and this brings a ton of sanity in comparison to what happened with SVN, but there's still a lot of committing to the same files. As an interview question, we actually describe what happened to us when one dev did a git revert on a merge from the develop branch*, and how you detect/avoid that issue--as far as we can tell, you don't, you're just fucked.

So you could say that we depend on merge conflicts to force human intervention and avoid automated fuckups, and I've always wondered how many times a successful merge is subverting intended functionality.

That said, Darcs has a problem where the superset of commits it's spawning grows exponentially, which is obviously unworkable in our case because our git history is already way too large--and our argument for switching to git was that our SVN history needed pruning.

Rust is on my short list of technologies to learn because it's cool and genuinely new, and source control is getting to be a very advanced problem where we're into, like, third or fourth generation issues. Might be interesting to analyze Pijul in terms of high-volume use by a large team on a large codebase.

* In git, revert creates an anti-commit negating the reverted commit; reverting a merge creates a commit that reverts all the commits of one of the branches, back to the last common ancestor of both. When smarthead merged from develop into his feature branch and then reverted it in his branch's favour, he created a commit undoing a month of commits to the develop branch. Weeks later when he merged his branch back to develop, he removed a month of work from two months ago, without error or warning. It just disappeared from the code which went back to its previous working state.
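A toy model may make the footnote concrete. It is a minimal sketch, assuming each snapshot can be treated as a set of named chunks of work (`core`, `dev_work` and so on are made-up names), with a simplified set-based three-way merge standing in for git's real line-based one. It shows why nothing conflicted: once the revert sits on the feature branch, the final merge reads it as a deliberate deletion of the develop work.

```python
def merge3(base: frozenset, ours: frozenset, theirs: frozenset) -> frozenset:
    """Toy three-way merge over sets of chunks of work.

    Keep what both sides have, plus anything either side added since the
    base; anything one side deleted relative to the base is dropped.
    """
    return (ours & theirs) | (ours - base) | (theirs - base)


base = frozenset({"core"})          # common ancestor, two months ago
develop_1 = base | {"dev_work"}     # a month of work on develop
feature_1 = base | {"feature_x"}    # work on the feature branch

# Merge develop into the feature branch.
merged = merge3(base, feature_1, develop_1)

# `git revert -m 1 <merge>` acts roughly like a three-way merge whose base is
# the merge commit and whose other side is its first parent, so the result
# is just the feature branch's pre-merge snapshot again.
reverted = merge3(merged, merged, feature_1)

# Develop keeps moving. Because the earlier merge made develop_1 an ancestor
# of the feature branch, develop_1 becomes the merge base, and the revert
# looks like "the feature branch deleted dev_work".
develop_2 = develop_1 | {"more_dev"}
final = merge3(develop_1, develop_2, reverted)

print(sorted(final))  # ['core', 'feature_x', 'more_dev']: dev_work is gone, no conflict
```

Real git merges hunks of lines rather than whole sets, but the merge-base arithmetic that makes the work vanish silently is the same idea.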
posted by fatbird at 6:57 AM on April 11, 2016 [9 favorites]


Noooo! I've just learned git in the last two years, can't I use the same technology for at least five years before having to switch to something else?
posted by octothorpe at 7:02 AM on April 11, 2016 [3 favorites]


have you tried using react node dockers to patch your multi-tips? sometimes that helps in switching.
posted by indubitable at 7:06 AM on April 11, 2016 [11 favorites]


Yes, I meant libgit2's necessity. There are many "under the hood" applications for DVCS-like libraries that git cannot cover because integrating it into anything user-friendly seems hopeless. It'd rock if Pijul could fill that role.

In particular, we need to build collaboration tools on top of privacy-preserving messaging protocols, which requires merge algorithms that fail extremely rarely. I think Google Docs does something similar internally, although they do not use CRDTs everywhere.
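For readers wondering what a CRDT buys you: a conflict-free replicated data type defines a merge that is commutative, associative and idempotent, so replicas converge regardless of message order or duplication, and the merge itself can never fail. Below is a minimal sketch of the simplest one, a grow-only counter; the class and method names are illustrative, and collaborative text editing needs much richer sequence CRDTs.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged by element-wise max."""
    replica_id: str
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, n: int = 1) -> None:
        # Each replica only ever bumps its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so applying merges in any order, or more than once, is harmless.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self) -> int:
        return sum(self.counts.values())


alice, bob = GCounter("alice"), GCounter("bob")
alice.increment(2)
bob.increment(3)
alice.merge(bob)
bob.merge(alice)
bob.merge(alice)  # a duplicated message changes nothing
print(alice.value(), bob.value())  # 5 5
```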
posted by jeffburdges at 7:08 AM on April 11, 2016 [1 favorite]


Noooo! I've just learned git in the last two years, can't I use the same technology for at least five years before having to switch to something else?
Lucky for you, it’s AGPL, which means effectively nobody will ever use it.
posted by reluctant early bird at 7:09 AM on April 11, 2016 [5 favorites]


Did you solve the “exponential merge problem” darcs has?

Yes, we solved the exponential merge problem. There are two minor caveats, though:

- Pijul does not (yet) have an equivalent of darcs replace. In other words, Pijul works in polynomial time for all patches that systems other than darcs know of. We’ve not yet thought all the theory of this through, but it might be added in the future.

- Although most patches are invertible in Pijul, patches resolving conflicts are not. They can still be unrecorded, but not rolled back (the standard inverses delete lines). This is a theoretical obstruction, coming from the simple fact that not all repositories are isomorphic.
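To illustrate what "invertible" means here, a toy sketch, not Pijul's actual patch format: each line carries a made-up unique id, line order is ignored, and a patch records the text of anything it deletes, so its inverse just swaps additions and deletions. The conflict-resolution case described above is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Patch:
    """Toy patch: lines added and lines deleted, keyed by a unique line id.

    Deleted lines keep their text so the patch can be undone.
    """
    added: dict[str, str]
    deleted: dict[str, str]

    def inverse(self) -> "Patch":
        # Undoing a patch deletes what it added and restores what it deleted.
        return Patch(added=dict(self.deleted), deleted=dict(self.added))


def apply(doc: dict[str, str], patch: Patch) -> dict[str, str]:
    """Apply a patch to a document, refusing if its preconditions don't hold."""
    new = dict(doc)
    for line_id, text in patch.deleted.items():
        if new.get(line_id) != text:
            raise ValueError(f"line {line_id!r} not present as expected")
        del new[line_id]
    for line_id, text in patch.added.items():
        if line_id in new:
            raise ValueError(f"line {line_id!r} already exists")
        new[line_id] = text
    return new


doc = {"l1": "hello", "l2": "world"}
p = Patch(added={"l3": "again"}, deleted={})
after = apply(doc, p)
print(apply(after, p.inverse()) == doc)  # True
```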

posted by jeffburdges at 7:12 AM on April 11, 2016


Nothing above the fold in this post read as a real sentence to me. It was like those word salad bits you get in spam emails.

I'm not complaining, mind you. This post is not aimed at me. I do the same thing within my own field. But this is one of the more impressive displays of tech speak I've seen in a while! :)
posted by maryr at 7:13 AM on April 11, 2016 [2 favorites]


They're gonna need a catchier name first. Git = catchy. PEE-jewel ... Eh.
posted by freecellwizard at 7:17 AM on April 11, 2016 [1 favorite]


> Lucky for you, it’s AGPL, which means effectively nobody will ever use it

Why is the license on a DVCS an impediment to usage? You don't have to distribute it with your software.
posted by Horselover Fat at 7:23 AM on April 11, 2016


If I'm reading the Wiki summary of the AGPL right, though, it does guarantee that nobody can make a GitHub out of this (as in an actual business).
posted by indubitable at 7:40 AM on April 11, 2016 [1 favorite]


You don't have to distribute it with your software.

If I'm reading the Wiki summary of the AGPL right, though, it does guarantee that nobody can make a GitHub out of this (as in an actual business).

Exactly, they'll have to give up the source code to any special sauce they create. Which is sort of the point, but given the web-based nature of the current tech giants, it went down like a lead balloon.
posted by zabuni at 7:44 AM on April 11, 2016


My understanding is that AGPL lets the original maker of the project block anyone else from starting a commercial business. Since they are the ones who licensed the code out to the world they can choose to license it out under other terms as well.
posted by humanfont at 7:55 AM on April 11, 2016


I think Git itself is GPLed, btw. If, however, you wanted to use this "under the hood" in a collaboration tool, then the AGPL, etc. creates an impediment to distribution on iOS. Ain't so likely you'll really want a spreadsheet on iOS, though, anyway; probably just the simpler messaging components that do not need this. And it's easy enough to simply ask small projects like this one for an exception for that purpose.*

You can build a GitHub on partially AGPL code, of course. You'd release your changes to Pijul itself, but they'd be minor, and you'd want to release them anyway to simplify maintenance. You could keep the site itself proprietary since it's a completely separate program. Really, there is zero chance your "secret sauce" is going to be an Apache module that links to Pijul. It's really all user interface, layout, etc. work. And really you'd interact with Pijul via some RESTful microservice thing anyway.

In general, these GPL-like licenses assist considerably in building and maintaining a community, while the BSD, MIT, etc. licenses work best if you start with corporate adoption. In particular, these GPL-ish licenses usually make sense if you're entering an established arena like say DVCSs and nobody will get paid.

* I posted the following to a cryptography mailing list recently:

There are many CLAs being imposed by projects specifically to deal with the iOS situation. I think this is especially common for cryptographic projects, which view protecting as many people's privacy as possible as currently more important than defending many of the other freedoms protected by the GPL. In fact, there are cryptography projects using weak licenses like MIT or BSD for this reason.

We probably need a GPLv3 variant adapted to these app stores that says, basically:

Anyone and/or the project may act as a license-transferring party to distribute this program through channels whose policies conflict with the GPL, such as app stores, subject to the license-transferring party (a) assuming the distributor's obligations under the GPLv3, and (b) preventing the distributor from stealing the code.

To enforce this, the license-transferring party: (1) must not allow the distributor to charge a fee, (2) must not grant the distributor the right to modify the software, (3) must not agree to automatic license modifications by the distributor that might give them the ability to modify the software, (4) must publish the source code used by the distributor, and (5) must provide reproducible build instructions for reproducing the distributor's exact version from source.


I believe this reproducible-builds requirement is important to ensure that versions distributed by an app store can be verified as not being backdoored. There might be additional requirements to sign the binary or publish the signature, hash, etc. with the license creator.
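The verification step could be as simple as rebuilding from the published source and comparing a digest against the one published for the store version. A minimal sketch, assuming a SHA-256 hex digest is what gets published (the function name is illustrative); in practice store-side signing or encryption may mean the binaries have to be normalised before comparison, which is not shown.

```python
import hashlib


def verify_against_published(local_build: bytes, published_sha256: str) -> bool:
    """Check a locally reproduced build against a published SHA-256 digest.

    With a reproducible build, anyone can rebuild from source and compare;
    a mismatch means the distributed binary is not what the source produces.
    """
    return hashlib.sha256(local_build).hexdigest() == published_sha256.lower()


build = b"bytes of the locally rebuilt app"
published = hashlib.sha256(build).hexdigest()
print(verify_against_published(build, published))  # True
```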

It'd need a framework for updates, similar to the GPL's "or later" clause, that allows the license creator to handle changes in the app stores' policies. In general, the license creator should simply accommodate the changes, preventing the app store from even knowing what software was restricted by this license. If, however, changes caused serious problems, like allowing apps to be backdoored, then the license creator could pull any or all covered apps off that app store, maybe giving them some leverage to negotiate, or impose additional requirements on license-transferring parties to help users discover if they were under surveillance.

posted by jeffburdges at 8:09 AM on April 11, 2016


Git was designed for the developers of the Linux kernel. It seems to have solved a number of issues, technical and social, and has become very popular. With the vast drop in storage costs companies like Github basically give away infinite online storage space for public projects. But as fatbird relates above it's not a perfect fit for every situation and development group.

The scenario where a month's work vanishes after a merge makes folks crazy, but it's not the worst case. Say there is a setting, a line of code that has its value invisibly reverted after, say, a new regulation, and that can cost a company money or lives. The problem with Git is that it makes merging too easy. Don't merge, ever (well, unless you have to), and even then merge manually, with care, by someone who knows the code base.
posted by sammyo at 8:18 AM on April 11, 2016


Yeah I don't see how AGPL makes this a non-starter. If github released all its source code, and someone made a github competitor that was basically identical, does that mean people would leave github? Hardly, since like so many web services, they are the market leaders because they caught fire first, and now have enough of a critical mass that everyone expects you to have a github etc. Nobody puts "show me your bitbucket account" on their job advertisements. Similarly, facebook could release all of its code, even its special advertising magic, and it would still stay facebook. Their massive user base is what makes their code valuable, not the other way around.
posted by dis_integration at 8:18 AM on April 11, 2016


git isn't the key technology, it's GitHub. That could just as easily have been BkHub or HgHub or BzrHub or DarcsHub or PijulHub. But instead it's Git, so here we are, rebasing like there's no tomorrow. (The one non-option is SvnHub; RIP collab.net.)
posted by Nelson at 8:22 AM on April 11, 2016 [1 favorite]


Do you guys not do code reviews on pull requests?
posted by Artw at 8:25 AM on April 11, 2016 [7 favorites]


as per octothorpe: Nooooooo! from me too ;-)

Forgive me; just finished a few months' contract involving Node and Angular (yuck x 2; some new and harder ways to make the same old webapp mistakes. This is progress?), so I'm a little bit brittle about more new stuff right now.
posted by Artful Codger at 8:35 AM on April 11, 2016 [1 favorite]


Pull requests don't scale.

A pull request is a terrible unit of code review, and is a classic case where something that's easy for the computer to do--display a set of diffs--is a terrible human factor. Small PRs are fine, because you can read it and grasp the larger import on the codebase, especially if it's a settings change. But when you're up to hundreds of files and thousands of lines of change, it's a wall of diff output that becomes visual noise--and that's without others commenting on lines of code and having conversations on top of the PR.

Bitbucket frequently gives up on trying to display all the files in a PR that we submit, offering a sad little link to display it if you actually get that far down the page. But practically no one does because, faced with a wall of diff output, you settle back to a superficial read, checking variable names and whatnot. What you don't really do is exactly what you most need to do, which is model the change in your head and try to grasp the consequences.

You can say this is really a policy/process issue, and you're right, it is. gits gonna git; darcs gonna darc. If we're submitting PRs too large to properly review, that's not git's fault. What we're finding here is the practical limits of using source control as team management tool. Git got us a lot further than before, but we're facing another wall now, which is managing a higher rate of change in the codebase.

Say there is a setting, a line of code that has its value invisibly reverted after say a new regulation that can cost a company money or lives.

This is an easier case: your integration tests should catch this. If a change could cost millions, you should absolutely be testing "has this value changed? Has the output of this function depending on this value changed?" Our problem with git revert is that the tests got reverted too.
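A pinned-value test of the kind described might look like this minimal pytest-style sketch. The constant, its value, and `dose_allowed` are hypothetical stand-ins for a regulated setting; as the comment notes, it only helps if the revert doesn't take the test with it.

```python
# Hypothetical production code: a regulated limit and a function that uses it.
MAX_DAILY_DOSE_MG = 4000


def dose_allowed(total_mg: int) -> bool:
    return total_mg <= MAX_DAILY_DOSE_MG


# Pinned-value tests: they fail loudly if a merge or revert silently
# changes the constant or the behaviour that depends on it.
def test_limit_is_pinned() -> None:
    assert MAX_DAILY_DOSE_MG == 4000, "regulated limit changed; needs sign-off"


def test_behaviour_at_boundary() -> None:
    assert dose_allowed(4000)
    assert not dose_allowed(4001)


if __name__ == "__main__":
    test_limit_is_pinned()
    test_behaviour_at_boundary()
    print("pinned-value tests passed")
```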
posted by fatbird at 8:43 AM on April 11, 2016 [4 favorites]


integration tests should catch this.

Lol, was just thinking about this case, I guess be sure to keep code and tests in separate repositories. :-)
posted by sammyo at 8:52 AM on April 11, 2016 [1 favorite]


sammyo: "Lol, was just thinking about this case, I guess be sure to keep code and tests in separate repositories. :-)"

"Ughhhh, why are you making things so complex?"

"Because we're writing software for a heavily regulated and scrutinized sector. If you can't get behind this, we'll be happy to write you a letter of recommendation."

*shakes fist*
posted by boo_radley at 8:56 AM on April 11, 2016


One semi-plausible solution to the revert issue is to maintain a text file in the repo that includes a token of some sort for each commit/feature/issue/whatever (such as the JIRA ticket number for the work item that led to the commit) added by a pre-commit hook, and have another pre-commit hook (or better, a hook on the central repo out of the nasty dev's control) to check if that file shrinks between diffs, which it should almost never do, and start launching flares if it does.
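The ledger idea could be checked with something like the sketch below. It assumes a hypothetical ledger file with one ticket token per line; in a server-side hook you would fetch the old and new versions with `git show <rev>:<path>` and pass the text in. Rather than comparing file sizes, it flags any token that disappeared, which also catches a removal masked by new additions.

```python
def removed_tokens(old_ledger: str, new_ledger: str) -> set[str]:
    """Return tokens present in the old ledger but missing from the new one."""
    old = {line.strip() for line in old_ledger.splitlines() if line.strip()}
    new = {line.strip() for line in new_ledger.splitlines() if line.strip()}
    return old - new


def check_push(old_ledger: str, new_ledger: str) -> None:
    """Exit non-zero (which rejects the push in a hook) if any token vanished."""
    missing = removed_tokens(old_ledger, new_ledger)
    if missing:
        raise SystemExit(f"push rejected: ledger lost {sorted(missing)}")


print(removed_tokens("JIRA-101\nJIRA-102\n", "JIRA-101\n"))  # {'JIRA-102'}
```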
posted by fatbird at 9:01 AM on April 11, 2016


Do you guys not do code reviews?

ha ha ha. ha. sniff. sob.
posted by andrewcooke at 9:04 AM on April 11, 2016 [6 favorites]


So, Rust seems to facilitate safety beyond just type safety. The "written" link in the FPP seems to be built around that. Is this the appeal of Rust, like Go and concurrency? I've only skimmed stuff about Rust so I haven't been able to figure out what its main thing is.
posted by ignignokt at 9:04 AM on April 11, 2016


Noooo! I've just learned git in the last two years, can't I use the same technology for at least five years before having to switch to something else?

The Pijul website was started in September of 2015, and has seen eleven short blog posts since then, though the docs have had some love. Pijul itself is on version 0.2, and as yet has no mailing list or IRC channel for discussion. It may well become the new hotness, but I’m pretty sure that git isn’t going to be subsumed for a while.
posted by Going To Maine at 9:05 AM on April 11, 2016


Rust seems to facilitate safety beyond just type safety.... Is this the appeal of Rust, like Go and concurrency?

Rust's borrow-checker seems to be something really, fundamentally new in programming languages and offers a third option after manual memory management and garbage collection. Don't quote me, but it looks like a pretty basic advance in language design, offering all kinds of possibilities for static analysis to detect and prevent errors, or to require you to code in much more memory safe ways.

It's got other features, but this is one of the biggest ones. If it works out as well as it promises, it offers C-level performance and determinism, without the endless memory issues of C-languages.
posted by fatbird at 9:22 AM on April 11, 2016 [3 favorites]


Rust does memory management in the type system, ignignokt. It constrains one slightly but prevents some parallelism bugs. And you can temporarily drop those protections to work with raw pointers when necessary, like when doing an FFI, implementing some collection data types, etc.

I liked the talk Guaranteeing memory safety in Rust by Niko Matsakis. And relevant sections in the Rust book and Rust by Example.
posted by jeffburdges at 9:31 AM on April 11, 2016 [3 favorites]


When smarthead merged from develop into

This reads like horror porn for devs.
posted by Slothrup at 9:46 AM on April 11, 2016 [3 favorites]


git isn't the key technology, it's GitHub.

I'd have to disagree with that somewhat -- when GitHub came onto the scene, git was arguably the fastest and most mature DVCS in widespread use. GitHub's social features would have been very difficult to scale if they used something like SVN as their backing VCS.

OTOH, I'm starting to resent GitHub's ubiquity. Many applications/tools now specifically work with GitHub, rather than git, imposing a completely artificial limitation that circumvents one of the core reasons for having a DVCS in the first place.
posted by schmod at 9:56 AM on April 11, 2016 [1 favorite]


You know, when I woke up this morning my first thought was "I hope someone starts evangelizing a new, completely different version control system which will inevitably become attractive to FOMO-burdened IT departments everywhere and which I will then be subsequently forced to learn, causing me to lose countless hours of productivity in the meantime."
posted by grumpybear69 at 9:59 AM on April 11, 2016 [7 favorites]


In general, these GPL-like licenses assist considerably in building and maintaining a community, while the BSD, MIT, etc. licenses work best if you start with corporate adoption. In particular, these GPL-ish licenses usually make sense if you're entering an established arena like say DVCSs and nobody will get paid.

I agree that the merit of GPL-like versus BSD-like licenses is definitely dependent on the existing commercial/non-commercial players in the market for particular code, though I think you can build a community around BSD-like code, a good example of which is the scientific Python community.

I also think that people often overestimate the protections provided by the GPL in preventing others from profiting off their code without contributing changes back. In particular, it's possible to create a proprietary fork of GPL code, host it on a server "in the cloud" and allow calls into it through an API. Using this scheme you technically do not distribute any code. There are GPL-derived licenses that seek to close this loophole, but by its nature it's difficult to enforce, and the definitions of a "derivative work" and "distribution" are notoriously tricky.
posted by anifinder at 10:21 AM on April 11, 2016 [1 favorite]


definitions of a "derivative work" and "distribution" are notoriously tricky.

Oh that link, it's like there needs to be a full on "licence compiler" with a licence-lint to validate that the licence is both valid and does what the owner actually wishes. Perhaps a meta-graphical-licence-design-language?

tl;dr: not reading, just wanna code, let'em have the damned bits...
posted by sammyo at 12:01 PM on April 11, 2016


There are plenty of companies that are terrified of the AGPL and won't let it anywhere near their devs. It's a significant barrier to adoption even if the license doesn't attach to people using the VCS for all normal purposes.
posted by zachlipton at 5:05 PM on April 11, 2016


fatbird: A user that's done exactly `git merge master; git revert HEAD -m1` is detectable by comparing tree hashes, and thus rejectable by the server via a git hook. Still no script is ever going to be able to detect every possible malicious or incompetent (force) push, so there should be a human element like code reviews.
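A sketch of that tree-hash check on a toy commit graph follows. The `Commit` record and the commit ids are illustrative; in a real hook a commit's tree hash can be read with `git rev-parse <commit>^{tree}`. As noted, it recognizes only that exact merge-then-revert pattern.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    tree: str                 # tree hash (snapshot of the files)
    parents: tuple[str, ...]  # parent commit ids, first parent first


def undoes_merge(commits: dict[str, Commit], sha: str) -> bool:
    """True if `sha` restores exactly the first-parent tree of the merge before it.

    That is the fingerprint of `git merge <branch>` immediately followed by
    `git revert -m 1 HEAD`: the revert's tree equals the pre-merge tree.
    """
    c = commits[sha]
    if len(c.parents) != 1:
        return False
    parent = commits[c.parents[0]]
    if len(parent.parents) < 2:
        return False  # the parent is not a merge commit
    first_parent = commits[parent.parents[0]]
    return c.tree == first_parent.tree


commits = {
    "base": Commit(tree="t0", parents=()),
    "feat": Commit(tree="t_feat", parents=("base",)),
    "dev": Commit(tree="t_dev", parents=("base",)),
    "merge": Commit(tree="t_both", parents=("feat", "dev")),
    "revert": Commit(tree="t_feat", parents=("merge",)),
    "normal": Commit(tree="t_both_plus", parents=("merge",)),
}
print(undoes_merge(commits, "revert"), undoes_merge(commits, "normal"))  # True False
```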
posted by fragmede at 3:29 AM on April 12, 2016


Yeah. At work, we have our central git repo set up to reject any force push to our main branch.

Simply put, you can rewrite history in your own branch as much as you want, but any changes to the main branch really need to be reflected by an actual commit.
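Stock git offers the repository-wide `receive.denyNonFastForwards` configuration option, and `git merge-base --is-ancestor` exposes the underlying test; limiting the rule to one branch, as described, takes a hook. The sketch below shows that test on a toy parent map with made-up commit ids: an update is a fast-forward only if the old branch tip is an ancestor of the new one.

```python
def is_ancestor(parents: dict[str, tuple[str, ...]], ancestor: str, descendant: str) -> bool:
    """Walk parent links from `descendant`; True if `ancestor` is reachable."""
    stack, seen = [descendant], set()
    while stack:
        sha = stack.pop()
        if sha == ancestor:
            return True
        if sha in seen:
            continue
        seen.add(sha)
        stack.extend(parents.get(sha, ()))
    return False


def allow_update(parents: dict[str, tuple[str, ...]], old_tip: str, new_tip: str) -> bool:
    # Fast-forward: the new tip builds on the old one. Anything else rewrites
    # published history, i.e. it could only have arrived via a force push.
    return is_ancestor(parents, old_tip, new_tip)


parents = {"a": (), "b": ("a",), "c": ("b",), "x": ("a",)}
print(allow_update(parents, "b", "c"))  # True: c builds on b
print(allow_update(parents, "c", "x"))  # False: x drops b and c
```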

We'll definitely be looking at Pijul once it's more mature. Git opened up a lot of new workflows for us, and it really looks like Pijul covers 95% of the "useful git workflows," while providing vastly superior history tracking.
posted by schmod at 5:03 AM on April 12, 2016 [2 favorites]


no script is ever going to be able to detect every possible malicious or incompetent (force) push

Yeah, we're talking about the guy who said "it wouldn't let me push the dev branch after I merged, so I had to use -f"

there should be a human element like code reviews.

Someone keeps a close eye on him specifically, nowadays.
posted by fatbird at 6:38 AM on April 12, 2016


We should probably have an FPP about Rust, its borrow checker, their very public RFC process, and any interesting packages. In particular, Rust has really interesting asynchronous IO packages, like rotor, eventual_io, gj (Cap'n Proto), and mioco, all based on mio.
posted by jeffburdges at 7:21 AM on April 13, 2016 [1 favorite]



