Excel + Python
December 17, 2007 9:38 AM

Resolver One looks and feels like an Excel clone, except that it stores all the data and formulas as a Python program. You can add more code, or export the whole thing. It's in public beta now, and the commercial release will be free for open source and personal projects.
posted by signal (38 comments total) 8 users marked this as a favorite
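
To make the idea in the post concrete, here is roughly what a spreadsheet whose cells and formulas live in a Python program could look like. This is a hypothetical illustration, not Resolver One's actual generated code. The `recalc` function and the cell-dictionary layout are assumptions made for the example.

```python
from typing import Callable, Dict, List, Union

# Hypothetical layout (not Resolver One's real format): a cell is either a
# constant or a formula, modelled as a function of the values computed so far.
Value = Union[int, float]
Cell = Union[Value, Callable[[Dict[str, Value]], Value]]


def recalc(cells: Dict[str, Cell], order: List[str]) -> Dict[str, Value]:
    """Evaluate every cell, assuming `order` lists each cell after its inputs."""
    values: Dict[str, Value] = {}
    for name in order:
        cell = cells[name]
        values[name] = cell(values) if callable(cell) else cell
    return values


sheet: Dict[str, Cell] = {
    "A1": 10,
    "A2": 32,
    "A3": lambda v: v["A1"] + v["A2"],  # the equivalent of =A1+A2
    "A4": lambda v: v["A3"] * 2,        # the equivalent of =A3*2
}

values = recalc(sheet, ["A1", "A2", "A3", "A4"])
print(values["A4"])  # 84
```

A real implementation would also have to work out the dependency order itself (for example with a topological sort) instead of taking it as a parameter.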

This seems cool, but I'm having a bit of trouble wrapping my head around why it's actually useful, unless you're a Python programmer doing analytics. Is that it?
posted by mkultra at 10:09 AM on December 17, 2007

That was going to be my exact comment. I love the idea even though I can't think how I'd use it. Perhaps it's a good way to clearly document what all the cells in your spreadsheet are doing, instead of having hidden formulae all over the place.
posted by DU at 10:12 AM on December 17, 2007

I use some python for analytics, and this certainly looks cool. I'm curious how it stands up to - or plays with - say, SciPy/PyLab/etc. It might be a great way to play with data, but I wonder if it will provide the depth of analysis. On the other hand, if you can just poke around with your data then pull the code into a full application elsewhere, that might be very powerful.

Anyhow - looks really interesting at the least, as a fairly unique model for working with data.
posted by freebird at 10:17 AM on December 17, 2007

For one thing, their logo looks a lot like VMware's.

IMO, the spreadsheet UI metaphor is way out of date and needs to be scrapped entirely. Using Excel is fine for a while, but once things get complicated it kind of sucks that everything needs to be arbitrarily fixed to a weird 'grid'. Why is that?

I was thinking that if you edit your code, it could break compatibility with the sheet itself. But then I realized it would actually be easy to avoid that problem by having the generated code make API calls, so you can write any program you want and it would still work with the spreadsheet interface. Is that how it's set up?

But yeah, the spreadsheet needs to die. We must smash the grid that imprisons our numbers, and let them roam free!!!
posted by delmoi at 10:29 AM on December 17, 2007 [3 favorites]

It sounds like a good idea in theory, but:

A) it doesn't appear to be open source;
B) it's probably payware, ultimately;
C) it uses .NET, so it's Windows-only.

I just sort of assumed it would be self-hosting Python, and I sort of imagined you'd modify your spreadsheet and then send the whole thing as a standalone app to the next person or people to use it.

I don't mind the payware part, but the closed source and .NET requirements are large turnoffs.

This could be very interesting in the sense that Python is incredibly flexible and powerful, so your data sources for your cells could be live data on the Internet, or your customers, or your competitors. It could be extracting information from almost anything that could generate something text-filish in webspace somewhere.

Any given cell should, in theory, be able to have an algorithm of arbitrary complexity behind it. This isn't THAT new: VBA gives you that kind of power in Excel. But Python's a lot cleaner and gives you nicer data models to work with. It's just ... elegant and easy to work with, once you get the basics.

Hmph. If only it were multiplatform...and open source. I wonder why they went for Python generation in a .NET app, rather than the C# it's probably written in?

on preview: delmoi, that's been done. See: databases. :) Grids persist as data models because they're often very useful. Having the visual correlation can give you clues to a dataset's structure. X over Y presentation is fundamental: money over time, speed over temperature, weight over mileage. That sort of thing.
posted by Malor at 10:34 AM on December 17, 2007 [2 favorites]

Malor: yeah, but there is no reason you need to use a grid to do that visual layout. You could have things as lists or cells that could be dragged around freely, like objects in Illustrator or Photoshop.

The problem is that the grid causes correlations between things that are not actually correlated. Say in columns B-D you have one list, and in F-G you have some computed values. Now if you want to add an item to the B-D list at row 13, you have to insert a new row, which will screw up your list in F-G.

And yes, you could use a relational database, but so far no one has bothered to build an interface on top of one that's as easy to use as a spreadsheet for doing spreadsheet-like tasks (well, it's possible some people have, but their products were never very popular).

We can keep the useful things about spreadsheets while jettisoning the pointless grid.
posted by delmoi at 10:56 AM on December 17, 2007
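
delmoi's row-insertion complaint can be shown with a toy model in which one list of rows holds both the B-D list and the F-G computed values. The layout and values are made up for illustration: indices 0-2 stand for columns B-D, index 3 for the empty column E, and indices 4-5 for F-G.

```python
# Toy model: one shared grid, where each row spans columns B..G.
grid = [
    ["b1", "c1", "d1", None, "f1", "g1"],
    ["b2", "c2", "d2", None, "f2", "g2"],
    ["b3", "c3", "d3", None, "f3", "g3"],
]

# Extending the B-D list means inserting a whole sheet row...
grid.insert(1, ["new_b", "new_c", "new_d", None, None, None])

# ...so the unrelated F column now has a gap in it.
column_f = [row[4] for row in grid]
print(column_f)  # ['f1', None, 'f2', 'f3']

# Keeping the two lists as separate tables avoids the coupling.
tables = {
    "list": [["b1", "c1", "d1"], ["b2", "c2", "d2"], ["b3", "c3", "d3"]],
    "computed": [["f1", "g1"], ["f2", "g2"], ["f3", "g3"]],
}
tables["list"].insert(1, ["new_b", "new_c", "new_d"])
print([row[0] for row in tables["computed"]])  # ['f1', 'f2', 'f3']
```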

Hmm - good screencast, and there are some cool ideas in there, but yah: I'm unlikely to make use of it. The python that's generated is certainly not going to run in anything else, I don't use .NET, and the "web application" that's generated requires their special service. So I'm sure there are shops out there this will be perfect for, but ours isn't one of them.

Also - it looks like it may be built with Excel components. If it shares Excel's dataset size limits, that will likely rule out a lot of possible analytic uses.

Aside - I agree that the grid model doesn't apply to plenty of interesting applications, and I don't really ever work in one other than generated summaries, but don't throw the baby out with the bathwater. There are lots of places where the grid model works really well, and the structure it implicitly enforces makes things much easier and clearer.
posted by freebird at 11:06 AM on December 17, 2007

It sounds like you're objecting not so much to the idea of a grid itself, which is obviously useful, but the idea of a single large grid where everything is connected, even things that aren't related.

You can get around that to some degree by using tabs, but you're right -- we're overdue for a model change there. The basic idea of a flat piece of paper is good, but some kind of linked data representation that treats grids as a special case of something deeper would be interesting.

Access does that to some degree. I haven't really used it since 2.0, back on Windows 3.11. Even back then, though it was unreliable and missing many important database features, I thought the interface did a good job of letting you represent databases visually. I imagine it must have gotten better in the last 12 years.

This application might be a step in the right direction. Individual grids for individual datasets strike me as an idea that will never go away. But the Python in this app might let you truly disconnect disparate data in a way that makes visual sense, without the problems of the single large page.
posted by Malor at 11:08 AM on December 17, 2007

It sounds like you're objecting not so much to the idea of a grid itself, which is obviously useful, but the idea of a single large grid where everything is connected, even things that aren't related.

Did someone say Numbers?
posted by designbot at 11:21 AM on December 17, 2007 [1 favorite]

Hmm, I love the idea but it just cries out to be done as an open-source project. I mean, what's the point of writing Python if you can't simply run it anywhere?

Python is so cool. I'm writing a framework for a robot version of 's In C where instead of human players you submit playing strategies as little Python programs -- and I literally would never think of doing this in any language other than Python.

(OT plug: come to my show tonight.)
posted by lupus_yonderboy at 11:57 AM on December 17, 2007

So, does this thing use binary or decimal floating point?
posted by ryanrs at 12:06 PM on December 17, 2007
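
The distinction ryanrs is asking about matters for money calculations. A minimal illustration in Python, whose `float` is binary (IEEE 754 double precision) and whose standard `decimal` module does base-10 arithmetic:

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so their sum is not exactly equal to the binary value nearest 0.3.
binary_sum = 0.1 + 0.2
print(binary_sum)         # 0.30000000000000004
print(binary_sum == 0.3)  # False

# Decimal arithmetic keeps base-10 fractions like these exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum)                    # 0.3
print(decimal_sum == Decimal("0.3"))  # True
```

pbk's answer further down the thread is that Resolver One uses binary floating point.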

tlhIngan wa' looks and feels like an Excel clone, except that it stores all the data and formulas in the Klingon language. You can add more data, or export the whole thing, and either way, it will be completely unintelligible. But at least it's not Python.
posted by rusty at 12:19 PM on December 17, 2007 [3 favorites]

The problem is that the grid causes correlations between things that are not actually correlated.

Not really clear what you're on about. It seems like yet another off-topic Deep Observation that doesn't hold up on inspection: the whole point of spreadsheets is to turn reams of data into meaningful observations, observations like correlations. If your complaint is having unrelated data in nearby columns causes people to see correlation where it isn't, that seems difficult to blame on the format. Moreover, I don't see how free-floating palettes of data, whose only analog in physical space would be to messy desks covered in Post It Notes (the sort of disorganization spreadsheets try to combat), would prevent people from seeing non-existent correlations any better.
posted by yerfatma at 12:20 PM on December 17, 2007 [1 favorite]

I imagine it must have gotten better in the last 12 years.

HAH!

It's exactly the same as it was when I learned it in a compulsory 100-level CS class in '99. Maybe it improved somewhat between 12 and 7 years ago, but just using it you can really feel how old and crufty it is compared with a lot of modern software.
posted by delmoi at 12:40 PM on December 17, 2007

yerfatma: whose only analog in physical space would be to messy desks covered in Post It Notes (the sort of disorganization spreadsheets try to combat), would prevent people from seeing non-existent correlations any better.

First of all, Lotus 1-2-3 wasn't designed to 'combat' anything, because before that, if you wanted to crunch numbers on a computer, you had to write a BASIC program or something. It's a model that was designed to run on computers with a few K of RAM and super-slow processors. The fact that we still use it today is as much a historical accident as anything.

As for a 'messy desk covered in post-it notes,' it would be more like having a choice: a messy desk covered in post-it notes, or a well-ordered desk with notes and lists arranged in a way that makes sense for a particular application.

And finally, why do we need a physical analogue at all?

the whole point of spreadsheets is to turn reams of data into meaningful observations, observations like correlations.

Some people use spreadsheets with a small amount of information and lots of calculations in order to 'sketch' some mathematics. Other people use them for data entry (e.g., office workers who fill out pre-made spreadsheets).

The point is, having a grid layout makes spreadsheets difficult to modify because everything is connected, when things that are not related should in no way affect each other. There are various hacks to get around this, but they shouldn't be needed in the first place.
posted by delmoi at 1:01 PM on December 17, 2007

I am one of the founders of Resolver so thought I should answer some of the questions/comments raised so far!

Resolver One is useful to any programmer (including high-level spreadsheet users) who is doing analytics.

It doesn't work with SciPy etc. yet, but we do have an open source project to fix this.

It is payware for commercial use, but free for all open source uses.

The libraries needed for runtime will be redistributable when we are out of beta.

It is written in IronPython. The Python code generated can be run anywhere using IronPython and .NET, and probably even in pure Python later on.

There are no Excel components used at all; it is completely independent and does not have the same limitations.

It uses binary floating point...!

Hopefully this provides some help - do try it out!!
posted by pbk at 1:06 PM on December 17, 2007 [8 favorites]

There are various hacks to get around this, but they shouldn't be needed in the first place.

This seems to be the thrust of a lot of your technical arguments. "There's a better way to do this." But we're always left wanting some details. You've got a lot about the shitty usability of spreadsheets but not a lot about how to fix it except that it should be more like drawing programs. The problem is that spreadsheets, as you suggest, have become the business person's catch-all for anything that doesn't have a product aimed at it, so to say how spreadsheets should be different would be to make them better for a certain subset of the user base while most likely making things more difficult for everyone else.

And finally, why do we need a physical analogue at all?

We don't, and that's poorly phrased on my part, as I often find myself arguing against the decision to recreate a physical interface. What I mean is, if you're going to make something wildly different from what people are used to, it will need to be demonstrably better, better-from-across-the-street better. Call me reductive, but I have a hard time imagining that Lotus, Excel, Numbers, and Google Spreadsheets all look very similar because of lazy UI design on the part of billion-dollar companies.
posted by yerfatma at 1:23 PM on December 17, 2007 [2 favorites]

Cool, thanks pbk. Your comment ensures that I will return to check on this, as it sounds like a lot of what keeps me from really getting into it is getting fixed. Neat!
posted by freebird at 1:40 PM on December 17, 2007

Did someone say Numbers?

We don't speak Apple here. Das ist strengstens verboten! (That is strictly forbidden!)
posted by Blazecock Pileon at 1:45 PM on December 17, 2007

It seems like an interesting concept, but I've got a few questions about it.

pbk, why did you decide to use .NET at all? I don't understand why you would willingly cripple yourself in that way. Many people who do academic numerical analysis (who will be the primary audience for this, I think) use Unix of one form or another.

Furthermore, due to the peculiar Python-based file format, it seems that it will be mostly incompatible with other file formats. Do you intend to implement .ods, .gnm, or .xls compatibility? If not, Resolver's utility might be limited in business applications.

Are you aware that the dual licensing will cause some severe ideological/legal problems when dealing with open source projects? Most people who work on open source are not willing to use merely "free for non-commercial use"-licensed components.
posted by sonic meat machine at 1:45 PM on December 17, 2007 [1 favorite]

The key here is the portability. Really, there's no reason why the tool the designer of the spreadsheet uses (e.g., Excel) must be the same as the tool employed by a user of that spreadsheet. We don't ask our users to install an IDE, do we?

My cousin built a spreadsheet that ranked new cars by weighted criteria (fuel economy, horsepower, style, etc.) to help select a car to purchase. I'd have loved to have used it, but couldn't be bothered to run Excel. If he had been able to send me a nice little portable script encapsulating just the formulas and logic, that would have been great.
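
The kind of portable script sdodd describes could be as small as the sketch below. The criteria come from the comment; the weights, car names, and 0-10 scores are invented for illustration, and the weights are integer percentages so the totals stay exact.

```python
# Hypothetical weights (percent) for the criteria named in the comment.
WEIGHTS = {"fuel_economy": 50, "horsepower": 30, "style": 20}

# Made-up cars, each scored 0-10 on every criterion.
CARS = {
    "Car A": {"fuel_economy": 8, "horsepower": 5, "style": 6},
    "Car B": {"fuel_economy": 6, "horsepower": 9, "style": 7},
    "Car C": {"fuel_economy": 9, "horsepower": 4, "style": 4},
}


def weighted_score(scores: dict) -> int:
    """Sum of score * weight over all criteria (0-1000 with these weights)."""
    return sum(scores[criterion] * weight for criterion, weight in WEIGHTS.items())


ranking = sorted(CARS, key=lambda name: weighted_score(CARS[name]), reverse=True)
for name in ranking:
    print(name, weighted_score(CARS[name]))
# Car B 710
# Car A 670
# Car C 650
```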

Think how many useful little mortgage calculators and the like are trapped in proprietary spreadsheet file formats. Makes me shudder.
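
For instance, the core of a typical mortgage calculator is the standard fixed-rate amortization formula, which fits in a few lines of Python. The loan figures in the usage line are made up for illustration.

```python
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Fixed-rate payment: P * r * (1 + r)**n / ((1 + r)**n - 1)."""
    r = annual_rate / 12  # monthly interest rate
    n = years * 12        # number of monthly payments
    if r == 0:
        return principal / n  # no interest: just split the principal evenly
    growth = (1 + r) ** n
    return principal * r * growth / (growth - 1)


payment = monthly_payment(100_000, 0.06, 30)
print(f"{payment:.2f}")  # 599.55
```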

Please feel free to skip this little anti-Microsoft rant: This is yet another example of how monopoly vendors are organizationally, structurally incapable of innovation. The portability which is the key element of this product is antithetical to the way Microsoft must maintain their monopoly.
posted by sdodd at 1:51 PM on December 17, 2007

PBK: I think I am in your target market: I use analytics software all the time (a cross between OLAP and DSS systems), so Resolver looks very familiar; the addition of Python is just gravy. I've got to ask, though: have you benchmarked for scalability? What's your upper limit in, say, cells refreshed per transaction? I assume all the number-crunching is happening on the client too, right?

Now, if you moved your number-crunching to the server (have you seen Star-P? Very interesting as well, and maybe your model creation and their transparent parallelism could play together) and had a more powerful UI paradigm (Improv/Numbers/Quantrix-like), then you'd have me drooling :-)
posted by costas at 2:21 PM on December 17, 2007

This seems to be the thrust of a lot of your technical arguments. "There's a better way to do this." But we're always left wanting some details. You've got a lot about the shitty usability of spreadsheets but not a lot about how to fix it except that it should be more like drawing programs.

Well, it would take much longer to spec out exactly what I had in mind. Part of the problem is that it's hard to describe visual things efficiently in text. I could draw diagrams, do animations, or even work up a prototype of my idea in DHTML or Java, but that would take much longer than writing a comment complaining about the current state of spreadsheets. I actually have another project that's taking up a lot of my time.

But basically my idea would be a lot like a relational database. But rather than a 'real' relational database, you would just right-click on an empty area and create a grid or list. You could create as many of these as you wanted. If you wanted to put a formula into a cell, you would do it the same way you would in Excel, but formulas would have to index cells in other boxes by box name. So rather than something like F9+F3, you could do something like F(6)+F(3). But if you work more by clicking, you'd just click the cells like you do in Excel.

The main visual task would be to arrange the grids, and you would want to include a powerful layout manager that made it easy for people to lay things out nicely.

In fact, you could even create an alternate interface that works exactly the way a normal spreadsheet does, and allow a user to 'break' their sheet into sections when and if they run into 'space' issues. The idea is to make the transition from Excel to this thing as easy as possible for people, while opening up new possibilities only when they need them.

Anyway, I would bet products like this already exist. Someone mentioned Numbers, but that's Mac only. Costas mentioned Improv and Quantrix. It's not some crazy idea; it's been done before, but it's just never caught on.

We're basically talking about a genre of app that hasn't changed much in the past 28 years. It could be done in a much better way.
posted by delmoi at 3:06 PM on December 17, 2007
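
A rough model of delmoi's proposal, in which independent named boxes replace the single grid and a formula addresses cells by box name plus position. The box names, the `cell` helper, and the 0-based indexing are all assumptions for illustration; the comment doesn't pin down a syntax beyond F(6)+F(3).

```python
from typing import Dict, List

# Each named box is an independent list; there is no shared grid.
boxes: Dict[str, List[int]] = {
    "prices": [10, 5, 12],
    "quantities": [3, 10, 1],
}


def cell(box: str, index: int) -> int:
    """Look up a cell by box name and 0-based position (hypothetical addressing)."""
    return boxes[box][index]


# A formula can combine cells from several boxes by name.
order_total = sum(
    cell("prices", i) * cell("quantities", i) for i in range(len(boxes["prices"]))
)
print(order_total)  # 92
```

Because each box is its own list, growing or reordering one box never shifts the cells of another, which is the coupling the single grid forces.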

pbk, why did you decide to use .NET at all? I don't understand why you would willingly cripple yourself in that way. Many people who do academic numerical analysis (who will be the primary audience for this, I think) use Unix of one form or another.

I was wondering about this as well, but I'm guessing he did it to develop in Python without requiring users on Windows to install anything special. Many people who work at places that will pay for software to do numerical analysis run Windows.
posted by yerfatma at 3:32 PM on December 17, 2007

We're basically talking about a genre of app that hasn't changed much in the past 28 years. It could be done in a much better way.

Pfft. We're talking about a kind of app that's worth billions of dollars. I have a hard time believing the lack of interface updates is due to neglect.
posted by yerfatma at 3:33 PM on December 17, 2007 [1 favorite]

Pfft. We're talking about a kind of app that's worth billions of dollars. I have a hard time believing the lack of interface updates is due to neglect.

Well, the Iraq war cost trillions of dollars. Does that mean it was a good idea?

Excel is "good enough" for most tasks, but that doesn't mean it can't be better. I mean, who needs more than 64k rows except in rare situations? But the idea that there's some good reason, other than backwards compatibility, to limit people to 64k rows is absurd. It's the most basic thing, but it can't be changed because development costs go up exponentially when backwards compatibility is needed.

I mean, come on. I can buy a PC with 8 gigabytes of memory. I can build a PC with 128GB of RAM. Are you seriously suggesting that I'd be limited to 64k rows due to anything other than neglect or laziness?
posted by delmoi at 4:21 PM on December 17, 2007

Who's having problems with the 64,000 row limit? Again, the point of a spreadsheet is to turn reams of data into meaningful information, not collect endless reams of data. If you want to do that, it's time to put the data into a database. All Excel is is a simplified version of Access. I'm guessing the 64,000 row limit is a result of that.
posted by yerfatma at 4:27 PM on December 17, 2007

Also, your Iraq reference is embarrassing. There is no market with people purchasing wars. But you're right, I'm in the bag for M$ and spreadsheets could be better tomorrow if users weren't so stupid.
posted by yerfatma at 4:29 PM on December 17, 2007

Why are you talking about 64k row restrictions, yerfatma? Excel supports 1,048,576 rows. Columns are up to 16,384. Etc. You're several years out of date.

With Excel backed by a database (using data connections) you can have an effectively unlimited data set. I use Excel for analyzing huge log sets (hundreds of gigabytes) with no problems.

Not to say other tools can't be useful, but a lot of the Excel talk in this thread is pretty outdated. So of course it sounds like nothing has changed, if you don't know what has changed....
posted by wildcrdj at 4:39 PM on December 17, 2007 [1 favorite]
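
The sheet sizes quoted in this thread are all powers of two, which a few lines of Python confirm. The 65,536 × 256 figures are the pre-2007 limits delmoi quotes from Microsoft further down; the larger ones are Excel 2007's. Note that "64k" rows means 65,536, not 64,000. Reading the old row limit as a 16-bit row index is an inference from the numbers, not something stated in the thread.

```python
# Pre-2007 and Excel 2007 worksheet limits, as quoted in the thread.
old_rows, old_cols = 65_536, 256
new_rows, new_cols = 1_048_576, 16_384

# Each figure is an exact power of two.
assert (old_rows, old_cols) == (2**16, 2**8)
assert (new_rows, new_cols) == (2**20, 2**14)

print(new_rows // old_rows)                      # 16 times as many rows
print(new_cols // old_cols)                      # 64 times as many columns
print(old_rows * old_cols, new_rows * new_cols)  # 16777216 17179869184
```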

pbk, why did you decide to use .NET at all? I don't understand why you would willingly cripple yourself in that way. Many people who do academic numerical analysis (who will be the primary audience for this, I think) use Unix of one form or another.

mono's workin' pretty good these days, or so I hear.
posted by flaterik at 9:28 PM on December 17, 2007

All Excel is is a simplified version of Access.

Except that it isn't. Access seems so much like a spreadsheet that I've always wondered why they didn't just put spreadsheet capability into it.
A spreadsheet with relational database capability would solve a lot of the flat 2-D grid problems that people have been talking about. But unlike Access, you would be able to do formulas like Excel.
posted by eye of newt at 11:35 PM on December 17, 2007 [1 favorite]
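
One way to picture eye of newt's suggestion is a formula evaluated over related tables rather than one flat grid. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the tables and numbers are invented for the example, and the `SUM(o.qty * p.price)` expression plays the role of a spreadsheet formula.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE orders (product_id INTEGER REFERENCES products(id), qty INTEGER);
    INSERT INTO products VALUES (1, 'widget', 2.5), (2, 'gadget', 10.0);
    INSERT INTO orders VALUES (1, 4), (2, 1), (1, 2);
    """
)

# The "formula": revenue per product, computed across the two related tables.
rows = conn.execute(
    """
    SELECT p.name, SUM(o.qty * p.price) AS revenue
    FROM orders AS o JOIN products AS p ON p.id = o.product_id
    GROUP BY p.name
    ORDER BY p.name
    """
).fetchall()
conn.close()

print(rows)  # [('gadget', 10.0), ('widget', 15.0)]
```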

Why are you talking about 64k row restrictions, yerfatma? Excel supports 1,048,576 rows. Columns are up to 16,384. Etc. You're several years out of date.

It was in reference to this comment. I have no idea how many rows Excel can handle.

All Excel is is a simplified version of Access.
Except that it isn't. Access seems so much like a spreadsheet

Except that it is, code-wise. My understanding is that Excel and Access share a good deal of source code such that an Excel spreadsheet is essentially a one table Access db.
posted by yerfatma at 3:49 AM on December 18, 2007

This is an interesting project. I might have to play with it when I get a chance. Would it run on IronPython/Mono on Linux? It seems like that would be easy to support.

But basically my idea would be a lot like a relational database. But rather than a 'real' relational database, you would just right-click on an empty area and create a grid or list. You could create as many of these as you wanted. If you wanted to put a formula into a cell, you would do it the same way you would in Excel, but formulas would have to index cells in other boxes by box name. So rather than something like F9+F3, you could do something like F(6)+F(3). But if you work more by clicking, you'd just click the cells like you do in Excel.

I kind of like that idea. I wish I had time to code a proof of concept. There are some issues to deal with, layout being one of the major ones. wxPython has great grid-type objects, but it's not as easy (though still no doubt doable) to position arbitrary objects on a canvas.
posted by musicinmybrain at 7:01 AM on December 18, 2007

A few more comments from Resolver.

sonic meat machine: Our initial target market for this was, and is, trading desks at financial companies, where Microsoft rules the roost; hence .NET. The server can run under Mono, as can the generated code, and as I said, we hope to have a pure-Python version of the libraries needed to run our generated code soon. Our GUI may take a bit longer (Mono support for graphical stuff isn't quite there yet), but we'll be working on it. We do support importing from .xls files and can also export back to them, though obviously the Python code is lost when you do that. Re: the licensing, we'd love to work with an open source project so that we can get this right!

costas: scalability is our Achilles heel; because we have a Python program that is executed when you refresh the grid, it can take a while. We've pushed the recalc into a background thread, so the GUI isn't blocked, but we think our best bet for performance is to get NumPy, SciPy, and the other C extensions working on IronPython so that we can support them in our product too. Moving recalc over to a server is a great long-term plan; lots of investment banks have floors of Linux blades just waiting for spreadsheets to be thrown at them for calculation :-)
We've not seen Star-P, thanks for the pointer.
posted by pbk at 10:39 AM on December 18, 2007
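
A minimal sketch of the pattern pbk describes, in which the recalculation runs on a worker thread so the calling (GUI) thread isn't blocked. It uses the standard `concurrent.futures` module; `slow_recalc` and its sleep are hypothetical stand-ins for evaluating a sheet's generated program, not Resolver's code.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict


def slow_recalc(cells: Dict[str, int]) -> Dict[str, int]:
    """Hypothetical stand-in for running a sheet's program: slow on purpose."""
    time.sleep(0.2)
    return {name: value * 2 for name, value in cells.items()}


with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(slow_recalc, {"A1": 1, "A2": 2})
    # A GUI thread would keep handling events here instead of waiting,
    # and pick up the result later (e.g. via future.add_done_callback).
    result = future.result()

print(result)  # {'A1': 2, 'A2': 4}
```

In CPython a thread like this keeps the interface responsive but does not make the calculation itself faster, because of the global interpreter lock; IronPython, which Resolver One is written in, has no GIL.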

pbk: I am in retail analytics: relative to finance, the problems are typically simpler (fewer formulas, each less complicated), but the scale is much, much larger (orders of magnitude from what I can gather). The approach that seems to win the game over on this side of the fence is: thick servers that can do the calculation updates and thin clients that show the updated data (and only what the user is looking at from the updates at that).

There is much to be gained from going server-side: you'll find for example that lots of users will be running similar aggregations or calculation paths, so you can cache a lot of intermediate results and share them across sessions. And of course you can parallelize and timeshare on really fast machines rather than rely on thick clients. The downside is that usually anyway, the user is tied to a server-side configuration (too dangerous to let them play on the server-side), so there's less flexibility on what-if analysis --I can see that barrier going away soon though, with proper sandboxing.

I believe that long-term (10 years?) our worlds will merge: I am writing this on a laptop that has more horsepower than a customer's server did only 6-7 years ago. The boundaries of the problems of interest expand, but I think Moore is catching up and that's (probably) good...

Good luck with Resolver!
posted by costas at 1:42 PM on December 18, 2007
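
costas's point about caching intermediate results across sessions can be sketched with `functools.lru_cache`: once one session computes an aggregation, another session asking for the same one gets the cached value. The sales data and function name are hypothetical, and a real server would also need to invalidate the cache whenever the underlying data changes.

```python
from functools import lru_cache

# Hypothetical sales records: (region, amount). A tuple, so the data is immutable.
SALES = (("north", 120), ("south", 80), ("north", 30), ("east", 50))


@lru_cache(maxsize=None)
def total_by_region(region: str) -> int:
    """Aggregate once per region; later calls with the same argument hit the cache."""
    return sum(amount for r, amount in SALES if r == region)


first_session = total_by_region("north")   # computed: 120 + 30
second_session = total_by_region("north")  # served from the cache
print(first_session, second_session)       # 150 150
print(total_by_region.cache_info())        # CacheInfo(hits=1, misses=1, maxsize=None, currsize=1)
```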

Well, according to Microsoft, the size limit of an Excel worksheet is:

Worksheet size 65,536 rows by 256 columns

Apparently that was only just fixed this year, in Excel 2007, which I've never used. A million rows probably is enough for most people, but the fact that the row limit stood until now is really a testament to how little attention was paid to the core features.

And rows beyond 64k are 'ignored' by previous versions, so it's not backwards compatible even with other 21st-century versions.

The point is, the idea that Excel is

Except that it is, code-wise. My understanding is that Excel and Access share a good deal of source code such that an Excel spreadsheet is essentially a one table Access db.

I would be shocked if that were true. I'm sure Office products share a lot of code, but a spreadsheet is very different from a database. Do you have any evidence for this?

I kind of like that idea. I wish I had time to code a proof of concept. There are some issues to deal with, layout being one of the major ones.

Check out Apple's web demo of Numbers. These ideas are pretty obvious. Maybe the disconnect is between ivory-tower CS theorists, who use a spreadsheet once or twice a year to do a budget or calorie counter, and people who use spreadsheets every day, for whom the grid works perfectly.
posted by delmoi at 6:38 PM on December 18, 2007

I believe that long-term (10 years?) our worlds will merge: I am writing this on a laptop that has more horsepower than a customer's server did only 6-7 years ago. The boundaries of the problems of interest expand, but I think Moore is catching up and that's (probably) good...

How about clients that connect to utility computing services like Amazon's S3, and just point to data stored on storage services, all from their desktops.
posted by delmoi at 6:40 PM on December 18, 2007

Do you have any evidence for this?

Just 3rd-hand word of mouth.
posted by yerfatma at 5:33 AM on December 19, 2007

MetaFilter
