
Dynamic Version Control


Disclaimer

This essay does not describe an existing computer program, just one that should exist. This essay is about a suggested student project in Java programming. This essay gives a rough overview of how it might work. I have no source, object, specifications, file layouts or anything else useful to implementing this project. Everything I have prepared to help you is right here.

This project outline is not like the artificial, tidy little problems you are spoon-fed in school, where all the facts you need are included, nothing extraneous is mentioned and the answer is fully specified, along with hints to nudge you toward a single expected canonical solution. This project is much more like the real world of messy problems, where it is up to you to fully define the end point, or a series of ever more difficult versions of this project, and to research the information yourself to solve them.

Everything I have to say to help you with this project is written below. I am not prepared to help you implement it or to give you any additional materials. I have too many other projects of my own.

Though I am a programmer by profession, I don’t do people’s homework for them. That just robs them of an education.

You have my full permission to implement this project in any way you please and to keep all the profits from your endeavour.

Please do not email me about this project without reading the disclaimer above.

Current version control systems stink big time. PVCS (Polytron Version Control System) drives you mad because the modules you want are always locked by someone else. CVS (Concurrent Versions System) drives you mad because other people’s changes constantly stop you from getting a clean compile. You have the problem of manual or automated merges of changes done based on superficial similarities in bits of text. The process is hideously error prone.

The problem with our current text-as-typography style of designing programming languages is that source code control systems treat the source code as lines of ASCII (American Standard Code for Information Interchange) text. The languages themselves are not intended to be nearly so line oriented, so the diffing techniques used by the source code control tools cannot be made perfect. Having an actual database model for the source code can make the source code tools much faster and more useful.

Here is my vision.

We will have programming teams scattered over the planet. They will work simultaneously on the same code over an always-on, high-speed Internet connection such as ADSL (Asymmetric Digital Subscriber Line) or cable modem. As you make each change to the source code, that change is broadcast and reflected to anyone who is simultaneously looking at that part of the program.

Changes are made atomically by syntactic element (token), not character by character. They are transactions to your database. The database always reflects a syntactically correct program, including correctly spelled variable names. It might not work, but it is syntactically correct at all times. See SCID for some detail on how these transactions are merged.
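Here is a minimal sketch of what an all-or-nothing token transaction might look like. Everything in it is my own invention for illustration: the `TokenStore` class, the `Edit` type and the rule that only identifier tokens are edited. It leans on `javax.lang.model.SourceVersion` from the standard library to decide whether a spelling is a legal Java identifier. A real SCID would check far more than spelling.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.lang.model.SourceVersion;

/** Hypothetical store of identifier tokens keyed by a numeric token ID. */
class TokenStore {
    /** One requested change: give the identifier token with this ID a new spelling. */
    static final class Edit {
        final int tokenId;
        final String newText;

        Edit(int tokenId, String newText) {
            this.tokenId = tokenId;
            this.newText = newText;
        }
    }

    private final Map<Integer, String> identifiers = new LinkedHashMap<>();

    void put(int tokenId, String text) {
        identifiers.put(tokenId, text);
    }

    String get(int tokenId) {
        return identifiers.get(tokenId);
    }

    /** Applies every edit or none of them; returns false if any edit was rejected. */
    boolean applyAll(List<Edit> edits) {
        // Pass 1: check everything before touching the store.
        for (Edit e : edits) {
            boolean known = identifiers.containsKey(e.tokenId);
            // isIdentifier also accepts keywords, so keywords are excluded explicitly.
            boolean legal = SourceVersion.isIdentifier(e.newText)
                    && !SourceVersion.isKeyword(e.newText);
            if (!known || !legal) {
                return false;
            }
        }
        // Pass 2: all edits passed, so apply them together.
        for (Edit e : edits) {
            identifiers.put(e.tokenId, e.newText);
        }
        return true;
    }
}
```

Because validation happens in a separate pass before any write, a batch containing one bad spelling such as `2bad` leaves every token exactly as it was.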

I now have to wave my hands a bit. Each programmer works logically on a transparent sheet of film, overlaid over the program called a job sheet (or job for short). He puts his changes on that film. Other programmers work on their own layers of film. Each layer of film represents a job — a unit of work to make some complete change. A programmer may have several layers on the go at once and several programmers may work on the same layer.

You can choose to hide or apply other programmers' layers to your work. You reveal them to see how upcoming changes will interfere with your work. You can hide them if the new layers don't quite work yet and you need to test your code.
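The film-layer idea can be sketched as a base map plus one overlay map per job. This is only an illustration under my own assumptions: the `LayeredSource` class, the use of integer node IDs mapped to text, and the rule that later-created jobs win when two visible jobs touch the same node are all invented here.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical base layer plus per-job overlays, keyed by node ID. */
class LayeredSource {
    private final Map<Integer, String> base = new HashMap<>();
    // Job name -> the nodes that job overrides. Insertion order is kept.
    private final Map<String, Map<Integer, String>> jobs = new LinkedHashMap<>();

    void setBase(int nodeId, String text) {
        base.put(nodeId, text);
    }

    void setInJob(String job, int nodeId, String text) {
        jobs.computeIfAbsent(job, k -> new HashMap<>()).put(nodeId, text);
    }

    /** Returns the program as seen with only the named job layers applied. */
    Map<Integer, String> view(Set<String> visibleJobs) {
        Map<Integer, String> result = new HashMap<>(base);
        for (Map.Entry<String, Map<Integer, String>> job : jobs.entrySet()) {
            if (visibleJobs.contains(job.getKey())) {
                result.putAll(job.getValue());
            }
        }
        return result;
    }
}
```

Hiding or revealing a layer is just a different argument to `view`. Nothing in the base or in any job layer is modified by looking.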

There is a process where a layer is proven to work and gets incorporated into the base layer. It can be backed out, but basically once it is incorporated, you have to deal with any changes it has created.

By default, code you write changes the base layer. It does not build on somebody else's in-progress job layer. If it did, it would be impossible to back out that job layer on its own. Once you do deliberately build on somebody else's job layer, you forgo the possibility of backing out that layer by itself. Your code becomes dependent on it. You would have to back out both your changes and the changes your changes depend on. Highlighting helps you see areas where your changes are conflicting with somebody else's, both before and after that person's changes are committed. Since both of you can see the conflict arising, there is a good chance the two of you will have a conversation about how to create easily mergeable code. You might just stay away from that part of the code until the dust settles and you can incorporate that finished job layer into your working base.
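The back-out rule amounts to a walk over a dependency graph. Here is a rough sketch, assuming jobs are identified by name and that each deliberate "build on" is recorded explicitly. The `JobDependencies` class and its method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical record of which job layers were deliberately built on which. */
class JobDependencies {
    // Foundation job -> jobs built directly on top of it.
    private final Map<String, Set<String>> dependents = new HashMap<>();

    void buildOn(String job, String foundation) {
        dependents.computeIfAbsent(foundation, k -> new LinkedHashSet<>()).add(job);
    }

    /** Every job that has to be backed out together with the given one. */
    Set<String> backOutSet(String job) {
        Set<String> result = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(job);
        while (!pending.isEmpty()) {
            String next = pending.pop();
            if (result.add(next)) { // skip jobs already visited
                for (String d : dependents.getOrDefault(next, Collections.<String>emptySet())) {
                    pending.push(d);
                }
            }
        }
        return result;
    }
}
```

If job B builds on A and job C builds on B, then backing out A drags B and C with it, while backing out C affects nothing else.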

Because the code is in a database and every syntactic element in the program has a unique ID, there is less guesswork about what will happen when changes are merged. Further, both of you see the merged changes on your screen as you work. You can flip back and forth removing and adding other layers from your view without changing any code, just as you can remove or add layers for your compiles.
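With unique IDs on every element, spotting where two layers collide becomes a set intersection rather than a text diff. Here is a tiny sketch, assuming each layer can report the IDs of the nodes it has changed. `ConflictFinder` is an invented name.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper that finds nodes changed by two different layers. */
class ConflictFinder {
    /** Node IDs changed by both layers; these are the spots an editor would highlight. */
    static Set<Integer> conflicts(Set<Integer> touchedByMine, Set<Integer> touchedByTheirs) {
        Set<Integer> both = new TreeSet<>(touchedByMine); // copy, so the inputs stay untouched
        both.retainAll(touchedByTheirs);
        return both;
    }
}
```

This only says that two layers touched the same node, not whether the changes are actually incompatible. Deciding that would need knowledge of the language.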

Compilation takes little time since the database is already pre-parsed. Even so, you may have some background speculative compilation process on the server, working on whatever piece seems most stable, so that when you are ready to run to debug, most of the compilation will already be complete. When the server has nothing better to do, it can do various optimisations and native code generation, just so it will be ready when you ask for them.
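One crude way to pick "whatever piece seems most stable" is to choose the out-of-date unit that has gone longest without an edit. That heuristic is my assumption, not something the design dictates, and the `SpeculativeCompiler` class below is purely illustrative. It only chooses what to compile and does no compiling itself.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical chooser for background compilation work. */
class SpeculativeCompiler {
    private final Map<String, Long> lastEditMillis = new HashMap<>();
    private final Map<String, Long> compiledAtMillis = new HashMap<>();

    void recordEdit(String unit, long now) {
        lastEditMillis.put(unit, now);
    }

    void recordCompile(String unit, long now) {
        compiledAtMillis.put(unit, now);
    }

    /** Returns the stale unit edited least recently, or null if everything is up to date. */
    String nextToCompile() {
        String best = null;
        for (Map.Entry<String, Long> e : lastEditMillis.entrySet()) {
            Long compiled = compiledAtMillis.get(e.getKey());
            // A unit is stale if it was never compiled or was edited after its last compile.
            boolean stale = compiled == null || compiled < e.getValue();
            if (stale && (best == null || e.getValue() < lastEditMillis.get(best))) {
                best = e.getKey();
            }
        }
        return best;
    }
}
```

A server loop would call `nextToCompile` whenever it is idle, compile that unit and then call `recordCompile`.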

How could such a thing be implemented? For some general thoughts see SCID. I imagine it might work like this: You have a central repository of code that lives pretty much entirely in RAM (Random Access Memory). It may have some ability to store some of the lesser used information on disk and to lazily back it up to disk. However, persistent store classes let the designer of the SCID pretty well pretend the entire database is RAM resident.

You have a parse tree. You submit transactions to a common change server queue to request changes to the parse tree. The parse tree is never in an inconsistent state. It is impossible for it to represent an invalid program. Transactions are checked before they are applied atomically. So you don’t change the parse tree keystroke by keystroke, but node by node. You can reconstruct the parse tree by replaying various streams of transactions. Every transaction is timestamped and stamped with author, job layer and reason for the change. A global rename, for example, could be done with a single transaction. It would be trivial since there would actually be only one copy of the name stored in the database. All other references would be numeric or by address.
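To make the single-transaction rename concrete, here is a small sketch. It assumes a single flat scope, stores each name exactly once and represents references as symbol IDs. The `SymbolDatabase` class, the `Rename` type and its fields are hypothetical, chosen to mirror the stamps listed above (timestamp, author, job layer, reason).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical symbol table in which each name is stored exactly once. */
class SymbolDatabase {
    /** One stamped transaction: rename the symbol with this ID. */
    static final class Rename {
        final long timestamp;
        final String author;
        final String job;
        final String reason;
        final int symbolId;
        final String newName;

        Rename(long timestamp, String author, String job, String reason,
               int symbolId, String newName) {
            this.timestamp = timestamp;
            this.author = author;
            this.job = job;
            this.reason = reason;
            this.symbolId = symbolId;
            this.newName = newName;
        }
    }

    private final Map<Integer, String> names = new HashMap<>();
    private final List<Rename> log = new ArrayList<>();

    void declare(int symbolId, String name) {
        names.put(symbolId, name);
    }

    /** Checks first, then applies and logs. Rejects unknown symbols and names already in use. */
    boolean apply(Rename tx) {
        if (!names.containsKey(tx.symbolId) || names.containsValue(tx.newName)) {
            return false;
        }
        names.put(tx.symbolId, tx.newName);
        log.add(tx);
        return true;
    }

    /** Renders a sequence of references (stored as symbol IDs) using the current names. */
    String render(int... symbolIds) {
        StringBuilder sb = new StringBuilder();
        for (int id : symbolIds) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(names.get(id));
        }
        return sb.toString();
    }

    List<Rename> log() {
        return log;
    }
}
```

After one `Rename` of symbol 1 from `cnt` to `count`, every place that refers to symbol 1 renders with the new name, because no reference ever held the text. Replaying `log()` against a fresh database with the same declarations reproduces the same state.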

When you have several remote sites, you can replicate the entire database at each site. You might have a centralised scheme or a peer-to-peer scheme. Centralised is better if the communication lines are very high speed and the server is very powerful compared with the workstations. In a centralised scheme, for speed, the server broadcasts copies of both the source code database and the compiled results to everyone so they have local copies for high speed access. Basically transactions are broadcast and all copies of the database update themselves.

If a database is corrupted, it is automatically regenerated from one of the remaining copies.

Transactions are serialised by timestamp, shared and processed in standard order between all sites to keep all sites up to date. There has to be some mechanism for provisionally changing the local copy of the parse tree while waiting for the earlier-dated transactions to come from the other sites.
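One simple way to get provisional local state is to treat the current tree as nothing more than the result of replaying all known transactions in the standard order. The sketch below assumes that approach. It also assumes that site clocks are comparable, that ties are broken by site name, and that a site never issues two transactions with the same timestamp. The `Replica` and `Tx` names are invented, and a transaction here just sets the text of one node.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical replica that derives its state from a timestamp-ordered transaction set. */
class Replica {
    /** A transaction that sets one parse-tree node to new text. */
    static final class Tx {
        final long timestamp;
        final String site;
        final int nodeId;
        final String text;

        Tx(long timestamp, String site, int nodeId, String text) {
            this.timestamp = timestamp;
            this.site = site;
            this.nodeId = nodeId;
            this.text = text;
        }
    }

    // Standard order: by timestamp, then by site name, so every site sorts identically.
    private final TreeSet<Tx> ordered = new TreeSet<>(
            Comparator.comparingLong((Tx t) -> t.timestamp).thenComparing(t -> t.site));

    /** Accepts a transaction from any site, in any arrival order. */
    void receive(Tx tx) {
        ordered.add(tx);
    }

    /** Current (possibly provisional) state: a replay of everything known so far. */
    Map<Integer, String> state() {
        Map<Integer, String> nodes = new HashMap<>();
        for (Tx tx : ordered) {
            nodes.put(tx.nodeId, tx.text);
        }
        return nodes;
    }
}
```

If one site receives a transaction stamped 20 before one stamped 10, it still ends up with the same tree as a site that received them in order. A naive apply-on-arrival scheme would not. Replaying from scratch on every query is wasteful, so a real implementation would presumably checkpoint and only redo the tail.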

The closest thing to dynamic version control that exists today is ENVY, the version control system used in IBM (International Business Machines) VisualAge for Smalltalk.

I have noticed programmers are often extremely hostile when I bring up the idea of SCIDs or dynamic version control. Perhaps the ideas will have to be honed and proven in a different environment where users are not so attached to primitive text-editing tools. One possibility is collaborative website generation. Most people don't particularly like HTML (Hypertext Markup Language) text and would be quite happy to work with visual tools. The push for fully internationalised websites with teams scattered over the globe could grease the way for both SCIDs (Source Code In Databases) and dynamic version control.

Fortress
SCID

This page is posted
on the web at:

http://mindprod.com/project/dynamicversioncontrol.html

Canadian Mind Products
Contact Roedy. Please feel free to link to this page without explicit permission.
