What is version control, and why should an Excel developers care about it?
Version control is a system that records changes to a file or set of files over time so that specific versions can be recalled later. A version control system (VCS) allows a developer to revert files or an entire project back to a previous state, compare changes over time, see who last modified something that might be causing a problem, who introduced an issue and when, and more.
Elementary version-control methods of choice for developers are:
Our S1-Sync Excel Addin automates this simple principle: A full workbook version history is retained in a backup folder by time stamping the file name automatically every time a Save/SaveAs operation is triggered in Excel.
However, developers need to collaborate with other people on remote computers. To deal with this problem, Centralized Version Control Systems (CVCS) were developed. These systems, such as CVS, Subversion, and Perforce, have a single server that contains all the version-ed files.
This setup offers many advantages over local VCS for teams. For example, everyone has visibility to a certain degree to what everyone else on the project is doing. Administrators have precise control over who can do what and it’s far easier to administer a CVCS than it is to deal with local databases on every client. However, the most serious disadvantage is the single point of failure that the centralized server represents.
This is where Distributed Version Control Systems (DVCS) step in. In a DVCS clients don’t just check out the latest snapshot of the files: they fully mirror the repository.
The major difference between GitHub and most other Version Control Systems is the way Git stores its data. Other systems (CVS, Subversion, Perforce etc) keep source code as a set of files and the changes made to each file over time. Git doesn’t store its data this way. Instead, Git thinks of its data more like a set of snapshots of a mini file system. Every time you commit, or save the state of your project in Git, it basically stores a picture of what all your files look like at that moment.
Summary of benefits:
Users may be reluctant to return workbooks to the developer for code updates due to data confidentiality concerns. Besides, this process could prove time consuming. There are 3 methods to address this issue in Excel:
For more information about GitHub, please read the Pro Git book by Scott Chacon, which is available online here or read about GitHub plans & pricing here.
Elementary version-control methods of choice for developers are:
- Create incremental or time stamped versions of files in the working folder.
- Copy files into another directory and time-stamped them (or copy files to a time stamped folder).
Our S1-Sync Excel Addin automates this simple principle: A full workbook version history is retained in a backup folder by time stamping the file name automatically every time a Save/SaveAs operation is triggered in Excel.
However, developers need to collaborate with other people on remote computers. To deal with this problem, Centralized Version Control Systems (CVCS) were developed. These systems, such as CVS, Subversion, and Perforce, have a single server that contains all the version-ed files.
This setup offers many advantages over local VCS for teams. For example, everyone has visibility to a certain degree to what everyone else on the project is doing. Administrators have precise control over who can do what and it’s far easier to administer a CVCS than it is to deal with local databases on every client. However, the most serious disadvantage is the single point of failure that the centralized server represents.
This is where Distributed Version Control Systems (DVCS) step in. In a DVCS clients don’t just check out the latest snapshot of the files: they fully mirror the repository.
The major difference between GitHub and most other Version Control Systems is the way Git stores its data. Other systems (CVS, Subversion, Perforce etc) keep source code as a set of files and the changes made to each file over time. Git doesn’t store its data this way. Instead, Git thinks of its data more like a set of snapshots of a mini file system. Every time you commit, or save the state of your project in Git, it basically stores a picture of what all your files look like at that moment.
Summary of benefits:
- Most operations in Git only need local files and resources to operate. This means that there is very little you can’t do, if you’re offline or off VPN.
- With Git, nearly all operations are performed locally, giving it a huge speed advantage on centralized systems that constantly have to communicate with a server somewhere.
- Everything in Git is check-summed before it is stored and is then referred to by that checksum. This means it’s impossible to change the contents of any file or directory without Git knowing about it. This functionality is built into Git at the lowest levels and is integral to its philosophy.
- Git allows multiple local branches that can be entirely independent of each other. The creation, merging, and deletion of those lines of development takes seconds.
Users may be reluctant to return workbooks to the developer for code updates due to data confidentiality concerns. Besides, this process could prove time consuming. There are 3 methods to address this issue in Excel:
- A 'VBA conversion' workbook is distributed to the Excel users, which removes all old code from the target workbook and replaces it with a new version.
- VBA code is stored in Excel Addins and user data in macro-free workbooks. The Addin is replaced, when new VBA code becomes available.
- Excel VBA modules can be updated automatically directly from GitHub. The approach has been pioneered by Bruce McPherson and is demonstrated in this article using open-source repositories.
For more information about GitHub, please read the Pro Git book by Scott Chacon, which is available online here or read about GitHub plans & pricing here.
Spreadsheet file naming conventions
Version control software or a file sharing platform with version control may not be an option when files are exchanged with third parties e.g. auditors.
Spreadsheet model auditors follow 2 long standing naming conventions:
Please note:
Spreadsheet model auditors follow 2 long standing naming conventions:
- yyyymmdd-Project-Name-1.2.3.xlsx or
- Project-Name-1.2.3.xlsx
Please note:
- The date is specified in reverse order (yyyymmdd), so that files are sorted correctly by the file system in Windows and that there is no confusion between American dates and the rest of the world.
- Dashes can be replaced by underscores, as an alternative. To avoid using quotes around the filename, spaces are usually avoided.
- Words should not be run together like MyProjectName, because myprojectname is harder to read, if case is not preserved.
- A version number should be maintained in addition to a date, because models may have to be re-issued during the same date with a new version, after a tiny change has been made.