Chapter 20 What software should I use to write my PhD?

Thirty years ago, when I started to use computers as an undergraduate, the word processing standard was a piece of software called WordPerfect. Back then it seemed inconceivable that anyone would want to use anything else, or that WordPerfect could lose its grip on the market. WordPerfect became a victim of the Microsoft revolution that standardised most PC operating systems (think Windows) and software using Microsoft Office in the mid-1990s. The chances are that today your institution will still have subscriptions to Microsoft Office making it free for you to use. But given that changes in software can happen so fast and appear to change with increasing frequency, should you use Microsoft’s Word, or another piece of software to write your PhD?

The simple answer is that you should use whatever you, and your advisor, feel most comfortable with. You will face a lot of challenges during your PhD including learning to write with lots of different kinds of software. Each one you learn to use will require a steep learning curve, and that will require your time. If there’s any good time to start with a new word processing package it will be before you start your PhD. If you do decide to go with a new writing package then my suggestion would be to choose open source software like (R Markdown Xie, Allaire & Grolemund, 2018). More on these later.

Another important consideration is whether or not your chosen word processing package will allow seamless integration with your reference manager. Your lab may have a communal repository of references, and so this may then constrain what you are comfortable using.

If you feel comfortable with a word processing package with a nice GUI, like Microsoft Word, then go ahead and use it. These certainly have advantages like being able to get a synonym at the click of a mouse (Figure 13.1). My best advice is that you do not save your files in their default “.docx” format. Instead, make sure that all of your files are saved as .rtf, .txt or .html. This should make little difference to your opening and editing but will mean that all of your texts are forward compatible with other kinds of software. It will reduce your file size and reduce the chances of file corruption: i.e. when part of the file malfunctions and you lose the contents. That’s right, in addition to loss of data by back up deficiencies see Part III, you also have the risk of having corrupt files that will also mean you lose all of the contents. Even if you always use Word and never use any other file format, do yourself a favour and save each version as a .rtf or .txt file as well. Just like a back up, this means that you won’t have lost all of the text if a file goes corrupt. And yes, I do know people who have multiple backup versions of corrupt files: it’s happened to me.

Keeping files in a generic (rather than software manufacturer’s) format will allow easy sharing of these files, and access to them in future. It might seem inconceivable now that anyone would not be able to read a .docx file, but in 10 years, you may be so used to the latest format that after another 5 years it becomes very difficult to retrieve information from a .docx file without paying someone to retrieve your data.

The real problem with word processing packages like Microsoft Word is when files get very large as they tend to do during a PhD. Once you start putting chapters together especially if they include embedded images tables and the like you will find that many word processors start misbehaving. This can be especially frustrating in the closing phases of your PhD when you’re trying to submit it. Especially if there’s a tight deadline. However this problem is easily overcome by chunking your thesis into documents equivalent to chapters and then merging PDFs from each chapter.

20.1 Alternatives to conventional word processors

Many academics are turning to LaTeX in order to write their manuscripts as this gives a lot of possibilities to insert links, formulae, references, images, etc. in a very simple way. Once you are ready to print a LaTeX document it can be shared very easily as a PDF.

Be aware that the problems may come not with what you want but with how your advisor and or co-authors are comfortable. I find that commenting on a PDF takes more time than doing the same on a Word document. I don’t mind working with LaTeX but I know some advisors are very adverse to having to use this platform. Obviously, in order to get the feedback from your advisor you are going to need to use a platform that they are also comfortable with. So please consult them on what they are happy to use. You may also need to instruct them about how to make comments. For example, track changes is a function that a lot of people like to use but it doesn’t translate back into other platforms seamlessly.

As with many things in this book I urge you to keep an open mind with respect to software. For example, if your lab is progressive and already experimenting with new and flexible software platforms then I suggest that you learn with them. The future of computing is certainly likely to be open-source software, and so the sooner that you become more familiar with this, the easier you will find it to adapt in future.

Another consideration is how your word processing package is going to work with referencing and citations. Again, your lab may already use some software that you want to integrate with your word processor.

If you are feeling adventurous, try writing in R Markdown, using RStudio as a text editor GUI (Graphical User Interface) and the Bookdown platform to publish your chapters into a single thesis file in virtually any format. There is also great support for both of these formats with free guidebooks about how to use them (Xie, 2016; Xie, Allaire & Grolemund, 2018). Given that I wrote this book in R Markdown with bookdown, I feel that I should point out (albeit briefly) the advantages to this platform:

  • Reproducibility
  • Dynamic Content (include statistical code and graphics inside your thesis)
  • Focus on content (not formatting)
  • Numerous output types (including .docx, .html and .pdf)
  • Independent reference manager (via .bib file) and style files (via .csl file)
  • Free software and community support

Importantly, what you want to try to avoid is conducting a major switch between software right at the end of your PhD. Try making such changes when you have more time.

20.2 Naming versions of files

Good file management is an important aspect of a PhD. I like my students to give each chapter of their thesis a nickname or short running title that we both agree on to name the files. An example might be “Swimming_performance_V1.rtf”. The reason why I don’t like “Chapter_1.rtf” as a filename is that every student has a chapter 1, so I can’t distinguish between them. In addition, I’m not good at remembering which chapter is chapter 1 and which is chapter 2. Sometimes, towards the end of a thesis, the chapters get their orders switched around, causing even more confusion. Notice that I don’t like having spaces in filenames. This is a hangover from old days of MS-DOS, but is consistently useful today when using R and other coding software. However, some software platforms react oddly with underscore (_), and so you might need to work with a hyphen instead of a space (-).

Once I’ve commented on a file and sent it back in an email, or uploaded it to their shared folder, I’ll add my initials Swimming_performance_V1_JM.rtf. Each time I comment on this version, I will add a number to my initials and the student adds one back, so eventually, we might come to: Swimming_performance_V1_JM3_MM2.rtf. After three rounds, we would normally send a clean V2 out to co-authors and once their comments are back in we should be all set.

While you and your advisor can adopt any naming convention you like, you and they need to instantly be able to distinguish between a new version and one that they have already commented on by the file name. No advisor wants to spend time commenting on a chapter and then be told that it was an old version.

20.3 How many versions of each file should you keep?

I would suggest keeping at least one older version of each file to minimise the possibility of losing the content through corrupt files, but you should also be keeping those .rtf or .txt files as well, just in case. Because of my early experience with file corruption, I started to keep all versions of files. This is unnecessary, and I’ve never needed to go back to anything but the last version. Nevertheless, text files take up very little space and it’s not a problem to keep more versions if you want to. More on this in the chapter on data management.

References

Xie Y. 2016. Bookdown: Authoring books and technical documents with R markdown. CRC Press.
Xie Y, Allaire JJ, Grolemund G. 2018. R markdown: The definitive guide. CRC Press.