The Importance of Documentation in Personal Research Projects

Every few years I am reminded of the importance of documentation and of working in a systematic fashion. Today's wake-up call came in the form of two personal research projects that still require more work. The first is my Paddington project. I have been scanning a boxed set of what I consider to be Paddington canon for a computational linguistics project. I made the mistake of deciding to scan while watching something mindless, and after unbinding the books, I ended up with the even and the odd pages scanned as separate sets. I did not realize until later that only one set of pages had the name of the chapter and the other set had the name of the book. Also, not all pages have page numbers. With the help of a large spreadsheet, I am now going through the scans to create a .txt copy of the text from the PDF scans. Of course, given the small pages, many of them got stuck in the scanner and were never scanned. Lesson learned: the next time I scan for personal reasons, I will use smaller batches and try to scan by chapter. Fortunately, I have some time over Christmas week to bond with the scanned documents and get organized to write the conference proposal. I also have the time to write down what should have been my methodology for the scanning process.

The second reminder happened during my project for my Data Analytics class. I am working with an interesting dataset and need to remember that the point of the project at this stage is to practice my Excel skills, not the actual analysis. Since many of these tools and tricks are new to me, it is important to record what I used for each function and cleaning step. Taking a step back to record each step of the data-cleaning process, which I professionally know is important, can be difficult. I also need to remember that I should be focusing on learning Excel tricks and tools that are new to me, and to be grateful that I have the privilege of spending resources on this project.

Because I'm working on both projects in chunks of found time, documentation is especially important: I spend long stretches away from each project and need to pick up right where I left off.

Back to PDFs and spreadsheets!