Switching careers mid-life can be hard. I embarked on this journey several years ago, when I came back to Canada after a five-year overseas teaching adventure. Since then, the BCS program (a second degree in computer science) at UBC has opened the doors to the vibrant world of software engineering for me. In a short amount of time I have been exposed to data structures and algorithms, hardware and OS concepts, design patterns and programming paradigms. It is easy to get lost in this ocean of knowledge without the beacon of practical application. The barrier to entry to the profession is relatively high given the plethora of software development tools and frameworks, so I did not hesitate to enroll in the co-op program.

Four academic semesters, one summer research scholarship, and one eight-month co-op later, I was getting ready to graduate, with one more semester of studies to go. I had learned a ton about software engineering processes and good coding practices, yet I chose to apply for one more co-op job, hoping this time to reinvent myself as a backend developer. With that goal in mind, my applications became more focused. A position on the Core Team at PDFTron caught my attention, as it emphasized mathematical aptitude, giving me a good chance to capitalize on my background in mathematics. I ended up accepting the position and have not regretted it once.

Technology aside, what makes PDFTron a great place to work is its people and culture. The company is a team of smart, hard-working people who never tire of helping less experienced colleagues get better at accomplishing common goals. By the end of your co-op term you will know most of your coworkers and understand their roles within the organization, thanks to the practice of company-sponsored lunches, where people are assigned to random groups of four or five to share a meal.

Color-coding differences between images

PDFTron's flagship product is a cross-platform PDF SDK. My classmates often ask me what it is like to work with PDF. You will be surprised at how many fundamental computer science concepts are involved when developing for the PDF standard. Before my work at PDFTron I had taken for granted character encodings and how to make them render in system-agnostic ways. Image file formats did not seem sexy until I discovered image compression techniques like color quantization. Most importantly, as a co-op student, I gained invaluable experience in developing robust code that compiles and runs in multiple environments.

In fact, PDF is not just about mature technologies. Given the ubiquity of electronic documents, the room for improvement in productivity, security, and efficiency is truly limitless. The direction of PDF SDK evolution is entirely customer-driven. As SDK users are often startups exploring new market opportunities, there is constant demand to stay abreast of cutting-edge technologies. With machine learning becoming more mainstream, everyone has come to expect ML software features. PDFTron has been actively recruiting full-time machine learning specialists. After learning about my interest in ML, the company was extremely accommodating and allowed me to work on an optical character recognition (OCR) project.

Working on the core team, you will obviously write C++ code, but also do many other things: "hack around" in CMake, set up VirtualBox environments to test your code, and write Python scripts, among others. The company culture encourages students to learn by doing and to own their projects. The flat organizational structure is accommodating, and students get the amount of support they want and need. PDFTron is indeed a result-oriented company, and one of the results people here care about is your learning outcome.

Page layout segmentation based on pixel density

To give a concrete example, one of the challenges I faced was to find a mapping from characters recognized by OCR to their counterparts in the input document. I used the center points of character bounding boxes as a measure of each character's location. At first, I matched characters based on distance, e.g. via the Hungarian algorithm. While the distance minimization approach worked reasonably well for ground truth datasets concocted by converting existing PDFs with searchable text into images, it failed for more realistic datasets where documents were printed and then scanned. Eventually, I realized that there is a geometric relationship between the two coordinate systems and that the coordinates can be rescaled to get an exact mapping.
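The rescaling idea can be illustrated with a minimal Python sketch. This is not the code I wrote at PDFTron; it is a toy example under stated assumptions: the two coordinate systems differ only by scale and translation (no rotation or skew), both sets contain the same characters, and boxes are given as (x_min, y_min, x_max, y_max). The helper names `center`, `rescale` and `match` are hypothetical, and the brute-force search in `match` is a stand-in for the Hungarian algorithm, which solves the same assignment problem in polynomial time.

```python
from itertools import permutations
from math import dist


def center(box):
    """Center point of a bounding box (x_min, y_min, x_max, y_max)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def rescale(points):
    """Min-max normalize points into the unit square.

    Assumption: the two coordinate systems differ only by scale and
    translation, so normalizing each set by its own extent aligns them.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    width = (max(xs) - min(xs)) or 1.0   # avoid division by zero
    height = (max(ys) - min(ys)) or 1.0
    return [((x - min(xs)) / width, (y - min(ys)) / height) for x, y in points]


def match(ocr_pts, doc_pts):
    """One-to-one assignment minimizing total distance.

    Brute force over all permutations: fine for a handful of points only.
    result[i] is the index of the document character matched to OCR
    character i.
    """
    n = len(ocr_pts)
    best = min(
        permutations(range(n)),
        key=lambda perm: sum(dist(ocr_pts[i], doc_pts[perm[i]]) for i in range(n)),
    )
    return list(best)


# Made-up character boxes in document coordinates.
doc_boxes = [(10, 10, 20, 30), (50, 10, 60, 30), (10, 60, 20, 80), (50, 60, 60, 80)]
# The same boxes as a scanner might report them: scaled by 3, shifted by
# (40, 25), and listed in a different order.
ocr_boxes = [(70, 205, 100, 265), (70, 55, 100, 115),
             (190, 205, 220, 265), (190, 55, 220, 115)]

doc_centers = [center(b) for b in doc_boxes]
ocr_centers = [center(b) for b in ocr_boxes]

mapping = match(rescale(ocr_centers), rescale(doc_centers))
print(mapping)
```

After rescaling, both sets of centers land on the same points of the unit square, so the minimum-distance assignment recovers the correspondence exactly. Real scans can also introduce rotation and skew, which this sketch does not handle.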

In summary, working at PDFTron is exciting, and future co-op students will undoubtedly find the experience challenging and rewarding. My daily duties dispel the myth that knowledge obtained at university is too theoretical to be practical. Working at PDFTron so far has inspired me to seek enrollment in advanced operating systems, compilers and computer graphics courses, and has given me the sense of career direction I was hoping to find.