Tyler Martinez is PitchBook’s Director of Data Science and Software Engineering. He’s built the team from the ground up, expanding it from just three people in 2015 to 17 today. Tyler oversees the data science and cloud technologies that support internal data collection processes and power the algorithms behind PitchBook features like Suggestions. In the Q&A below, he shares what he’s learned about leading a tech team in Seattle and what he’s most proud of his team building.
In your own words, what do you do at PitchBook?
I create and maintain a clear vision of how our technology projects align with company initiatives. This involves communicating the impact and value that machine learning and modern cloud architectures have on the PitchBook business, as well as bringing new tech into the Product department. When it comes to managing my team, I remove barriers for engineers, coordinate with other leads across Product, UX, Research and our Ukraine development teams, and manage budget, hiring and planning.
What are the pros and cons of recruiting talent for an engineering team in Seattle? How did your team get to where it is today?
The pros are easy: machine learning and cloud microservices are hot right now in Seattle! PitchBook is positioned right at the forefront of modern development, which makes it appealing for candidates. It’s a fun time to be in a market with so much advanced research happening and new products and ideas being shared every day.
What’s difficult is that there are only so many spots available on the team. With so many people applying because of our company’s growing popularity and attractive collaborative culture, we do have to turn many candidates away. I wish I could give everyone a job, but unfortunately that’s not how it works. It’s like coaching a sports team when you have a large group at try-outs and only a certain number will make the cut.
We got to where we are today through lots of trial and error, being OK with failing and learning from our mistakes. We’ve invested heavily in modern CI/CD pipelines that allow us to iterate quickly on new findings. We’ve also taken the time needed to show success on projects, holding releases until they meet our internal quality bar, and only then investing more and growing the team. Most importantly, the team is in a successful position today because of all the hard work and effort they’ve put into making our vision happen. They are truly world-class!
Much of the work your team does is behind the scenes. What are one or two things you’d want to tell our customers about how the product has evolved on the backend?
We’re getting smarter about how we index, store and label our data to maintain a high-quality product. When I started at PitchBook a few years ago, we only had two or three relational databases and only a couple of people knew how to query them. Today we have 10 to 20 times that and over 50 team members who are fluent in a variety of data access technologies: SQL, NoSQL stores and S3. The sheer size of the data repository we have for analyses gives us a significant competitive edge, and we’re coming up with new insights every week. Our internal data dictionary and the way we merge data sets together allow us to be nimble when we want to extend existing data sources or add new ones.
In addition to that, we’re constantly choosing which parts of our system make sense to move to a microservices architecture and which legacy systems to leave as is. The biggest internal benefit of microservices is how easily engineers can move in and out of working on a given service; we have a lot of flexibility there. We’ve exposed many RESTful services internally to enable global engineering teams to share data products much more effectively. This has empowered engineers as well as non-technical teams to better understand how our systems work together and to monitor quality.
We hear a lot of buzz about artificial intelligence and machine learning, especially since we’re in an industry that’s trying to capitalize on the latest technology. Can you explain the tech your team uses and help us cut through the jargon?
I see a lot of companies talk about human-in-the-loop or try to sell it as a service, but they lack the execution. We have shown success in this area and are still innovating daily, since there is still plenty of room for improvement in this field. Our machine learning, human-in-the-loop and natural language processing efforts at PitchBook are cutting edge. We have a world-class named entity recognition system that outperforms Google’s and Amazon’s standard offerings. Named entity recognition is a machine predicting which words in a text refer to an organization, person or location. We take special care to label tens of thousands of data sets to maintain high-quality models, and we’re very thankful for all the support we get from our research teams to build out these data sets. Our data quality would not be as high as it is without their help!
Most impactful project you’ve led at PitchBook?
Something we refer to internally as News as a Service, which involves collaboratively growing our news data collection process from processing 50,000 articles a week to over one million while only having to hire three researchers to cover every relevant data point mentioned in the news. Plus, the proprietary integrated feedback system we’ve built continually validates our machine learning models in the wild. This system scales to all the data points our clients care about from revenue figures to M&A events to a company’s headquarters location. In the cases where the machine makes a mistake, it’s easy for a researcher to show it where it went wrong so our data science team can continuously retrain their models.
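Editor’s note: the feedback loop described above can be illustrated with a minimal Python sketch. It is not PitchBook’s system. Every name here (`Prediction`, `FeedbackLoop`, `review_threshold`) is hypothetical, and routing by a confidence threshold is an assumption made for illustration; the interview only says that researchers correct the machine and that the corrections feed retraining.

```python
from dataclasses import dataclass, field


@dataclass
class Prediction:
    """One data point a model extracted from a news article (hypothetical shape)."""
    article_id: str
    field_name: str    # e.g. "headquarters" or "revenue"
    value: str
    confidence: float  # model's own score between 0.0 and 1.0


@dataclass
class FeedbackLoop:
    """Routes predictions to researchers and collects their corrections."""
    review_threshold: float = 0.8  # assumed cut-off, not from the interview
    accepted: list = field(default_factory=list)
    review_queue: list = field(default_factory=list)
    retraining_examples: list = field(default_factory=list)

    def route(self, prediction: Prediction) -> str:
        """Accept confident predictions; queue the rest for a researcher."""
        if prediction.confidence >= self.review_threshold:
            self.accepted.append(prediction)
            return "accepted"
        self.review_queue.append(prediction)
        return "needs_review"

    def record_review(self, prediction: Prediction, corrected_value: str) -> bool:
        """Store the researcher's answer as a labeled example for retraining.

        Returns True when the model's original value was already correct.
        """
        was_correct = corrected_value == prediction.value
        self.retraining_examples.append(
            (prediction.article_id, prediction.field_name, corrected_value, was_correct)
        )
        return was_correct


loop = FeedbackLoop()
confident = Prediction("article-1", "headquarters", "Seattle, WA", 0.95)
uncertain = Prediction("article-2", "revenue", "$10M", 0.55)

loop.route(confident)                   # goes straight through
loop.route(uncertain)                   # lands in the researcher queue
loop.record_review(uncertain, "$12M")   # researcher supplies the right value
```

In this sketch the corrected examples accumulate in `retraining_examples`, which stands in for whatever data set a data science team would use to retrain its models.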
Trends you consider when you look at the future of the product in the next few years?
The continued improvement and maintenance of our data quality is huge. At PitchBook, we understand that data is only useful if it’s correct, and we’ve found a nice balance between leveraging automation and human intervention to produce the highest-quality VC, PE and M&A dataset in the market. As we keep building out our public data offering with the help of our parent company, Morningstar, the breadth of our data coverage will only continue to grow.
We’re also looking at improved search and discoverability features in our platform and implementing some slick methods of searching amongst the millions of companies, transactions and people we have in our database. And yes, if you’re wondering: some of the methods will be using machine learning! Finally, as our client base continues to grow, we’re investing heavily in user research to curate experiences and improve recommendations based on a user’s interests and workflows in PitchBook.
What do you do in your spare time? Any side projects you can share?
I spend a lot of time reading about technology: what’s happening in the artificial intelligence, machine learning and natural language processing space, as well as cloud hardware. I like to nerd out and read research papers on https://arxiv.org/, particularly under Machine Learning and Information Retrieval.
As for side projects, I enjoy building out small Python applications. Two in particular I’m working on at the moment are a blackjack card-counting application that was trained using reinforcement learning methods and an application that helps users understand game-theory-optimal play in no-limit Texas hold’em.
Outside of that, I enjoy traveling with my wife, dim sum, yoga, piano and time with family.