Travis Oliphant: scientist, software developer, entrepreneur and, now, subject of a documentary. Credit: Christi Norris/BYU
In 2005, Travis Oliphant was an information scientist working on medical and biological imaging at Brigham Young University in Provo, Utah, when he began work on NumPy, a library that has become a linchpin of scientific computing in the Python programming language.
Oliphant came across Python as a graduate student in 1998 and, as he wrote in the Guide to NumPy (2006), “quickly fell in love” with it. That “is a remarkable statement to make about a programming language”, he acknowledged. “If I had not seen others with the same view, I might have seriously doubted my sanity.”
But as he explained to Nature, Python has two particularly attractive features, even for a programmer who is conversant in more powerful languages: it’s easy to read — “executable English”, as he puts it — and extensible. Oliphant would go on to write or contribute to two of the language’s most crucial extensions: NumPy and SciPy.
With hundreds of millions of downloads per month, NumPy simplifies and standardizes the treatment of complex numeric arrays, enabling mathematical computations that are necessary to do science — including artificial intelligence (AI) — in the Python language. This is one reason why Python has become the go-to language of scientific computing. As a 2020 Nature research article pointed out1, “NumPy underpins almost every Python library that does scientific or numerical computation, which includes SciPy but also Matplotlib, pandas, scikit-learn and scikit-image.” SciPy, in turn, is a collection of scientific algorithms, many of which Oliphant created.
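As a minimal, hedged illustration of the kind of array arithmetic NumPy standardizes (the values below are an invented example, not drawn from the article):

```python
import numpy as np

# A small 2-D array of floats
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

print(data.mean())       # 3.5 — mean over all six elements
print(data.sum(axis=0))  # [5. 7. 9.] — column-wise sums
```

One line per operation, with the looping over elements handled inside the library: this is the pattern that downstream libraries such as Matplotlib, pandas and scikit-learn build on.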
Now a technology entrepreneur and chief AI scientist at OpenTeams in Austin, Texas, Oliphant has gone from scientist to software developer to chief executive to venture capitalist — all thanks to NumPy, he says. “NumPy has been a driving force for my whole life.”
Now, his career has taken him to the silver screen. Last month, the YouTube channel CultRepo, whose films tell the human stories behind technologies, released a movie about the programming language and its backstory. Fittingly, the official premiere for Python: The Documentary was held at PyCon Greece, a conference dedicated to the language, in Athens on 28 August.
Oliphant features in the 84-minute film, alongside Guido van Rossum, who created the language at the end of 1989. A separate featurette, also released last month, focuses on NumPy and SciPy, with Oliphant in a starring role.
Nature marked the 20th anniversary of NumPy’s creation by asking Oliphant to reflect on two decades of array arithmetic.
Why did you develop NumPy?
NumPy was a merger of two existing array libraries in the Python ecosystem. When I started using Python in 1998, a library called Numeric already existed. Numeric was written in 1995 by Jim Hugunin, a graduate student at the Massachusetts Institute of Technology in Cambridge. He was a pioneer and a brilliant technologist: he basically built an amazing ‘while-loop’, a block of code that executes as long as a specified condition remains true. And this while-loop did a bunch of things that made it easy to do mathematics using big arrays of numbers.
Around 2000, computer scientists began writing a new library, called NumArray. NumArray was a replacement for Numeric, with more sophisticated querying and support for a wider range of data types. But I saw it as a problem, because SciPy worked only with Numeric — and people were starting to write libraries that worked only with NumArray. There was a library, called ndimage, that did image processing. And I liked image processing — it was a big part of my graduate research — but ndimage worked only with NumArray.
So I saw this divided community. That’s what prompted me to spend a year writing NumPy: I wanted to merge NumArray and Numeric together, to make it easier for this fledgling ecosystem of scientists using Python. I needed to help myself, too, because I was also a user.
What were some key innovations?
Looking back on my work on NumPy, a few things stick out as really compelling. First, it was a blend of two array structures. And there was a whole lot of backward-compatibility work to make sure that every NumArray user and every Numeric user could still use NumPy, without having to make a lot of changes to their code.
Another innovation involves ‘typing’. Python uses ‘dynamic typing’, in which variables can be attached to different types of object. For instance, a variable can refer first to an integer but then be reassigned to a string of text. But each object itself has a definitive type. The innovation in NumPy was related to what is contained in the array. I introduced the concept of the data type, or dtype, which defines the type of thing inside the array — is it a 32-bit float? An 8-bit integer? It’s the detail of what the bits are inside the array.
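A short sketch of dtypes in practice (the arrays here are illustrative; np.int8 and np.float32 are standard NumPy dtypes):

```python
import numpy as np

# The same three values, stored with different dtypes
a = np.array([1, 2, 3], dtype=np.int8)     # 8-bit integers
b = np.array([1, 2, 3], dtype=np.float32)  # 32-bit floats

print(a.dtype, a.itemsize)  # int8 1  — one byte per element
print(b.dtype, b.itemsize)  # float32 4 — four bytes per element
```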
Another breakthrough was the universal function, or how to perform operations on two arrays that are not the same shape. For example, it’s very easy to understand what to do if you have a 3 × 3 array and you want to add it to another 3 × 3 array: just add each element. But what if I have a 1 × 3 array, and I want to add it to a 3 × 3 array — what do I do there? There’s something called broadcasting, in which you assume the 1 × 3 array is actually 3 × 3, and you just add them elementwise. NumPy exposed the broadcasting algorithm in such a way that anybody could build on it.
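A minimal sketch of broadcasting (illustrative numbers, not from the interview):

```python
import numpy as np

row = np.array([10, 20, 30])       # shape (3,), treated as 1 × 3
grid = np.arange(9).reshape(3, 3)  # shape (3, 3), values 0–8

# The row is 'stretched' to 3 × 3 and added elementwise
print(grid + row)
# [[10 21 32]
#  [13 24 35]
#  [16 27 38]]
```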
How has NumPy changed over 20 years?
The changes have been slow. And it’s a trade-off for success, actually. When nobody is adopting your library, you have all kinds of room to change it quickly. Once people start adopting it at scale, it’s really hard to make changes without creating a lot of disruption. So, you have to create a process for that. It’s expensive, actually.
For me, one question has been a big motivator: how do we pay for this? How do we create companies and alignment approaches that allow open-source software to be funded, developed, sustained, supported and innovated on?
So that’s how open source has taken me on a journey from scientist to software developer to chief executive to venture capitalist. It’s all tied to NumPy — NumPy has been a driving force for my whole life, actually. And I still feel very much a part of it.
In 2018, I created Quansight, a Python-focused consulting firm in Austin, and Quansight Labs, the firm’s public-benefit division, followed in 2019. And Ralf Gommers, a central maintainer of NumPy and SciPy, helped to do that. He just became the co-chief executive of Quansight Public Benefit Corporation (formerly Quansight Labs), along with Tania Allard. Quansight now has people who work on NumPy, SciPy, Python and the rest of the ecosystem, funded by grants, customer projects and investment revenue.
I’m very proud of Quansight because I left academia in 2007 to get into entrepreneurship and to work out how to fund NumPy and SciPy, and I now have a company that actually pays people to work on the products. It just took me a little while. I don’t know whether that means that I’m not very good at it, but it’s definitely passion driven.
Where does NumPy fall short?
I would say there are two things that NumPy didn’t add early enough to really be the fundamental ground on which everybody built their AI models. First, it didn’t support graphics processing units (GPUs). It worked on central processing units (CPUs), but modern machine learning and deep learning rely on GPUs for training. The shift to GPUs led to modern breakthroughs in the field; it’s why NVIDIA, a GPU maker in Santa Clara, California, is doing so well. The machine-learning libraries PyTorch and TensorFlow worked with GPUs from the beginning. But NumPy does not use GPUs natively, and that’s a big thing that has kept it from being as useful as it could have been.
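A minimal sketch of the contrast (assumes PyTorch is installed; the device transfer is standard PyTorch, not part of NumPy):

```python
import numpy as np
import torch

x = np.ones((1000, 1000), dtype=np.float32)  # a NumPy array always lives on the CPU

t = torch.ones(1000, 1000)                   # a PyTorch tensor starts on the CPU...
if torch.cuda.is_available():
    t = t.to("cuda")                         # ...but can be moved to a GPU in one call
print(t.device)                              # 'cuda:0' if a GPU is available, else 'cpu'
```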