
NINA

After six baseball seasons I am leaving the Statcast data team at Major League Baseball. I’m enormously grateful to have had the opportunity to work at MLB, but it was time for a change and for the pendulum to swing back to my roots in music and innovation. The big news is that I’ll be joining Nina Protocol, founded by Mike Pollard, Eric Farber, and Jack Callahan, as their first dedicated backend engineer. We share a fundamental belief that online music platforms and tools do not currently serve the needs of independent musicians, and that decentralized blockchain technology has only recently reached the point where artists can take full control of their creative work, knowing it can survive past the lifetime of a centralized music platform. It’s very early days and the time to build is now.

For most of my adult life, until my mid-thirties, music and a community of experimental musicians dominated my time and energy. Always running in parallel was software engineering: I immersed myself in it and learned as much as I could, while pretty much whatever supplemental income I had was reinvested in music and putting out records for other people. This went on for years, and I benefited enormously from growing up in Oakland and the Bay Area at large, which had an incredibly rich tapestry of electronic and experimental musician weirdos alongside the innovative hub of computer history in Silicon Valley. Eventually a wave of blandness enveloped San Francisco in the 2010s, dominated by bloated adtech, MBAs leaving finance, and product managers who had just taken their Agile certification to get in on the wild carnival ride.

My first coding job was in the summer at age 15, and I never really looked back. Occasionally these worlds would intersect, but for the most part I discovered over time that it was entirely healthy not to have my job depend on music, and vice versa. It also became increasingly clear, even amongst my most successful music peers, that there really wasn’t much of an economy in releasing limited runs of vinyl: years of effort for everyone involved, only for the record to be written and tweeted about for a month and then disposed of in an increasingly dystopian streaming world where power law curves dictate a winner-take-all business model. I cherished publishing an artifact that could sit on someone’s shelf for decades, outlasting most technology companies, and I loved labels that took this idea very seriously.

The universe has a way of slamming the door shut on an epoch of your life whether you’re ready for it or not, and that’s what happened to me at the end of 2016 and in the first couple of months of 2017, complicating my relationship with music. Three key events happened within two months of each other. I had come off a tour with legendary Coil collaborator Ivan Pavlov (COH) and had an unfortunate falling out with the label I was releasing music on. The Ghost Ship tragedy, which I’ve written about before, happened right around the same time; it ripped the soul out of the Bay Area experimental community and left a hole in our hearts that will never go away. And lastly, I had resigned from the company I worked for after it was acquired by a major streaming music company, an acquisition I was unwillingly sucked into and that ran counter to my values as an independent musician. Several months later I ceased operations with Isounderscore, the label I had started at age 23, ending a 12-year run. I still worked on music over the next five years, but everything slowed to a grueling crawl, with two releases to show for it. I focused instead on rebalancing my priorities as a new dad and on my next work opportunity.

When I was six years old, to help make sense of a world where the 1988 Dodgers took down my juggernaut Oakland Athletics in the World Series, I had the idea to compile the statistics from the 1989 Fleer baseball set for each team for a first-grade science project, to demonstrate that the Athletics were in fact the superior team. Growing up I voraciously consumed advanced statistics, always did well in math, and was well aware of what Billy Beane and the Oakland Athletics were doing as part of a small online community of nerdy A’s fans before Moneyball became a thing. But ultimately, I had no idea how far the rabbit hole went with baseball analytics and baseball data pipelines. Only a small subset of my music friends knew I was really into this kind of thing.

That’s why jumping into another of my passions, baseball data and analytics, was the perfect pivot; for the first time in my life I owed it to myself to do something completely different, which is why I joined MLB Advanced Media back in 2017, before the BAMTech deal spun us off as a technology organization at MLB. Many people don’t realize that it was a very small team in San Francisco that helped build out the baseball data infrastructure for Statcast, in heavy collaboration with the fine folks in New York operating out of MLB headquarters. My role was to work very closely on creating real-time metrics and a lot of the backend engineering. I was totally obsessed with the Statcast data for years, and it forever changed the way I think about and experience baseball. Seeing how it changed the game of baseball in such a short amount of time, while having a small part in it firsthand, really blew me away.

I’m incredibly grateful for my time at MLB and the opportunity to meet people across the baseball industry. It’s humbling to have had the privilege of collaborating directly over the years with people like Tom Tango and Mike Petriello, a really talented group of analysts and data scientists like Jason Bernard and Travis Peterson, our physicist Clay Nunnally, and of course the software engineering team in the San Francisco office that I was part of, doing a lot of heavy lifting behind the scenes. When I first came to Houston in 2017, Daren Willman demonstrated what southern hospitality meant with his generosity; he is now helping rebuild the Texas Rangers in his post-Statcast career. And I would be remiss not to mention that I am indebted to Rob Engel for giving me the opportunity to work at MLB; he is incredibly humble, and he has been the man behind the curtain since Statcast’s inception. The truth is that the diversity of expertise across multiple technological domains at Major League Baseball is impressive, and the scale required to support the game of baseball, whether on the field or behind the scenes, is truly humbling.

For the first time in a very long time, I’m actually looking forward to enjoying baseball simply as a fan again, even if it means braving the Astros crowd in my now-dated kelly green Matt Olson jersey when the A’s come to Houston. I really don’t know if I’ll ever work in baseball again; who knows? I’ll probably make some public contributions for the community here and there in rare downtime. But it’s time to take care of some unfinished business and help build the tools I would have loved to have had in the 2000s and 2010s working on music and the label. I really could not have found a better place to do that right now than Nina.

Time to get to work.

SLOW XGBOOST ON APPLE M1

I recently purchased an Apple M1 Max with 64GB of RAM, as the old 2016 MacBook Pro is slowly dying: the battery is giving out and the machine randomly restarts under anything processor intensive. I actually had to send the original M1 I purchased back to Apple after a few days due to a hardware manufacturing issue that caused the screen to flicker. Once the replacement finally came, one of the tasks at hand was migrating all of my code to the new computer and setting up the development environment. It went pretty smoothly until I ran one of my NBA models and noticed that it ran excruciatingly slow. What gives? This computer is a beast, light years ahead of the 2016 MacBook Pro, yet one of my Python models took a significant performance hit, so I dove into what was going on.

Here was the time output of the NBA model on the new Apple M1 Max:

python ./nba_model.py 20221022 2196.36s user 642.10s system 385% cpu 12:15.39 total

This ran roughly an order of magnitude slower than on the old MacBook, where it would finish in about 2-3 minutes. Clearly something was wrong, so I ran htop to see how things were running, and after doing some research I began to suspect an issue with Apple Silicon, since these new M1 machines use Apple’s own CPU and unified memory architecture instead of Intel’s. There were a lot of threads online, but nothing specific about xgboost being slow on M1s. After some fiddling around and experimentation I was able to figure out that this was a tale of two package managers. Initially I installed xgboost the “correct” Anaconda way:

(base) nickell@mlm1 % conda install xgboost
Collecting package metadata (current_repodata.json): done
Solving environment: done

Package Plan

environment location: /Users/nickell/opt/anaconda3

added / updated specs:
- xgboost

The following NEW packages will be INSTALLED:

_py-xgboost-mutex pkgs/main/osx-64::_py-xgboost-mutex-2.0-cpu_0 None
libxgboost pkgs/main/osx-64::libxgboost-1.5.0-he9d5cce_2 None
py-xgboost pkgs/main/osx-64::py-xgboost-1.5.0-py39hecd8cb5_2 None
xgboost pkgs/main/osx-64::xgboost-1.5.0-py39hecd8cb5_2 None

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Retrieving notices: …working… done
(base) nickell@mlm1 pops %
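Note the osx-64 tags on those packages: those are Intel builds. One thing worth checking in a situation like this is what architecture the interpreter itself reports; on an M1, an answer of x86_64 means Python (and everything it loads) is running under Rosetta 2 translation. A minimal sketch using only the standard library:

```python
import platform
import sys

# On an M1: 'arm64' means a native build; 'x86_64' means the interpreter
# (and every compiled library it loads) is running under Rosetta 2.
print(platform.machine())
print(platform.platform())  # full platform string, including the arch
print(sys.version)          # interpreter build details
```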

After accepting that this install of the library was the slow one, I proceeded with the uninstall. Now, instead of using Anaconda, let’s try old school pip:

(base) nickell@mlm1 % pip install xgboost
Collecting xgboost
Downloading xgboost-1.6.2-py3-none-macosx_10_15_x86_64.macosx_11_0_x86_64.macosx_12_0_x86_64.whl (1.7 MB)
1.7/1.7 MB 13.0 MB/s eta 0:00:00

Lo and behold, here’s the improved performance time:

python ./nba_model.py 20221022 105.95s user 23.25s system 439% cpu 29.416 total
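To compare installs like this a bit more surgically than timing the whole script, a small context-manager timer can isolate just the training step. This is a sketch; the dummy workload below stands in for the actual training call in nba_model.py:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    # Print wall-clock seconds for the enclosed block.
    start = time.perf_counter()
    yield
    print(f"{label}: {time.perf_counter() - start:.2f}s")

# Dummy workload standing in for something like model.fit(X, y).
with timed("train"):
    total = sum(i * i for i in range(100_000))
```

Wrapping only the fit/predict calls this way makes it obvious whether the slowdown is in the model library itself or elsewhere in the pipeline.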

A performance improvement of roughly 20x. The moral of the story is that if you’re installing xgboost on the new Apple M1, use pip install and not conda install. Building from source would be a good exercise as well. My overall takeaway is to keep an eye on the performance of certain Python libraries and packages on the M1, as it’s still a relatively new target architecture. Even PyTorch training on M1 GPUs has only been available since May: https://pytorch.org/blog/introducing-accelerated-pytorch-training-on-mac/. If I run into similar issues with other packages, reinstalling with a different package manager is the first place I’d look.

© 2022 nickell.io
