Data Analytics Lessons from the NBA's first data mining product from IBM

I attended the Boston Business Intelligence meetup at Microsoft last week. I was spurred to attend because Bedrock Data is launching a new product in January that closely aligns us with business intelligence teams as the mechanism to supply unified customer data across applications for business analytics and dashboards.

It was a flashback moment for me, taking me back to the start of my career at IBM in the mid-1990s, where I worked with a small team led by Dr. Inderpal Bhandari, then a chief research scientist and today the Chief Data Officer of IBM.

Our team created and launched IBM’s Advanced Scout, the first data mining software used by NBA teams. Within two years it was used by 24 of the 28 NBA teams, powered a regular content series “Beyond the Boxscore” on, and was covered in a host of publications including Wall Street Journal, Sports Illustrated, Washington Post, Computer World, CIO Magazine and Wired Magazine.

I’m proud of those achievements in launching the product and there are takeaways from what we did that apply to business intelligence projects to this day, over two decades later.

Let me start with some background and then get to the lessons. Two things were happening in 1995: Inderpal had developed a new data mining algorithm at IBM Research to find interesting patterns in data, and IBM, as an NBA sponsor, was part of an effort to systematize, for the first time, the collection of NBA play-by-play data through courtside data collection.

Inderpal saw an opportunity, asking the question, “Where’s the opportunity to apply these new data mining algorithms to the NBA using this new play-by-play data resource?”

That’s where I came in.

As a high schooler in the New York suburbs at the peak of the rough-and-tough New York Knicks and Big East basketball, I was into basketball. I was the man for the job. I met Inderpal at an IBM student luncheon in the spring of 1995 – and we were off.

It was over 20 years ago, but I still remember it like yesterday (although I have forgotten a lot of what’s happened in between). These are three lessons I took from what we achieved and how we achieved it, and they apply to any data analytics project.

#1 – Focus on the Most Important Performance Metric

The slate I was given at the time was this: we had data on each play outcome. Not the detailed pass-level data you might see today, but the basics of who shot the ball, what happened (score, miss), whether there was an assist, and who got the rebound. We also had data on substitutions.

The question was – what can we do that’s available in the data and compelling for coaches?

At the time, the most common basketball success metric was field goal percentage.  We went in a different direction.

We focused on points scored, looking at the lineup combinations that contributed most to positive and negative point differentials. Our thinking was that there’s a ton happening on the basketball court, but if we start with the outcome as the point of analysis, we are going to unearth patterns that are interesting to the coaches. It will be meaningful if we can tell the coach whether a certain player is making the difference, or whether a combination of players or matchups is making the difference. Take the noise of an entire NBA game or series of games, and net out to the coach what matters.
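To make the idea concrete, here is a minimal sketch of summing point differentials by lineup and by player. It is not the Advanced Scout code; the `stints` structure, the function names and all of the numbers are invented for illustration, assuming the play-by-play data has already been cut into stretches of play with a fixed five-player lineup.

```python
from collections import defaultdict

# Hypothetical input: one record per stretch of play with a fixed lineup.
# Each record is (players on the floor, points scored, points allowed).
# The numbers are made up for illustration only.
stints = [
    (("Jackson", "Wilkins", "Newman", "Oakley", "Ewing"), 12, 9),
    (("Strickland", "Wilkins", "Newman", "Oakley", "Ewing"), 8, 10),
    (("Jackson", "Wilkins", "Newman", "Oakley", "Ewing"), 6, 4),
]


def lineup_differentials(stints):
    """Total point differential for each distinct five-player lineup."""
    totals = defaultdict(int)
    for lineup, points_for, points_against in stints:
        # frozenset: the same five players count as one lineup in any order.
        totals[frozenset(lineup)] += points_for - points_against
    return dict(totals)


def player_differentials(stints):
    """Total point differential while each player was on the floor."""
    totals = defaultdict(int)
    for lineup, points_for, points_against in stints:
        for player in lineup:
            totals[player] += points_for - points_against
    return dict(totals)
```

With this made-up data, `player_differentials(stints)` credits Jackson with +5 and Strickland with -2, which is the kind of netted-out signal a coach can act on.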

The personal legacy for me is applying that thinking to analytics across a variety of industries. The big breakthrough in marketing analytics in the past decade is looking at which marketing activities are most connected to revenue as an outcome. The thinking behind this kind of closed-loop marketing analytics is the same as looking at points as the key outcome for basketball.

The basketball legacy is that this type of plus-minus points analysis has become mainstream in basketball and can now be found in daily box scores on every sports website. At the time, it wasn’t a mainstream metric for basketball; in fact, it was commonly found in hockey, where very few goals are scored. I thought: what if we apply it to basketball, where there are many, many points?

Here’s the ESPN box score from the recent Christmas Day game between the Warriors & Cavs. You can see +/- featured in the box score, and in this particular game the Warriors performed the best when Andre Iguodala from their reserve squad was on the floor. He was the unsung hero with 4-for-8 shooting, 6 rebounds and a blocked shot off the bench.

Every ESPN NBA box score now features the +/- metric, which back in 1995 was a hockey only stat

#2 – Get Creative to Solve Data Challenges

To make Advanced Scout compelling for coaches, we had to get creative. The data mining system worked off the concept of attributes, looking at which combinations of attributes were most interesting (statistically significant) for a specific numeric attribute (points).

We had information on substitutions, but it wasn’t immediately clear how we could use that information for analysis. We couldn’t just throw the list of players on the floor at the engine in an unstructured format; it needed to be in a structured format that would work as an input to the data mining engine.

Then it came to me.

We came up with the idea of a player order ranking. We prepared a default ranking for the coaches based on player height, and this ranking plus the players on the floor determined which positional slot each player went into as an attribute, using the order of point guard (PG), shooting guard (SG), small forward (SF), power forward (PF) and then center (C).

I got into basketball around the time of the 1989-90 Knicks, so let’s take that team as an example. Using these seven players, we would order them like this:

  1. Mark Jackson
  2. Rod Strickland
  3. Gerald Wilkins
  4. Johnny Newman
  5. Kenny Walker
  6. Charles Oakley
  7. Patrick Ewing

So with the starting lineup of Jackson, Wilkins, Newman, Oakley & Ewing, the players would slot into the positions like this, in order:

          PG = Jackson

          SG = Wilkins

          SF = Newman

          PF = Oakley

          C = Ewing

Swap Strickland in for Jackson, and Strickland slots into point guard. Swap Walker in for Oakley, and Walker slots into power forward.

Let’s say there was a “small ball” lineup with Jackson, Strickland, Wilkins, Newman and Oakley (that would be pretty rare), then it would lay out like this:

          PG = Jackson

          SG = Strickland

          SF = Wilkins

          PF = Newman

          C = Oakley

Strickland’s not a shooting guard, you might say. Or Oakley’s not a center.

But in these lineups, they were playing those roles – and this allowed us to ensure the algorithms knew that was the case when those lineups were on the floor. 
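The slotting rule is simple enough to sketch in a few lines. This is an illustration of the rule as described here, not the original Advanced Scout code; the names `POSITIONS`, `RANKING` and `slot_lineup` are my own, and players are identified by last name for brevity.

```python
POSITIONS = ["PG", "SG", "SF", "PF", "C"]

# Player order ranking from the Knicks example (first in the list = rank 1).
RANKING = [
    "Jackson", "Strickland", "Wilkins", "Newman",
    "Walker", "Oakley", "Ewing",
]


def slot_lineup(on_floor, ranking=RANKING):
    """Assign the five players on the floor to PG..C by their ranking order."""
    if len(on_floor) != len(POSITIONS):
        raise ValueError("a lineup needs exactly five players")
    rank = {player: index for index, player in enumerate(ranking)}
    # Sort the players on the floor by rank, then fill the slots in order.
    ordered = sorted(on_floor, key=lambda player: rank[player])
    return dict(zip(POSITIONS, ordered))


starters = slot_lineup(["Ewing", "Oakley", "Newman", "Wilkins", "Jackson"])
small_ball = slot_lineup(["Oakley", "Newman", "Wilkins", "Strickland", "Jackson"])
```

Here `starters` comes out as PG = Jackson, SG = Wilkins, SF = Newman, PF = Oakley, C = Ewing, and `small_ball` puts Strickland at SG and Oakley at C, matching the two layouts above. The result is one fixed-width record per lineup, which is the kind of structured input an attribute-based engine needs.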

This approach created the foundation for analysis that led to insights created from every game – and shared with coaches and also used as content for an series and TV broadcasts.

There were many, many stories that resulted; see the end of the article for links to the ones I could still find published. A Darrell Armstrong story became the most famous.

In 1996-97, Armstrong was a backup guard for the Orlando Magic. He averaged a modest 6.1 points over 15 minutes per game off the bench in the regular season, shooting 38% from the field.

In the playoffs, the Magic took on the Heat, and lost the first two games of the best-of-five series. Following game two, Advanced Scout flagged for the Magic coaches that the team was performing best with smaller lineups with Armstrong on the court. Armstrong played a much more prominent role from there forward, the Magic won both games three and four and gave the Heat a run in game five.

In the playoffs, Armstrong nearly doubled his regular-season points and minutes while raising his shooting to 48% from the field. This personal breakthrough carried over: for the next four seasons his minutes per game increased, and he became a more and more prominent part of the Magic. In 1998-99, he won both the Most Improved Player and the Sixth Man of the Year Awards.

This story is covered in multiple outlets, as you can see in the links at the end of this article. Here’s how the great NBA reporter Jackie MacMullan told the story in a 1998 Sports Illustrated article titled “Cyber Scouting”:

A dramatic example of the value of computer scouting came in the first round of the playoffs last season, when Orlando found itself down 2-0 to the Miami Heat, having lost those games by an average of 26 points. When the Magic got home after the second loss, Sterner spent three hours in his office plugging questions into the Advance Scout program.

Shortly after 3 a.m. he unearthed a nugget: With reserve point guard Darrell Armstrong on the floor, Orlando had outscored the Heat by 15 points during the two games. In addition, the Magic had shot 64% with Armstrong on the floor and 37% without him, while Miami had shot 57% while Armstrong was out of the game and 45% when he was harassing point guard Tim Hardaway and his Heat teammates. Sterner called up corresponding video footage, which showed how effectively Armstrong had pushed the ball up the floor in transition and created scoring opportunities, and how, on defense, he had forced Miami turnovers and caused the Heat to resort to tough shots.

Armstrong had played only 23 minutes in the two games. In Game 3 Orlando coach Richie Adubato played Armstrong 38 minutes. He had 21 points, eight assists and one turnover, and the Magic won 88-75. Rejuvenated Orlando also won Game 4, with Armstrong contributing 12 points, nine rebounds and one assist. Although Orlando dropped the deciding fifth game in Miami, the Magic had been transformed from a floundering club into a team infused with new life--not to mention nearly $3 million more from ticket sales, concessions and television revenues.

The Darrell Armstrong Advanced Scout story was featured in Sports Illustrated in 1998

#3 – First & Foremost, Deliver on Decision Support

A key premise to how we approached Advanced Scout came from Inderpal and our chief software architect Rajiv Pratap. Every time we talked to the coaches, we talked to them about being a tool to help them make better decisions.

To bring this to life, we linked the stats to video. Since every play was time stamped, we could take an insight like the Armstrong nugget and then feed back the specific offensive and/or defensive plays from when that lineup was on the floor.
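As a rough sketch of how time stamps make that link possible: once you know the time windows when a player or lineup was on the floor, pulling the matching plays is a simple filter. The `Play` and `Stint` records, the field names and the sample clock values below are all hypothetical, not the actual Advanced Scout data model.

```python
from dataclasses import dataclass


@dataclass
class Play:
    clock_s: int        # seconds elapsed since tip-off (hypothetical field)
    description: str


@dataclass
class Stint:
    start_s: int        # lineup enters at this time (inclusive)
    end_s: int          # lineup leaves at this time (exclusive)
    lineup: frozenset   # players on the floor during the stint


def plays_with_player(plays, stints, player):
    """Return the plays time stamped inside any stint that includes the player."""
    windows = [(s.start_s, s.end_s) for s in stints if player in s.lineup]
    return [
        play for play in plays
        if any(start <= play.clock_s < end for start, end in windows)
    ]


# Invented sample data: Armstrong is on the floor from 300s to 480s.
stints = [
    Stint(0, 300, frozenset({"Starter A", "Starter B"})),
    Stint(300, 480, frozenset({"Armstrong", "Starter B"})),
    Stint(480, 720, frozenset({"Starter A", "Starter B"})),
]
plays = [
    Play(120, "made jumper"),
    Play(310, "forced turnover"),
    Play(450, "fast-break layup"),
    Play(500, "missed three"),
]
armstrong_plays = plays_with_player(plays, stints, "Armstrong")
```

In this toy data, `armstrong_plays` holds the plays at 310 and 450 seconds, and those time stamps are what a video coordinator would use to cue up the footage.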

In fact, the main allies of the tool became the video coordinators who could now link analytics to how they fed video to the rest of the coaching staff. These video coordinators became trusted advisors to head coaches. Two of the video coordinators we worked closely with in those years were Erik Spoelstra and Frank Vogel.

Spoelstra then was video coordinator for Pat Riley’s Miami Heat, and after moving up the ranks became their head coach in 2008. Vogel then was video coordinator for Rick Pitino’s Boston Celtics, and became head coach of the Indiana Pacers in 2011 and the Orlando Magic in 2016.

The legacy of IBM’s Advanced Scout is significant: the plus-minus stat in basketball; the great stories that IBM would leverage for years around generating insights from data; the career development of a generation of analytical coaches; and today, IBM’s Watson technology, which is closely partnered with ESPN and can be found generating insights for fantasy leagues.

Here’s the press coverage I could locate – and there’d be a lot more if this didn’t occur during the early days of the Internet. I also wrote hundreds of stat insight articles for under the Beyond the Box Score banner which are no longer archived on the site.   

Check out the links above for more texture. Other highlights included being featured in an IBM ad campaign (with Inderpal’s photo in a full-page print ad), assisting the 1996 Olympics teams in Atlanta, and being featured at numerous IBM and NBA events including the Olympics and All Star games.