Near the end of Frank McAfee’s 40 years working in systems simulations at Boeing, his younger colleagues began arriving with a plethora of new programming codes under their belt. Recognizing the limits of his attention span when working with tutorials and bland data sets, McAfee used Major League Baseball data to develop a Monte Carlo mathematical model.
“Every wave of new engineers that we’d get would have new tools, so I found myself getting outpaced,” he recalls. “As a fun way to learn some of the languages, I wrote this model in Visual Basic for Excel first, then in Java, and then finally in Python a few years ago.”
When the Seattle Mariners signed Robinson Canó prior to the 2014 season, McAfee tried to simulate the impact of a prime-career superstar supplanting replacement-level second basemen in the lineup. By his own calculations, Canó projected to add nearly two-thirds of a run per game, which was worth eight or nine wins in a season, and McAfee began ruminating about the possibilities of such lineup-specific modeling.
That admittedly rudimentary early code has undergone dramatic enhancements thanks to a team led by his son, Brian, and now serves as the original simulation code for SEQNZR, a baseball and softball analytics tool that is proliferating throughout the college ranks and recently signed its first deal with an MLB client, the Cincinnati Reds.
Brian McAfee was uniquely positioned for the task. He pitched at Cornell as an undergrad, completed his final year of eligibility while earning a Master of Management Studies at Duke, played minor league ball in the Tampa Bay Rays’ and Seattle Mariners’ organizations and now is a data scientist at Nike.
“Of course, he’s unusual because there’s data analysts out there who are sophisticated like him, and there’s professional baseball players—but it’s very rare to have somebody who crosses over between those two things like Brian has,” Frank McAfee says. “He’s been able to add levels of sophistication to the tool that I wasn’t even aware of.”
Brian McAfee turned his father’s hobby into a commercial product with the help of two of his Cornell pitching teammates: Scott Soltis, who built most of the data architecture, and Pete Lanno, a former San Francisco Giants minor leaguer who is now director of sales while also contributing to the data science and code writing.
SEQNZR began as a lineup optimization tool and has since expanded into situational probabilities with a run expectancy matrix called RE216 that aids in-game decision-making. It runs 100,000 simulations on every query, replicating the type of work MLB front offices have built but making it publicly accessible. SEQNZR has partnered with 6-4-3 Charts, whose college data it uses, and is now part of a bundled tech stack with 6-4-3 and Playsight. (SEQNZR previously cooperated with BaseballCloud and Driveline Baseball.)
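To make the idea of simulating a lineup concrete, here is a minimal Monte Carlo sketch in Python. It is not SEQNZR's code: the outcome weights, the hitter profiles and the base-running rule (every runner advances exactly as many bases as the batter) are simplifying assumptions chosen only for illustration, and the helper names are hypothetical.

```python
import random

OUTCOMES = ["out", "walk", "single", "double", "triple", "homer"]
BASES_GAINED = {"single": 1, "double": 2, "triple": 3, "homer": 4}

# Hypothetical per-plate-appearance weights, in OUTCOMES order (not real data).
AVERAGE_HITTER = [0.68, 0.08, 0.16, 0.05, 0.005, 0.025]
STAR_HITTER = [0.60, 0.11, 0.17, 0.06, 0.01, 0.05]


def simulate_inning(lineup, start_idx, rng):
    """Simulate one half-inning; return (runs scored, next batter index)."""
    outs, runs = 0, 0
    bases = [False, False, False]  # first, second, third
    idx = start_idx
    while outs < 3:
        outcome = rng.choices(OUTCOMES, weights=lineup[idx % len(lineup)])[0]
        idx += 1
        if outcome == "out":
            outs += 1
        elif outcome == "walk":
            # Only forced runners advance on a walk.
            if bases[0] and bases[1] and bases[2]:
                runs += 1
            elif bases[0] and bases[1]:
                bases[2] = True
            elif bases[0]:
                bases[1] = True
            bases[0] = True
        else:
            n = BASES_GAINED[outcome]
            new_bases = [False, False, False]
            # Assumption: each runner advances exactly n bases.
            for b in range(3):
                if bases[b]:
                    if b + n >= 3:
                        runs += 1
                    else:
                        new_bases[b + n] = True
            if n == 4:
                runs += 1  # the batter scores on a home run
            else:
                new_bases[n - 1] = True
            bases = new_bases
    return runs, idx % len(lineup)


def average_runs(lineup, n_games=10_000, seed=0):
    """Average runs per nine-inning game over n_games simulated games."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_games):
        idx = 0
        for _ in range(9):
            runs, idx = simulate_inning(lineup, idx, rng)
            total += runs
    return total / n_games


# Same nine hitters, two batting orders.
star_second = [AVERAGE_HITTER, STAR_HITTER] + [AVERAGE_HITTER] * 7
star_ninth = [AVERAGE_HITTER] * 8 + [STAR_HITTER]

if __name__ == "__main__":
    for name, lineup in [("star 2nd", star_second), ("star 9th", star_ninth)]:
        print(name, round(average_runs(lineup), 3))
```

Comparing the two averages shows how a batting order can be scored by simulation. The sketch defaults to 10,000 games to stay quick; the article's figure of 100,000 simulations per query would simply be a larger `n_games`.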
Brian McAfee emphasizes that the tool must be flexible and customizable, drawing on not only historical data but also team- and player-specific inputs. The universal run expectancy matrix, RE24, does not account for individual players’ abilities on the base paths and in the batter’s box. RE24 is so named for the number of run expectancy combinations for runners on base with varying numbers of outs; SEQNZR multiplies that by the nine lineup positions for RE216 to signal its capacity for individual player attributes.
“The typical stolen base break-even percentage is like 70-75%, but it really depends on who you have in your lineup and how likely they are to hit a ground ball, hit into a double play, hit the ball into the gap,” Brian McAfee says.
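The break-even rate McAfee mentions can be derived from three run-expectancy values: the expectancy if the runner stays put, the expectancy after a successful steal, and the expectancy after being caught. The attempt breaks even when p × RE(success) + (1 − p) × RE(caught) equals RE(stay). The sketch below solves for p; the run-expectancy numbers are round, illustrative values rather than SEQNZR output or an official table, and the function name is hypothetical.

```python
def steal_break_even(re_stay, re_success, re_caught):
    """Success rate at which attempting a steal neither gains nor loses runs.

    Solves p * re_success + (1 - p) * re_caught == re_stay for p.
    """
    return (re_stay - re_caught) / (re_success - re_caught)


# Illustrative values only (assumed, not measured):
# runner on first with no outs, runner on second with no outs,
# and bases empty with one out.
generic = steal_break_even(re_stay=0.86, re_success=1.10, re_caught=0.26)

# Assumption: with a double-play-prone hitter at the plate, leaving the
# runner on first is worth less, which lowers the break-even rate.
dp_prone = steal_break_even(re_stay=0.80, re_success=1.10, re_caught=0.26)

print(f"generic break-even: {generic:.1%}")
print(f"double-play-prone hitter up: {dp_prone:.1%}")
```

With these assumed inputs the generic case lands in the 70-75% range quoted above, and the second call shows why the figure moves with the hitters in a given lineup. A fuller treatment would also adjust the success and caught values for the same hitter.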
While there have been sabermetric studies downplaying the impact of lineup construction, McAfee counters that argument by noting that even incremental improvement is worth realizing and could be impactful. He adds that most lineup research uses data sets drawing on the league average player at each batting-order position, whereas some individual teams might have larger variance.
“The more you deviate from the average, the more benefit there’s going to be. Granted, I will say, you’re looking at, over the course of the college season, three-to-10 runs. It’s still marginal, but it’s so easy to get those three-to-10 runs that you just don’t want to leave them on the table,” McAfee says, adding: “It takes a lot in the weight room, it takes a lot on the mound, to be able to get three-to-10 runs. The ease of scoring those runs is a low-hanging fruit, I’ll say that.”
SEQNZR connected with the Cincinnati Reds because of the annual ABCA conference in January. The Reds’ game planning and outfield coach, Jeff Pickler, gave a talk entitled, “Game Day Strategies: Balancing Our Feel for the Game Alongside Today’s Data, During the Heat of Battle.” When it concluded, several attendees saw SEQNZR’s booth and asked, “Hey, is this like Pickler’s presentation?” That prompted McAfee to seek out Pickler and the Reds where, it just so happened, McAfee’s catcher at Duke, Christian Pérez, is the club’s major league advanced scouting coach.
The Reds are still deliberating the best way to deploy the technology but see its potential in augmenting their in-house capabilities. McAfee says its RE216 model is even more in-depth with MLB data, as the ball-strike count is taken into consideration. With 12 possible counts, RE216 becomes RE2592, indicating the average runs expected, while RP2592 (for run probability) projects the odds of scoring a single run. McAfee explains that RE is more important for early innings whereas RP matters more in close games in the late innings.
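The naming arithmetic, and the difference between RE and RP, can be sketched in a few lines of Python. The state enumeration follows the article's description (base-runner configurations, outs, lineup spots, ball-strike counts). The two helper functions are hypothetical, and reading RP as the chance of scoring at least one run is an assumption about how the metric is defined.

```python
from itertools import product

BASE_STATES = list(product([False, True], repeat=3))  # 8 runner configurations
OUTS = range(3)                                       # 0, 1 or 2 outs
LINEUP_SPOTS = range(1, 10)                           # batting-order positions 1-9
COUNTS = [(balls, strikes) for balls in range(4) for strikes in range(3)]  # 12

re24_states = list(product(BASE_STATES, OUTS))
re216_states = list(product(BASE_STATES, OUTS, LINEUP_SPOTS))
re2592_states = list(product(BASE_STATES, OUTS, LINEUP_SPOTS, COUNTS))

print(len(re24_states), len(re216_states), len(re2592_states))


def run_expectancy(run_samples):
    """RE: average runs scored from a state over many simulated innings."""
    return sum(run_samples) / len(run_samples)


def run_probability(run_samples):
    """RP (assumed definition): share of simulations scoring at least one run."""
    return sum(1 for runs in run_samples if runs >= 1) / len(run_samples)


# Made-up results of ten simulated innings from one state.
samples = [0, 0, 0, 1, 0, 2, 0, 0, 4, 0]
print(run_expectancy(samples), run_probability(samples))
```

A big inning pulls RE up without changing RP, which is one way to see why RE suits early innings while RP suits late, close games in which a single run decides the outcome.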
“It’s something we can integrate into what we’re already doing,” Pérez says. “And someone like Brian, to be able to work with him and his own skill set—he can help us outside of necessarily things that are explicitly related to SEQNZR. He can also help us build stuff out internally as a consultant, so that’s part of the allure.”
SEQNZR is expected to be available on the iPads that MLB permits every club to use in the dugout. While there are restrictions on in-game video, bespoke scouting information is available with real-time data updates.
“You can upload your own spray charts, scouting reports, all those sorts of things that can be done in advance of the game,” said MLB chief operations and strategy officer Chris Marinak, speaking recently about in-game access. “And there’s a sync mechanism where teams can upload to a shared folder certain document types, and then they will be available for use on the iPad to pull up.”
To date, most of SEQNZR’s target demographic is in the college (and even youth) ranks, with a lot of early traction in college softball as well as a partnership with The Alliance Fastpitch, a national softball network of teams and leagues.
“It’s a technology that nobody has looked at on the softball side,” McAfee says. “There’s no Tom Tango of the softball world pulling softball data. They are one step behind baseball, but everyone’s really hungry to learn.”