
In this week’s report:
- Research-grounded Investor Q&A: “Is high short interest a contrarian buy signal or a legitimate warning?”
- Treating the stock market like a language can unlock a 2.85 Sharpe ratio
- Active managers charge 17x more than their information advantage is worth
1. Research-grounded Investor Q&A: “Is high short interest a contrarian buy signal or a legitimate warning?”
High short interest is a legitimate warning – not a contrarian buy signal.
It’s not just how much short interest exists; it’s where short sellers aim.
Chen, Da & Huang (2022) introduced “short selling efficiency” (SSE), a measure of how accurately short sellers concentrate their capital on the most overpriced stocks. Technically, SSE is the slope coefficient from cross-sectionally regressing abnormal short interest (the difference between current short interest and its trailing 12-month average) on a composite mispricing score each month. SSE significantly outperforms raw aggregate short interest as a market return predictor, both in-sample and out-of-sample.
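Mechanically, each month's SSE estimate reduces to a univariate OLS slope. The sketch below uses synthetic data and a hypothetical `sse_for_month` helper; in the paper, the mispricing score is a composite of multiple anomaly signals rather than the single random variable used here:

```python
import numpy as np

rng = np.random.default_rng(0)

def sse_for_month(abnormal_si, mispricing_score):
    """Slope from cross-sectionally regressing abnormal short interest
    on a composite mispricing score (one month of data).

    A higher slope means short capital is concentrated in the stocks
    the mispricing model flags as most overpriced."""
    x = mispricing_score - mispricing_score.mean()
    y = abnormal_si - abnormal_si.mean()
    return (x @ y) / (x @ x)  # OLS slope

# Synthetic cross-section of 500 stocks for one month.
n = 500
mispricing = rng.standard_normal(n)        # composite mispricing score
si_12m_avg = rng.uniform(0.01, 0.10, n)    # trailing 12-month short interest
current_si = si_12m_avg + 0.01 * mispricing + 0.005 * rng.standard_normal(n)
abnormal_si = current_si - si_12m_avg      # abnormal short interest

print(f"SSE this month: {sse_for_month(abnormal_si, mispricing):.4f}")
```

Repeating this regression every month yields the SSE time series whose level the paper uses as a market-return predictor.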
When SSE is high – meaning short sellers are targeting the right names – subsequent market returns decline as mispricing gets corrected quickly. The predictive power intensifies during recessions, high-volatility environments, and periods of low public information flow.
For practitioners, this means raw short interest levels can mislead: a stock heavily shorted by hedging-motivated or market-making flows carries a very different signal than one shorted by conviction-driven fundamental shorts.
Unexpected changes in short interest carry more information than static levels.
Hanauer, Lesnevski & Smajlbegovic (2023) proposed “surprise in short interest” (SUSIR): the unexpected component of short interest changes after accounting for cross-sectional distributional differences.
SUSIR negatively predicts returns in both U.S. and international equities. The mechanism: investors anchor on historical short interest levels, so when short selling activity unexpectedly ramps up, the market underreacts to this new information.
The return predictability is strongest among illiquid, volatile stocks with high information uncertainty, exactly the names contrarian buyers find most tempting. Importantly, the effect is unrelated to short-selling frictions (like borrowing costs), meaning it captures a distinct behavioral channel separate from the constraint-based overpricing story.
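The paper's exact SUSIR construction involves cross-sectional distributional adjustments; as a rough sketch of the intuition only, one can standardize the latest change in a stock's short interest against that stock's own history, analogous to an earnings-surprise measure. All numbers below are invented:

```python
import numpy as np

def susir(si_history):
    """Rough sketch of a short-interest surprise: the latest change in
    short interest, standardized by the stock's own history of changes.
    The paper's actual SUSIR definition differs in detail; this only
    illustrates the 'unexpected component' idea."""
    changes = np.diff(si_history)
    past, latest = changes[:-1], changes[-1]
    return (latest - past.mean()) / past.std(ddof=1)

# 13 months of short interest (% of shares outstanding) for one stock:
# a year of stable shorting, then an abrupt, unexpected ramp-up.
si = np.array([5.0, 5.1, 4.9, 5.0, 5.2, 5.1, 5.0,
               4.9, 5.1, 5.0, 5.2, 5.1, 8.0])
print(f"SUSIR: {susir(si):+.1f}")  # large positive surprise
```

The point of the standardization is that the same 3-point jump would barely register for a stock whose short interest routinely swings that much.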
The GameStop aftermath made surviving high-short-interest positions more informative, not less.
The January 2021 GameStop episode – where retail traders coordinated on Reddit’s r/WallStreetBets to drive a stock with ~140% short interest from $17 to over $400 – is the canonical case people cite for the contrarian thesis. The academic literature that followed tells the opposite story.
Ahmad, Hauf & Seiberlich (2022) studied the spillover effects on short interest across the entire U.S. equity market. Heavily shorted stocks experienced a significant reduction in short interest post-GameStop, not because the fundamental thesis was wrong, but because hedge funds restructured toward less concentrated shorting strategies.
By decomposing the change in short interest into quantity and price effects, they showed that the reduction in actual shares shorted more than offset the mechanical price effect.
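The quantity/price decomposition rests on a simple accounting identity: a change in dollar short interest splits into a shares term, a revaluation term, and a cross term. The numbers below are purely illustrative, not from the paper:

```python
# Dollar short interest = price * shares shorted. A change in the dollar
# figure mixes a quantity effect (shares actually covered or added) with
# a mechanical price effect. Illustrative numbers only.
p0, q0 = 20.0, 1_000_000      # price, shares shorted before a squeeze
p1, q1 = 60.0, 400_000        # after: price tripled, shorts covered 60%

total_change    = p1 * q1 - p0 * q0
quantity_effect = (q1 - q0) * p0          # change in shares, at old price
price_effect    = q0 * (p1 - p0)          # old position, revalued
interaction     = (q1 - q0) * (p1 - p0)   # cross term

assert total_change == quantity_effect + price_effect + interaction
print(quantity_effect, price_effect, interaction)
```

Without the decomposition, the headline dollar figure would hide that short sellers were actively covering while prices were inflating the remaining positions.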
The implication for practitioners: post-2021, a stock that still carries elevated short interest despite the deterrent effect of squeeze risk likely reflects even higher-conviction negative views than a pre-2021 equivalent. The bar to maintain a large short position went up, so surviving shorts are more informative.
Pedersen (2022) formalized the game-theoretic dynamics, showing how social network coordination can temporarily overwhelm fundamentals, but not permanently. GameStop itself is illustrative: after peaking above $400 (split-adjusted $100) in January 2021, the stock traded below $15 (split-adjusted) by 2024. The short sellers’ thesis – that GameStop was a declining brick-and-mortar retailer – proved correct on every relevant fundamental horizon.
Why the contrarian thesis keeps tempting people, and why it keeps failing.
The persistence of the “buy what’s hated” narrative despite contradicting evidence comes down to two well-documented behavioral errors. Recency bias causes investors to overweight the handful of spectacular short squeezes (GameStop, AMC, Tesla’s 2020 run) while ignoring the overwhelming base rate of high-short-interest stocks that simply grind lower. The availability heuristic amplifies this: the squeezes are dramatic, newsworthy, and memorable, while the steady drip of underperformance among heavily shorted names generates no headlines.
MSCI’s factor research through July 2024 found that stocks in the lowest-quality quintiles constituted the majority of stocks in the highest short-interest quintiles. Excluding highly shorted and low-quality stocks from the MSCI USA Small Cap Index would have materially improved its performance relative to large caps since 2007. Short sellers are overwhelmingly targeting weak fundamentals, and they’re usually right.
Practical implementation framework:
- Changes over levels. The SUSIR framework (Hanauer et al. 2023) indicates that unexpected increases carry more predictive power than the static short interest ratio. A stock moving from 5% to 12% short interest is far more informative than one sitting at 12% for two years.
- Integrate borrowing costs. Raw short interest without supply-side context is noisy. Combine it with equity lending fee data (available from S&P Global Securities Finance, formerly IHS Markit) or proxy using the short-interest-to-institutional-ownership ratio (SIRIO). The Schultz (2024) CRSP-based proxy methodology also provides an implementable alternative when real-time lending data is unavailable.
- Assess short selling efficiency. The Chen, Da & Huang (2022) SSE measure offers a market-timing dimension: when SSE is high (shorts are targeting the right stocks), expect lower forward market returns. When SSE is low (shorting capital is scattered or noise-driven), the correction of mispricing stalls.
- Post-2021 regime adjustment. Surviving high-conviction shorts post-GameStop carry stronger informational content. Factor this into your screening by giving extra weight to names where short interest remains elevated or increases despite the structural deterrent of squeeze risk.
- Equal-weight vs. value-weight caveat. The short interest effect is concentrated in smaller, less liquid stocks. For institutional portfolios with significant AUM, the raw signal is weaker in value-weighted terms. Capacity is limited – this is primarily useful for smaller allocators, single-name research, or as a screen to avoid rather than a signal to short.
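As a toy illustration of the first two points above, a screen might flag names where short interest has ramped up relative to its trailing average and where it is large relative to institutional ownership (SIRIO). Tickers, thresholds, and data here are invented for the example:

```python
import pandas as pd

# Toy avoid-list screen: unexpected ramp-up in shorting (changes over
# levels) AND high short interest relative to institutional ownership
# (SIRIO, a rough proxy for tight lendable supply). Illustrative only.
stocks = pd.DataFrame({
    "ticker":     ["AAA", "BBB", "CCC", "DDD"],
    "si_now":     [0.12, 0.12, 0.03, 0.09],   # short interest, % of float
    "si_12m_avg": [0.05, 0.12, 0.03, 0.04],   # trailing 12-month average
    "inst_own":   [0.60, 0.80, 0.70, 0.20],   # institutional ownership
})
stocks["si_change"] = stocks["si_now"] - stocks["si_12m_avg"]
stocks["sirio"]     = stocks["si_now"] / stocks["inst_own"]

flagged = stocks[(stocks["si_change"] > 0.04) & (stocks["sirio"] > 0.10)]
print(flagged["ticker"].tolist())
```

Note that "BBB" escapes the screen despite 12% short interest: it has sat there for a year, which is exactly the changes-over-levels point.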
2. Treating the stock market like a language can unlock a 2.85 Sharpe ratio
Generative AI for Finance: A New Framework (March 4, 2026) – Link to paper
TLDR
- Researchers repurposed BERT – the transformer language model Google introduced in 2018, an ancestor of today’s LLMs – to read the stock market as if it were a language, where each firm is a “word.”
- The model learns by guessing which company (“word”) belongs in a hidden slot, based on the surrounding companies (“words”), until the resulting “sentence” makes sense given its training.
First, what is BERT?
BERT stands for Bidirectional Encoder Representations from Transformers. It’s the AI architecture Google built in 2018 to understand language. Its trick: when reading a sentence, it looks at words both to the left and right of any given word to figure out what that word means.
If I say “I went to the bank,” BERT checks the surrounding context to determine whether “bank” means a financial institution or a riverbank. It reads the whole sentence at once, not just left to right.
Here’s how BERT learns: You take a sentence, hide a word, and ask the model to guess the missing word based on everything around it. Do this billions of times across massive text datasets and the model develops a deep ‘understanding’ of how words relate to each other in context.
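Building one such training pair is trivial; the sketch below only shows how the input/label pairs are constructed, not the model that learns from them:

```python
import random

random.seed(1)

def make_masked_example(tokens, mask_token="[MASK]"):
    """Build one masked-language-model training pair: hide a random
    token; the model's job is to recover it from the full context."""
    i = random.randrange(len(tokens))
    masked = tokens.copy()
    masked[i] = mask_token
    return masked, tokens[i]  # (input with a hole, label to predict)

sentence = ["i", "went", "to", "the", "bank", "to", "deposit", "cash"]
masked_input, label = make_masked_example(sentence)
print(masked_input, "->", label)
```

Because the hole can fall anywhere, the model is forced to use context on both sides of it, which is the "bidirectional" in BERT.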
The insight: does the stock market work the same way?
The researchers realized that a company’s risk profile doesn’t exist in a vacuum – it depends on where that firm sits relative to every other firm. Apple’s pricing is shaped by its position relative to Microsoft, Google, Amazon, and thousands of others across dozens of traits.
Just like a word’s meaning comes from its neighbors in a sentence, a firm’s expected return comes from its neighbors in the cross-section.
So they built a “market language.” Each month, they ranked every U.S. stock by a single characteristic – say, how cheap or expensive it is – creating a “sentence” where the cheapest firm is the first “word,” the next cheapest is the second, and so on.
They repeated this for 94 different characteristics (size, momentum, profitability, volatility, etc.), generating 94 “sentences” every month. Then they fed all of them into a BERT-style model they call RPBERT for “Risk Premia BERT.”
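A minimal sketch of the "sentence" construction, using a handful of well-known tickers and three characteristics in place of the full cross-section and the paper's 94 characteristics (the data are random):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Toy cross-section: 6 firms, 3 characteristics (the paper uses 94).
firms = ["AAPL", "MSFT", "GOOG", "AMZN", "META", "NVDA"]
chars = pd.DataFrame(
    rng.standard_normal((6, 3)),
    index=firms,
    columns=["value", "momentum", "size"],
)

def market_sentences(characteristics):
    """One 'sentence' per characteristic: firms ordered by their rank
    on that characteristic, so each firm becomes a 'word' whose
    position encodes where it sits in the cross-section."""
    return {
        c: characteristics[c].sort_values().index.tolist()
        for c in characteristics.columns
    }

for char, sentence in market_sentences(chars).items():
    print(f"{char:>9}: {' '.join(sentence)}")
```

Each month produces a fresh batch of these sentences, so a firm's "position in the language" drifts as its characteristics change.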
Why this is different from other quant models
Traditional models treat each company as an isolated math problem: plug in its characteristics, predict its return. RPBERT asks a fundamentally different question: “How much should I care about Firm B when I’m trying to understand Firm A?” It assigns attention weights to every firm-to-firm comparison, and those weights shift every month as the market evolves. No one tells the model which comparisons matter – it figures that out on its own.
How the model learns to price stocks
RPBERT works in two stages, borrowing directly from the LLM playbook.
In stage 1 (pre-training), the model studies ranked firm sequences from 1980–1989. It masks random firms and tries to guess which company belongs in each spot based on the surrounding companies – exactly how standard BERT learns language by filling in hidden words. This teaches the model how firms naturally cluster and relate to each other within characteristic rankings.
In stage 2 (fine-tuning), they attach a return-prediction layer on top of the BERT encoder. Each firm comes out of BERT as a list of 768 numbers – a numerical fingerprint that captures how that company relates to its peers across all 94 characteristic rankings. Now, when the model makes a return prediction error, that error signal flows all the way back through the system, adjusting not just the return predictions but also those 768 numbers for each firm. So the fingerprints keep getting refined until they capture exactly the peer relationships that matter for predicting returns.
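A stripped-down sketch of that end-to-end idea: a linear prediction head on top of a 768-number fingerprint, with the prediction error's gradient updating both the head and the fingerprint itself. Everything here except the 768 dimension (learning rate, initialization, the single-firm setup) is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

dim = 768                                     # fingerprint size, as in the article
embedding = rng.standard_normal(dim) * 0.01   # firm's fingerprint from the encoder
weights   = rng.standard_normal(dim) * 0.01   # linear return-prediction head
target_return = 0.02                          # realized return to fit
lr = 0.1

for _ in range(200):
    pred = embedding @ weights     # predicted return
    err = pred - target_return     # prediction error
    # The error signal adjusts the head *and* the fingerprint itself,
    # which is the point of end-to-end fine-tuning.
    weights   -= lr * 2 * err * embedding
    embedding -= lr * 2 * err * weights

print(f"prediction after fine-tuning: {embedding @ weights:.4f}")
```

In the real model the fingerprint is produced by the whole BERT encoder, so the gradient keeps flowing past the embedding into every attention layer below it.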
When BERT processes a firm, it doesn’t just look at the market one way. It looks 12 different ways simultaneously. Each of these 12 “attention heads” decides, independently, which other firms are most relevant to the firm it’s analyzing. Think of it like asking 12 different analysts the same question: “Which other companies should I compare this one to?” The value analyst might say “look at firms with similar book-to-market ratios.” The liquidity analyst might say “look at firms with similar trading volumes.” The growth analyst might say “look at firms with similar earnings forecast revisions.”
That’s roughly what the model teaches itself to do. When the authors peeked inside, they found each head had gravitated toward a recognizable economic dimension. Some heads weighted firms that share similar analyst earnings forecasts. Others weighted firms with similar liquidity and trading cost profiles. Others focused on firms with similar capital expenditure and investment patterns. The model was never told these categories exist – it discovered them because they’re useful for predicting returns. The final output combines all 12 perspectives into a single pricing signal.
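Mechanically, each of those "analysts" is a scaled dot-product attention head. The sketch below shrinks everything for readability (the article describes 12 heads over 768-dimensional embeddings) and uses random, untrained weights, so the attention patterns here are meaningless; only the mechanics are the point:

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_head(X, Wq, Wk, Wv):
    """One attention head: each firm scores every other firm for
    relevance, then averages their value vectors by those scores."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])         # firm-to-firm relevance
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over firms
    return weights @ V, weights

# 5 firms, 8-dim embeddings, 2 heads (vs. 12 heads / 768 dims in the paper).
n_firms, d, d_head = 5, 8, 4
X = rng.standard_normal((n_firms, d))
for h in range(2):
    Wq, Wk, Wv = (rng.standard_normal((d, d_head)) for _ in range(3))
    out, w = attention_head(X, Wq, Wk, Wv)
    print(f"head {h}: firm 0 attends most to firm {w[0].argmax()}")
```

Because each head has its own projection matrices, training can push different heads toward different notions of "relevant peer," which is what the authors found when they inspected the trained model.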
The results
Out-of-sample on U.S. equities from 1990–2023, the five-factor RPBERT delivered a total R² of 17.92% and predictive R² of 1.79%, well above the best prior benchmark (the autoencoder model at 13.95% and 0.57%).
Value-weighted decile spreads hit 2.54% per month with an annualized Sharpe ratio of 2.85. For context, a Sharpe above 1.0 is considered good and above 2.0 exceptional. The tangency portfolio Sharpe reached 4.91.
The critical validation: when they shuffled firm rankings randomly – destroying the relational structure but keeping everything else identical – R² collapsed from 18% to 5% and predictive power went deeply negative. The ordering is the signal.
3. Active managers charge 17x more than their information advantage is worth
The Value of Information: A Puzzle (March 6, 2026) – Link to paper
TLDR
- Investors pay active managers about 0.67% of market cap per year in fees, while the total profit available to these “informed traders” is only 0.04% per year.
- The calculation is simple: multiply price changes by order flow each minute, then add them all up.
- Even on earnings days – when information value triples – the gap between what informed trading can generate and what active managers charge remains massive.
The 3 players in every stock market
To understand this paper, you need to know the three types of participants that show up in every market.
Informed traders: they have genuine insight into what a company is worth that isn’t yet reflected in the stock price. This isn’t illegal insider trading. It’s the hedge fund analyst who visits 200 retail stores and counts foot traffic before earnings, the quant team that builds a better revenue model from satellite imagery, or the macro trader who synthesizes public Fed speeches faster than everyone else. This is exactly what active managers claim to do and charge fees for: develop superior (but legal) insight through research, analysis, and expertise.
Noise traders: they buy and sell for reasons that have nothing to do with a stock’s true value. They need cash for a house, they saw a ticker on social media, or they’re mechanically rebalancing a portfolio. They’re essentially playing blind.
Market makers: the dealers in the middle. They set bid and ask prices, buy from sellers, sell to buyers, and try to break even over time.
How money flows between them
Informed traders make money because noise traders lose money. Every time a noise trader buys a stock and the price drops (or sells and the price rises), they bleed a small amount. Those losses flow to informed traders as profits.
If market makers are competitive – meaning they don’t extract rents – they just pass the money through. This means noise trader losses exactly equal informed trader gains.
Economists call the total informed trader gain the “value of information” – it’s the maximum that all informed participants combined can extract from the market in a given period.
Why this has been so hard to measure
If market makers themselves can’t tell informed trades from noise trades (that’s the whole premise of these models), how could a researcher possibly do it? You’d need to isolate noise trader losses, but you can’t observe noise trading directly. This has stumped the field for decades.
The clever trick that makes it possible
Kadan and Manela’s insight is that you don’t actually need to observe noise trading separately. Here’s why.
Remember, what we’re trying to measure is how much noise traders lose to price impact. That is, how much money they leave on the table when prices move against them right when they trade. We’d want to multiply price changes by noise trades each minute and add them up. But we can’t see noise trades directly. We can only see total order flow: informed trades plus noise trades, lumped together.
Here’s the trick. Consider what happens when you zoom into very short intervals – say, one minute. Informed traders spread their orders out slowly and steadily across the day. If they know a stock is undervalued, they don’t slam a giant buy order. They drip it in, a little each minute, to avoid tipping off market makers. Over a one-minute window, an informed trader’s contribution barely changes.
Noise traders are the opposite. Their orders are erratic and random. One minute they’re buying, the next they’re selling, driven by liquidity needs that have nothing to do with fundamentals. Over a one-minute window, noise trading jumps around unpredictably.
Now, price changes over one-minute windows are also jumpy and erratic because market makers are reacting to the unpredictable order flow they’re seeing. When a burst of buy orders comes in, the price ticks up. When a burst of sells comes in, it ticks down.
Here’s the key: when you do the multiply-and-sum using total order flow, the informed trader’s slow, steady drip contributes essentially nothing to the result. It’s too smooth to interact with the jittery price moves. All the signal comes from the noise traders’ erratic orders pushing prices back and forth. So even though total order flow contains both types of trades, multiplying price changes by total order flow gives you effectively the same answer as multiplying price changes by noise trades alone.
That means you can measure noise trader losses – and therefore the value of information – using only data you can actually observe: prices and total order flow. The actual calculation is surprisingly simple. For every one-minute interval during the trading day, you multiply the price change by the net order flow (positive for buying, negative for selling).
If there’s net buying and the price goes up, that’s money lost by whoever was buying into a rising price, i.e., noise traders getting hit by price impact. You add up all those products across every minute of the day, and the total is the day’s noise trader losses for that stock. That’s it: one multiplication and one sum, repeated across minutes.
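A small simulation makes the substitution concrete: with a smooth informed drip and erratic noise flow driving a simple linear price impact, summing price changes times total flow lands almost exactly on the noise-only answer. All parameters (flow sizes, impact coefficient) are invented:

```python
import numpy as np

rng = np.random.default_rng(7)

# One trading day of one-minute data. Informed flow is a slow, steady
# drip; noise flow is erratic; the market maker moves the price with
# total order flow via a linear price impact. Illustrative parameters.
minutes  = 390
informed = np.full(minutes, 5.0)            # smooth drip (shares/min)
noise    = rng.normal(0, 500.0, minutes)    # erratic noise flow
total    = informed + noise
impact   = 0.001                            # price move per share of flow
dp       = impact * total                   # minute-by-minute price changes

loss_from_total = np.sum(dp * total)    # what we can actually compute
loss_from_noise = np.sum(dp * noise)    # what we want: noise trader losses
drip_term       = np.sum(dp * informed) # informed drip's contribution

print(f"sum dP*total:  {loss_from_total:,.0f}")
print(f"sum dP*noise:  {loss_from_noise:,.0f}")
print(f"informed drip: {drip_term:,.0f}")
```

The drip term comes out tiny relative to the total: the smooth informed flow barely interacts with the jittery price changes, so observable total order flow is a near-perfect stand-in for the unobservable noise flow.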
An important nuance: this doesn’t capture every way to lose money in the market. Bad stock picks, poor timing, panic selling… none of that shows up here. What the calculation measures is one very specific thing: the wealth transfer from noise traders to informed traders via price impact. It’s the cost of trading against someone who knows more than you do, accumulated across every minute of every trading day.
What the numbers show
Using NYSE data across 8,300 stocks from 2003 to 2024, the average stock has about $3.5 million per year in total information value ($13,700 per day). Across the entire US equity market, that adds up to just 0.04% of market capitalization.
Investors pay active fund managers – the people who promise to beat the market using superior research – about 0.67% of market cap in fees every year (French, 2008). That’s a 17:1 ratio. It’s like paying $17 for a lottery ticket where the maximum prize is $1.
The measure passes every sanity check. Information value rises during crises like 2008 and COVID, when uncertainty makes private knowledge more valuable. It’s higher for large-cap growth and momentum stocks. On earnings announcement days it triples, but earnings days are only 1% of trading days, so even a strategy that exclusively targets those high-value windows falls far short of justifying active fees.
Why the puzzle won’t go away
The authors systematically closed off every escape route. If market makers aren’t perfectly competitive and keep some profit, informed traders get even less, so the puzzle deepens. If informed traders are risk-averse, they’d pay less for information; worse again. In discrete time (real-world trading), the measure becomes an upper bound, meaning the true value is if anything even smaller. Fee compression since French’s 2008 estimate helps somewhat, but not nearly enough: the 2024 active-passive fee gap is still 59 basis points for mutual funds and 30 for ETFs.
The bottom line
The equity market’s total “information rent” is remarkably small. For allocators, the implication is this: the aggregate pie available to all informed traders combined is roughly one-seventeenth of what investors collectively pay trying to get a slice.
Either most active fees compensate for something other than information edge – convenience, hand-holding, portfolio construction – or there’s a large class of behavioral traders systematically losing money on stale news, subsidizing the ecosystem in ways this measure doesn’t capture.
Either way, any manager claiming an information-based strategy needs to clear a very high bar to justify their fee.
Disclaimer
This publication is for informational and educational purposes only. It is not investment, legal, tax, or accounting advice, and it is not an offer to buy or sell any security. Investing involves risk, including loss of principal. Past performance does not guarantee future results. Data and opinions are based on sources believed to be reliable, but accuracy and completeness are not guaranteed. You are responsible for your own investment decisions. If you need advice for your situation, consult a qualified professional.
Until next time,
The Academic Signal Team