SOCIAL TEXT FEED

Social Media Text for Finance

Raw, cleaned, and fully tagged social media text from 30+ global investment sources and hundreds of millions of authors. Built for systematic alpha generation and financial generative AI, with point-in-time history back to 1998.
Coverage
30+
Global Sources
27yr
History
<60s
Latency
1.5M+
Daily Messages
140K
Mapped Entities
4,000+
Event Types
Performance
<60s
Standard latency
20s
TruthSocial delivery
1,000+
Business tweets/min
1998
Earliest data vintage

Research Findings

Source Quality Determines Alpha
Sentiment signals are not equal across media sources. In a monthly study of Russell 3000 equities covering 2020–2026, stocks were ranked by their prior 30-day sentiment and binned into deciles. The return spread between the top and bottom sentiment deciles was then tracked over the following 60 days, and the results diverged sharply by source.
Twitter (X)
+0.70%
Strongest, most durable signal. Smooth monotonic curve over 60 days. History back to 2013.
Legacy News
+0.19%
Early momentum signal, plateaus by week 3. Useful benchmark context for social feeds.
Reddit
≈ 0%
Near-zero spread in aggregate. Signal requires selective sub-community filtering. History back to 2019.
Stock Msg Boards
-0.65%
Inverse return signal. Yahoo! message boards data available back to 1998.
Decile spread by media source, 2020-2026 Russell 3000 study
Twitter sentiment shows the highest and most durable positive decile spread, while Yahoo! Finance stock message boards exhibit a pronounced inverse signal that can be exploited as a short signal.
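For readers who want to reproduce this kind of test on their own data, the decile-spread calculation described above can be sketched as follows. This is a minimal illustration, not the study's actual code: the function name `decile_spread`, the `(sentiment, forward_return)` input layout, and the toy numbers are all assumptions made for the example.

```python
from statistics import mean


def decile_spread(records):
    """Top-minus-bottom decile spread for one rebalance date.

    `records` is a list of (sentiment, forward_return) pairs, one per stock.
    Both the layout and the name are hypothetical, chosen for illustration.
    """
    if len(records) < 10:
        raise ValueError("need at least 10 stocks to form deciles")
    # Rank stocks from lowest to highest prior-period sentiment.
    ranked = sorted(records, key=lambda r: r[0])
    # Stocks per decile; any remainder stays in the middle of the ranking.
    n = len(ranked) // 10
    bottom = ranked[:n]
    top = ranked[-n:]
    # Spread = average forward return of the top decile minus the bottom decile.
    return mean(r[1] for r in top) - mean(r[1] for r in bottom)


# Toy universe of 20 stocks where forward return rises with sentiment.
toy = [(i / 10.0, i * 0.001) for i in range(20)]
spread = decile_spread(toy)
```

A positive spread means high-sentiment stocks outperformed low-sentiment stocks over the holding window; a negative spread corresponds to the inverse behavior reported for stock message boards.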

Data Tiers

Three Levels of Processing
Social Text is available at three levels depending on your existing NLP infrastructure, from raw text ready for your own models to fully enriched, tagged feeds.
Tier 1
Raw Text
Unprocessed posts with full metadata. Maximum flexibility for teams with proprietary NLP pipelines.
→ Full post text, unmodified
→ Author, timestamp, source metadata
→ Point-in-time delivery (no look-ahead bias)
→ Conversation IDs are provided
→ 30+ source coverage in English
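
As an illustration of how point-in-time data is typically consumed in a backtest, the sketch below keeps only posts whose timestamp is at or before the decision time. The `Post` record and its field names are assumptions loosely modeled on the metadata listed above, not the feed's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Post:
    # Hypothetical record layout; real field names may differ.
    text: str
    author: str
    source: str
    conversation_id: str
    timestamp: datetime  # assumed to be timezone-aware UTC


def visible_as_of(posts, as_of):
    """Return only the posts that existed at `as_of` (no look-ahead)."""
    return [p for p in posts if p.timestamp <= as_of]


posts = [
    Post("Earnings look strong", "user_a", "example_source", "c1",
         datetime(2024, 1, 2, 14, 30, tzinfo=timezone.utc)),
    Post("Guidance was cut", "user_b", "example_source", "c2",
         datetime(2024, 1, 3, 9, 0, tzinfo=timezone.utc)),
]
# Simulate a decision taken at the end of 2 January 2024 (UTC).
cutoff = datetime(2024, 1, 2, 23, 59, tzinfo=timezone.utc)
known = visible_as_of(posts, cutoff)
```

Only the first post is visible at the cutoff, so a signal built on `known` cannot be contaminated by information published the next day.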

X / Twitter Coverage

Depth of Coverage on the Highest-Signal Source
X / Twitter contributes more than 1.5 million finance-relevant messages per day, captured through three structured source streams.
1.5M+
Finance-relevant messages per day
350+
Individual influencer handles
High-signal financial commentators, analysts, and market participants
400+
Financial search terms
Economics, IPO, investment, stock market, Fed, earnings, and more
1,000
Business tweets per minute
Continuous real-time ingestion with latency under 60 seconds

Historical Coverage

Point-in-Time History Since 1998
Over 25 years of clean, point-in-time data across major investment social sources. The early archive was inherited from Codexa, a dot-com era company (1998-2002), and has since been continued by our own team.
1998
Yahoo! Finance
Earliest vintage in the dataset
2010
StockTwits
Specialist investor micro-blogging
2014
Twitter (X)
Robust since 2014
2019
Reddit
WSB, investing, stocks and more
Varies
30+ Others
Global investment social media · English & French
Please note: this feed was designed to capture investor and trader conversations, with less emphasis on retail product monitoring. TikTok, Instagram, and Facebook content is not included. YouTube and podcast transcripts are available upon request as an add-on.

Delivery Options

Two Endpoint Options
The Social Text feed is available via two API schemas, designed to integrate with systematic trading pipelines of any complexity.
01
Articles Endpoint
Full social media text articles plus all associated metadata. Maximum data completeness for custom NLP ingestion pipelines.
Full text + metadata
02
Radar Database API
Cleaned articles tagged at the sentence level. Structured schema with entity mappings, topics, events, and sentiment scores — production-ready.
Sentence-level tagging
Powered by the Engine Behind Yahoo! Finance & the iPhone Stocks App
Our NLP engine maps 140,000 entities to universal asset identifiers. It is the same technology powering Yahoo! Finance, the native iPhone Stocks app, and numerous top-tier systematic desks. In the tagged text, your team benefits from point-in-time entity aliases, updated weekly.
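To show why point-in-time aliases matter, here is a small sketch of a date-aware alias lookup. The table layout, the `resolve` function, and the placeholder names (`acme`, `ENTITY_A`, `ENTITY_B`) are invented for illustration and do not reflect the engine's internal format or real identifiers.

```python
from datetime import date

# Hypothetical alias table: (alias, valid_from, valid_to, entity_id).
# valid_to=None means the mapping is still in force.
ALIASES = [
    ("acme", date(2015, 1, 1), date(2020, 12, 31), "ENTITY_A"),
    ("acme", date(2021, 1, 1), None, "ENTITY_B"),
]


def resolve(alias, on_date, table=ALIASES):
    """Return the entity an alias referred to on `on_date`, or None."""
    key = alias.lower()
    for name, start, end, entity_id in table:
        # A row matches when the date falls inside its validity window.
        if name == key and start <= on_date and (end is None or on_date <= end):
            return entity_id
    return None


before = resolve("ACME", date(2018, 6, 1))
after = resolve("acme", date(2022, 1, 1))
```

The same alias resolves to different entities depending on the date of the post, which is what keeps historical tagging free of hindsight.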

Legal & Compliance

Designed for Compliance
MarketPsych Social Text is built on publicly available data obtained via ethical web scraping practices. The legal framework governing social media data collection has been substantially clarified by recent U.S. federal case law and executive policy, and our feed has been independently validated through the compliance review processes of large global institutional investors.
Due Diligence
Multiple large global hedge funds and systematic asset managers have independently submitted this product to their compliance and legal teams as part of standard alternative data due diligence. In each case, the feed has passed review and been approved for production use.
U.S. Executive Policy | 2025
The Trump administration’s January 2025 Executive Order on Artificial Intelligence (Removing Barriers to American Leadership in Artificial Intelligence) directed federal agencies to eliminate regulatory obstacles to U.S. AI competitiveness. This deregulatory posture was elaborated further in public remarks at the Hill and Valley Forum in July 2025, where President Trump stated:
“[I]f you read an article and learn from it, we have to allow AI to use that pool of knowledge without going through the complexity of contract negotiations, of which there would be thousands for every time we use AI.”
— President Donald Trump, Hill and Valley Forum, July 2025
While distinct from the CFAA question settled in hiQ and Bright Data, Trump’s stance reflects the same policy logic: that publicly posted content should be accessible to AI and data systems as a matter of U.S. competitiveness. MarketPsych collects only public social media posts via ethical web scraping practices. Clients should assess applicability with their own legal counsel.

Get in Touch

Ready to integrate Social Text?
Connect with our data team to arrange a trial, discuss coverage requirements, or explore delivery options for your infrastructure.