Predata uncovers predictive behavior hidden in web traffic. Predata provides measures of online attention for thousands of topics spanning currencies, equities, commodities and macro themes, as well as coverage for over 180 countries. This dataset offers insights into supply and demand concerns for commodities as well as off-the-radar shifts in geopolitical and macro financial trends that could impact the U.S. Treasuries and currency markets.
Predata signals measure anomalously large degrees of engagement with a particular thematic group of web sources (viewing, editing, commenting, publishing, etc.) relative to recent patterns of activity seen for those sources.
To quantify online engagement, Predata combines anomaly detection algorithms (which determine the days when a source is receiving high engagement) with the overall count of users engaged with a source. Predata’s algorithm automatically assigns weights to the component web sources in a signal each day, based on simple statistics calculated from historical traffic.
The weight of a source depends only on the comparison of the source’s current and historical traffic patterns. This lets Predata signals naturally track the most important dimensions of online activity at any given time without the need for a human to rebalance the components. The raw data is smoothed with a Kalman filter to produce de-noised estimates of engagement counts.
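As an illustration of the de-noising step, the sketch below runs a one-dimensional Kalman filter over a series of daily engagement counts. It is only a minimal sketch: the source does not describe Predata's state model, so the random-walk (local level) model, the function name `kalman_smooth`, and the two variance parameters are all assumptions made for this example.

```python
def kalman_smooth(observations, process_var=1.0, measurement_var=4.0):
    """Forward pass of a local-level (random walk) Kalman filter.

    process_var and measurement_var are hypothetical tuning values,
    not parameters published by Predata.
    """
    estimate = float(observations[0])  # initial state: the first observation
    error_var = measurement_var        # initial uncertainty of that state
    filtered = [estimate]
    for z in observations[1:]:
        # Predict: under a random walk the estimate is unchanged,
        # but its uncertainty grows by the process variance.
        error_var += process_var
        # Update: move the estimate toward the new observation
        # in proportion to the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= (1.0 - gain)
        filtered.append(estimate)
    return filtered


# A one-day spike in raw counts is dampened in the filtered series.
print(kalman_smooth([10, 10, 10, 50, 10]))
```

A larger `measurement_var` relative to `process_var` makes the filter trust new observations less, which produces a smoother output.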
The final weight for each web source is the raw output of the anomaly detection process (on a 0-1 scale) multiplied by the natural log of that source’s traffic (measured in terms of raw engagement counts). The signal is then normalized on a 0-100 scale, where a zero value represents no anomalous attention to any source in a signal’s entire range of underlying sources, whereas a 100 value represents the highest level of attention to the range of sources at that particular point in history.
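The weighting and scaling described above can be sketched as follows. The per-source weight follows the stated formula (anomaly score times the natural log of the engagement count). The rest is assumed for illustration: the function names are hypothetical, zero-traffic sources are given a weight of zero, and the 0-100 scale is computed against the running historical maximum of the raw signal, which is one plausible reading of the description rather than Predata's documented method.

```python
import math


def source_weight(anomaly_score, engagement_count):
    """Anomaly score in [0, 1] times the natural log of the engagement count."""
    if engagement_count <= 0:
        return 0.0  # assumption: a source with no traffic contributes nothing
    return anomaly_score * math.log(engagement_count)


def normalize_signal(raw_values):
    """Scale raw daily values to 0-100 against the running historical maximum.

    Assumption: 100 means "highest level seen up to that day".
    """
    normalized = []
    running_max = 0.0
    for raw in raw_values:
        running_max = max(running_max, raw)
        normalized.append(100.0 * raw / running_max if running_max > 0 else 0.0)
    return normalized


# Hypothetical day with three sources: (anomaly score, engagement count).
sources = [(0.9, 12000), (0.1, 300), (0.0, 50000)]
raw_today = sum(source_weight(a, c) for a, c in sources)  # summing is an assumption
print(raw_today)
print(normalize_signal([0.0, 2.0, 1.0, 4.0]))
```

Note that a heavily trafficked source with no anomaly (score 0.0) contributes nothing, while the log keeps very large sources from dominating the signal purely through volume.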
The data is typically used as an input into a multi-factor trading model. Predata’s signals are not meant to serve as the sole basis of predicting future trends in the price of financial products. Predata uses only publicly-available web sources and does not track personally-identifiable information (PII). As a result, there’s no risk of breach of Material Non-Public Information (MNPI) policies and regulations.
Files will be delivered in .csv format daily.
Less than 1 MB.
The number of files you receive depends on the number of products you are subscribed to. If you are subscribed to one product, you will receive one file daily. If you are subscribed to all eight products, you will receive eight files daily.
Predata signal data is delivered once daily per product.
Files are delivered in the range of 5 a.m.-7 a.m. Eastern Time daily.
No, the files will not be compressed.
Sample data sets will be available upon request. Please visit CME DataMine for further information.
Data typically goes back to 2010.
Yes, the following dates are missing across data sets: "2010-01-24", "2010-07-08", "2010-07-09", "2013-10-24", "2014-12-04", "2014-12-25", "2015-05-09".
Other than the above cited dates, there should not be any missing dates in the datasets; however, on dates with no attention anomalies the values will be zero.
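A subscriber who wants to verify completeness after loading a file could run a check along these lines. This is a sketch under assumptions: it takes the signal to have one row per calendar day, weekends included (consistent with the listed gaps, some of which fall on weekends), and the names `KNOWN_GAPS` and `unexpected_gaps` are hypothetical.

```python
from datetime import date, timedelta

# The missing dates cited above, which should not be flagged as errors.
KNOWN_GAPS = {
    date(2010, 1, 24), date(2010, 7, 8), date(2010, 7, 9),
    date(2013, 10, 24), date(2014, 12, 4), date(2014, 12, 25),
    date(2015, 5, 9),
}


def unexpected_gaps(observed_dates, start, end):
    """Return dates in [start, end] that are absent and not in KNOWN_GAPS."""
    observed = set(observed_dates)
    missing = []
    day = start
    while day <= end:
        if day not in observed and day not in KNOWN_GAPS:
            missing.append(day)
        day += timedelta(days=1)
    return missing


# 2010-01-24 is a documented gap; 2010-01-26 is not, so only it is reported.
observed = [date(2010, 1, 22), date(2010, 1, 23), date(2010, 1, 25), date(2010, 1, 27)]
print(unexpected_gaps(observed, date(2010, 1, 22), date(2010, 1, 27)))
```

A zero value on a given date is not a gap; it indicates a day with no attention anomalies and should be kept as a valid observation.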
Please email platformsolutions@cmegroup.com to obtain additional materials on our signal data.
Predata’s data can be used as an orthogonal input into a trading system by ingesting signal data from CME DataMine, or into asset-pricing models using the daily values sent via .csv files. For more insight into the individual sources of attention underlying any particular signal and the languages driving interest, please reach out to contact@predata.com.
CsvHeaderName | Definition | Typical Value |
---|---|---|
date | Calendar date for the 12 AM-11:59 PM ET window prior to the date of file availability in which online attention activity is aggregated into a signal. | 01/11/2020 |
Signal Name | The topic for which online web activity is being measured. For example, the Corn Producers signal measures online interest in major corn producers, etc. | Federal Reserve: FOMC: Hawks, Corn Producers, etc. |