Frequently Asked Questions: Predata

Who is Predata?

Predata uncovers predictive behavior hidden in web traffic. Predata provides measures of online attention for thousands of topics spanning currencies, equities, commodities and macro themes, as well as coverage for over 180 countries. This dataset offers insights into supply and demand concerns for commodities, as well as off-the-radar shifts in geopolitical and macro financial trends that could affect the U.S. Treasury and currency markets.

What data does Predata offer via DataMine?

Predata offers a range of country-focused, thematic, and product-specific online attention signal data across eight product suites: Treasuries, Eurodollar, Oil Products, Natural Gas, Precious Metals, Base Metals, Grains, and Livestock.

How is this data calculated?

Predata signals measure anomalously high engagement (viewing, editing, commenting, publishing, etc.) with a particular thematic group of web sources, relative to the recent patterns of activity seen for those sources.

To quantify online engagement, Predata combines anomaly detection algorithms (to determine the days when a source is receiving high engagement) with the overall count of users engaged with a source. Predata’s algorithm automatically assigns weights to the component web sources in a signal each day, based on simple statistics calculated from historical traffic.
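The specific anomaly detection algorithms Predata uses are not described in this FAQ. Purely as an illustration of what "a score on a 0-1 scale for days of high engagement" can look like, the sketch below swaps in a simple stand-in technique: a trailing-window z-score of each day's engagement count, with below-average days scored 0 and above-average days mapped into (0, 1) via tanh(z/2). The function name `anomaly_scores` and the 30-day window are hypothetical choices, not Predata parameters.

```python
import math
from statistics import mean, pstdev


def anomaly_scores(counts, window=30):
    """Hypothetical 0-1 anomaly score per day (a stand-in, not Predata's method).

    Each day's count is compared with the `window` days before it using a
    z-score. Only above-normal engagement counts as anomalous: z <= 0 scores
    0, and z > 0 is mapped into (0, 1) with tanh(z / 2).
    """
    scores = []
    for i, today in enumerate(counts):
        history = counts[max(0, i - window):i]
        if len(history) < 2:
            scores.append(0.0)  # too little history to judge
            continue
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            # Flat history: any increase is treated as maximally anomalous.
            scores.append(1.0 if today > mu else 0.0)
            continue
        z = (today - mu) / sigma
        scores.append(max(0.0, math.tanh(z / 2)))
    return scores
```

For example, `anomaly_scores([8, 12, 8, 12, 30])` gives the final day a score close to 1, because 30 sits ten standard deviations above the trailing mean of 10.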

The weight of a source depends only on the comparison of the source’s current and historical traffic patterns. This lets Predata signals naturally track the most important dimensions of online activity at any given time without the need for a human to rebalance the components. The raw data is smoothed with a Kalman filter to produce de-noised estimates of engagement counts.
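The FAQ does not specify the state-space model or noise parameters behind Predata's Kalman filter. A minimal sketch, assuming the common one-dimensional "local level" model (the true engagement level follows a random walk and each daily count is that level plus noise), shows how a forward Kalman pass de-noises a series of counts. The `process_var` and `measurement_var` values are illustrative assumptions, and whether Predata also runs a backward smoothing pass is not stated.

```python
def kalman_filter_counts(observations, process_var=1.0, measurement_var=10.0):
    """Forward 1-D Kalman filter under an assumed local-level model.

    Assumed model (illustrative only):
        level[t] = level[t-1] + w,   w ~ N(0, process_var)
        count[t] = level[t]   + v,   v ~ N(0, measurement_var)
    Returns the filtered (de-noised) level estimate for each day.
    """
    if not observations:
        return []
    estimate = float(observations[0])  # start at the first observed count
    error_var = measurement_var        # initial uncertainty of that estimate
    filtered = [estimate]
    for y in observations[1:]:
        # Predict: the level carries over, but its uncertainty grows.
        error_var += process_var
        # Update: move toward the new count in proportion to the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (y - estimate)
        error_var *= 1.0 - gain
        filtered.append(estimate)
    return filtered
```

With these parameters, a one-day spike from 10 to 100 is damped to an estimate of roughly 39 on the spike day. Raising `measurement_var` relative to `process_var` damps spikes more strongly.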

The final weight for each web source is the raw output of the anomaly detection process (on a 0-1 scale) multiplied by the natural log of that source’s traffic (measured as raw engagement counts). The signal is then normalized to a 0-100 scale: a value of 0 means no anomalous attention to any of the signal’s underlying sources, while a value of 100 represents the highest level of attention to those sources at that point in history.
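The weighting rule above (anomaly score times the natural log of engagement count) can be written directly. How the weights are combined and normalized is not fully specified here, so the sketch below assumes that a day's raw value is the sum of its source weights and that the 0-100 scale is taken against the highest raw value seen up to that day (so each new peak maps to 100). Both assumptions, and the function names `source_weight` and `signal_series`, are hypothetical.

```python
import math


def source_weight(anomaly_score, engagement_count):
    """Weight = anomaly score (0-1) x natural log of the engagement count.

    Assumes engagement_count >= 1 so that the log term is non-negative.
    """
    if not 0.0 <= anomaly_score <= 1.0:
        raise ValueError("anomaly_score must be in [0, 1]")
    if engagement_count < 1:
        raise ValueError("engagement_count must be >= 1")
    return anomaly_score * math.log(engagement_count)


def signal_series(daily_sources):
    """Convert per-day lists of (anomaly_score, engagement_count) to 0-100 values.

    Assumed (not documented by Predata): the raw daily value is the sum of
    source weights, scaled against the running maximum of raw values so far.
    A day with no anomalous sources has a raw value of 0 and scores 0.
    """
    raw = [sum(source_weight(s, c) for s, c in day) for day in daily_sources]
    values, peak = [], 0.0
    for r in raw:
        peak = max(peak, r)
        values.append(0.0 if peak == 0 else 100.0 * r / peak)
    return values
```

Note that a zero anomaly score yields a zero weight regardless of traffic, which matches the FAQ's statement that 0 means no anomalous attention to any source. A running-maximum normalization also never revises earlier values when a new peak arrives.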

Are there risks in using the data?

The data is typically used as an input into a multi-factor trading model. Predata’s signals are not meant to serve as the sole basis for predicting future trends in the price of financial products. Predata uses only publicly available web sources and does not track personally identifiable information (PII). As a result, there is no risk of breaching Material Non-Public Information (MNPI) policies and regulations.

What is the file format of this data?

Files will be delivered in .csv format daily.

What is the average daily file size?

Less than 1 MB.

How many files are available per day?

The number of files you receive depends on the number of products you are subscribed to. If you are subscribed to one product, you will receive one file daily. If you are subscribed to all eight products, you will receive eight files daily.

What is the delivery frequency of the data?

Predata signal data is delivered once daily per product.

What time will the files be delivered each day?

Files are delivered daily between 5 a.m. and 7 a.m. Eastern Time.

Are the files compressed?

No, the files will not be compressed.

Are sample files available?

Sample data sets will be available upon request. Please visit CME DataMine for further information.

How far back historically does each dataset go?

Data typically goes back to 2010.

Are there any missing dates in the data I should know about? If so, what are they?

Yes, the following dates are missing across data sets: 2010-01-24, 2010-07-08, 2010-07-09, 2013-10-24, 2014-12-04, 2014-12-25, and 2015-05-09.

Other than the dates listed above, there should be no missing dates in the datasets. On dates with no attention anomalies, the values will be zero rather than missing.
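Because days without anomalies still appear with zero values, any other calendar gap in a signal's history may point to an incomplete download. A minimal sketch of such a check, assuming the dates have already been parsed into `datetime.date` objects and that coverage is one row per calendar day (weekends included); the function name `unexpected_gaps` is hypothetical.

```python
from datetime import date, timedelta

# Dates this FAQ lists as missing across data sets.
KNOWN_MISSING = {
    date(2010, 1, 24), date(2010, 7, 8), date(2010, 7, 9), date(2013, 10, 24),
    date(2014, 12, 4), date(2014, 12, 25), date(2015, 5, 9),
}


def unexpected_gaps(observed_dates):
    """Return calendar days absent from observed_dates that are not known gaps.

    Every day from the earliest to the latest observed date is checked.
    """
    observed = set(observed_dates)
    if not observed:
        return []
    gaps = []
    day, last = min(observed), max(observed)
    while day <= last:
        if day not in observed and day not in KNOWN_MISSING:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps
```

For instance, a history containing 2010-01-22, 2010-01-23, 2010-01-25 and 2010-01-27 reports only 2010-01-26, since 2010-01-24 is one of the documented missing dates.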

Where can I find collateral on how to understand this data?

Please email platformsolutions@cmegroup.com to obtain additional materials on our signal data.

Is there a specific process I must follow to use the data?

Predata’s data can be used as an orthogonal input into a trading system by ingesting the signal data from CME DataMine, or as an input into asset-pricing models using the daily values delivered in .csv files. For more insight into the individual sources of attention underlying any particular signal and the languages driving interest, please reach out to contact@predata.com.

How is the data structured?

CSV Header Name | Definition | Typical Value
date | Calendar date of the 12:00 AM-11:59 PM ET window, prior to the date of file availability, over which online attention activity is aggregated into a signal. | 01/11/2020
Signal Name | The topic for which online web activity is being measured. For example, the Corn Producers signal measures online interest in major corn producers. | Federal Reserve: FOMC: Hawks; Corn Producers; etc.
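A minimal sketch for loading a delivered file and grouping its rows by signal, using the `date` and `Signal Name` headers from the table above. The month-first date format (`%m/%d/%Y`) is an assumption based on the typical value 01/11/2020, which could also be read day-first; the full column list is not shown here, so any other columns are kept as strings. The function name `load_signals` is hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def load_signals(csv_text, date_format="%m/%d/%Y"):
    """Group rows of a Predata CSV by the 'Signal Name' column.

    Assumes the header contains 'date' and 'Signal Name'. The date format
    is an assumption (month-first); all other columns stay as strings.
    """
    by_signal = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["date"] = datetime.strptime(row["date"], date_format).date()
        by_signal[row["Signal Name"]].append(row)
    return dict(by_signal)
```

Since the files arrive uncompressed, a downloaded file can be passed in directly, for example with `open(path, newline="")` followed by `load_signals(f.read())`.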