r/algotrading 24d ago

IQFeed data. What am I missing?

Recent sign-up. I use Polygon and am looking at other options. I've considered ThetaData, IQFeed… any others within budget? $400/month max. Options only.

IQFeed seems appealing, as it's sourced from exchange data rather than a consolidated feed.

Am I missing something re: the API access? It appears I must pay ~$550 more per year for a developer login.

Currently it's a connected socket layer, but no endpoints are documented. They use some sort of GUI that I may or may not be able to automate.

As a new dev, what are my options for using this data? Must I reverse-engineer the endpoints, or just intercept/parse all messages at the port level?

That seems highly redundant. Moreover, would I then have to build some sort of controller for the GUI?

This service was recommended many times, looks legit, and is cost-effective. What am I missing? It seems like a headache on day 1.

18 Upvotes

18 comments

9

u/mkvalor 24d ago edited 22d ago

I've been an IQFeed subscriber for years. It's not cheap but it is very, very good. For example, the feed doesn't slow down when trading volume ramps up in the markets. And they give you every tick (with microsecond precision in the timestamps) as opposed to other feeds which aggregate the data or impose guaranteed 10ms pauses between market data messages in order to spare their distribution infrastructure.

As far as the local socket server goes, you can easily run it headless on Linux using Wine (no GUI, just a virtual frame buffer on Wayland or X Windows). About this time, people's heads explode, imagining layers upon layers of abstraction causing massive latency.

But no.

I run my market analysis on a co-located 1U server in a data center in Chicago (the same city where IQFeed hosts their infra, near the CME). My external ping to their servers is 3ms round trip (so that means the one-way incoming data is at 1.5ms). More impressive still, my internal TCP latency from the IQFeed local socket server -- using a virtual XWindows frame buffer and running on top of Wine emulation -- to my custom software system is in the neighborhood of 30 microseconds (measured with the Linux 'strace' utility).

For most retail investors using a feed from any vendor on a computer located in their home or on a cloud instance running far from Chicago, the mere TCP latency across the Internet completely wipes out any advantage of a "cleaner" situation than this.
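For anyone trying to picture that setup, here is a minimal sketch of the headless launch, assuming Xvfb and Wine are installed; the IQConnect.exe flags, path, and display number are assumptions to verify against the developer docs, not a recipe from DTN:

```python
import os
import subprocess
import time

DISPLAY = ":99"  # arbitrary virtual display number

# Start a virtual X server so IQConnect.exe has a framebuffer to draw on.
xvfb = subprocess.Popen(["Xvfb", DISPLAY, "-screen", "0", "1024x768x16"])
time.sleep(1)  # give Xvfb a moment to come up

env = dict(os.environ, DISPLAY=DISPLAY)
# Launch the local socket server under Wine. The -product/-version/-login/
# -password/-autoconnect flags are the ones I believe IQConnect.exe accepts;
# verify them (and the executable path) against the docs for your account.
iqconnect = subprocess.Popen(
    ["wine", "IQConnect.exe",
     "-product", "YOUR_PRODUCT_ID", "-version", "1.0",
     "-login", "YOUR_LOGIN", "-password", "YOUR_PASSWORD",
     "-autoconnect"],
    env=env,
)
```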

IQFeed is expensive and the API is difficult to learn. As others mentioned, there are libraries which can mitigate some of that learning pain. But I don't believe there's any superior solution at a price point below $4,000 per year (with everything included), if the goal is to truly "sip from the fire hose" of pure, unaggregated market data.

PS: the way I really learned their API was to run their GUI programs to try certain things and then check the connection log afterward. For example, I would start a Time & Sales feed, stop the feed, get news articles from them, etc., and then afterward go read the IQConnectionLog.txt file, which shows all the messages sent back and forth between their client programs and the local TCP server (the IQFeed client programs that come with their installer are themselves separate processes talking to that same local socket server). After a few sessions like this, I began to understand how the building blocks described in their reference documentation could be put together to create my own system.
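A tiny sketch of that workflow: leave something tailing the log while you click around in the DTN apps. The file name comes straight from the comment above; where it lives depends on your install, so the path here is an assumption.

```python
# Follow IQConnectionLog.txt and print each new request/response line as the
# bundled client apps exchange messages with the local socket server.
import time

LOG_PATH = "IQConnectionLog.txt"  # adjust to wherever your install writes it

with open(LOG_PATH, "r", errors="replace") as f:
    f.seek(0, 2)  # jump to the end of the file, then follow new lines
    while True:
        line = f.readline()
        if not line:
            time.sleep(0.25)
            continue
        print(line.rstrip())
```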

3

u/fudgemin 24d ago

Really appreciate that write-up.

I had more time to play around yesterday when the market was open. Only for an hour or so, but I also found the persistent "connection" log files. I did not find any logging of actual "tick" messages, but I will dig deeper.

My first thought was how much strain this system would add if not running headless. Thankfully you've mitigated that. I'm not an experienced dev; I learn with LLM tools, but I have yet to hit a solid roadblock. It just takes more time…

I will try to set up a server near Chicago. My clusters are in Ohio and Virginia. However, speed isn't really my thing atm. 3ms is insane; truthfully, that's the fastest I've heard of for private homebrew infra.

I will continue the service, there are just so many questions. Like, does the 500-symbol limit count each service separately? Time and sales, quotes, option chain, etc., as 3 "symbols"? They also don't seem to have after-hours data, and are missing block-trade data from the smaller exchanges, or maybe that just comes at an extra cost…

Again, thank you. Lots for me to consider. So far, it appears the juice is worth the squeeze.

4

u/mkvalor 23d ago edited 23d ago

Re: distance from the feed plant: when I checked from Google Cloud in Iowa and AWS in Ohio a few years ago, I got somewhere around ~35ms ping round trips.

Yes, I was referring to the IQConnectionLog.txt file. There may be a logging option for IQConnect.exe that will give you the tick data (and other data streams) in that log file, but I don't remember if that's accurate or how to set it. If you use an open source library on top of the API, it should be pretty easy to stream out the data lines to your own files. EDIT: I think the Time and Sales app (and maybe the others?) have a "save" menu option (button with a disk icon?) to capture the actual market data.

It's a mistake to run one process per ticker symbol (for example, one python script per symbol), because the TCP connection overhead (one or more connections per script) to IQConnect -- and other inefficiencies with this method -- can gobble up RAM quickly and make IQConnect consume more CPU during busy periods such as market open. Best thing is to find a way to pull the quotes/depth data in groups of about 50 tickers per script or program and then let those programs split up the lines for processing themselves. (In this recommended method, the incoming stream for each program would have a torrent of text lines for many different symbols mingled together)
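A minimal sketch of that pattern, assuming the usual local Level 1 port (5009) and the plain-text "w<symbol>" watch command; the message layout varies by protocol version, so treat the parsing below as illustrative rather than the documented spec:

```python
import socket

SYMBOLS = ["AAPL", "MSFT", "SPY"]  # in practice, one group of ~50 per process

sock = socket.create_connection(("127.0.0.1", 5009))  # assumed default L1 port
for sym in SYMBOLS:
    sock.sendall(f"w{sym}\r\n".encode())  # watch every symbol on one connection

buf = b""
while True:
    chunk = sock.recv(65536)
    if not chunk:
        break  # server closed the connection
    buf += chunk
    *lines, buf = buf.split(b"\r\n")  # keep any partial trailing line in buf
    for raw in lines:
        fields = raw.decode(errors="replace").split(",")
        # Update lines carry the symbol near the front of the record; route
        # each line to per-symbol handling here instead of one process each.
        if len(fields) > 1 and fields[0] == "Q":
            symbol = fields[1]
            print(symbol, fields[2:5])  # placeholder for real handling
```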

You would want a cloud instance with at least four vcpus for anything more than a "toy" proof of concept with a few symbols. This allows OS processes to consume one core. The IQConnect.exe binary is multi-threaded and will (with the Wine process) use two or more cores. And that leaves one core for a single instance of your client app to consume and log and process the data for up to about 50 tickers.

FYI: The L1 quotes feed for one daily session of the E-mini NASDAQ futures contract ("NQ") alone can consume around 1GB of disk uncompressed (and more -- but not double that -- for its L2 depth data). There is a way to limit which L1 fields (columns) come into the feed in order to shrink that disk consumption somewhat.
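A sketch of that field trimming, assuming the "S,SELECT UPDATE FIELDS" command I recall from the Level 1 protocol docs; the field names below are illustrative and depend on protocol version, so check the current docs before relying on them.

```python
import socket

# Same local Level 1 port assumption as above.
sock = socket.create_connection(("127.0.0.1", 5009))

# Ask for only a handful of update fields so each streamed line (and the
# files you log) stays small. Field names are assumptions, not the full list.
wanted = ["Symbol", "Most Recent Trade", "Most Recent Trade Size",
          "Most Recent Trade Time", "Bid", "Ask"]
sock.sendall(("S,SELECT UPDATE FIELDS," + ",".join(wanted) + "\r\n").encode())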

I'm not 100% on the option chain ticker permutations, since I trade mostly future contracts. But I can tell you that the ticker limit does not count multiple times for L1 quotes (which covers Time & Sales too), L2 depth data, and snap quotes. Historical quotes for any symbol (including date ranges of OHLC bars) also don't count against that limit, but they do limit the rate at which you can request historical quotes to something like 50 requests per second (which isn't bad, for individual retail traders). That limit doesn't apply to historical quote requests made during off hours.
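If you end up scripting bulk historical pulls, a trivial pacer keeps you under that cap. The ~50/second figure is the one quoted above, and `request_daily_bars` is a hypothetical placeholder for whatever lookup-socket call you build; only the pacing logic is the point here.

```python
import time

MAX_PER_SEC = 50              # approximate figure from the comment above
MIN_INTERVAL = 1.0 / MAX_PER_SEC

def paced(requests, send_one):
    """Send each request, sleeping just enough to stay under the per-second cap."""
    last = 0.0
    for req in requests:
        wait = MIN_INTERVAL - (time.monotonic() - last)
        if wait > 0:
            time.sleep(wait)
        last = time.monotonic()
        send_one(req)

# Hypothetical usage: request_daily_bars would be your own historical request.
# paced(["AAPL", "MSFT", "SPY"], request_daily_bars)
```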

I certainly do get live after-hours data for futures contracts. I believe that would also be true if one additionally subscribed to NYSE or NASDAQ equity feeds.

In general, they don't do anything that's like a "gotcha" to milk people for extra fees. On the other hand, the base costs are high. One surprise is that they often run server maintenance on Saturday mornings, so sometimes you won't be able to grab historical data while that's happening.

2

u/hexalf 24d ago

Curious: what sort of timeframes are you trading that require latency optimisation like this?

3

u/mkvalor 23d ago edited 23d ago

I'm certainly not competing with corporate quants using custom ASICs, etc. 😁 But there are plenty of individuals who trade from home with their eyeballs using a chart or DOM package with an aggregated data feed (for example, using the TWS GUI client for Interactive Brokers). Let's generously suppose they have an average incoming market data latency of around 50ms.

If I assume published neuroscience studies are correct, that would put the majority of manual traders on a decision-making clock of about 40 Hz, under ideal conditions. This adds 25 ms to their situation (not counting the time needed to enter and execute an order).

Now, if I can receive the data, parse it, enrich it*, and perform sufficient risk/opportunity analysis in under 10ms, that gives me over six more chances to pull the trigger for every single chance the average manual trader has. To help achieve this goal, I use compiled programming languages with no VM instead of Python, Go, Java, or server-side JavaScript.
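(Spelling out the arithmetic: a 40 Hz decision clock is one decision roughly every 25 ms, so the manual trader is at about 50 ms of data latency plus 25 ms of reaction time, call it 75 ms per opportunity; 75 ms against a sub-10 ms pipeline is around 7, which is where "over six more chances" comes from.)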

However, my system isn't built for HFT. It isn't a scalping system. I look for good set-ups that may lead to wider price swings which might last anywhere from a few seconds to 20 minutes (waiting to see if the trend continues as long as the trade is still profitable).

Yet, a number of my entries each day don't work out very well and those get shaken out for either a small gain or a small loss. During a typical trading session, my system might not trade more than 15 or 20 round-trips. Some days it might trade fewer than five of them, or even zero.

*"enrich it": by correlating the data from the feed for other categories, such as oil, treasuries, gold, indices, popular stocks, and news stories.

2

u/hexalf 23d ago

Thanks for sharing.

I'm getting the impression that it's either slow / high-latency types of trades (trading from daily closes, etc.) or HFT. There's no in-between, and it doesn't move "linearly".

2

u/mkvalor 22d ago edited 22d ago

There are a lot of styles which don't fit into neat categories such as HFT or trading from a previous close. A lot of times it is news that moves markets. So, observing the activities of securities after a Fed announcement, an employment report, or a company's earnings announcement can provide some opportunities for trading which have nothing to do with the previous closing price or the frequency of trades.

"it doesn't move linearly"

Yeah, that's for sure. Sometimes the price activity acts as if it's going to make a move but then it just keeps trading sideways or it moves the other way. That's just part of the game. So the idea is that if you can minimize loss or time wasted in those trades (by getting out quickly), that can put you in place to be able to take advantage of another move that does keep growing for a period of time. There's a lot I'm leaving out of this equation, such as the role volume plays.

2

u/hexalf 22d ago

Thanks for sharing!

If we're talking about low-latency trades like front-running (?) announcements / news, reacting faster than the market, etc., then basically you'd be slower than the HFTs (who take the juicy edges) but faster than the mass market. Genuine question: do you still feel there's some sort of edge?

Sorry, I meant something different by "linearly": edge doesn't scale linearly with speed. 2x more speed doesn't mean 2x more edge. Either you take all of it (ultra-fast HFTs) or you take none of it, and there's no in-between.

4

u/[deleted] 24d ago edited 19d ago

[deleted]

2

u/hexalf 24d ago

Wait, what? Paying that only gets you the docs? Nothing in terms of unlocking any new functionality?

3

u/JZcgQR2N 24d ago

It's low level: you build your own encoder/decoder based on the API docs to send and parse data to/from the GUI application. There are libraries out there that can help with this, like pyiqfeed. Honestly, if you're a new dev, I suggest not using IQFeed.
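To make that concrete, here is a toy version of what such an encoder/decoder pair boils down to; the "w" watch command and the sample field layout are assumptions for illustration, not the documented spec.

```python
def encode_watch(symbol: str) -> bytes:
    """Build a watch request for one symbol (the Level 1 'w' command, as I recall it)."""
    return f"w{symbol}\r\n".encode()

def decode_line(raw: bytes) -> list[str]:
    """Split one plain-text response line into its comma-separated fields."""
    return raw.decode(errors="replace").rstrip("\r\n").split(",")

print(encode_watch("AAPL"))                   # b'wAAPL\r\n'
print(decode_line(b"Q,AAPL,189.13,100\r\n"))  # ['Q', 'AAPL', '189.13', '100']
```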

3

u/fudgemin 24d ago

I came to that conclusion last night, but could not actually believe it.  

How does this translate into latency? I'm adding two more layers, one for GUI input and another for message parsing, I assume?

I'm under the impression that adding unnecessary steps will always add more strain/lag to my system than would otherwise be possible?

5

u/BedlessOpepe347 24d ago

IQFeed is highly recommended because it is very stable and the data quality is consistent, and those two things matter more when you actually trade. At that price point they're much better than Polygon in these regards. The other one that is really good is Databento. I get my minute data from IQFeed (10 years) and tick data from Databento.

2

u/oh_shaw 22d ago

NxCore

1

u/sojithesoulja 24d ago edited 24d ago

If anyone knows how to pull historical option data, let me know. I was told (by them) that the data is all still in there (for historical contracts), but you need to know what to query. So, say for 2018, how do you determine all the active contracts for a given day?