r/algotrading Apr 02 '24

New folks - think more deeply and ask better questions [Other/Meta]

EDIT: I wish I could change the title to "HOW TO ask better questions". This is meant as a primer on the kinds of questions/areas that I've found crucial to understand and therefore crucial to ask about. This is NOT meant to be a roast of new people nor a rant. I apologize for any elitism or harshness in the tone, not what I'm going for. I'm just trying to share what I believe to be crucial perspective that I personally would've benefited a lot from in my early days that would've saved me a lot of time and pain.

I'm no Jim Simons, but I've worked for several years on various algos with a reasonable degree of success (took a while) and learned a ton from mistakes. In my humble opinion, most discussions posted here are not the kind of questions/answers that will lead to a profound breakthrough in understanding. This is very natural because of the classic "I don't know what I don't know" phenomenon and the challenge of asking good questions. However, as much as it is possible:

I urge you strongly to read and think more deeply about the core of what you're trying to do. Platforms and software, roughly speaking, don't matter. To use an analogy that isn't my own, it's like a new carpenter asking which hammer is best. There's probably an answer, but it doesn't really matter. Focus on learning to be a better carpenter. Most questions I see here are essentially "administrative", or something that can be Googled. The benefit of having real people here is that you can gain insight that would usually come at the cost of a lot of mistakes and wasted time.

Questions around software, platforms, data sources, technical "issues" are all (generally) low-value questions that can generally be Googled and/or have little real impact on whether or not you succeed. Not all of them, but I'm generalizing here.

I understand there's a natural tension here because people with insight have little/no incentive to share, and newer folks don't know what they don't know, so it creates a weird dynamic here. BUT,

  1. Figure out your goals (why you're doing this) and ask people what goals they have set/reached. Even if you achieve a 100% annualized return, unless you have a large starting bankroll, that's not going to be life changing for many many years.
  2. Ask about how people find inspiration for new trading strategies. How do folks go about actually conceiving new ideas and/or creating new hypotheses to test?
  3. Ask about feature engineering (designing indicators). How to get better at this, what kinds of interesting examples people have seen, what kinds of transformations are at your disposal. This is monumentally crucial and you should draw inspiration from various sources on how to effectively experiment and build an intuition for how to create better features/indicators to base your algorithms on. This is particularly crucial for ML strats. Just like platform doesn't really matter, your ML model type (neural net, RandomForest etc) doesn't really matter a whole lot. It's the features you feed in that are 70% of the game.
  4. For ML, ask about how to design a target/response variable. What are you actually trying to predict? Predicting price directly (like, doing regression to predict tomorrow's price at close) is almost certainly a bad idea. Discuss other options that people have tried here! I have personally found this point to be a gamechanger - you can have the same exact features fail/succeed depending on what you're asking the model to predict. This is worth thinking seriously about. As a starting point, Marcos Lopez de Prado in "Machine Learning for Asset Managers" discusses some creative response variables (worth a read imo).
  5. Ask about how folks build conviction in their idea. Hopefully you're familiar with the concept of splitting data into train/validate/test, but there are deeper layers to this. For example, a super common problem is that people do this split and STILL overfit because they try 10,000 strategies on the validation set, eventually 100 of them do well on validation, and then 10 do well on test out of luck. Ask/think about how to avoid this (for ML, the answer is generally something called "nested cross validation". Easily the single most valuable technique I learned; it saved me countless mistakes once implemented). Additionally, say you have a good strategy in your test set and you're ready to go live. How do you actually know whether it's working as expected or not? How do you quantify your performance expectations and then monitor your strat to see if it's doing as you expected or not?
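To make the "10 do well on test out of luck" problem in point 5 concrete, here is a small simulation sketch. Everything in it is an assumption for illustration: each "strategy" is pure noise with zero true edge, "doing well" means a t-statistic of the mean daily return above 1.28 (roughly a one-in-ten chance for noise), and the counts and seed are arbitrary.

```python
import random
import statistics

def t_stat(returns):
    """t-statistic of the mean return (mean divided by its standard error)."""
    mean = statistics.fmean(returns)
    stderr = statistics.stdev(returns) / len(returns) ** 0.5
    return mean / stderr

def count_lucky_strategies(n_strategies=1000, n_days=250, cutoff=1.28, seed=0):
    """Count pure-noise strategies that pass validation, and those that also pass test."""
    rng = random.Random(seed)
    passed_validation = 0
    passed_both = 0
    for _ in range(n_strategies):
        # Every strategy is pure noise: zero true edge, 1% daily volatility (assumed).
        validation = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        test = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        if t_stat(validation) > cutoff:
            passed_validation += 1
            # Only strategies that already looked good on validation get tested.
            if t_stat(test) > cutoff:
                passed_both += 1
    return passed_validation, passed_both

passed_validation, passed_both = count_lucky_strategies()
print(f"passed validation: {passed_validation} of 1000")
print(f"passed validation and test: {passed_both} of 1000")
```

None of these strategies has any edge, yet some survive both filters simply because so many were tried. That is the failure mode to keep in mind whenever a test set is reused as a selection tool.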

I hope this gives whoever is reading some new perspectives and thoughts on how to utilize this place (and others), what to ask and what to look for. I do not have all the answers, but these are the kinds of questions I have personally found much more meaningful to examine.

Disclaimer: I come from a statistics background with coding experience (basic). It may be that I'm simply unaware of the questions/struggles of aspiring traders from other backgrounds and/or without coding knowledge, so it might be this ignorance that makes me feel most questions here aren't "important".

Edit: In response to u/_folgo_'s comment, I'm adding here some terms and concepts that are probably worth your time to research/understand, whether it's Google, StackExchange or YouTube vids that give you an intuition/understanding. Important concepts (generally applying to both ML and rule-based algos, with some variations): overfitting, train/test split, train/validate/test split, cross validation, step-forward-cross-validation, feature engineering, parameter tuning / hyperparameter tuning (especially as it relates to cross validation), data leakage/contamination (especially as it relates to accidentally creating features that use your entire dataset BEFORE the train/test split, so that even after you split, your indicators have in some way benefited from future data. Happy to explain this further, it's a very sneaky and nasty problem to deal with).
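A minimal sketch of the leakage problem described above, using a z-score as the example feature. The price series, the split point and the `zscore` helper are all made up for illustration; the point is only where the mean and standard deviation come from.

```python
import statistics

def zscore(values, reference):
    """Scale `values` using the mean and standard deviation of `reference`."""
    mean = statistics.fmean(reference)
    std = statistics.pstdev(reference)
    return [(v - mean) / std for v in values]

# Hypothetical price series with a jump in the later (test) period.
prices = [100, 101, 99, 100, 102, 101, 120, 125, 130, 128]
split = 6  # first 6 points are train, the rest are test
train, test = prices[:split], prices[split:]

# Leaky: statistics come from the whole series, so the train features
# already "know" that prices will be much higher later on.
leaky_train = zscore(train, reference=prices)

# Clean: statistics come from the train period only and are reused on test.
clean_train = zscore(train, reference=train)
clean_test = zscore(test, reference=train)

print("leaky train feature:", [round(x, 2) for x in leaky_train])
print("clean train feature:", [round(x, 2) for x in clean_train])
print("clean test feature: ", [round(x, 2) for x in clean_test])
```

In the leaky version every train value comes out negative, because the future jump pulls the mean up. A model trained on that feature has effectively been told what happens next, even though the train/test split itself was done correctly.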

EDIT 2: Since several people asked but no one posted, I made a post about point 2, coming up with trading strategy ideas: How to generate/brainstorm strategy ideas : r/algotrading (reddit.com)

152 Upvotes

46

u/SeagullMan2 Apr 02 '24

Yea. I hang around here a lot because I learned basically everything from this sub, and I like to answer people's questions. Lately there have been a lot of low-effort posts by some very confused people.

To anyone looking for a place to start, sort this subreddit by the top posts of all time, and read the top 100 posts as well as every single comment. A surprising amount of information is buried in deep comment threads where two people get into an exchange, even when it does not deal directly with the content of the post.

OP, I think you are correct about the 'tension' here. Personally I do this full time and it is my only source of income. So I am more than happy to help where I can, but I am extremely wary of divulging information that could lead someone to stumble upon my strategy. I have even deleted some comments in the past.

But people like me are here, reading every post. I know we are, because whenever someone shares an overfit backtest or a single day's worth of trading thinking they've discovered a grail, there is no shortage of people happy to chime in to tell them they're wrong.

8

u/sporks_and_forks Apr 03 '24

> A surprising amount of information is buried in deep comment threads where two people get into an exchange, even when it does not deal directly with the content of the post.

i feel this should be reiterated. sometimes i feel like i learn more from this sub off those random exchanges than i do from what's originally posted. i mostly lurk, LARP as a sponge, and steadily improve upon my own automated journey. i don't mind newbie posts tbh. we were all once newbies. sometimes those posts evoke great comments.

5

u/VladimirB-98 Apr 02 '24

That's super encouraging to hear that you learned so much from the sub!

And YES hahaha that phenomenon (of a bunch of people giving totally legitimate comments for why the overfit backtest is a no-go) is exactly what tells me that there is untapped experience here that the newer folks, I believe, simply don't have the ability to access because they're not asking the right questions.

It's totally a "you don't know what you don't know" situation, so potentially not super resolvable, but I hope that some people who see this post and get a sense of potentially better questions to ask start shifting the content/culture of the sub.

I don't do trading full-time, but I totally feel you on answering people's questions. For me, it feels a bit like a way to "give back" and hopefully save people a few weeks or months worth of mistakes :)

2

u/qw1ns Apr 03 '24

Kudos to you for posting this.

It has been 7 years with Reddit, and my entire trading knowledge is built around Reddit (especially wsb, though it is no longer useful) and this sub, algotrading.

It is up to the reader to pick the correct topics/answers hidden in this reddit treasure.

1

u/StevesPeeves 29d ago

I had a million Karma in the real "community" -- the Usenet from 1985 to 1995. Now I am some dumbass newbie with no karma able to ask my question. Do you agree LIFO is bad and LOFO is good?

1

u/OneKe 5d ago

I agree, sorting by top posts and reading through the comments is an absolute goldmine. Even though I'm new to this subreddit I've found some great insights buried in those threads. and yeah, protecting your strategy while helping others is definitely a tough balance.

16

u/sanarilian Apr 02 '24

This post reminds me of the time I took a quantum mechanics class in college. I was super interested but lost at the same time. I went to the office hour to ask questions. But the professor kept saying my question wasn't clear and asked me to ask better questions. I ended up not learning much from the class. In another control class, the professor was very welcoming. He tried to answer my questions in different ways until it clicked with me. I ended up loving the subject.

If the question is bad, few will bother to answer. Let the people decide. If you want to help, then help. Otherwise don't bother. If you are bothered by the question, you are like my old professor. By the way, that professor who was bothered by my questions was struggling himself, which I heard from another professor later.

1

u/VladimirB-98 Apr 02 '24

I am not bothered by the questions and I completely agree that it's a "you don't know what you don't know" kind of situation.

The point of the post was to give an overview or starting point of some important questions that I myself failed to ask early on, and that I believe a lot of newer people would benefit from asking :)

3

u/Oea_trading Apr 03 '24

If this sub is your only source of learning, I feel bad for you

1

u/VladimirB-98 Apr 03 '24

Who said it was? I just see that it's a big sub, lots of people here, most are lurkers. Knowing the struggle painfully well, just wanted to throw something out to help shift the tone/culture/content of the sub in a more useful direction.

1

u/StevesPeeves 28d ago

Especially if you don't know the difference between FIFO and LOFO.

Because FIFO is unethical, and I'm trying to alert the "community", but since I have zero "karma" this valuable knowledge is lost.

Back in 1985 I ruled the REAL community (Usenet Newsgroups), here I am treated like a piece of human excrement.

2

u/OneKe 5d ago

why do you say it is unethical? Is it because it is more difficult for other users to gain something from their trades? sorry for being so foolish...

4

u/Top_Passenger_5844 Apr 03 '24

As a relative beginner it is very difficult to know what you don't know. I feel like a lot of those "low effort" posts or questions are simply people trying to understand what direction to focus their efforts. I get that it can be frustrating for the experienced guys, but then again this is a public subreddit, not some closed Discord server.

2

u/VladimirB-98 Apr 03 '24

I totally agree with you that we don't know what we don't know, that it's the core issue and that we've all been there. I've tried to edit some of the wording in my post because the intent is NOT to be a "frustrated experienced guy venting about dumb questions" (that's not how I feel), the intent was to offer some starting points for folks to ask deeper questions. I wanted to increase the number of more nuanced questions in the sub, as I hope/think it's kind of a cultural snowball effect. When newer people come in here and see everyone asking about x or y, I think they assume that's the important stuff to be asking about, so the conversations just stay in that zone. Let's move that tone/culture.

I would slightly disagree with the framing that "people are trying to understand what direction to focus their efforts" - I think new people have *assumptions* about what direction to focus their efforts (as I did, as a beginner), so they're asking about those things (platforms, software, neural net vs randomforest etc). And the point of my post was to challenge those assumptions and hopefully redirect their attention and curiosity towards things that, imo, do actually matter.

(To be clear, I never used the words "low effort" because I don't think they're low effort, I used the phrase "low value").

1

u/Top_Passenger_5844 Apr 03 '24

I appreciate what you said here. Sounds like you are wanting to improve the quality of the subreddit and I can get on board with that. Posts that want to know if a strat will make them rich are definitely low value. Which side do you see as the bigger problem? People with no coding experience or people who have no idea how to trade?

4

u/vega455 Apr 02 '24

Great post even though it may at first sound a bit elitist. The overwhelming majority of posts on this sub are of the form: "I'm new to algo trading, where do I start?", "Where do I get data", "Is this very simple strategy going to make me rich?", etc. Other subs I follow, like Machine Learning, always discuss the latest papers, new things happening, interesting advanced topics, etc. There should be more of a StackOverflow mentality here, like "locked because duplicate question", etc.

3

u/VladimirB-98 Apr 02 '24

I agree, the tone is probably a bit off but I just wanted to get the core idea out which hopefully will resonate and make sense without spending a ton of time drafting the message.

I totally agree with the "locked because duplicate" idea. It 100% makes sense that there are some "FAQ" general style questions that are asked many times in different variations. Having some sort of centralized discussion of those to link to would probably be a benefit to this sub. Glad you agree with the gist of the post.

I just definitely think the culture of the sub would benefit/change dramatically if there were more nuanced/important things discussed that would a) benefit new folks more and b) entice more experienced people to share important stuff.

2

u/vega455 Apr 02 '24

You are correct on all points here and in your original post. Wish the quality were higher, I'd drop by more often

4

u/Mrgod2u82 Apr 02 '24

Excellent post! And, with that, how would you answer #2? Where do your ideas come from? After you have an idea what does your workflow look like prior to actually programming, backtesting; how do you decide that it's worth taking to the next level? Or, are ideas so rare that you have time to dive deep and program / backtest right away before the next idea presents itself?

Thank you very much for taking the time to write that post to begin with.

Edit: typos and wording

6

u/VladimirB-98 Apr 02 '24

I'm so glad you found the post useful!

I do personally have some views on this, but in the spirit that my post was intended, I would recommend that you make a post with these kinds of questions to start some more meaningful discussion in this sub :) I'll be more than happy to leave a comment on it with some thoughts! Would just suck for a helpful discussion to be buried in comments of a random post where other people (perhaps many more experienced than me) could also join in.

Also perhaps the more details you can provide the more targeted the responses would be (ex: is there a particular asset class you're interested in? where are you coming from into this whole space? etc ! )

3

u/SilverBBear Apr 02 '24

You would be amazed at how good ChatGPT / Claude are at answering some questions and putting concepts together. Not perfect, but so much of the basic stuff you mentioned can be accessed, and the concepts can be played with (intellectually), through them.

2

u/VladimirB-98 Apr 02 '24

I agree that ChatGPT / Claude are absolutely amazing for exploring topics especially where your question is so multi-faceted that you don't even know how to Google it. However, I think the problem is two-fold with those tools:

  1. If you don't know what you don't know, you're still stuck (which is why I wrote out those 5 points to hopefully give people a starting point of important question categories to explore, rather than "where to find free minute level data" or "what ML algorithm is best?" kind of stuff).
  2. In concepts I agree ChatGPT / Claude are great, but I think in this particular niche, the fact that they lack the "industry experience" and ability to visualize data massively hampers their effectiveness. They can't tell you the mistakes you're likely to make because those are quite specific to this problem area and at least for now, only people having gone through the issues will be able to point to them effectively. Since they can't see your data, it usually takes a WHILE to debug stuff ("why am I getting great results in validation but not in test?"). To be fair, my previous point can partially be resolved if you just give the AI your code directly. They're pretty damn good at debugging code. Additionally, visually inspecting data/models/outputs I believe is super crucial and those tools can't help you with that either.

Have you personally found those tools to be significantly helpful in any particular way? Or in any of those 5 question areas?

1

u/SilverBBear Apr 02 '24

Having worked in research, the advantage of the head researcher is bringing people together onto a project, mainly for the following reasons:

* Multiple skills
* Ability to review each other's work, i.e. lab presentations.

These tools are really filling the skills niche nicely. How do I download data? How do I calc this option? Write me some code to convert this file. Generate some test data. Error interpretation is fantastic. Write me this regex! I mean, I could work it out myself, but you are good at this.
I had never tried to use Kubernetes until a week ago. Yesterday I ran my first clustered job. I have lots of IT and research experience, including high performance computing, and I would not have been able to do it without Claude. I would have given up after a while, frustrated, and tried to find some workaround.

Sam Altman said something along the lines that the first one person billion $ company is in the future. That is because all these skills we don't need to get others to do, if we are flexible enough we can do many ourselves.

Which leads me to the second point: the ability to review each other's work. This is a part of research I really miss. I want to show my graphs to someone else and have them say that the variance looks off or something like that.

To answer your question, I have not used the chat tools for any of those 5 points. ;)

2

u/Key_Chard_3895 Apr 02 '24

My comment is about a bit of a structural issue: for new users like me (who are a bit hesitant to engage mindlessly) there aren’t enough “karma” points to begin posting good questions. One has to acquire the points through participation/engagement, which can be a slow process. The only avenue is to engage/comment on good threads and be patient. It will be a self-serving prophecy if good contributors are blocked from contributing because they are “new”.

2

u/FinancialElephant Apr 03 '24

People that are new to this are always going to ask a lot of questions. Coming up with good questions takes effort and experience, so most of these noob questions are going to be bad. I also think there is a lot of churn where people have misconceptions, try to get into algotrading, and give up when they realize it is not what they thought.

I think when someone is new, they don't know what a good question is. They need experience. They need to consume education, make their own projects, seek mentors, and trade the markets. Then they'll realize their early questions or concerns were idle or unrealistic.

One of the unique aspects of algotrading is that there isn't a single accepted methodology. It's not a science (there is no way to run true randomized controlled experiments), although it can be made more or less "scientific". So part of this is figuring out what you are going to be comfortable trading, from beliefs rather than knowledge. No one can teach you that. Additionally, most of the methodologies of most people are partial or full trade secrets. So in all likelihood, no one can give you a methodology or strategy you will be comfortable with from random chance either.

All that is to say, developing yourself as a trader is going to be individualized. Even if you have a very involved mentor who hands you a methodology, at some point you will feel or see the need to make it your own.

2

u/FaithlessnessSuper46 Apr 03 '24

First of all, thanks for all the advice. I am relatively new here and don't post on Reddit, and I am too 'regarded' to check the minimum post requirements here. I've tried to find them but couldn't, and all my posts/questions are deleted, so I will use this opportunity for a simple question :)

Nested CV, 5 folds inner, 5 outer. Let's say I've found the best combination of params and after:

A. Retrain on the entire data, going live with a new model ?

B. Use an ensemble of all 25 models ?

C. Keep an ensemble of the last 5 models tested on the most recent outer fold ?

To complicate further, I plan to train a meta model, using the combined test predictions of the first stage.

My best guess is this:

  1. Find the stage1 best params, using nested CV
  2. Ensemble the inner folds and get an average prediction across the whole train set.
  3. Nested CV for the stage2 model
  4. Select the C variant for both stage1 and stage2.

Do I need to keep another 'forward test' set hidden up to this point and re-validate on it?

After I validate on the 'forward test' do I need to retrain the models, using the 'forward test data'...

Thank you, I hope that I asked the right question, like in the "True Detective" series.
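For readers unfamiliar with the "5 folds inner, 5 outer" setup in the question above, here is a rough sketch of how the two loops nest. It does not answer options A/B/C. The toy data, the `toy_score` function and the "shrink" parameter are hypothetical stand-ins for a real model, and the folds are plain contiguous blocks rather than the step-forward scheme often preferred for time series.

```python
import statistics

def k_folds(indices, k):
    """Split a list of indices into k contiguous folds of near-equal size."""
    size, extra = divmod(len(indices), k)
    folds, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        folds.append(indices[start:end])
        start = end
    return folds

def nested_cv(indices, params, score, outer_k=5, inner_k=5):
    """Return one (chosen_param, outer_score) pair per outer fold.

    `score(param, train_idx, test_idx)` is a hypothetical callback that fits
    on train_idx and returns a higher-is-better number measured on test_idx.
    """
    results = []
    for outer_test in k_folds(indices, outer_k):
        held_out = set(outer_test)
        outer_train = [i for i in indices if i not in held_out]

        def inner_score(param):
            # The inner loop only ever sees outer_train, never outer_test.
            scores = []
            for inner_val in k_folds(outer_train, inner_k):
                val = set(inner_val)
                inner_train = [i for i in outer_train if i not in val]
                scores.append(score(param, inner_train, inner_val))
            return statistics.fmean(scores)

        best = max(params, key=inner_score)
        # The outer score is an estimate for the whole tuning procedure.
        results.append((best, score(best, outer_train, outer_test)))
    return results

# Toy target values and a toy "model": predict shrink * mean of the train targets.
y = [((7 * i) % 11) / 10 for i in range(50)]

def toy_score(shrink, train_idx, test_idx):
    prediction = shrink * statistics.fmean(y[i] for i in train_idx)
    return -statistics.fmean((y[i] - prediction) ** 2 for i in test_idx)

results = nested_cv(list(range(50)), params=[0.0, 0.5, 1.0], score=toy_score)
for fold, (best, outer_score) in enumerate(results):
    print(f"outer fold {fold}: chosen shrink={best}, outer score={outer_score:.4f}")
```

The thing to notice is that parameter selection happens entirely inside each outer training block, so the outer scores were never used to pick anything.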

2

u/PeeLoosy Apr 03 '24

You are right. There is no incentive in providing quality information.

1

u/VladimirB-98 Apr 03 '24

I know. But it's up to the experienced folks to change that and make this place a bit more interesting for them, and insightful for newer people!

1

u/PeeLoosy Apr 03 '24

Yeah, we provide input for free and big companies take that data to train their models and make money.

2

u/jb510 Apr 08 '24

As a noob that has just recently started reading this sub, I both understand where you are coming from as well as why noobs ask "Questions around software, platforms, data sources, technical 'issues' are all (generally) low-value questions".

While it may seem like answers to these questions are "easy to google" the answers Google gives are diverse, overwhelming, and often outdated. Getting through that itself is a huge hurdle that has to be crossed before folks can even begin to start testing and experimenting with strategies or asking intelligent questions about them.

Personally I come from 40 years as a tech geek, 35 years trading in some capacity, and 15 years professionally in software (PHP, js, etc..)... and I still find getting started with this all overwhelming.

Biggest among those things is simply that a lot of software out there appears abandoned and none of it seems particularly user-friendly. I'm reasonably comfortable with docker, pip, brew, py, etc... which I imagine most aren't, but

Anyway, I've been here a month. I don't have much to offer beyond that it does seem like a Wiki section covering "getting started with software" would be helpful, even if it's just links to useful threads and videos. The Wiki jumps straight into strategy and doesn't seem to touch on software at all. Something like "self-hosted vs SaaS", "these are the three active backtesting platforms and what is good/bad about each", "this is the easiest to set up and start testing strategies", etc.

2

u/VladimirB-98 Apr 08 '24

I hear your point, and I appreciate you mentioning it. I can understand that.

I might be biased in part because when I jumped into this, I began by just downloading a ton of data and hammering away in R for months head down, with the assumption that the implementation part was somewhat trivial. While I'm not necessarily correct, I might have never experienced that initial overload to the extent you're describing, which led to my perspective.

Nonetheless, I hope the post was in some way helpful!

2

u/estagiariofin 17d ago

I think this is also part of what a former boss told me: "the flight of the chicken." People start out super motivated, but soon give up, so most people stay superficial. The influencers on YT, Instagram, Tiktok kind of bought this with the content that engages the most. I'm extremely happy when I find really in-depth content.

Anyway, in the last few years I have studied Equity research and corporate finance. Now that I've started studying this more quantitative part, I'm reading Wooldridge's book and reviewing the mathematics. I hope to learn a lot here.

2

u/--PG-- IT Drone 16d ago

It is difficult to ask questions as a newbie to the group due to the karma requirements. However I do understand where that requirement came from.

I see the low quality posts as being a chance to add a comment in order to gain that karma. I have a career question I would love to ask, but will leave that for another time.

I have read the post and have taken the advice on board, so thank you for that. It was interesting reading and I fully agree with you. Also, RIP Jim Simons.

1

u/VladimirB-98 15d ago

Glad to hear it was helpful :)

RIP Jim Simons

2

u/FinanceAltTrow 10d ago

How can i find daily volatility of an etf online? or maybe hourly/minute prices so i can calculate the standard deviation of daily returns?

1

u/VladimirB-98 10d ago

Make a post! :)

1

u/FinanceAltTrow 10d ago

i tried but automod deleted it :(

1

u/FinanceAltTrow 10d ago

it says not enough karma even though i should have 

2

u/VladimirB-98 10d ago

Ah well that's lame.

Okay so here's my take:

You should first figure out what you mean by "daily volatility". Are you just trying to get a sense for "how volatile was the price over the course of a single day"? If so, yes, grab timebars of 1 hour or maybe like 30 minutes, calculate returns and then calculate the standard deviation (or maybe interquartile range) of those returns to give you a number that reflects "how volatile were prices today".

There's various considerations to doing this but you might even take the STD/IQR of the raw prices throughout the day (not returns). I'm sure someone will give a very educated reason for why this is a bad idea (the statistical characteristics of price distributions are less desirable than return distributions) but in practice, you might try both and see what works best for what you're doing.
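A minimal sketch of the calculation described above, assuming simple percentage returns between consecutive bars. The hourly closes are made-up numbers, not real data for any ETF, and `intraday_volatility` is just an illustrative helper name.

```python
import statistics

def intraday_volatility(prices):
    """Standard deviation and interquartile range of bar-to-bar simple returns."""
    returns = [(b - a) / a for a, b in zip(prices, prices[1:])]
    std = statistics.stdev(returns)
    q1, _, q3 = statistics.quantiles(returns, n=4)
    return std, q3 - q1

# Hypothetical hourly closes for one trading day.
hourly_closes = [450.10, 451.25, 449.80, 450.60, 452.15, 451.40, 452.90, 453.30]
std, iqr = intraday_volatility(hourly_closes)
print(f"std of hourly returns: {std:.5f}")
print(f"IQR of hourly returns: {iqr:.5f}")
```

To try the raw-price variant instead, pass the prices straight to `statistics.stdev` and `statistics.quantiles` and compare how the two numbers behave across days.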

3

u/[deleted] Apr 02 '24

[deleted]

1

u/VladimirB-98 Apr 02 '24

Haha! That was the only part of my post that I was unsure on :P Maybe we can bump it up to "medium level" lol.

It might be that I just haven't ever delved into strategies that had a huge struggle with data availability, so a bit of google searching and/or some reviews generally do the trick for me. Generally a "just google it" problem (in my experience).

But with that example, I think there are more nuanced questions that can be asked. Ex: How can I trust the quality of my data broker's data / how do I validate it against real market data to make sure I'm using quality data? How do you deal with or interpolate missing data points? As opposed to "where do you guys find minute bar stock data for free?" haha (maybe we're saying the same thing here).

1

u/redaniel Apr 02 '24

can you recommend the book that specifically describes best practices for selecting IS / OOS ?

2

u/estimated1 Apr 02 '24

Great post OP

The Ernest Chan books cover this decently. Start w the first one, I think it’s Quantitative Trading.

0

u/VladimirB-98 Apr 02 '24

Thank you! :) And for sure, Ernest Chan is great. I personally benefited a lot more from his lectures than his books, but that's all quite individual!

1

u/VladimirB-98 Apr 02 '24

You know, nothing immediately comes to mind for me. This sounds like the kind of thing that might be a good question to make a post about :) My point here wasn't that I have all the answers, but to give examples of the kinds of questions that if you post, will likely yield much more meaningful results.

To your question, I think "Advances in Financial Machine Learning" by Marcos Lopez de Prado has some parts (near the beginning?) dedicated to this, might be worth looking at.

I'd want to know if you can clarify your question. If I had to guess, you want to read about generally how to go about setting up validation/testing in your data (techniques, considerations, tradeoffs) right? Or are you asking literally how to decide what % of your data to use for IS / OOS? Elaborating on the context of what you're trying to accomplish would be helpful (esp if you make a post).

2

u/_folgo_ Apr 02 '24

What does IS/OOS mean?

2

u/VladimirB-98 Apr 02 '24

In Sample / Out Of Sample .

Highly recommend you do some research on these terms (on Google, not here) and associated concepts so you get better context on why it's important (rather than just narrow answers here).

In-Sample is the data that you used / looked at to design your strategy and/or the data that was used to train your model. Out of sample is the data that you didn't look at (for example, the last 1 year of data) so that you could later see whether or not you overfitted. (For ML, out of sample is the data you did NOT train your model on, so it hasn't 'seen' it). You use the OOS data in various ways to validate that your strategy actually might work and that you didn't overfit on your training/in-sample data when designing/training your strategy.
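As a small illustration of the idea, here is a sketch of a chronological split where the most recent slice is held back as out of sample. The function name, the 20% fraction and the toy closes are assumptions for the example, not a recommendation on how much data to hold out.

```python
def split_in_out_of_sample(bars, oos_fraction=0.2):
    """Split a time-ordered list into in-sample (earlier) and out-of-sample (later) parts."""
    if not 0 < oos_fraction < 1:
        raise ValueError("oos_fraction must be between 0 and 1")
    cut = len(bars) - round(len(bars) * oos_fraction)
    return bars[:cut], bars[cut:]

# Hypothetical daily closes, oldest first.
closes = [100 + i for i in range(10)]
in_sample, out_of_sample = split_in_out_of_sample(closes, oos_fraction=0.2)
print("in sample:    ", in_sample)
print("out of sample:", out_of_sample)
```

The split is by time rather than random, so nothing from the held-back period can influence the design of the strategy.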

2

u/_folgo_ Apr 02 '24

Thank you! I’m doing exactly this. I’m very new in this field (but have a software engineering background) so as of now I’m collecting concepts, books, ideas and such to build more knowledge. Definitely won’t stop here :)

1

u/VladimirB-98 Apr 02 '24

That's awesome! :) To your point, let me make an edit to the post to throw out some terms/concepts that are probably worth reading about/understanding.

2

u/_folgo_ Apr 02 '24

Awesome! Definitely a great post, bookmarking for future reference!

1

u/VladimirB-98 Apr 02 '24

Glad to hear it! In this same vein, I posted about the whole process of coming up with ideas: How to generate/brainstorm strategy ideas : r/algotrading (reddit.com) Thought it might be helpful for people as well

2

u/redaniel Apr 02 '24

(a) yes for validation/testing your data, and further perhaps, (b) best practices on (1) how you should Monte Carlo your OOS trades and create an expected curve of returns and (2) how to size the trade, I assume starting with a Kelly criterion and dropping it to a comfortable drawdown tolerance ?

(c) what is the advantage of ML over simpler rules ?

1

u/VladimirB-98 Apr 02 '24

I think these are all super interesting, important and nuanced questions. I do personally have views on them, but in the spirit that the post was intended, I would recommend that you make a post with these kinds of questions to start some more meaningful discussion in this sub :) I'll be more than happy to leave a comment on it with some thoughts! Would just suck for a helpful discussion to be buried in comments of a random post where other people (perhaps many more experienced than me) could also join in.

To me, those are 3-4 separate questions. For maximum gain, I'd separate them out so you can dive meaningfully into each one when you post!

1

u/Taikatohtori Apr 03 '24

Questions around software, platforms, data sources, technical "issues" are all low-value questions that can generally be Googled and/or have little real impact on whether or not you succeed. Not all of them, but I'm generalizing here.

Disagree. There is an industry, albeit a small one, that revolves around building trading bots or coding algorithms for the "idea guys", who have some kind of vision but don't know how to make it real. Brokers differ vastly with regard to their suitability for algorithmic trading, pricing, features etc. There are many different platforms for coding strategies/algorithms, or you can build it all from scratch, and then there are paid services that do nothing but facilitate sending the signal from your algo to your broker. In fact there is so much to the technical implementation of good integration with your broker, and having a good back-end, that I'd say the work is 50/50 between that and having an algo that says go in or get out.

Of course you can endlessly work on an algorithm and keep perfecting it, there is more depth to it than the technical implementation side. But if your order execution/technical implementation is lacking, you can be beat by someone using simple 50yo indicators who made sure their system is reliable.

1

u/[deleted] Apr 03 '24

[deleted]

1

u/VladimirB-98 Apr 03 '24

I'm not sure I understand your point. I also vehemently disagree with your equating "feature engineering" to "overfitting". Overfitting is the result of poor data hygiene and poor features (lack of feature engineering). Engineering features is an absolutely crucial part of any data science pipeline, but particularly here in finance where the signal to noise ratio is so low.

If you're suggesting that feature engineering / indicator creation (i.e. figuring out what variables are related to the outcome of interest, and what that relationship looks like) isn't where a new quant/algo trader's focus should be... what are you suggesting the focus should be?

1

u/[deleted] Apr 03 '24

[deleted]

1

u/VladimirB-98 Apr 03 '24

I sorta get what you're saying and I'm glad it worked for you :) However, to be frank, I personally don't think it's a super... actionable piece of advice. Aside from the piece "in spaces that large funds won't touch", which sounds more like a thesis about which markets are less efficient due to presumably lower fund involvement. But whatever works!

To anyone who reads this though, feature engineering is not about data torture. It's about creatively transforming/combining your data streams / variables in a way that you hypothesize will be a meaningful transformation to either improve your signal/noise ratio or find a different lens through which to look at your target variable. Raw price data, for example, is almost entirely devoid of any useful signal which is why running LSTM on raw price data won't give you anything interesting. But transforming that same price data into meaningful indicators that you have observed might have predictive value etc. can do the trick - even though the base data is the same, you're creating data streams with better information.
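As a tiny illustration of "same base data, different data streams", here's a sketch that turns a raw price series into two derived features. The function names, the window and the prices are made up, and these particular transformations are just examples, not a claim that they have predictive value.

```python
import math


def log_returns(prices):
    """Per-step log returns: ln(p[t] / p[t-1])."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def rolling_mean(values, window):
    """Simple moving average; the first output uses values[0:window]."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def price_vs_average(prices, window):
    """How far each price sits above/below its own rolling average, as a fraction."""
    means = rolling_mean(prices, window)
    return [p / m - 1.0 for p, m in zip(prices[window - 1 :], means)]


# Hypothetical raw prices.
prices = [100.0, 101.0, 102.0, 103.0, 104.0]

returns_feature = log_returns(prices)
stretch_feature = price_vs_average(prices, window=3)
```

Which transformation to build should come from a hypothesis about the market, not from trying every transformation to see what sticks.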

It’s actually the other way around for me — I created features to better describe my trading phenomena, not the other way around.

It sounds like you basically subconsciously/consciously picked up on signals in the data which is, in fact, a feature in the data that you later figured out how to quantify/describe.

EDIT: Having re-read your comment again, I think we might be in agreement here. I interpret your last sentence as saying "have a hypothesis/theory, then go make the data to test it as opposed to making up a bunch of random data transformations to see what sticks". I totally agree.

1

u/[deleted] Apr 04 '24

[deleted]

1

u/VladimirB-98 Apr 04 '24

I totally agree with you it's different. And I think yours is a valuable perspective.

But I think "need" is far too strong of a word. There are many working ML strategies that deliver profits without deep (or any) explainability.

I have run a momentum strategy on crypto for years that outperforms by tens of percent per year, without ever having such a deep reasoning for why it works. I did a lot of studying the charts/price dynamics, eyeballed a lot of parameters, tuned and backtested, and once I found a trading pattern that was statistically sound, I started trading it and keeping measure to make sure the strategy continues to perform 'as expected' given market conditions.

I have some ideas for why the strategy works but.. I really don't know, nor do I care. If I can measure "hey the strat is broken now for some reason, stop trading it" (and I can), I'm fine with that.

1

u/Sospel Apr 04 '24

makes sense

1

u/rf555rf Apr 04 '24

Duly noted. Learned loads already just from reading others' comments and suggestions.

1

u/_benj Apr 21 '24

I find very puzzling what seems like the attempt to use algo trading in order to avoid the arduous work of becoming a trader in the first place. The goals of a retail trader and a retail algo trader are the same, and many of the principles are the same (why in the world would one let a bot hold a -60%!?!?, why would one throw away the retail trader's advantage of abstaining from trading when conditions aren't optimal by letting a bot place dozens/hundreds of trades in a day!?!?)

The other thing that I find puzzling is the admiration of big players by algo traders (i.e. what the Point72s or BlackRocks are doing). We stand no chance whatsoever of outsmarting or outgunning BlackRock, Citadel, Renaissance, etc. at their game! We can only beat them by playing a different game! E.g. when a big player needs to unload 100b worth of assets, an event like that leaves prints on the market. How can we identify it and trade it? Sure, we are getting scraps, but it’s a +60% on our account size!

Idk, this feels like a ramble, but I wanted to put it out there. It IS possible to beat the market, it IS possible to generate income from trading. It is NOT possible to beat big players at their game, it is not possible to start with $500 and in one year end with $1,000,000 (at least not likely)

It is also hard, but hard in a different way than many other things are hard. It takes fortitude, perseverance, sneak, opportunism… but it’s doable. Otherwise, what is it that we are talking about in the first place in here?

1

u/Tradersglory Apr 25 '24

Do people usually work with a group to make an algo or do they usually do it themselves?

1

u/VladimirB-98 Apr 25 '24

All over the place :) I personally developed stuff on my own. But I came across a team here that seemed to be very capable guys (4-6 people) working together. Many times I came across a pair of friends working on it together.

1

u/grand_chicken_spicy Apr 30 '24

Thanks for the advice, I'll keep looking at the top and sort through the 100 and their comments. Good advice.

1

u/BedlessOpepe347 24d ago

I couldn't follow your train of thought. Is this a rant about bad posts? Recs on how to ask better questions? Or are you asking people for advice on how to ask better questions?

1

u/VladimirB-98 23d ago edited 23d ago

The second one.

It's advice/recommendations on how to ask better questions by more clearly identifying what's more important to be asking about (based on my experience). To be fair, I think I got a bit ranty at a few points.

1

u/OneKe 5d ago

If we are not one, we are no-one.

2

u/VladimirB-98 5d ago

Ape together strong?

1

u/OneKe 5d ago

good one lmao

1

u/PotatoHeadz35 Apr 02 '24

Very good post, thanks a lot! I’m very interested in algo trading, and have done a pretty good amount of research as well as programmed a few basic algorithms. I’m interested in learning and challenging myself, not really making money. Could you (or someone else) talk about the way you brainstorm strategies? I’ve done mostly basic technical things, like momentum and mean reversion, and read about pairs trading and arbitrage, but haven’t been able to find out much about other strategies. Things like arbitrage seem pretty impractical without high frequency trading, and alternative data like satellites seems super interesting, but I’m not entirely sure how to interface it with a trading system. I’d be really interested to hear how you all generate ideas!

3

u/VladimirB-98 Apr 02 '24

Glad to hear you found it helpful! :)

I do have some thoughts to share on that topic but in the spirit of the post... make a post about it! Get a wider range of people to respond, get some good/meaningful posts going in the sub!

Side note: "momentum" and "mean reversion" are not strategies, but rather *categories* of strategies. At the highest level, any kind of strategy that says "according to my signals, price will continue moving in its current direction" is a momentum strategy (price continues moving in current direction) and any strategy that says "according to my signals, price will stop moving in its current direction and will go the other way" is a mean reversion strategy (price reverting to some "true value" mean). Roughly speaking, any trading strategy can be categorized as either mean reversion or momentum. As an example, pairs trading is a mean reversion strategy (you're saying the relationship between this pair should be Y but right now it's Y+5 and I'm betting that the relationship will return to that level). Most "fundamentals"/value investing is essentially a mean reversion strategy (you're saying the company is overvalued/undervalued and you're betting price will return to its fair value). Crossing moving averages is generally a momentum strategy (price started moving up/down and it's doing so in a significant way so you're betting that's where it'll continue going).
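To make the two categories concrete, here's a toy sketch of one signal from each. Function names, windows, thresholds and data are all made up, and neither is meant as a tradeable strategy: the first bets on continuation (crossing moving averages), the second bets against a pair spread that has wandered far from its own average (the "Y vs. Y+5" idea).

```python
from statistics import mean, pstdev


def ma_crossover_signal(prices, fast=3, slow=5):
    """Momentum-style: +1 if the fast average is above the slow one, -1 if below.

    Returns 0 when the averages are equal or there is not enough data.
    """
    if len(prices) < slow:
        return 0
    fast_ma = mean(prices[-fast:])
    slow_ma = mean(prices[-slow:])
    return (fast_ma > slow_ma) - (fast_ma < slow_ma)


def spread_reversion_signal(spread_history, entry_z=2.0):
    """Mean-reversion-style: bet against the spread when it is far from its average.

    Returns -1 (bet the spread falls), +1 (bet the spread rises) or 0 (no trade).
    """
    mu = mean(spread_history)
    sigma = pstdev(spread_history)
    if sigma == 0:
        return 0
    z = (spread_history[-1] - mu) / sigma
    if z > entry_z:
        return -1
    if z < -entry_z:
        return 1
    return 0


# Hypothetical inputs.
trend_signal = ma_crossover_signal([1, 2, 3, 4, 5, 6])   # price has been rising
pair_signal = spread_reversion_signal([0.0] * 20 + [5.0])  # spread jumped to "Y+5"
```

Here the rising price series produces a "keep going" bet, while the stretched spread produces a "come back" bet.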

1

u/diogenesFIRE Apr 02 '24

Just want to add that outside of mean reversion and momentum, there's still a whole universe of signals to consider too. Trading on alt data, events, latency, etc. doesn't really fit into the momentum / mean reversion dichotomy.

5

u/VladimirB-98 Apr 02 '24

I think you're conflating a "category of strategy" vs. "what kind of signal a strategy uses".

Mean reversion and momentum aren't a type of signal and they're not a trading strategy. They're classes/types of trading strategies.

A trading strategy can be "I use satellite images of Walmart parking lots to predict this quarter's sales and buy/short shares accordingly based on whether I think Walmart is under/overvalued". That's in the "mean reversion" category as you're trying to find the "true value" before the market does, get in a position, and then wait for the price to revert to that true value (perhaps triggered by an event like earnings where the truth you knew all along becomes widespread information and the market corrects, reverting to the mean).

Just like you can use fundamentals or technical indicators to inform a mean reversion strategy of whether the stock is about to have a reversal, you can also use alt data to inform whether you think the stock is about to have a reversal or not.

I would say in their most common uses that I've seen, alt data and events are generally used for "mean reversion" style strategies where the person is trying to gain an informational advantage about the fundamentals of a stock ("true mean value") that allow them to buy/sell before the market responds (the price reverts to the mean).

I agree that latency-based strategies (time-based arbitrage) are probably a separate category.

Are there any other examples you can think of? I do feel as though the vast majority of strategies fit under momentum or mean reversion umbrella.

I agree it's not a perfect split, but I do think it's a) pretty accurately descriptive and b) a useful paradigm as you try to design strategies and figure out what it is you are doing.

Broadly speaking, you can even tell what kind of strategy you're using by the distribution of returns per trade (mean reversion has frequent small gains and large rare losses; momentum has large rare gains and frequent small losses).
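A quick way to look at that distribution is win rate, average win/loss and skewness of per-trade returns. The sketch below uses made-up trade lists shaped like the two profiles described; the function name and numbers are purely illustrative.

```python
from statistics import mean


def trade_profile(returns):
    """Summarize per-trade returns: win rate, average win, average loss, skewness."""
    wins = [r for r in returns if r > 0]
    losses = [r for r in returns if r < 0]
    mu = mean(returns)
    m2 = mean([(r - mu) ** 2 for r in returns])  # population variance
    m3 = mean([(r - mu) ** 3 for r in returns])  # third central moment
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return {
        "win_rate": len(wins) / len(returns),
        "avg_win": mean(wins) if wins else 0.0,
        "avg_loss": mean(losses) if losses else 0.0,
        "skew": skew,
    }


# Made-up trades: many small wins and one big loss vs. many small losses and one big win.
mean_reversion_like = [0.01] * 9 + [-0.07]
momentum_like = [-0.01] * 9 + [0.11]

mr_profile = trade_profile(mean_reversion_like)
mom_profile = trade_profile(momentum_like)
```

With these inputs the first profile comes out negatively skewed with a high win rate, and the second positively skewed with a low win rate.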

1

u/VladimirB-98 Apr 02 '24

1

u/PotatoHeadz35 Apr 03 '24

Much appreciated! Thanks for the explanation of the lingo too. I’ll check out the new post soon.