What about the context window? I mean, how will you handle 10,000 posts?
You have to write a batch system that splits the 10,000 posts into batches of roughly 10-100 items each. APIs such as OpenAI's support batch calls.
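For what it's worth, here is a minimal sketch of the splitting step in Python. The function name `split_into_batches`, the batch size of 50, and the placeholder `posts` list are all made up for illustration; sending each batch to an LLM API is left out, since that depends on the provider.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(items: Sequence[T], batch_size: int = 50) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Placeholder data standing in for the scraped posts (hypothetical).
posts = [f"post {i}" for i in range(10_000)]
batches = list(split_into_batches(posts, batch_size=50))
```

Each batch can then go into its own prompt, so no single request has to fit all 10,000 posts into the context window.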
I worked on a similar project about 9 months ago and encountered challenges with the Instaloader package due to Instagram's rate limits. My script got detected after roughly 200 requests. Have you found a workaround for this issue? I'd be interested in hearing about any solutions you've discovered.
Have you tried using a proxy? Or sleeping after 100 requests?
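A rough sketch of the "sleep after 100 requests" idea, in case it helps. Everything here is hypothetical: `throttled_map`, `pause_every`, and `pause_seconds` are illustrative names, `fetch` stands in for whatever function makes the actual request, and the 60-second pause is an arbitrary guess rather than a known safe value. It does not use any Instaloader API.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled_map(
    fetch: Callable[[T], R],
    items: Iterable[T],
    pause_every: int = 100,
    pause_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[R]:
    """Call `fetch` on each item, pausing after every `pause_every` calls."""
    for count, item in enumerate(items, start=1):
        yield fetch(item)
        if count % pause_every == 0:
            sleep(pause_seconds)


# Demo with a fake fetch and a fake sleep that only records the pauses,
# so the example runs instantly (both are stand-ins, not real requests).
pauses = []
results = list(
    throttled_map(lambda x: x * 2, range(250), pause_every=100, sleep=pauses.append)
)
```

Passing `sleep` as a parameter just makes it easy to test without waiting. Fixed pauses like this are simple, but they may not be enough on their own to avoid detection.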
Yes, I've already used those methods, but it still got detected.
Hmm... Maybe you can find something useful in this article: https://decodingml.substack.com/p/highly-scalable-data-ingestion-architecture?r=1ttoeh