6 Comments
Ravish Mahajan

What about the context window? I mean, how will you handle 10,000 posts?

Paul Iusztin

You have to write a batch system that splits the 10,000 posts into batches of roughly 10-100 items each. APIs such as OpenAI's support batch calls.
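
A minimal sketch of the splitting step, assuming the posts are already loaded into a list. The helper name `split_into_batches` and the default batch size are illustrative, not taken from the article, and the actual LLM call for each batch is left out:

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(items: Sequence[T], batch_size: int = 50) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each holding at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical data: 10,000 placeholder posts.
posts = [f"post {i}" for i in range(10_000)]
batches = list(split_into_batches(posts, batch_size=100))
```

Each batch can then be sent as its own request, so no single prompt has to fit all 10,000 posts into the context window.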

Ghazal Vaughan

I worked on a similar project about 9 months ago and encountered challenges with the Instaloader package due to Instagram's rate limits. My script got detected after roughly 200 requests. Have you found a workaround for this issue? I'd be interested in hearing about any solutions you've discovered.

Paul Iusztin

Have you tried using a proxy? Or sleeping after 100 requests?
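
A rough sketch of the "sleep after 100 requests" idea, written as a generic wrapper rather than against Instaloader's own API. The name `throttled`, the 60-second pause, and the injectable `sleep` argument are all assumptions made for illustration:

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttled(
    items: Iterable[T],
    pause_every: int = 100,
    pause_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield items unchanged, pausing after every `pause_every` items."""
    if pause_every < 1:
        raise ValueError("pause_every must be at least 1")
    for count, item in enumerate(items, start=1):
        yield item
        # The pause runs when the consumer asks for the next item.
        if count % pause_every == 0:
            sleep(pause_seconds)
```

Wrapping the iterable of requests in `throttled(...)` spaces them out, though pausing alone may not be enough to avoid detection.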

Ghazal Vaughan

Yes, I've already used those methods, but it still got detected.

Paul Iusztin

Hmm... Maybe you can find something useful in this article: https://decodingml.substack.com/p/highly-scalable-data-ingestion-architecture?r=1ttoeh
