Instagram Graph API (Official)
Use the Instagram Graph API (requires a Facebook developer account).
You can query:
Profile information (business and creator accounts only).
Media (images, captions, timestamps, and hashtags as they appear in captions).
Engagement (like counts, comment counts, comments).
Limitations: strict rate limits, an access token is required for every request, and only data from business or creator accounts is available.
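As a rough sketch of what a media query could look like, the snippet below builds the request URL for an account's media and pulls hashtags out of a caption. The field names follow the documented IG Media fields as far as I know. The API version string, the account ID, and the token are placeholders, and the helper names `build_media_url` and `extract_hashtags` are made up for this example. Nothing here calls the network; you would pass the URL to an HTTP client of your choice.

```python
import re
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.facebook.com"

# Fields requested from the media edge; trim this to what you actually need.
MEDIA_FIELDS = (
    "id", "caption", "media_type", "permalink",
    "timestamp", "like_count", "comments_count",
)

HASHTAG_RE = re.compile(r"#(\w+)")


def build_media_url(ig_user_id, access_token, api_version="v19.0", limit=25):
    """Build the URL for one page of an account's media.

    api_version is a placeholder (assumption): check which version
    your app is registered for before using it.
    """
    query = urlencode({
        "fields": ",".join(MEDIA_FIELDS),
        "limit": limit,
        "access_token": access_token,
    })
    return f"{GRAPH_BASE}/{api_version}/{ig_user_id}/media?{query}"


def extract_hashtags(caption):
    """Return lowercase hashtags found in a caption (empty list if none)."""
    return [tag.lower() for tag in HASHTAG_RE.findall(caption or "")]


# Hypothetical ID and token, for illustration only.
url = build_media_url("12345", "TOKEN")
tags = extract_hashtags("Sunset #Travel #nofilter")
```

Hashtags are parsed from the caption text here because, to my knowledge, the media object does not expose them as a separate field.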
Third-Party Datasets (Already Collected)
Kaggle hosts Instagram-related datasets that you can download directly, for example the Instagram User Analytics datasets.
These datasets often include captions, hashtags, and engagement metrics, but not the raw media files.
Advantage: instantly available, and lower legal risk than scraping, provided you check each dataset's license.
Limitation: the data may be out of date.
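Once you have downloaded one of these datasets as a CSV, a first pass might look like the sketch below, which counts hashtags and averages likes using only the standard library. The column names `caption` and `likes` are assumptions, since every dataset names its columns differently, and `summarize_posts` is a made-up helper name.

```python
import csv
import io
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#(\w+)")


def summarize_posts(csv_file, caption_col="caption", likes_col="likes"):
    """Count hashtags and compute mean likes over the rows of a CSV file.

    caption_col and likes_col are assumed column names; adjust them to
    match the dataset you downloaded.
    """
    hashtag_counts = Counter()
    total_likes = 0
    rows = 0
    for row in csv.DictReader(csv_file):
        caption = row.get(caption_col) or ""
        hashtag_counts.update(t.lower() for t in HASHTAG_RE.findall(caption))
        try:
            total_likes += int(row.get(likes_col) or 0)
        except ValueError:
            pass  # a non-numeric likes value contributes 0 to the total
        rows += 1
    mean_likes = total_likes / rows if rows else 0.0
    return hashtag_counts, mean_likes


# Tiny in-memory sample standing in for a downloaded file.
sample = io.StringIO(
    'caption,likes\n'
    '"Beach day #travel #sun",120\n'
    '"More #Travel pics",80\n'
)
counts, mean_likes = summarize_posts(sample)
print(counts.most_common(1), mean_likes)
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object in place of `sample`.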
Hashtag/Keyword Focused Data (via APIs)
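CrowdTangle (Meta-owned, formerly free for researchers) was the usual tool for tracking public content, but Meta shut it down in August 2024. Its replacement for researchers is the Meta Content Library and API, which requires an application.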
Useful if you want to study trends, hashtags, or engagement at scale.
Image Datasets with Instagram-like Data
If you want Instagram-style images (for computer vision or ML), there are public datasets:
InstaCities1M: about 1M Instagram images associated with major cities.
InstaFood: food-related Instagram dataset.
InstaFashion: fashion dataset from Instagram.
Many of these are published by universities and available on GitHub or research repos.
Recommended Approach
If you want large datasets for training AI/ML, I suggest:
Start with Kaggle and public research datasets.
If you need fresh Instagram data, register for Instagram Graph API access.
If your project is academic, check whether you can apply for researcher access from Meta. CrowdTangle used to fill this role but was shut down in August 2024; the Meta Content Library is its replacement.