Image Clustering
ImageAtlas is a comprehensive Python toolkit for image dataset curation, duplicate detection, quality control, and exploratory data analysis, built on state-of-the-art ML models.
Everything you need to organize and analyze your image datasets
Automatically group similar images using KMeans, HDBSCAN, or GMM algorithms with state-of-the-art feature extractors.
Leverage DINOv2, CLIP, ViT, ResNet, EfficientNet, and ConvNeXt for powerful feature extraction.
Find and remove duplicate or near-duplicate images from your datasets efficiently.
Generate beautiful visual grids of clustered images for easy inspection and analysis.
Visualize high-dimensional features with PCA, UMAP, or t-SNE for exploratory analysis.
Automatically organize images into cluster folders and export results to JSON.
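To make the duplicate-detection idea above concrete, here is a minimal sketch of one common approach: comparing feature vectors by cosine similarity and flagging pairs above a threshold. This is not ImageAtlas's own implementation; the function name find_near_duplicates, the threshold value, and the toy vectors are all hypothetical and chosen only for illustration.

```python
import numpy as np

def find_near_duplicates(features, threshold=0.95):
    """Return index pairs (i, j), i < j, whose cosine similarity is >= threshold.

    Hypothetical helper for illustration; not part of the ImageAtlas API.
    """
    features = np.asarray(features, dtype=float)
    # L2-normalize rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    similarity = unit @ unit.T
    # Visit only pairs above the diagonal to skip self-matches and mirrored pairs.
    rows, cols = np.triu_indices(len(features), k=1)
    mask = similarity[rows, cols] >= threshold
    return [(int(i), int(j)) for i, j in zip(rows[mask], cols[mask])]

# Toy 2-D "embeddings": rows 0 and 1 point in almost the same direction.
toy_features = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
pairs = find_near_duplicates(toy_features, threshold=0.95)
print(pairs)
```

In practice the rows would be embeddings from a feature extractor such as DINOv2, and the threshold would need tuning per dataset. The all-pairs comparison is quadratic in the number of images, so large collections usually call for an approximate nearest-neighbor index instead.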
Get started in seconds with pip
pip install imageatlas
pip install "imageatlas[full]"
If you wish to use the CLIP model, install it manually:
pip install git+https://github.com/openai/CLIP.git
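Because CLIP is installed separately, it can help to confirm that the package is importable before selecting it as a model. The sketch below uses only the Python standard library; the helper name is_package_available is hypothetical, and it assumes the OpenAI CLIP repository installs under the module name clip.

```python
import importlib.util

def is_package_available(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None

# The OpenAI CLIP repository installs a module named "clip".
has_clip = is_package_available("clip")
print("CLIP available:", has_clip)
```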
Cluster your images in just a few lines of code
from imageatlas import ImageClusterer

# Initialize clusterer with state-of-the-art features
clusterer = ImageClusterer(
    model='dinov2',  # DINOv2, CLIP, ViT, ResNet, etc.
    clustering_method='kmeans',
    n_clusters=10,
    device='cuda'  # or 'cpu'
)

# Run clustering on your images
results = clusterer.fit("./path/to/images")

# Save results to JSON
results.to_json("./output/clustering_results.json")

# Create visual grids for each cluster
results.create_grids(
    image_dir="./path/to/images",
    output_dir="./output/grids"
)

# Organize images into cluster folders
results.create_cluster_folders(
    image_dir="./path/to/images",
    output_dir="./output/clusters"
)
That's it! Your images are now clustered, visualized, and organized. 🎉
Explore available models and algorithms
State-of-the-art models: DINOv2, CLIP, ViT, ResNet, EfficientNet, ConvNeXt
Grouping algorithms: KMeans, HDBSCAN, GMM
Parameters: n_clusters, min_cluster_size, min_samples, n_components
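As a rough guide to these parameters: in scikit-learn, n_clusters is the KMeans setting, min_cluster_size and min_samples belong to HDBSCAN, and n_components sets the number of mixture components in a Gaussian mixture. ImageAtlas presumably forwards them in a similar way, but that is an assumption. The sketch below is a minimal NumPy version of k-means (Lloyd's algorithm) showing what n_clusters controls. It is not the library's code, and the kmeans function and toy data are hypothetical.

```python
import numpy as np

def kmeans(features, n_clusters, n_iters=50, seed=0):
    """Minimal Lloyd's algorithm; illustrative only, not the ImageAtlas implementation."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from n_clusters distinct data points chosen at random.
    start = rng.choice(len(features), size=n_clusters, replace=False)
    centroids = features[start]
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        distances = np.linalg.norm(
            features[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its current members.
        new_centroids = centroids.copy()
        for k in range(n_clusters):
            members = features[labels == k]
            if len(members) > 0:
                new_centroids[k] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated toy blobs standing in for image feature vectors.
toy = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
       [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
labels, centroids = kmeans(toy, n_clusters=2)
print(labels)
```

Unlike KMeans and GMM, HDBSCAN does not take a fixed number of groups. It infers the number of clusters from density and can label outliers as noise, which is why it exposes min_cluster_size and min_samples rather than n_clusters.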
Visualization methods: PCA, UMAP, t-SNE
Parameters: n_components, n_neighbors, min_dist, perplexity
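For orientation, n_components is the number of output dimensions, n_neighbors and min_dist are the usual UMAP settings, and perplexity is the usual t-SNE setting. How ImageAtlas exposes them is not shown here. The sketch below illustrates the simplest of the three methods, PCA, using a plain NumPy SVD. The pca_project function and the random toy features are hypothetical stand-ins, not part of the library.

```python
import numpy as np

def pca_project(features, n_components=2):
    """Project features onto their top principal components via SVD (illustrative)."""
    features = np.asarray(features, dtype=float)
    # PCA operates on mean-centered data.
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    # Fraction of total variance captured by each kept component.
    explained = singular_values**2 / np.sum(singular_values**2)
    return projected, explained[:n_components]

# Random vectors standing in for 16-dimensional image embeddings.
rng = np.random.default_rng(0)
toy_features = rng.normal(size=(100, 16))
points_2d, variance_ratio = pca_project(toy_features, n_components=2)
print(points_2d.shape)
```

The resulting 2-D points can be drawn as a scatter plot colored by cluster label. PCA is linear and fast, while UMAP and t-SNE are nonlinear and often separate clusters more visibly at a higher computational cost.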
If you use ImageAtlas in your research, please cite it:
@software{imageatlas,
  author = {Ahmad Javed},
  title = {ImageAtlas: A Toolkit for Organizing, Cleaning and Analyzing Image Datasets},
  year = {2024},
  url = {https://github.com/ahmadjaved97/ImageAtlas}
}