Clustering using GPU nearest neighbours

Other

$60/hr Starting at $60

Engineered a GPU-accelerated review-clustering pipeline that generates Azure OpenAI embeddings and retrieves nearest neighbors in PostgreSQL using pgvector’s cosine-distance operator for fast similarity ranking.
Coupled UMAP for non-linear dimensionality reduction with HDBSCAN to discover dense semantic clusters without pre-specifying k, in line with the UMAP documentation's guidance on clustering UMAP embeddings with HDBSCAN.
Orchestrated a 72-variant hyperparameter sweep (n_neighbors × min_dist × n_components × min_cluster_size × min_samples) and parallelized evaluation across 8 processes for efficient model selection.
Designed an auto-selection score that Min-Max normalizes and weights the Davies–Bouldin, Silhouette, and Calinski–Harabasz indices, with guardrails on valid cluster counts, to pick the best iteration.
Persisted per-iteration labels, metrics, and configs via SQLAlchemy; materialized the final reviews→cluster mapping in Postgres with safe delete/append semantics for idempotent reruns.
Added structured logging (structlog) and timestamped checkpoints, delivering a reproducible, configurable clustering service for large-scale product-review analysis (Python, RAPIDS cuML, PostgreSQL/pgvector, SQLAlchemy, Azure OpenAI, pandas, scikit-learn).
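The following is a minimal sketch of the pgvector nearest-neighbour lookup described above. It assumes a hypothetical review_embeddings table with review_id and embedding (vector) columns; those names are illustrative and not taken from the project. pgvector's <=> operator returns cosine distance (1 minus cosine similarity), so ordering by it ascending ranks the most similar reviews first. The pure-Python cosine_distance function reproduces that calculation so the ranking logic can be checked without a database.

```python
import math

# Hypothetical table/column names; <=> is pgvector's cosine-distance operator.
KNN_SQL = """
SELECT review_id, embedding <=> %(query)s::vector AS cosine_distance
FROM review_embeddings
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s
"""


def to_pgvector_literal(vec):
    """Format a Python sequence in pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def cosine_distance(a, b):
    """1 - cosine similarity: the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_neighbours(query, rows, k):
    """Rank (review_id, embedding) pairs by ascending cosine distance; keep the top k."""
    scored = [(review_id, cosine_distance(query, emb)) for review_id, emb in rows]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]
```

With a DB-API driver that supports named placeholders, such as psycopg, the query could be run as cur.execute(KNN_SQL, {"query": to_pgvector_literal(q), "k": 10}). In that case the database does the ranking and nearest_neighbours serves only as a local reference.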
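The 72-variant sweep can be sketched as a Cartesian product over the five hyperparameters, with each configuration evaluated in a pool of 8 worker processes. The source states only the parameter names and the total count. The value lists below are hypothetical, chosen so that 2 × 2 × 2 × 3 × 3 = 72. The evaluate function, which would fit UMAP and HDBSCAN and return metrics, is assumed and not shown.

```python
import itertools
from concurrent.futures import ProcessPoolExecutor

# Hypothetical value lists; only the parameter names and the 72-variant total
# come from the description.
PARAM_GRID = {
    "n_neighbors": [15, 30],           # UMAP
    "min_dist": [0.0, 0.1],            # UMAP
    "n_components": [5, 10],           # UMAP
    "min_cluster_size": [10, 25, 50],  # HDBSCAN
    "min_samples": [1, 5, 10],         # HDBSCAN
}


def build_grid(param_grid):
    """Expand a dict of value lists into one config dict per combination."""
    keys = list(param_grid)
    return [dict(zip(keys, values)) for values in itertools.product(*param_grid.values())]


def run_sweep(evaluate, configs, workers=8):
    """Evaluate every config in parallel worker processes.

    `evaluate` (hypothetical) must be a picklable top-level function that takes
    a config dict and returns its metrics; results keep the order of `configs`.
    """
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(evaluate, configs))
```

On platforms where multiprocessing uses the spawn or forkserver start method, call run_sweep from under an if __name__ == "__main__": guard so that worker processes do not re-run the sweep on import.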
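The auto-selection score could look roughly like the sketch below. It assumes the three indices were already computed per iteration, for example with scikit-learn's davies_bouldin_score, silhouette_score and calinski_harabasz_score. Iterations outside a cluster-count range are discarded first. Each index is then Min-Max normalized across the remaining iterations. Davies–Bouldin is lower-is-better, so its normalized value is inverted before weighting. The weights (0.4, 0.4, 0.2) and the 2 to 50 cluster bounds are hypothetical and not the project's actual settings.

```python
from dataclasses import dataclass


@dataclass
class IterationMetrics:
    iteration: int
    n_clusters: int           # clusters found, excluding HDBSCAN noise (-1)
    davies_bouldin: float     # lower is better
    silhouette: float         # higher is better, in [-1, 1]
    calinski_harabasz: float  # higher is better


def min_max(values):
    """Scale values to [0, 1]; a constant column carries no signal, so map it to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def select_best(candidates, weights=(0.4, 0.4, 0.2), min_clusters=2, max_clusters=50):
    """Return (best_iteration, score) among iterations with a valid cluster count."""
    valid = [c for c in candidates if min_clusters <= c.n_clusters <= max_clusters]
    if not valid:
        raise ValueError("no iteration has a valid cluster count")
    db = min_max([c.davies_bouldin for c in valid])
    sil = min_max([c.silhouette for c in valid])
    ch = min_max([c.calinski_harabasz for c in valid])
    w_db, w_sil, w_ch = weights
    # Invert Davies-Bouldin so that a higher combined score is always better.
    scores = [w_db * (1.0 - d) + w_sil * s + w_ch * h for d, s, h in zip(db, sil, ch)]
    best_idx = max(range(len(valid)), key=scores.__getitem__)
    return valid[best_idx], scores[best_idx]
```

Normalizing only over the iterations that pass the guardrail keeps a degenerate run from stretching the Min-Max range. A single-cluster run with a deceptively good Silhouette is one example.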
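The "safe delete/append" step can be illustrated with the standard-library sqlite3 module standing in for PostgreSQL and SQLAlchemy. Inside one transaction, the previous rows for a run are deleted and the new reviews→cluster mapping is appended, so rerunning the job never duplicates rows. The review_clusters table and its run_id key are hypothetical. The -1 cluster value mirrors HDBSCAN's noise label.

```python
import sqlite3


def create_table(conn):
    """Create the hypothetical mapping table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS review_clusters ("
        " run_id TEXT NOT NULL,"
        " review_id INTEGER NOT NULL,"
        " cluster_id INTEGER NOT NULL,"
        " PRIMARY KEY (run_id, review_id))"
    )


def replace_run_mapping(conn, run_id, mapping):
    """Delete any earlier rows for this run, then append the new mapping.

    Both statements share one transaction, so a rerun either fully replaces
    the run's rows or, on error, leaves the previous rows untouched.
    """
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("DELETE FROM review_clusters WHERE run_id = ?", (run_id,))
        conn.executemany(
            "INSERT INTO review_clusters (run_id, review_id, cluster_id) VALUES (?, ?, ?)",
            [(run_id, review_id, cluster_id) for review_id, cluster_id in mapping.items()],
        )
```

With SQLAlchemy on PostgreSQL, the same delete-then-append pair would typically run inside engine.begin(), which likewise commits on success and rolls back on error.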

Data engineer

$60/hr Ongoing

Data engineer

Azure OpenAI, Cluster Analysis, Cluster Management, Clustering

Nanyo says,

Helped us resolve our issue

for AZURE AKS K10 ingres application gateway on Jul 08, 2024
Dan 54 says,

Thanks for all the help on this. Like I said I might be reaching out for more on this in the future.

for Amazon api gateway expert on Feb 04, 2023
Dan 54 says,

We will be working together in the future.

for Amazon api gateway expert on Jan 30, 2023
LSE 1 says,

Great working with Chris! Highly recommended!

for Cloud Engineer for Blockchain Analytics on Jul 23, 2021
Steve_Garelick says,

Great Job and Very fast.

for Install OSTicket on Server on Jun 06, 2021
