GANESH MORYE

Professional Summary

Experienced data scientist with over a decade of expertise guiding global companies in the pursuit of their energy sustainability goals. Adept at handling complex, cross-functional datasets, extracting crucial insights, and translating data into measurable business benefits. Proven track record of using data-driven decision-making and predictive modeling to deliver significant, multi-million-dollar projects. Reliable team player with excellent communication and analytical skills and a passion for innovation.

Skills

Languages: Python, PySpark, SQL, Perl, VBA.

Machine Learning: Regression and Classification Models, Supervised and Unsupervised Learning, Natural Language Processing (NLP), Ensemble Models, Deep Learning (Neural Networks), Image Processing.

Data Visualizations: Matplotlib, Seaborn, Plotly.

Python Libraries: Pandas, NumPy, Scikit-Learn, NLTK, SciPy, OpenCV, TensorFlow, GeoPandas, spaCy.

Work Experience

GENERAL ASSEMBLY        Remote

Data Scientist        2021

Text Classification Model to Optimize Ad Campaign Targeting

  • Used the Pushshift API to scrape specific subreddits and performed EDA with spaCy to extract named entities and identify tags, parts of speech, sentiments, and tokens for optimizing ad campaign data.
  • Implemented a pipeline to train and tune hyperparameters of multiple classifier models, achieving an accuracy score of 0.85 and ROC AUC of 0.92, a substantial improvement over the baseline accuracy of 0.41.

Topic Modeling to Contextualize Search Algorithm Results

  • Built an unsupervised topic model using LDA (Gensim) and GSDMM to improve the user experience on the Twitter platform and drive user monetization.
  • Trained and tuned the model on over half a million Twitter posts. Evaluated the model results for coherence and investigated the output using pyLDAvis. Identified over 50 different topic clusters.

Image Classification Model to Detect Driver Distraction

  • Developed and trained a custom CNN and pre-trained VGG16, VGG19, and EfficientNetB0 models on Google Colab using 102.1k driver images to improve classification accuracy.
  • Enhanced the models with image augmentation techniques, resulting in significant performance improvement with a log loss of 0.73 and accuracy of 0.7.

PETROTEL        Plano, TX

Data Scientist and Senior Simulation Engineer, Simulation Group        2007 – 2021

  • Spearheaded multiple multi-million-dollar projects for national and international oil companies, streamlining CAPEX and OPEX decisions through customized modeling pipelines.
  • Served as the technical point of contact, resolving data quality and consistency issues promptly by working directly with data owners.
  • Built an efficient data visualization and analysis application using Python and VBA, saving over 500 man-hours and streamlining client presentations.
  • Optimized model performance by leveraging data from 4 domains and cross-validating it with analog data, resulting in a $100MM investment approval.
  • Conducted a rigorous analysis of 2 decades of time-series data with 200 features, identifying additional revenue opportunities of 8-10%.
  • Evaluated production economics and conducted risk assessments for new field developments and acquisition opportunities, providing valuable insights for investment decisions.

Education

M.S. in Petroleum Engineering | University of Alaska Fairbanks        2007

B.S. in Chemical Engineering | Institute of Chemical Technology (UDCT)        2005