Professional Summary
Data scientist with over a decade of experience advising global companies in pursuit of their energy sustainability goals. Adept at handling complex, cross-functional datasets, extracting key insights, and translating data into measurable business value. Proven track record of applying data-driven decision-making and predictive modeling to deliver multi-million-dollar projects. Reliable team player with excellent communication and analytical skills and a passion for innovation.
Skills
Languages: Python, PySpark, SQL, Perl, VBA.
Machine Learning: Regression and Classification Models, Supervised and Unsupervised Learning, Natural Language Processing (NLP), Ensemble Models, Deep Learning (Neural Networks), Image Processing.
Data Visualization: Matplotlib, Seaborn, Plotly.
Python Libraries: Pandas, NumPy, scikit-learn, NLTK, SciPy, OpenCV, TensorFlow, GeoPandas, spaCy.
Work Experience
GENERAL ASSEMBLY Remote
Data Scientist 2021
Text Classification Model to Optimize Ad Campaign Targeting
- Used the Pushshift API to scrape targeted subreddits and performed EDA with spaCy, extracting tokens, part-of-speech tags, named entities, and sentiment to optimize ad campaign targeting.
- Built a pipeline to train and tune the hyperparameters of multiple classifiers, achieving an accuracy of 0.85 and a ROC AUC of 0.92, up from a baseline accuracy of 0.41.
Topic Modeling to Contextualize Search Algorithm Results
- Built unsupervised topic models using LDA (Gensim) and GSDMM to improve the user experience on Twitter and drive user monetization.
- Trained and tuned the models on over half a million Twitter posts, evaluated topic coherence, and explored the output with pyLDAvis, identifying more than 50 distinct topic clusters.
Image Classification Model to Detect Driver Distraction
- Developed and trained a custom CNN and pre-trained VGG16, VGG19, and EfficientNetB0 models in Google Colab on 102.1k driver images to improve classification accuracy.
- Applied image augmentation to improve model performance, reaching a log loss of 0.73 and an accuracy of 0.70.
PETROTEL Plano, TX
Data Scientist and Senior Simulation Engineer, Simulation Group 2007 – 2021
- Spearheaded multiple multi-million-dollar projects for national and international oil companies, streamlining CAPEX and OPEX decisions through customized modeling pipelines.
- Served as the technical point of contact, working directly with data owners to resolve data quality and consistency issues promptly.
- Built a data visualization and analysis application using Python and VBA, saving 500+ man-hours and streamlining client presentations.
- Improved model performance by integrating data from 4 domains and cross-validating results against analog data, resulting in approval of a $100MM USD investment.
- Conducted a rigorous analysis of two decades of time-series data across 200 features, identifying additional revenue opportunities of 8–10%.
- Evaluated production economics and conducted risk assessments for new field developments and acquisition opportunities, providing valuable insights for investment decisions.
Education
M.S. in Petroleum Engineering | University of Alaska Fairbanks 2007
B.S. in Chemical Engineering | Institute of Chemical Technology (UDCT) 2005