2024
Aligning an LLM with Direct Preference Optimization
- Notebook example of fine-tuning a 1.6B parameter model from Stability AI
- Uses a dataset of human preference data released by Anthropic
- Trains a new model that generates text more aligned with what humans prefer, rather than just modeling language based on what it saw during the unsupervised portion of training.
- Web application for summarizing PDF documents.
- Generates summaries for documents after upload.
- Documents stored in S3, metadata and summaries stored in DynamoDB.
- Pages for viewing existing documents and uploading new documents.
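
As a rough illustration of the Direct Preference Optimization entry above, the sketch below computes the DPO loss for a single preference pair in plain Python. It is not the notebook's code. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for this example. The inputs are summed log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses under the model being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of responses.

    All names and the default beta are illustrative assumptions.
    """
    # How much more (in log space) the policy likes each response
    # than the frozen reference model does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin: positive when the policy favors the
    # chosen response more strongly than the reference does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and reference agree, the margin is zero and the loss equals `log(2)`. The loss falls as the policy shifts probability toward the chosen response relative to the reference.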
2023
- Writing custom PyTorch Dataset classes to handle data during training.
- Separating data processing code from training code, improving readability.
- Enabling training on datasets that don’t fit into memory.
- Utilizing transfer learning with a vision transformer model (vit_b_16).
- GitHub repo containing two examples can be viewed here.
- Titanic dataset example
- Food-101 dataset example
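
A minimal sketch of the idea behind the custom Dataset entry above: a map-style dataset that indexes a CSV file once and reads individual rows on demand, so the full file never has to sit in memory. This is not the code from the repo. The class name `CsvRowDataset` and its parameters are hypothetical, and the sketch assumes a simple CSV with no line breaks inside quoted fields. It falls back to a plain Python class when PyTorch is not installed, purely so the example runs anywhere.

```python
import csv

try:
    from torch.utils.data import Dataset  # real base class if PyTorch is available
except ImportError:
    Dataset = object  # fallback so the sketch still runs without PyTorch


class CsvRowDataset(Dataset):
    """Hypothetical map-style dataset that reads one CSV row per lookup."""

    def __init__(self, csv_path, feature_cols, label_col):
        self.csv_path = csv_path
        self.feature_cols = feature_cols
        self.label_col = label_col
        self.offsets = []
        # Scan the file once, storing only the byte offset of each data row.
        with open(csv_path, "rb") as f:
            self.header = next(csv.reader([f.readline().decode()]))
            while True:
                pos = f.tell()
                line = f.readline()
                if not line:
                    break
                if line.strip():
                    self.offsets.append(pos)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, idx):
        # Seek straight to the requested row and parse only that line.
        with open(self.csv_path, "rb") as f:
            f.seek(self.offsets[idx])
            line = f.readline().decode()
        row = dict(zip(self.header, next(csv.reader([line]))))
        features = [float(row[col]) for col in self.feature_cols]
        label = int(row[self.label_col])
        return features, label
```

Because all parsing lives in `__getitem__`, the training loop only sees `(features, label)` pairs, which is the separation of data processing from training code described above. In a real training setup the features would typically be converted to tensors before being returned.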
2021
- Create a sentiment classifier trained on a dataset of restaurant reviews from Yelp.
- Use a compact, pre-trained BERT model presented by Google Research in this paper as a base. This model is much smaller than the original BERT model, allowing it to easily fit in the 1 GB RAM limit of Streamlit Sharing, which I use to host the model in a web app.
- Tokenize the reviews, send them through the BERT model, and retrieve the output vectors in the [CLS] position.
- Train a classifier to take these output vectors and classify them as positive or negative.
- Create a web app that serves the model.
- Create a RESTful API that serves a machine learning model.
- Version 1 allows users to pass inputs as path variables.
- Version 2 allows users to pass inputs as a JSON payload.
- FastAPI automatically generates interactive documentation for the API using Swagger UI and ReDoc.
- GitHub repo for this project can be viewed here.
- Multi-page web application that allows users to classify an image of an object.
- Provides two methods of image input. The user can upload an image or provide a URL to an image online.
- The web app processes the image for input into MobileNet and captions the image with the output prediction.
- GitHub repo for this project can be viewed here.
- Model is trained on 1,000 classes.
- Web application developed with Streamlit to classify Iris flowers.
- User can customize model input values using sliders in the sidebar.
- Visualizes confidence of the model’s prediction with Plotly.
- Visualizes the user input compared to other datapoints on a PCA plot.
- GitHub repo for this project can be viewed here.
- Article showing how I built this application here.
- Article was included in Streamlit’s Weekly Roundup
2020
- Create an automated serverless function with AWS Lambda to scrape and parse CNBC Finance articles daily.
- Use a deployment package with Python libraries not included in the AWS Lambda Python 3.8 environment.
- Save text files in an S3 bucket organized by year, month, and day, making it easily queryable from AWS Athena.
- Web application developed with TensorFlow.js that trains a feed-forward neural network within the client’s browser.
- Enables the client to run live inference on custom inputs.
- Visualizes confidence of the model’s predictions using Plotly.js.
- GitHub repo for the project can be viewed here.
- Web application for visualizing COVID-19 case density across the U.S. and Florida
- Developed with the Python web framework Flask.
- Interactive visualizations created with the Python library Bokeh.
- Retrieves data from Johns Hopkins repository.
- Hosted on a VM on Google Cloud Platform.
- Predict if a patient has Hepatitis, Fibrosis, Cirrhosis, or no disease given lab readings from their blood sample.
- Uses an ensemble of models: dense neural network, gradient boosted decision tree, and random forest.
- Keras, XGBoost, and scikit-learn.
- Plotting OHLC data using candlesticks
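
The liver disease classifier above combines a dense neural network, a gradient boosted decision tree, and a random forest. The source does not say how their outputs are merged, so the sketch below shows one common option, soft voting, as an assumption rather than the project's actual method. The function name `soft_vote`, the class ordering, and the example probabilities are all made up for illustration.

```python
def soft_vote(model_probs, weights=None):
    """Average per-class probabilities from several models (soft voting).

    model_probs: one probability list per model, all in the same class order.
    weights: optional per-model weights; equal weighting if omitted.
    Returns (index of the winning class, averaged probabilities).
    """
    if weights is None:
        weights = [1.0] * len(model_probs)
    total = sum(weights)
    n_classes = len(model_probs[0])

    # Weighted mean of each class probability across the models.
    averaged = [
        sum(w * probs[c] for w, probs in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=averaged.__getitem__)
    return best, averaged


# Assumed class order and made-up probabilities for a single patient.
CLASSES = ["no disease", "hepatitis", "fibrosis", "cirrhosis"]
neural_net = [0.6, 0.2, 0.1, 0.1]
boosted_tree = [0.2, 0.5, 0.2, 0.1]
random_forest = [0.1, 0.6, 0.2, 0.1]

best_index, averaged = soft_vote([neural_net, boosted_tree, random_forest])
predicted_label = CLASSES[best_index]
```

Averaging probabilities lets a confident model outvote two uncertain ones, whereas majority (hard) voting would count each model's top class equally.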
2019
2018
- Predict if a Titanic passenger will survive.
- Data cleaning and preparation.
- Feature engineering.
- Use a neural network developed with TensorFlow.
- Classify an image of a handwritten digit into one of the classes 0-9.
- Utilize Keras in R.
2017
Notes
- Explanations and demos of various machine learning topics and techniques.
- Retrieving and streaming data with financial APIs and websockets
- Portfolio construction and risk analysis
- Options pricing analysis
- Backtesting investment strategies
- Implementing investment strategies with Alpaca
- Streaming live trading data to AWS
Certifications