Data preparation

This folder contains the data transformation pipelines that run before the models are trained. It is the data engineering part of the project.

Tweets Transformer

This project contains a pipeline that reads the Twitter data, filters out the tweets that are not geotagged, and writes the result as Parquet.
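
The filter step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the raw tweets are stored as JSON lines and that a tweet counts as geotagged when its `coordinates` or `place` field is non-null (both columns are assumed to exist in the input schema). The names `is_geotagged`, `transform_tweets`, `input_path`, and `output_path` are hypothetical.

```python
from typing import Any, Mapping


def is_geotagged(tweet: Mapping[str, Any]) -> bool:
    """Return True if the tweet has an exact point or a tagged place.

    Assumption: geotagging is signalled by a non-null `coordinates`
    or `place` field on the tweet object.
    """
    return tweet.get("coordinates") is not None or tweet.get("place") is not None


def transform_tweets(spark, input_path: str, output_path: str) -> None:
    """Read raw tweets (JSON lines), keep geotagged ones, write Parquet.

    `spark` is an existing SparkSession; both paths are hypothetical.
    """
    # Imported lazily so the plain-Python predicate above works without PySpark.
    from pyspark.sql import functions as F

    tweets = spark.read.json(input_path)
    # Same rule as is_geotagged, expressed as a column expression.
    geotagged = tweets.filter(
        F.col("coordinates").isNotNull() | F.col("place").isNotNull()
    )
    geotagged.write.mode("overwrite").parquet(output_path)
```

The plain-Python predicate mirrors the Spark filter so the rule can be checked on individual records without starting a Spark session.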

Amazon Product Reviews Transformer

This project contains two pipelines:

Scripts

This directory contains Python/PySpark scripts that are too simple to warrant a full-fledged project.