Clairvoyant
Big Data Internship

After my freshman year, I spent the summer working at Clairvoyant, a company that specializes in Big Data. This was entirely new territory for me, yet I enjoyed every second of it. I was exposed to tools like Hadoop, MapReduce, and Hive, and learned how to build data pipelines for analytics. My main project was creating a pipeline for one of our clients' credit card data and cleaning that data.


My first assignment was to clean an Excel file full of misformatted data. I was told to use Python despite having little experience with it, but the hard deadlines pushed me to learn quickly on the job, and before long I could write Python scripts that cleaned the sheet with no problem. I then applied these skills outside of work too, scraping data I found interesting from websites and analyzing it. For example, one of my friends resells shoes on a site called StockX, so we scraped the data and built a visualization dashboard to help him decide when to buy and sell.
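The cleanup work above mostly came down to normalizing messy cells before any analysis could happen. As a minimal sketch of that kind of script (not the actual client code, and assuming the Excel sheet is first exported to CSV), a Python pass might look like:

```python
import csv
import io

def clean_rows(rows):
    """Strip stray whitespace from every cell and drop rows that are entirely empty."""
    cleaned = []
    for row in rows:
        stripped = [cell.strip() for cell in row]
        if any(stripped):  # keep the row only if at least one cell has content
            cleaned.append(stripped)
    return cleaned

# Example: a small CSV export with padded cells and a blank row.
raw = io.StringIO("name , price \n Nike Dunk , 120 \n , \n Yeezy 350 , 230 \n")
for row in clean_rows(list(csv.reader(raw))):
    print(row)
```

The same pattern (read, normalize cell by cell, filter out junk rows) scales to most one-off spreadsheet cleanups; libraries like pandas or openpyxl handle `.xlsx` files directly when a CSV export isn't convenient.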


The most important work I did involved Hadoop and MapReduce. Once I was comfortable with the tools and had completed the training, I was given real assignments that involved converting datasets into different formats so they could be ingested into the pipeline. Working with Hadoop was challenging, but it gave me a lot of respect for the sheer power of the tool and how important it is. It exposed me to data warehouses, data lakes, and the growing need for data scientists who can efficiently manage large amounts of data.
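Format conversion for ingestion is often done with a small streaming mapper: records come in on stdin, get reshaped, and go out as tab-separated key/value lines that downstream Hadoop jobs expect. The sketch below is a hypothetical example of that pattern (the real assignments and formats were client-specific), converting CSV records into `key<TAB>value` lines:

```python
import csv
import sys

def convert(lines):
    """Reshape CSV records into tab-separated key/value lines for pipeline ingestion.

    The first column becomes the key; the remaining columns are trimmed and
    joined into a comma-delimited value. Malformed short rows are skipped.
    """
    out = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip rows that can't form a key/value pair
        key = row[0].strip()
        value = ",".join(cell.strip() for cell in row[1:])
        out.append(key + "\t" + value)
    return out

if __name__ == "__main__":
    # Hadoop Streaming-style usage: read records from stdin, emit to stdout.
    for line in convert(sys.stdin):
        print(line)
```

Run under Hadoop Streaming, a script like this acts as the map step, with the framework handling the shuffle and any reduce logic separately.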