Dodge Challenger


When I was young, I liked to guess how much a car driving by was worth. My best friend’s dad owned a Chevrolet car dealership, so I was around cars a lot. I would first note the car’s manufacturer, year, and model, and then give an estimate of what I thought it was worth.

With my love of cars, coupled with the recent task of selling my car, I asked the question: is it possible to build an effective machine learning…

Feature Engineering! As much as we all sigh at the mere utterance of the dreaded phrase, we as data scientists spend the majority of our time engaging in this activity (as shown in fig 1.1). It seems a majority of data scientists find this part of their workload to be the most tedious and least enjoyable.

As tedious and unenjoyable as it can be, it is vital and can lead to a better-performing model. To many data scientists, the inability to quantify its impact (if any) until near the end of the workload adds to the lack of enjoyment.

Part 1 of my “taking advantage of R visualizations” blog series

Photo by Cris DiNoto on Unsplash


In the field of data analytics, data visualization is one of the most important tools when starting new projects. It aids discovery and allows data scientists to read the story of their data. I have recently noticed how much I prefer R over Python for my data visualization needs. Because of this, I have started a small blog series on R visualizations, beginning with this blog on flow charts. You can find the code for this blog on my GitHub.

Flow Charts

Support Vector Machines


Within this blog, I will be presenting support vector machines, illustrating how they work, showing them in action on a dataset, and providing real-life examples of SVM usage.

Want more control of your data visualizations? Are you bored of matplotlib? Do you enjoy the ease and simplicity of seaborn’s syntax? Do you long for more interaction with your birthed visualizations? Try an interactive library such as plotly.

Within this blog, I hope to guide you through the process of creating your first interactive visual. I will compare plotly with seaborn and see what you can and can’t do with each.

What is Seaborn?

Seaborn is a library built on matplotlib that uses tidy, simple syntax to create effective graphs. Generally, it’s one of the fastest ways to produce a graph for…

How data scientists can improve their Jupyter notebook experience

Photo by Christopher Gower on Unsplash


When learning the programming language R for the first time, one of the first things I fell in love with was RStudio. I absolutely loved its pane layout, variable inspector, help documentation window, markdown exports, visualization output, speediness, and THE FACT THAT IT’S FREE. Python is my native tongue, and I generally work out of Jupyter notebooks for the majority of my projects. I like being able to document my projects cell by cell in Jupyter notebooks. I thought to myself: is it possible to turn JupyterLab into an IDE similar to RStudio?

This blog is for…

order and chaos — David Wilson

The purpose of this blog is to simulate the mindset of a data scientist as he/she approaches a data-driven problem. I will do this by attempting to introduce who I am with a psycho-philosophical twist.

Hopefully, by the end of this blog, I will come to some sort of meaningful conclusion of what I know about myself, and why I chose the glorious field of data science.

Who am I?

Well, that’s easy: I am a combination of a consciousness and a collective unconsciousness. (start broad before you narrow down)

As sufficient as I think that answer is, its satisfaction would be…

Paul Aleksis

Junior Data Scientist | Passionate about using data for social good.
