Covid-19: Predicting the Rate of Spread with Scikit-learn

Ai: African Intelligence
Mar 21, 2020

Since the Covid-19 outbreak was reported to WHO in December 2019, the disease has spread rapidly. It started with imported cases (vertical spread) and has now become community transmission (horizontal spread). The findings from this project can help countries prepare better based on the expected rate of spread.

I have taken up this challenge to help create a cleaner dataset for machine learning purposes. I am using real-time datasets from WHO and augmenting them with other relevant datasets. I will dive into the technicalities now.

The size and quality of a dataset affect the accuracy of prediction: a bad dataset means bad predictions. My research points to a number of hypothetical variables that could help the prediction model estimate the rate of spread. The model will look for patterns in these input variables to make its predictions.
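To make "rate of spread" concrete, here is a minimal sketch of one possible target variable: the day-over-day growth in cumulative confirmed cases. This definition is an assumption for illustration, not necessarily the one used in the notebook, and the column names and numbers are made up.

```python
import pandas as pd

# Hypothetical cumulative confirmed-case counts for one country (made-up numbers).
cases = pd.DataFrame({
    "date": pd.date_range("2020-03-01", periods=5, freq="D"),
    "cumulative_cases": [100, 120, 150, 180, 225],
})

# New cases reported each day; the first row has no previous day, so it is NaN.
cases["new_cases"] = cases["cumulative_cases"].diff()

# Assumed definition of "rate of spread": fractional day-over-day change
# in the cumulative count (0.20 means a 20% increase on the previous day).
previous_day = cases["cumulative_cases"].shift(1)
cases["growth_rate"] = cases["cumulative_cases"] / previous_day - 1

print(cases)
```

Other definitions, such as doubling time or a smoothed multi-day average, would work just as well as a prediction target; the important part is to pick one and compute it consistently for every country.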

The datasets I am using for augmentation, apart from the WHO datasets, are:

  1. Temperature and humidity per country
  2. Average contact tracing per country
  3. Number of governmental interventions per country, e.g. bans on social gatherings, school closures, travel restrictions, lockdowns
  4. Urban population per country
  5. Household size per country
  6. Method of spread per country: imported or local
  7. Mode of mass transportation per country, e.g. bus, air, train
  8. Elderly population (above 65 years) per country
  9. Economic classification per country, e.g. low income to high income
  10. Number of days since the pandemic outbreak
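The augmentation described by the list above can be sketched as a join between the case counts and a per-country table of covariates. This is only an illustration under stated assumptions: every column name and value below is hypothetical, and "days since outbreak" is interpreted here as days since each country's first reported case.

```python
import pandas as pd

# Hypothetical WHO-style daily case counts (made-up numbers).
who_cases = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "date": pd.to_datetime(
        ["2020-03-01", "2020-03-02", "2020-03-01", "2020-03-02"]
    ),
    "confirmed": [10, 15, 200, 260],
})

# Hypothetical per-country covariates, one row per country.
country_features = pd.DataFrame({
    "country": ["A", "B"],
    "urban_population_pct": [35.0, 80.0],
    "avg_household_size": [5.1, 2.4],
    "population_over_65_pct": [3.0, 20.0],
    "income_group": ["Low income", "High income"],
    "first_case_date": pd.to_datetime(["2020-02-28", "2020-02-15"]),
})

# Left join keeps every case row; validate= raises an error if a country
# appears more than once in the covariate table.
dataset = who_cases.merge(
    country_features, on="country", how="left", validate="many_to_one"
)

# Assumed interpretation of "days since outbreak": days since the
# country's first reported case.
dataset["days_since_first_case"] = (
    dataset["date"] - dataset["first_case_date"]
).dt.days

# Categorical variables such as income group need a numeric encoding
# before most scikit-learn estimators can use them.
dataset = pd.get_dummies(dataset, columns=["income_group"])

print(dataset.head())
```

A left join is used so that no case rows are silently dropped when a country is missing from the covariate table; those rows show up with missing values instead, which makes the gaps easy to spot during cleaning.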

This might take a few days to complete, and I urge anyone interested to pick up the datasets I will share and use them. I will clean up the data as much as possible so that it is easy to train your models on.
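As a rough idea of what training on the cleaned data could look like, here is a minimal scikit-learn sketch. It is not the model from the notebook: the data is synthetic, the column names are hypothetical, and `RandomForestRegressor` is simply one reasonable choice picked for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n_rows = 200

# Synthetic stand-in for the cleaned dataset (all values are made up).
data = pd.DataFrame({
    "temperature_c": rng.uniform(0, 35, n_rows),
    "urban_population_pct": rng.uniform(10, 95, n_rows),
    "interventions": rng.integers(0, 6, n_rows),
    "days_since_outbreak": rng.integers(1, 90, n_rows),
    "income_group": rng.choice(["Low", "Middle", "High"], n_rows),
})

# Made-up target so the example runs end to end; it carries no
# epidemiological meaning.
data["growth_rate"] = (
    0.3
    - 0.03 * data["interventions"]
    + 0.001 * data["urban_population_pct"]
    + rng.normal(0, 0.02, n_rows)
)

numeric_cols = [
    "temperature_c",
    "urban_population_pct",
    "interventions",
    "days_since_outbreak",
]
categorical_cols = ["income_group"]

# Pass numeric columns through unchanged and one-hot encode categories.
preprocess = ColumnTransformer([
    ("numeric", "passthrough", numeric_cols),
    ("categorical", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", RandomForestRegressor(n_estimators=100, random_state=0)),
])

X = data[numeric_cols + categorical_cols]
y = data["growth_rate"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model.fit(X_train, y_train)
predictions = model.predict(X_test)
mae = mean_absolute_error(y_test, predictions)
print(f"Mean absolute error on held-out rows: {mae:.3f}")
```

With real data, a random row split like this can leak information, because rows from the same country on neighbouring days end up in both the training and test sets. Holding out whole countries or the most recent dates is usually a more honest check.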

Below is the link to the datasets and the Scikit-learn ML model:

Kaggle Notebook

African Intelligence seeks to explore technologies that will be disruptive to the African Tech Ecosystem