WeRateDogs Archive

Wrangling, Analyzing and Visualization

Posted by Mohamed Abdo on Fri 10 July 2020

WeRateDogs Wrangle Report

Data wrangling is a core skill for everyone who works with data, since so much of the world’s data isn’t clean. The process is divided into three steps:

1. Gathering data

2. Assessing data

3. Cleaning data

Gather

Gathering data is the first step in data wrangling: before it we have no data, and after it we do. This project involved gathering data from three different sources, as listed below:

1. An existing file named twitter-archive-enhanced.csv, which I read into a pandas DataFrame.

2. A file (image-predictions.tsv) downloaded from the internet using the requests library.

3. Data queried from the Twitter API for each tweet in the twitter-archive-enhanced.csv file, using the tweepy library.

Assess

After gathering the data, the next step is to assess it visually and programmatically to detect quality and tidiness issues.

My assessment found the following issues:

Quality

dogs_rates_archive table

• the name column has values such as None, a, an, … instead of NaN

• there are columns we don’t need (in_reply_to_status_id, in_reply_to_user_id, retweeted_status_id, retweeted_status_user_id, retweeted_status_timestamp)

• wrong data types (timestamp, tweet_id)

• there are retweets, and we only want original tweets

• rating_numerator and rating_denominator contain wrong ratings, such as 960/0, 24/7, 9/11 and 165/150
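The issues listed above can be checked programmatically. The following is only a minimal sketch on a made-up five-row table (the rows, and the variable names archive, bad_names, retweets and odd_ratings, are assumptions for illustration, not the real archive); it shows the kind of filters that surface each problem.

```python
import pandas as pd

# Toy stand-in for the archive table; the rows are invented for illustration.
archive = pd.DataFrame({
    "tweet_id": [1, 2, 3, 4, 5],
    "name": ["Phineas", "a", "None", "an", "Tilly"],
    "retweeted_status_id": [None, None, 8.87e17, None, None],
    "rating_numerator": [13, 960, 13, 24, 9],
    "rating_denominator": [10, 0, 10, 7, 11],
})

# Real dog names are capitalized, so all-lowercase "names" are likely
# extraction errors; the string "None" is a placeholder for a missing name.
bad_names = archive[archive["name"].str.islower() | (archive["name"] == "None")]

# A non-null retweeted_status_id marks a retweet.
retweets = archive[archive["retweeted_status_id"].notnull()]

# Ratings whose denominator is not 10 deserve a manual look at the tweet text.
odd_ratings = archive[archive["rating_denominator"] != 10]

print(sorted(bad_names["name"]))
print(len(retweets))
print(odd_ratings[["rating_numerator", "rating_denominator"]])
```

On the real table the same filters would be applied to dogs_rates_archive; the flagged rows still need to be read by eye, since some unusual ratings may be legitimate.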

image_predictions table

• the prediction columns 1, 2 and 3 are written as the abbreviations p1, p2 and p3

• there are tweets with no images: the image_predictions table has 2075 observations while the archive table has 2356

• tweet_id is int64, not object

• there are three predictions of the dog’s breed, but only one of them is the most confident
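One possible way to reduce the three predictions to a single one is sketched below. It assumes that p1, p2 and p3 are ordered from most to least confident (as they appear in the table) and that "most confident" means the first prediction flagged as a dog; the helper first_dog_prediction and the output columns breed and confidence are hypothetical names, and the three rows are toy data.

```python
import pandas as pd

# Toy rows shaped like image_predictions; values are illustrative only.
preds = pd.DataFrame({
    "tweet_id": [1, 2, 3],
    "p1": ["box_turtle", "ox", "chow"],
    "p1_conf": [0.93, 0.42, 0.69],
    "p1_dog": [False, False, True],
    "p2": ["mud_turtle", "Newfoundland", "Tibetan_mastiff"],
    "p2_conf": [0.05, 0.28, 0.06],
    "p2_dog": [False, True, True],
    "p3": ["terrapin", "groenendael", "fur_coat"],
    "p3_conf": [0.02, 0.10, 0.05],
    "p3_dog": [False, True, False],
})


def first_dog_prediction(row):
    """Return the highest-ranked prediction that is flagged as a dog."""
    for i in (1, 2, 3):
        if row[f"p{i}_dog"]:
            return {"tweet_id": row["tweet_id"],
                    "breed": row[f"p{i}"],
                    "confidence": row[f"p{i}_conf"]}
    # None of the three predictions is a dog breed.
    return {"tweet_id": row["tweet_id"], "breed": None, "confidence": float("nan")}


best = pd.DataFrame([first_dog_prediction(r) for r in preds.to_dict("records")])
print(best)
```

Rows where no prediction is a dog keep a missing breed, so they can be dropped or kept depending on the analysis.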

tweets_df table

• the id column should be renamed to tweet_id to be consistent with the other tables

Tidiness

• dog stage is one variable, but it is spread across 4 columns (doggo, floofer, pupper, puppo)

• there are three datasets instead of one
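Both tidiness issues can be illustrated on toy data. The sketch below is an assumption about how they might be addressed, not necessarily the approach used in this project: the helper combine_stages, the dog_stage column name, and the choice of an inner join are all illustrative.

```python
import pandas as pd

# Toy tables keyed on tweet_id (stored as strings); values are illustrative.
archive = pd.DataFrame({
    "tweet_id": ["1", "2", "3"],
    "doggo": ["None", "doggo", "None"],
    "floofer": ["None", "None", "None"],
    "pupper": ["None", "None", "pupper"],
    "puppo": ["None", "None", "None"],
})
tweets = pd.DataFrame({
    "tweet_id": ["1", "2", "3"],
    "favorite_count": [10, 20, 30],
    "retweet_count": [1, 2, 3],
})
preds = pd.DataFrame({"tweet_id": ["1", "2"], "p1": ["chow", "pug"]})

stage_cols = ["doggo", "floofer", "pupper", "puppo"]


def combine_stages(row):
    """Join the stage labels present in a row; return None if there are none."""
    stages = [row[c] for c in stage_cols if row[c] != "None"]
    return ",".join(stages) if stages else None


# Collapse the four stage columns into a single variable.
archive["dog_stage"] = [combine_stages(r) for r in archive.to_dict("records")]
archive = archive.drop(columns=stage_cols)

# Combine the three tables; an inner join keeps only tweets present in all of them.
master = (archive
          .merge(tweets, on="tweet_id", how="inner")
          .merge(preds, on="tweet_id", how="inner"))
print(master)
```

An inner join drops tweets that have no image prediction; a left join would keep them with missing prediction values instead.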

Cleaning

Cleaning data is the third step in data wrangling. It is the process of fixing the quality and tidiness issues identified in the assess step, to make sure the data is accurate and clean, so that we are then able to analyze it.

Before starting the cleaning process I made a copy of each dataset, then I tried to correct the issues identified in the assessing step.
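As a rough illustration of the copy-then-clean pattern, the sketch below works on a three-row toy table (the rows reuse IDs and timestamps visible in the archive output, but the table itself and the variable name clean are assumptions). It drops retweets, converts the data types, and replaces lowercase placeholder names with NaN, while leaving the original untouched.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the archive table.
archive = pd.DataFrame({
    "tweet_id": [892420643555336193, 888202515573088257, 666407126856765440],
    "timestamp": ["2017-08-01 16:23:56 +0000",
                  "2017-07-21 01:02:36 +0000",
                  "2015-11-17 00:06:54 +0000"],
    "retweeted_status_id": [np.nan, 8.874740e17, np.nan],
    "name": ["Phineas", "Canela", "a"],
})

# Work on a copy so the original data stays available for comparison.
clean = archive.copy()

# Keep only original tweets, then drop the retweet column that is no longer needed.
clean = clean[clean["retweeted_status_id"].isnull()].drop(columns=["retweeted_status_id"])

# IDs are identifiers, not quantities, so store them as strings;
# timestamps become proper datetime values.
clean["tweet_id"] = clean["tweet_id"].astype(str)
clean["timestamp"] = pd.to_datetime(clean["timestamp"])

# Keep names that start with a capital letter and are not the "None" placeholder;
# everything else becomes NaN.
valid_name = clean["name"].str[0].str.isupper() & (clean["name"] != "None")
clean["name"] = clean["name"].where(valid_name)

print(clean)
```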

In [1]:
import requests
import numpy as np
import pandas as pd
import tweepy
import json
import matplotlib.pyplot as plt
from timeit import default_timer as timer
import warnings
warnings.filterwarnings('ignore')

Gather

In [2]:
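# Download the image predictions file and save it locally.
url = 'https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv'
response = requests.get(url)
response.raise_for_status()  # stop early if the download failed
with open('image-predictions.tsv', mode='wb') as file:
    file.write(response.content)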
In [3]:
dogs_rates_archive=pd.read_csv('twitter-archive-enhanced.csv')
image_predictions=pd.read_csv('image-predictions.tsv',sep= '\t')
In [4]:
import tweepy

consumer_key = 'YOUR CONSUMER KEY'
consumer_secret = 'YOUR CONSUMER SECRET'
access_token = 'YOUR ACCESS TOKEN'
access_secret = 'YOUR ACCESS SECRET'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

api = tweepy.API(auth, parser=tweepy.parsers.JSONParser(), 
                 wait_on_rate_limit=True,wait_on_rate_limit_notify=True)
In [5]:
start = timer()

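# Query the API for every tweet ID in the archive; collect failures separately.
tweet_list = []
tweet_errors = []
for tweet_id in dogs_rates_archive.tweet_id:
    try:
        tweet = api.get_status(tweet_id, tweet_mode='extended')
        tweet_list.append(tweet)
    except Exception as e:
        # Tweets that can no longer be retrieved raise an error; log the ID and move on.
        print(str(tweet_id) + '_' + str(e))
        tweet_errors.append(tweet_id)
end = timer()
print(end - start)  # elapsed time in seconds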
   
Rate limit reached. Sleeping for: 167
888202515573088257_[{'code': 144, 'message': 'No status found with that ID.'}]
873697596434513921_[{'code': 144, 'message': 'No status found with that ID.'}]
872668790621863937_[{'code': 144, 'message': 'No status found with that ID.'}]
872261713294495745_[{'code': 144, 'message': 'No status found with that ID.'}]
869988702071779329_[{'code': 144, 'message': 'No status found with that ID.'}]
866816280283807744_[{'code': 144, 'message': 'No status found with that ID.'}]
861769973181624320_[{'code': 144, 'message': 'No status found with that ID.'}]
856602993587888130_[{'code': 144, 'message': 'No status found with that ID.'}]
851953902622658560_[{'code': 144, 'message': 'No status found with that ID.'}]
845459076796616705_[{'code': 144, 'message': 'No status found with that ID.'}]
844704788403113984_[{'code': 144, 'message': 'No status found with that ID.'}]
842892208864923648_[{'code': 144, 'message': 'No status found with that ID.'}]
837366284874571778_[{'code': 144, 'message': 'No status found with that ID.'}]
837012587749474308_[{'code': 144, 'message': 'No status found with that ID.'}]
829374341691346946_[{'code': 144, 'message': 'No status found with that ID.'}]
827228250799742977_[{'code': 144, 'message': 'No status found with that ID.'}]
812747805718642688_[{'code': 144, 'message': 'No status found with that ID.'}]
802247111496568832_[{'code': 144, 'message': 'No status found with that ID.'}]
779123168116150273_[{'code': 144, 'message': 'No status found with that ID.'}]
775096608509886464_[{'code': 144, 'message': 'No status found with that ID.'}]
770743923962707968_[{'code': 144, 'message': 'No status found with that ID.'}]
Rate limit reached. Sleeping for: 734
754011816964026368_[{'code': 144, 'message': 'No status found with that ID.'}]
680055455951884288_[{'code': 144, 'message': 'No status found with that ID.'}]
Rate limit reached. Sleeping for: 732
2084.2182542709998
In [21]:
with open('tweet_json.txt', 'w') as fb:
    json.dump(tweet_list, fb)
In [77]:
with open('tweet_json.txt') as file:
    tweet_list = json.load(file)
tweets_df = pd.DataFrame(tweet_list, columns = ['id','favorite_count', 'retweet_count'])
In [78]:
tweets_df.to_csv('tweets_df.csv',index = False)
In [4]:
tweets_df = pd.read_csv('tweets_df.csv')

Assess

In [32]:
dogs_rates_archive
       
Out[32]:
tweet_id in_reply_to_status_id in_reply_to_user_id timestamp source text retweeted_status_id retweeted_status_user_id retweeted_status_timestamp expanded_urls rating_numerator rating_denominator name doggo floofer pupper puppo
0 892420643555336193 NaN NaN 2017-08-01 16:23:56 +0000 <a href=”http://twitter.com/download/iphone” r… This is Phineas. He’s a mystical boy. Only eve… NaN NaN NaN https://twitter.com/dog_rates/status/892420643… 13 10 Phineas None None None None
1 892177421306343426 NaN NaN 2017-08-01 00:17:27 +0000 <a href=”http://twitter.com/download/iphone” r… This is Tilly. She’s just checking pup on you…. NaN NaN NaN https://twitter.com/dog_rates/status/892177421… 13 10 Tilly None None None None
2 891815181378084864 NaN NaN 2017-07-31 00:18:03 +0000 <a href=”http://twitter.com/download/iphone” r… This is Archie. He is a rare Norwegian Pouncin… NaN NaN NaN https://twitter.com/dog_rates/status/891815181… 12 10 Archie None None None None
3 891689557279858688 NaN NaN 2017-07-30 15:58:51 +0000 <a href=”http://twitter.com/download/iphone” r… This is Darla. She commenced a snooze mid meal… NaN NaN NaN https://twitter.com/dog_rates/status/891689557… 13 10 Darla None None None None
4 891327558926688256 NaN NaN 2017-07-29 16:00:24 +0000 <a href=”http://twitter.com/download/iphone” r… This is Franklin. He would like you to stop ca… NaN NaN NaN https://twitter.com/dog_rates/status/891327558… 12 10 Franklin None None None None
5 891087950875897856 NaN NaN 2017-07-29 00:08:17 +0000 <a href=”http://twitter.com/download/iphone” r… Here we have a majestic great white breaching … NaN NaN NaN https://twitter.com/dog_rates/status/891087950… 13 10 None None None None None
6 890971913173991426 NaN NaN 2017-07-28 16:27:12 +0000 <a href=”http://twitter.com/download/iphone” r… Meet Jax. He enjoys ice cream so much he gets … NaN NaN NaN https://gofundme.com/ydvmve-surgery-for-jax,ht… 13 10 Jax None None None None
7 890729181411237888 NaN NaN 2017-07-28 00:22:40 +0000 <a href=”http://twitter.com/download/iphone” r… When you watch your owner call another dog a g… NaN NaN NaN https://twitter.com/dog_rates/status/890729181… 13 10 None None None None None
8 890609185150312448 NaN NaN 2017-07-27 16:25:51 +0000 <a href=”http://twitter.com/download/iphone” r… This is Zoey. She doesn’t want to be one of th… NaN NaN NaN https://twitter.com/dog_rates/status/890609185… 13 10 Zoey None None None None
9 890240255349198849 NaN NaN 2017-07-26 15:59:51 +0000 <a href=”http://twitter.com/download/iphone” r… This is Cassie. She is a college pup. Studying… NaN NaN NaN https://twitter.com/dog_rates/status/890240255… 14 10 Cassie doggo None None None
10 890006608113172480 NaN NaN 2017-07-26 00:31:25 +0000 <a href=”http://twitter.com/download/iphone” r… This is Koda. He is a South Australian decksha… NaN NaN NaN https://twitter.com/dog_rates/status/890006608… 13 10 Koda None None None None
11 889880896479866881 NaN NaN 2017-07-25 16:11:53 +0000 <a href=”http://twitter.com/download/iphone” r… This is Bruno. He is a service shark. Only get… NaN NaN NaN https://twitter.com/dog_rates/status/889880896… 13 10 Bruno None None None None
12 889665388333682689 NaN NaN 2017-07-25 01:55:32 +0000 <a href=”http://twitter.com/download/iphone” r… Here’s a puppo that seems to be on the fence a… NaN NaN NaN https://twitter.com/dog_rates/status/889665388… 13 10 None None None None puppo
13 889638837579907072 NaN NaN 2017-07-25 00:10:02 +0000 <a href=”http://twitter.com/download/iphone” r… This is Ted. He does his best. Sometimes that’… NaN NaN NaN https://twitter.com/dog_rates/status/889638837… 12 10 Ted None None None None
14 889531135344209921 NaN NaN 2017-07-24 17:02:04 +0000 <a href=”http://twitter.com/download/iphone” r… This is Stuart. He’s sporting his favorite fan… NaN NaN NaN https://twitter.com/dog_rates/status/889531135… 13 10 Stuart None None None puppo
15 889278841981685760 NaN NaN 2017-07-24 00:19:32 +0000 <a href=”http://twitter.com/download/iphone” r… This is Oliver. You’re witnessing one of his m… NaN NaN NaN https://twitter.com/dog_rates/status/889278841… 13 10 Oliver None None None None
16 888917238123831296 NaN NaN 2017-07-23 00:22:39 +0000 <a href=”http://twitter.com/download/iphone” r… This is Jim. He found a fren. Taught him how t… NaN NaN NaN https://twitter.com/dog_rates/status/888917238… 12 10 Jim None None None None
17 888804989199671297 NaN NaN 2017-07-22 16:56:37 +0000 <a href=”http://twitter.com/download/iphone” r… This is Zeke. He has a new stick. Very proud o… NaN NaN NaN https://twitter.com/dog_rates/status/888804989… 13 10 Zeke None None None None
18 888554962724278272 NaN NaN 2017-07-22 00:23:06 +0000 <a href=”http://twitter.com/download/iphone” r… This is Ralphus. He’s powering up. Attempting … NaN NaN NaN https://twitter.com/dog_rates/status/888554962… 13 10 Ralphus None None None None
19 888202515573088257 NaN NaN 2017-07-21 01:02:36 +0000 <a href=”http://twitter.com/download/iphone” r… RT @dog_rates: This is Canela. She attempted s… 8.874740e+17 4.196984e+09 2017-07-19 00:47:34 +0000 https://twitter.com/dog_rates/status/887473957… 13 10 Canela None None None None
20 888078434458587136 NaN NaN 2017-07-20 16:49:33 +0000 <a href=”http://twitter.com/download/iphone” r… This is Gerald. He was just told he didn’t get… NaN NaN NaN https://twitter.com/dog_rates/status/888078434… 12 10 Gerald None None None None
21 887705289381826560 NaN NaN 2017-07-19 16:06:48 +0000 <a href=”http://twitter.com/download/iphone” r… This is Jeffrey. He has a monopoly on the pool… NaN NaN NaN https://twitter.com/dog_rates/status/887705289… 13 10 Jeffrey None None None None
22 887517139158093824 NaN NaN 2017-07-19 03:39:09 +0000 <a href=”http://twitter.com/download/iphone” r… I’ve yet to rate a Venezuelan Hover Wiener. Th… NaN NaN NaN https://twitter.com/dog_rates/status/887517139… 14 10 such None None None None
23 887473957103951883 NaN NaN 2017-07-19 00:47:34 +0000 <a href=”http://twitter.com/download/iphone” r… This is Canela. She attempted some fancy porch… NaN NaN NaN https://twitter.com/dog_rates/status/887473957… 13 10 Canela None None None None
24 887343217045368832 NaN NaN 2017-07-18 16:08:03 +0000 <a href=”http://twitter.com/download/iphone” r… You may not have known you needed to see this … NaN NaN NaN https://twitter.com/dog_rates/status/887343217… 13 10 None None None None None
25 887101392804085760 NaN NaN 2017-07-18 00:07:08 +0000 <a href=”http://twitter.com/download/iphone” r… This… is a Jubilant Antarctic House Bear. We… NaN NaN NaN https://twitter.com/dog_rates/status/887101392… 12 10 None None None None None
26 886983233522544640 NaN NaN 2017-07-17 16:17:36 +0000 <a href=”http://twitter.com/download/iphone” r… This is Maya. She’s very shy. Rarely leaves he… NaN NaN NaN https://twitter.com/dog_rates/status/886983233… 13 10 Maya None None None None
27 886736880519319552 NaN NaN 2017-07-16 23:58:41 +0000 <a href=”http://twitter.com/download/iphone” r… This is Mingus. He’s a wonderful father to his… NaN NaN NaN https://www.gofundme.com/mingusneedsus,https:/… 13 10 Mingus None None None None
28 886680336477933568 NaN NaN 2017-07-16 20:14:00 +0000 <a href=”http://twitter.com/download/iphone” r… This is Derek. He’s late for a dog meeting. 13… NaN NaN NaN https://twitter.com/dog_rates/status/886680336… 13 10 Derek None None None None
29 886366144734445568 NaN NaN 2017-07-15 23:25:31 +0000 <a href=”http://twitter.com/download/iphone” r… This is Roscoe. Another pupper fallen victim t… NaN NaN NaN https://twitter.com/dog_rates/status/886366144… 12 10 Roscoe None None pupper None
...
2326 666411507551481857 NaN NaN 2015-11-17 00:24:19 +0000 <a href=”http://twitter.com/download/iphone” r… This is quite the dog. Gets really excited whe… NaN NaN NaN https://twitter.com/dog_rates/status/666411507… 2 10 quite None None None None
2327 666407126856765440 NaN NaN 2015-11-17 00:06:54 +0000 <a href=”http://twitter.com/download/iphone” r… This is a southern Vesuvius bumblegruff. Can d… NaN NaN NaN https://twitter.com/dog_rates/status/666407126… 7 10 a None None None None
2328 666396247373291520 NaN NaN 2015-11-16 23:23:41 +0000 <a href=”http://twitter.com/download/iphone” r… Oh goodness. A super rare northeast Qdoba kang… NaN NaN NaN https://twitter.com/dog_rates/status/666396247… 9 10 None None None None None
2329 666373753744588802 NaN NaN 2015-11-16 21:54:18 +0000 <a href=”http://twitter.com/download/iphone” r… Those are sunglasses and a jean jacket. 11/10 … NaN NaN NaN https://twitter.com/dog_rates/status/666373753… 11 10 None None None None None
2330 666362758909284353 NaN NaN 2015-11-16 21:10:36 +0000 <a href=”http://twitter.com/download/iphone” r… Unique dog here. Very small. Lives in containe… NaN NaN NaN https://twitter.com/dog_rates/status/666362758… 6 10 None None None None None
2331 666353288456101888 NaN NaN 2015-11-16 20:32:58 +0000 <a href=”http://twitter.com/download/iphone” r… Here we have a mixed Asiago from the Galápagos… NaN NaN NaN https://twitter.com/dog_rates/status/666353288… 8 10 None None None None None
2332 666345417576210432 NaN NaN 2015-11-16 20:01:42 +0000 <a href=”http://twitter.com/download/iphone” r… Look at this jokester thinking seat belt laws … NaN NaN NaN https://twitter.com/dog_rates/status/666345417… 10 10 None None None None None
2333 666337882303524864 NaN NaN 2015-11-16 19:31:45 +0000 <a href=”http://twitter.com/download/iphone” r… This is an extremely rare horned Parthenon. No… NaN NaN NaN https://twitter.com/dog_rates/status/666337882… 9 10 an None None None None
2334 666293911632134144 NaN NaN 2015-11-16 16:37:02 +0000 <a href=”http://twitter.com/download/iphone” r… This is a funny dog. Weird toes. Won’t come do… NaN NaN NaN https://twitter.com/dog_rates/status/666293911… 3 10 a None None None None
2335 666287406224695296 NaN NaN 2015-11-16 16:11:11 +0000 <a href=”http://twitter.com/download/iphone” r… This is an Albanian 3 1/2 legged Episcopalian… NaN NaN NaN https://twitter.com/dog_rates/status/666287406… 1 2 an None None None None
2336 666273097616637952 NaN NaN 2015-11-16 15:14:19 +0000 <a href=”http://twitter.com/download/iphone” r… Can take selfies 11/10 https://t.co/ws2AMaNwPW NaN NaN NaN https://twitter.com/dog_rates/status/666273097… 11 10 None None None None None
2337 666268910803644416 NaN NaN 2015-11-16 14:57:41 +0000 <a href=”http://twitter.com/download/iphone” r… Very concerned about fellow dog trapped in com… NaN NaN NaN https://twitter.com/dog_rates/status/666268910… 10 10 None None None None None
2338 666104133288665088 NaN NaN 2015-11-16 04:02:55 +0000 <a href=”http://twitter.com/download/iphone” r… Not familiar with this breed. No tail (weird)…. NaN NaN NaN https://twitter.com/dog_rates/status/666104133… 1 10 None None None None None
2339 666102155909144576 NaN NaN 2015-11-16 03:55:04 +0000 <a href=”http://twitter.com/download/iphone” r… Oh my. Here you are seeing an Adobe Setter giv… NaN NaN NaN https://twitter.com/dog_rates/status/666102155… 11 10 None None None None None
2340 666099513787052032 NaN NaN 2015-11-16 03:44:34 +0000 <a href=”http://twitter.com/download/iphone” r… Can stand on stump for what seems like a while… NaN NaN NaN https://twitter.com/dog_rates/status/666099513… 8 10 None None None None None
2341 666094000022159362 NaN NaN 2015-11-16 03:22:39 +0000 <a href=”http://twitter.com/download/iphone” r… This appears to be a Mongolian Presbyterian mi… NaN NaN NaN https://twitter.com/dog_rates/status/666094000… 9 10 None None None None None
2342 666082916733198337 NaN NaN 2015-11-16 02:38:37 +0000 <a href=”http://twitter.com/download/iphone” r… Here we have a well-established sunblockerspan… NaN NaN NaN https://twitter.com/dog_rates/status/666082916… 6 10 None None None None None
2343 666073100786774016 NaN NaN 2015-11-16 01:59:36 +0000 <a href=”http://twitter.com/download/iphone” r… Let’s hope this flight isn’t Malaysian (lol). … NaN NaN NaN https://twitter.com/dog_rates/status/666073100… 10 10 None None None None None
2344 666071193221509120 NaN NaN 2015-11-16 01:52:02 +0000 <a href=”http://twitter.com/download/iphone” r… Here we have a northern speckled Rhododendron…. NaN NaN NaN https://twitter.com/dog_rates/status/666071193… 9 10 None None None None None
2345 666063827256086533 NaN NaN 2015-11-16 01:22:45 +0000 <a href=”http://twitter.com/download/iphone” r… This is the happiest dog you will ever see. Ve… NaN NaN NaN https://twitter.com/dog_rates/status/666063827… 10 10 the None None None None
2346 666058600524156928 NaN NaN 2015-11-16 01:01:59 +0000 <a href=”http://twitter.com/download/iphone” r… Here is the Rand Paul of retrievers folks! He’… NaN NaN NaN https://twitter.com/dog_rates/status/666058600… 8 10 the None None None None
2347 666057090499244032 NaN NaN 2015-11-16 00:55:59 +0000 <a href=”http://twitter.com/download/iphone” r… My oh my. This is a rare blond Canadian terrie… NaN NaN NaN https://twitter.com/dog_rates/status/666057090… 9 10 a None None None None
2348 666055525042405380 NaN NaN 2015-11-16 00:49:46 +0000 <a href=”http://twitter.com/download/iphone” r… Here is a Siberian heavily armored polar bear … NaN NaN NaN https://twitter.com/dog_rates/status/666055525… 10 10 a None None None None
2349 666051853826850816 NaN NaN 2015-11-16 00:35:11 +0000 <a href=”http://twitter.com/download/iphone” r… This is an odd dog. Hard on the outside but lo… NaN NaN NaN https://twitter.com/dog_rates/status/666051853… 2 10 an None None None None
2350 666050758794694657 NaN NaN 2015-11-16 00:30:50 +0000 <a href=”http://twitter.com/download/iphone” r… This is a truly beautiful English Wilson Staff… NaN NaN NaN https://twitter.com/dog_rates/status/666050758… 10 10 a None None None None
2351 666049248165822465 NaN NaN 2015-11-16 00:24:50 +0000 <a href=”http://twitter.com/download/iphone” r… Here we have a 1949 1st generation vulpix. Enj… NaN NaN NaN https://twitter.com/dog_rates/status/666049248… 5 10 None None None None None
2352 666044226329800704 NaN NaN 2015-11-16 00:04:52 +0000 <a href=”http://twitter.com/download/iphone” r… This is a purebred Piers Morgan. Loves to Netf… NaN NaN NaN https://twitter.com/dog_rates/status/666044226… 6 10 a None None None None
2353 666033412701032449 NaN NaN 2015-11-15 23:21:54 +0000 <a href=”http://twitter.com/download/iphone” r… Here is a very happy pup. Big fan of well-main… NaN NaN NaN https://twitter.com/dog_rates/status/666033412… 9 10 a None None None None
2354 666029285002620928 NaN NaN 2015-11-15 23:05:30 +0000 <a href=”http://twitter.com/download/iphone” r… This is a western brown Mitsubishi terrier. Up… NaN NaN NaN https://twitter.com/dog_rates/status/666029285… 7 10 a None None None None
2355 666020888022790149 NaN NaN 2015-11-15 22:32:08 +0000 <a href=”http://twitter.com/download/iphone” r… Here we have a Japanese Irish Setter. Lost eye… NaN NaN NaN https://twitter.com/dog_rates/status/666020888… 8 10 None None None None None

2356 rows × 17 columns

In [31]:
image_predictions
Out[31]:
tweet_id jpg_url img_num p1 p1_conf p1_dog p2 p2_conf p2_dog p3 p3_conf p3_dog
0 666020888022790149 https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg 1 Welsh_springer_spaniel 0.465074 True collie 0.156665 True Shetland_sheepdog 0.061428 True
1 666029285002620928 https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg 1 redbone 0.506826 True miniature_pinscher 0.074192 True Rhodesian_ridgeback 0.072010 True
2 666033412701032449 https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg 1 German_shepherd 0.596461 True malinois 0.138584 True bloodhound 0.116197 True
3 666044226329800704 https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg 1 Rhodesian_ridgeback 0.408143 True redbone 0.360687 True miniature_pinscher 0.222752 True
4 666049248165822465 https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg 1 miniature_pinscher 0.560311 True Rottweiler 0.243682 True Doberman 0.154629 True
5 666050758794694657 https://pbs.twimg.com/media/CT5Jof1WUAEuVxN.jpg 1 Bernese_mountain_dog 0.651137 True English_springer 0.263788 True Greater_Swiss_Mountain_dog 0.016199 True
6 666051853826850816 https://pbs.twimg.com/media/CT5KoJ1WoAAJash.jpg 1 box_turtle 0.933012 False mud_turtle 0.045885 False terrapin 0.017885 False
7 666055525042405380 https://pbs.twimg.com/media/CT5N9tpXIAAifs1.jpg 1 chow 0.692517 True Tibetan_mastiff 0.058279 True fur_coat 0.054449 False
8 666057090499244032 https://pbs.twimg.com/media/CT5PY90WoAAQGLo.jpg 1 shopping_cart 0.962465 False shopping_basket 0.014594 False golden_retriever 0.007959 True
9 666058600524156928 https://pbs.twimg.com/media/CT5Qw94XAAA_2dP.jpg 1 miniature_poodle 0.201493 True komondor 0.192305 True soft-coated_wheaten_terrier 0.082086 True
10 666063827256086533 https://pbs.twimg.com/media/CT5Vg_wXIAAXfnj.jpg 1 golden_retriever 0.775930 True Tibetan_mastiff 0.093718 True Labrador_retriever 0.072427 True
11 666071193221509120 https://pbs.twimg.com/media/CT5cN_3WEAAlOoZ.jpg 1 Gordon_setter 0.503672 True Yorkshire_terrier 0.174201 True Pekinese 0.109454 True
12 666073100786774016 https://pbs.twimg.com/media/CT5d9DZXAAALcwe.jpg 1 Walker_hound 0.260857 True English_foxhound 0.175382 True Ibizan_hound 0.097471 True
13 666082916733198337 https://pbs.twimg.com/media/CT5m4VGWEAAtKc8.jpg 1 pug 0.489814 True bull_mastiff 0.404722 True French_bulldog 0.048960 True
14 666094000022159362 https://pbs.twimg.com/media/CT5w9gUW4AAsBNN.jpg 1 bloodhound 0.195217 True German_shepherd 0.078260 True malinois 0.075628 True
15 666099513787052032 https://pbs.twimg.com/media/CT51-JJUEAA6hV8.jpg 1 Lhasa 0.582330 True Shih-Tzu 0.166192 True Dandie_Dinmont 0.089688 True
16 666102155909144576 https://pbs.twimg.com/media/CT54YGiWUAEZnoK.jpg 1 English_setter 0.298617 True Newfoundland 0.149842 True borzoi 0.133649 True
17 666104133288665088 https://pbs.twimg.com/media/CT56LSZWoAAlJj2.jpg 1 hen 0.965932 False cock 0.033919 False partridge 0.000052 False
18 666268910803644416 https://pbs.twimg.com/media/CT8QCd1WEAADXws.jpg 1 desktop_computer 0.086502 False desk 0.085547 False bookcase 0.079480 False
19 666273097616637952 https://pbs.twimg.com/media/CT8T1mtUwAA3aqm.jpg 1 Italian_greyhound 0.176053 True toy_terrier 0.111884 True basenji 0.111152 True
20 666287406224695296 https://pbs.twimg.com/media/CT8g3BpUEAAuFjg.jpg 1 Maltese_dog 0.857531 True toy_poodle 0.063064 True miniature_poodle 0.025581 True
21 666293911632134144 https://pbs.twimg.com/media/CT8mx7KW4AEQu8N.jpg 1 three-toed_sloth 0.914671 False otter 0.015250 False great_grey_owl 0.013207 False
22 666337882303524864 https://pbs.twimg.com/media/CT9OwFIWEAMuRje.jpg 1 ox 0.416669 False Newfoundland 0.278407 True groenendael 0.102643 True
23 666345417576210432 https://pbs.twimg.com/media/CT9Vn7PWoAA_ZCM.jpg 1 golden_retriever 0.858744 True Chesapeake_Bay_retriever 0.054787 True Labrador_retriever 0.014241 True
24 666353288456101888 https://pbs.twimg.com/media/CT9cx0tUEAAhNN_.jpg 1 malamute 0.336874 True Siberian_husky 0.147655 True Eskimo_dog 0.093412 True
25 666362758909284353 https://pbs.twimg.com/media/CT9lXGsUcAAyUFt.jpg 1 guinea_pig 0.996496 False skunk 0.002402 False hamster 0.000461 False
26 666373753744588802 https://pbs.twimg.com/media/CT9vZEYWUAAlZ05.jpg 1 soft-coated_wheaten_terrier 0.326467 True Afghan_hound 0.259551 True briard 0.206803 True
27 666396247373291520 https://pbs.twimg.com/media/CT-D2ZHWIAA3gK1.jpg 1 Chihuahua 0.978108 True toy_terrier 0.009397 True papillon 0.004577 True
28 666407126856765440 https://pbs.twimg.com/media/CT-NvwmW4AAugGZ.jpg 1 black-and-tan_coonhound 0.529139 True bloodhound 0.244220 True flat-coated_retriever 0.173810 True
29 666411507551481857 https://pbs.twimg.com/media/CT-RugiWIAELEaq.jpg 1 coho 0.404640 False barracouta 0.271485 False gar 0.189945 False
...
2045 886366144734445568 https://pbs.twimg.com/media/DE0BTnQUwAApKEH.jpg 1 French_bulldog 0.999201 True Chihuahua 0.000361 True Boston_bull 0.000076 True
2046 886680336477933568 https://pbs.twimg.com/media/DE4fEDzWAAAyHMM.jpg 1 convertible 0.738995 False sports_car 0.139952 False car_wheel 0.044173 False
2047 886736880519319552 https://pbs.twimg.com/media/DE5Se8FXcAAJFx4.jpg 1 kuvasz 0.309706 True Great_Pyrenees 0.186136 True Dandie_Dinmont 0.086346 True
2048 886983233522544640 https://pbs.twimg.com/media/DE8yicJW0AAAvBJ.jpg 2 Chihuahua 0.793469 True toy_terrier 0.143528 True can_opener 0.032253 False
2049 887101392804085760 https://pbs.twimg.com/media/DE-eAq6UwAA-jaE.jpg 1 Samoyed 0.733942 True Eskimo_dog 0.035029 True Staffordshire_bullterrier 0.029705 True
2050 887343217045368832 https://pbs.twimg.com/ext_tw_video_thumb/88734… 1 Mexican_hairless 0.330741 True sea_lion 0.275645 False Weimaraner 0.134203 True
2051 887473957103951883 https://pbs.twimg.com/media/DFDw2tyUQAAAFke.jpg 2 Pembroke 0.809197 True Rhodesian_ridgeback 0.054950 True beagle 0.038915 True
2052 887517139158093824 https://pbs.twimg.com/ext_tw_video_thumb/88751… 1 limousine 0.130432 False tow_truck 0.029175 False shopping_cart 0.026321 False
2053 887705289381826560 https://pbs.twimg.com/media/DFHDQBbXgAEqY7t.jpg 1 basset 0.821664 True redbone 0.087582 True Weimaraner 0.026236 True
2054 888078434458587136 https://pbs.twimg.com/media/DFMWn56WsAAkA7B.jpg 1 French_bulldog 0.995026 True pug 0.000932 True bull_mastiff 0.000903 True
2055 888202515573088257 https://pbs.twimg.com/media/DFDw2tyUQAAAFke.jpg 2 Pembroke 0.809197 True Rhodesian_ridgeback 0.054950 True beagle 0.038915 True
2056 888554962724278272 https://pbs.twimg.com/media/DFTH_O-UQAACu20.jpg 3 Siberian_husky 0.700377 True Eskimo_dog 0.166511 True malamute 0.111411 True
2057 888804989199671297 https://pbs.twimg.com/media/DFWra-3VYAA2piG.jpg 1 golden_retriever 0.469760 True Labrador_retriever 0.184172 True English_setter 0.073482 True
2058 888917238123831296 https://pbs.twimg.com/media/DFYRgsOUQAARGhO.jpg 1 golden_retriever 0.714719 True Tibetan_mastiff 0.120184 True Labrador_retriever 0.105506 True
2059 889278841981685760 https://pbs.twimg.com/ext_tw_video_thumb/88927… 1 whippet 0.626152 True borzoi 0.194742 True Saluki 0.027351 True
2060 889531135344209921 https://pbs.twimg.com/media/DFg_2PVW0AEHN3p.jpg 1 golden_retriever 0.953442 True Labrador_retriever 0.013834 True redbone 0.007958 True
2061 889638837579907072 https://pbs.twimg.com/media/DFihzFfXsAYGDPR.jpg 1 French_bulldog 0.991650 True boxer 0.002129 True Staffordshire_bullterrier 0.001498 True
2062 889665388333682689 https://pbs.twimg.com/media/DFi579UWsAAatzw.jpg 1 Pembroke 0.966327 True Cardigan 0.027356 True basenji 0.004633 True
2063 889880896479866881 https://pbs.twimg.com/media/DFl99B1WsAITKsg.jpg 1 French_bulldog 0.377417 True Labrador_retriever 0.151317 True muzzle 0.082981 False
2064 890006608113172480 https://pbs.twimg.com/media/DFnwSY4WAAAMliS.jpg 1 Samoyed 0.957979 True Pomeranian 0.013884 True chow 0.008167 True
2065 890240255349198849 https://pbs.twimg.com/media/DFrEyVuW0AAO3t9.jpg 1 Pembroke 0.511319 True Cardigan 0.451038 True Chihuahua 0.029248 True
2066 890609185150312448 https://pbs.twimg.com/media/DFwUU__XcAEpyXI.jpg 1 Irish_terrier 0.487574 True Irish_setter 0.193054 True Chesapeake_Bay_retriever 0.118184 True
2067 890729181411237888 https://pbs.twimg.com/media/DFyBahAVwAAhUTd.jpg 2 Pomeranian 0.566142 True Eskimo_dog 0.178406 True Pembroke 0.076507 True
2068 890971913173991426 https://pbs.twimg.com/media/DF1eOmZXUAALUcq.jpg 1 Appenzeller 0.341703 True Border_collie 0.199287 True ice_lolly 0.193548 False
2069 891087950875897856 https://pbs.twimg.com/media/DF3HwyEWsAABqE6.jpg 1 Chesapeake_Bay_retriever 0.425595 True Irish_terrier 0.116317 True Indian_elephant 0.076902 False
2070 891327558926688256 https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg 2 basset 0.555712 True English_springer 0.225770 True German_short-haired_pointer 0.175219 True
2071 891689557279858688 https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg 1 paper_towel 0.170278 False Labrador_retriever 0.168086 True spatula 0.040836 False
2072 891815181378084864 https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg 1 Chihuahua 0.716012 True malamute 0.078253 True kelpie 0.031379 True
2073 892177421306343426 https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg 1 Chihuahua 0.323581 True Pekinese 0.090647 True papillon 0.068957 True
2074 892420643555336193 https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg 1 orange 0.097049 False bagel 0.085851 False banana 0.076110 False

2075 rows × 12 columns

In [80]:
tweets_df
Out[80]:
id favorite_count retweet_count
0 892420643555336193 36776 7840
1 892177421306343426 31672 5804
2 891815181378084864 23851 3844
3 891689557279858688 40107 8009
4 891327558926688256 38309 8650
5 891087950875897856 19274 2884
6 890971913173991426 11230 1898
7 890729181411237888 62019 17505
8 890609185150312448 26513 3975
9 890240255349198849 30358 6813
10 890006608113172480 29208 6797
11 889880896479866881 26516 4631
12 889665388333682689 45738 9291
13 889638837579907072 25766 4171
14 889531135344209921 14392 2093
15 889278841981685760 23969 4959
16 888917238123831296 27706 4186
17 888804989199671297 24333 3952
18 888554962724278272 18818 3246
19 888078434458587136 20720 3213
20 887705289381826560 28741 4986
21 887517139158093824 44078 10896
22 887473957103951883 65556 16723
23 887343217045368832 32041 9703
24 887101392804085760 29117 5523
25 886983233522544640 33325 7121
26 886736880519319552 11434 2995
27 886680336477933568 21372 4140
28 886366144734445568 20125 2943
29 886267009285017600 114 4
...
2303 666411507551481857 418 312
2304 666407126856765440 100 34
2305 666396247373291520 160 80
2306 666373753744588802 175 85
2307 666362758909284353 739 531
2308 666353288456101888 206 71
2309 666345417576210432 280 129
2310 666337882303524864 183 85
2311 666293911632134144 470 324
2312 666287406224695296 139 62
2313 666273097616637952 161 74
2314 666268910803644416 96 31
2315 666104133288665088 13640 6079
2316 666102155909144576 73 11
2317 666099513787052032 143 63
2318 666094000022159362 156 68
2319 666082916733198337 104 42
2320 666073100786774016 303 147
2321 666071193221509120 137 54
2322 666063827256086533 452 200
2323 666058600524156928 107 55
2324 666057090499244032 274 128
2325 666055525042405380 414 224
2326 666051853826850816 1151 794
2327 666050758794694657 125 55
2328 666049248165822465 99 41
2329 666044226329800704 279 134
2330 666033412701032449 117 43
2331 666029285002620928 121 43
2332 666020888022790149 2453 473

2333 rows × 3 columns

In [34]:
dogs_rates_archive.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2356 entries, 0 to 2355
Data columns (total 17 columns):
tweet_id                      2356 non-null int64
in_reply_to_status_id         78 non-null float64
in_reply_to_user_id           78 non-null float64
timestamp                     2356 non-null object
source                        2356 non-null object
text                          2356 non-null object
retweeted_status_id           181 non-null float64
retweeted_status_user_id      181 non-null float64
retweeted_status_timestamp    181 non-null object
expanded_urls                 2297 non-null object
rating_numerator              2356 non-null int64
rating_denominator            2356 non-null int64
name                          2356 non-null object
doggo                         2356 non-null object
floofer                       2356 non-null object
pupper                        2356 non-null object
puppo                         2356 non-null object
dtypes: float64(4), int64(3), object(10)
memory usage: 313.0+ KB
In [35]:
image_predictions.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2075 entries, 0 to 2074
Data columns (total 12 columns):
tweet_id    2075 non-null int64
jpg_url     2075 non-null object
img_num     2075 non-null int64
p1          2075 non-null object
p1_conf     2075 non-null float64
p1_dog      2075 non-null bool
p2          2075 non-null object
p2_conf     2075 non-null float64
p2_dog      2075 non-null bool
p3          2075 non-null object
p3_conf     2075 non-null float64
p3_dog      2075 non-null bool
dtypes: bool(3), float64(3), int64(2), object(4)
memory usage: 152.1+ KB
In [36]:
tweets_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2333 entries, 0 to 2332
Data columns (total 3 columns):
id                2333 non-null int64
favorite_count    2333 non-null int64
retweet_count     2333 non-null int64
dtypes: int64(3)
memory usage: 54.8 KB
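The info() output above shows tweet_id stored as int64 and timestamp stored as object. A minimal sketch of how these two types could be converted is given below. It uses a toy frame with two rows copied from the archive output, and the variable name toy is an assumption for illustration, not part of the original notebook.

```python
import pandas as pd

# Toy frame: two rows copied from the archive output shown in this post.
toy = pd.DataFrame({
    "tweet_id": [892420643555336193, 666020888022790149],
    "timestamp": ["2017-08-01 16:23:56 +0000", "2015-11-15 22:32:08 +0000"],
})

# IDs are identifiers, not quantities, so store them as strings.
toy["tweet_id"] = toy["tweet_id"].astype(str)

# Parse the timestamp strings into timezone-aware datetimes (UTC).
toy["timestamp"] = pd.to_datetime(toy["timestamp"], utc=True)

print(toy.dtypes)
```

Storing the ID as a string avoids accidental arithmetic on it and keeps it readable when it is exported.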
In [42]:
dogs_rates_archive.duplicated().sum()
Out[42]:
0
In [43]:
image_predictions.duplicated().sum()
Out[43]:
0
In [44]:
tweets_df.duplicated().sum()
Out[44]:
0
In [45]:
dogs_rates_archive.tweet_id.value_counts()
Out[45]:
749075273010798592    1
741099773336379392    1
798644042770751489    1
825120256414846976    1
769212283578875904    1
700462010979500032    1
780858289093574656    1
699775878809702401    1
880095782870896641    1
760521673607086080    1
776477788987613185    1
691820333922455552    1
715696743237730304    1
714606013974974464    1
760539183865880579    1
813157409116065792    1
676430933382295552    1
743510151680958465    1
837012587749474308    1
833722901757046785    1
818259473185828864    1
670704688707301377    1
667160273090932737    1
674394782723014656    1
672082170312290304    1
670093938074779648    1
759923798737051648    1
809920764300447744    1
805487436403003392    1
838085839343206401    1
                     ..
763956972077010945    1
870308999962521604    1
720775346191278080    1
785927819176054784    1
783347506784731136    1
775733305207554048    1
834209720923721728    1
825026590719483904    1
758405701903519748    1
668986018524233728    1
690938899477221376    1
667911425562669056    1
754482103782404096    1
713175907180089344    1
669015743032369152    1
672068090318987265    1
816829038950027264    1
683773439333797890    1
674291837063053312    1
837482249356513284    1
767500508068192258    1
773922284943896577    1
673342308415348736    1
886054160059072513    1
748307329658011649    1
715360349751484417    1
666817836334096384    1
794926597468000259    1
673705679337693185    1
700151421916807169    1
Name: tweet_id, Length: 2356, dtype: int64
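
Scanning a long value_counts() listing for a count greater than 1 is easy to get wrong by eye. A more compact check, sketched here on three IDs copied from the tweets_df output (the name ids is an assumption for illustration), is to ask pandas directly whether the column is unique.

```python
import pandas as pd

# Three IDs copied from the output above; the real column is dogs_rates_archive.tweet_id.
ids = pd.Series(
    [892420643555336193, 892177421306343426, 891815181378084864],
    name="tweet_id",
)

# True when no value repeats.
all_unique = ids.is_unique

# Same information via value_counts(): the largest count should be 1.
max_count = ids.value_counts().max()

print(all_unique, max_count)
```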

quality

dogs_rates_archive table
  • name column has None, a, an, ... values instead of NaN
  • there are columns we don’t need (in_reply_to_status_id, in_reply_to_user_id, retweeted_status_id, retweeted_status_user_id, retweeted_status_timestamp)
  • wrong data types (timestamp, tweet_id).
  • there are retweets and we only want original tweets
  • rating_numerator and rating_denominator have wrong ratings, like 960/0, 24/7, 9/11, 165/150

image_predictions table
  • the prediction 1, 2, 3 columns are abbreviated as p1, p2, p3
  • there are tweets with no images, as the image_predictions table has 2075 observations and the archive table has 2356 observations
  • tweet_id is int64, not object
  • each tweet has three breed predictions, but only one of them is the most confident.

tweets_df table
  • the id column should be renamed to tweet_id to be consistent

tidiness

  • dog stage is one variable but is spread across 4 columns
  • three datasets instead of one.
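
To make the first tidiness issue concrete, here is a minimal sketch of collapsing the four stage columns into a single one. The toy values, the column name dog_stage, and the helper combine_stages are assumptions for illustration, and the notebook's own cleaning step may do this differently.

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the four stage columns; the values are made up.
toy = pd.DataFrame({
    "tweet_id": ["1", "2", "3", "4"],
    "doggo":   ["doggo", "None", "None", "doggo"],
    "floofer": ["None", "None", "None", "None"],
    "pupper":  ["None", "pupper", "None", "pupper"],
    "puppo":   ["None", "None", "None", "None"],
})

stage_cols = ["doggo", "floofer", "pupper", "puppo"]

def combine_stages(row):
    """Join every stage present in the row; return NaN when there is none."""
    present = [row[col] for col in stage_cols if row[col] != "None"]
    return ",".join(present) if present else np.nan

# One variable, one column: build dog_stage, then drop the four originals.
toy["dog_stage"] = toy[stage_cols].apply(combine_stages, axis=1)
tidy = toy.drop(columns=stage_cols)

print(tidy)
```

Joining with a comma keeps rows that mention more than one stage instead of silently picking one of them.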

Clean

In [5]:
dogs_rates_archive_clean = dogs_rates_archive.copy()
image_predictions_clean = image_predictions.copy()
tweets_df_clean = tweets_df.copy()

dogs_rates_archive: name column has None, a, an, ... values instead of NaN

Define

Replace names that are entirely uppercase or entirely lowercase, as well as the string 'None', with NaN

Code
In [6]:
# Names written entirely in uppercase or entirely in lowercase (a, an, ...) are treated as invalid
mask1 = dogs_rates_archive_clean.name.str.isupper()
mask2 = dogs_rates_archive_clean.name.str.islower()
column_name = 'name'
dogs_rates_archive_clean.loc[(mask1 | mask2), column_name] = np.nan
In [7]:
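# Assign the result back instead of calling replace(inplace=True) on a selected column
dogs_rates_archive_clean['name'] = dogs_rates_archive_clean['name'].replace('None', np.nan)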
Test
In [9]:
dogs_rates_archive_clean
Out[9]:
tweet_id in_reply_to_status_id in_reply_to_user_id timestamp source text retweeted_status_id retweeted_status_user_id retweeted_status_timestamp expanded_urls rating_numerator rating_denominator name doggo floofer pupper puppo
0 892420643555336193 NaN NaN 2017-08-01 16:23:56 +0000 <a href=”http://twitter.com/download/iphone” r… This is Phineas. He’s a mystical boy. Only eve… NaN NaN NaN https://twitter.com/dog_rates/status/892420643… 13 10 Phineas None None None None
1 892177421306343426 NaN NaN 2017-08-01 00:17:27 +0000 <a href=”http://twitter.com/download/iphone” r… This is Tilly. She’s just checking pup on you…. NaN NaN NaN https://twitter.com/dog_rates/status/892177421… 13 10 Tilly None None None None
2 891815181378084864 NaN NaN 2017-07-31 00:18:03 +0000 <a href=”http://twitter.com/download/iphone” r… This is Archie. He is a rare Norwegian Pouncin… NaN NaN NaN https://twitter.com/dog_rates/status/891815181… 12 10 Archie None None None None
3 891689557279858688 NaN NaN 2017-07-30 15:58:51 +0000 <a href=”http://twitter.com/download/iphone” r… This is Darla. She commenced a snooze mid meal… NaN NaN NaN https://twitter.com/dog_rates/status/891689557… 13 10 Darla None None None None
4 891327558926688256 NaN NaN 2017-07-29 16:00:24 +0000 <a href=”http://twitter.com/download/iphone” r… This is Franklin. He would like you to stop ca… NaN NaN NaN https://twitter.com/dog_rates/status/891327558… 12 10 Franklin None None None None
5 891087950875897856 NaN NaN 2017-07-29 00:08:17 +0000 <a href=”http://twitter.com/download/iphone” r… Here we have a majestic great white breaching … NaN NaN NaN https://twitter.com/dog_rates/status/891087950… 13 10 NaN None None None None
6 890971913173991426 NaN NaN 2017-07-28 16:27:12 +0000 <a href=”http://twitter.com/download/iphone” r… Meet Jax. He enjoys ice cream so much he gets … NaN NaN NaN https://gofundme.com/ydvmve-surgery-for-jax,ht… 13 10 Jax None None None None
7 890729181411237888 NaN NaN 2017-07-28 00:22:40 +0000 <a href=”http://twitter.com/download/iphone” r… When you watch your owner call another dog a g… NaN NaN NaN https://twitter.com/dog_rates/status/890729181… 13 10 NaN None None None None
8 890609185150312448 NaN NaN 2017-07-27 16:25:51 +0000 <a href=”http://twitter.com/download/iphone” r… This is Zoey. She doesn’t want to be one of th… NaN NaN NaN https://twitter.com/dog_rates/status/890609185… 13 10 Zoey None None None None
9 890240255349198849 NaN NaN 2017-07-26 15:59:51 +0000 <a href=”http://twitter.com/download/iphone” r… This is Cassie. She is a college pup. Studying… NaN NaN NaN https://twitter.com/dog_rates/status/890240255… 14 10 Cassie doggo None None None
10 890006608113172480 NaN NaN 2017-07-26 00:31:25 +0000 <a href=”http://twitter.com/download/iphone” r… This is Koda. He is a South Australian decksha… NaN NaN NaN https://twitter.com/dog_rates/status/890006608… 13 10 Koda None None None None
11 889880896479866881 NaN NaN 2017-07-25 16:11:53 +0000 <a href=”http://twitter.com/download/iphone” r… This is Bruno. He is a service shark. Only get… NaN NaN NaN https://twitter.com/dog_rates/status/889880896… 13 10 Bruno None None None None
12 889665388333682689 NaN NaN 2017-07-25 01:55:32 +0000 <a href=”http://twitter.com/download/iphone” r… Here’s a puppo that seems to be on the fence a… NaN NaN NaN https://twitter.com/dog_rates/status/889665388… 13 10 NaN None None None puppo
13 889638837579907072 NaN NaN 2017-07-25 00:10:02 +0000 <a href=”http://twitter.com/download/iphone” r… This is Ted. He does his best. Sometimes that’… NaN NaN NaN https://twitter.com/dog_rates/status/889638837… 12 10 Ted None None None None
14 889531135344209921 NaN NaN 2017-07-24 17:02:04 +0000 <a href=”http://twitter.com/download/iphone” r… This is Stuart. He’s sporting his favorite fan… NaN NaN NaN https://twitter.com/dog_rates/status/889531135… 13 10 Stuart None None None puppo
15 889278841981685760 NaN NaN 2017-07-24 00:19:32 +0000 <a href=”http://twitter.com/download/iphone” r… This is Oliver. You’re witnessing one of his m… NaN NaN NaN https://twitter.com/dog_rates/status/889278841… 13 10 Oliver None None None None
16 888917238123831296 NaN NaN 2017-07-23 00:22:39 +0000 <a href=”http://twitter.com/download/iphone” r… This is Jim. He found a fren. Taught him how t… NaN NaN NaN https://twitter.com/dog_rates/status/888917238… 12 10 Jim None None None None
17 888804989199671297 NaN NaN 2017-07-22 16:56:37 +0000 <a href=”http://twitter.com/download/iphone” r… This is Zeke. He has a new stick. Very proud o… NaN NaN NaN https://twitter.com/dog_rates/status/888804989… 13 10 Zeke None None None None
18 888554962724278272 NaN NaN 2017-07-22 00:23:06 +0000 <a href=”http://twitter.com/download/iphone” r… This is Ralphus. He’s powering up. Attempting … NaN NaN NaN https://twitter.com/dog_rates/status/888554962… 13 10 Ralphus None None None None
19 888202515573088257 NaN NaN 2017-07-21 01:02:36 +0000 <a href=”http://twitter.com/download/iphone” r… RT @dog_rates: This is Canela. She attempted s… 8.874740e+17 4.196984e+09 2017-07-19 00:47:34 +0000 https://twitter.com/dog_rates/status/887473957… 13 10 Canela None None None None
20 888078434458587136 NaN NaN 2017-07-20 16:49:33 +0000 <a href=”http://twitter.com/download/iphone” r… This is Gerald. He was just told he didn’t get… NaN NaN NaN https://twitter.com/dog_rates/status/888078434… 12 10 Gerald None None None None
21 887705289381826560 NaN NaN 2017-07-19 16:06:48 +0000 <a href=”http://twitter.com/download/iphone” r… This is Jeffrey. He has a monopoly on the pool… NaN NaN NaN https://twitter.com/dog_rates/status/887705289… 13 10 Jeffrey None None None None
22 887517139158093824 NaN NaN 2017-07-19 03:39:09 +0000 <a href=”http://twitter.com/download/iphone” r… I’ve yet to rate a Venezuelan Hover Wiener. Th… NaN NaN NaN https://twitter.com/dog_rates/status/887517139… 14 10 NaN None None None None
23 887473957103951883 NaN NaN 2017-07-19 00:47:34 +0000 <a href=”http://twitter.com/download/iphone” r… This is Canela. She attempted some fancy porch… NaN NaN NaN https://twitter.com/dog_rates/status/887473957… 13 10 Canela None None None None
24 887343217045368832 NaN NaN 2017-07-18 16:08:03 +0000 <a href=”http://twitter.com/download/iphone” r… You may not have known you needed to see this … NaN NaN NaN https://twitter.com/dog_rates/status/887343217… 13 10 NaN None None None None
25 887101392804085760 NaN NaN 2017-07-18 00:07:08 +0000 <a href=”http://twitter.com/download/iphone” r… This… is a Jubilant Antarctic House Bear. We… NaN NaN NaN https://twitter.com/dog_rates/status/887101392… 12 10 NaN None None None None
26 886983233522544640 NaN NaN 2017-07-17 16:17:36 +0000 <a href=”http://twitter.com/download/iphone” r… This is Maya. She’s very shy. Rarely leaves he… NaN NaN NaN https://twitter.com/dog_rates/status/886983233… 13 10 Maya None None None None
27 886736880519319552 NaN NaN 2017-07-16 23:58:41 +0000 <a href=”http://twitter.com/download/iphone” r… This is Mingus. He’s a wonderful father to his… NaN NaN NaN https://www.gofundme.com/mingusneedsus,https:/… 13 10 Mingus None None None None
28 886680336477933568 NaN NaN 2017-07-16 20:14:00 +0000 <a href=”http://twitter.com/download/iphone” r… This is Derek. He’s late for a dog meeting. 13… NaN NaN NaN https://twitter.com/dog_rates/status/886680336… 13 10 Derek None None None None
29 886366144734445568 NaN NaN 2017-07-15 23:25:31 +0000 <a href=”http://twitter.com/download/iphone” r… This is Roscoe. Another pupper fallen victim t… NaN NaN NaN https://twitter.com/dog_rates/status/886366144… 12 10 Roscoe None None pupper None
2326 666411507551481857 NaN NaN 2015-11-17 00:24:19 +0000 <a href=”http://twitter.com/download/iphone” r… This is quite the dog. Gets really excited whe… NaN NaN NaN https://twitter.com/dog_rates/status/666411507… 2 10 NaN None None None None
2327 666407126856765440 NaN NaN 2015-11-17 00:06:54 +0000 <a href=”http://twitter.com/download/iphone” r… This is a southern Vesuvius bumblegruff. Can d… NaN NaN NaN https://twitter.com/dog_rates/status/666407126… 7 10 NaN None None None None
2328 666396247373291520 NaN NaN 2015-11-16 23:23:41 +0000 <a href=”http://twitter.com/download/iphone” r… Oh goodness. A super rare northeast Qdoba kang… NaN NaN NaN https://twitter.com/dog_rates/status/666396247… 9 10 NaN None None None None
2329 666373753744588802 NaN NaN 2015-11-16 21:54:18 +0000 <a href=”http://twitter.com/download/iphone” r… Those are sunglasses and a jean jacket. 11/10 … NaN NaN NaN https://twitter.com/dog_rates/status/666373753… 11 10 NaN None None None None
2330 666362758909284353 NaN NaN 2015-11-16 21:10:36 +0000 <a href=”http://twitter.com/download/iphone” r… Unique dog here. Very small. Lives in containe… NaN NaN NaN https://twitter.com/dog_rates/status/666362758… 6 10 NaN None None None None
2331 666353288456101888 NaN NaN 2015-11-16 20:32:58 +0000 <a href=”http://twitter.com/download/iphone” r… Here we have a mixed Asiago from the Galápagos… NaN NaN NaN https://twitter.com/dog_rates/status/666353288… 8 10 NaN None None None None
2332 666345417576210432 NaN NaN 2015-11-16 20:01:42 +0000 <a href=”http://twitter.com/download/iphone” r… Look at this jokester thinking seat belt laws … NaN NaN NaN https://twitter.com/dog_rates/status/666345417… 10 10 NaN None None None None
2333 666337882303524864 NaN NaN 2015-11-16 19:31:45 +0000 <a href=”http://twitter.com/download/iphone” r… This is an extremely rare horned Parthenon. No… NaN NaN NaN https://twitter.com/dog_rates/status/666337882… 9 10 NaN None None None None
2334 666293911632134144 NaN NaN 2015-11-16 16:37:02 +0000 <a href=”http://twitter.com/download/iphone” r… This is a funny dog. Weird toes. Won’t come do… NaN NaN NaN https://twitter.com/dog_rates/status/666293911… 3 10 NaN None None None None
2335 666287406224695296 NaN NaN 2015-11-16 16:11:11 +0000 <a href=”http://twitter.com/download/iphone” r… This is an Albanian 3 1/2 legged Episcopalian… NaN NaN NaN https://twitter.com/dog_rates/status/666287406… 1 2 NaN None None None None
2336 666273097616637952 NaN NaN 2015-11-16 15:14:19 +0000 <a href=”http://twitter.com/download/iphone” r… Can take selfies 11/10 https://t.co/ws2AMaNwPW NaN NaN NaN https://twitter.com/dog_rates/status/666273097… 11 10 NaN None None None None
2337 666268910803644416 NaN NaN 2015-11-16 14:57:41 +0000 <a href=”http://twitter.com/download/iphone” r… Very concerned about fellow dog trapped in com… NaN NaN NaN https://twitter.com/dog_rates/status/666268910… 10 10 NaN None None None None
2338 666104133288665088 NaN NaN 2015-11-16 04:02:55 +0000 <a href=”http://twitter.com/download/iphone” r… Not familiar with this breed. No tail (weird)…. NaN NaN NaN https://twitter.com/dog_rates/status/666104133… 1 10 NaN None None None None
2339 666102155909144576 NaN NaN 2015-11-16 03:55:04 +0000 <a href=”http://twitter.com/download/iphone” r… Oh my. Here you are seeing an Adobe Setter giv… NaN NaN NaN https://twitter.com/dog_rates/status/666102155… 11 10 NaN None None None None
2340 666099513787052032 NaN NaN 2015-11-16 03:44:34 +0000 <a href=”http://twitter.com/download/iphone” r… Can stand on stump for what seems like a while… NaN NaN NaN https://twitter.com/dog_rates/status/666099513… 8 10 NaN None None None None
2341 666094000022159362 NaN NaN 2015-11-16 03:22:39 +0000 <a href=”http://twitter.com/download/iphone” r… This appears to be a Mongolian Presbyterian mi… NaN NaN NaN https://twitter.com/dog_rates/status/666094000… 9 10 NaN None None None None
2342 666082916733198337 NaN NaN 2015-11-16 02:38:37 +0000 <a href=”http://twitter.com/download/iphone” r… Here we have a well-established sunblockerspan… NaN NaN NaN https://twitter.com/dog_rates/status/666082916… 6 10 NaN None None None None
2343 666073100786774016 NaN NaN 2015-11-16 01:59:36 +0000 <a href=”http://twitter.com/download/iphone” r… Let’s hope this flight isn’t Malaysian (lol). … NaN NaN NaN https://twitter.com/dog_rates/status/666073100… 10 10 NaN None None None None
2344 666071193221509120 NaN NaN 2015-11-16 01:52:02 +0000 <a href=”http://twitter.com/download/iphone” r… Here we have a northern speckled Rhododendron…. NaN NaN NaN https://twitter.com/dog_rates/status/666071193… 9 10 NaN None None None None
2345 666063827256086533 NaN NaN 2015-11-16 01:22:45 +0000 <a href=”http://twitter.com/download/iphone” r… This is the happiest dog you will ever see. Ve… NaN NaN NaN https://twitter.com/dog_rates/status/666063827… 10 10 NaN None None None None
2346 666058600524156928 NaN NaN 2015-11-16 01:01:59 +0000 <a href=”http://twitter.com/download/iphone” r… Here is the Rand Paul of retrievers folks! He’… NaN NaN NaN https://twitter.com/dog_rates/status/666058600… 8 10 NaN None None None None
2347 666057090499244032 NaN NaN 2015-11-16 00:55:59 +0000 <a href=”http://twitter.com/download/iphone” r… My oh my. This is a rare blond Canadian terrie… NaN NaN NaN https://twitter.com/dog_rates/status/666057090… 9 10 NaN None None None None
2348 666055525042405380 NaN NaN 2015-11-16 00:49:46 +0000 <a href=”http://twitter.com/download/iphone” r… Here is a Siberian heavily armored polar bear … NaN NaN NaN https://twitter.com/dog_rates/status/666055525… 10 10 NaN None None None None
2349 666051853826850816 NaN NaN 2015-11-16 00:35:11 +0000 <a href=”http://twitter.com/download/iphone” r… This is an odd dog. Hard on the outside but lo… NaN NaN NaN https://twitter.com/dog_rates/status/666051853… 2 10 NaN None None None None
2350 666050758794694657 NaN NaN 2015-11-16 00:30:50 +0000 <a href=”http://twitter.com/download/iphone” r… This is a truly beautiful English Wilson Staff… NaN NaN NaN https://twitter.com/dog_rates/status/666050758… 10 10 NaN None None None None
2351 666049248165822465 NaN NaN 2015-11-16 00:24:50 +0000 <a href=”http://twitter.com/download/iphone” r… Here we have a 1949 1st generation vulpix. Enj… NaN NaN NaN https://twitter.com/dog_rates/status/666049248… 5 10 NaN None None None None
2352 666044226329800704 NaN NaN 2015-11-16 00:04:52 +0000 <a href=”http://twitter.com/download/iphone” r… This is a purebred Piers Morgan. Loves to Netf… NaN NaN NaN https://twitter.com/dog_rates/status/666044226… 6 10 NaN None None None None
2353 666033412701032449 NaN NaN 2015-11-15 23:21:54 +0000 <a href=”http://twitter.com/download/iphone” r… Here is a very happy pup. Big fan of well-main… NaN NaN NaN https://twitter.com/dog_rates/status/666033412… 9 10 NaN None None None None
2354 666029285002620928 NaN NaN 2015-11-15 23:05:30 +0000 <a href=”http://twitter.com/download/iphone” r… This is a western brown Mitsubishi terrier. Up… NaN NaN NaN https://twitter.com/dog_rates/status/666029285… 7 10 NaN None None None None
2355 666020888022790149 NaN NaN 2015-11-15 22:32:08 +0000 <a href=”http://twitter.com/download/iphone” r… Here we have a Japanese Irish Setter. Lost eye… NaN NaN NaN https://twitter.com/dog_rates/status/666020888… 8 10 NaN None None None None

2356 rows × 17 columns
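
Displaying the whole frame works as a visual test, but a targeted check is easier to read. The sketch below repeats the same masking on a toy column and then checks that only proper names remain. The toy values (including the made-up all-uppercase "JD") and the names toy, not_a_name and remaining are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy names: two real ones from the output above plus made-up invalid values.
toy = pd.DataFrame({"name": ["Phineas", "a", "an", "None", "Tilly", "JD"]})

names = toy["name"]

# Invalid: entirely lowercase, entirely uppercase, or the literal string "None".
not_a_name = names.str.islower() | names.str.isupper() | (names == "None")
toy.loc[not_a_name, "name"] = np.nan

# Test: what is left should contain no lowercase words and no "None" strings.
remaining = toy["name"].dropna()
print(remaining.tolist())
print(toy["name"].isna().sum())
```

Note that the uppercase mask also removes short all-capital values such as "JD", which may or may not be desirable depending on the data.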

Define

rating_numerator and rating_denominator have wrong ratings, like 960/0, 24/7, 9/11, 165/150. Drop rows where the denominator is not 10 or the numerator is greater than 15.

Code
In [10]:
# drop rows where the denominator is not 10
dogs_rates_archive_clean.drop(dogs_rates_archive_clean[dogs_rates_archive_clean.rating_denominator != 10].index, inplace = True)
In [11]:
# drop rows where the numerator is greater than 15
dogs_rates_archive_clean.drop(dogs_rates_archive_clean[dogs_rates_archive_clean.rating_numerator > 15].index, inplace = True)
Test
In [12]:
dogs_rates_archive_clean.rating_numerator
Out[12]:
0       13
1       13
2       12
3       13
4       12
5       13
6       13
7       13
8       13
9       14
10      13
11      13
12      13
13      12
14      13
15      13
16      12
17      13
18      13
19      13
20      12
21      13
22      14
23      13
24      13
25      12
26      13
27      13
28      13
29      12
        ..
2325    10
2326     2
2327     7
2328     9
2329    11
2330     6
2331     8
2332    10
2333     9
2334     3
2336    11
2337    10
2338     1
2339    11
2340     8
2341     9
2342     6
2343    10
2344     9
2345    10
2346     8
2347     9
2348    10
2349     2
2350    10
2351     5
2352     6
2353     9
2354     7
2355     8
Name: rating_numerator, Length: 2323, dtype: int64
In [13]:
dogs_rates_archive_clean.rating_denominator
Out[13]:
0       10
1       10
2       10
3       10
4       10
5       10
6       10
7       10
8       10
9       10
10      10
11      10
12      10
13      10
14      10
15      10
16      10
17      10
18      10
19      10
20      10
21      10
22      10
23      10
24      10
25      10
26      10
27      10
28      10
29      10
        ..
2325    10
2326    10
2327    10
2328    10
2329    10
2330    10
2331    10
2332    10
2333    10
2334    10
2336    10
2337    10
2338    10
2339    10
2340    10
2341    10
2342    10
2343    10
2344    10
2345    10
2346    10
2347    10
2348    10
2349    10
2350    10
2351    10
2352    10
2353    10
2354    10
2355    10
Name: rating_denominator, Length: 2323, dtype: int64
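
The two drop calls can also be written as a single boolean filter, which some readers find easier to follow. This is a sketch on a toy frame built from the ratings quoted in the assessment plus two ordinary ones. The names toy, valid and kept are assumptions for illustration.

```python
import pandas as pd

# Ratings quoted in the assessment (960/0, 24/7, 9/11, 165/150) plus two ordinary ones.
toy = pd.DataFrame({
    "rating_numerator":   [13, 960, 24, 9, 165, 12],
    "rating_denominator": [10, 0, 7, 11, 150, 10],
})

# Keep rows whose denominator is exactly 10 and whose numerator is at most 15.
valid = (toy["rating_denominator"] == 10) & (toy["rating_numerator"] <= 15)
kept = toy[valid]

# Test: summarise the result instead of printing the whole column.
print(kept["rating_denominator"].unique())
print(kept["rating_numerator"].max())
```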
Define

Delete retweets: keep only rows where retweeted_status_id is null

Code
In [14]:
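# .copy() makes the result an independent frame, so later in-place edits do not act on a slice
dogs_rates_archive_clean = dogs_rates_archive_clean[dogs_rates_archive_clean.retweeted_status_id.isnull()].copy()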
Test
In [15]:
dogs_rates_archive_clean.retweeted_status_id.value_counts(dropna=False)
Out[15]:
NaN    2144
Name: retweeted_status_id, dtype: int64
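
The retweet filter can be illustrated on two rows taken from the archive output above: row 19 is a retweet of row 23. This is a minimal sketch, and the names toy and originals are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Row 19 (a retweet) and row 23 (the original tweet) from the archive output.
toy = pd.DataFrame({
    "tweet_id": ["888202515573088257", "887473957103951883"],
    "retweeted_status_id": [8.874740e+17, np.nan],
})

# An original tweet has no retweeted_status_id, so keep only the null rows.
originals = toy[toy["retweeted_status_id"].isna()].copy()

# Once the filter is applied the column carries no information and can be dropped.
originals = originals.drop(columns=["retweeted_status_id"])

print(originals)
```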
Define

Drop the columns we don’t need (in_reply_to_status_id, in_reply_to_user_id, retweeted_status_id, retweeted_status_user_id, retweeted_status_timestamp)

Code
In [16]:
dogs_rates_archive_clean.drop(columns=['in_reply_to_status_id', 'in_reply_to_user_id', 'retweeted_status_id', 'retweeted_status_user_id', 'retweeted_status_timestamp'], inplace = True)
Test
In [17]:
dogs_rates_archive_clean
Out[17]:
tweet_id timestamp source text expanded_urls rating_numerator rating_denominator name doggo floofer pupper puppo
0 892420643555336193 2017-08-01 16:23:56 +0000 <a href=”http://twitter.com/download/iphone” r… This is Phineas. He’s a mystical boy. Only eve… https://twitter.com/dog_rates/status/892420643… 13 10 Phineas None None None None
1 892177421306343426 2017-08-01 00:17:27 +0000 <a href=”http://twitter.com/download/iphone” r… This is Tilly. She’s just checking pup on you…. https://twitter.com/dog_rates/status/892177421… 13 10 Tilly None None None None
2 891815181378084864 2017-07-31 00:18:03 +0000 <a href=”http://twitter.com/download/iphone” r… This is Archie. He is a rare Norwegian Pouncin… https://twitter.com/dog_rates/status/891815181… 12 10 Archie None None None None
3 891689557279858688 2017-07-30 15:58:51 +0000 <a href=”http://twitter.com/download/iphone” r… This is Darla. She commenced a snooze mid meal… https://twitter.com/dog_rates/status/891689557… 13 10 Darla None None None None
4 891327558926688256 2017-07-29 16:00:24 +0000 <a href=”http://twitter.com/download/iphone” r… This is Franklin. He would like you to stop ca… https://twitter.com/dog_rates/status/891327558… 12 10 Franklin None None None None
5 891087950875897856 2017-07-29 00:08:17 +0000 <a href=”http://twitter.com/download/iphone” r… Here we have a majestic great white breaching … https://twitter.com/dog_rates/status/891087950… 13 10 NaN None None None None
6 890971913173991426 2017-07-28 16:27:12 +0000 <a href=”http://twitter.com/download/iphone” r… Meet Jax. He enjoys ice cream so much he gets … https://gofundme.com/ydvmve-surgery-for-jax,ht… 13 10 Jax None None None None
7 890729181411237888 2017-07-28 00:22:40 +0000 <a href=”http://twitter.com/download/iphone” r… When you watch your owner call another dog a g… https://twitter.com/dog_rates/status/890729181… 13 10 NaN None None None None
8 890609185150312448 2017-07-27 16:25:51 +0000 <a href=”http://twitter.com/download/iphone” r… This is Zoey. She doesn’t want to be one of th… https://twitter.com/dog_rates/status/890609185… 13 10 Zoey None None None None
9 890240255349198849 2017-07-26 15:59:51 +0000 <a href=”http://twitter.com/download/iphone” r… This is Cassie. She is a college pup. Studying… https://twitter.com/dog_rates/status/890240255… 14 10 Cassie doggo None None None
10 890006608113172480 2017-07-26 00:31:25 +0000 <a href=”http://twitter.com/download/iphone” r… This is Koda. He is a South Australian decksha… https://twitter.com/dog_rates/status/890006608… 13 10 Koda None None None None
11 889880896479866881 2017-07-25 16:11:53 +0000 <a href=”http://twitter.com/download/iphone” r… This is Bruno. He is a service shark. Only get… https://twitter.com/dog_rates/status/889880896… 13 10 Bruno None None None None
12 889665388333682689 2017-07-25 01:55:32 +0000 <a href=”http://twitter.com/download/iphone” r… Here’s a puppo that seems to be on the fence a… https://twitter.com/dog_rates/status/889665388… 13 10 NaN None None None puppo
13 889638837579907072 2017-07-25 00:10:02 +0000 <a href=”http://twitter.com/download/iphone” r… This is Ted. He does his best. Sometimes that’… https://twitter.com/dog_rates/status/889638837… 12 10 Ted None None None None
14 889531135344209921 2017-07-24 17:02:04 +0000 <a href=”http://twitter.com/download/iphone” r… This is Stuart. He’s sporting his favorite fan… https://twitter.com/dog_rates/status/889531135… 13 10 Stuart None None None puppo
15 889278841981685760 2017-07-24 00:19:32 +0000 <a href=”http://twitter.com/download/iphone” r… This is Oliver. You’re witnessing one of his m… https://twitter.com/dog_rates/status/889278841… 13 10 Oliver None None None None
16 888917238123831296 2017-07-23 00:22:39 +0000 <a href=”http://twitter.com/download/iphone” r… This is Jim. He found a fren. Taught him how t… https://twitter.com/dog_rates/status/888917238… 12 10 Jim None None None None
17 888804989199671297 2017-07-22 16:56:37 +0000 <a href=”http://twitter.com/download/iphone” r… This is Zeke. He has a new stick. Very proud o… https://twitter.com/dog_rates/status/888804989… 13 10 Zeke None None None None
18 888554962724278272 2017-07-22 00:23:06 +0000 <a href=”http://twitter.com/download/iphone” r… This is Ralphus. He’s powering up. Attempting … https://twitter.com/dog_rates/status/888554962… 13 10 Ralphus None None None None
20 888078434458587136 2017-07-20 16:49:33 +0000 <a href=”http://twitter.com/download/iphone” r… This is Gerald. He was just told he didn’t get… https://twitter.com/dog_rates/status/888078434… 12 10 Gerald None None None None
21 887705289381826560 2017-07-19 16:06:48 +0000 <a href=”http://twitter.com/download/iphone” r… This is Jeffrey. He has a monopoly on the pool… https://twitter.com/dog_rates/status/887705289… 13 10 Jeffrey None None None None
22 887517139158093824 2017-07-19 03:39:09 +0000 <a href=”http://twitter.com/download/iphone” r… I’ve yet to rate a Venezuelan Hover Wiener. Th… https://twitter.com/dog_rates/status/887517139… 14 10 NaN None None None None
23 887473957103951883 2017-07-19 00:47:34 +0000 <a href=”http://twitter.com/download/iphone” r… This is Canela. She attempted some fancy porch… https://twitter.com/dog_rates/status/887473957… 13 10 Canela None None None None
24 887343217045368832 2017-07-18 16:08:03 +0000 <a href=”http://twitter.com/download/iphone” r… You may not have known you needed to see this … https://twitter.com/dog_rates/status/887343217… 13 10 NaN None None None None
25 887101392804085760 2017-07-18 00:07:08 +0000 <a href=”http://twitter.com/download/iphone” r… This… is a Jubilant Antarctic House Bear. We… https://twitter.com/dog_rates/status/887101392… 12 10 NaN None None None None
26 886983233522544640 2017-07-17 16:17:36 +0000 <a href=”http://twitter.com/download/iphone” r… This is Maya. She’s very shy. Rarely leaves he… https://twitter.com/dog_rates/status/886983233… 13 10 Maya None None None None
27 886736880519319552 2017-07-16 23:58:41 +0000 <a href=”http://twitter.com/download/iphone” r… This is Mingus. He’s a wonderful father to his… https://www.gofundme.com/mingusneedsus,https:/… 13 10 Mingus None None None None
28 886680336477933568 2017-07-16 20:14:00 +0000 <a href=”http://twitter.com/download/iphone” r… This is Derek. He’s late for a dog meeting. 13… https://twitter.com/dog_rates/status/886680336… 13 10 Derek None None None None
29 886366144734445568 2017-07-15 23:25:31 +0000 <a href=”http://twitter.com/download/iphone” r… This is Roscoe. Another pupper fallen victim t… https://twitter.com/dog_rates/status/886366144… 12 10 Roscoe None None pupper None
30 886267009285017600 2017-07-15 16:51:35 +0000 <a href=”http://twitter.com/download/iphone” r… @NonWhiteHat @MayhewMayhem omg hello tanner yo… NaN 12 10 NaN None None None None
2325 666418789513326592 2015-11-17 00:53:15 +0000 <a href=”http://twitter.com/download/iphone” r… This is Walter. He is an Alaskan Terrapin. Lov… https://twitter.com/dog_rates/status/666418789… 10 10 Walter None None None None
2326 666411507551481857 2015-11-17 00:24:19 +0000 <a href=”http://twitter.com/download/iphone” r… This is quite the dog. Gets really excited whe… https://twitter.com/dog_rates/status/666411507… 2 10 NaN None None None None
2327 666407126856765440 2015-11-17 00:06:54 +0000 <a href=”http://twitter.com/download/iphone” r… This is a southern Vesuvius bumblegruff. Can d… https://twitter.com/dog_rates/status/666407126… 7 10 NaN None None None None
2328 666396247373291520 2015-11-16 23:23:41 +0000 <a href=”http://twitter.com/download/iphone” r… Oh goodness. A super rare northeast Qdoba kang… https://twitter.com/dog_rates/status/666396247… 9 10 NaN None None None None
2329 666373753744588802 2015-11-16 21:54:18 +0000 <a href=”http://twitter.com/download/iphone” r… Those are sunglasses and a jean jacket. 11/10 … https://twitter.com/dog_rates/status/666373753… 11 10 NaN None None None None
2330 666362758909284353 2015-11-16 21:10:36 +0000 <a href=”http://twitter.com/download/iphone” r… Unique dog here. Very small. Lives in containe… https://twitter.com/dog_rates/status/666362758… 6 10 NaN None None None None
2331 666353288456101888 2015-11-16 20:32:58 +0000 <a href=”http://twitter.com/download/iphone” r… Here we have a mixed Asiago from the Galápagos… https://twitter.com/dog_rates/status/666353288… 8 10 NaN None None None None
2332 666345417576210432 2015-11-16 20:01:42 +0000 <a href=”http://twitter.com/download/iphone” r… Look at this jokester thinking seat belt laws … https://twitter.com/dog_rates/status/666345417… 10 10 NaN None None None None
2333 666337882303524864 2015-11-16 19:31:45 +0000 <a href=”http://twitter.com/download/iphone” r… This is an extremely rare horned Parthenon. No… https://twitter.com/dog_rates/status/666337882… 9 10 NaN None None None None
2334 666293911632134144 2015-11-16 16:37:02 +0000 <a href=”http://twitter.com/download/iphone” r… This is a funny dog. Weird toes. Won’t come do… https://twitter.com/dog_rates/status/666293911… 3 10 NaN None None None None
2336 666273097616637952 2015-11-16 15:14:19 +0000 <a href=”http://twitter.com/download/iphone” r… Can take selfies 11/10 https://t.co/ws2AMaNwPW https://twitter.com/dog_rates/status/666273097… 11 10 NaN None None None None
2337 666268910803644416 2015-11-16 14:57:41 +0000 <a href=”http://twitter.com/download/iphone” r… Very concerned about fellow dog trapped in com… https://twitter.com/dog_rates/status/666268910… 10 10 NaN None None None None
2338 666104133288665088 2015-11-16 04:02:55 +0000 <a href=”http://twitter.com/download/iphone” r… Not familiar with this breed. No tail (weird)…. https://twitter.com/dog_rates/status/666104133… 1 10 NaN None None None None
2339 666102155909144576 2015-11-16 03:55:04 +0000 <a href=”http://twitter.com/download/iphone” r… Oh my. Here you are seeing an Adobe Setter giv… https://twitter.com/dog_rates/status/666102155… 11 10 NaN None None None None
2340 666099513787052032 2015-11-16 03:44:34 +0000 <a href=”http://twitter.com/download/iphone” r… Can stand on stump for what seems like a while… https://twitter.com/dog_rates/status/666099513… 8 10 NaN None None None None
2341 666094000022159362 2015-11-16 03:22:39 +0000 <a href=”http://twitter.com/download/iphone” r… This appears to be a Mongolian Presbyterian mi… https://twitter.com/dog_rates/status/666094000… 9 10 NaN None None None None
2342 666082916733198337 2015-11-16 02:38:37 +0000 <a href=”http://twitter.com/download/iphone” r… Here we have a well-established sunblockerspan… https://twitter.com/dog_rates/status/666082916… 6 10 NaN None None None None
2343 666073100786774016 2015-11-16 01:59:36 +0000 <a href=”http://twitter.com/download/iphone” r… Let’s hope this flight isn’t Malaysian (lol). … https://twitter.com/dog_rates/status/666073100… 10 10 NaN None None None None
2344 666071193221509120 2015-11-16 01:52:02 +0000 <a href=”http://twitter.com/download/iphone” r… Here we have a northern speckled Rhododendron…. https://twitter.com/dog_rates/status/666071193… 9 10 NaN None None None None
2345 666063827256086533 2015-11-16 01:22:45 +0000 <a href=”http://twitter.com/download/iphone” r… This is the happiest dog you will ever see. Ve… https://twitter.com/dog_rates/status/666063827… 10 10 NaN None None None None
2346 666058600524156928 2015-11-16 01:01:59 +0000 <a href=”http://twitter.com/download/iphone” r… Here is the Rand Paul of retrievers folks! He’… https://twitter.com/dog_rates/status/666058600… 8 10 NaN None None None None
2347 666057090499244032 2015-11-16 00:55:59 +0000 <a href=”http://twitter.com/download/iphone” r… My oh my. This is a rare blond Canadian terrie… https://twitter.com/dog_rates/status/666057090… 9 10 NaN None None None None
2348 666055525042405380 2015-11-16 00:49:46 +0000 <a href=”http://twitter.com/download/iphone” r… Here is a Siberian heavily armored polar bear … https://twitter.com/dog_rates/status/666055525… 10 10 NaN None None None None
2349 666051853826850816 2015-11-16 00:35:11 +0000 <a href=”http://twitter.com/download/iphone” r… This is an odd dog. Hard on the outside but lo… https://twitter.com/dog_rates/status/666051853… 2 10 NaN None None None None
2350 666050758794694657 2015-11-16 00:30:50 +0000 <a href=”http://twitter.com/download/iphone” r… This is a truly beautiful English Wilson Staff… https://twitter.com/dog_rates/status/666050758… 10 10 NaN None None None None
2351 666049248165822465 2015-11-16 00:24:50 +0000 <a href=”http://twitter.com/download/iphone” r… Here we have a 1949 1st generation vulpix. Enj… https://twitter.com/dog_rates/status/666049248… 5 10 NaN None None None None
2352 666044226329800704 2015-11-16 00:04:52 +0000 <a href=”http://twitter.com/download/iphone” r… This is a purebred Piers Morgan. Loves to Netf… https://twitter.com/dog_rates/status/666044226… 6 10 NaN None None None None
2353 666033412701032449 2015-11-15 23:21:54 +0000 <a href=”http://twitter.com/download/iphone” r… Here is a very happy pup. Big fan of well-main… https://twitter.com/dog_rates/status/666033412… 9 10 NaN None None None None
2354 666029285002620928 2015-11-15 23:05:30 +0000 <a href=”http://twitter.com/download/iphone” r… This is a western brown Mitsubishi terrier. Up… https://twitter.com/dog_rates/status/666029285… 7 10 NaN None None None None
2355 666020888022790149 2015-11-15 22:32:08 +0000 <a href=”http://twitter.com/download/iphone” r… Here we have a Japanese Irish Setter. Lost eye… https://twitter.com/dog_rates/status/666020888… 8 10 NaN None None None None

2144 rows × 12 columns

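Define

Correct the data types of the timestamp and tweet_id columns.

Code, Test
In [18]:
# Note: astype() returns a new DataFrame. The result is not assigned here, so this cell only
# previews the converted dtypes; later info() output still shows tweet_id as int64 and timestamp as object.
dogs_rates_archive_clean.astype({'timestamp': 'datetime64', 'tweet_id': 'object'},copy = False).dtypes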
Out[18]:
tweet_id                      object
timestamp             datetime64[ns]
source                        object
text                          object
expanded_urls                 object
rating_numerator               int64
rating_denominator             int64
name                          object
doggo                         object
floofer                       object
pupper                        object
puppo                         object
dtype: object
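If the converted types should be kept for the following steps, the result has to be assigned back. Below is a minimal sketch of that idea, not the code used in this notebook: the two-row `archive` frame is a hypothetical stand-in for dogs_rates_archive_clean, and using `pd.to_datetime` with `utc=True` is my assumption about how the `+0000` timestamps should be parsed.

```python
import pandas as pd

# Hypothetical two-row stand-in for dogs_rates_archive_clean
archive = pd.DataFrame({
    'tweet_id': [666020888022790149, 666029285002620928],
    'timestamp': ['2015-11-15 22:32:08 +0000', '2015-11-15 23:05:30 +0000'],
})

# Assign the converted columns back so the new dtypes are kept
archive['tweet_id'] = archive['tweet_id'].astype(str)
archive['timestamp'] = pd.to_datetime(archive['timestamp'], utc=True)

print(archive.dtypes)
```

Storing tweet_id as a string keeps it from being treated as a number, which is the reason the assessment flagged it in the first place.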
Define

Rename the abbreviated prediction columns (p1, p2, p3 and their _conf and _dog variants) to more descriptive names.

Code
In [19]:
# Rename all nine prediction columns in a single call
image_predictions_clean.rename(columns={
    'p1': 'Prediction1', 'p1_conf': 'Prediction1_conf', 'p1_dog': 'Prediction1_dog',
    'p2': 'Prediction2', 'p2_conf': 'Prediction2_conf', 'p2_dog': 'Prediction2_dog',
    'p3': 'Prediction3', 'p3_conf': 'Prediction3_conf', 'p3_dog': 'Prediction3_dog',
}, inplace=True)
Define

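Change the data type of tweet_id to object.

Code
In [20]:
# astype() returns a new DataFrame; without assignment the change is not kept
# (the info() test that follows still reports tweet_id as int64).
image_predictions_clean.astype({'tweet_id': 'object'},copy = False).dtypes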
Out[20]:
tweet_id             object
jpg_url              object
img_num               int64
Prediction1          object
Prediction1_conf    float64
Prediction1_dog        bool
Prediction2          object
Prediction2_conf    float64
Prediction2_dog        bool
Prediction3          object
Prediction3_conf    float64
Prediction3_dog        bool
dtype: object
Test
In [21]:
image_predictions_clean.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2075 entries, 0 to 2074
Data columns (total 12 columns):
tweet_id            2075 non-null int64
jpg_url             2075 non-null object
img_num             2075 non-null int64
Prediction1         2075 non-null object
Prediction1_conf    2075 non-null float64
Prediction1_dog     2075 non-null bool
Prediction2         2075 non-null object
Prediction2_conf    2075 non-null float64
Prediction2_dog     2075 non-null bool
Prediction3         2075 non-null object
Prediction3_conf    2075 non-null float64
Prediction3_dog     2075 non-null bool
dtypes: bool(3), float64(3), int64(2), object(4)
memory usage: 152.1+ KB
Define

Keep only the most confident prediction that is a dog breed, stored in two new columns (prediction and confidence). The original prediction columns stay in the table.

Code
In [22]:
prediction = []
confidence = []

def most_confident_prediction(row):
    # Check the predictions in order (Prediction1 has the highest confidence in the rows shown)
    # and keep the first one that is flagged as a dog.
    if row['Prediction1_dog']:
        prediction.append(row['Prediction1'])
        confidence.append(row['Prediction1_conf'])
    elif row['Prediction2_dog']:
        prediction.append(row['Prediction2'])
        confidence.append(row['Prediction2_conf'])
    elif row['Prediction3_dog']:
        prediction.append(row['Prediction3'])
        confidence.append(row['Prediction3_conf'])
    else:
        # None of the three predictions is a dog: store the placeholder string 'NAN' and confidence 0
        prediction.append('NAN')
        confidence.append(0)

image_predictions_clean.apply(most_confident_prediction, axis=1)
image_predictions_clean['prediction'] = prediction
image_predictions_clean['confidence'] = confidence
Test
In [23]:
image_predictions_clean
Out[23]:
tweet_id jpg_url img_num Prediction1 Prediction1_conf Prediction1_dog Prediction2 Prediction2_conf Prediction2_dog Prediction3 Prediction3_conf Prediction3_dog prediction confidence
0 666020888022790149 https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg 1 Welsh_springer_spaniel 0.465074 True collie 0.156665 True Shetland_sheepdog 0.061428 True Welsh_springer_spaniel 0.465074
1 666029285002620928 https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg 1 redbone 0.506826 True miniature_pinscher 0.074192 True Rhodesian_ridgeback 0.072010 True redbone 0.506826
2 666033412701032449 https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg 1 German_shepherd 0.596461 True malinois 0.138584 True bloodhound 0.116197 True German_shepherd 0.596461
3 666044226329800704 https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg 1 Rhodesian_ridgeback 0.408143 True redbone 0.360687 True miniature_pinscher 0.222752 True Rhodesian_ridgeback 0.408143
4 666049248165822465 https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg 1 miniature_pinscher 0.560311 True Rottweiler 0.243682 True Doberman 0.154629 True miniature_pinscher 0.560311
5 666050758794694657 https://pbs.twimg.com/media/CT5Jof1WUAEuVxN.jpg 1 Bernese_mountain_dog 0.651137 True English_springer 0.263788 True Greater_Swiss_Mountain_dog 0.016199 True Bernese_mountain_dog 0.651137
6 666051853826850816 https://pbs.twimg.com/media/CT5KoJ1WoAAJash.jpg 1 box_turtle 0.933012 False mud_turtle 0.045885 False terrapin 0.017885 False NAN 0.000000
7 666055525042405380 https://pbs.twimg.com/media/CT5N9tpXIAAifs1.jpg 1 chow 0.692517 True Tibetan_mastiff 0.058279 True fur_coat 0.054449 False chow 0.692517
8 666057090499244032 https://pbs.twimg.com/media/CT5PY90WoAAQGLo.jpg 1 shopping_cart 0.962465 False shopping_basket 0.014594 False golden_retriever 0.007959 True golden_retriever 0.007959
9 666058600524156928 https://pbs.twimg.com/media/CT5Qw94XAAA_2dP.jpg 1 miniature_poodle 0.201493 True komondor 0.192305 True soft-coated_wheaten_terrier 0.082086 True miniature_poodle 0.201493
10 666063827256086533 https://pbs.twimg.com/media/CT5Vg_wXIAAXfnj.jpg 1 golden_retriever 0.775930 True Tibetan_mastiff 0.093718 True Labrador_retriever 0.072427 True golden_retriever 0.775930
11 666071193221509120 https://pbs.twimg.com/media/CT5cN_3WEAAlOoZ.jpg 1 Gordon_setter 0.503672 True Yorkshire_terrier 0.174201 True Pekinese 0.109454 True Gordon_setter 0.503672
12 666073100786774016 https://pbs.twimg.com/media/CT5d9DZXAAALcwe.jpg 1 Walker_hound 0.260857 True English_foxhound 0.175382 True Ibizan_hound 0.097471 True Walker_hound 0.260857
13 666082916733198337 https://pbs.twimg.com/media/CT5m4VGWEAAtKc8.jpg 1 pug 0.489814 True bull_mastiff 0.404722 True French_bulldog 0.048960 True pug 0.489814
14 666094000022159362 https://pbs.twimg.com/media/CT5w9gUW4AAsBNN.jpg 1 bloodhound 0.195217 True German_shepherd 0.078260 True malinois 0.075628 True bloodhound 0.195217
15 666099513787052032 https://pbs.twimg.com/media/CT51-JJUEAA6hV8.jpg 1 Lhasa 0.582330 True Shih-Tzu 0.166192 True Dandie_Dinmont 0.089688 True Lhasa 0.582330
16 666102155909144576 https://pbs.twimg.com/media/CT54YGiWUAEZnoK.jpg 1 English_setter 0.298617 True Newfoundland 0.149842 True borzoi 0.133649 True English_setter 0.298617
17 666104133288665088 https://pbs.twimg.com/media/CT56LSZWoAAlJj2.jpg 1 hen 0.965932 False cock 0.033919 False partridge 0.000052 False NAN 0.000000
18 666268910803644416 https://pbs.twimg.com/media/CT8QCd1WEAADXws.jpg 1 desktop_computer 0.086502 False desk 0.085547 False bookcase 0.079480 False NAN 0.000000
19 666273097616637952 https://pbs.twimg.com/media/CT8T1mtUwAA3aqm.jpg 1 Italian_greyhound 0.176053 True toy_terrier 0.111884 True basenji 0.111152 True Italian_greyhound 0.176053
20 666287406224695296 https://pbs.twimg.com/media/CT8g3BpUEAAuFjg.jpg 1 Maltese_dog 0.857531 True toy_poodle 0.063064 True miniature_poodle 0.025581 True Maltese_dog 0.857531
21 666293911632134144 https://pbs.twimg.com/media/CT8mx7KW4AEQu8N.jpg 1 three-toed_sloth 0.914671 False otter 0.015250 False great_grey_owl 0.013207 False NAN 0.000000
22 666337882303524864 https://pbs.twimg.com/media/CT9OwFIWEAMuRje.jpg 1 ox 0.416669 False Newfoundland 0.278407 True groenendael 0.102643 True Newfoundland 0.278407
23 666345417576210432 https://pbs.twimg.com/media/CT9Vn7PWoAA_ZCM.jpg 1 golden_retriever 0.858744 True Chesapeake_Bay_retriever 0.054787 True Labrador_retriever 0.014241 True golden_retriever 0.858744
24 666353288456101888 https://pbs.twimg.com/media/CT9cx0tUEAAhNN_.jpg 1 malamute 0.336874 True Siberian_husky 0.147655 True Eskimo_dog 0.093412 True malamute 0.336874
25 666362758909284353 https://pbs.twimg.com/media/CT9lXGsUcAAyUFt.jpg 1 guinea_pig 0.996496 False skunk 0.002402 False hamster 0.000461 False NAN 0.000000
26 666373753744588802 https://pbs.twimg.com/media/CT9vZEYWUAAlZ05.jpg 1 soft-coated_wheaten_terrier 0.326467 True Afghan_hound 0.259551 True briard 0.206803 True soft-coated_wheaten_terrier 0.326467
27 666396247373291520 https://pbs.twimg.com/media/CT-D2ZHWIAA3gK1.jpg 1 Chihuahua 0.978108 True toy_terrier 0.009397 True papillon 0.004577 True Chihuahua 0.978108
28 666407126856765440 https://pbs.twimg.com/media/CT-NvwmW4AAugGZ.jpg 1 black-and-tan_coonhound 0.529139 True bloodhound 0.244220 True flat-coated_retriever 0.173810 True black-and-tan_coonhound 0.529139
29 666411507551481857 https://pbs.twimg.com/media/CT-RugiWIAELEaq.jpg 1 coho 0.404640 False barracouta 0.271485 False gar 0.189945 False NAN 0.000000
...
2045 886366144734445568 https://pbs.twimg.com/media/DE0BTnQUwAApKEH.jpg 1 French_bulldog 0.999201 True Chihuahua 0.000361 True Boston_bull 0.000076 True French_bulldog 0.999201
2046 886680336477933568 https://pbs.twimg.com/media/DE4fEDzWAAAyHMM.jpg 1 convertible 0.738995 False sports_car 0.139952 False car_wheel 0.044173 False NAN 0.000000
2047 886736880519319552 https://pbs.twimg.com/media/DE5Se8FXcAAJFx4.jpg 1 kuvasz 0.309706 True Great_Pyrenees 0.186136 True Dandie_Dinmont 0.086346 True kuvasz 0.309706
2048 886983233522544640 https://pbs.twimg.com/media/DE8yicJW0AAAvBJ.jpg 2 Chihuahua 0.793469 True toy_terrier 0.143528 True can_opener 0.032253 False Chihuahua 0.793469
2049 887101392804085760 https://pbs.twimg.com/media/DE-eAq6UwAA-jaE.jpg 1 Samoyed 0.733942 True Eskimo_dog 0.035029 True Staffordshire_bullterrier 0.029705 True Samoyed 0.733942
2050 887343217045368832 https://pbs.twimg.com/ext_tw_video_thumb/88734… 1 Mexican_hairless 0.330741 True sea_lion 0.275645 False Weimaraner 0.134203 True Mexican_hairless 0.330741
2051 887473957103951883 https://pbs.twimg.com/media/DFDw2tyUQAAAFke.jpg 2 Pembroke 0.809197 True Rhodesian_ridgeback 0.054950 True beagle 0.038915 True Pembroke 0.809197
2052 887517139158093824 https://pbs.twimg.com/ext_tw_video_thumb/88751… 1 limousine 0.130432 False tow_truck 0.029175 False shopping_cart 0.026321 False NAN 0.000000
2053 887705289381826560 https://pbs.twimg.com/media/DFHDQBbXgAEqY7t.jpg 1 basset 0.821664 True redbone 0.087582 True Weimaraner 0.026236 True basset 0.821664
2054 888078434458587136 https://pbs.twimg.com/media/DFMWn56WsAAkA7B.jpg 1 French_bulldog 0.995026 True pug 0.000932 True bull_mastiff 0.000903 True French_bulldog 0.995026
2055 888202515573088257 https://pbs.twimg.com/media/DFDw2tyUQAAAFke.jpg 2 Pembroke 0.809197 True Rhodesian_ridgeback 0.054950 True beagle 0.038915 True Pembroke 0.809197
2056 888554962724278272 https://pbs.twimg.com/media/DFTH_O-UQAACu20.jpg 3 Siberian_husky 0.700377 True Eskimo_dog 0.166511 True malamute 0.111411 True Siberian_husky 0.700377
2057 888804989199671297 https://pbs.twimg.com/media/DFWra-3VYAA2piG.jpg 1 golden_retriever 0.469760 True Labrador_retriever 0.184172 True English_setter 0.073482 True golden_retriever 0.469760
2058 888917238123831296 https://pbs.twimg.com/media/DFYRgsOUQAARGhO.jpg 1 golden_retriever 0.714719 True Tibetan_mastiff 0.120184 True Labrador_retriever 0.105506 True golden_retriever 0.714719
2059 889278841981685760 https://pbs.twimg.com/ext_tw_video_thumb/88927… 1 whippet 0.626152 True borzoi 0.194742 True Saluki 0.027351 True whippet 0.626152
2060 889531135344209921 https://pbs.twimg.com/media/DFg_2PVW0AEHN3p.jpg 1 golden_retriever 0.953442 True Labrador_retriever 0.013834 True redbone 0.007958 True golden_retriever 0.953442
2061 889638837579907072 https://pbs.twimg.com/media/DFihzFfXsAYGDPR.jpg 1 French_bulldog 0.991650 True boxer 0.002129 True Staffordshire_bullterrier 0.001498 True French_bulldog 0.991650
2062 889665388333682689 https://pbs.twimg.com/media/DFi579UWsAAatzw.jpg 1 Pembroke 0.966327 True Cardigan 0.027356 True basenji 0.004633 True Pembroke 0.966327
2063 889880896479866881 https://pbs.twimg.com/media/DFl99B1WsAITKsg.jpg 1 French_bulldog 0.377417 True Labrador_retriever 0.151317 True muzzle 0.082981 False French_bulldog 0.377417
2064 890006608113172480 https://pbs.twimg.com/media/DFnwSY4WAAAMliS.jpg 1 Samoyed 0.957979 True Pomeranian 0.013884 True chow 0.008167 True Samoyed 0.957979
2065 890240255349198849 https://pbs.twimg.com/media/DFrEyVuW0AAO3t9.jpg 1 Pembroke 0.511319 True Cardigan 0.451038 True Chihuahua 0.029248 True Pembroke 0.511319
2066 890609185150312448 https://pbs.twimg.com/media/DFwUU__XcAEpyXI.jpg 1 Irish_terrier 0.487574 True Irish_setter 0.193054 True Chesapeake_Bay_retriever 0.118184 True Irish_terrier 0.487574
2067 890729181411237888 https://pbs.twimg.com/media/DFyBahAVwAAhUTd.jpg 2 Pomeranian 0.566142 True Eskimo_dog 0.178406 True Pembroke 0.076507 True Pomeranian 0.566142
2068 890971913173991426 https://pbs.twimg.com/media/DF1eOmZXUAALUcq.jpg 1 Appenzeller 0.341703 True Border_collie 0.199287 True ice_lolly 0.193548 False Appenzeller 0.341703
2069 891087950875897856 https://pbs.twimg.com/media/DF3HwyEWsAABqE6.jpg 1 Chesapeake_Bay_retriever 0.425595 True Irish_terrier 0.116317 True Indian_elephant 0.076902 False Chesapeake_Bay_retriever 0.425595
2070 891327558926688256 https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg 2 basset 0.555712 True English_springer 0.225770 True German_short-haired_pointer 0.175219 True basset 0.555712
2071 891689557279858688 https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg 1 paper_towel 0.170278 False Labrador_retriever 0.168086 True spatula 0.040836 False Labrador_retriever 0.168086
2072 891815181378084864 https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg 1 Chihuahua 0.716012 True malamute 0.078253 True kelpie 0.031379 True Chihuahua 0.716012
2073 892177421306343426 https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg 1 Chihuahua 0.323581 True Pekinese 0.090647 True papillon 0.068957 True Chihuahua 0.323581
2074 892420643555336193 https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg 1 orange 0.097049 False bagel 0.085851 False banana 0.076110 False NAN 0.000000

2075 rows × 14 columns
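The same selection can also be written without global lists. The sketch below is an alternative I am assuming would behave the same way, not the code that produced the output above: `preds` is a small hypothetical sample that reuses three rows from the table, and it stores a real missing value (`np.nan`) instead of the string 'NAN'.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with the renamed columns, values copied from three rows of the table
preds = pd.DataFrame({
    'Prediction1': ['redbone', 'shopping_cart', 'box_turtle'],
    'Prediction1_conf': [0.506826, 0.962465, 0.933012],
    'Prediction1_dog': [True, False, False],
    'Prediction2': ['miniature_pinscher', 'shopping_basket', 'mud_turtle'],
    'Prediction2_conf': [0.074192, 0.014594, 0.045885],
    'Prediction2_dog': [True, False, False],
    'Prediction3': ['Rhodesian_ridgeback', 'golden_retriever', 'terrapin'],
    'Prediction3_conf': [0.072010, 0.007959, 0.017885],
    'Prediction3_dog': [True, True, False],
})

# Start with "no dog found" and overwrite from the least to the most confident
# prediction, so the most confident dog prediction is the one that remains.
prediction = pd.Series(np.nan, index=preds.index, dtype=object)
confidence = pd.Series(0.0, index=preds.index)
for i in (3, 2, 1):
    is_dog = preds[f'Prediction{i}_dog']
    prediction = prediction.mask(is_dog, preds[f'Prediction{i}'])
    confidence = confidence.mask(is_dog, preds[f'Prediction{i}_conf'])

preds['prediction'] = prediction
preds['confidence'] = confidence
print(preds[['prediction', 'confidence']])
```

With a real missing value, rows without any dog prediction can later be filtered with `isna()` instead of comparing against a placeholder string.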

Define

Merge the prediction and confidence columns into dogs_rates_archive and drop the tweets that have no entry in the image predictions table.

Code
In [24]:
dogs_rates_archive_clean = pd.merge(dogs_rates_archive_clean, image_predictions_clean[['tweet_id', 'prediction', 'confidence']],
                                   on = 'tweet_id', how = 'inner')
Test
In [25]:
dogs_rates_archive_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1971 entries, 0 to 1970
Data columns (total 14 columns):
tweet_id              1971 non-null int64
timestamp             1971 non-null object
source                1971 non-null object
text                  1971 non-null object
expanded_urls         1971 non-null object
rating_numerator      1971 non-null int64
rating_denominator    1971 non-null int64
name                  1342 non-null object
doggo                 1971 non-null object
floofer               1971 non-null object
pupper                1971 non-null object
puppo                 1971 non-null object
prediction            1971 non-null object
confidence            1971 non-null float64
dtypes: float64(1), int64(3), object(10)
memory usage: 231.0+ KB
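An inner merge drops rows silently, so it can help to count them first. This is a minimal sketch with made-up data: `archive` and `predictions` are hypothetical four-row and three-row stand-ins for the two tables. It uses the `indicator` and `validate` parameters of `pd.merge`.

```python
import pandas as pd

# Hypothetical stand-ins for the archive and image prediction tables
archive = pd.DataFrame({'tweet_id': [1, 2, 3, 4], 'rating_numerator': [13, 12, 10, 11]})
predictions = pd.DataFrame({'tweet_id': [1, 2, 4], 'prediction': ['Chihuahua', 'basset', 'pug']})

# A left merge with indicator=True marks archive rows that have no prediction
check = pd.merge(archive, predictions, on='tweet_id', how='left', indicator=True)
dropped = int((check['_merge'] == 'left_only').sum())
print('tweets without an image prediction:', dropped)

# validate raises an error if tweet_id is duplicated on either side
merged = pd.merge(archive, predictions, on='tweet_id', how='inner', validate='one_to_one')
print(len(merged), 'rows kept')
```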
Define

Create a new column (stages) that holds the dog stage, then delete the four original columns ('floofer', 'puppo', 'doggo', 'pupper'). Tweets that mention doggo together with another stage are recorded as doggo.

Code
In [26]:
# Concatenate the four stage columns; the string 'None' marks a missing stage
dogs_rates_archive_clean['stages'] = dogs_rates_archive_clean.doggo + dogs_rates_archive_clean.floofer + dogs_rates_archive_clean.pupper + dogs_rates_archive_clean.puppo

# Map each concatenated string to a single stage
combined_to_stage = {
    'NoneNoneNoneNone': None,
    'doggoNoneNoneNone': 'doggo',
    'NoneNonepupperNone': 'pupper',
    'NoneNoneNonepuppo': 'puppo',
    'NoneflooferNoneNone': 'floofer',
    'doggoNoneNonepuppo': 'doggo',
    'doggoNonepupperNone': 'doggo',
    'doggoflooferNoneNone': 'doggo',
}
# Assign through .loc on the DataFrame itself rather than through chained indexing
for combined, stage in combined_to_stage.items():
    dogs_rates_archive_clean.loc[dogs_rates_archive_clean['stages'] == combined, 'stages'] = stage

dogs_rates_archive_clean.drop(columns=['floofer', 'puppo', 'doggo', 'pupper'], inplace = True)
Test
In [27]:
dogs_rates_archive_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1971 entries, 0 to 1970
Data columns (total 11 columns):
tweet_id              1971 non-null int64
timestamp             1971 non-null object
source                1971 non-null object
text                  1971 non-null object
expanded_urls         1971 non-null object
rating_numerator      1971 non-null int64
rating_denominator    1971 non-null int64
name                  1342 non-null object
prediction            1971 non-null object
confidence            1971 non-null float64
stages                305 non-null object
dtypes: float64(1), int64(3), object(7)
memory usage: 184.8+ KB
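The cleaning step above assigns doggo whenever two stages appear in the same tweet. If that information should be kept instead, one possible variation (an assumption on my part, not what this notebook does) is to join every stage that is present. The four-row `df` below is hypothetical, and `combine_stages` is a helper name I made up for the illustration.

```python
import pandas as pd

# Hypothetical sample where the string 'None' marks a missing stage, as in the archive
df = pd.DataFrame({
    'doggo': ['doggo', 'None', 'doggo', 'None'],
    'floofer': ['None', 'None', 'None', 'None'],
    'pupper': ['None', 'pupper', 'pupper', 'None'],
    'puppo': ['None', 'None', 'None', 'None'],
})
stage_cols = ['doggo', 'floofer', 'pupper', 'puppo']

def combine_stages(row):
    # Keep every stage that is present; return None when there is none
    found = [value for value in row if value != 'None']
    return ', '.join(found) if found else None

df['stages'] = df[stage_cols].apply(combine_stages, axis=1)
df = df.drop(columns=stage_cols)
print(df)
```

The trade-off is that combined values such as 'doggo, pupper' become their own categories in later counts.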
Define

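Change the data type of the stages column to category.

Code
In [28]:
# astype() returns a new DataFrame; the result is not assigned, so only the dtypes displayed here reflect the change.
dogs_rates_archive_clean.astype({'stages': 'category'},copy = False).dtypes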
Out[28]:
tweet_id                 int64
timestamp               object
source                  object
text                    object
expanded_urls           object
rating_numerator         int64
rating_denominator       int64
name                    object
prediction              object
confidence             float64
stages                category
dtype: object
Define

rename id column in tweets_df to tweet_id

Code
In [29]:
tweets_df_clean.rename(columns={'id':'tweet_id'}, inplace=True)
Define

Merge the tweets_df table (favorite and retweet counts) into dogs_rates_archive.

Code
In [30]:
dogs_rates_archive_clean = pd.merge(dogs_rates_archive_clean, tweets_df_clean,
                                   on = 'tweet_id', how = 'inner')
Test
In [31]:
dogs_rates_archive_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1964 entries, 0 to 1963
Data columns (total 13 columns):
tweet_id              1964 non-null int64
timestamp             1964 non-null object
source                1964 non-null object
text                  1964 non-null object
expanded_urls         1964 non-null object
rating_numerator      1964 non-null int64
rating_denominator    1964 non-null int64
name                  1335 non-null object
prediction            1964 non-null object
confidence            1964 non-null float64
stages                304 non-null object
favorite_count        1964 non-null int64
retweet_count         1964 non-null int64
dtypes: float64(1), int64(5), object(7)
memory usage: 214.8+ KB

Storing data

In [32]:
dogs_rates_archive_clean.to_csv('twitter_archive_master.csv',index = False)

Analysis and visualization

In [33]:
df=pd.read_csv('twitter_archive_master.csv')
In [34]:
df.head()
Out[34]:
tweet_id timestamp source text expanded_urls rating_numerator rating_denominator name prediction confidence stages favorite_count retweet_count
0 892420643555336193 2017-08-01 16:23:56 +0000 <a href=”http://twitter.com/download/iphone” r… This is Phineas. He’s a mystical boy. Only eve… https://twitter.com/dog_rates/status/892420643… 13 10 Phineas NAN 0.000000 NaN 36776 7840
1 892177421306343426 2017-08-01 00:17:27 +0000 <a href=”http://twitter.com/download/iphone” r… This is Tilly. She’s just checking pup on you…. https://twitter.com/dog_rates/status/892177421… 13 10 Tilly Chihuahua 0.323581 NaN 31672 5804
2 891815181378084864 2017-07-31 00:18:03 +0000 <a href=”http://twitter.com/download/iphone” r… This is Archie. He is a rare Norwegian Pouncin… https://twitter.com/dog_rates/status/891815181… 12 10 Archie Chihuahua 0.716012 NaN 23851 3844
3 891689557279858688 2017-07-30 15:58:51 +0000 <a href=”http://twitter.com/download/iphone” r… This is Darla. She commenced a snooze mid meal… https://twitter.com/dog_rates/status/891689557… 13 10 Darla Labrador_retriever 0.168086 NaN 40107 8009
4 891327558926688256 2017-07-29 16:00:24 +0000 <a href=”http://twitter.com/download/iphone” r… This is Franklin. He would like you to stop ca… https://twitter.com/dog_rates/status/891327558… 12 10 Franklin basset 0.555712 NaN 38309 8650
In [35]:
# get the number of tweets for every dog stage
df_stages = df.groupby('stages')['tweet_id'].count()
df_stages
Out[35]:
stages
doggo       73
floofer      7
pupper     202
puppo       22
Name: tweet_id, dtype: int64
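The counts are easier to compare as shares of the 304 tweets that have a stage. A small sketch of that arithmetic follows; `stage_counts` simply retypes the four counts printed above, and the variable names are mine.

```python
import pandas as pd

# Counts copied from the output above
stage_counts = pd.Series({'doggo': 73, 'floofer': 7, 'pupper': 202, 'puppo': 22})

# Share of each stage in percent, rounded to one decimal place
stage_share = (stage_counts / stage_counts.sum() * 100).round(1)
print(stage_share)
```

This gives roughly 66.4% pupper, 24.0% doggo, 7.2% puppo and 2.3% floofer.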

visualization

In [39]:
# plot the share of each dog stage as a pie chart
plt.pie(df_stages, labels=df_stages.index)
plt.title('The distribution of dog stages');

insights

From this plot:

The dog stage with the highest number of tweets is pupper (202 tweets).

The dog stage with the lowest number of tweets is floofer (7 tweets).

visualization

In [37]:
# plot the relationship between favorite count and retweet count
plt.scatter(df['favorite_count'],df['retweet_count'])
plt.title('The relationship between likes count and retweet count')
plt.xlabel('likes count')
plt.ylabel('retweet count');
In [38]:
# calculate the Pearson correlation coefficient
df['favorite_count'].corr(df['retweet_count'])
Out[38]:
0.92886023983010291

insights

There is a strong positive linear relationship between likes count and retweet count (correlation coefficient of about 0.93).
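To make the coefficient more concrete, the sketch below recomputes a Pearson correlation by hand and checks it against `Series.corr`. The `sample` frame holds only the five rows shown by df.head() earlier, so its coefficient is not the 0.9289 reported for the full dataset. It is here only to show the formula.

```python
import numpy as np
import pandas as pd

# The five rows shown by df.head(); not the full dataset
sample = pd.DataFrame({
    'favorite_count': [36776, 31672, 23851, 40107, 38309],
    'retweet_count': [7840, 5804, 3844, 8009, 8650],
})

# Pearson r: covariance of the two columns divided by the product of their spreads
x = sample['favorite_count'] - sample['favorite_count'].mean()
y = sample['retweet_count'] - sample['retweet_count'].mean()
r_manual = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

# pandas computes the same quantity
r_pandas = sample['favorite_count'].corr(sample['retweet_count'])
print(r_manual, r_pandas)

# Squaring the coefficient reported for the full dataset gives the share of variance
# that a straight-line fit between the two counts would explain
r_squared_full = 0.92886023983010291 ** 2
print(round(r_squared_full, 3))
```

For the full dataset, r squared is about 0.863, so a straight line accounts for roughly 86% of the variation in one count given the other.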