WeRateDogs Wrangle Report¶
Data wrangling is a core skill that everyone who works with data should be familiar with, since so much of the world’s data isn’t clean. The process is divided into three steps:
1. Gathering data
2. Assessing data
3. Cleaning data
Gather¶
Gathering data is the first step in data wrangling: before it we have no data, and after it we do. This project involved gathering data from three different sources, as listed below:
1. An existing file named twitter-archive-enhanced.csv, which I read into a pandas DataFrame.
2. A file (image-predictions.tsv) downloaded from the internet using the requests library.
3. The Twitter API, queried for each tweet ID in the archive file using the tweepy library.
Assess¶
After gathering the data, the next step is to assess it visually and programmatically to detect quality and tidiness issues.
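A minimal sketch of what the programmatic part of an assessment can look like. The `sample` frame and the `assess` helper are hypothetical and only stand in for the real tables; the checks themselves (duplicates, lowercase names, unusual denominators) are ordinary pandas calls.

```python
import pandas as pd

# Hypothetical miniature stand-in for the archive table; values are illustrative only.
sample = pd.DataFrame({
    "tweet_id": [1, 2, 2, 3],
    "name": ["Phineas", "a", "a", "None"],
    "rating_denominator": [10, 10, 10, 0],
})

def assess(df):
    """Collect a few programmatic checks into one dictionary."""
    return {
        # Column data types, as strings
        "dtypes": df.dtypes.astype(str).to_dict(),
        # Number of missing values in each column
        "missing_per_column": df.isna().sum().to_dict(),
        # Rows whose tweet_id repeats an earlier row
        "duplicated_ids": int(df["tweet_id"].duplicated().sum()),
        # All-lowercase "names" are probably ordinary words, not dog names
        "lowercase_names": sorted(df.loc[df["name"].str.islower(), "name"].unique()),
        # Ratings whose denominator is not the usual 10
        "bad_denominators": int((df["rating_denominator"] != 10).sum()),
    }

report = assess(sample)
print(report)
```

Visual assessment (scrolling through the displayed tables) complements these checks, since issues such as the literal string None in the name column are easier to spot by eye.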
After assessing the data, I found the following issues:
Quality
dogs_rates_archive table¶
• The name column contains values such as None, a, and an instead of NaN.
• There are columns we don’t need (in_reply_to_status_id, in_reply_to_user_id, retweeted_status_id, retweeted_status_user_id, retweeted_status_timestamp).
• Some columns have the wrong data type (timestamp, tweet_id).
• There are retweets, and we only want original tweets.
• rating_numerator and rating_denominator contain wrong ratings, such as 960/0, 24/7, 9/11, and 165/150.
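One way several of the issues above could be addressed is sketched below. The three-row `archive` frame is a hypothetical sample that only reuses the real column names, and treating every all-lowercase name as missing is an assumption, not a rule taken from the data.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample using the archive table's column names.
archive = pd.DataFrame({
    "tweet_id": [101, 102, 103],
    "timestamp": ["2017-08-01 16:23:56 +0000",
                  "2017-07-21 01:02:36 +0000",
                  "2015-11-16 16:11:11 +0000"],
    "retweeted_status_id": [np.nan, 8.874740e+17, np.nan],
    "name": ["Phineas", "Canela", "an"],
})

clean = archive.copy()

# Keep only original tweets: retweets have a non-null retweeted_status_id.
clean = clean[clean["retweeted_status_id"].isna()]

# The retweet column is no longer needed once retweets are removed.
clean = clean.drop(columns=["retweeted_status_id"])

# Assumption: "None" and all-lowercase words are not real dog names.
invalid = clean["name"].eq("None") | clean["name"].str.islower()
clean["name"] = clean["name"].mask(invalid)

# Fix data types: IDs are labels, not numbers; timestamps are datetimes.
clean["tweet_id"] = clean["tweet_id"].astype(str)
clean["timestamp"] = pd.to_datetime(clean["timestamp"])

print(clean)
```

Suspicious ratings are not handled here; they usually need a look at the tweet text before deciding whether to correct or drop them.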
image_predictions table¶
• The prediction columns are named with the abbreviations p1, p2, and p3.
• Some tweets have no images: the image_predictions table has 2075 observations, while the archive table has 2356.
• tweet_id is int64 rather than object.
• There are three breed predictions for each image, but only one of them is the most confident.
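A possible way to reduce the three predictions to a single one is sketched below. It assumes that p1, p2, and p3 are ordered from most to least confident (as they appear to be in the table output) and that only predictions flagged as dogs are of interest. The helper `best_dog_prediction` and the new column names `breed` and `confidence` are hypothetical. The two rows are copied from the image_predictions output shown in this notebook, with jpg_url and img_num omitted.

```python
import numpy as np
import pandas as pd

# Two rows taken from the image_predictions output (jpg_url and img_num omitted).
preds = pd.DataFrame({
    "tweet_id": [666051853826850816, 666057090499244032],
    "p1": ["box_turtle", "shopping_cart"],
    "p1_conf": [0.933012, 0.962465],
    "p1_dog": [False, False],
    "p2": ["mud_turtle", "shopping_basket"],
    "p2_conf": [0.045885, 0.014594],
    "p2_dog": [False, False],
    "p3": ["terrapin", "golden_retriever"],
    "p3_conf": [0.017885, 0.007959],
    "p3_dog": [False, True],
})

def best_dog_prediction(row):
    """Return the first prediction flagged as a dog, or NaN if there is none."""
    for i in (1, 2, 3):
        if row[f"p{i}_dog"]:
            return pd.Series({"breed": row[f"p{i}"],
                              "confidence": row[f"p{i}_conf"]})
    return pd.Series({"breed": np.nan, "confidence": np.nan})

# Add the two derived columns next to the original ones.
preds = preds.join(preds.apply(best_dog_prediction, axis=1))
print(preds[["tweet_id", "breed", "confidence"]])
```

Rows where none of the three predictions is a dog end up with a missing breed, which makes them easy to filter out later.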
tweets_df table¶
• The id column should be renamed to tweet_id to be consistent with the other tables.
Tidiness
• Dog stage is one variable but is spread across four columns (doggo, floofer, pupper, puppo).
• The data is in three datasets instead of one.
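The two tidiness issues, together with the id rename, could be handled roughly as follows. All three frames are small hypothetical samples that reuse the real column names. Joining multiple stages with a comma and using an inner join (keeping only tweets present in all three tables) are assumptions made for this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical samples that mirror the three tables' column names.
archive = pd.DataFrame({
    "tweet_id": ["1", "2", "3"],
    "doggo": ["doggo", "None", "None"],
    "floofer": ["None", "None", "None"],
    "pupper": ["None", "None", "pupper"],
    "puppo": ["None", "None", "None"],
})
images = pd.DataFrame({"tweet_id": ["1", "3"],
                       "p1": ["Pembroke", "French_bulldog"]})
counts = pd.DataFrame({"id": ["1", "2", "3"],
                       "favorite_count": [10, 20, 30],
                       "retweet_count": [1, 2, 3]})

stage_cols = ["doggo", "floofer", "pupper", "puppo"]

# Collapse the four stage columns into one; several stages are comma-joined.
joined = archive[stage_cols].apply(
    lambda row: ",".join(v for v in row if v != "None"), axis=1)
archive["dog_stage"] = joined.mask(joined == "")  # no stage -> NaN
archive = archive.drop(columns=stage_cols)

# Rename id so that all three tables share the same key column.
counts = counts.rename(columns={"id": "tweet_id"})

# Combine the three tables into a single one keyed on tweet_id.
master = (archive.merge(images, on="tweet_id", how="inner")
                 .merge(counts, on="tweet_id", how="inner"))
print(master)
```

The key columns must have the same data type in every table before merging, which is why the tweet_id type issue noted earlier matters here.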
Cleaning¶
Cleaning data is the third step in data wrangling. It is the process of fixing the quality and tidiness issues identified in the assess step, to make sure the data is accurate and clean before we analyze it.
Before starting the cleaning process, I made a copy of each dataset; then I tried to correct the issues identified in the assessment step.
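A short illustration of why working on a copy matters. The frame and the variable names `original` and `working` are hypothetical; the point is only that `DataFrame.copy()` leaves the gathered data untouched while the cleaned version changes.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame standing in for one of the gathered tables.
original = pd.DataFrame({"tweet_id": [1, 2], "name": ["a", "Tilly"]})

# Clean a copy so the gathered data stays available for comparison.
working = original.copy()
working.loc[working["name"] == "a", "name"] = np.nan

print(original["name"].tolist())      # the original values are unchanged
print(int(working["name"].isna().sum()))  # one name is now missing in the copy
```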
import requests
import numpy as np
import pandas as pd
import tweepy
import json
import matplotlib.pyplot as plt
from timeit import default_timer as timer
import warnings
warnings.filterwarnings('ignore')
Gather¶
url= 'https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv'
response = requests.get(url)
with open('image-predictions.tsv', mode='wb') as file:
    file.write(response.content)
dogs_rates_archive=pd.read_csv('twitter-archive-enhanced.csv')
image_predictions=pd.read_csv('image-predictions.tsv',sep= '\t')
import tweepy
consumer_key = 'YOUR CONSUMER KEY'
consumer_secret = 'YOUR CONSUMER SECRET'
access_token = 'YOUR ACCESS TOKEN'
access_secret = 'YOUR ACCESS SECRET'
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth, parser=tweepy.parsers.JSONParser(),
                 wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
start = timer()
tweet_list = []
tweet_errors = []
for tweet_id in dogs_rates_archive.tweet_id:
    try:
        tweet = api.get_status(tweet_id, tweet_mode='extended')
        tweet_list.append(tweet)
    except Exception as e:
        # Record IDs that could not be retrieved (e.g. deleted tweets)
        print(str(tweet_id) + '_' + str(e))
        tweet_errors.append(tweet_id)
end = timer()
print(end-start)
Rate limit reached. Sleeping for: 167
888202515573088257_[{'code': 144, 'message': 'No status found with that ID.'}]
873697596434513921_[{'code': 144, 'message': 'No status found with that ID.'}]
872668790621863937_[{'code': 144, 'message': 'No status found with that ID.'}]
872261713294495745_[{'code': 144, 'message': 'No status found with that ID.'}]
869988702071779329_[{'code': 144, 'message': 'No status found with that ID.'}]
866816280283807744_[{'code': 144, 'message': 'No status found with that ID.'}]
861769973181624320_[{'code': 144, 'message': 'No status found with that ID.'}]
856602993587888130_[{'code': 144, 'message': 'No status found with that ID.'}]
851953902622658560_[{'code': 144, 'message': 'No status found with that ID.'}]
845459076796616705_[{'code': 144, 'message': 'No status found with that ID.'}]
844704788403113984_[{'code': 144, 'message': 'No status found with that ID.'}]
842892208864923648_[{'code': 144, 'message': 'No status found with that ID.'}]
837366284874571778_[{'code': 144, 'message': 'No status found with that ID.'}]
837012587749474308_[{'code': 144, 'message': 'No status found with that ID.'}]
829374341691346946_[{'code': 144, 'message': 'No status found with that ID.'}]
827228250799742977_[{'code': 144, 'message': 'No status found with that ID.'}]
812747805718642688_[{'code': 144, 'message': 'No status found with that ID.'}]
802247111496568832_[{'code': 144, 'message': 'No status found with that ID.'}]
779123168116150273_[{'code': 144, 'message': 'No status found with that ID.'}]
775096608509886464_[{'code': 144, 'message': 'No status found with that ID.'}]
770743923962707968_[{'code': 144, 'message': 'No status found with that ID.'}]
Rate limit reached. Sleeping for: 734
754011816964026368_[{'code': 144, 'message': 'No status found with that ID.'}]
680055455951884288_[{'code': 144, 'message': 'No status found with that ID.'}]
Rate limit reached. Sleeping for: 732
2084.2182542709998
with open('tweet_json.txt', 'w') as fb:
    json.dump(tweet_list, fb)
with open('tweet_json.txt') as file:
    tweet_list = json.load(file)
tweets_df = pd.DataFrame(tweet_list, columns = ['id','favorite_count', 'retweet_count'])
tweets_df.to_csv('tweets_df.csv',index = False)
tweets_df = pd.read_csv('tweets_df.csv')
Assess¶
dogs_rates_archive
| | tweet_id | in_reply_to_status_id | in_reply_to_user_id | timestamp | source | text | retweeted_status_id | retweeted_status_user_id | retweeted_status_timestamp | expanded_urls | rating_numerator | rating_denominator | name | doggo | floofer | pupper | puppo |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 892420643555336193 | NaN | NaN | 2017-08-01 16:23:56 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Phineas. He’s a mystical boy. Only eve… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/892420643… | 13 | 10 | Phineas | None | None | None | None |
1 | 892177421306343426 | NaN | NaN | 2017-08-01 00:17:27 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Tilly. She’s just checking pup on you…. | NaN | NaN | NaN | https://twitter.com/dog_rates/status/892177421… | 13 | 10 | Tilly | None | None | None | None |
2 | 891815181378084864 | NaN | NaN | 2017-07-31 00:18:03 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Archie. He is a rare Norwegian Pouncin… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891815181… | 12 | 10 | Archie | None | None | None | None |
3 | 891689557279858688 | NaN | NaN | 2017-07-30 15:58:51 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Darla. She commenced a snooze mid meal… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891689557… | 13 | 10 | Darla | None | None | None | None |
4 | 891327558926688256 | NaN | NaN | 2017-07-29 16:00:24 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Franklin. He would like you to stop ca… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891327558… | 12 | 10 | Franklin | None | None | None | None |
5 | 891087950875897856 | NaN | NaN | 2017-07-29 00:08:17 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here we have a majestic great white breaching … | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891087950… | 13 | 10 | None | None | None | None | None |
6 | 890971913173991426 | NaN | NaN | 2017-07-28 16:27:12 +0000 | <a href=”http://twitter.com/download/iphone” r… | Meet Jax. He enjoys ice cream so much he gets … | NaN | NaN | NaN | https://gofundme.com/ydvmve-surgery-for-jax,ht… | 13 | 10 | Jax | None | None | None | None |
7 | 890729181411237888 | NaN | NaN | 2017-07-28 00:22:40 +0000 | <a href=”http://twitter.com/download/iphone” r… | When you watch your owner call another dog a g… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/890729181… | 13 | 10 | None | None | None | None | None |
8 | 890609185150312448 | NaN | NaN | 2017-07-27 16:25:51 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Zoey. She doesn’t want to be one of th… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/890609185… | 13 | 10 | Zoey | None | None | None | None |
9 | 890240255349198849 | NaN | NaN | 2017-07-26 15:59:51 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Cassie. She is a college pup. Studying… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/890240255… | 14 | 10 | Cassie | doggo | None | None | None |
10 | 890006608113172480 | NaN | NaN | 2017-07-26 00:31:25 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Koda. He is a South Australian decksha… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/890006608… | 13 | 10 | Koda | None | None | None | None |
11 | 889880896479866881 | NaN | NaN | 2017-07-25 16:11:53 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Bruno. He is a service shark. Only get… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/889880896… | 13 | 10 | Bruno | None | None | None | None |
12 | 889665388333682689 | NaN | NaN | 2017-07-25 01:55:32 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here’s a puppo that seems to be on the fence a… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/889665388… | 13 | 10 | None | None | None | None | puppo |
13 | 889638837579907072 | NaN | NaN | 2017-07-25 00:10:02 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Ted. He does his best. Sometimes that’… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/889638837… | 12 | 10 | Ted | None | None | None | None |
14 | 889531135344209921 | NaN | NaN | 2017-07-24 17:02:04 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Stuart. He’s sporting his favorite fan… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/889531135… | 13 | 10 | Stuart | None | None | None | puppo |
15 | 889278841981685760 | NaN | NaN | 2017-07-24 00:19:32 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Oliver. You’re witnessing one of his m… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/889278841… | 13 | 10 | Oliver | None | None | None | None |
16 | 888917238123831296 | NaN | NaN | 2017-07-23 00:22:39 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Jim. He found a fren. Taught him how t… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/888917238… | 12 | 10 | Jim | None | None | None | None |
17 | 888804989199671297 | NaN | NaN | 2017-07-22 16:56:37 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Zeke. He has a new stick. Very proud o… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/888804989… | 13 | 10 | Zeke | None | None | None | None |
18 | 888554962724278272 | NaN | NaN | 2017-07-22 00:23:06 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Ralphus. He’s powering up. Attempting … | NaN | NaN | NaN | https://twitter.com/dog_rates/status/888554962… | 13 | 10 | Ralphus | None | None | None | None |
19 | 888202515573088257 | NaN | NaN | 2017-07-21 01:02:36 +0000 | <a href=”http://twitter.com/download/iphone” r… | RT @dog_rates: This is Canela. She attempted s… | 8.874740e+17 | 4.196984e+09 | 2017-07-19 00:47:34 +0000 | https://twitter.com/dog_rates/status/887473957… | 13 | 10 | Canela | None | None | None | None |
20 | 888078434458587136 | NaN | NaN | 2017-07-20 16:49:33 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Gerald. He was just told he didn’t get… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/888078434… | 12 | 10 | Gerald | None | None | None | None |
21 | 887705289381826560 | NaN | NaN | 2017-07-19 16:06:48 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Jeffrey. He has a monopoly on the pool… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/887705289… | 13 | 10 | Jeffrey | None | None | None | None |
22 | 887517139158093824 | NaN | NaN | 2017-07-19 03:39:09 +0000 | <a href=”http://twitter.com/download/iphone” r… | I’ve yet to rate a Venezuelan Hover Wiener. Th… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/887517139… | 14 | 10 | such | None | None | None | None |
23 | 887473957103951883 | NaN | NaN | 2017-07-19 00:47:34 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Canela. She attempted some fancy porch… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/887473957… | 13 | 10 | Canela | None | None | None | None |
24 | 887343217045368832 | NaN | NaN | 2017-07-18 16:08:03 +0000 | <a href=”http://twitter.com/download/iphone” r… | You may not have known you needed to see this … | NaN | NaN | NaN | https://twitter.com/dog_rates/status/887343217… | 13 | 10 | None | None | None | None | None |
25 | 887101392804085760 | NaN | NaN | 2017-07-18 00:07:08 +0000 | <a href=”http://twitter.com/download/iphone” r… | This… is a Jubilant Antarctic House Bear. We… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/887101392… | 12 | 10 | None | None | None | None | None |
26 | 886983233522544640 | NaN | NaN | 2017-07-17 16:17:36 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Maya. She’s very shy. Rarely leaves he… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/886983233… | 13 | 10 | Maya | None | None | None | None |
27 | 886736880519319552 | NaN | NaN | 2017-07-16 23:58:41 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Mingus. He’s a wonderful father to his… | NaN | NaN | NaN | https://www.gofundme.com/mingusneedsus,https:/… | 13 | 10 | Mingus | None | None | None | None |
28 | 886680336477933568 | NaN | NaN | 2017-07-16 20:14:00 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Derek. He’s late for a dog meeting. 13… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/886680336… | 13 | 10 | Derek | None | None | None | None |
29 | 886366144734445568 | NaN | NaN | 2017-07-15 23:25:31 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Roscoe. Another pupper fallen victim t… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/886366144… | 12 | 10 | Roscoe | None | None | pupper | None |
… | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
2326 | 666411507551481857 | NaN | NaN | 2015-11-17 00:24:19 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is quite the dog. Gets really excited whe… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666411507… | 2 | 10 | quite | None | None | None | None |
2327 | 666407126856765440 | NaN | NaN | 2015-11-17 00:06:54 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is a southern Vesuvius bumblegruff. Can d… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666407126… | 7 | 10 | a | None | None | None | None |
2328 | 666396247373291520 | NaN | NaN | 2015-11-16 23:23:41 +0000 | <a href=”http://twitter.com/download/iphone” r… | Oh goodness. A super rare northeast Qdoba kang… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666396247… | 9 | 10 | None | None | None | None | None |
2329 | 666373753744588802 | NaN | NaN | 2015-11-16 21:54:18 +0000 | <a href=”http://twitter.com/download/iphone” r… | Those are sunglasses and a jean jacket. 11/10 … | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666373753… | 11 | 10 | None | None | None | None | None |
2330 | 666362758909284353 | NaN | NaN | 2015-11-16 21:10:36 +0000 | <a href=”http://twitter.com/download/iphone” r… | Unique dog here. Very small. Lives in containe… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666362758… | 6 | 10 | None | None | None | None | None |
2331 | 666353288456101888 | NaN | NaN | 2015-11-16 20:32:58 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here we have a mixed Asiago from the Galápagos… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666353288… | 8 | 10 | None | None | None | None | None |
2332 | 666345417576210432 | NaN | NaN | 2015-11-16 20:01:42 +0000 | <a href=”http://twitter.com/download/iphone” r… | Look at this jokester thinking seat belt laws … | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666345417… | 10 | 10 | None | None | None | None | None |
2333 | 666337882303524864 | NaN | NaN | 2015-11-16 19:31:45 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is an extremely rare horned Parthenon. No… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666337882… | 9 | 10 | an | None | None | None | None |
2334 | 666293911632134144 | NaN | NaN | 2015-11-16 16:37:02 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is a funny dog. Weird toes. Won’t come do… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666293911… | 3 | 10 | a | None | None | None | None |
2335 | 666287406224695296 | NaN | NaN | 2015-11-16 16:11:11 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is an Albanian 3 1/2 legged Episcopalian… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666287406… | 1 | 2 | an | None | None | None | None |
2336 | 666273097616637952 | NaN | NaN | 2015-11-16 15:14:19 +0000 | <a href=”http://twitter.com/download/iphone” r… | Can take selfies 11/10 https://t.co/ws2AMaNwPW | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666273097… | 11 | 10 | None | None | None | None | None |
2337 | 666268910803644416 | NaN | NaN | 2015-11-16 14:57:41 +0000 | <a href=”http://twitter.com/download/iphone” r… | Very concerned about fellow dog trapped in com… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666268910… | 10 | 10 | None | None | None | None | None |
2338 | 666104133288665088 | NaN | NaN | 2015-11-16 04:02:55 +0000 | <a href=”http://twitter.com/download/iphone” r… | Not familiar with this breed. No tail (weird)…. | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666104133… | 1 | 10 | None | None | None | None | None |
2339 | 666102155909144576 | NaN | NaN | 2015-11-16 03:55:04 +0000 | <a href=”http://twitter.com/download/iphone” r… | Oh my. Here you are seeing an Adobe Setter giv… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666102155… | 11 | 10 | None | None | None | None | None |
2340 | 666099513787052032 | NaN | NaN | 2015-11-16 03:44:34 +0000 | <a href=”http://twitter.com/download/iphone” r… | Can stand on stump for what seems like a while… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666099513… | 8 | 10 | None | None | None | None | None |
2341 | 666094000022159362 | NaN | NaN | 2015-11-16 03:22:39 +0000 | <a href=”http://twitter.com/download/iphone” r… | This appears to be a Mongolian Presbyterian mi… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666094000… | 9 | 10 | None | None | None | None | None |
2342 | 666082916733198337 | NaN | NaN | 2015-11-16 02:38:37 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here we have a well-established sunblockerspan… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666082916… | 6 | 10 | None | None | None | None | None |
2343 | 666073100786774016 | NaN | NaN | 2015-11-16 01:59:36 +0000 | <a href=”http://twitter.com/download/iphone” r… | Let’s hope this flight isn’t Malaysian (lol). … | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666073100… | 10 | 10 | None | None | None | None | None |
2344 | 666071193221509120 | NaN | NaN | 2015-11-16 01:52:02 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here we have a northern speckled Rhododendron…. | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666071193… | 9 | 10 | None | None | None | None | None |
2345 | 666063827256086533 | NaN | NaN | 2015-11-16 01:22:45 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is the happiest dog you will ever see. Ve… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666063827… | 10 | 10 | the | None | None | None | None |
2346 | 666058600524156928 | NaN | NaN | 2015-11-16 01:01:59 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here is the Rand Paul of retrievers folks! He’… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666058600… | 8 | 10 | the | None | None | None | None |
2347 | 666057090499244032 | NaN | NaN | 2015-11-16 00:55:59 +0000 | <a href=”http://twitter.com/download/iphone” r… | My oh my. This is a rare blond Canadian terrie… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666057090… | 9 | 10 | a | None | None | None | None |
2348 | 666055525042405380 | NaN | NaN | 2015-11-16 00:49:46 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here is a Siberian heavily armored polar bear … | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666055525… | 10 | 10 | a | None | None | None | None |
2349 | 666051853826850816 | NaN | NaN | 2015-11-16 00:35:11 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is an odd dog. Hard on the outside but lo… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666051853… | 2 | 10 | an | None | None | None | None |
2350 | 666050758794694657 | NaN | NaN | 2015-11-16 00:30:50 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is a truly beautiful English Wilson Staff… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666050758… | 10 | 10 | a | None | None | None | None |
2351 | 666049248165822465 | NaN | NaN | 2015-11-16 00:24:50 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here we have a 1949 1st generation vulpix. Enj… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666049248… | 5 | 10 | None | None | None | None | None |
2352 | 666044226329800704 | NaN | NaN | 2015-11-16 00:04:52 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is a purebred Piers Morgan. Loves to Netf… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666044226… | 6 | 10 | a | None | None | None | None |
2353 | 666033412701032449 | NaN | NaN | 2015-11-15 23:21:54 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here is a very happy pup. Big fan of well-main… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666033412… | 9 | 10 | a | None | None | None | None |
2354 | 666029285002620928 | NaN | NaN | 2015-11-15 23:05:30 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is a western brown Mitsubishi terrier. Up… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666029285… | 7 | 10 | a | None | None | None | None |
2355 | 666020888022790149 | NaN | NaN | 2015-11-15 22:32:08 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here we have a Japanese Irish Setter. Lost eye… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666020888… | 8 | 10 | None | None | None | None | None |
2356 rows × 17 columns
image_predictions
| | tweet_id | jpg_url | img_num | p1 | p1_conf | p1_dog | p2 | p2_conf | p2_dog | p3 | p3_conf | p3_dog |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 666020888022790149 | https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg | 1 | Welsh_springer_spaniel | 0.465074 | True | collie | 0.156665 | True | Shetland_sheepdog | 0.061428 | True |
1 | 666029285002620928 | https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg | 1 | redbone | 0.506826 | True | miniature_pinscher | 0.074192 | True | Rhodesian_ridgeback | 0.072010 | True |
2 | 666033412701032449 | https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg | 1 | German_shepherd | 0.596461 | True | malinois | 0.138584 | True | bloodhound | 0.116197 | True |
3 | 666044226329800704 | https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg | 1 | Rhodesian_ridgeback | 0.408143 | True | redbone | 0.360687 | True | miniature_pinscher | 0.222752 | True |
4 | 666049248165822465 | https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg | 1 | miniature_pinscher | 0.560311 | True | Rottweiler | 0.243682 | True | Doberman | 0.154629 | True |
5 | 666050758794694657 | https://pbs.twimg.com/media/CT5Jof1WUAEuVxN.jpg | 1 | Bernese_mountain_dog | 0.651137 | True | English_springer | 0.263788 | True | Greater_Swiss_Mountain_dog | 0.016199 | True |
6 | 666051853826850816 | https://pbs.twimg.com/media/CT5KoJ1WoAAJash.jpg | 1 | box_turtle | 0.933012 | False | mud_turtle | 0.045885 | False | terrapin | 0.017885 | False |
7 | 666055525042405380 | https://pbs.twimg.com/media/CT5N9tpXIAAifs1.jpg | 1 | chow | 0.692517 | True | Tibetan_mastiff | 0.058279 | True | fur_coat | 0.054449 | False |
8 | 666057090499244032 | https://pbs.twimg.com/media/CT5PY90WoAAQGLo.jpg | 1 | shopping_cart | 0.962465 | False | shopping_basket | 0.014594 | False | golden_retriever | 0.007959 | True |
9 | 666058600524156928 | https://pbs.twimg.com/media/CT5Qw94XAAA_2dP.jpg | 1 | miniature_poodle | 0.201493 | True | komondor | 0.192305 | True | soft-coated_wheaten_terrier | 0.082086 | True |
10 | 666063827256086533 | https://pbs.twimg.com/media/CT5Vg_wXIAAXfnj.jpg | 1 | golden_retriever | 0.775930 | True | Tibetan_mastiff | 0.093718 | True | Labrador_retriever | 0.072427 | True |
11 | 666071193221509120 | https://pbs.twimg.com/media/CT5cN_3WEAAlOoZ.jpg | 1 | Gordon_setter | 0.503672 | True | Yorkshire_terrier | 0.174201 | True | Pekinese | 0.109454 | True |
12 | 666073100786774016 | https://pbs.twimg.com/media/CT5d9DZXAAALcwe.jpg | 1 | Walker_hound | 0.260857 | True | English_foxhound | 0.175382 | True | Ibizan_hound | 0.097471 | True |
13 | 666082916733198337 | https://pbs.twimg.com/media/CT5m4VGWEAAtKc8.jpg | 1 | pug | 0.489814 | True | bull_mastiff | 0.404722 | True | French_bulldog | 0.048960 | True |
14 | 666094000022159362 | https://pbs.twimg.com/media/CT5w9gUW4AAsBNN.jpg | 1 | bloodhound | 0.195217 | True | German_shepherd | 0.078260 | True | malinois | 0.075628 | True |
15 | 666099513787052032 | https://pbs.twimg.com/media/CT51-JJUEAA6hV8.jpg | 1 | Lhasa | 0.582330 | True | Shih-Tzu | 0.166192 | True | Dandie_Dinmont | 0.089688 | True |
16 | 666102155909144576 | https://pbs.twimg.com/media/CT54YGiWUAEZnoK.jpg | 1 | English_setter | 0.298617 | True | Newfoundland | 0.149842 | True | borzoi | 0.133649 | True |
17 | 666104133288665088 | https://pbs.twimg.com/media/CT56LSZWoAAlJj2.jpg | 1 | hen | 0.965932 | False | cock | 0.033919 | False | partridge | 0.000052 | False |
18 | 666268910803644416 | https://pbs.twimg.com/media/CT8QCd1WEAADXws.jpg | 1 | desktop_computer | 0.086502 | False | desk | 0.085547 | False | bookcase | 0.079480 | False |
19 | 666273097616637952 | https://pbs.twimg.com/media/CT8T1mtUwAA3aqm.jpg | 1 | Italian_greyhound | 0.176053 | True | toy_terrier | 0.111884 | True | basenji | 0.111152 | True |
20 | 666287406224695296 | https://pbs.twimg.com/media/CT8g3BpUEAAuFjg.jpg | 1 | Maltese_dog | 0.857531 | True | toy_poodle | 0.063064 | True | miniature_poodle | 0.025581 | True |
21 | 666293911632134144 | https://pbs.twimg.com/media/CT8mx7KW4AEQu8N.jpg | 1 | three-toed_sloth | 0.914671 | False | otter | 0.015250 | False | great_grey_owl | 0.013207 | False |
22 | 666337882303524864 | https://pbs.twimg.com/media/CT9OwFIWEAMuRje.jpg | 1 | ox | 0.416669 | False | Newfoundland | 0.278407 | True | groenendael | 0.102643 | True |
23 | 666345417576210432 | https://pbs.twimg.com/media/CT9Vn7PWoAA_ZCM.jpg | 1 | golden_retriever | 0.858744 | True | Chesapeake_Bay_retriever | 0.054787 | True | Labrador_retriever | 0.014241 | True |
24 | 666353288456101888 | https://pbs.twimg.com/media/CT9cx0tUEAAhNN_.jpg | 1 | malamute | 0.336874 | True | Siberian_husky | 0.147655 | True | Eskimo_dog | 0.093412 | True |
25 | 666362758909284353 | https://pbs.twimg.com/media/CT9lXGsUcAAyUFt.jpg | 1 | guinea_pig | 0.996496 | False | skunk | 0.002402 | False | hamster | 0.000461 | False |
26 | 666373753744588802 | https://pbs.twimg.com/media/CT9vZEYWUAAlZ05.jpg | 1 | soft-coated_wheaten_terrier | 0.326467 | True | Afghan_hound | 0.259551 | True | briard | 0.206803 | True |
27 | 666396247373291520 | https://pbs.twimg.com/media/CT-D2ZHWIAA3gK1.jpg | 1 | Chihuahua | 0.978108 | True | toy_terrier | 0.009397 | True | papillon | 0.004577 | True |
28 | 666407126856765440 | https://pbs.twimg.com/media/CT-NvwmW4AAugGZ.jpg | 1 | black-and-tan_coonhound | 0.529139 | True | bloodhound | 0.244220 | True | flat-coated_retriever | 0.173810 | True |
29 | 666411507551481857 | https://pbs.twimg.com/media/CT-RugiWIAELEaq.jpg | 1 | coho | 0.404640 | False | barracouta | 0.271485 | False | gar | 0.189945 | False |
… | … | … | … | … | … | … | … | … | … | … | … | … |
2045 | 886366144734445568 | https://pbs.twimg.com/media/DE0BTnQUwAApKEH.jpg | 1 | French_bulldog | 0.999201 | True | Chihuahua | 0.000361 | True | Boston_bull | 0.000076 | True |
2046 | 886680336477933568 | https://pbs.twimg.com/media/DE4fEDzWAAAyHMM.jpg | 1 | convertible | 0.738995 | False | sports_car | 0.139952 | False | car_wheel | 0.044173 | False |
2047 | 886736880519319552 | https://pbs.twimg.com/media/DE5Se8FXcAAJFx4.jpg | 1 | kuvasz | 0.309706 | True | Great_Pyrenees | 0.186136 | True | Dandie_Dinmont | 0.086346 | True |
2048 | 886983233522544640 | https://pbs.twimg.com/media/DE8yicJW0AAAvBJ.jpg | 2 | Chihuahua | 0.793469 | True | toy_terrier | 0.143528 | True | can_opener | 0.032253 | False |
2049 | 887101392804085760 | https://pbs.twimg.com/media/DE-eAq6UwAA-jaE.jpg | 1 | Samoyed | 0.733942 | True | Eskimo_dog | 0.035029 | True | Staffordshire_bullterrier | 0.029705 | True |
2050 | 887343217045368832 | https://pbs.twimg.com/ext_tw_video_thumb/88734… | 1 | Mexican_hairless | 0.330741 | True | sea_lion | 0.275645 | False | Weimaraner | 0.134203 | True |
2051 | 887473957103951883 | https://pbs.twimg.com/media/DFDw2tyUQAAAFke.jpg | 2 | Pembroke | 0.809197 | True | Rhodesian_ridgeback | 0.054950 | True | beagle | 0.038915 | True |
2052 | 887517139158093824 | https://pbs.twimg.com/ext_tw_video_thumb/88751… | 1 | limousine | 0.130432 | False | tow_truck | 0.029175 | False | shopping_cart | 0.026321 | False |
2053 | 887705289381826560 | https://pbs.twimg.com/media/DFHDQBbXgAEqY7t.jpg | 1 | basset | 0.821664 | True | redbone | 0.087582 | True | Weimaraner | 0.026236 | True |
2054 | 888078434458587136 | https://pbs.twimg.com/media/DFMWn56WsAAkA7B.jpg | 1 | French_bulldog | 0.995026 | True | pug | 0.000932 | True | bull_mastiff | 0.000903 | True |
2055 | 888202515573088257 | https://pbs.twimg.com/media/DFDw2tyUQAAAFke.jpg | 2 | Pembroke | 0.809197 | True | Rhodesian_ridgeback | 0.054950 | True | beagle | 0.038915 | True |
2056 | 888554962724278272 | https://pbs.twimg.com/media/DFTH_O-UQAACu20.jpg | 3 | Siberian_husky | 0.700377 | True | Eskimo_dog | 0.166511 | True | malamute | 0.111411 | True |
2057 | 888804989199671297 | https://pbs.twimg.com/media/DFWra-3VYAA2piG.jpg | 1 | golden_retriever | 0.469760 | True | Labrador_retriever | 0.184172 | True | English_setter | 0.073482 | True |
2058 | 888917238123831296 | https://pbs.twimg.com/media/DFYRgsOUQAARGhO.jpg | 1 | golden_retriever | 0.714719 | True | Tibetan_mastiff | 0.120184 | True | Labrador_retriever | 0.105506 | True |
2059 | 889278841981685760 | https://pbs.twimg.com/ext_tw_video_thumb/88927… | 1 | whippet | 0.626152 | True | borzoi | 0.194742 | True | Saluki | 0.027351 | True |
2060 | 889531135344209921 | https://pbs.twimg.com/media/DFg_2PVW0AEHN3p.jpg | 1 | golden_retriever | 0.953442 | True | Labrador_retriever | 0.013834 | True | redbone | 0.007958 | True |
2061 | 889638837579907072 | https://pbs.twimg.com/media/DFihzFfXsAYGDPR.jpg | 1 | French_bulldog | 0.991650 | True | boxer | 0.002129 | True | Staffordshire_bullterrier | 0.001498 | True |
2062 | 889665388333682689 | https://pbs.twimg.com/media/DFi579UWsAAatzw.jpg | 1 | Pembroke | 0.966327 | True | Cardigan | 0.027356 | True | basenji | 0.004633 | True |
2063 | 889880896479866881 | https://pbs.twimg.com/media/DFl99B1WsAITKsg.jpg | 1 | French_bulldog | 0.377417 | True | Labrador_retriever | 0.151317 | True | muzzle | 0.082981 | False |
2064 | 890006608113172480 | https://pbs.twimg.com/media/DFnwSY4WAAAMliS.jpg | 1 | Samoyed | 0.957979 | True | Pomeranian | 0.013884 | True | chow | 0.008167 | True |
2065 | 890240255349198849 | https://pbs.twimg.com/media/DFrEyVuW0AAO3t9.jpg | 1 | Pembroke | 0.511319 | True | Cardigan | 0.451038 | True | Chihuahua | 0.029248 | True |
2066 | 890609185150312448 | https://pbs.twimg.com/media/DFwUU__XcAEpyXI.jpg | 1 | Irish_terrier | 0.487574 | True | Irish_setter | 0.193054 | True | Chesapeake_Bay_retriever | 0.118184 | True |
2067 | 890729181411237888 | https://pbs.twimg.com/media/DFyBahAVwAAhUTd.jpg | 2 | Pomeranian | 0.566142 | True | Eskimo_dog | 0.178406 | True | Pembroke | 0.076507 | True |
2068 | 890971913173991426 | https://pbs.twimg.com/media/DF1eOmZXUAALUcq.jpg | 1 | Appenzeller | 0.341703 | True | Border_collie | 0.199287 | True | ice_lolly | 0.193548 | False |
2069 | 891087950875897856 | https://pbs.twimg.com/media/DF3HwyEWsAABqE6.jpg | 1 | Chesapeake_Bay_retriever | 0.425595 | True | Irish_terrier | 0.116317 | True | Indian_elephant | 0.076902 | False |
2070 | 891327558926688256 | https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg | 2 | basset | 0.555712 | True | English_springer | 0.225770 | True | German_short-haired_pointer | 0.175219 | True |
2071 | 891689557279858688 | https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg | 1 | paper_towel | 0.170278 | False | Labrador_retriever | 0.168086 | True | spatula | 0.040836 | False |
2072 | 891815181378084864 | https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg | 1 | Chihuahua | 0.716012 | True | malamute | 0.078253 | True | kelpie | 0.031379 | True |
2073 | 892177421306343426 | https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg | 1 | Chihuahua | 0.323581 | True | Pekinese | 0.090647 | True | papillon | 0.068957 | True |
2074 | 892420643555336193 | https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg | 1 | orange | 0.097049 | False | bagel | 0.085851 | False | banana | 0.076110 | False |
2075 rows × 12 columns
tweets_df
id | favorite_count | retweet_count | |
---|---|---|---|
0 | 892420643555336193 | 36776 | 7840 |
1 | 892177421306343426 | 31672 | 5804 |
2 | 891815181378084864 | 23851 | 3844 |
3 | 891689557279858688 | 40107 | 8009 |
4 | 891327558926688256 | 38309 | 8650 |
5 | 891087950875897856 | 19274 | 2884 |
6 | 890971913173991426 | 11230 | 1898 |
7 | 890729181411237888 | 62019 | 17505 |
8 | 890609185150312448 | 26513 | 3975 |
9 | 890240255349198849 | 30358 | 6813 |
10 | 890006608113172480 | 29208 | 6797 |
11 | 889880896479866881 | 26516 | 4631 |
12 | 889665388333682689 | 45738 | 9291 |
13 | 889638837579907072 | 25766 | 4171 |
14 | 889531135344209921 | 14392 | 2093 |
15 | 889278841981685760 | 23969 | 4959 |
16 | 888917238123831296 | 27706 | 4186 |
17 | 888804989199671297 | 24333 | 3952 |
18 | 888554962724278272 | 18818 | 3246 |
19 | 888078434458587136 | 20720 | 3213 |
20 | 887705289381826560 | 28741 | 4986 |
21 | 887517139158093824 | 44078 | 10896 |
22 | 887473957103951883 | 65556 | 16723 |
23 | 887343217045368832 | 32041 | 9703 |
24 | 887101392804085760 | 29117 | 5523 |
25 | 886983233522544640 | 33325 | 7121 |
26 | 886736880519319552 | 11434 | 2995 |
27 | 886680336477933568 | 21372 | 4140 |
28 | 886366144734445568 | 20125 | 2943 |
29 | 886267009285017600 | 114 | 4 |
… | … | … | … |
2303 | 666411507551481857 | 418 | 312 |
2304 | 666407126856765440 | 100 | 34 |
2305 | 666396247373291520 | 160 | 80 |
2306 | 666373753744588802 | 175 | 85 |
2307 | 666362758909284353 | 739 | 531 |
2308 | 666353288456101888 | 206 | 71 |
2309 | 666345417576210432 | 280 | 129 |
2310 | 666337882303524864 | 183 | 85 |
2311 | 666293911632134144 | 470 | 324 |
2312 | 666287406224695296 | 139 | 62 |
2313 | 666273097616637952 | 161 | 74 |
2314 | 666268910803644416 | 96 | 31 |
2315 | 666104133288665088 | 13640 | 6079 |
2316 | 666102155909144576 | 73 | 11 |
2317 | 666099513787052032 | 143 | 63 |
2318 | 666094000022159362 | 156 | 68 |
2319 | 666082916733198337 | 104 | 42 |
2320 | 666073100786774016 | 303 | 147 |
2321 | 666071193221509120 | 137 | 54 |
2322 | 666063827256086533 | 452 | 200 |
2323 | 666058600524156928 | 107 | 55 |
2324 | 666057090499244032 | 274 | 128 |
2325 | 666055525042405380 | 414 | 224 |
2326 | 666051853826850816 | 1151 | 794 |
2327 | 666050758794694657 | 125 | 55 |
2328 | 666049248165822465 | 99 | 41 |
2329 | 666044226329800704 | 279 | 134 |
2330 | 666033412701032449 | 117 | 43 |
2331 | 666029285002620928 | 121 | 43 |
2332 | 666020888022790149 | 2453 | 473 |
2333 rows × 3 columns
dogs_rates_archive.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2356 entries, 0 to 2355
Data columns (total 17 columns):
tweet_id                      2356 non-null int64
in_reply_to_status_id         78 non-null float64
in_reply_to_user_id           78 non-null float64
timestamp                     2356 non-null object
source                        2356 non-null object
text                          2356 non-null object
retweeted_status_id           181 non-null float64
retweeted_status_user_id      181 non-null float64
retweeted_status_timestamp    181 non-null object
expanded_urls                 2297 non-null object
rating_numerator              2356 non-null int64
rating_denominator            2356 non-null int64
name                          2356 non-null object
doggo                         2356 non-null object
floofer                       2356 non-null object
pupper                        2356 non-null object
puppo                         2356 non-null object
dtypes: float64(4), int64(3), object(10)
memory usage: 313.0+ KB
image_predictions.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2075 entries, 0 to 2074
Data columns (total 12 columns):
tweet_id    2075 non-null int64
jpg_url     2075 non-null object
img_num     2075 non-null int64
p1          2075 non-null object
p1_conf     2075 non-null float64
p1_dog      2075 non-null bool
p2          2075 non-null object
p2_conf     2075 non-null float64
p2_dog      2075 non-null bool
p3          2075 non-null object
p3_conf     2075 non-null float64
p3_dog      2075 non-null bool
dtypes: bool(3), float64(3), int64(2), object(4)
memory usage: 152.1+ KB
tweets_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2333 entries, 0 to 2332
Data columns (total 3 columns):
id                2333 non-null int64
favorite_count    2333 non-null int64
retweet_count     2333 non-null int64
dtypes: int64(3)
memory usage: 54.8 KB
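The `info()` summaries above show `tweet_id` stored as int64 and `timestamp` stored as object. The following is a minimal sketch of how both columns could be converted, run on a tiny illustrative frame rather than on the real archive. The name `sample` is hypothetical, and passing `utc=True` is an assumption about how the timestamps should be normalized.

```python
import pandas as pd

# Hypothetical two-row sample mirroring the archive's tweet_id and timestamp columns
sample = pd.DataFrame({
    "tweet_id": [892420643555336193, 892177421306343426],
    "timestamp": ["2017-08-01 16:23:56 +0000", "2017-08-01 00:17:27 +0000"],
})

# IDs are labels, not quantities, so store them as strings
sample["tweet_id"] = sample["tweet_id"].astype(str)

# Parse the timestamp strings into timezone-aware datetimes (normalized to UTC)
sample["timestamp"] = pd.to_datetime(sample["timestamp"], utc=True)

print(sample.dtypes)
```

Storing the ID as a string prevents accidental arithmetic on it and keeps it consistent across tables when they are merged later.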
dogs_rates_archive.duplicated().sum()
0
image_predictions.duplicated().sum()
0
tweets_df.duplicated().sum()
0
dogs_rates_archive.tweet_id.value_counts()
749075273010798592    1
741099773336379392    1
798644042770751489    1
825120256414846976    1
769212283578875904    1
700462010979500032    1
780858289093574656    1
699775878809702401    1
880095782870896641    1
760521673607086080    1
776477788987613185    1
691820333922455552    1
715696743237730304    1
714606013974974464    1
760539183865880579    1
813157409116065792    1
676430933382295552    1
743510151680958465    1
837012587749474308    1
833722901757046785    1
818259473185828864    1
670704688707301377    1
667160273090932737    1
674394782723014656    1
672082170312290304    1
670093938074779648    1
759923798737051648    1
809920764300447744    1
805487436403003392    1
838085839343206401    1
..
763956972077010945    1
870308999962521604    1
720775346191278080    1
785927819176054784    1
783347506784731136    1
775733305207554048    1
834209720923721728    1
825026590719483904    1
758405701903519748    1
668986018524233728    1
690938899477221376    1
667911425562669056    1
754482103782404096    1
713175907180089344    1
669015743032369152    1
672068090318987265    1
816829038950027264    1
683773439333797890    1
674291837063053312    1
837482249356513284    1
767500508068192258    1
773922284943896577    1
673342308415348736    1
886054160059072513    1
748307329658011649    1
715360349751484417    1
666817836334096384    1
794926597468000259    1
673705679337693185    1
700151421916807169    1
Name: tweet_id, Length: 2356, dtype: int64
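Some of the quality issues listed below can also be surfaced programmatically instead of by scrolling through the table. The sketch below assumes a small hypothetical frame (`sample`) whose values echo ones seen in the archive. It illustrates the idea and is not the check the notebook actually ran.

```python
import pandas as pd

# Hypothetical sample; the names and ratings echo values seen in the archive
sample = pd.DataFrame({
    "name": ["Phineas", "a", "None", "an", "Tilly"],
    "rating_numerator": [13, 960, 24, 9, 13],
    "rating_denominator": [10, 0, 7, 11, 10],
})

# All-lowercase values ("a", "an") and the literal string "None" are not real names
bad_names = sample[sample["name"].str.islower() | (sample["name"] == "None")]

# A denominator other than 10 usually means the rating was parsed incorrectly
bad_ratings = sample[sample["rating_denominator"] != 10]

print(bad_names["name"].tolist())
print(len(bad_ratings))
```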
quality¶
dogs_rates_archive
table¶
- The name column contains invalid values such as None, a, and an instead of NaN.
- Some columns are not needed (in_reply_to_status_id, in_reply_to_user_id, retweeted_status_id, retweeted_status_user_id, retweeted_status_timestamp).
- Wrong data types (timestamp, tweet_id).
- The table contains retweets, and we only want original tweets.
- rating_numerator and rating_denominator contain invalid ratings, such as 960/0, 24/7, 9/11, and 165/150.
image_predictions
table¶
- The prediction columns use the abbreviated names p1, p2, and p3.
- Some tweets have no image: image_predictions has 2075 observations while the archive table has 2356.
- tweet_id is int64 rather than object.
- There are three breed predictions per image, although only one of them is the most confident.
tweets_df
table¶
- The id column should be renamed to tweet_id to be consistent with the other tables.
tidiness¶
- Dog stage is one variable but is spread across four columns (doggo, floofer, pupper, puppo).
- The data is split across three datasets instead of one.
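The first tidiness issue above, one variable spread over four columns, can be addressed by collapsing the stage columns into a single column. The sketch below shows one possible approach on a hypothetical three-row frame. The column name `dog_stage` and the comma-joined handling of tweets that mention more than one stage are assumptions, not necessarily what the notebook does.

```python
import pandas as pd

# Hypothetical sample using the archive's convention: the string "None" marks an absent stage
sample = pd.DataFrame({
    "tweet_id": ["1", "2", "3"],
    "doggo": ["doggo", "None", "None"],
    "floofer": ["None", "None", "None"],
    "pupper": ["None", "pupper", "None"],
    "puppo": ["None", "None", "None"],
})

stage_cols = ["doggo", "floofer", "pupper", "puppo"]

# Turn the "None" placeholders into empty strings, then join the non-empty stages per row
stages = sample[stage_cols].replace("None", "")
sample["dog_stage"] = stages.apply(lambda row: ",".join(s for s in row if s), axis=1)

# Rows with no stage at all become missing values
sample["dog_stage"] = sample["dog_stage"].mask(sample["dog_stage"] == "")

# The four original columns are now redundant
sample = sample.drop(columns=stage_cols)
print(sample)
```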
Clean¶
dogs_rates_archive_clean = dogs_rates_archive.copy()
image_predictions_clean = image_predictions.copy()
tweets_df_clean = tweets_df.copy()
dogs_rates_archive: name column has None, a, an, ... values instead of NaN¶
Define¶
Replace names that are written entirely in lowercase (such as a and an) or entirely in uppercase, as well as the string None, with NaN.
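Code¶
# flag names written entirely in uppercase or entirely in lowercase
mask1 = dogs_rates_archive_clean.name.str.isupper()
mask2 = dogs_rates_archive_clean.name.str.islower()
column_name = 'name'
dogs_rates_archive_clean.loc[(mask1 | mask2), column_name] = np.nan
# replace the placeholder string 'None' by assignment rather than a chained inplace call
dogs_rates_archive_clean['name'] = dogs_rates_archive_clean['name'].replace('None', np.nan)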
Test¶
dogs_rates_archive_clean
tweet_id | in_reply_to_status_id | in_reply_to_user_id | timestamp | source | text | retweeted_status_id | retweeted_status_user_id | retweeted_status_timestamp | expanded_urls | rating_numerator | rating_denominator | name | doggo | floofer | pupper | puppo | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 892420643555336193 | NaN | NaN | 2017-08-01 16:23:56 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Phineas. He’s a mystical boy. Only eve… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/892420643… | 13 | 10 | Phineas | None | None | None | None |
1 | 892177421306343426 | NaN | NaN | 2017-08-01 00:17:27 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Tilly. She’s just checking pup on you…. | NaN | NaN | NaN | https://twitter.com/dog_rates/status/892177421… | 13 | 10 | Tilly | None | None | None | None |
2 | 891815181378084864 | NaN | NaN | 2017-07-31 00:18:03 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Archie. He is a rare Norwegian Pouncin… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891815181… | 12 | 10 | Archie | None | None | None | None |
3 | 891689557279858688 | NaN | NaN | 2017-07-30 15:58:51 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Darla. She commenced a snooze mid meal… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891689557… | 13 | 10 | Darla | None | None | None | None |
4 | 891327558926688256 | NaN | NaN | 2017-07-29 16:00:24 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Franklin. He would like you to stop ca… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891327558… | 12 | 10 | Franklin | None | None | None | None |
5 | 891087950875897856 | NaN | NaN | 2017-07-29 00:08:17 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here we have a majestic great white breaching … | NaN | NaN | NaN | https://twitter.com/dog_rates/status/891087950… | 13 | 10 | NaN | None | None | None | None |
6 | 890971913173991426 | NaN | NaN | 2017-07-28 16:27:12 +0000 | <a href=”http://twitter.com/download/iphone” r… | Meet Jax. He enjoys ice cream so much he gets … | NaN | NaN | NaN | https://gofundme.com/ydvmve-surgery-for-jax,ht… | 13 | 10 | Jax | None | None | None | None |
7 | 890729181411237888 | NaN | NaN | 2017-07-28 00:22:40 +0000 | <a href=”http://twitter.com/download/iphone” r… | When you watch your owner call another dog a g… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/890729181… | 13 | 10 | NaN | None | None | None | None |
8 | 890609185150312448 | NaN | NaN | 2017-07-27 16:25:51 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Zoey. She doesn’t want to be one of th… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/890609185… | 13 | 10 | Zoey | None | None | None | None |
9 | 890240255349198849 | NaN | NaN | 2017-07-26 15:59:51 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Cassie. She is a college pup. Studying… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/890240255… | 14 | 10 | Cassie | doggo | None | None | None |
10 | 890006608113172480 | NaN | NaN | 2017-07-26 00:31:25 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Koda. He is a South Australian decksha… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/890006608… | 13 | 10 | Koda | None | None | None | None |
11 | 889880896479866881 | NaN | NaN | 2017-07-25 16:11:53 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Bruno. He is a service shark. Only get… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/889880896… | 13 | 10 | Bruno | None | None | None | None |
12 | 889665388333682689 | NaN | NaN | 2017-07-25 01:55:32 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here’s a puppo that seems to be on the fence a… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/889665388… | 13 | 10 | NaN | None | None | None | puppo |
13 | 889638837579907072 | NaN | NaN | 2017-07-25 00:10:02 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Ted. He does his best. Sometimes that’… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/889638837… | 12 | 10 | Ted | None | None | None | None |
14 | 889531135344209921 | NaN | NaN | 2017-07-24 17:02:04 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Stuart. He’s sporting his favorite fan… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/889531135… | 13 | 10 | Stuart | None | None | None | puppo |
15 | 889278841981685760 | NaN | NaN | 2017-07-24 00:19:32 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Oliver. You’re witnessing one of his m… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/889278841… | 13 | 10 | Oliver | None | None | None | None |
16 | 888917238123831296 | NaN | NaN | 2017-07-23 00:22:39 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Jim. He found a fren. Taught him how t… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/888917238… | 12 | 10 | Jim | None | None | None | None |
17 | 888804989199671297 | NaN | NaN | 2017-07-22 16:56:37 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Zeke. He has a new stick. Very proud o… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/888804989… | 13 | 10 | Zeke | None | None | None | None |
18 | 888554962724278272 | NaN | NaN | 2017-07-22 00:23:06 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Ralphus. He’s powering up. Attempting … | NaN | NaN | NaN | https://twitter.com/dog_rates/status/888554962… | 13 | 10 | Ralphus | None | None | None | None |
19 | 888202515573088257 | NaN | NaN | 2017-07-21 01:02:36 +0000 | <a href=”http://twitter.com/download/iphone” r… | RT @dog_rates: This is Canela. She attempted s… | 8.874740e+17 | 4.196984e+09 | 2017-07-19 00:47:34 +0000 | https://twitter.com/dog_rates/status/887473957… | 13 | 10 | Canela | None | None | None | None |
20 | 888078434458587136 | NaN | NaN | 2017-07-20 16:49:33 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Gerald. He was just told he didn’t get… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/888078434… | 12 | 10 | Gerald | None | None | None | None |
21 | 887705289381826560 | NaN | NaN | 2017-07-19 16:06:48 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Jeffrey. He has a monopoly on the pool… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/887705289… | 13 | 10 | Jeffrey | None | None | None | None |
22 | 887517139158093824 | NaN | NaN | 2017-07-19 03:39:09 +0000 | <a href=”http://twitter.com/download/iphone” r… | I’ve yet to rate a Venezuelan Hover Wiener. Th… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/887517139… | 14 | 10 | NaN | None | None | None | None |
23 | 887473957103951883 | NaN | NaN | 2017-07-19 00:47:34 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Canela. She attempted some fancy porch… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/887473957… | 13 | 10 | Canela | None | None | None | None |
24 | 887343217045368832 | NaN | NaN | 2017-07-18 16:08:03 +0000 | <a href=”http://twitter.com/download/iphone” r… | You may not have known you needed to see this … | NaN | NaN | NaN | https://twitter.com/dog_rates/status/887343217… | 13 | 10 | NaN | None | None | None | None |
25 | 887101392804085760 | NaN | NaN | 2017-07-18 00:07:08 +0000 | <a href=”http://twitter.com/download/iphone” r… | This… is a Jubilant Antarctic House Bear. We… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/887101392… | 12 | 10 | NaN | None | None | None | None |
26 | 886983233522544640 | NaN | NaN | 2017-07-17 16:17:36 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Maya. She’s very shy. Rarely leaves he… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/886983233… | 13 | 10 | Maya | None | None | None | None |
27 | 886736880519319552 | NaN | NaN | 2017-07-16 23:58:41 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Mingus. He’s a wonderful father to his… | NaN | NaN | NaN | https://www.gofundme.com/mingusneedsus,https:/… | 13 | 10 | Mingus | None | None | None | None |
28 | 886680336477933568 | NaN | NaN | 2017-07-16 20:14:00 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Derek. He’s late for a dog meeting. 13… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/886680336… | 13 | 10 | Derek | None | None | None | None |
29 | 886366144734445568 | NaN | NaN | 2017-07-15 23:25:31 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Roscoe. Another pupper fallen victim t… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/886366144… | 12 | 10 | Roscoe | None | None | pupper | None |
… | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
2326 | 666411507551481857 | NaN | NaN | 2015-11-17 00:24:19 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is quite the dog. Gets really excited whe… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666411507… | 2 | 10 | NaN | None | None | None | None |
2327 | 666407126856765440 | NaN | NaN | 2015-11-17 00:06:54 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is a southern Vesuvius bumblegruff. Can d… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666407126… | 7 | 10 | NaN | None | None | None | None |
2328 | 666396247373291520 | NaN | NaN | 2015-11-16 23:23:41 +0000 | <a href=”http://twitter.com/download/iphone” r… | Oh goodness. A super rare northeast Qdoba kang… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666396247… | 9 | 10 | NaN | None | None | None | None |
2329 | 666373753744588802 | NaN | NaN | 2015-11-16 21:54:18 +0000 | <a href=”http://twitter.com/download/iphone” r… | Those are sunglasses and a jean jacket. 11/10 … | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666373753… | 11 | 10 | NaN | None | None | None | None |
2330 | 666362758909284353 | NaN | NaN | 2015-11-16 21:10:36 +0000 | <a href=”http://twitter.com/download/iphone” r… | Unique dog here. Very small. Lives in containe… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666362758… | 6 | 10 | NaN | None | None | None | None |
2331 | 666353288456101888 | NaN | NaN | 2015-11-16 20:32:58 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here we have a mixed Asiago from the Galápagos… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666353288… | 8 | 10 | NaN | None | None | None | None |
2332 | 666345417576210432 | NaN | NaN | 2015-11-16 20:01:42 +0000 | <a href=”http://twitter.com/download/iphone” r… | Look at this jokester thinking seat belt laws … | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666345417… | 10 | 10 | NaN | None | None | None | None |
2333 | 666337882303524864 | NaN | NaN | 2015-11-16 19:31:45 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is an extremely rare horned Parthenon. No… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666337882… | 9 | 10 | NaN | None | None | None | None |
2334 | 666293911632134144 | NaN | NaN | 2015-11-16 16:37:02 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is a funny dog. Weird toes. Won’t come do… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666293911… | 3 | 10 | NaN | None | None | None | None |
2335 | 666287406224695296 | NaN | NaN | 2015-11-16 16:11:11 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is an Albanian 3 1/2 legged Episcopalian… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666287406… | 1 | 2 | NaN | None | None | None | None |
2336 | 666273097616637952 | NaN | NaN | 2015-11-16 15:14:19 +0000 | <a href=”http://twitter.com/download/iphone” r… | Can take selfies 11/10 https://t.co/ws2AMaNwPW | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666273097… | 11 | 10 | NaN | None | None | None | None |
2337 | 666268910803644416 | NaN | NaN | 2015-11-16 14:57:41 +0000 | <a href=”http://twitter.com/download/iphone” r… | Very concerned about fellow dog trapped in com… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666268910… | 10 | 10 | NaN | None | None | None | None |
2338 | 666104133288665088 | NaN | NaN | 2015-11-16 04:02:55 +0000 | <a href=”http://twitter.com/download/iphone” r… | Not familiar with this breed. No tail (weird)…. | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666104133… | 1 | 10 | NaN | None | None | None | None |
2339 | 666102155909144576 | NaN | NaN | 2015-11-16 03:55:04 +0000 | <a href=”http://twitter.com/download/iphone” r… | Oh my. Here you are seeing an Adobe Setter giv… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666102155… | 11 | 10 | NaN | None | None | None | None |
2340 | 666099513787052032 | NaN | NaN | 2015-11-16 03:44:34 +0000 | <a href=”http://twitter.com/download/iphone” r… | Can stand on stump for what seems like a while… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666099513… | 8 | 10 | NaN | None | None | None | None |
2341 | 666094000022159362 | NaN | NaN | 2015-11-16 03:22:39 +0000 | <a href=”http://twitter.com/download/iphone” r… | This appears to be a Mongolian Presbyterian mi… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666094000… | 9 | 10 | NaN | None | None | None | None |
2342 | 666082916733198337 | NaN | NaN | 2015-11-16 02:38:37 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here we have a well-established sunblockerspan… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666082916… | 6 | 10 | NaN | None | None | None | None |
2343 | 666073100786774016 | NaN | NaN | 2015-11-16 01:59:36 +0000 | <a href=”http://twitter.com/download/iphone” r… | Let’s hope this flight isn’t Malaysian (lol). … | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666073100… | 10 | 10 | NaN | None | None | None | None |
2344 | 666071193221509120 | NaN | NaN | 2015-11-16 01:52:02 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here we have a northern speckled Rhododendron…. | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666071193… | 9 | 10 | NaN | None | None | None | None |
2345 | 666063827256086533 | NaN | NaN | 2015-11-16 01:22:45 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is the happiest dog you will ever see. Ve… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666063827… | 10 | 10 | NaN | None | None | None | None |
2346 | 666058600524156928 | NaN | NaN | 2015-11-16 01:01:59 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here is the Rand Paul of retrievers folks! He’… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666058600… | 8 | 10 | NaN | None | None | None | None |
2347 | 666057090499244032 | NaN | NaN | 2015-11-16 00:55:59 +0000 | <a href=”http://twitter.com/download/iphone” r… | My oh my. This is a rare blond Canadian terrie… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666057090… | 9 | 10 | NaN | None | None | None | None |
2348 | 666055525042405380 | NaN | NaN | 2015-11-16 00:49:46 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here is a Siberian heavily armored polar bear … | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666055525… | 10 | 10 | NaN | None | None | None | None |
2349 | 666051853826850816 | NaN | NaN | 2015-11-16 00:35:11 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is an odd dog. Hard on the outside but lo… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666051853… | 2 | 10 | NaN | None | None | None | None |
2350 | 666050758794694657 | NaN | NaN | 2015-11-16 00:30:50 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is a truly beautiful English Wilson Staff… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666050758… | 10 | 10 | NaN | None | None | None | None |
2351 | 666049248165822465 | NaN | NaN | 2015-11-16 00:24:50 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here we have a 1949 1st generation vulpix. Enj… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666049248… | 5 | 10 | NaN | None | None | None | None |
2352 | 666044226329800704 | NaN | NaN | 2015-11-16 00:04:52 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is a purebred Piers Morgan. Loves to Netf… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666044226… | 6 | 10 | NaN | None | None | None | None |
2353 | 666033412701032449 | NaN | NaN | 2015-11-15 23:21:54 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here is a very happy pup. Big fan of well-main… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666033412… | 9 | 10 | NaN | None | None | None | None |
2354 | 666029285002620928 | NaN | NaN | 2015-11-15 23:05:30 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is a western brown Mitsubishi terrier. Up… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666029285… | 7 | 10 | NaN | None | None | None | None |
2355 | 666020888022790149 | NaN | NaN | 2015-11-15 22:32:08 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here we have a Japanese Irish Setter. Lost eye… | NaN | NaN | NaN | https://twitter.com/dog_rates/status/666020888… | 8 | 10 | NaN | None | None | None | None |
2356 rows × 17 columns
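Displaying the whole table only confirms the result for the rows that happen to be visible. A programmatic check covers every row. The sketch below applies the same rule to a hypothetical Series of raw names and asserts that no invalid value survives. The sample values, including the all-caps name JD, are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of raw name values like those found in the archive
names = pd.Series(["Phineas", "a", "None", "an", "Tilly", "JD"], name="name")

# Same rule as the cleaning code: all-uppercase or all-lowercase values are discarded
mask = names.str.isupper() | names.str.islower()
cleaned = names.mask(mask).replace("None", np.nan)

remaining = cleaned.dropna()

# No remaining value should be entirely lowercase, entirely uppercase, or "None"
assert not remaining.str.islower().any()
assert not remaining.str.isupper().any()
assert (remaining != "None").all()
print(remaining.tolist())
```

Note that the uppercase rule also discards genuine all-caps names such as the hypothetical JD here, which may or may not be intended.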
Define¶
rating_numerator and rating_denominator contain invalid ratings, such as 960/0, 24/7, 9/11, and 165/150. Drop the rows where the denominator is not 10, then drop the rows where the numerator is greater than 15.
Code¶
# drop rows where the denominator is not 10
dogs_rates_archive_clean.drop(dogs_rates_archive_clean[dogs_rates_archive_clean.rating_denominator != 10].index, inplace = True)
# drop rows where the numerator is greater than 15
dogs_rates_archive_clean.drop(dogs_rates_archive_clean[dogs_rates_archive_clean.rating_numerator > 15].index, inplace = True)
Test¶
dogs_rates_archive_clean.rating_numerator
0       13
1       13
2       12
3       13
4       12
5       13
6       13
7       13
8       13
9       14
10      13
11      13
12      13
13      12
14      13
15      13
16      12
17      13
18      13
19      13
20      12
21      13
22      14
23      13
24      13
25      12
26      13
27      13
28      13
29      12
..
2325    10
2326     2
2327     7
2328     9
2329    11
2330     6
2331     8
2332    10
2333     9
2334     3
2336    11
2337    10
2338     1
2339    11
2340     8
2341     9
2342     6
2343    10
2344     9
2345    10
2346     8
2347     9
2348    10
2349     2
2350    10
2351     5
2352     6
2353     9
2354     7
2355     8
Name: rating_numerator, Length: 2323, dtype: int64
dogs_rates_archive_clean.rating_denominator
0       10
1       10
2       10
3       10
4       10
5       10
6       10
7       10
8       10
9       10
10      10
11      10
12      10
13      10
14      10
15      10
16      10
17      10
18      10
19      10
20      10
21      10
22      10
23      10
24      10
25      10
26      10
27      10
28      10
29      10
..
2325    10
2326    10
2327    10
2328    10
2329    10
2330    10
2331    10
2332    10
2333    10
2334    10
2336    10
2337    10
2338    10
2339    10
2340    10
2341    10
2342    10
2343    10
2344    10
2345    10
2346    10
2347    10
2348    10
2349    10
2350    10
2351    10
2352    10
2353    10
2354    10
2355    10
Name: rating_denominator, Length: 2323, dtype: int64
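The two `drop` calls in this step can also be written as a single boolean filter that keeps the valid rows instead of removing the invalid ones. The sketch below is an equivalent formulation on hypothetical data, not the code the notebook ran. The names `sample`, `valid`, and `filtered` are illustrative.

```python
import pandas as pd

# Hypothetical sample; the odd ratings echo the examples named in the Define step
sample = pd.DataFrame({
    "rating_numerator": [13, 960, 24, 9, 165, 1776, 12],
    "rating_denominator": [10, 0, 7, 11, 150, 10, 10],
})

# Keep rows with a denominator of exactly 10 and a numerator of at most 15
valid = (sample["rating_denominator"] == 10) & (sample["rating_numerator"] <= 15)
filtered = sample[valid].copy()

print(filtered)
```

As in the output above, where index 2335 is missing, the surviving rows keep their original index labels, so gaps in the index are expected.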
Define¶
Delete retweets by keeping only the rows where retweeted_status_id is null.
Code¶
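# keep original tweets only; .copy() avoids SettingWithCopyWarning on later inplace operations
dogs_rates_archive_clean = dogs_rates_archive_clean[pd.isnull(dogs_rates_archive_clean.retweeted_status_id)].copy()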
Test¶
dogs_rates_archive_clean.retweeted_status_id.value_counts(dropna=False)
NaN    2144
Name: retweeted_status_id, dtype: int64
Define¶
Drop the columns we don’t need (in_reply_to_status_id, in_reply_to_user_id, retweeted_status_id, retweeted_status_user_id, retweeted_status_timestamp).
Code¶
dogs_rates_archive_clean.drop(columns=['in_reply_to_status_id', 'in_reply_to_user_id', 'retweeted_status_id', 'retweeted_status_user_id', 'retweeted_status_timestamp'], inplace = True)
Test¶
dogs_rates_archive_clean
tweet_id | timestamp | source | text | expanded_urls | rating_numerator | rating_denominator | name | doggo | floofer | pupper | puppo | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 892420643555336193 | 2017-08-01 16:23:56 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Phineas. He’s a mystical boy. Only eve… | https://twitter.com/dog_rates/status/892420643… | 13 | 10 | Phineas | None | None | None | None |
1 | 892177421306343426 | 2017-08-01 00:17:27 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Tilly. She’s just checking pup on you…. | https://twitter.com/dog_rates/status/892177421… | 13 | 10 | Tilly | None | None | None | None |
2 | 891815181378084864 | 2017-07-31 00:18:03 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Archie. He is a rare Norwegian Pouncin… | https://twitter.com/dog_rates/status/891815181… | 12 | 10 | Archie | None | None | None | None |
3 | 891689557279858688 | 2017-07-30 15:58:51 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Darla. She commenced a snooze mid meal… | https://twitter.com/dog_rates/status/891689557… | 13 | 10 | Darla | None | None | None | None |
4 | 891327558926688256 | 2017-07-29 16:00:24 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Franklin. He would like you to stop ca… | https://twitter.com/dog_rates/status/891327558… | 12 | 10 | Franklin | None | None | None | None |
5 | 891087950875897856 | 2017-07-29 00:08:17 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here we have a majestic great white breaching … | https://twitter.com/dog_rates/status/891087950… | 13 | 10 | NaN | None | None | None | None |
6 | 890971913173991426 | 2017-07-28 16:27:12 +0000 | <a href=”http://twitter.com/download/iphone” r… | Meet Jax. He enjoys ice cream so much he gets … | https://gofundme.com/ydvmve-surgery-for-jax,ht… | 13 | 10 | Jax | None | None | None | None |
7 | 890729181411237888 | 2017-07-28 00:22:40 +0000 | <a href=”http://twitter.com/download/iphone” r… | When you watch your owner call another dog a g… | https://twitter.com/dog_rates/status/890729181… | 13 | 10 | NaN | None | None | None | None |
8 | 890609185150312448 | 2017-07-27 16:25:51 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Zoey. She doesn’t want to be one of th… | https://twitter.com/dog_rates/status/890609185… | 13 | 10 | Zoey | None | None | None | None |
9 | 890240255349198849 | 2017-07-26 15:59:51 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Cassie. She is a college pup. Studying… | https://twitter.com/dog_rates/status/890240255… | 14 | 10 | Cassie | doggo | None | None | None |
10 | 890006608113172480 | 2017-07-26 00:31:25 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Koda. He is a South Australian decksha… | https://twitter.com/dog_rates/status/890006608… | 13 | 10 | Koda | None | None | None | None |
11 | 889880896479866881 | 2017-07-25 16:11:53 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Bruno. He is a service shark. Only get… | https://twitter.com/dog_rates/status/889880896… | 13 | 10 | Bruno | None | None | None | None |
12 | 889665388333682689 | 2017-07-25 01:55:32 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here’s a puppo that seems to be on the fence a… | https://twitter.com/dog_rates/status/889665388… | 13 | 10 | NaN | None | None | None | puppo |
13 | 889638837579907072 | 2017-07-25 00:10:02 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Ted. He does his best. Sometimes that’… | https://twitter.com/dog_rates/status/889638837… | 12 | 10 | Ted | None | None | None | None |
14 | 889531135344209921 | 2017-07-24 17:02:04 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Stuart. He’s sporting his favorite fan… | https://twitter.com/dog_rates/status/889531135… | 13 | 10 | Stuart | None | None | None | puppo |
15 | 889278841981685760 | 2017-07-24 00:19:32 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Oliver. You’re witnessing one of his m… | https://twitter.com/dog_rates/status/889278841… | 13 | 10 | Oliver | None | None | None | None |
16 | 888917238123831296 | 2017-07-23 00:22:39 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Jim. He found a fren. Taught him how t… | https://twitter.com/dog_rates/status/888917238… | 12 | 10 | Jim | None | None | None | None |
17 | 888804989199671297 | 2017-07-22 16:56:37 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Zeke. He has a new stick. Very proud o… | https://twitter.com/dog_rates/status/888804989… | 13 | 10 | Zeke | None | None | None | None |
18 | 888554962724278272 | 2017-07-22 00:23:06 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Ralphus. He’s powering up. Attempting … | https://twitter.com/dog_rates/status/888554962… | 13 | 10 | Ralphus | None | None | None | None |
20 | 888078434458587136 | 2017-07-20 16:49:33 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Gerald. He was just told he didn’t get… | https://twitter.com/dog_rates/status/888078434… | 12 | 10 | Gerald | None | None | None | None |
21 | 887705289381826560 | 2017-07-19 16:06:48 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Jeffrey. He has a monopoly on the pool… | https://twitter.com/dog_rates/status/887705289… | 13 | 10 | Jeffrey | None | None | None | None |
22 | 887517139158093824 | 2017-07-19 03:39:09 +0000 | <a href=”http://twitter.com/download/iphone” r… | I’ve yet to rate a Venezuelan Hover Wiener. Th… | https://twitter.com/dog_rates/status/887517139… | 14 | 10 | NaN | None | None | None | None |
23 | 887473957103951883 | 2017-07-19 00:47:34 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Canela. She attempted some fancy porch… | https://twitter.com/dog_rates/status/887473957… | 13 | 10 | Canela | None | None | None | None |
24 | 887343217045368832 | 2017-07-18 16:08:03 +0000 | <a href=”http://twitter.com/download/iphone” r… | You may not have known you needed to see this … | https://twitter.com/dog_rates/status/887343217… | 13 | 10 | NaN | None | None | None | None |
25 | 887101392804085760 | 2017-07-18 00:07:08 +0000 | <a href=”http://twitter.com/download/iphone” r… | This… is a Jubilant Antarctic House Bear. We… | https://twitter.com/dog_rates/status/887101392… | 12 | 10 | NaN | None | None | None | None |
26 | 886983233522544640 | 2017-07-17 16:17:36 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Maya. She’s very shy. Rarely leaves he… | https://twitter.com/dog_rates/status/886983233… | 13 | 10 | Maya | None | None | None | None |
27 | 886736880519319552 | 2017-07-16 23:58:41 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Mingus. He’s a wonderful father to his… | https://www.gofundme.com/mingusneedsus,https:/… | 13 | 10 | Mingus | None | None | None | None |
28 | 886680336477933568 | 2017-07-16 20:14:00 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Derek. He’s late for a dog meeting. 13… | https://twitter.com/dog_rates/status/886680336… | 13 | 10 | Derek | None | None | None | None |
29 | 886366144734445568 | 2017-07-15 23:25:31 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Roscoe. Another pupper fallen victim t… | https://twitter.com/dog_rates/status/886366144… | 12 | 10 | Roscoe | None | None | pupper | None |
30 | 886267009285017600 | 2017-07-15 16:51:35 +0000 | <a href=”http://twitter.com/download/iphone” r… | @NonWhiteHat @MayhewMayhem omg hello tanner yo… | NaN | 12 | 10 | NaN | None | None | None | None |
… | … | … | … | … | … | … | … | … | … | … | … | … |
2325 | 666418789513326592 | 2015-11-17 00:53:15 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Walter. He is an Alaskan Terrapin. Lov… | https://twitter.com/dog_rates/status/666418789… | 10 | 10 | Walter | None | None | None | None |
2326 | 666411507551481857 | 2015-11-17 00:24:19 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is quite the dog. Gets really excited whe… | https://twitter.com/dog_rates/status/666411507… | 2 | 10 | NaN | None | None | None | None |
2327 | 666407126856765440 | 2015-11-17 00:06:54 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is a southern Vesuvius bumblegruff. Can d… | https://twitter.com/dog_rates/status/666407126… | 7 | 10 | NaN | None | None | None | None |
2328 | 666396247373291520 | 2015-11-16 23:23:41 +0000 | <a href=”http://twitter.com/download/iphone” r… | Oh goodness. A super rare northeast Qdoba kang… | https://twitter.com/dog_rates/status/666396247… | 9 | 10 | NaN | None | None | None | None |
2329 | 666373753744588802 | 2015-11-16 21:54:18 +0000 | <a href=”http://twitter.com/download/iphone” r… | Those are sunglasses and a jean jacket. 11/10 … | https://twitter.com/dog_rates/status/666373753… | 11 | 10 | NaN | None | None | None | None |
2330 | 666362758909284353 | 2015-11-16 21:10:36 +0000 | <a href=”http://twitter.com/download/iphone” r… | Unique dog here. Very small. Lives in containe… | https://twitter.com/dog_rates/status/666362758… | 6 | 10 | NaN | None | None | None | None |
2331 | 666353288456101888 | 2015-11-16 20:32:58 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here we have a mixed Asiago from the Galápagos… | https://twitter.com/dog_rates/status/666353288… | 8 | 10 | NaN | None | None | None | None |
2332 | 666345417576210432 | 2015-11-16 20:01:42 +0000 | <a href=”http://twitter.com/download/iphone” r… | Look at this jokester thinking seat belt laws … | https://twitter.com/dog_rates/status/666345417… | 10 | 10 | NaN | None | None | None | None |
2333 | 666337882303524864 | 2015-11-16 19:31:45 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is an extremely rare horned Parthenon. No… | https://twitter.com/dog_rates/status/666337882… | 9 | 10 | NaN | None | None | None | None |
2334 | 666293911632134144 | 2015-11-16 16:37:02 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is a funny dog. Weird toes. Won’t come do… | https://twitter.com/dog_rates/status/666293911… | 3 | 10 | NaN | None | None | None | None |
2336 | 666273097616637952 | 2015-11-16 15:14:19 +0000 | <a href=”http://twitter.com/download/iphone” r… | Can take selfies 11/10 https://t.co/ws2AMaNwPW | https://twitter.com/dog_rates/status/666273097… | 11 | 10 | NaN | None | None | None | None |
2337 | 666268910803644416 | 2015-11-16 14:57:41 +0000 | <a href=”http://twitter.com/download/iphone” r… | Very concerned about fellow dog trapped in com… | https://twitter.com/dog_rates/status/666268910… | 10 | 10 | NaN | None | None | None | None |
2338 | 666104133288665088 | 2015-11-16 04:02:55 +0000 | <a href=”http://twitter.com/download/iphone” r… | Not familiar with this breed. No tail (weird)…. | https://twitter.com/dog_rates/status/666104133… | 1 | 10 | NaN | None | None | None | None |
2339 | 666102155909144576 | 2015-11-16 03:55:04 +0000 | <a href=”http://twitter.com/download/iphone” r… | Oh my. Here you are seeing an Adobe Setter giv… | https://twitter.com/dog_rates/status/666102155… | 11 | 10 | NaN | None | None | None | None |
2340 | 666099513787052032 | 2015-11-16 03:44:34 +0000 | <a href=”http://twitter.com/download/iphone” r… | Can stand on stump for what seems like a while… | https://twitter.com/dog_rates/status/666099513… | 8 | 10 | NaN | None | None | None | None |
2341 | 666094000022159362 | 2015-11-16 03:22:39 +0000 | <a href=”http://twitter.com/download/iphone” r… | This appears to be a Mongolian Presbyterian mi… | https://twitter.com/dog_rates/status/666094000… | 9 | 10 | NaN | None | None | None | None |
2342 | 666082916733198337 | 2015-11-16 02:38:37 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here we have a well-established sunblockerspan… | https://twitter.com/dog_rates/status/666082916… | 6 | 10 | NaN | None | None | None | None |
2343 | 666073100786774016 | 2015-11-16 01:59:36 +0000 | <a href=”http://twitter.com/download/iphone” r… | Let’s hope this flight isn’t Malaysian (lol). … | https://twitter.com/dog_rates/status/666073100… | 10 | 10 | NaN | None | None | None | None |
2344 | 666071193221509120 | 2015-11-16 01:52:02 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here we have a northern speckled Rhododendron…. | https://twitter.com/dog_rates/status/666071193… | 9 | 10 | NaN | None | None | None | None |
2345 | 666063827256086533 | 2015-11-16 01:22:45 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is the happiest dog you will ever see. Ve… | https://twitter.com/dog_rates/status/666063827… | 10 | 10 | NaN | None | None | None | None |
2346 | 666058600524156928 | 2015-11-16 01:01:59 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here is the Rand Paul of retrievers folks! He’… | https://twitter.com/dog_rates/status/666058600… | 8 | 10 | NaN | None | None | None | None |
2347 | 666057090499244032 | 2015-11-16 00:55:59 +0000 | <a href=”http://twitter.com/download/iphone” r… | My oh my. This is a rare blond Canadian terrie… | https://twitter.com/dog_rates/status/666057090… | 9 | 10 | NaN | None | None | None | None |
2348 | 666055525042405380 | 2015-11-16 00:49:46 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here is a Siberian heavily armored polar bear … | https://twitter.com/dog_rates/status/666055525… | 10 | 10 | NaN | None | None | None | None |
2349 | 666051853826850816 | 2015-11-16 00:35:11 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is an odd dog. Hard on the outside but lo… | https://twitter.com/dog_rates/status/666051853… | 2 | 10 | NaN | None | None | None | None |
2350 | 666050758794694657 | 2015-11-16 00:30:50 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is a truly beautiful English Wilson Staff… | https://twitter.com/dog_rates/status/666050758… | 10 | 10 | NaN | None | None | None | None |
2351 | 666049248165822465 | 2015-11-16 00:24:50 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here we have a 1949 1st generation vulpix. Enj… | https://twitter.com/dog_rates/status/666049248… | 5 | 10 | NaN | None | None | None | None |
2352 | 666044226329800704 | 2015-11-16 00:04:52 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is a purebred Piers Morgan. Loves to Netf… | https://twitter.com/dog_rates/status/666044226… | 6 | 10 | NaN | None | None | None | None |
2353 | 666033412701032449 | 2015-11-15 23:21:54 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here is a very happy pup. Big fan of well-main… | https://twitter.com/dog_rates/status/666033412… | 9 | 10 | NaN | None | None | None | None |
2354 | 666029285002620928 | 2015-11-15 23:05:30 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is a western brown Mitsubishi terrier. Up… | https://twitter.com/dog_rates/status/666029285… | 7 | 10 | NaN | None | None | None | None |
2355 | 666020888022790149 | 2015-11-15 22:32:08 +0000 | <a href=”http://twitter.com/download/iphone” r… | Here we have a Japanese Irish Setter. Lost eye… | https://twitter.com/dog_rates/status/666020888… | 8 | 10 | NaN | None | None | None | None |
2144 rows × 12 columns
Define¶
Convert timestamp to a datetime type and tweet_id to object (string).
Code, Test¶
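# Note: astype returns a new DataFrame. The result is only displayed here, not assigned,
# so dogs_rates_archive_clean keeps its original dtypes (see the later info() output).
dogs_rates_archive_clean.astype({'timestamp': 'datetime64', 'tweet_id': 'object'},copy = False).dtypes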
tweet_id                      object
timestamp             datetime64[ns]
source                        object
text                          object
expanded_urls                 object
rating_numerator               int64
rating_denominator             int64
name                          object
doggo                         object
floofer                       object
pupper                        object
puppo                         object
dtype: object
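The conversion above only takes effect if its result is stored. The following is a minimal sketch of one way to persist it, assuming a small hypothetical stand-in for dogs_rates_archive_clean (the name `archive` and the two sample rows are illustrative only). It uses `pd.to_datetime` for the timestamp, since newer pandas releases may reject the unit-less 'datetime64' string.

```python
import pandas as pd

# Hypothetical two-row stand-in for dogs_rates_archive_clean
archive = pd.DataFrame({
    "tweet_id": [890971913173991426, 890729181411237888],
    "timestamp": ["2017-07-28 16:27:12 +0000", "2017-07-28 00:22:40 +0000"],
})

# astype returns a new object; without assignment, `archive` is unchanged
archive.astype({"tweet_id": "object"})
unchanged_dtype = archive["tweet_id"].dtype  # still an integer dtype

# Assign the converted columns back so the change persists
archive["tweet_id"] = archive["tweet_id"].astype(str)
archive["timestamp"] = pd.to_datetime(archive["timestamp"])

print(unchanged_dtype)
print(archive.dtypes)
```

If tweet_id is stored as a string in one table, the same conversion would need to be applied to every table that is later merged on tweet_id, because merge keys should share a dtype.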
Define¶
Rename the abbreviated prediction columns (p1, p2, p3 and their _conf and _dog variants) to more descriptive names.
Code¶
image_predictions_clean.rename(columns={'p1': 'Prediction1', 'p2': 'Prediction2', 'p3': 'Prediction3'}, inplace=True)
image_predictions_clean.rename(columns={'p1_conf': 'Prediction1_conf', 'p1_dog': 'Prediction1_dog'}, inplace=True)
image_predictions_clean.rename(columns={'p2_conf': 'Prediction2_conf', 'p2_dog': 'Prediction2_dog'}, inplace=True)
image_predictions_clean.rename(columns={'p3_conf': 'Prediction3_conf', 'p3_dog': 'Prediction3_dog'}, inplace=True)
Define¶
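Change the data type of tweet_id to object (string).
Code¶
# Note: the converted copy is displayed but not assigned back, so the Test below still reports int64.
image_predictions_clean.astype({'tweet_id': 'object'},copy = False).dtypes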
tweet_id             object
jpg_url              object
img_num               int64
Prediction1          object
Prediction1_conf    float64
Prediction1_dog        bool
Prediction2          object
Prediction2_conf    float64
Prediction2_dog        bool
Prediction3          object
Prediction3_conf    float64
Prediction3_dog        bool
dtype: object
Test¶
image_predictions_clean.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2075 entries, 0 to 2074
Data columns (total 12 columns):
tweet_id            2075 non-null int64
jpg_url             2075 non-null object
img_num             2075 non-null int64
Prediction1         2075 non-null object
Prediction1_conf    2075 non-null float64
Prediction1_dog     2075 non-null bool
Prediction2         2075 non-null object
Prediction2_conf    2075 non-null float64
Prediction2_dog     2075 non-null bool
Prediction3         2075 non-null object
Prediction3_conf    2075 non-null float64
Prediction3_dog     2075 non-null bool
dtypes: bool(3), float64(3), int64(2), object(4)
memory usage: 152.1+ KB
Define¶
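Keep a single prediction per tweet: the highest-ranked prediction (Prediction1, then Prediction2, then Prediction3) that is a dog breed, together with its confidence.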
Code¶
# Collect one prediction and its confidence per row
prediction = []
confidence = []

def most_confident_prediction(row):
    # Prediction1 is the most confident, so take the first prediction that is a dog breed
    if row['Prediction1_dog']:
        prediction.append(row['Prediction1'])
        confidence.append(row['Prediction1_conf'])
    elif row['Prediction2_dog']:
        prediction.append(row['Prediction2'])
        confidence.append(row['Prediction2_conf'])
    elif row['Prediction3_dog']:
        prediction.append(row['Prediction3'])
        confidence.append(row['Prediction3_conf'])
    else:
        # None of the three predictions is a dog breed: store the placeholder 'NAN' and confidence 0
        prediction.append('NAN')
        confidence.append(0)

# apply is used only for its side effect of filling the two lists, one entry per row
image_predictions_clean.apply(most_confident_prediction, axis=1)
image_predictions_clean['prediction'] = prediction
image_predictions_clean['confidence'] = confidence
Test¶
image_predictions_clean
 | tweet_id | jpg_url | img_num | Prediction1 | Prediction1_conf | Prediction1_dog | Prediction2 | Prediction2_conf | Prediction2_dog | Prediction3 | Prediction3_conf | Prediction3_dog | prediction | confidence |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 666020888022790149 | https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg | 1 | Welsh_springer_spaniel | 0.465074 | True | collie | 0.156665 | True | Shetland_sheepdog | 0.061428 | True | Welsh_springer_spaniel | 0.465074 |
1 | 666029285002620928 | https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg | 1 | redbone | 0.506826 | True | miniature_pinscher | 0.074192 | True | Rhodesian_ridgeback | 0.072010 | True | redbone | 0.506826 |
2 | 666033412701032449 | https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg | 1 | German_shepherd | 0.596461 | True | malinois | 0.138584 | True | bloodhound | 0.116197 | True | German_shepherd | 0.596461 |
3 | 666044226329800704 | https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg | 1 | Rhodesian_ridgeback | 0.408143 | True | redbone | 0.360687 | True | miniature_pinscher | 0.222752 | True | Rhodesian_ridgeback | 0.408143 |
4 | 666049248165822465 | https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg | 1 | miniature_pinscher | 0.560311 | True | Rottweiler | 0.243682 | True | Doberman | 0.154629 | True | miniature_pinscher | 0.560311 |
5 | 666050758794694657 | https://pbs.twimg.com/media/CT5Jof1WUAEuVxN.jpg | 1 | Bernese_mountain_dog | 0.651137 | True | English_springer | 0.263788 | True | Greater_Swiss_Mountain_dog | 0.016199 | True | Bernese_mountain_dog | 0.651137 |
6 | 666051853826850816 | https://pbs.twimg.com/media/CT5KoJ1WoAAJash.jpg | 1 | box_turtle | 0.933012 | False | mud_turtle | 0.045885 | False | terrapin | 0.017885 | False | NAN | 0.000000 |
7 | 666055525042405380 | https://pbs.twimg.com/media/CT5N9tpXIAAifs1.jpg | 1 | chow | 0.692517 | True | Tibetan_mastiff | 0.058279 | True | fur_coat | 0.054449 | False | chow | 0.692517 |
8 | 666057090499244032 | https://pbs.twimg.com/media/CT5PY90WoAAQGLo.jpg | 1 | shopping_cart | 0.962465 | False | shopping_basket | 0.014594 | False | golden_retriever | 0.007959 | True | golden_retriever | 0.007959 |
9 | 666058600524156928 | https://pbs.twimg.com/media/CT5Qw94XAAA_2dP.jpg | 1 | miniature_poodle | 0.201493 | True | komondor | 0.192305 | True | soft-coated_wheaten_terrier | 0.082086 | True | miniature_poodle | 0.201493 |
10 | 666063827256086533 | https://pbs.twimg.com/media/CT5Vg_wXIAAXfnj.jpg | 1 | golden_retriever | 0.775930 | True | Tibetan_mastiff | 0.093718 | True | Labrador_retriever | 0.072427 | True | golden_retriever | 0.775930 |
11 | 666071193221509120 | https://pbs.twimg.com/media/CT5cN_3WEAAlOoZ.jpg | 1 | Gordon_setter | 0.503672 | True | Yorkshire_terrier | 0.174201 | True | Pekinese | 0.109454 | True | Gordon_setter | 0.503672 |
12 | 666073100786774016 | https://pbs.twimg.com/media/CT5d9DZXAAALcwe.jpg | 1 | Walker_hound | 0.260857 | True | English_foxhound | 0.175382 | True | Ibizan_hound | 0.097471 | True | Walker_hound | 0.260857 |
13 | 666082916733198337 | https://pbs.twimg.com/media/CT5m4VGWEAAtKc8.jpg | 1 | pug | 0.489814 | True | bull_mastiff | 0.404722 | True | French_bulldog | 0.048960 | True | pug | 0.489814 |
14 | 666094000022159362 | https://pbs.twimg.com/media/CT5w9gUW4AAsBNN.jpg | 1 | bloodhound | 0.195217 | True | German_shepherd | 0.078260 | True | malinois | 0.075628 | True | bloodhound | 0.195217 |
15 | 666099513787052032 | https://pbs.twimg.com/media/CT51-JJUEAA6hV8.jpg | 1 | Lhasa | 0.582330 | True | Shih-Tzu | 0.166192 | True | Dandie_Dinmont | 0.089688 | True | Lhasa | 0.582330 |
16 | 666102155909144576 | https://pbs.twimg.com/media/CT54YGiWUAEZnoK.jpg | 1 | English_setter | 0.298617 | True | Newfoundland | 0.149842 | True | borzoi | 0.133649 | True | English_setter | 0.298617 |
17 | 666104133288665088 | https://pbs.twimg.com/media/CT56LSZWoAAlJj2.jpg | 1 | hen | 0.965932 | False | cock | 0.033919 | False | partridge | 0.000052 | False | NAN | 0.000000 |
18 | 666268910803644416 | https://pbs.twimg.com/media/CT8QCd1WEAADXws.jpg | 1 | desktop_computer | 0.086502 | False | desk | 0.085547 | False | bookcase | 0.079480 | False | NAN | 0.000000 |
19 | 666273097616637952 | https://pbs.twimg.com/media/CT8T1mtUwAA3aqm.jpg | 1 | Italian_greyhound | 0.176053 | True | toy_terrier | 0.111884 | True | basenji | 0.111152 | True | Italian_greyhound | 0.176053 |
20 | 666287406224695296 | https://pbs.twimg.com/media/CT8g3BpUEAAuFjg.jpg | 1 | Maltese_dog | 0.857531 | True | toy_poodle | 0.063064 | True | miniature_poodle | 0.025581 | True | Maltese_dog | 0.857531 |
21 | 666293911632134144 | https://pbs.twimg.com/media/CT8mx7KW4AEQu8N.jpg | 1 | three-toed_sloth | 0.914671 | False | otter | 0.015250 | False | great_grey_owl | 0.013207 | False | NAN | 0.000000 |
22 | 666337882303524864 | https://pbs.twimg.com/media/CT9OwFIWEAMuRje.jpg | 1 | ox | 0.416669 | False | Newfoundland | 0.278407 | True | groenendael | 0.102643 | True | Newfoundland | 0.278407 |
23 | 666345417576210432 | https://pbs.twimg.com/media/CT9Vn7PWoAA_ZCM.jpg | 1 | golden_retriever | 0.858744 | True | Chesapeake_Bay_retriever | 0.054787 | True | Labrador_retriever | 0.014241 | True | golden_retriever | 0.858744 |
24 | 666353288456101888 | https://pbs.twimg.com/media/CT9cx0tUEAAhNN_.jpg | 1 | malamute | 0.336874 | True | Siberian_husky | 0.147655 | True | Eskimo_dog | 0.093412 | True | malamute | 0.336874 |
25 | 666362758909284353 | https://pbs.twimg.com/media/CT9lXGsUcAAyUFt.jpg | 1 | guinea_pig | 0.996496 | False | skunk | 0.002402 | False | hamster | 0.000461 | False | NAN | 0.000000 |
26 | 666373753744588802 | https://pbs.twimg.com/media/CT9vZEYWUAAlZ05.jpg | 1 | soft-coated_wheaten_terrier | 0.326467 | True | Afghan_hound | 0.259551 | True | briard | 0.206803 | True | soft-coated_wheaten_terrier | 0.326467 |
27 | 666396247373291520 | https://pbs.twimg.com/media/CT-D2ZHWIAA3gK1.jpg | 1 | Chihuahua | 0.978108 | True | toy_terrier | 0.009397 | True | papillon | 0.004577 | True | Chihuahua | 0.978108 |
28 | 666407126856765440 | https://pbs.twimg.com/media/CT-NvwmW4AAugGZ.jpg | 1 | black-and-tan_coonhound | 0.529139 | True | bloodhound | 0.244220 | True | flat-coated_retriever | 0.173810 | True | black-and-tan_coonhound | 0.529139 |
29 | 666411507551481857 | https://pbs.twimg.com/media/CT-RugiWIAELEaq.jpg | 1 | coho | 0.404640 | False | barracouta | 0.271485 | False | gar | 0.189945 | False | NAN | 0.000000 |
… | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
2045 | 886366144734445568 | https://pbs.twimg.com/media/DE0BTnQUwAApKEH.jpg | 1 | French_bulldog | 0.999201 | True | Chihuahua | 0.000361 | True | Boston_bull | 0.000076 | True | French_bulldog | 0.999201 |
2046 | 886680336477933568 | https://pbs.twimg.com/media/DE4fEDzWAAAyHMM.jpg | 1 | convertible | 0.738995 | False | sports_car | 0.139952 | False | car_wheel | 0.044173 | False | NAN | 0.000000 |
2047 | 886736880519319552 | https://pbs.twimg.com/media/DE5Se8FXcAAJFx4.jpg | 1 | kuvasz | 0.309706 | True | Great_Pyrenees | 0.186136 | True | Dandie_Dinmont | 0.086346 | True | kuvasz | 0.309706 |
2048 | 886983233522544640 | https://pbs.twimg.com/media/DE8yicJW0AAAvBJ.jpg | 2 | Chihuahua | 0.793469 | True | toy_terrier | 0.143528 | True | can_opener | 0.032253 | False | Chihuahua | 0.793469 |
2049 | 887101392804085760 | https://pbs.twimg.com/media/DE-eAq6UwAA-jaE.jpg | 1 | Samoyed | 0.733942 | True | Eskimo_dog | 0.035029 | True | Staffordshire_bullterrier | 0.029705 | True | Samoyed | 0.733942 |
2050 | 887343217045368832 | https://pbs.twimg.com/ext_tw_video_thumb/88734… | 1 | Mexican_hairless | 0.330741 | True | sea_lion | 0.275645 | False | Weimaraner | 0.134203 | True | Mexican_hairless | 0.330741 |
2051 | 887473957103951883 | https://pbs.twimg.com/media/DFDw2tyUQAAAFke.jpg | 2 | Pembroke | 0.809197 | True | Rhodesian_ridgeback | 0.054950 | True | beagle | 0.038915 | True | Pembroke | 0.809197 |
2052 | 887517139158093824 | https://pbs.twimg.com/ext_tw_video_thumb/88751… | 1 | limousine | 0.130432 | False | tow_truck | 0.029175 | False | shopping_cart | 0.026321 | False | NAN | 0.000000 |
2053 | 887705289381826560 | https://pbs.twimg.com/media/DFHDQBbXgAEqY7t.jpg | 1 | basset | 0.821664 | True | redbone | 0.087582 | True | Weimaraner | 0.026236 | True | basset | 0.821664 |
2054 | 888078434458587136 | https://pbs.twimg.com/media/DFMWn56WsAAkA7B.jpg | 1 | French_bulldog | 0.995026 | True | pug | 0.000932 | True | bull_mastiff | 0.000903 | True | French_bulldog | 0.995026 |
2055 | 888202515573088257 | https://pbs.twimg.com/media/DFDw2tyUQAAAFke.jpg | 2 | Pembroke | 0.809197 | True | Rhodesian_ridgeback | 0.054950 | True | beagle | 0.038915 | True | Pembroke | 0.809197 |
2056 | 888554962724278272 | https://pbs.twimg.com/media/DFTH_O-UQAACu20.jpg | 3 | Siberian_husky | 0.700377 | True | Eskimo_dog | 0.166511 | True | malamute | 0.111411 | True | Siberian_husky | 0.700377 |
2057 | 888804989199671297 | https://pbs.twimg.com/media/DFWra-3VYAA2piG.jpg | 1 | golden_retriever | 0.469760 | True | Labrador_retriever | 0.184172 | True | English_setter | 0.073482 | True | golden_retriever | 0.469760 |
2058 | 888917238123831296 | https://pbs.twimg.com/media/DFYRgsOUQAARGhO.jpg | 1 | golden_retriever | 0.714719 | True | Tibetan_mastiff | 0.120184 | True | Labrador_retriever | 0.105506 | True | golden_retriever | 0.714719 |
2059 | 889278841981685760 | https://pbs.twimg.com/ext_tw_video_thumb/88927… | 1 | whippet | 0.626152 | True | borzoi | 0.194742 | True | Saluki | 0.027351 | True | whippet | 0.626152 |
2060 | 889531135344209921 | https://pbs.twimg.com/media/DFg_2PVW0AEHN3p.jpg | 1 | golden_retriever | 0.953442 | True | Labrador_retriever | 0.013834 | True | redbone | 0.007958 | True | golden_retriever | 0.953442 |
2061 | 889638837579907072 | https://pbs.twimg.com/media/DFihzFfXsAYGDPR.jpg | 1 | French_bulldog | 0.991650 | True | boxer | 0.002129 | True | Staffordshire_bullterrier | 0.001498 | True | French_bulldog | 0.991650 |
2062 | 889665388333682689 | https://pbs.twimg.com/media/DFi579UWsAAatzw.jpg | 1 | Pembroke | 0.966327 | True | Cardigan | 0.027356 | True | basenji | 0.004633 | True | Pembroke | 0.966327 |
2063 | 889880896479866881 | https://pbs.twimg.com/media/DFl99B1WsAITKsg.jpg | 1 | French_bulldog | 0.377417 | True | Labrador_retriever | 0.151317 | True | muzzle | 0.082981 | False | French_bulldog | 0.377417 |
2064 | 890006608113172480 | https://pbs.twimg.com/media/DFnwSY4WAAAMliS.jpg | 1 | Samoyed | 0.957979 | True | Pomeranian | 0.013884 | True | chow | 0.008167 | True | Samoyed | 0.957979 |
2065 | 890240255349198849 | https://pbs.twimg.com/media/DFrEyVuW0AAO3t9.jpg | 1 | Pembroke | 0.511319 | True | Cardigan | 0.451038 | True | Chihuahua | 0.029248 | True | Pembroke | 0.511319 |
2066 | 890609185150312448 | https://pbs.twimg.com/media/DFwUU__XcAEpyXI.jpg | 1 | Irish_terrier | 0.487574 | True | Irish_setter | 0.193054 | True | Chesapeake_Bay_retriever | 0.118184 | True | Irish_terrier | 0.487574 |
2067 | 890729181411237888 | https://pbs.twimg.com/media/DFyBahAVwAAhUTd.jpg | 2 | Pomeranian | 0.566142 | True | Eskimo_dog | 0.178406 | True | Pembroke | 0.076507 | True | Pomeranian | 0.566142 |
2068 | 890971913173991426 | https://pbs.twimg.com/media/DF1eOmZXUAALUcq.jpg | 1 | Appenzeller | 0.341703 | True | Border_collie | 0.199287 | True | ice_lolly | 0.193548 | False | Appenzeller | 0.341703 |
2069 | 891087950875897856 | https://pbs.twimg.com/media/DF3HwyEWsAABqE6.jpg | 1 | Chesapeake_Bay_retriever | 0.425595 | True | Irish_terrier | 0.116317 | True | Indian_elephant | 0.076902 | False | Chesapeake_Bay_retriever | 0.425595 |
2070 | 891327558926688256 | https://pbs.twimg.com/media/DF6hr6BUMAAzZgT.jpg | 2 | basset | 0.555712 | True | English_springer | 0.225770 | True | German_short-haired_pointer | 0.175219 | True | basset | 0.555712 |
2071 | 891689557279858688 | https://pbs.twimg.com/media/DF_q7IAWsAEuuN8.jpg | 1 | paper_towel | 0.170278 | False | Labrador_retriever | 0.168086 | True | spatula | 0.040836 | False | Labrador_retriever | 0.168086 |
2072 | 891815181378084864 | https://pbs.twimg.com/media/DGBdLU1WsAANxJ9.jpg | 1 | Chihuahua | 0.716012 | True | malamute | 0.078253 | True | kelpie | 0.031379 | True | Chihuahua | 0.716012 |
2073 | 892177421306343426 | https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg | 1 | Chihuahua | 0.323581 | True | Pekinese | 0.090647 | True | papillon | 0.068957 | True | Chihuahua | 0.323581 |
2074 | 892420643555336193 | https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg | 1 | orange | 0.097049 | False | bagel | 0.085851 | False | banana | 0.076110 | False | NAN | 0.000000 |
2075 rows × 14 columns
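The same selection can be expressed without global lists. This is a sketch only, not the method used in this report: it assumes the renamed columns shown above and uses three rows copied from the table (indexes 7, 8 and 6) as a hypothetical sample called `preds`. Unlike the code above, it leaves non-dog rows as missing values (NaN) rather than the 'NAN' string and a confidence of 0.

```python
import pandas as pd

# Three sample rows taken from the table above (indexes 7, 8 and 6)
preds = pd.DataFrame({
    "Prediction1": ["chow", "shopping_cart", "box_turtle"],
    "Prediction1_conf": [0.692517, 0.962465, 0.933012],
    "Prediction1_dog": [True, False, False],
    "Prediction2": ["Tibetan_mastiff", "shopping_basket", "mud_turtle"],
    "Prediction2_conf": [0.058279, 0.014594, 0.045885],
    "Prediction2_dog": [True, False, False],
    "Prediction3": ["fur_coat", "golden_retriever", "terrapin"],
    "Prediction3_conf": [0.054449, 0.007959, 0.017885],
    "Prediction3_dog": [False, True, False],
})

# Start from Prediction1 where it is a dog, otherwise leave the value missing
prediction = preds["Prediction1"].where(preds["Prediction1_dog"])
confidence = preds["Prediction1_conf"].where(preds["Prediction1_dog"])

# Fill the remaining gaps from Prediction2, then Prediction3, only where those are dogs
for rank in (2, 3):
    is_dog = preds[f"Prediction{rank}_dog"]
    prediction = prediction.fillna(preds[f"Prediction{rank}"].where(is_dog))
    confidence = confidence.fillna(preds[f"Prediction{rank}_conf"].where(is_dog))

preds["prediction"] = prediction
preds["confidence"] = confidence
print(preds[["prediction", "confidence"]])
```

Keeping real missing values means functions such as `isna()` and `dropna()` recognise the rows with no dog prediction, which the 'NAN' string would hide.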
Define¶
Merge the prediction and confidence columns into dogs_rates_archive and drop tweets that have no image prediction (an inner join keeps only tweet_ids present in both tables).
Code¶
dogs_rates_archive_clean = pd.merge(dogs_rates_archive_clean, image_predictions_clean[['tweet_id', 'prediction', 'confidence']],
on = 'tweet_id', how = 'inner')
Test¶
dogs_rates_archive_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1971 entries, 0 to 1970
Data columns (total 14 columns):
tweet_id              1971 non-null int64
timestamp             1971 non-null object
source                1971 non-null object
text                  1971 non-null object
expanded_urls         1971 non-null object
rating_numerator      1971 non-null int64
rating_denominator    1971 non-null int64
name                  1342 non-null object
doggo                 1971 non-null object
floofer               1971 non-null object
pupper                1971 non-null object
puppo                 1971 non-null object
prediction            1971 non-null object
confidence            1971 non-null float64
dtypes: float64(1), int64(3), object(10)
memory usage: 231.0+ KB
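Before relying on an inner join, it can help to see which rows it will discard. The sketch below assumes two small hypothetical tables (`archive` and `predictions`, with made-up tweet_id values) and uses the `indicator` and `validate` options of `DataFrame.merge` to list tweets without a prediction and to confirm that each tweet_id appears at most once on each side.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables
archive = pd.DataFrame({"tweet_id": [1, 2, 3], "rating_numerator": [13, 12, 14]})
predictions = pd.DataFrame({
    "tweet_id": [1, 3],
    "prediction": ["chow", "pug"],
    "confidence": [0.69, 0.49],
})

# A left join with indicator=True marks rows that exist only in the archive
check = archive.merge(predictions, on="tweet_id", how="left",
                      indicator=True, validate="one_to_one")
no_image = check.loc[check["_merge"] == "left_only", "tweet_id"].tolist()

# The inner join used in this report keeps only tweets present in both tables
merged = archive.merge(predictions, on="tweet_id", how="inner")

print("tweets without a prediction:", no_image)
print("rows after inner join:", len(merged))
```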
Define¶
Create a new column (stages) that holds the dog stage of each tweet, then drop the four original columns ('floofer', 'puppo', 'doggo', 'pupper').
Code¶
# Concatenate the four stage columns, then map each combined string to a single stage
dogs_rates_archive_clean['stages'] = dogs_rates_archive_clean.doggo + dogs_rates_archive_clean.floofer + dogs_rates_archive_clean.pupper + dogs_rates_archive_clean.puppo
# Assign through .loc[row_mask, 'stages'] instead of chained indexing so the DataFrame itself is modified
dogs_rates_archive_clean.loc[dogs_rates_archive_clean.stages == 'NoneNoneNoneNone', 'stages'] = None
dogs_rates_archive_clean.loc[dogs_rates_archive_clean.stages == 'doggoNoneNoneNone', 'stages'] = 'doggo'
dogs_rates_archive_clean.loc[dogs_rates_archive_clean.stages == 'NoneNonepupperNone', 'stages'] = 'pupper'
dogs_rates_archive_clean.loc[dogs_rates_archive_clean.stages == 'NoneNoneNonepuppo', 'stages'] = 'puppo'
dogs_rates_archive_clean.loc[dogs_rates_archive_clean.stages == 'NoneflooferNoneNone', 'stages'] = 'floofer'
# Tweets that mention two stages are recorded as 'doggo'
dogs_rates_archive_clean.loc[dogs_rates_archive_clean.stages == 'doggoNoneNonepuppo', 'stages'] = 'doggo'
dogs_rates_archive_clean.loc[dogs_rates_archive_clean.stages == 'doggoNonepupperNone', 'stages'] = 'doggo'
dogs_rates_archive_clean.loc[dogs_rates_archive_clean.stages == 'doggoflooferNoneNone', 'stages'] = 'doggo'
dogs_rates_archive_clean.drop(columns=['floofer', 'puppo', 'doggo', 'pupper'], inplace = True)
Test¶
dogs_rates_archive_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1971 entries, 0 to 1970
Data columns (total 11 columns):
tweet_id              1971 non-null int64
timestamp             1971 non-null object
source                1971 non-null object
text                  1971 non-null object
expanded_urls         1971 non-null object
rating_numerator      1971 non-null int64
rating_denominator    1971 non-null int64
name                  1342 non-null object
prediction            1971 non-null object
confidence            1971 non-null float64
stages                305 non-null object
dtypes: float64(1), int64(3), object(7)
memory usage: 184.8+ KB
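The mapping above records every two-stage tweet as 'doggo'. An alternative, shown here purely as a sketch and not as what this report does, is to join whichever stage names are present so that combinations stay visible. It assumes, as in the archive, that a missing stage is stored as the string 'None'; the `sample` frame and the `combine_stages` helper are hypothetical.

```python
import pandas as pd

stage_cols = ["doggo", "floofer", "pupper", "puppo"]

# Hypothetical sample: one doggo, one pupper, one doggo+pupper, one with no stage
sample = pd.DataFrame({
    "doggo": ["doggo", "None", "doggo", "None"],
    "floofer": ["None", "None", "None", "None"],
    "pupper": ["None", "pupper", "pupper", "None"],
    "puppo": ["None", "None", "None", "None"],
})

def combine_stages(row):
    """Join the stage names found in a row; return None when there are none."""
    found = [value for value in row if value != "None"]
    return ",".join(found) if found else None

sample["stages"] = sample[stage_cols].apply(combine_stages, axis=1)
sample = sample.drop(columns=stage_cols)
print(sample)
```

Whether to keep combined labels or collapse them to one stage is a judgment call; keeping them avoids silently discarding the second stage.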
Define¶
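Change the data type of the stages column to category.
Code¶
# Note: the result is not assigned back, so stages remains object in dogs_rates_archive_clean.
dogs_rates_archive_clean.astype({'stages': 'category'},copy = False).dtypes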
tweet_id                 int64
timestamp               object
source                  object
text                    object
expanded_urls           object
rating_numerator         int64
rating_denominator       int64
name                    object
prediction              object
confidence             float64
stages                category
dtype: object
Define¶
rename id column in tweets_df to tweet_id
Code¶
tweets_df_clean.rename(columns={'id':'tweet_id'}, inplace=True)
Define¶
Merge the tweets_df dataframe (favorite and retweet counts) into dogs_rates_archive on tweet_id.
Code¶
dogs_rates_archive_clean = pd.merge(dogs_rates_archive_clean, tweets_df_clean,
on = 'tweet_id', how = 'inner')
Test¶
dogs_rates_archive_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1964 entries, 0 to 1963
Data columns (total 13 columns):
tweet_id              1964 non-null int64
timestamp             1964 non-null object
source                1964 non-null object
text                  1964 non-null object
expanded_urls         1964 non-null object
rating_numerator      1964 non-null int64
rating_denominator    1964 non-null int64
name                  1335 non-null object
prediction            1964 non-null object
confidence            1964 non-null float64
stages                304 non-null object
favorite_count        1964 non-null int64
retweet_count         1964 non-null int64
dtypes: float64(1), int64(5), object(7)
memory usage: 214.8+ KB
Storing data¶
dogs_rates_archive_clean.to_csv('twitter_archive_master.csv',index = False)
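A CSV file stores only text, so dtypes such as category, datetime or string IDs are not preserved when the file is read back. The sketch below assumes a hypothetical two-row `master` frame and writes to an in-memory buffer instead of twitter_archive_master.csv; it shows how the types could be restored with the `dtype` and `parse_dates` arguments of `pd.read_csv`.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for the master dataset
master = pd.DataFrame({
    "tweet_id": [892420643555336193, 892177421306343426],
    "timestamp": ["2017-08-01 16:23:56 +0000", "2017-08-01 00:17:27 +0000"],
    "stages": [None, "doggo"],
})

# Write to an in-memory buffer (stands in for the CSV file on disk)
buffer = io.StringIO()
master.to_csv(buffer, index=False)
buffer.seek(0)

# Declare the intended types while reading, since the CSV does not carry them
restored = pd.read_csv(
    buffer,
    dtype={"tweet_id": str, "stages": "category"},
    parse_dates=["timestamp"],
)
print(restored.dtypes)
```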
Analysis and visualization¶
df=pd.read_csv('twitter_archive_master.csv')
df.head()
 | tweet_id | timestamp | source | text | expanded_urls | rating_numerator | rating_denominator | name | prediction | confidence | stages | favorite_count | retweet_count |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 892420643555336193 | 2017-08-01 16:23:56 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Phineas. He’s a mystical boy. Only eve… | https://twitter.com/dog_rates/status/892420643… | 13 | 10 | Phineas | NAN | 0.000000 | NaN | 36776 | 7840 |
1 | 892177421306343426 | 2017-08-01 00:17:27 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Tilly. She’s just checking pup on you…. | https://twitter.com/dog_rates/status/892177421… | 13 | 10 | Tilly | Chihuahua | 0.323581 | NaN | 31672 | 5804 |
2 | 891815181378084864 | 2017-07-31 00:18:03 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Archie. He is a rare Norwegian Pouncin… | https://twitter.com/dog_rates/status/891815181… | 12 | 10 | Archie | Chihuahua | 0.716012 | NaN | 23851 | 3844 |
3 | 891689557279858688 | 2017-07-30 15:58:51 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Darla. She commenced a snooze mid meal… | https://twitter.com/dog_rates/status/891689557… | 13 | 10 | Darla | Labrador_retriever | 0.168086 | NaN | 40107 | 8009 |
4 | 891327558926688256 | 2017-07-29 16:00:24 +0000 | <a href=”http://twitter.com/download/iphone” r… | This is Franklin. He would like you to stop ca… | https://twitter.com/dog_rates/status/891327558… | 12 | 10 | Franklin | basset | 0.555712 | NaN | 38309 | 8650 |
# get the number of tweets for each dog stage
df_stages = df.groupby('stages')['tweet_id'].count()
df_stages
stages
doggo       73
floofer      7
pupper     202
puppo       22
Name: tweet_id, dtype: int64
visualization¶
# plot the share of each dog stage in a pie chart
plt.pie(df_stages, labels=df_stages.index)
plt.title('The distribution of dog stages');
insights¶
From this plot:
The dog stage with the most tweets is pupper (202 tweets).
The dog stage with the fewest tweets is floofer (7 tweets).
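The counts behind the pie chart can also be turned into proportions, which makes the comparison between stages explicit. This is a small sketch that re-enters the four counts reported above by hand; the `stage_counts` and `shares` names are illustrative.

```python
import pandas as pd

# Counts copied from the groupby output above
stage_counts = pd.Series(
    {"doggo": 73, "floofer": 7, "pupper": 202, "puppo": 22},
    name="tweet_id",
)

# Share of each stage among the tweets that have a stage
shares = (stage_counts / stage_counts.sum()).round(3)

print(shares.sort_values(ascending=False))
print("most common:", shares.idxmax(), "| least common:", shares.idxmin())
```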
visualization¶
# plot the relationship between favorite (like) count and retweet count
plt.scatter(df['favorite_count'],df['retweet_count'])
plt.title('The relationship between like count and retweet count')
plt.xlabel('like count')
plt.ylabel('retweet count');
# calculate the correlation coefficient
df['favorite_count'].corr(df['retweet_count'])
0.92886023983010291
insights¶
There is a strong positive linear relationship between like count and retweet count (correlation coefficient of about 0.93).
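`Series.corr` computes the Pearson correlation coefficient by default. As a sketch of what that number means, the code below recomputes it from the definition on the five rows shown in `df.head()` above; the `sample` frame is a hypothetical subset, so its coefficient will differ from the full-dataset value.

```python
import numpy as np
import pandas as pd

# The five rows displayed by df.head() above
sample = pd.DataFrame({
    "favorite_count": [36776, 31672, 23851, 40107, 38309],
    "retweet_count": [7840, 5804, 3844, 8009, 8650],
})

# Pearson r: covariance of the two columns divided by the product of their spreads
x = sample["favorite_count"] - sample["favorite_count"].mean()
y = sample["retweet_count"] - sample["retweet_count"].mean()
r_manual = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

# The pandas built-in should agree with the manual calculation
r_pandas = sample["favorite_count"].corr(sample["retweet_count"])

print(r_manual, r_pandas)
```

A value close to 1 indicates that tweets with more likes tend to have proportionally more retweets; it does not by itself show that one causes the other.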