AnalysisProjects.github.io

Analytics Projects Showcase

Welcome to my GitHub repository showcasing my analytics projects! In this repository, you’ll find a collection of projects where I’ve used data analysis, visualization, and storytelling to gain insights and solve real-world problems.

Projects

Here’s a list of projects I’ve worked on:

1. Project 1: Amazon Prime Data Insights

Exploring Amazon Prime Data Insights

The world of entertainment is evolving rapidly, and Amazon Prime has become a significant provider of content to its subscribers. This project uses Python and the pandas library to analyze a dataset of Amazon Prime titles covering the years up to 2021.

A Glimpse into the Dataset

The dataset offers a snapshot of Amazon Prime’s content offerings, with records spanning various genres, languages, and years.

Unveiling the Story Behind the Data

Data cleaning began with removing duplicate records, so that the later analysis rests on unique data points. A heatmap of null values then showed which parts of the dataset have missing data and might need attention.
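
The cleaning step described above can be sketched roughly as follows. This is a minimal illustration on a tiny made-up DataFrame, not the project’s actual notebook; the column names (show_id, title, director) are assumptions for the example. The boolean mask returned by isna() is the kind of input a heatmap function such as seaborn.heatmap can draw.

```python
import pandas as pd

# Tiny made-up sample; the real dataset's columns are assumed, not confirmed.
df = pd.DataFrame(
    {
        "show_id": ["s1", "s2", "s2", "s3"],
        "title": ["A", "B", "B", "C"],
        "director": ["X", None, None, "Z"],
    }
)

# Drop rows that are exact duplicates of an earlier row.
deduped = df.drop_duplicates().reset_index(drop=True)

# Boolean mask of missing values: True where a cell is null.
null_mask = deduped.isna()

# Null count per column, a numeric companion to the heatmap.
null_counts = null_mask.sum()

print(len(df), "rows before,", len(deduped), "rows after")
print(null_counts)
```

Keeping the mask as its own variable makes it easy to both plot it and summarize it per column.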

“Monster Maker” - A Hidden Gem

One highlight of the analysis was looking up the show “Monster Maker”: we retrieved its show_id and identified the director responsible for the series.

Time Travel Through Content Releases

We then examined how content releases are distributed over the years. The year 2021 stood out with 1,442 releases.

Amazon Prime’s Content Spectrum

Breaking the catalog down by content type shows 7,814 movies and 1,854 TV shows.

A Cinematic Reflection on 2020

For the year 2020, the dataset records 962 movies, a substantial addition to the platform’s film library.
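
Counts like the ones above (titles per year, titles per type, movies in a given year) can be produced with a few pandas calls. The sketch below uses made-up rows, and the column names type and release_year are assumptions rather than confirmed names from the dataset.

```python
import pandas as pd

# Made-up rows; "type" and "release_year" are assumed column names.
df = pd.DataFrame(
    {
        "title": ["A", "B", "C", "D", "E"],
        "type": ["Movie", "TV Show", "Movie", "Movie", "TV Show"],
        "release_year": [2021, 2021, 2020, 2021, 2019],
    }
)

# Number of titles per content type.
type_counts = df["type"].value_counts()

# Number of titles per release year, sorted chronologically.
year_counts = df["release_year"].value_counts().sort_index()

# Movies released in one particular year.
movies_2020 = df[(df["type"] == "Movie") & (df["release_year"] == 2020)]

print(type_counts)
print(year_counts)
print(len(movies_2020))
```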

Capturing India’s Essence

Focusing on a single region, we found 229 titles exclusive to India, an indication that the catalog carries local as well as global content.

Celebrating Content Architects - Directors

To recognize the role of directors in shaping the catalog, we identified the top 10 directors contributing to Amazon Prime’s content.

A Glimpse into Comedy & Drama

Using filters, we isolated movies from the UK categorized as “comedy” and examined how they are spread across the different ratings.
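
A filter of this kind can be sketched by combining boolean masks. The rows below are made up, and the column names listed_in, country, and rating are assumptions for illustration only.

```python
import pandas as pd

# Made-up rows; "listed_in", "country" and "rating" are assumed column names.
df = pd.DataFrame(
    {
        "title": ["A", "B", "C", "D"],
        "type": ["Movie", "Movie", "TV Show", "Movie"],
        "listed_in": ["Comedy, Drama", "Action", "Comedy", "Comedy"],
        "country": ["United Kingdom", "United Kingdom", "United Kingdom", "India"],
        "rating": ["13+", "16+", "13+", "ALL"],
    }
)

is_movie = df["type"] == "Movie"
# Case-insensitive substring match, since a title can list several genres.
is_comedy = df["listed_in"].str.contains("comedy", case=False, na=False)
# Missing countries are treated as non-matches.
is_uk = df["country"].str.contains("United Kingdom", na=False)

uk_comedies = df[is_movie & is_comedy & is_uk]

# How the filtered titles are spread across ratings.
rating_counts = uk_comedies["rating"].value_counts()

print(uk_comedies[["title", "rating"]])
print(rating_counts)
```

Passing na=False keeps rows with missing genre or country values from raising errors when the masks are combined.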

Closing Thoughts

Exploring Amazon Prime’s data revealed patterns in how the catalog is distributed across content types, release years, countries, directors, and ratings. Each step of the analysis showed how data can surface details that are not visible from the screen alone.

As data analysis enthusiasts and storytellers, we are constantly amazed by how data can reveal the stories that shape our world.

2. Project 2: Top 1000 Steam Games 2023

In this analysis, we delved into a dataset named “Top 1000 Steam Games 2023,” which was provided in CSV format. The dataset contains various attributes related to games available on the Steam platform. Our objective was to extract insights from the dataset using Python and various libraries for data manipulation, visualization, and analysis. We addressed a series of questions to gain insights into different aspects of the gaming industry.

Python Libraries Used:

pandas for data manipulation and analysis.
numpy for numerical operations.
matplotlib and seaborn for data visualization.

Dataset Attributes:

appid: Game ID on Steam.
name: Name of the game.
developer: Developer of the game.
publisher: Publisher of the game.
positive: Count of positive reviews.
negative: Count of negative reviews.
owners: Estimated number of game owners.
price: Current price of the game.
initialprice: Initial price of the game.
discount: Discount percentage for the game.
languages: Languages supported by the game.
genre: Genre of the game.
ccu: Current concurrent players.
tags: Tags associated with the game.
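
Several of these attributes usually need light preparation before analysis. The sketch below shows one possible approach on made-up rows. It assumes that owners is stored as a text range such as "low .. high" and that languages is a comma-separated string; neither format is confirmed here. The positive_share column is an assumed definition of a review score, not necessarily the one used in the project.

```python
import pandas as pd

# Made-up rows using the attribute names listed above; values are illustrative.
games = pd.DataFrame(
    {
        "name": ["Game A", "Game B", "Game C"],
        "positive": [900, 50, 0],
        "negative": [100, 50, 0],
        "owners": ["1,000,000 .. 2,000,000", "20,000 .. 50,000", "0 .. 20,000"],
        "languages": ["English, French, German", "English", None],
    }
)

# Assumed review score: share of all reviews that are positive.
# Games with no reviews get a missing value instead of a misleading number.
total_reviews = games["positive"] + games["negative"]
games["positive_share"] = (games["positive"] / total_reviews).where(total_reviews > 0)

# Assumed owners format "low .. high": keep the lower bound as an integer.
games["owners_low"] = (
    games["owners"]
    .str.extract(r"^([\d,]+)", expand=False)
    .str.replace(",", "", regex=False)
    .astype(int)
)

# Number of supported languages, assuming a comma-separated string.
games["n_languages"] = games["languages"].str.split(",").str.len()

print(games[["name", "positive_share", "owners_low", "n_languages"]])
```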

Questions Explored:

Duplicate Records: We checked for duplicate records in the dataset and removed them if found.
Null Values Heatmap: We visualized the presence of null values in the dataset using a heatmap.
Specific Game Info: We retrieved the appid and developer information for the game “Left 4 Dead 2.”
Top Selling Games: We identified the top 10 best-selling games based on the number of owners.
Price Distribution: We visualized the distribution of game prices in the database.
Genre Analysis: We determined which game genre has the highest average positive review score.
Developer Success: We identified the developer with the highest number of games in the database.
Publisher Influence: We analyzed the correlation between the number of games published by a publisher and their average review scores.
Price vs. Reviews: We investigated whether there is a relationship between game price and review scores.
Discount Impact: We examined whether games with higher discounts tend to have more positive reviews.
Language Diversity: We calculated the average number of different languages supported by the games.
Average Game Price: We determined the average price of games in each genre.
Positive vs. Negative Reviews: We explored the correlation between the number of positive and negative reviews for a game.
Game Ownership vs. Price: We analyzed the relationship between the number of owners and the game price.
Developer vs. Publisher Success: We looked for patterns between a developer’s success and the publisher they work with.
Popular Genres: We identified the top three most popular game genres based on the number of games.
Discount and Ownership: We investigated the relationship between the discount percentage and the number of owners a game has.
Developer Performance: We found the developer with the highest average review score.
Genre and Price: We explored whether there is a correlation between the game genre and its price.
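
Many of the questions above reduce to a ranking, a group-wise average, or a correlation. The following sketch illustrates those three patterns on made-up rows. The owners_low and positive_share columns are hypothetical helper columns (a numeric owner estimate and the share of positive reviews), not attributes of the original dataset.

```python
import pandas as pd

# Made-up rows; "owners_low" and "positive_share" are hypothetical helper columns.
games = pd.DataFrame(
    {
        "name": ["A", "B", "C", "D", "E"],
        "developer": ["Dev1", "Dev1", "Dev2", "Dev3", "Dev1"],
        "genre": ["Action", "Indie", "Action", "RPG", "Indie"],
        "price": [10.0, 5.0, 20.0, 30.0, 0.0],
        "owners_low": [500, 100, 300, 200, 400],
        "positive": [90, 40, 70, 60, 10],
        "negative": [10, 60, 30, 40, 90],
    }
)
games["positive_share"] = games["positive"] / (games["positive"] + games["negative"])

# Ranking: games with the largest owner estimates first.
top_by_owners = games.nlargest(3, "owners_low")[["name", "owners_low"]]

# Group-wise average: genre with the highest mean positive share.
genre_scores = games.groupby("genre")["positive_share"].mean()
best_genre = genre_scores.idxmax()

# Frequency count: developer with the most games in the table.
top_developer = games["developer"].value_counts().idxmax()

# Group-wise average: mean price per genre.
avg_price_by_genre = games.groupby("genre")["price"].mean()

# Pearson correlation between price and positive share.
price_review_corr = games["price"].corr(games["positive_share"])

print(top_by_owners)
print(best_genre, top_developer)
print(avg_price_by_genre)
print(round(price_review_corr, 3))
```

A correlation coefficient only measures linear association, so a scatter plot is a useful check before reading much into the number.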

Insights and Observations:

Some games have a large number of positive reviews, indicating their popularity.
Game prices exhibit a diverse distribution, with variations in pricing strategies.
Certain game genres tend to have higher average positive review scores.
Some developers have a substantial presence in the dataset, indicating their influence.
A positive correlation exists between the number of games a publisher has released and its average review score.
No clear linear relationship between game price and review scores is apparent.
Games with higher discounts tend to have more positive reviews, which may reflect a promotional effect, although the data alone does not establish causation.
Games support a variety of languages, reflecting efforts to cater to a diverse audience.
Game prices differ across genres, with some genres commanding higher prices.
A correlation exists between positive and negative reviews, but causation is not established.
No strong relationship exists between game ownership and price.
Developer success varies regardless of the publisher they work with.
Certain genres, like “Action” and “Indie,” dominate the database.
The correlation between discount percentage and ownership is limited, suggesting that other factors are at play.
Developer “Valve” has the highest average review score.
Game genre and price are moderately associated.

Concluding Remarks:

Through comprehensive data analysis and visualization, we’ve uncovered valuable insights into the gaming industry and the attributes that influence game popularity, pricing, and performance. While this report highlights key findings, further exploration and statistical analysis can provide more in-depth insights into the complex dynamics of the gaming market.

3. Project 3: Otodom Properties

In this analysis, we worked with a MySQL dataset containing property listings from the “otodom” database. The dataset consists of various attributes related to properties listed for sale. Our objective was to extract insights from the dataset using Python and various libraries for data manipulation, visualization, and analysis. We addressed a series of questions to gain insights into different aspects of the dataset.

Python Libraries Used:

pymysql for connecting to the MySQL database and fetching data.
pandas for data manipulation and analysis.
matplotlib and seaborn for data visualization.
sklearn for linear regression and correlation analysis.

Questions Explored:

Distribution of Listings Between Different Markets: We visualized the distribution of listings across different markets using a bar chart, allowing us to compare listing counts for each market.
Variation of Average Price Across Different Markets: We plotted the average price for each market, offering insights into how property prices vary among markets.
Highest and Lowest Average Prices by Location: By analyzing location-based average prices, we identified the locations with the highest and lowest average property prices.
Correlation Between Surface Area and Price: We used scatter plots to explore whether there’s a correlation between the surface area of properties and their prices.
Distribution of Remote Support Feature: We examined the distribution of properties with remote support as a feature, segmented by market.
Common Advertiser Types: We determined the most common advertiser types using bar charts, giving an overview of who lists properties most frequently.
Relationship Between Number of Rooms and Price: Scatter plots helped us understand how the number of rooms in a property relates to its price.
Distribution of Different Property Forms: We created a bar chart to visualize the distribution of various property forms within the dataset.
Ratio of Properties For Sale vs. Not For Sale: We calculated the ratio of properties listed for sale versus those not for sale and visualized it using a bar chart.
Markets With Highest Number of Listings For Sale: We identified the markets with the highest number of listings available for sale.
Differences in Price Between Properties With Different Forms: By comparing box plots, we explored potential differences in prices based on different property forms.
Properties With Highest Price Per Square Meter: We calculated the price per square meter for each property and found those with the highest values.
Correlation Between Keywords and Higher Prices: We performed text analysis to identify keywords in property descriptions that correlated with higher prices.
Distribution of Property Types Across Markets: We visualized the distribution of property types in different markets using a stacked bar chart.
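
Several of the questions above follow the same few pandas patterns once the listings are in a DataFrame. The sketch below uses made-up rows. The column names market, location, price, and surface are assumptions for illustration, and the market and city values are placeholders rather than values from the otodom data.

```python
import pandas as pd

# Made-up listings; column names and values are placeholders.
listings = pd.DataFrame(
    {
        "market": ["primary", "secondary", "secondary", "primary"],
        "location": ["City A", "City A", "City B", "City B"],
        "price": [500000.0, 300000.0, 450000.0, 600000.0],
        "surface": [50.0, 40.0, 60.0, 50.0],
    }
)

# Listing counts and average price per market.
listings_per_market = listings["market"].value_counts()
avg_price_by_market = listings.groupby("market")["price"].mean()

# Locations with the highest and lowest average price.
avg_price_by_location = listings.groupby("location")["price"].mean()
priciest_location = avg_price_by_location.idxmax()
cheapest_location = avg_price_by_location.idxmin()

# Price per square meter, then the listings with the highest values.
listings["price_per_sqm"] = listings["price"] / listings["surface"]
top_per_sqm = listings.nlargest(2, "price_per_sqm")

# Pearson correlation between surface area and price.
surface_price_corr = listings["surface"].corr(listings["price"])

print(listings_per_market)
print(avg_price_by_market)
print(priciest_location, cheapest_location)
print(top_per_sqm[["location", "price_per_sqm"]])
print(round(surface_price_corr, 3))
```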

Insights and Observations:

Certain markets have significantly higher listing counts than others, indicating varying levels of real estate activity.
Property prices exhibit notable differences between markets, suggesting the influence of location on pricing.
Locations with higher average prices are often associated with more upscale neighborhoods or prime areas.
A positive correlation between surface area and price suggests that larger properties are generally more expensive.
The remote support feature appears to be relatively common in the dataset, with varying adoption across markets.
Individual property owners are common advertisers, while professional agencies also have a significant presence.
Properties with more rooms tend to have higher prices, aligning with the assumption of larger and more valuable properties.
Apartments dominate the property forms, followed by houses and other types.
The ratio of properties for sale versus not for sale is indicative of market dynamics.
Certain markets exhibit a high concentration of listings for sale, reflecting market supply and demand.
Price differences between property forms might be influenced by factors specific to each type.
Properties with the highest price per square meter might indicate premium locations or unique features.
Keywords in descriptions like “luxury,” “spacious,” or “exclusive” could correlate with higher prices.
Different markets show distinct preferences for property types, potentially due to local trends and demand.
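
One simple way to check whether a keyword goes along with higher prices is to compare the prices of listings that mention it against those that do not. The sketch below is an assumed approach on made-up descriptions, not necessarily the text analysis used in the project; the description column name is also an assumption.

```python
import pandas as pd

# Made-up listings; "description" is an assumed column name.
listings = pd.DataFrame(
    {
        "description": [
            "Luxury apartment with a view",
            "Small flat near the station",
            "Spacious and exclusive house",
            "Cozy studio",
        ],
        "price": [900000.0, 250000.0, 1200000.0, 200000.0],
    }
)

keywords = ["luxury", "spacious", "exclusive"]

rows = []
for word in keywords:
    # Case-insensitive match; missing descriptions count as "no match".
    has_word = listings["description"].str.contains(word, case=False, na=False)
    rows.append(
        {
            "keyword": word,
            "n_with": int(has_word.sum()),
            "median_with": listings.loc[has_word, "price"].median(),
            "median_without": listings.loc[~has_word, "price"].median(),
        }
    )

keyword_summary = pd.DataFrame(rows).set_index("keyword")
print(keyword_summary)
```

The median is used instead of the mean so that a few very expensive listings do not dominate the comparison. A gap between the two medians indicates an association only, since listings that use such words may also differ in size or location.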

Concluding Remarks:

Through extensive data analysis and visualization, we have gained valuable insights into the dataset’s characteristics and its relationships. We’ve explored diverse aspects of the real estate market, from prices and property forms to market dynamics and features. While this report highlights key findings, further in-depth analysis and domain-specific insights could be derived with more sophisticated techniques.

About Me

I’m a passionate data analyst with a keen interest in uncovering insights from data. I enjoy working on projects that challenge me and allow me to apply my skills to real-world scenarios.

Contact

Feel free to reach out to me if you have any questions or if you’d like to collaborate on a project. You can find me on LinkedIn and Twitter.

Acknowledgments

I’d like to thank the open-source community and fellow data enthusiasts for their continuous support and inspiration.