APINYA BOONLAKSAKUL
![Harvesting Crop Field](https://static.wixstatic.com/media/11062b_41839d8dad734602af4e4e7ed242709f~mv2_d_3600_2700_s_4_2.jpg/v1/fill/w_627,h_470,al_c,q_80,usm_0.66_1.00_0.01,enc_avif,quality_auto/11062b_41839d8dad734602af4e4e7ed242709f~mv2_d_3600_2700_s_4_2.jpg)
EU AGRICULTURE ANALYSIS
The European Union stands as a leading and robust union. This inspired my curiosity to investigate the level of accessibility for non-EU countries to penetrate the agricultural commodities market within EU member states.
Purpose and Context
After completing five projects during my data analytics course at CareerFoundry, it was an exciting moment to begin creating my own project from scratch. This final project demonstrates my abilities, showcasing everything I have learned throughout the course - from sourcing my own data, formulating hypotheses, and performing exploratory analysis using Python, to presenting valuable insights through a Tableau dashboard.
Objective
To explore the agricultural commodities within EU countries, starting from a 'tomato' commodity and extract valuable insights regarding the potential for market entry from countries outside the EU.
Duration
October 2023
Data
-
EU estimated agricultural balance sheets (2002-2023) from the European Union Data Portal
-
Demographic indicators for 1950-2100 from the Population Division of the United Nations
-
EU Countries area from the Central Intelligence Agency (CIA)
Skills
-
Conducting analysis using Python
-
Data cleaning
-
Exploratory analysis
-
Descriptive statistics analysis
-
Supervised machine learning: regression analysis
-
Unsupervised machine learning: cluster analysis
-
Time series analysis
-
Geographical and data visualizations
Tools
Language: Python
Library: Pandas, NumPy, SciPy
Seaborn, Matplotlib,
scikit-learn
Application: Jupyter Notebook
Excel
Tableau
This outlines the approach I employed to analyze this project.
Step 1. Sourcing open data
I encountered a challenge in sourcing the data for analysis, as it needed to meet specific criteria. These criteria included:
- being from an authoritative source
- recent (no older than 3 years)
- containing at least 2-3 continuous variables and 2-3 categorical variables
- having a minimum of 1,500 rows
- including a geographical component
I initiated the quest for data that not only met these prerequisites but also promised valuable insights and practical results.
Ultimately, I came across an EU agricultural commodities dataset, which encompassed annual production and consumption figures, covering both imports and exports within and outside European countries for 78 agricultural commodities. This dataset comprised nearly 50,000 rows. However, I realized that I could uncover even more insights by incorporating additional variables related to each country. To enhance my analysis, I gathered demographic data and information on the land area of EU countries.
With this expanded dataset, I was equipped with sufficient information to formulate insightful questions, enabling me to extract meaningful conclusions and insights.
![UN-pic.png](https://static.wixstatic.com/media/b83767_a0a8336aeec842029360938effaadf1f~mv2.png/v1/fill/w_189,h_65,al_c,q_85,usm_0.66_1.00_0.01,enc_avif,quality_auto/UN-pic.png)
![CIA-pic.png](https://static.wixstatic.com/media/b83767_5d9af672e14d420da7d72330dd818ca1~mv2.png/v1/fill/w_95,h_95,al_c,q_85,usm_0.66_1.00_0.01,enc_avif,quality_auto/CIA-pic.png)
![](https://static.wixstatic.com/media/b83767_327bbc1c16e9413d9766e0cbbd811944~mv2.png/v1/fill/w_424,h_407,al_c,q_85,usm_0.66_1.00_0.01,enc_avif,quality_auto/b83767_327bbc1c16e9413d9766e0cbbd811944~mv2.png)
Step 2. Cleaning data
I initiated the data cleaning process with the primary dataset, the EU agricultural balance sheet data, which consisted of 50,000 rows and 58 columns. To efficiently clean and standardize the dataset, I opted for Excel, as it provided a comprehensive view of the data. Once the data was cleaned and made consistent, I proceeded to perform data wrangling, including grouping and merging two variables from the demographic data and land area datasets using Python.
Step 3. Exploring relationship
I carried out exploratory visual analysis using Python libraries to gain insights from the data. First, I visualized the correlation heatmap to identify relationships between variables. Additionally, I created categorical plots for specific commodities. These visualizations sparked my curiosity, prompting me to formulate hypotheses for further testing. My initial focus was on the commodity "tomatoes."
Correlation Heatmap
Tomato Imports & Exports, Population, and Land Area
![](https://static.wixstatic.com/media/b83767_fbfe1167a1a742128cd6a106a105c709~mv2.png/v1/fill/w_285,h_290,al_c,q_85,usm_0.66_1.00_0.01,enc_avif,quality_auto/b83767_fbfe1167a1a742128cd6a106a105c709~mv2.png)
A strong correlation with coefficients of 0.9 and 0.82 indicates that EU countries export both inside and outside the EU, and the higher a country's population, the greater its imports from within the EU.
Categorical Plot of Tomato Types
![](https://static.wixstatic.com/media/b83767_c1c99aecf1c749dca607ea3928eb629e~mv2.png/v1/fill/w_314,h_291,al_c,q_85,usm_0.66_1.00_0.01,enc_avif,quality_auto/b83767_c1c99aecf1c749dca607ea3928eb629e~mv2.png)
![](https://static.wixstatic.com/media/b83767_0b60821899774b92965e4b5280730eb8~mv2.png/v1/fill/w_313,h_289,al_c,q_85,usm_0.66_1.00_0.01,enc_avif,quality_auto/b83767_0b60821899774b92965e4b5280730eb8~mv2.png)
Tomato paste holds the top position in import volume, both from within the EU and from outside the EU. It is followed by fresh tomatoes in the second position. The third highest import volume from outside the EU is for dried tomatoes, while within the EU, it is for peeled and unpeeled tomatoes.
Step 4. Data Analysis
![](https://static.wixstatic.com/media/b83767_246497ba6fc64c0db59a335d67c4cfdd~mv2.png/v1/fill/w_432,h_352,al_c,q_85,usm_0.66_1.00_0.01,enc_avif,quality_auto/b83767_246497ba6fc64c0db59a335d67c4cfdd~mv2.png)
Conducting a geospatial analysis by creating a choropleth map gave me a better vision by country among the EU, especially the interactive dashboard in Tableau, see in this link.
![](https://static.wixstatic.com/media/b83767_15d1e21ae0334d9cbb8e61c394f752a5~mv2.png/v1/fill/w_402,h_209,al_c,q_85,usm_0.66_1.00_0.01,enc_avif,quality_auto/b83767_15d1e21ae0334d9cbb8e61c394f752a5~mv2.png)
To validate the hypotheses, I conducted a linear regression analysis, allowing for a comprehensive assessment of the data and the relationships under investigation.
Hypothesis No.1
"If countries have a low quantity of tomato exports, then they tend to have a high quantity of tomato imports."
![](https://static.wixstatic.com/media/b83767_868bff9c17734acbbaf8c594ff88dff8~mv2.png/v1/fill/w_332,h_316,al_c,q_85,usm_0.66_1.00_0.01,enc_avif,quality_auto/b83767_868bff9c17734acbbaf8c594ff88dff8~mv2.png)
Hypothesis No.2
"If countries have a low quantity of tomato imports within the EU, then they generally have a high quantity of tomato imports from outside the EU."
![](https://static.wixstatic.com/media/b83767_f8a2ba8c51244449bb7999fa8855cc83~mv2.png/v1/fill/w_331,h_318,al_c,q_85,usm_0.66_1.00_0.01,enc_avif,quality_auto/b83767_f8a2ba8c51244449bb7999fa8855cc83~mv2.png)
The result turned out that both hypotheses are rejected as the regression lines exhibit an upward trend. However, clusters of data points in the lower range of exports tend to demonstrate higher quantities of imports, mirroring the patterns observed in both EU and non-EU imports. Therefore, it is advisable to conduct a detailed country-by-country examination in this scenario.
Step 5. Create a dashboard
I opted to visualize my analysis using Tableau, which provides interactive features. I began the project by posing five key questions (the 5-W questions) to engage the audience.
![](https://static.wixstatic.com/media/b83767_6740710f83ea405fadbb949bea0b7179~mv2.png/v1/fill/w_557,h_381,al_c,q_85,usm_0.66_1.00_0.01,enc_avif,quality_auto/b83767_6740710f83ea405fadbb949bea0b7179~mv2.png)
![](https://static.wixstatic.com/media/b83767_18c7ac0363b14e3592eb5c9191bf3243~mv2.png/v1/fill/w_645,h_359,al_c,q_85,usm_0.66_1.00_0.01,enc_avif,quality_auto/b83767_18c7ac0363b14e3592eb5c9191bf3243~mv2.png)
I then guided the audience through an overview of the major commodities, utilizing geospatial analysis through the creation of choropleth maps. These maps separated each commodity type, with a specific focus on total imports and exports.
Subsequently, I delved into the analysis of the commodity "tomatoes," presenting my findings, discussing the challenges of entering the EU market, and outlining potential next steps.
Step 6. Conclusions
-
Population is positively associated with higher tomato imports.
-
The primary tomato import categories are ranked in the following order: (1) tomato paste, (2) fresh tomatoes, (3) dried tomatoes, and (4) peeled and unpeeled tomatoes.
-
Germany and France are significant importers of tomatoes. Nevertheless, the challenge lies in their higher import volumes from EU countries when compared to non-EU sources.
-
Italy leads in tomato imports from non-EU countries, but it's also the largest tomato exporter among the EU member states.
-
Tomato export volumes do not necessarily correlate with import volumes.
-
EU countries tend to prioritize intra-EU tomato imports over imports from outside the EU.
Limitations
-
Stringent Quality Standards: The European Union has strict quality and safety standards for agricultural products, including tomatoes. Meeting these standards can be a significant challenge for exporters.
-
Seasonal Demand: Tomato and other fresh agricultural products consumption can be seasonal, so exporters may need to adapt to fluctuations in demand.
-
Tariffs and Import Duties: Import duties and tariffs can vary among EU member states. Understanding and managing these costs can impact the competitiveness of exported tomatoes.
![](https://static.wixstatic.com/media/11062b_1bb9fa5d0fe74f3c82a50ba7079a91cb~mv2.jpeg/v1/fill/w_481,h_321,al_c,q_80,usm_0.66_1.00_0.01,enc_avif,quality_auto/11062b_1bb9fa5d0fe74f3c82a50ba7079a91cb~mv2.jpeg)
-
Transportation and Logistics: Ensuring the timely and efficient transportation of tomatoes to various EU destinations while maintaining their freshness can be a logistical challenge.
Next Steps
-
Investigate additional factors that may be linked to commodity trade.
-
Extend the analysis to encompass other commodities, offering valuable insights for individuals interested in statistical analysis of the European Union's commodity market.
Challenges
This has been the most challenging project for me, as it required the application of all the knowledge I have acquired and practiced throughout my data analytics course. I have thoroughly enjoyed immersing myself in the world of data, shaping my analytical narrative, formulating my own questions and finding answers, and utilizing a variety of tools from the course, as well as conducting independent research.
I am grateful for the guidance provided by both my tutor and mentor, which enabled me to conduct my own unique analysis.