poverty &
parenting.

Hello, we are TES!

This is our project, Poverty and Parenting: Economic Analysis of Birth Rates in the Philippines, a data science adventure dedicated to analyzing the dynamics surrounding childbirth statistics and socioeconomic factors, particularly the influence of economic status and geographic location in the Philippines. Through our findings, we aim to promote equitable access to healthcare services and facilitate informed family planning decisions.

WFV Spring
Ayen Manguan
Elijah Mejilla
Jose Tomanan
GitHub Repository

Why are we doing this?

It is expensive to have a child in this economy.

Raising a child in the Philippines is estimated to cost around 300,000 pesos per year (Isla, 2023), a considerable expense for families in a developing nation with increasingly high inflation rates and cost of living (Yu, 2024).

In 2023, however, CNA Insider reported that the Philippines has successfully lowered its birth rate. This raises the question: what led to this development? Does the economic situation of people in the Philippines affect their decision to have children?

References: [1][2][3]

Problem

Despite the significant financial burden of raising children in the Philippines, there is a lack of understanding regarding how the economic circumstances of individuals influence their decisions regarding family planning.

Solution

Using data science, we intend to analyze economic indicators, geographical data, and maternal preferences (do rural municipalities prefer traditional methods?) to explore their combined impact on birth rates in the Philippines and to promote family planning in the most affected areas.

So we'd like to figure out...

Primary Question

How does economic status affect the frequency of live births per unit of population?

Less economically advanced == more conscious of having children?

Secondary Question

Which child delivery method do mothers prefer more, and does it correlate to location & economic status?

Health professionals, traditional attendants, or other methods?

And we hypothesize...

Null Hypothesis

Economic status has no significant effect on the frequency of live births per unit of population and the preferred delivery method of mothers.

Alternative Hypothesis

There is a significant correlation between economic status and both the frequency of live births and the preferred delivery method of mothers.

Our plan of action!

We collated statistical datasets from the Philippine Statistics Authority (PSA) on live births, poverty rates, and population for each municipality in the Philippines, and used tools from statistics and machine learning to draw conclusions about the relationships between these parameters.

For this project, we obtained and collected three datasets.

The exact data we need are live birth counts, poverty estimates, and total population.

Collection Process

We collected the data through the official website of the Philippine Statistics Authority (PSA), which displays the latest publicly available data as of April 2024.

Sample Size

A total of 1634 municipalities were recorded in each of the three datasets.

Preparation for Processing

PSA’s collated data is organized in a “report” form; that is, some entries are summaries of the municipalities (by province, then by region) rather than individual municipalities. To prepare the data for processing, we manually reorganized it into proper rows and columns, ensuring that every row corresponds to exactly one municipality in the Philippines.

Check out our complete dataset!

Datasets from PSA

Registered live births in the Philippines (2022)

Source (Philippine Statistics Authority)

Processed by PSA on July 31, 2023

Released on January 5, 2024

This includes both timely and late registered births that occurred from January 2022 to December 2022. Note that “-” values are treated as 0s.

The important columns in this dataset are the birth counts by month of occurrence and by attendant at birth. Data is available for each municipality.

City and municipal poverty incidence (2021)

Released on April 2, 2024

The dataset details the poverty incidence in each city and municipality of the Philippines, and was processed through a technique called Small Area Estimation (SAE).

The important column found in this dataset is the poverty incidence for each municipality.

Census of Population and Housing (2020)

Source (Philippine Statistics Authority)

Released on July 7, 2021

The dataset contains census results for each municipality in the Philippines. The important column obtained from this dataset is the total population of each entry, which is needed to calculate the birth rate.

Exploring our data is the first step to understanding it better!

Data preprocessing

We first preprocess our data and ensure that it is clean, standardized, and ready for analysis! We try to make a few fields uniform across all datasets so that we can merge them into one. Having all our datasets collated into one dataframe is necessary for data analysis.

Cleaning Live Births Dataset

From the original dataset, we convert numbers stored as strings (because of thousands separators such as commas) into numeric values. We also substitute hyphen (-) values with 0, following the dataset's own convention, and sum the birth counts across the different months and sexes assigned at birth into one column (Total Births). Lastly, we remove extra whitespace in the 'Municipality' and 'Province' fields.
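The cleaning steps above can be sketched in pandas as follows. This is a minimal sketch, not our exact notebook code: it assumes each month/sex combination is its own column, and the column names (`Jan_Male`, `Jan_Female`) and toy values are hypothetical placeholders rather than PSA's actual headers or figures.

```python
import pandas as pd


def clean_live_births(df: pd.DataFrame, count_columns: list) -> pd.DataFrame:
    """Clean a live-births table with one row per municipality.

    `count_columns` lists the per-month / per-sex count columns
    (hypothetical names; adapt them to the actual PSA headers).
    """
    df = df.copy()
    for col in count_columns:
        df[col] = (
            df[col]
            .astype(str)
            .str.strip()
            .replace("-", "0")                  # PSA uses "-" for zero
            .str.replace(",", "", regex=False)  # drop thousands separators
            .astype(int)
        )
    # Collapse every month/sex column into a single total
    df["Total Births"] = df[count_columns].sum(axis=1)
    # Trim and collapse stray whitespace in the location names
    for col in ["Municipality", "Province"]:
        df[col] = df[col].str.split().str.join(" ")
    return df


# Toy example (made-up values)
raw = pd.DataFrame({
    "Province": [" Cebu ", "Cebu"],
    "Municipality": ["Pinamungajan  ", "Carmen"],
    "Jan_Male": ["1,204", "-"],
    "Jan_Female": ["1,150", "12"],
})
clean = clean_live_births(raw, ["Jan_Male", "Jan_Female"])
print(clean["Total Births"].tolist())  # [2354, 12]
```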

Cleaning Poverty Incidence Dataset

We drop the unnecessary columns, leaving only the 'Poverty Incidence' for the year 2021. The dataset is ready to go.

Cleaning Population Dataset

Much like the live births dataset, string representations of number values (due to commas) were converted. With this, the dataset is now ready for merging.

Matching Location Names

Our merge keys for the three datasets are 'Province' and 'Municipality'. To match them, we remove information inside parentheses, which usually holds extra details like '(Capital)', and convert both columns to title case. Next, we check for any location that does not match between the datasets and correct it manually. These mismatches are usually due to different spellings (Pinamungahan, Pinamungajan), misspelled words (San Idelfonso, San Ildefonso), different conventions (City Of Carmona, Carmona), or old names (Bumbaran, Amai Manabilang).
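A minimal sketch of this matching step, assuming pandas and the column names 'Province' and 'Municipality'. The `MANUAL_FIXES` table and the helper names are hypothetical; the table only lists the examples mentioned above, whereas the real corrections came from inspecting the mismatches by hand.

```python
import re

import pandas as pd

# Hypothetical correction table; keys are already title-cased. A fuller table
# could be keyed on (Province, Municipality) to avoid clashes between provinces.
MANUAL_FIXES = {
    "Pinamungahan": "Pinamungajan",
    "San Idelfonso": "San Ildefonso",
    "City Of Carmona": "Carmona",
    "Bumbaran": "Amai Manabilang",
}


def normalize_name(name: str) -> str:
    name = re.sub(r"\s*\(.*?\)", "", name)  # drop notes such as "(Capital)"
    name = " ".join(name.split()).title()   # collapse whitespace, title case
    return MANUAL_FIXES.get(name, name)


def normalize_keys(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    for col in ["Province", "Municipality"]:
        df[col] = df[col].map(normalize_name)
    return df


def unmatched(left: pd.DataFrame, right: pd.DataFrame,
              keys=("Province", "Municipality")) -> pd.DataFrame:
    """Key pairs present in `left` but missing from `right`."""
    keys = list(keys)
    check = left[keys].merge(right[keys].drop_duplicates(), on=keys,
                             how="left", indicator=True)
    return check.loc[check["_merge"] == "left_only", keys]


births = pd.DataFrame({"Province": ["CEBU", "Cavite"],
                       "Municipality": ["PINAMUNGAHAN", "City of Carmona"]})
population = pd.DataFrame({"Province": ["Cebu", "Cavite"],
                           "Municipality": ["Pinamungajan", "Carmona"]})
print(unmatched(normalize_keys(births), normalize_keys(population)))  # no rows left
```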

Merged Dataframe

After ensuring all locations are matched, we merge the three datasets into one dataframe with 1634 rows (no data loss!). The dataframe has information on location (region, province, municipality), poverty incidence, total population, total births, and delivery methods.
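One way to perform this merge while guarding against silent data loss is sketched below, with toy data and a hypothetical helper name. pandas' `validate="one_to_one"` raises an error if a key is duplicated in either table, and the row-count check catches municipalities dropped by the inner joins.

```python
import pandas as pd

KEYS = ["Province", "Municipality"]


def merge_datasets(births, poverty, population):
    """Inner-join the three cleaned tables on (Province, Municipality)."""
    merged = births.merge(poverty, on=KEYS, how="inner", validate="one_to_one")
    merged = merged.merge(population, on=KEYS, how="inner", validate="one_to_one")
    # Every municipality should survive the joins (1634 rows in our case)
    if len(merged) != len(births):
        raise ValueError(f"{len(births) - len(merged)} rows lost while merging")
    return merged


# Toy tables (made-up numbers)
births = pd.DataFrame({"Region": ["Central Visayas", "CALABARZON"],
                       "Province": ["Cebu", "Cavite"],
                       "Municipality": ["Pinamungajan", "Carmona"],
                       "Total Births": [1000, 1500]})
poverty = pd.DataFrame({"Province": ["Cebu", "Cavite"],
                        "Municipality": ["Pinamungajan", "Carmona"],
                        "Poverty Incidence": [20.0, 5.0]})
population = pd.DataFrame({"Province": ["Cebu", "Cavite"],
                           "Municipality": ["Pinamungajan", "Carmona"],
                           "Total Population": [70000, 100000]})
merged = merge_datasets(births, poverty, population)
print(merged.shape)  # (2, 6)
```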

And there we go, our data has been cleaned!

Exploring Data

We now look at the distribution of our data, check for any outliers, and see if there are any significant relationships between our variables.

Birth Rate Distribution

After calculating the birth rate from the birth count and total population, we plot it per region. Excluding outliers, the highest municipal birth rate is 1.75% (in MIMAROPA), while the lowest is 0.65% (in BARMM). In terms of the regional average, Bicol has the highest birth rate.
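The birth rate here is a crude birth rate in percent, i.e. live births per 100 residents, computed from the 2022 birth counts and the 2020 census population. A small pandas sketch with toy numbers; it assumes the regional average is the unweighted mean of the municipal rates, since the text does not specify the averaging.

```python
import pandas as pd


def add_birth_rate(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Crude birth rate in percent: live births per 100 residents
    df["Birth Rate"] = df["Total Births"] / df["Total Population"] * 100
    return df


def region_summary(df: pd.DataFrame) -> pd.DataFrame:
    # Assumption: the regional "average" is the unweighted mean of municipal rates
    return (df.groupby("Region")["Birth Rate"]
              .agg(["min", "max", "mean"])
              .sort_values("mean", ascending=False))


toy = pd.DataFrame({"Region": ["Bicol", "Bicol", "BARMM"],
                    "Total Births": [150, 130, 65],
                    "Total Population": [10000, 10000, 10000]})
summary = region_summary(add_birth_rate(toy))
print(summary)
```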

Poverty Incidence and Birth Rate

We plot the poverty incidence against the birth rate for each municipality. Since there are no apparent trends at this point, we explore further!

Question 1: Live births and Economic Status

How does economic status affect the frequency of live births per unit of population? While being of lower economic status should logically result in having a lower birth rate to compensate for the family's lack of purchasing power, this family decision can be hindered by one's lack of access to education. So we ask: are less economically advanced areas more conscious of having children? Or is it the other way around? Is financial status a factor in family planning in Filipino families in the first place? The answers to these questions will be visualized through the following joint plots.

Preprocessing

First, we take the scatter plot we made earlier and differentiate the regions by assigning each one a color. To do so, we create a copy of the merged dataframe, tailor it to this analysis by dropping unused columns, then pass it to a joint plot. We also rename each region so that it reads more easily in the plot legend, as will be seen later.

Poverty Incidence vs Birth Rate, by Region

Improving on the scatter plot produces the following joint plot. From the image, one can observe that the birth rates of most Philippine municipalities cluster between 1.0% and 1.5% (1 to 1.5 births per 100 residents), and the poverty incidence mostly ranges from 0% to 40%. A Pearson correlation test gives r = -0.03 and p = 0.212, indicating no statistically significant linear relationship between the two variables. However, we can improve this visualization further by grouping the municipalities according to the province they belong to.

Aggregating per province

Aggregating per province and using the number of municipalities in each province as the marker size, we can visualize the results above more clearly! This figure also makes it more readily apparent that the BARMM region is an outlier, lying away from the main cluster of data points. This may be attributed to political instability and limited economic development in the Bangsamoro region, which was only officially declared in 2018 and has faced ongoing armed conflict throughout its short existence. The result is high poverty incidence in the region regardless of birth rate.
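The province-level aggregation can be sketched as below. How the per-province values were computed is an assumption: here the province birth rate is recomputed from pooled births and population, while poverty incidence is a simple unweighted mean over its municipalities (a population-weighted mean would be a reasonable alternative). The `municipalities` column is what would drive the marker size. Toy data only.

```python
import pandas as pd


def aggregate_by_province(df: pd.DataFrame) -> pd.DataFrame:
    """One row per province; `municipalities` can drive the marker size."""
    out = df.groupby(["Region", "Province"], as_index=False).agg(
        municipalities=("Municipality", "count"),
        total_births=("Total Births", "sum"),
        total_population=("Total Population", "sum"),
        # Assumption: simple (unweighted) mean of municipal poverty incidence
        poverty_incidence=("Poverty Incidence", "mean"),
    )
    # Province birth rate recomputed from pooled counts, in percent
    out["birth_rate"] = out["total_births"] / out["total_population"] * 100
    return out


toy = pd.DataFrame({
    "Region": ["Central Visayas", "Central Visayas", "Bicol"],
    "Province": ["Cebu", "Cebu", "Albay"],
    "Municipality": ["Carmen", "Pinamungajan", "Tabaco"],
    "Total Births": [100, 200, 120],
    "Total Population": [10000, 10000, 8000],
    "Poverty Incidence": [10.0, 20.0, 30.0],
})
print(aggregate_by_province(toy))
```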

Removing BARMM as a region

In fact, if we omit this region from our computation, the plot becomes much simpler. Re-running the Pearson correlation test without BARMM yields r = 0.216 and p = 1.64e-17: a weak but highly significant positive correlation. This suggests that, outside of BARMM's economic and political issues, higher poverty incidence in the Philippines tends to come with a higher birth rate. One possible explanation is the country's poor sex education, which may result in a lack of family planning in poorer areas.
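Both correlation tests (all municipalities, then without BARMM) can be reproduced with `scipy.stats.pearsonr`. The helper below is a hypothetical sketch run on toy numbers, chosen so that leaving one region out flips the sign of the correlation, much as excluding BARMM changes the picture in our data.

```python
import pandas as pd
from scipy.stats import pearsonr


def poverty_birth_correlation(df: pd.DataFrame, exclude_region=None):
    """Pearson r, two-sided p-value and sample size, optionally leaving one region out."""
    if exclude_region is not None:
        df = df[df["Region"] != exclude_region]
    r, p = pearsonr(df["Poverty Incidence"], df["Birth Rate"])
    return float(r), float(p), len(df)


toy = pd.DataFrame({
    "Region": ["Bicol"] * 4 + ["BARMM"] * 2,
    "Poverty Incidence": [10, 20, 30, 40, 60, 70],
    "Birth Rate": [1.0, 1.2, 1.1, 1.3, 0.6, 0.7],
})
print(poverty_birth_correlation(toy))                          # all rows: negative r
print(poverty_birth_correlation(toy, exclude_region="BARMM"))  # r = 0.8 here
```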

Question 1: How does economic status affect the frequency of live births per unit of population?

Economic status and birth rate are significantly (though weakly) correlated, but only when BARMM is excluded; with BARMM included, the relationship is not significant. This may be attributed to a variety of factors, including poor education quality and low public awareness of safe sex. However, while this takeaway is a useful insight into the current status of the nation, it is important to remember that no Philippine region should be excluded from the narrative, however problematic its situation may be.

Question 2: Child Delivery Methods

Which child delivery method do mothers prefer more, and does it correlate to location & economic status? Before hospital medical care became more accessible in the Philippines, midwives were the go-to in delivering Filipino babies. Is that still the case today? In our data, delivery methods are divided into Health Professionals (hospitals), Traditional Birth Attendants (midwives or local clinics), Others, and Not Stated. We hypothesize that less economically advanced areas still rely on traditional attendants due to accessibility and culture.

Preprocessing

First, let's turn the delivery methods into categorical data and add a new column for the count. Our plan is to visualize our data using economic status as a parameter; to aid us in doing so, we can represent `Poverty Incidence` in ranges or bins. This makes our plot more digestible.
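Reshaping and binning can be sketched with `DataFrame.melt` and `pd.cut`. The method column names and the bin edges below are assumptions for illustration, and the toy counts are made up.

```python
import pandas as pd

# Hypothetical column names for the four delivery-method counts
METHODS = ["Health Professionals", "Traditional Birth Attendants", "Others", "Not Stated"]


def to_long(df: pd.DataFrame, bins=(0, 10, 20, 30, 40, 50, 100)) -> pd.DataFrame:
    """Wide (one column per method) -> long (one row per municipality and method)."""
    long = df.melt(
        id_vars=["Municipality", "Poverty Incidence"],
        value_vars=METHODS,
        var_name="Delivery Method",
        value_name="Count",
    )
    long["Delivery Method"] = pd.Categorical(long["Delivery Method"], categories=METHODS)
    # Bin poverty incidence (%) into ranges; the bin edges are an assumption
    long["Poverty Range"] = pd.cut(long["Poverty Incidence"], bins=list(bins),
                                   include_lowest=True)
    return long


toy = pd.DataFrame({
    "Municipality": ["A", "B"],
    "Poverty Incidence": [5.0, 45.0],
    "Health Professionals": [90, 60],
    "Traditional Birth Attendants": [8, 30],
    "Others": [2, 8],
    "Not Stated": [0, 2],
})
long = to_long(toy)
counts = long.groupby(["Poverty Range", "Delivery Method"], observed=True)["Count"].sum()
print(counts)
```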

Birth count by delivery method per poverty incidence range

Here, we can see that an overwhelming majority across all economic brackets rely on `Health Professionals` to deliver their children. A small percentage still use `Traditional Birth Attendants`, while a much smaller number use `Others`. This trend toward professional healthcare can help reduce maternal and infant mortality rates and improve overall health statistics. Let's dive deeper and analyze the trends for each delivery method!

Health Professionals

We can see that lower poverty incidence ranges have higher counts for `Health Professionals`, and that the counts trend downward as poverty incidence increases. Regions nearer to the capital also have higher counts, possibly because they are more economically advanced and have better access to professional health care.

Traditional Birth Attendants

Next, for `Traditional Birth Attendants`, the plot is much more balanced across poverty incidence ranges than that of `Health Professionals`, and the regions are well represented across the plot. The regions in Mindanao stand out much more, which means that these areas still 'prefer' traditional birth attendants to health professionals. This 'preference' could be due to limited access to professional health care or to culture; what we can say for sure is that a significant number of Filipinos still use traditional birth attendants as a delivery method.

Others and not stated

The trend for the delivery method `Others` is quite similar to that of `Traditional Birth Attendants`. The same reasoning (health care access, culture) can be applied, but the total count for this plot is much lower than the former. This is a good thing, since childbirth is generally safer in a professional setting. There are too few deliveries with an unstated method (`Not Stated`) to draw a conclusion.

Hypothesis Testing

Now, we check whether there is a relationship between birth delivery method and poverty incidence using logistic regression. Since logistic regression models a binary outcome, our two outcomes are whether a birth was delivered by a health professional or not. We use poverty incidence as the independent variable and `Count` as the weight of our model. Based on the results, there is a statistically significant association between poverty incidence and the likelihood of a health professional attending the birth. Since the coefficient for poverty incidence is negative (-0.0037) and its p-value is below 0.05, we can conclude that as poverty incidence increases, the odds of a health-professional delivery decrease! However, the modest pseudo R-squared value (0.35) suggests that other factors are likely at play, too.
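The library behind these numbers is not specified above, so here is a library-agnostic sketch: a frequency-weighted logistic regression (weights = `Count`) fitted by Newton-Raphson in NumPy, with Wald p-values and McFadden's pseudo R-squared. McFadden's is only one common pseudo R-squared, so it may not be the variant behind the 0.35 above. The data are synthetic, with a made-up "true" slope of -0.04.

```python
import math

import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_weighted_logit(x, y, w, max_iter=50, tol=1e-10):
    """Logistic regression of binary y on one predictor x, fitted by Newton-Raphson.

    `w` are frequency weights (e.g. the `Count` column). Returns coefficients
    [intercept, slope], standard errors, two-sided Wald p-values and
    McFadden's pseudo R-squared.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(max_iter):
        p = _sigmoid(X @ beta)
        grad = X.T @ (w * (y - p))                     # score vector
        info = X.T @ (X * (w * p * (1 - p))[:, None])  # Fisher information
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = _sigmoid(X @ beta)
    info = X.T @ (X * (w * p * (1 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    pvals = np.array([math.erfc(abs(b / s) / math.sqrt(2)) for b, s in zip(beta, se)])
    # McFadden's pseudo R^2: 1 - logL(model) / logL(intercept only)
    ll_model = np.sum(w * (y * np.log(p) + (1 - y) * np.log(1 - p)))
    p0 = np.sum(w * y) / np.sum(w)
    ll_null = np.sum(w * (y * np.log(p0) + (1 - y) * np.log(1 - p0)))
    return beta, se, pvals, 1 - ll_model / ll_null


# Synthetic grouped data: at each poverty level, births with and without
# a health professional (made-up "true" model for illustration)
poverty = np.array([5, 15, 25, 35, 45, 55], dtype=float)
n = 10_000
prof = np.round(n * _sigmoid(3.0 - 0.04 * poverty))
x = np.concatenate([poverty, poverty])
y = np.concatenate([np.ones(6), np.zeros(6)])  # 1 = health professional
w = np.concatenate([prof, n - prof])           # Count as frequency weight
beta, se, pvals, r2 = fit_weighted_logit(x, y, w)
print(beta, pvals, r2)  # slope near -0.04, tiny p-value
```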

Question 2: Which child delivery method do mothers prefer more, and does it correlate to location & economic status?

From the analysis, location and economic status do appear to influence mothers' choice of child delivery method. In terms of location, despite the trend toward professional healthcare, a significant number of Filipinos still opt for traditional birth attendants, especially in regions with lower economic status. Logistic regression also shows that as poverty incidence increases, the odds of having a health professional attend the birth decrease. This preference may be influenced by factors such as accessibility of professional healthcare and cultural beliefs; further qualitative research can help us understand the complex dynamics of childbirth choices in the Philippines!

Nutshell Plot

From here, we look for patterns within our data and visualize our results to see whether they correspond with any of our hypotheses. We first merge all three datasets that we've obtained.

Geospatial Mapping

To see how poverty incidence and birth rate vary across different areas of the Philippines, one option is to plot our data geospatially, coloring each province in the country. This helps us gain an intuitive understanding of how these variables relate to one another.

Individual plots

We first map poverty incidence and birth rate individually using GeoPandas.

2D colormap

Now, the challenge is to represent both poverty incidence and birth rate on a single map of the Philippines, which requires a way to encode two variables in one color. Our solution: we created this 2D colormap by linearly interpolating between two 1-dimensional colormaps! Isn’t that neat?

Mapping values

We then match each province to its birth rate and poverty incidence values, and map each pair of values to a color.
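The colormap and the value mapping can be sketched in NumPy as bilinear interpolation: one 1-D colormap runs along the low-birth-rate edge (blue to green), another along the high-birth-rate edge (purple to yellow), and the two are blended linearly by birth rate. The corner colors follow the four corners described in the analysis (blue, purple, green, yellow), but their RGB values are hypothetical, as is the min-max normalization. The resulting RGB array could then serve as per-province fill colors on the map.

```python
import numpy as np

# Hypothetical RGB corner colours in [0, 1]; our figure's exact palette may differ.
BLUE = np.array([0.25, 0.45, 0.85])    # low poverty, low birth rate
PURPLE = np.array([0.55, 0.30, 0.70])  # low poverty, high birth rate
GREEN = np.array([0.30, 0.70, 0.35])   # high poverty, low birth rate
YELLOW = np.array([0.95, 0.85, 0.20])  # high poverty, high birth rate


def normalise(values):
    """Min-max scale values to [0, 1]."""
    values = np.asarray(values, dtype=float)
    return (values - values.min()) / (values.max() - values.min())


def bivariate_color(u, v):
    """Colour for normalised poverty incidence u and birth rate v (both in [0, 1])."""
    u = np.clip(np.asarray(u, dtype=float), 0, 1)[..., None]
    v = np.clip(np.asarray(v, dtype=float), 0, 1)[..., None]
    low_br = (1 - u) * BLUE + u * GREEN      # 1-D colormap along the low-birth-rate edge
    high_br = (1 - u) * PURPLE + u * YELLOW  # 1-D colormap along the high-birth-rate edge
    return (1 - v) * low_br + v * high_br    # linear interpolation between the two


# Legend image: rows = birth rate (low -> high), columns = poverty incidence (low -> high)
steps = np.linspace(0, 1, 64)
legend = bivariate_color(steps[None, :], steps[:, None])  # shape (64, 64, 3)

# Per-province colours from (toy) poverty incidence and birth rate values
colors = bivariate_color(normalise([10, 30, 50]), normalise([1.0, 1.2, 1.4]))
print(colors.shape)  # (3, 3)
```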

Plotting

And now, all that's left to do is plot the map.

Final Plot

Putting it all together, we have:

Analysis

Using the extremes of both axes, we assign a color to each of the four corners: blue for low poverty incidence and low birth rate; purple for low poverty incidence and high birth rate; green for high poverty incidence and low birth rate; and yellow for high poverty incidence and high birth rate.

There are two main hypotheses on how poverty incidence can affect birth rates.

The first hypothesis is that the poorer a region is, the less access it would have to proper birth control and family planning knowledge, which would lead to poorer areas having higher birth rates. The second hypothesis is that poorer regions would have lower birth rates because families there would be less capable of financially supporting children.

From the leftmost graph showing only poverty incidence, we see that more northern areas (i.e., Luzon) tend to have a lower poverty incidence, while more southern areas (i.e., Mindanao) generally have a higher poverty incidence. This observation will help us in our analysis.

In the main graph, we see that areas in Luzon tend to range from purple to indigo, meaning that these areas tend to have both low poverty incidence and high birth rates.

In the Visayas, areas range from purple to orange, suggesting moderate-to-high birth rates regardless of the poverty incidence within an area.

Finally, we observe that Mindanao has the greatest variation. While some areas are colored blue/green (indicating low birth rates), the majority range across the purple-to-orange spectrum. This indicates, once again, moderate-to-high birth rates regardless of poverty incidence.

One possible reason for the greener areas is that more far-flung areas may have less access to healthcare and, in turn, to the facilities needed to record births.

As a baseline interpretation, the birth rate of an area appears relatively consistent regardless of its poverty incidence.

We now translate our findings into a machine learning model.

Preliminaries

As a prerequisite, we group Birth Rates into Birth Rate Categories: High (greater than 3.10%), Medium (between 1.078% and 3.10%), or Low (less than 1.078%), following the definitions provided in the World Bank’s data. We then use Pandas to plot how these categories are distributed in the dataset. The dataset is clearly skewed, with higher birth rates underrepresented in the data used to train this model. This should be kept in mind when interpreting the results.

Reference: World Bank
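The categorization is a single `pd.cut` call. In this sketch the thresholds come from the text; treating the bins as right-closed (so exactly 1.078% counts as Low) is an assumption, and the rates are toy values.

```python
import numpy as np
import pandas as pd


def birth_rate_category(rate_percent: pd.Series) -> pd.Series:
    """Bin crude birth rates (%) into Low / Medium / High.

    Thresholds follow the text (1.078% and 3.10%); pd.cut uses right-closed
    intervals, so a value of exactly 1.078 falls in "Low" (an assumption).
    """
    return pd.cut(rate_percent,
                  bins=[-np.inf, 1.078, 3.10, np.inf],
                  labels=["Low", "Medium", "High"])


rates = pd.Series([0.65, 1.2, 1.75, 3.5])  # toy values
categories = birth_rate_category(rates)
print(categories.value_counts().sort_index())
```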

Feature and target variables

Now we choose the feature and target variables for our model, from the discoveries we have made during exploratory data analysis. Feature variables chosen were Municipality, Region, Poverty Incidence, and Total Population, in order to predict the target variable Birth Rate Category, our newly made column.

Modelling and Training

Since the Pearson correlation test conducted earlier showed a significant relationship between Poverty Incidence and Birth Rate (once BARMM is excluded), we consider a logistic regression model for this purpose. Using the Python library scikit-learn (sklearn), we split our dataset into 70% for training and 30% for testing, then train the model with a maximum of 1000 iterations. To check for overfitting and see how well our model fits unseen data, we also perform cross-validation, which splits the data into subsets and repeatedly trains on some of them while validating on the rest.
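A sketch of this training setup with scikit-learn is below. The 70/30 split and `max_iter=1000` come from the text; one-hot encoding the location columns, standardizing the numeric ones, `random_state=42`, and 5 folds are assumptions, and the data is a synthetic stand-in for our merged dataframe.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

CATEGORICAL = ["Municipality", "Region"]
NUMERIC = ["Poverty Incidence", "Total Population"]


def build_model():
    # Assumption: one-hot encode the location columns and standardise the numbers
    preprocess = ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ("num", StandardScaler(), NUMERIC),
    ])
    return make_pipeline(preprocess, LogisticRegression(max_iter=1000))


def train_and_evaluate(df, target="Birth Rate Category", seed=42, folds=5):
    X, y = df[CATEGORICAL + NUMERIC], df[target]
    # 70% training / 30% testing split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    model = build_model().fit(X_train, y_train)
    # k-fold cross-validation on a fresh, unfitted pipeline
    cv = cross_val_score(build_model(), X, y, cv=folds)
    return {
        "train_accuracy": model.score(X_train, y_train),
        "test_accuracy": model.score(X_test, y_test),
        "cv_mean_accuracy": cv.mean(),
    }


# Synthetic stand-in for the merged dataframe (not our real data)
rng = np.random.default_rng(0)
n = 200
poverty = rng.uniform(0, 60, n)
toy = pd.DataFrame({
    "Municipality": [f"M{i}" for i in range(n)],
    "Region": rng.choice(["R1", "R2", "R3"], n),
    "Poverty Incidence": poverty,
    "Total Population": rng.integers(5_000, 200_000, n),
    "Birth Rate Category": np.where(poverty > 30, "Medium", "Low"),
})
print(train_and_evaluate(toy))
```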

Training Results

After training, the model achieved a training accuracy of 81% and a test accuracy of 84%, and cross-validation gave a mean accuracy of 79%. From this, we infer that the model is not overfitting and generalizes reasonably well to unseen data. The chosen features were also found to be significant in predicting the target variable.

What are the implications of our results?

On our machine learning model

Interpreting the model and its accuracy, it is able to predict birth rate categories with a reasonable degree of accuracy, given the features it uses. Cross-validation also indicates that the model is not overfitting and performs reasonably well on completely unseen data.

Thus, we obtained a reasonably robust and reliable model. However, it is important to note that municipalities with high birth rates are underrepresented, so the current model would likely have trouble with such cases.

To improve upon this model, one could apply it to future poverty incidence datasets, which would allow us to predict birth rates and plan healthcare facilities accordingly, especially for underrepresented classes. More advanced algorithms, specifically those that can handle outliers in the dataset (e.g. BARMM), could also be used to train the model and improve its performance.

Excluding Bangsamoro region

As demonstrated by our EDA, if we include all municipalities in the Philippines, there seems to be no significant correlation between poverty incidence and birth rate; however, as mentioned earlier, the correlation becomes highly significant once we exclude municipalities from BARMM, the Bangsamoro region. We treat this region as an outlier and temporarily exclude it from our analysis.

On our research questions

Poverty incidence in the Philippines generally has a positive correlation with birth rate. In simpler terms, regions that are economically poorer tend to have higher birth rates than regions that are economically richer. This is an alarming point of concern, since raising a child is an incredibly expensive task: around PHP 300,000 per year (Isla, 2023). This would mean that children living in impoverished areas may not have adequate resources to be raised in a financially secure environment, and this is an indicator of where to focus our healthcare resources. Furthermore, we have shown that the poorer an area is, the less likely its births are to be attended by medical professionals. This suggests that people from poorer areas may not have the resources to afford proper healthcare in childbirth.

On Bangsamoro Region

We hypothesized earlier that poorer regions (such as BARMM) may not have adequate access to the facilities required to document their births. However, further statistics show that this is not the case: BARMM has a birth registration rate of 96.6%, which means that 96.6% of all births within Bangsamoro are documented (Rappler, 2024). What, then, may cause the exceptionally low birth rates within the region? UNFPA Philippines (2023) suggests that this may be caused by ongoing armed conflict within the region, combined with its severe poverty incidence.

Overall

Together with our data, this suggests that although economic state (i.e. poverty incidence) is one indicator of birthrate, the former is not the sole determinant of the latter. We must look at the issue not merely from a statistical perspective—but from a societal and human point of view.

References: [1][2][3]

What does this entire project want to say? Where can we go from here?

On childbirth and family planning

The process of childbirth and pregnancy can be distressing and uncertain, but it can also lead to incredibly fulfilling and meaningful moments in the lives of parents. Its significance to our society cannot be overstated, and this should be reflected in the quality and accessibility of our healthcare.

Parents and individuals must be aware of the risks, consequences, and responsibilities regarding sex and safety in order to make informed decisions about family planning. Contraceptives such as condoms and pills must also be destigmatized and made affordable to the general public. People should only have children if they want to and if they are aware of what it entails.

What needs to be done

Furthermore, relevant healthcare and medical information must be made accessible to those who need to consult about pregnancy and childbirth, especially in economically poorer areas where a larger share of births is attended by non-medical attendants. Although some people may prefer these types of delivery for non-financial reasons, medical options must be made available to them regardless.

Where to go from here

Further studies could incorporate more recent datasets, especially for outlier areas such as BARMM, or for more geographically specific areas such as barangays. Studying these locations beyond purely numerical data may lead to definitive, causal conclusions, whereas the conclusions we have provided, while significant, are purely correlational. That is to say, although we can suggest reasons for these results and correlations, they are not definitive until we study poverty and parenting from a multidimensional view.

Finally...

Everyone deserves the right to have a child; however, children also deserve to be raised in homes where they feel secure and nurtured. The positive correlation between poverty incidence and birth rate could point to a lack of proper education on family planning, among other reasons; and since this correlation is consistent across different regions, then it is not a moral or individual failing—it is a systemic failing of our healthcare.

Ayen Manguan

Henlo, I’m Ayen, a third year computer science student from the University of the Philippines Diliman. My passions lie in programming, designing, and storytelling, manifesting in full-stack web development. I gain satisfaction in making things work and seeing others benefit from these innovations.

Away from the keyboard, I enjoy consuming all sorts of media, jumping from one obsession to the next. I also like cats, fried eggs, and puzzle games.

ammanguan@up.edu.ph

Elijah Mejilla

Hey, I’m Elijah—a second year Computer Science undergraduate. I’m passionate about CS at its core: from gates and semiconductors to algorithms and machine learning. I hope to leverage these interests into tangible results in either Data Science or Web Development.

When I’m tired of writing monospaced text, I find myself writing poems and essays, dabbling in different instruments, going for runs or to the gym, and learning whatever language piques my interest.

jgmejilla@up.edu.ph

Jose Tomanan

Heya! I’m Jose, a second year Computer Science student from UP Diliman. I am interested in project management and software development.

I spend most of my time away from the keyboard, consuming non-tech related passions such as fashion, fitness, and basketball. Though more recently, I have been taking active interest in data science, figuring things out on my own and visualizing results.

jdtomanan@up.edu.ph