Multinomial regression is a statistical technique used to model the relationship between a categorical dependent variable with more than two levels and one or more independent variables. It is particularly useful when the outcome variable is nominal (i.e., categories without a natural order) and can be applied to survey data, such as the Bogotá Travel Survey.
Multinomial regression extends binary logistic regression to categorical outcomes with more than two unordered categories. Rather than predicting a single probability as in logistic regression, it predicts a set of probabilities, one for each possible category of the dependent variable. Each coefficient (β) is interpreted as the change in the log-odds of choosing category j over the baseline for a one-unit increase in predictor Xi, holding all other variables constant.
Each of P87-P101 has three categorical responses: “increase”, “not expected to see change”, and “expected to decrease”. Multinomial regression is therefore suitable for determining the relationship between individuals’ expectations and specific socio-economic predictors. The category “not expected to see change” is used as the baseline category of the dependent variable.
Multinomial regression is a series of equations, one for each category \(j\) compared to the baseline category (e.g., “not change”):
\[ \log\left(\frac{P(Y = j)}{P(Y = \text{base})}\right) = \beta_{j0} + \beta_{j1}X_1 + \beta_{j2}X_2 + \cdots + \beta_{jp}X_p \]
With three response categories, this gives two equations: one for “increase” and one for “decrease”, each with its own intercept \(\beta_{j0}\) and slopes \(\beta_{j1}, \dots, \beta_{jp}\).
In short, coding the baseline as \(Y = 0\) and writing \(\beta_{0j}\) for the intercept and \(\beta_j\) for the vector of slopes of category \(j\):
\[ \log\frac{P(Y = j \mid X)}{P(Y = 0 \mid X)} = \beta_{0j} + \beta_j^T X, \quad j = 1, \dots, J \]
In probability form, the probability of each non-baseline category \(j\) can be expressed as:
\[ P(Y = j \mid X) = \frac{\exp\bigl(\beta_{0j} + \beta_j^T X\bigr)} {1 + \displaystyle\sum_{k=1}^J \exp\bigl(\beta_{0k} + \beta_k^T X\bigr)}, \quad j = 1, \dots, J \]
and the probability of the baseline category as:
\[ P(Y = 0 \mid X) = \frac{1} {1 + \displaystyle\sum_{k=1}^J \exp\bigl(\beta_{0k} + \beta_k^T X\bigr)} \]
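The probability formulas can be checked numerically. The following is a minimal sketch in base R, assuming made-up coefficients and a made-up observation (`beta` and `x` are hypothetical and not taken from the fitted models); it only illustrates that the category probabilities sum to one and that the log of each probability ratio recovers the linear predictor.

```r
# Hypothetical coefficients for J = 2 non-baseline categories and two predictors.
# Rows: categories; columns: intercept, X1, X2.
beta <- rbind(
  increase = c(0.9, -0.17, 0.41),
  decrease = c(-2.3, -0.04, -1.10)
)

# One hypothetical observation, with a leading 1 for the intercept.
x <- c(1, 4, 1)

# Linear predictors: log-odds of each category versus the baseline.
eta <- as.vector(beta %*% x)
names(eta) <- rownames(beta)

# Probabilities: the baseline has numerator 1, the others exp(eta_j).
denom <- 1 + sum(exp(eta))
probs <- c(baseline = 1 / denom, exp(eta) / denom)

print(round(probs, 3))
print(sum(probs))  # the three probabilities sum to 1
```

Because the baseline has no coefficients of its own, every fitted coefficient is read relative to it, which is why the choice of reference category matters for interpretation.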
There are three key assumptions for multinomial regression:
A coefficient β corresponds to (exp(β) - 1) × 100 percent higher odds of choosing “increase” over “not change” compared to the reference category.
All the predictors were treated as factors and regrouped to eliminate small sample categories. The specific variable, regrouping process, and reference category are listed below.
- Housing type (P1): 1 - House, 2 - Apartment, 3 - Room in tenement, 4 - Other type of housing. Reference category: other type of housing (4).
- Rent or own (P82): 1 - Own, 2 - Rent. Reference category: own.
- Gender (P10): 1 - Female, 2 - Male. Reference category: male (level 2, as set in the code).
- Educational attainment (P12), recategorized: Primary - primary school or lower; LowerSecondary - junior high school complete; UpperSecondary - senior high school complete (10th and 11th grades); Technological - technician/technological complete; University - university degree or higher. Reference category: UpperSecondary (high school complete).
- Major transportation mode (P42), recategorized: public transit (including all public buses), informal (private bus), taxi, driving, motorcycle, bike, walking, and other. Reference category: other.
- Occupation (P13), recategorized: student (including preschool), employed, self-employed, informal, NA, and other. Reference category: Other-unemployed.
- Income (P50), recategorized: low - under 1,160; lower-mid - 1,161 to 2,500; upper-mid - 2,501 to 4,900; high - over 4,901; other - no answer. Reference category: other.
- Time lived in current home (P83), recategorized: short - 5 years or less; medium - more than 5 and up to 15 years; long - more than 15 years. Reference category: medium.
- Age (Edad), recategorized: under 18 years; groups 3 to 8 kept without further recategorization. Reference category: 3.

Since the primary purpose of the model is to identify the characteristics of people who hold different expectations about the impact of the Bogotá Metro, we use the following criteria to decide which predictors to include in the model (the analysis is exploratory rather than predictive):
Inclusion criteria:
Exclusion criteria:
For some predictor level X = k, you almost never (or never) observe the baseline outcome. This almost always signals (quasi-)perfect separation or very sparse data, so you’ll want to inspect your frequency tables and perhaps apply a regularization or category-collapsing strategy.
All housing type (P1) categories are statistically significant but show signs of (quasi-)perfect separation, which indicates that housing type is not a reliable predictor for the dependent variable, so we do not include it in the model. (This may be because people in different housing types do not answer the survey differently, or because the sample size is too small.)
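Inspecting the frequency tables mentioned above can be sketched as follows. This is a minimal base R example on toy data (the `toy` data frame and its values are hypothetical, not survey records); it cross-tabulates a predictor against the outcome and lists the predictor levels for which some outcome category is never observed.

```r
# Toy data (hypothetical): outcome "2" plays the role of the baseline.
toy <- data.frame(
  outcome = factor(c("1", "1", "2", "2", "3", "1", "2", "1", "2", "1")),
  mode    = factor(c("bus", "bus", "bus", "bus", "bus",
                     "informal", "informal", "informal", "walk", "walk"))
)

# Cross-tabulate predictor levels (rows) against outcome categories (columns).
tab <- table(toy$mode, toy$outcome)
print(tab)

# Flag predictor levels for which at least one outcome category is never observed.
empty_cells <- which(tab == 0, arr.ind = TRUE)
sparse_levels <- unique(rownames(tab)[empty_cells[, "row"]])
print(sparse_levels)
```

Levels flagged this way are candidates for collapsing into a neighbouring category before the model is fitted.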
“How do you think the value of the housing or rent in which you live will change after the inauguration of the First and Second Line of the Bogotá Metro?”
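# Packages used below: tidyverse (dplyr, tidyr, tibble, stringr), haven (labelled data),
# nnet (multinom), knitr and kableExtra (result tables)
library(tidyverse)
library(haven)
library(nnet)
library(knitr)
library(kableExtra)
# Load the trip-stage, household, and person tables
trips <- readRDS("data/008-24 BBDD Procesamiento Etapas.rds")
hog <- readRDS("data/008-24 BBDD Procesamiento Hogares.rds")
per <- readRDS("data/008-24 BBDD Procesamiento Personas.rds")
# Attach household attributes to each person record
per_complt <- per %>%
  left_join(hog, by = "ID_Hogar")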
dependent_variable <- "P87"
# P13 and P15 are merged into occupation (P14) further below; duplicate entries removed
independent_variables <- c("P3", "P42", "P50", "P12", "P14",
                           "Edad", "P10", "P13", "P15", "P82", "P83")
regressor<- per_complt %>%
select(all_of(dependent_variable), all_of(independent_variables))
regressor <- regressor %>%
mutate(
across(
where(is.labelled), # pick all haven_labelled columns
~ zap_labels(.) # strip off the labels, leaving the underlying numeric
)
)
regressor$P87 <- as.factor(regressor$P87)
regressor$P87<-relevel(regressor$P87, ref = "2") # Relevel to set the reference category
regressor<-regressor%>%
rename(pop_num=P3,
major_trans_2020=P42,
income= P50,
rent_own= P82,
live_time= P83
)
regressor<-regressor%>%
rename(edu_att= P12,
occupation= P14,
gender= P10,
age= Edad
)
regressor$rent_own<- as.factor(regressor$rent_own) # own =1, rent =2
regressor$rent_own <- relevel(regressor$rent_own, ref = "1") # Relevel to set the reference category
regressor$gender <- as.factor(regressor$gender) #female =1, male=2
regressor$gender <- relevel(regressor$gender, ref = "2") # Relevel to set the reference category
regressor$edu_att <- dplyr::case_when(
regressor$edu_att %in% c(1, 2, 3) ~ "Primary",
regressor$edu_att %in% c(4, 5) ~ "LowerSecondary",
regressor$edu_att %in% c(6, 7) ~ "UpperSecondary",
regressor$edu_att %in% c(8, 9) ~ "Technological",
regressor$edu_att %in% c(10, 11, 12, 13) ~ "University",
regressor$edu_att == 97 ~ "NA",
)
regressor$edu_att <- as.factor(regressor$edu_att)
regressor$edu_att <- relevel(regressor$edu_att, ref = "UpperSecondary") # Relevel to set the reference category
regressor<-regressor %>%
mutate(major_trans_2020= case_when(
major_trans_2020 %in% c(1,2,3,4,5,6,10,16) ~ "public_tansit",
major_trans_2020 %in% c(7,8,9) ~ "informal",
major_trans_2020 %in% c(11,12) ~ "taxi",
major_trans_2020 %in% c(22,23) ~ "personal_veh",
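major_trans_2020 %in% c(24,25) ~"motorcyle",
# NOTE: code 25 is already matched by the motorcycle branch above, so listing it here has no effect
major_trans_2020 %in% c(25,27,28,17) ~ "bicycle",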
major_trans_2020==34 ~ "walking",
TRUE ~ "other"
))
regressor$major_trans_2020<-as.factor(regressor$major_trans_2020)
regressor$major_trans_2020<-relevel(regressor$major_trans_2020,ref = "other")
regressor <- regressor %>%
mutate(
# 1) if P13 not NA, take P13, otherwise keep original P14
occupation = if_else(!is.na(P13), as.character(P13), as.character(occupation)),
# 2) if P15 not NA, paste it to the (possibly updated) P14; else leave as is
occupation = if_else(
!is.na(P15),
paste(occupation, P15, sep = " / "), # use whatever separator you like
occupation
)
)
regressor$occupation <- str_remove_all(regressor$occupation, "(^NA\\s*/\\s*)|(\\s*/\\s*NA$)")
regressor<-regressor%>%
mutate(occupation= as.numeric(occupation)) %>%
select(-P13, -P15)
regressor<-regressor%>%
mutate(occupation= case_when(
occupation %in% c(1,2,3,4,5,22) ~ "student",
occupation %in% c(11,12) ~ "employed",
occupation %in% c(13,14,15,16) ~ "self-employed",
occupation %in% c(6,7,8,9,17) ~ "informal",
occupation == 97 ~ "NA",
TRUE ~ "Other-unemployed"
))
regressor$occupation <- as.factor(regressor$occupation)
regressor$occupation <- relevel(regressor$occupation, ref = "Other-unemployed") # Relevel to set the reference category
regressor<-regressor%>%
mutate(income= case_when(
income %in% c(1,2,3) ~ "Low",
income %in% c(4,5,6) ~ "lower-mid",
income %in% c(7,8) ~ "Upper-mid",
income %in% c(9,10,11) ~ "High",
TRUE ~ "Other"
))%>%
mutate(income = as.factor(income))
regressor$income <- relevel(regressor$income, ref = "Other") # Relevel to set the reference category
regressor<-regressor%>%
mutate(live_time= case_when(
live_time %in% c(1,2) ~ "short",
live_time %in% c(3,4) ~ "medium",
live_time %in% c(5,6) ~ "long",
TRUE ~ "NA"
)) %>%
mutate(live_time = as.factor(live_time))
regressor$live_time <- relevel(regressor$live_time, ref = "medium") # Relevel to set the reference category
regressor <- regressor %>%
mutate(
age = if_else(
age %in% c(1,2),
"under_18",
as.character(age) # keeps the original age for everyone else
)
)%>%
mutate(age = as.factor(age))
model_house<-multinom(P87~.,data=regressor)
## # weights: 102 (66 variable)
## initial value 1418.308465
## iter 10 value 911.462080
## iter 20 value 887.642347
## iter 30 value 884.965908
## iter 40 value 883.553217
## iter 50 value 883.495178
## iter 60 value 883.427651
## iter 70 value 883.419003
## final value 883.418756
## converged
z<-summary(model_house)$coefficients/summary(model_house)$standard.errors
p_values<- (1 - pnorm(abs(z), 0, 1)) * 2
# 1. grab raw summary
s <- summary(model_house)
coef_mat<- s$coefficients
se_mat <- s$standard.errors
# 2. compute z-scores & p-values
z_mat <- coef_mat / se_mat
p_mat <- 2 * pnorm(-abs(z_mat))
# 3. pivot to long form
df_coef <- as.data.frame(coef_mat) %>%
rownames_to_column("Outcome") %>%
pivot_longer(-Outcome, names_to="Predictor", values_to="Coef")
df_se <- as.data.frame(se_mat) %>%
rownames_to_column("Outcome") %>%
pivot_longer(-Outcome, names_to="Predictor", values_to="SE")
df_z <- as.data.frame(z_mat) %>%
rownames_to_column("Outcome") %>%
pivot_longer(-Outcome, names_to="Predictor", values_to="z")
df_p <- as.data.frame(p_mat) %>%
rownames_to_column("Outcome") %>%
pivot_longer(-Outcome, names_to="Predictor", values_to="p.value")
# 4. join and format, adding stars
results <- df_coef %>%
left_join(df_se, by=c("Outcome","Predictor")) %>%
left_join(df_z, by=c("Outcome","Predictor")) %>%
left_join(df_p, by=c("Outcome","Predictor")) %>%
mutate(
OR = exp(Coef),
across(c(Coef, SE, z, OR, p.value), ~ round(., 3)),
stars = case_when(
p.value < 0.001 ~ "***",
p.value < 0.01 ~ "**",
p.value < 0.05 ~ "*",
TRUE ~ ""
),
OR = paste0(OR, stars)
) %>%
select(Outcome, Predictor, OR, Coef, SE, z, p.value)
# 5. render as styled HTML
kable(
results,
format = "html",
table.attr = 'class="table table-striped"',
col.names = c("Outcome", "Predictor", "OR", "Coef", "SE", "z-score", "p-value"),
caption = "Multinomial logit: Odds Ratios (with significance), Coefs, SEs, z-scores & p-values"
) %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE
)
Outcome | Predictor | OR | Coef | SE | z-score | p-value |
---|---|---|---|---|---|---|
1 | (Intercept) | 2.463* | 0.901 | 0.420 | 2.148 | 0.032 |
1 | pop_num | 0.842*** | -0.172 | 0.045 | -3.817 | 0.000 |
1 | major_trans_2020bicycle | 2.614 | 0.961 | 0.597 | 1.609 | 0.108 |
1 | major_trans_2020informal | 2.282 | 0.825 | 0.613 | 1.345 | 0.179 |
1 | major_trans_2020motorcyle | 1.276 | 0.243 | 0.312 | 0.780 | 0.436 |
1 | major_trans_2020personal_veh | 0.862 | -0.148 | 0.297 | -0.498 | 0.618 |
1 | major_trans_2020public_tansit | 1.249 | 0.222 | 0.191 | 1.160 | 0.246 |
1 | major_trans_2020taxi | 0.876 | -0.133 | 0.400 | -0.332 | 0.740 |
1 | major_trans_2020walking | 1.862* | 0.622 | 0.305 | 2.037 | 0.042 |
1 | incomeHigh | 0.522* | -0.650 | 0.307 | -2.117 | 0.034 |
1 | incomeLow | 1.016 | 0.015 | 0.211 | 0.073 | 0.942 |
1 | incomelower-mid | 1.021 | 0.021 | 0.179 | 0.116 | 0.907 |
1 | incomeUpper-mid | 0.902 | -0.103 | 0.204 | -0.505 | 0.614 |
1 | edu_attLowerSecondary | 1.139 | 0.130 | 0.193 | 0.673 | 0.501 |
1 | edu_attNA | 0.579 | -0.547 | 0.440 | -1.243 | 0.214 |
1 | edu_attPrimary | 0.831 | -0.186 | 0.239 | -0.776 | 0.437 |
1 | edu_attTechnological | 1.786* | 0.580 | 0.242 | 2.393 | 0.017 |
1 | edu_attUniversity | 0.934 | -0.069 | 0.210 | -0.326 | 0.744 |
1 | occupationemployed | 0.85 | -0.162 | 0.214 | -0.756 | 0.449 |
1 | occupationinformal | 0.949 | -0.053 | 0.304 | -0.173 | 0.863 |
1 | occupationNA | 1.462 | 0.380 | 0.637 | 0.596 | 0.551 |
1 | occupationself-employed | 1.042 | 0.041 | 0.200 | 0.205 | 0.838 |
1 | occupationstudent | 1.367 | 0.313 | 0.310 | 1.009 | 0.313 |
1 | age4 | 1.404 | 0.339 | 0.244 | 1.391 | 0.164 |
1 | age5 | 1.454 | 0.374 | 0.255 | 1.469 | 0.142 |
1 | age6 | 1.63 | 0.488 | 0.268 | 1.824 | 0.068 |
1 | age7 | 1.181 | 0.167 | 0.276 | 0.604 | 0.546 |
1 | age8 | 0.928 | -0.075 | 0.302 | -0.247 | 0.805 |
1 | ageunder_18 | 2.22* | 0.797 | 0.319 | 2.499 | 0.012 |
1 | gender1 | 1.092 | 0.088 | 0.129 | 0.682 | 0.495 |
1 | rent_own2 | 0.649** | -0.432 | 0.155 | -2.789 | 0.005 |
1 | live_timelong | 1.512* | 0.414 | 0.183 | 2.266 | 0.023 |
1 | live_timeshort | 1.09 | 0.086 | 0.161 | 0.534 | 0.593 |
3 | (Intercept) | 0.096 | -2.342 | 1.262 | -1.856 | 0.063 |
3 | pop_num | 0.958 | -0.042 | 0.129 | -0.328 | 0.743 |
3 | major_trans_2020bicycle | 0*** | -10.733 | 0.000 | -1633320.129 | 0.000 |
3 | major_trans_2020informal | 0*** | -12.674 | 0.000 | -9199391.811 | 0.000 |
3 | major_trans_2020motorcyle | 0.33 | -1.110 | 1.140 | -0.974 | 0.330 |
3 | major_trans_2020personal_veh | 1.212 | 0.193 | 0.684 | 0.281 | 0.778 |
3 | major_trans_2020public_tansit | 0.462 | -0.771 | 0.509 | -1.516 | 0.130 |
3 | major_trans_2020taxi | 0.779 | -0.250 | 1.234 | -0.203 | 0.839 |
3 | major_trans_2020walking | 2.789 | 1.026 | 0.620 | 1.655 | 0.098 |
3 | incomeHigh | 3.188 | 1.159 | 0.822 | 1.410 | 0.159 |
3 | incomeLow | 8.24** | 2.109 | 0.641 | 3.292 | 0.001 |
3 | incomelower-mid | 1.565 | 0.448 | 0.700 | 0.640 | 0.522 |
3 | incomeUpper-mid | 2.483 | 0.910 | 0.688 | 1.323 | 0.186 |
3 | edu_attLowerSecondary | 0.745 | -0.294 | 0.555 | -0.529 | 0.596 |
3 | edu_attNA | 2.237 | 0.805 | 0.958 | 0.840 | 0.401 |
3 | edu_attPrimary | 0.495 | -0.703 | 0.700 | -1.004 | 0.315 |
3 | edu_attTechnological | 0.658 | -0.418 | 0.749 | -0.558 | 0.577 |
3 | edu_attUniversity | 1.183 | 0.168 | 0.564 | 0.297 | 0.766 |
3 | occupationemployed | 1.937 | 0.661 | 0.619 | 1.067 | 0.286 |
3 | occupationinformal | 0.538 | -0.620 | 1.155 | -0.537 | 0.592 |
3 | occupationNA | 0*** | -12.913 | 0.000 | -29759751.992 | 0.000 |
3 | occupationself-employed | 1.674 | 0.515 | 0.552 | 0.934 | 0.350 |
3 | occupationstudent | 1.74 | 0.554 | 0.937 | 0.591 | 0.554 |
3 | age4 | 1.192 | 0.176 | 0.755 | 0.232 | 0.816 |
3 | age5 | 1.12 | 0.114 | 0.768 | 0.148 | 0.882 |
3 | age6 | 0.65 | -0.431 | 0.908 | -0.475 | 0.635 |
3 | age7 | 2.834 | 1.042 | 0.789 | 1.321 | 0.187 |
3 | age8 | 1.975 | 0.681 | 0.891 | 0.764 | 0.445 |
3 | ageunder_18 | 1.138 | 0.129 | 0.973 | 0.133 | 0.894 |
3 | gender1 | 0.639 | -0.448 | 0.369 | -1.212 | 0.226 |
3 | rent_own2 | 0.511 | -0.671 | 0.427 | -1.573 | 0.116 |
3 | live_timelong | 0.333* | -1.099 | 0.509 | -2.161 | 0.031 |
3 | live_timeshort | 0.488 | -0.717 | 0.447 | -1.603 | 0.109 |
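The conversion from a coefficient to a percentage change in odds, (exp(β) - 1) × 100, can be reproduced directly. The sketch below uses three coefficients copied from the outcome 1 (“increase”) rows of the table above; the object names are illustrative only.

```r
# Coefficients copied from the "increase" (outcome 1) rows of the table above.
coefs <- c(pop_num = -0.172, edu_attTechnological = 0.580, rent_own2 = -0.432)

# Odds ratio and percentage change in odds for a one-unit (or one-level) change.
odds_ratio <- exp(coefs)
pct_change <- (odds_ratio - 1) * 100

print(round(data.frame(coef = coefs, OR = odds_ratio, pct_change = pct_change), 3))
```

A negative percentage means lower odds of “increase” relative to “not change”; for example, the `pop_num` coefficient of -0.172 gives an odds ratio of about 0.842, that is, roughly 15.8% lower odds per additional household member.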
- Household size (P3): For every additional person in the household, the odds of responding “increase” rather than “not change” are multiplied by 0.842; in other words, they decrease by 15.8%, holding other variables constant. Equivalently, the log-odds of responding “increase” rather than “not change” decrease by 0.172 for every additional person in the household, holding other variables constant.
- Technological education (P12): The log-odds of responding “increase” rather than “not change” are 0.58 higher for people with a technological education compared to those with upper secondary education, holding other variables constant. In other words, people with a technological education have 78.6% higher odds of choosing “increase” than “not change” compared to people with upper secondary education.
- High income (P50): The log-odds of responding “increase” rather than “not change” are 0.65 lower for people with high income compared to those who did not report their income, holding other variables constant. In other words, people with high income have 47.8% lower odds of choosing “increase” than “not change” compared to people who did not report their income.
- Walking (P42): The log-odds of responding “increase” rather than “not change” are 0.622 higher for people whose major transportation mode before 2020 was walking compared to those who used other (non-major) transportation modes, holding other variables constant. In other words, people who walked as their major transportation mode before 2020 have 86.2% higher odds of choosing “increase” than “not change” compared to people who used other transportation modes.
- Under 18 (Edad): Although the under-18 group is also statistically significant, I hesitate to treat it as a reliable predictor. Because the person data are joined to the household data, if a household answers that the metro will raise housing values, its children are assigned the same answer. The result is not meaningful, because children do not answer this question individually. Question: should we suggest that families with young children tend to expect that the metro will increase housing values?
- Renters (P82): The log-odds of responding “increase” rather than “not change” are 0.432 lower for people who rent than for those who own their home, holding other variables constant. In other words, renters have 35.1% lower odds of choosing “increase” than “not change” compared to people who own their home.
- Long residence (P83): The log-odds of responding “increase” rather than “not change” are 0.414 higher for people who have lived in their current home for a long time compared to those who have lived there for a medium time, holding other variables constant. In other words, long-term residents have 51.2% higher odds of choosing “increase” than “not change” compared to medium-term residents.
- Low income (P50): The log-odds of responding “decrease” rather than “not change” are 2.109 higher for people with low income compared to those who did not report their income, holding other variables constant. In other words, people with low income have 724% higher odds of choosing “decrease” than “not change” compared to people who did not report their income. (This is potentially concerning: the log-odds is very high, which suggests that the sample size of this category is too small to draw any conclusion.)
- Long residence (P83): The log-odds of responding “decrease” rather than “not change” are 1.099 lower for people who have lived in their current home for a long time compared to those who have lived there for a medium time, holding other variables constant. In other words, long-term residents have 66.7% lower odds of choosing “decrease” than “not change” compared to medium-term residents.

We would like to know your perception of the possible impacts of the First and Second Line of the Bogotá Metro once it is inaugurated and in operation. Please indicate your perception of each of the following statements:
Safety in the neighborhood.
trips <- readRDS("data/008-24 BBDD Procesamiento Etapas.rds")
hog <- readRDS("data/008-24 BBDD Procesamiento Hogares.rds")
per <- readRDS("data/008-24 BBDD Procesamiento Personas.rds")
per_complt <- per %>%
left_join(hog,by="ID_Hogar")
dependent_variable <- "P90"
# P13 and P15 are merged into occupation (P14) further below; duplicate entries removed
independent_variables <- c("P3", "P42", "P50", "P12", "P14",
                           "Edad", "P10", "P13", "P15", "P82", "P83")
regressor<- per_complt %>%
select(all_of(dependent_variable), all_of(independent_variables))
regressor <- regressor %>%
mutate(
across(
where(is.labelled), # pick all haven_labelled columns
~ zap_labels(.) # strip off the labels, leaving the underlying numeric
)
)
regressor$P90 <- as.factor(regressor$P90)
regressor$P90<-relevel(regressor$P90, ref = "2") # Relevel to set the reference category
regressor<-regressor%>%
rename(pop_num=P3,
major_trans_2020=P42,
income= P50,
rent_own= P82,
live_time= P83
)
regressor<-regressor%>%
rename(edu_att= P12,
occupation= P14,
gender= P10,
age= Edad
)
regressor$rent_own<- as.factor(regressor$rent_own) # own =1, rent =2
regressor$rent_own <- relevel(regressor$rent_own, ref = "1") # Relevel to set the reference category
regressor$gender <- as.factor(regressor$gender) #female =1, male=2
regressor$gender <- relevel(regressor$gender, ref = "2") # Relevel to set the reference category
regressor$edu_att <- dplyr::case_when(
regressor$edu_att %in% c(1, 2, 3) ~ "Primary",
regressor$edu_att %in% c(4, 5) ~ "LowerSecondary",
regressor$edu_att %in% c(6, 7) ~ "UpperSecondary",
regressor$edu_att %in% c(8, 9) ~ "Technological",
regressor$edu_att %in% c(10, 11, 12, 13) ~ "University",
regressor$edu_att == 97 ~ "NA",
)
regressor$edu_att <- as.factor(regressor$edu_att)
regressor$edu_att <- relevel(regressor$edu_att, ref = "UpperSecondary") # Relevel to set the reference category
regressor<-regressor %>%
mutate(major_trans_2020= case_when(
major_trans_2020 %in% c(1,2,3,4,5,6,10,16) ~ "public_tansit",
major_trans_2020 %in% c(7,8,9) ~ "informal",
major_trans_2020 %in% c(11,12) ~ "taxi",
major_trans_2020 %in% c(22,23) ~ "personal_veh",
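major_trans_2020 %in% c(24,25) ~"motorcyle",
# NOTE: code 25 is already matched by the motorcycle branch above, so listing it here has no effect
major_trans_2020 %in% c(25,27,28,17) ~ "bicycle",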
major_trans_2020==34 ~ "walking",
TRUE ~ "other"
))
regressor$major_trans_2020<-as.factor(regressor$major_trans_2020)
regressor$major_trans_2020<-relevel(regressor$major_trans_2020,ref = "other")
regressor <- regressor %>%
mutate(
# 1) if P13 not NA, take P13, otherwise keep original P14
occupation = if_else(!is.na(P13), as.character(P13), as.character(occupation)),
# 2) if P15 not NA, paste it to the (possibly updated) P14; else leave as is
occupation = if_else(
!is.na(P15),
paste(occupation, P15, sep = " / "), # use whatever separator you like
occupation
)
)
regressor$occupation <- str_remove_all(regressor$occupation, "(^NA\\s*/\\s*)|(\\s*/\\s*NA$)")
regressor<-regressor%>%
mutate(occupation= as.numeric(occupation)) %>%
select(-P13, -P15)
regressor<-regressor%>%
mutate(occupation= case_when(
occupation %in% c(1,2,3,4,5,22) ~ "student",
occupation %in% c(11,12) ~ "employed",
occupation %in% c(13,14,15,16) ~ "self-employed",
occupation %in% c(6,7,8,9,17) ~ "informal",
occupation == 97 ~ "NA",
TRUE ~ "Other-unemployed"
))
regressor$occupation <- as.factor(regressor$occupation)
regressor$occupation <- relevel(regressor$occupation, ref = "Other-unemployed") # Relevel to set the reference category
regressor<-regressor%>%
mutate(income= case_when(
income %in% c(1,2,3) ~ "Low",
income %in% c(4,5,6) ~ "lower-mid",
income %in% c(7,8) ~ "Upper-mid",
income %in% c(9,10,11) ~ "High",
TRUE ~ "Other"
))%>%
mutate(income = as.factor(income))
regressor$income <- relevel(regressor$income, ref = "Other") # Relevel to set the reference category
regressor<-regressor%>%
mutate(live_time= case_when(
live_time %in% c(1,2) ~ "short",
live_time %in% c(3,4) ~ "medium",
live_time %in% c(5,6) ~ "long",
TRUE ~ "NA"
)) %>%
mutate(live_time = as.factor(live_time))
regressor$live_time <- relevel(regressor$live_time, ref = "medium") # Relevel to set the reference category
regressor <- regressor %>%
mutate(
age = if_else(
age %in% c(1,2),
"under_18",
as.character(age) # keeps the original age for everyone else
)
)%>%
mutate(age = as.factor(age))
model_house<-multinom(P90~.,data=regressor)
## # weights: 102 (66 variable)
## initial value 1418.308465
## iter 10 value 1308.638464
## iter 20 value 1293.640602
## iter 30 value 1292.644919
## iter 40 value 1292.477482
## iter 50 value 1292.385358
## iter 60 value 1292.359217
## iter 70 value 1292.350499
## final value 1292.349966
## converged
z<-summary(model_house)$coefficients/summary(model_house)$standard.errors
p_values<- (1 - pnorm(abs(z), 0, 1)) * 2
# 1. grab raw summary
s <- summary(model_house)
coef_mat<- s$coefficients
se_mat <- s$standard.errors
# 2. compute z-scores & p-values
z_mat <- coef_mat / se_mat
p_mat <- 2 * pnorm(-abs(z_mat))
# 3. pivot to long form
df_coef <- as.data.frame(coef_mat) %>%
rownames_to_column("Outcome") %>%
pivot_longer(-Outcome, names_to="Predictor", values_to="Coef")
df_se <- as.data.frame(se_mat) %>%
rownames_to_column("Outcome") %>%
pivot_longer(-Outcome, names_to="Predictor", values_to="SE")
df_z <- as.data.frame(z_mat) %>%
rownames_to_column("Outcome") %>%
pivot_longer(-Outcome, names_to="Predictor", values_to="z")
df_p <- as.data.frame(p_mat) %>%
rownames_to_column("Outcome") %>%
pivot_longer(-Outcome, names_to="Predictor", values_to="p.value")
# 4. join and format, adding stars
results <- df_coef %>%
left_join(df_se, by=c("Outcome","Predictor")) %>%
left_join(df_z, by=c("Outcome","Predictor")) %>%
left_join(df_p, by=c("Outcome","Predictor")) %>%
mutate(
OR = exp(Coef),
across(c(Coef, SE, z, OR, p.value), ~ round(., 3)),
stars = case_when(
p.value < 0.001 ~ "***",
p.value < 0.01 ~ "**",
p.value < 0.05 ~ "*",
TRUE ~ ""
),
OR = paste0(OR, stars)
) %>%
select(Outcome, Predictor, OR, Coef, SE, z, p.value)
# 5. render as styled HTML
kable(
results,
format = "html",
table.attr = 'class="table table-striped"',
col.names = c("Outcome", "Predictor", "OR", "Coef", "SE", "z-score", "p-value"),
caption = "Multinomial logit: Odds Ratios (with significance), Coefs, SEs, z-scores & p-values"
) %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE
)
Outcome | Predictor | OR | Coef | SE | z-score | p-value |
---|---|---|---|---|---|---|
1 | (Intercept) | 0.671 | -0.398 | 0.455 | -0.876 | 0.381 |
1 | pop_num | 0.964 | -0.037 | 0.048 | -0.762 | 0.446 |
1 | major_trans_2020bicycle | 0.583 | -0.540 | 0.577 | -0.936 | 0.349 |
1 | major_trans_2020informal | 0.941 | -0.061 | 0.518 | -0.118 | 0.906 |
1 | major_trans_2020motorcyle | 0.858 | -0.153 | 0.326 | -0.469 | 0.639 |
1 | major_trans_2020personal_veh | 0.36** | -1.022 | 0.320 | -3.195 | 0.001 |
1 | major_trans_2020public_tansit | 0.64* | -0.447 | 0.206 | -2.169 | 0.030 |
1 | major_trans_2020taxi | 0.412* | -0.886 | 0.445 | -1.989 | 0.047 |
1 | major_trans_2020walking | 0.441** | -0.818 | 0.311 | -2.631 | 0.009 |
1 | incomeHigh | 3.902*** | 1.361 | 0.374 | 3.644 | 0.000 |
1 | incomeLow | 1.15 | 0.140 | 0.217 | 0.643 | 0.520 |
1 | incomelower-mid | 1.298 | 0.261 | 0.195 | 1.339 | 0.181 |
1 | incomeUpper-mid | 1.222 | 0.200 | 0.218 | 0.919 | 0.358 |
1 | edu_attLowerSecondary | 1.295 | 0.259 | 0.214 | 1.208 | 0.227 |
1 | edu_attNA | 1.411 | 0.344 | 0.478 | 0.720 | 0.472 |
1 | edu_attPrimary | 1.065 | 0.063 | 0.264 | 0.241 | 0.810 |
1 | edu_attTechnological | 1.593 | 0.466 | 0.255 | 1.823 | 0.068 |
1 | edu_attUniversity | 1.328 | 0.284 | 0.237 | 1.198 | 0.231 |
1 | occupationemployed | 0.748 | -0.290 | 0.232 | -1.250 | 0.211 |
1 | occupationinformal | 1.134 | 0.125 | 0.335 | 0.374 | 0.708 |
1 | occupationNA | 0.281 | -1.269 | 0.824 | -1.540 | 0.123 |
1 | occupationself-employed | 0.99 | -0.010 | 0.211 | -0.048 | 0.962 |
1 | occupationstudent | 0.991 | -0.009 | 0.333 | -0.028 | 0.978 |
1 | age4 | 1.031 | 0.030 | 0.272 | 0.112 | 0.911 |
1 | age5 | 1.253 | 0.225 | 0.280 | 0.805 | 0.421 |
1 | age6 | 1.436 | 0.362 | 0.286 | 1.264 | 0.206 |
1 | age7 | 1.328 | 0.284 | 0.303 | 0.937 | 0.349 |
1 | age8 | 1.856 | 0.618 | 0.328 | 1.886 | 0.059 |
1 | ageunder_18 | 1.31 | 0.270 | 0.333 | 0.811 | 0.417 |
1 | gender1 | 1.005 | 0.005 | 0.137 | 0.035 | 0.972 |
1 | rent_own2 | 0.911 | -0.093 | 0.161 | -0.576 | 0.565 |
1 | live_timelong | 1.055 | 0.053 | 0.190 | 0.281 | 0.778 |
1 | live_timeshort | 0.925 | -0.078 | 0.171 | -0.460 | 0.646 |
3 | (Intercept) | 1.309 | 0.269 | 0.493 | 0.545 | 0.585 |
3 | pop_num | 1.034 | 0.033 | 0.053 | 0.625 | 0.532 |
3 | major_trans_2020bicycle | 0.526 | -0.642 | 0.701 | -0.916 | 0.360 |
3 | major_trans_2020informal | 0*** | -11.447 | 0.000 | -1445177.635 | 0.000 |
3 | major_trans_2020motorcyle | 0.649 | -0.433 | 0.372 | -1.163 | 0.245 |
3 | major_trans_2020personal_veh | 0.247*** | -1.399 | 0.373 | -3.749 | 0.000 |
3 | major_trans_2020public_tansit | 0.676 | -0.391 | 0.229 | -1.706 | 0.088 |
3 | major_trans_2020taxi | 0.465 | -0.765 | 0.464 | -1.650 | 0.099 |
3 | major_trans_2020walking | 0.402** | -0.912 | 0.344 | -2.655 | 0.008 |
3 | incomeHigh | 3.391** | 1.221 | 0.393 | 3.111 | 0.002 |
3 | incomeLow | 0.441** | -0.818 | 0.263 | -3.113 | 0.002 |
3 | incomelower-mid | 1.256 | 0.228 | 0.200 | 1.141 | 0.254 |
3 | incomeUpper-mid | 0.861 | -0.149 | 0.234 | -0.638 | 0.524 |
3 | edu_attLowerSecondary | 0.47** | -0.754 | 0.226 | -3.337 | 0.001 |
3 | edu_attNA | 0.457 | -0.784 | 0.552 | -1.420 | 0.155 |
3 | edu_attPrimary | 0.483** | -0.727 | 0.280 | -2.601 | 0.009 |
3 | edu_attTechnological | 0.975 | -0.026 | 0.259 | -0.100 | 0.921 |
3 | edu_attUniversity | 0.804 | -0.219 | 0.239 | -0.916 | 0.360 |
3 | occupationemployed | 0.528* | -0.639 | 0.249 | -2.565 | 0.010 |
3 | occupationinformal | 1.162 | 0.150 | 0.350 | 0.430 | 0.667 |
3 | occupationNA | 1.364 | 0.311 | 0.637 | 0.488 | 0.625 |
3 | occupationself-employed | 0.68 | -0.386 | 0.233 | -1.656 | 0.098 |
3 | occupationstudent | 0.56 | -0.579 | 0.350 | -1.653 | 0.098 |
3 | age4 | 1.386 | 0.327 | 0.291 | 1.122 | 0.262 |
3 | age5 | 1.136 | 0.128 | 0.311 | 0.411 | 0.681 |
3 | age6 | 0.961 | -0.040 | 0.330 | -0.122 | 0.903 |
3 | age7 | 1.683 | 0.520 | 0.326 | 1.597 | 0.110 |
3 | age8 | 1.161 | 0.149 | 0.369 | 0.405 | 0.685 |
3 | ageunder_18 | 2.707** | 0.996 | 0.353 | 2.819 | 0.005 |
3 | gender1 | 0.93 | -0.073 | 0.151 | -0.484 | 0.628 |
3 | rent_own2 | 0.833 | -0.183 | 0.177 | -1.034 | 0.301 |
3 | live_timelong | 1.441 | 0.365 | 0.205 | 1.778 | 0.075 |
3 | live_timeshort | 0.772 | -0.259 | 0.194 | -1.336 | 0.182 |
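Rows with a very large coefficient and a standard error of essentially zero, such as `major_trans_2020informal` for outcome 3 above, are the typical footprint of (quasi-)perfect separation. A minimal sketch of screening for them follows; the three rows are copied from the table above, while the thresholds `coef_limit` and `se_floor` are assumptions chosen for illustration, not established cut-offs.

```r
# A few rows copied from the outcome 3 part of the table above.
res <- data.frame(
  Predictor = c("major_trans_2020informal", "major_trans_2020personal_veh", "incomeHigh"),
  Coef = c(-11.447, -1.399, 1.221),
  SE   = c(0.000, 0.373, 0.393)
)

# Hypothetical screening thresholds (assumptions, adjust to the data at hand).
coef_limit <- 10
se_floor   <- 1e-3

# A row is suspect if the coefficient is huge or the standard error is near zero.
res$suspect <- abs(res$Coef) > coef_limit | res$SE < se_floor
print(res[res$suspect, ])
```

The significance stars on flagged rows should not be interpreted; the corresponding category is better inspected in a frequency table and possibly collapsed.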
- Personal vehicle (P42): People whose major transportation mode before 2020 was a personal vehicle have 64% lower odds of choosing “increase” than “not change” compared to people who used other transportation modes, holding other variables constant.
- Public transit (P42): People whose major transportation mode before 2020 was public transit have 36% lower odds of choosing “increase” than “not change” compared to people who used other transportation modes, holding other variables constant.
- Taxi (P42): People whose major transportation mode before 2020 was taxi have 58.8% lower odds of choosing “increase” than “not change” compared to people who used other transportation modes, holding other variables constant.
- Walking (P42): People who walked as their major transportation mode before 2020 have 55.9% lower odds of choosing “increase” than “not change” compared to people who used other transportation modes, holding other variables constant.
- High income (P50): People with high income have 3.9 times the odds (290% higher odds) of choosing “increase” than “not change” compared to people who did not report their income, holding other variables constant. (Highly significant; this may be caused by an improper reference category or a small sample size.)
- Personal vehicle (P42): People whose major transportation mode before 2020 was a personal vehicle have 75.3% lower odds of choosing “decrease” than “not change” compared to people who used other transportation modes, holding other variables constant. (Since the coefficients for both “decrease” and “increase” are negative, people who used a personal vehicle are more likely to choose “not change” than either change, compared to people who used other transportation modes. A different reference category should be chosen to examine this hypothesis further.)
- Walking (P42): People who walked as their major transportation mode before 2020 have 59.8% lower odds of choosing “decrease” than “not change” compared to people who used other transportation modes, holding other variables constant. (Note: similar to personal vehicle.)
- High income (P50): People with high income have 3.39 times the odds (239% higher odds) of choosing “decrease” than “not change” compared to people who did not report their income, holding other variables constant. (Same direction as for “increase”.)
- Low income (P50): People with low income have 55.9% lower odds of choosing “decrease” than “not change” compared to people who did not report their income, holding other variables constant. (Highly significant; this may be caused by an improper reference category or a small sample size.)
- Lower secondary education (P12): People with lower secondary education have 53% lower odds of choosing “decrease” than “not change” compared to people with upper secondary education, holding other variables constant.
- Primary education (P12): People with primary education have 51.7% lower odds of choosing “decrease” than “not change” compared to people with upper secondary education, holding other variables constant.
- Employed (P14): People who are employed (formal job) have 47.2% lower odds of choosing “decrease” than “not change” compared to people in the Other-unemployed reference category, holding other variables constant.
- Under 18 (Edad): People who are under 18 years old have 2.7 times the odds (170.7% higher odds) of choosing “decrease” than “not change” compared to people in the reference age group (3), holding other variables constant. (Data structure uncertain.)

Statement: Cost of living in the neighborhood.
trips <- readRDS("data/008-24 BBDD Procesamiento Etapas.rds")
hog <- readRDS("data/008-24 BBDD Procesamiento Hogares.rds")
per <- readRDS("data/008-24 BBDD Procesamiento Personas.rds")
per_complt <- per %>%
left_join(hog,by="ID_Hogar")
dependent_variable<- "P91"
independent_variables <- c("P3", "P42", "P50", "P12", "P14",
"Edad", "P10", "P13", "P15", "P82", "P83")
regressor<- per_complt %>%
select(all_of(dependent_variable), all_of(independent_variables))
regressor <- regressor %>%
mutate(
across(
where(is.labelled), # pick all haven_labelled columns
~ zap_labels(.) # strip off the labels, leaving the underlying numeric
)
)
regressor$P91 <- as.factor(regressor$P91)
regressor$P91<-relevel(regressor$P91, ref = "2") # Relevel to set the reference category
regressor<-regressor%>%
rename(pop_num=P3,
major_trans_2020=P42,
income= P50,
rent_own= P82,
live_time= P83
)
regressor<-regressor%>%
rename(edu_att= P12,
occupation= P14,
gender= P10,
age= Edad
)
regressor$rent_own<- as.factor(regressor$rent_own) # own =1, rent =2
regressor$rent_own <- relevel(regressor$rent_own, ref = "1") # Relevel to set the reference category
regressor$gender <- as.factor(regressor$gender) #female =1, male=2
regressor$gender <- relevel(regressor$gender, ref = "2") # Relevel to set the reference category
regressor$edu_att <- dplyr::case_when(
regressor$edu_att %in% c(1, 2, 3) ~ "Primary",
regressor$edu_att %in% c(4, 5) ~ "LowerSecondary",
regressor$edu_att %in% c(6, 7) ~ "UpperSecondary",
regressor$edu_att %in% c(8, 9) ~ "Technological",
regressor$edu_att %in% c(10, 11, 12, 13) ~ "University",
regressor$edu_att == 97 ~ "NA",
)
regressor$edu_att <- as.factor(regressor$edu_att)
regressor$edu_att <- relevel(regressor$edu_att, ref = "UpperSecondary") # Relevel to set the reference category
regressor<-regressor %>%
mutate(major_trans_2020= case_when(
major_trans_2020 %in% c(1,2,3,4,5,6,10,16) ~ "public_tansit",
major_trans_2020 %in% c(7,8,9) ~ "informal",
major_trans_2020 %in% c(11,12) ~ "taxi",
major_trans_2020 %in% c(22,23) ~ "personal_veh",
major_trans_2020 %in% c(24,25) ~"motorcyle",
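# NOTE: code 25 is already matched by the motorcycle branch above, so it never reaches this branch; check the codebook for the intended code
major_trans_2020 %in% c(25,27,28,17) ~ "bicycle",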
major_trans_2020==34 ~ "walking",
TRUE ~ "other"
))
regressor$major_trans_2020<-as.factor(regressor$major_trans_2020)
regressor$major_trans_2020<-relevel(regressor$major_trans_2020,ref = "other")
regressor <- regressor %>%
mutate(
# 1) if P13 not NA, take P13, otherwise keep original P14
occupation = if_else(!is.na(P13), as.character(P13), as.character(occupation)),
# 2) if P15 not NA, paste it to the (possibly updated) P14; else leave as is
occupation = if_else(
!is.na(P15),
paste(occupation, P15, sep = " / "), # use whatever separator you like
occupation
)
)
regressor$occupation <- str_remove_all(regressor$occupation, "(^NA\\s*/\\s*)|(\\s*/\\s*NA$)")
regressor<-regressor%>%
mutate(occupation= as.numeric(occupation)) %>%
select(-P13, -P15)
regressor<-regressor%>%
mutate(occupation= case_when(
occupation %in% c(1,2,3,4,5,22) ~ "student",
occupation %in% c(11,12) ~ "employed",
occupation %in% c(13,14,15,16) ~ "self-employed",
occupation %in% c(6,7,8,9,17) ~ "informal",
occupation == 97 ~ "NA",
TRUE ~ "Other-unemployed"
))
regressor$occupation <- as.factor(regressor$occupation)
regressor$occupation <- relevel(regressor$occupation, ref = "Other-unemployed") # Relevel to set the reference category
regressor<-regressor%>%
mutate(income= case_when(
income %in% c(1,2,3) ~ "Low",
income %in% c(4,5,6) ~ "lower-mid",
income %in% c(7,8) ~ "Upper-mid",
income %in% c(9,10,11) ~ "High",
TRUE ~ "Other"
))%>%
mutate(income = as.factor(income))
regressor$income <- relevel(regressor$income, ref = "Other") # Relevel to set the reference category
regressor<-regressor%>%
mutate(live_time= case_when(
live_time %in% c(1,2) ~ "short",
live_time %in% c(3,4) ~ "medium",
live_time %in% c(5,6) ~ "long",
TRUE ~ "NA"
)) %>%
mutate(live_time = as.factor(live_time))
regressor$live_time <- relevel(regressor$live_time, ref = "medium") # Relevel to set the reference category
regressor <- regressor %>%
mutate(
age = if_else(
age %in% c(1,2),
"under_18",
as.character(age) # keeps the original age for everyone else
)
)%>%
mutate(age = as.factor(age))
model_house<-multinom(P91~.,data=regressor)
## # weights: 102 (66 variable)
## initial value 1418.308465
## iter 10 value 977.490844
## iter 20 value 896.087631
## iter 30 value 893.815932
## iter 40 value 893.233630
## iter 50 value 892.994556
## iter 60 value 892.869042
## iter 70 value 892.799358
## final value 892.797214
## converged
z<-summary(model_house)$coefficients/summary(model_house)$standard.errors
p_values<- (1 - pnorm(abs(z), 0, 1)) * 2
# 1. grab raw summary
s <- summary(model_house)
coef_mat<- s$coefficients
se_mat <- s$standard.errors
# 2. compute z-scores & p-values
z_mat <- coef_mat / se_mat
p_mat <- 2 * pnorm(-abs(z_mat))
# 3. pivot to long form
df_coef <- as.data.frame(coef_mat) %>%
rownames_to_column("Outcome") %>%
pivot_longer(-Outcome, names_to="Predictor", values_to="Coef")
df_se <- as.data.frame(se_mat) %>%
rownames_to_column("Outcome") %>%
pivot_longer(-Outcome, names_to="Predictor", values_to="SE")
df_z <- as.data.frame(z_mat) %>%
rownames_to_column("Outcome") %>%
pivot_longer(-Outcome, names_to="Predictor", values_to="z")
df_p <- as.data.frame(p_mat) %>%
rownames_to_column("Outcome") %>%
pivot_longer(-Outcome, names_to="Predictor", values_to="p.value")
# 4. join and format, adding stars
results <- df_coef %>%
left_join(df_se, by=c("Outcome","Predictor")) %>%
left_join(df_z, by=c("Outcome","Predictor")) %>%
left_join(df_p, by=c("Outcome","Predictor")) %>%
mutate(
OR = exp(Coef),
across(c(Coef, SE, z, OR, p.value), ~ round(., 3)),
stars = case_when(
p.value < 0.001 ~ "***",
p.value < 0.01 ~ "**",
p.value < 0.05 ~ "*",
TRUE ~ ""
),
OR = paste0(OR, stars)
) %>%
select(Outcome, Predictor, OR, Coef, SE, z, p.value)
# 5. render as styled HTML
kable(
results,
format = "html",
table.attr = 'class="table table-striped"',
col.names = c("Outcome", "Predictor", "OR", "Coef", "SE", "z-score", "p-value"),
caption = "Multinomial logit: Odds Ratios (with significance), Coefs, SEs, z-scores & p-values"
) %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE
)
Outcome | Predictor | OR | Coef | SE | z-score | p-value |
---|---|---|---|---|---|---|
1 | (Intercept) | 1.719 | 0.542 | 0.414 | 1.308 | 0.191 |
1 | pop_num | 0.953 | -0.048 | 0.044 | -1.086 | 0.277 |
1 | major_trans_2020bicycle | 0.65 | -0.431 | 0.562 | -0.767 | 0.443 |
1 | major_trans_2020informal | 1.181 | 0.167 | 0.610 | 0.273 | 0.785 |
1 | major_trans_2020motorcyle | 1.249 | 0.222 | 0.350 | 0.634 | 0.526 |
1 | major_trans_2020personal_veh | 0.502* | -0.689 | 0.304 | -2.269 | 0.023 |
1 | major_trans_2020public_tansit | 0.475*** | -0.744 | 0.203 | -3.665 | 0.000 |
1 | major_trans_2020taxi | 0.16*** | -1.831 | 0.395 | -4.631 | 0.000 |
1 | major_trans_2020walking | 0.742 | -0.298 | 0.294 | -1.015 | 0.310 |
1 | incomeHigh | 7.065*** | 1.955 | 0.434 | 4.509 | 0.000 |
1 | incomeLow | 1.124 | 0.117 | 0.194 | 0.604 | 0.546 |
1 | incomelower-mid | 1.948*** | 0.667 | 0.173 | 3.862 | 0.000 |
1 | incomeUpper-mid | 1.805** | 0.590 | 0.197 | 3.004 | 0.003 |
1 | edu_attLowerSecondary | 1.004 | 0.004 | 0.184 | 0.024 | 0.981 |
1 | edu_attNA | 0.869 | -0.140 | 0.430 | -0.325 | 0.745 |
1 | edu_attPrimary | 1.487 | 0.397 | 0.239 | 1.663 | 0.096 |
1 | edu_attTechnological | 1.319 | 0.277 | 0.226 | 1.223 | 0.221 |
1 | edu_attUniversity | 1.333 | 0.288 | 0.206 | 1.398 | 0.162 |
1 | occupationemployed | 0.799 | -0.225 | 0.212 | -1.062 | 0.288 |
1 | occupationinformal | 0.87 | -0.139 | 0.300 | -0.465 | 0.642 |
1 | occupationNA | 1.465 | 0.382 | 0.593 | 0.645 | 0.519 |
1 | occupationself-employed | 0.925 | -0.078 | 0.197 | -0.394 | 0.694 |
1 | occupationstudent | 0.883 | -0.125 | 0.295 | -0.424 | 0.672 |
1 | age4 | 1.031 | 0.031 | 0.237 | 0.131 | 0.896 |
1 | age5 | 1.382 | 0.324 | 0.252 | 1.286 | 0.198 |
1 | age6 | 1.442 | 0.366 | 0.261 | 1.401 | 0.161 |
1 | age7 | 1.194 | 0.177 | 0.271 | 0.653 | 0.513 |
1 | age8 | 0.811 | -0.210 | 0.299 | -0.703 | 0.482 |
1 | ageunder_18 | 1.587 | 0.462 | 0.298 | 1.553 | 0.120 |
1 | gender1 | 0.977 | -0.023 | 0.126 | -0.182 | 0.856 |
1 | rent_own2 | 1.185 | 0.170 | 0.149 | 1.140 | 0.254 |
1 | live_timelong | 1.539* | 0.431 | 0.176 | 2.447 | 0.014 |
1 | live_timeshort | 0.796 | -0.229 | 0.159 | -1.437 | 0.151 |
3 | (Intercept) | 0.067* | -2.702 | 1.358 | -1.989 | 0.047 |
3 | pop_num | 1.219 | 0.198 | 0.138 | 1.439 | 0.150 |
3 | major_trans_2020bicycle | 5.208 | 1.650 | 0.987 | 1.672 | 0.095 |
3 | major_trans_2020informal | 0*** | -27.288 | 0.000 | -141797477386499.375 | 0.000 |
3 | major_trans_2020motorcyle | 0*** | -36.442 | 0.000 | -12053977956368364.000 | 0.000 |
3 | major_trans_2020personal_veh | 0.707 | -0.346 | 0.854 | -0.405 | 0.685 |
3 | major_trans_2020public_tansit | 0.563 | -0.575 | 0.573 | -1.003 | 0.316 |
3 | major_trans_2020taxi | 0 | -53.793 | NaN | NaN | NaN |
3 | major_trans_2020walking | 1.384 | 0.325 | 0.749 | 0.434 | 0.664 |
3 | incomeHigh | 1.906 | 0.645 | 0.890 | 0.725 | 0.469 |
3 | incomeLow | 0*** | -15.072 | 0.000 | -5161725.858 | 0.000 |
3 | incomelower-mid | 0.51 | -0.674 | 0.433 | -1.555 | 0.120 |
3 | incomeUpper-mid | 0.138* | -1.977 | 0.789 | -2.505 | 0.012 |
3 | edu_attLowerSecondary | 1.224 | 0.202 | 0.571 | 0.355 | 0.723 |
3 | edu_attNA | 0 | -44.089 | NaN | NaN | NaN |
3 | edu_attPrimary | 1.301 | 0.263 | 0.700 | 0.376 | 0.707 |
3 | edu_attTechnological | 2.148 | 0.765 | 0.663 | 1.154 | 0.249 |
3 | edu_attUniversity | 1.052 | 0.050 | 0.688 | 0.073 | 0.942 |
3 | occupationemployed | 0.425 | -0.857 | 0.680 | -1.259 | 0.208 |
3 | occupationinformal | 2.465 | 0.902 | 0.783 | 1.152 | 0.249 |
3 | occupationNA | 0 | -43.764 | NaN | NaN | NaN |
3 | occupationself-employed | 1.213 | 0.193 | 0.560 | 0.345 | 0.730 |
3 | occupationstudent | 0.62 | -0.478 | 1.150 | -0.416 | 0.677 |
3 | age4 | 1.581 | 0.458 | 0.983 | 0.466 | 0.641 |
3 | age5 | 1.818 | 0.598 | 0.990 | 0.604 | 0.546 |
3 | age6 | 1.172 | 0.159 | 1.085 | 0.146 | 0.884 |
3 | age7 | 1.819 | 0.598 | 1.031 | 0.581 | 0.562 |
3 | age8 | 4.459 | 1.495 | 1.040 | 1.437 | 0.151 |
3 | ageunder_18 | 3.013 | 1.103 | 1.090 | 1.012 | 0.311 |
3 | gender1 | 0.736 | -0.306 | 0.378 | -0.810 | 0.418 |
3 | rent_own2 | 2.327 | 0.845 | 0.467 | 1.810 | 0.070 |
3 | live_timelong | 0.516 | -0.661 | 0.533 | -1.241 | 0.215 |
3 | live_timeshort | 0.479 | -0.735 | 0.441 | -1.666 | 0.096 |
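As a worked example of how a coefficient in this table translates into a percentage change in odds, take the public transit row for outcome 1 (Coef = -0.744):

\[ \text{OR} = e^{-0.744} \approx 0.475, \qquad (\text{OR} - 1) \times 100\% \approx -52.5\% \]

that is, 52.5% lower odds of choosing “increase” relative to “not change”. For a positive coefficient such as high income in outcome 1 (Coef = 1.955), the same steps give

\[ \text{OR} = e^{1.955} \approx 7.065, \qquad (\text{OR} - 1) \times 100\% \approx 606.5\% \]

so the odds are about 7 times as large, or equivalently about 6 times higher, than for the reference group.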
- public transit (P42): People whose main transportation mode before 2020 was public transit have 52.5% lower odds (OR = 0.475) of choosing “increase” than “not change” compared to people who used other transportation modes, holding other variables constant.
- personal vehicle (P42): People whose main transportation mode before 2020 was a personal vehicle have 50% lower odds (OR = 0.502) of choosing “increase” than “not change” compared to people who used other transportation modes, holding other variables constant.
- taxi (P42): People whose main transportation mode before 2020 was taxi have 84% lower odds (OR = 0.16) of choosing “increase” than “not change” compared to people who used other transportation modes, holding other variables constant.
- high income (P50): People with high income have 6 times higher odds (OR = 7.065) of choosing “increase” than “not change” compared to people who did not report their income, holding other variables constant. (Highly significant; this may be caused by an improper reference category or a small sample size.)
- lower-mid income (P50): People with lower-mid income have 94.8% higher odds (OR = 1.948) of choosing “increase” than “not change” compared to people who did not report their income, holding other variables constant.
- upper-mid income (P50): People with upper-mid income have 80.5% higher odds (OR = 1.805) of choosing “increase” than “not change” compared to people who did not report their income, holding other variables constant.
- long residence (P83): People who have lived in their current home for a long time have 53.9% higher odds (OR = 1.539) of choosing “increase” than “not change” compared to people who have lived in their current home for a medium time, holding other variables constant.
- upper-mid income (P50): People with upper-mid income have 86.2% lower odds (OR = 0.138) of choosing “decrease” than “not change” compared to people who did not report their income, holding other variables constant.

Statement: Local commerce (formal and informal).
trips <- readRDS("data/008-24 BBDD Procesamiento Etapas.rds")
hog <- readRDS("data/008-24 BBDD Procesamiento Hogares.rds")
per <- readRDS("data/008-24 BBDD Procesamiento Personas.rds")
per_complt <- per %>%
left_join(hog,by="ID_Hogar")
dependent_variable<- "P92"
independent_variables <- c("P3", "P42", "P50", "P12", "P14",
"Edad", "P10", "P13", "P15", "P82", "P83")
regressor<- per_complt %>%
select(all_of(dependent_variable), all_of(independent_variables))
regressor <- regressor %>%
mutate(
across(
where(is.labelled), # pick all haven_labelled columns
~ zap_labels(.) # strip off the labels, leaving the underlying numeric
)
)
regressor$P92 <- as.factor(regressor$P92)
regressor$P92<-relevel(regressor$P92, ref = "2") # Relevel to set the reference category
regressor<-regressor%>%
rename(pop_num=P3,
major_trans_2020=P42,
income= P50,
rent_own= P82,
live_time= P83
)
regressor<-regressor%>%
rename(edu_att= P12,
occupation= P14,
gender= P10,
age= Edad
)
regressor$rent_own<- as.factor(regressor$rent_own) # own =1, rent =2
regressor$rent_own <- relevel(regressor$rent_own, ref = "1") # Relevel to set the reference category
regressor$gender <- as.factor(regressor$gender) #female =1, male=2
regressor$gender <- relevel(regressor$gender, ref = "2") # Relevel to set the reference category
regressor$edu_att <- dplyr::case_when(
regressor$edu_att %in% c(1, 2, 3) ~ "Primary",
regressor$edu_att %in% c(4, 5) ~ "LowerSecondary",
regressor$edu_att %in% c(6, 7) ~ "UpperSecondary",
regressor$edu_att %in% c(8, 9) ~ "Technological",
regressor$edu_att %in% c(10, 11, 12, 13) ~ "University",
regressor$edu_att == 97 ~ "NA",
)
regressor$edu_att <- as.factor(regressor$edu_att)
regressor$edu_att <- relevel(regressor$edu_att, ref = "UpperSecondary") # Relevel to set the reference category
regressor<-regressor %>%
mutate(major_trans_2020= case_when(
major_trans_2020 %in% c(1,2,3,4,5,6,10,16) ~ "public_tansit",
major_trans_2020 %in% c(7,8,9) ~ "informal",
major_trans_2020 %in% c(11,12) ~ "taxi",
major_trans_2020 %in% c(22,23) ~ "personal_veh",
major_trans_2020 %in% c(24,25) ~"motorcyle",
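# NOTE: code 25 is already matched by the motorcycle branch above, so it never reaches this branch; check the codebook for the intended code
major_trans_2020 %in% c(25,27,28,17) ~ "bicycle",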
major_trans_2020==34 ~ "walking",
TRUE ~ "other"
))
regressor$major_trans_2020<-as.factor(regressor$major_trans_2020)
regressor$major_trans_2020<-relevel(regressor$major_trans_2020,ref = "other")
regressor <- regressor %>%
mutate(
# 1) if P13 not NA, take P13, otherwise keep original P14
occupation = if_else(!is.na(P13), as.character(P13), as.character(occupation)),
# 2) if P15 not NA, paste it to the (possibly updated) P14; else leave as is
occupation = if_else(
!is.na(P15),
paste(occupation, P15, sep = " / "), # use whatever separator you like
occupation
)
)
regressor$occupation <- str_remove_all(regressor$occupation, "(^NA\\s*/\\s*)|(\\s*/\\s*NA$)")
regressor<-regressor%>%
mutate(occupation= as.numeric(occupation)) %>%
select(-P13, -P15)
regressor<-regressor%>%
mutate(occupation= case_when(
occupation %in% c(1,2,3,4,5,22) ~ "student",
occupation %in% c(11,12) ~ "employed",
occupation %in% c(13,14,15,16) ~ "self-employed",
occupation %in% c(6,7,8,9,17) ~ "informal",
occupation == 97 ~ "NA",
TRUE ~ "Other-unemployed"
))
regressor$occupation <- as.factor(regressor$occupation)
regressor$occupation <- relevel(regressor$occupation, ref = "Other-unemployed") # Relevel to set the reference category
regressor<-regressor%>%
mutate(income= case_when(
income %in% c(1,2,3) ~ "Low",
income %in% c(4,5,6) ~ "lower-mid",
income %in% c(7,8) ~ "Upper-mid",
income %in% c(9,10,11) ~ "High",
TRUE ~ "Other"
))%>%
mutate(income = as.factor(income))
regressor$income <- relevel(regressor$income, ref = "Other") # Relevel to set the reference category
regressor<-regressor%>%
mutate(live_time= case_when(
live_time %in% c(1,2) ~ "short",
live_time %in% c(3,4) ~ "medium",
live_time %in% c(5,6) ~ "long",
TRUE ~ "NA"
)) %>%
mutate(live_time = as.factor(live_time))
regressor$live_time <- relevel(regressor$live_time, ref = "medium") # Relevel to set the reference category
regressor <- regressor %>%
mutate(
age = if_else(
age %in% c(1,2),
"under_18",
as.character(age) # keeps the original age for everyone else
)
)%>%
mutate(age = as.factor(age))
model_house<-multinom(P92~.,data=regressor)
## # weights: 102 (66 variable)
## initial value 1418.308465
## iter 10 value 1144.388625
## iter 20 value 1090.869323
## iter 30 value 1081.913837
## iter 40 value 1081.034325
## iter 50 value 1080.930365
## iter 60 value 1080.825708
## iter 70 value 1080.817099
## final value 1080.816874
## converged
z<-summary(model_house)$coefficients/summary(model_house)$standard.errors
p_values<- (1 - pnorm(abs(z), 0, 1)) * 2
# 1. grab raw summary
s <- summary(model_house)
coef_mat<- s$coefficients
se_mat <- s$standard.errors
# 2. compute z-scores & p-values
z_mat <- coef_mat / se_mat
p_mat <- 2 * pnorm(-abs(z_mat))
# 3. pivot to long form
df_coef <- as.data.frame(coef_mat) %>%
rownames_to_column("Outcome") %>%
pivot_longer(-Outcome, names_to="Predictor", values_to="Coef")
df_se <- as.data.frame(se_mat) %>%
rownames_to_column("Outcome") %>%
pivot_longer(-Outcome, names_to="Predictor", values_to="SE")
df_z <- as.data.frame(z_mat) %>%
rownames_to_column("Outcome") %>%
pivot_longer(-Outcome, names_to="Predictor", values_to="z")
df_p <- as.data.frame(p_mat) %>%
rownames_to_column("Outcome") %>%
pivot_longer(-Outcome, names_to="Predictor", values_to="p.value")
# 4. join and format, adding stars
results <- df_coef %>%
left_join(df_se, by=c("Outcome","Predictor")) %>%
left_join(df_z, by=c("Outcome","Predictor")) %>%
left_join(df_p, by=c("Outcome","Predictor")) %>%
mutate(
OR = exp(Coef),
across(c(Coef, SE, z, OR, p.value), ~ round(., 3)),
stars = case_when(
p.value < 0.001 ~ "***",
p.value < 0.01 ~ "**",
p.value < 0.05 ~ "*",
TRUE ~ ""
),
OR = paste0(OR, stars)
) %>%
select(Outcome, Predictor, OR, Coef, SE, z, p.value)
# 5. render as styled HTML
kable(
results,
format = "html",
table.attr = 'class="table table-striped"',
col.names = c("Outcome", "Predictor", "OR", "Coef", "SE", "z-score", "p-value"),
caption = "Multinomial logit: Odds Ratios (with significance), Coefs, SEs, z-scores & p-values"
) %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE
)
Outcome | Predictor | OR | Coef | SE | z-score | p-value |
---|---|---|---|---|---|---|
1 | (Intercept) | 0.708 | -0.345 | 0.415 | -0.832 | 0.406 |
1 | pop_num | 1.025 | 0.025 | 0.044 | 0.556 | 0.578 |
1 | major_trans_2020bicycle | 16818800.114*** | 16.638 | 0.366 | 45.521 | 0.000 |
1 | major_trans_2020informal | 0.853 | -0.160 | 0.537 | -0.297 | 0.766 |
1 | major_trans_2020motorcyle | 2.993** | 1.096 | 0.358 | 3.064 | 0.002 |
1 | major_trans_2020personal_veh | 1.374 | 0.318 | 0.309 | 1.030 | 0.303 |
1 | major_trans_2020public_tansit | 0.784 | -0.244 | 0.193 | -1.259 | 0.208 |
1 | major_trans_2020taxi | 1.299 | 0.262 | 0.408 | 0.641 | 0.522 |
1 | major_trans_2020walking | 0.882 | -0.125 | 0.279 | -0.449 | 0.654 |
1 | incomeHigh | 3.248** | 1.178 | 0.362 | 3.255 | 0.001 |
1 | incomeLow | 1.521* | 0.419 | 0.200 | 2.093 | 0.036 |
1 | incomelower-mid | 1.366 | 0.312 | 0.176 | 1.771 | 0.077 |
1 | incomeUpper-mid | 1.676* | 0.516 | 0.200 | 2.578 | 0.010 |
1 | edu_attLowerSecondary | 1.598* | 0.469 | 0.187 | 2.512 | 0.012 |
1 | edu_attNA | 0.823 | -0.195 | 0.426 | -0.459 | 0.646 |
1 | edu_attPrimary | 1.566 | 0.448 | 0.238 | 1.882 | 0.060 |
1 | edu_attTechnological | 1.263 | 0.234 | 0.226 | 1.034 | 0.301 |
1 | edu_attUniversity | 1.499 | 0.405 | 0.207 | 1.957 | 0.050 |
1 | occupationemployed | 0.852 | -0.160 | 0.213 | -0.750 | 0.453 |
1 | occupationinformal | 0.76 | -0.275 | 0.302 | -0.912 | 0.362 |
1 | occupationNA | 0.78 | -0.248 | 0.573 | -0.433 | 0.665 |
1 | occupationself-employed | 0.79 | -0.236 | 0.197 | -1.203 | 0.229 |
1 | occupationstudent | 1.338 | 0.291 | 0.303 | 0.961 | 0.337 |
1 | age4 | 0.984 | -0.016 | 0.244 | -0.067 | 0.947 |
1 | age5 | 0.943 | -0.059 | 0.254 | -0.231 | 0.817 |
1 | age6 | 1.209 | 0.190 | 0.265 | 0.716 | 0.474 |
1 | age7 | 1.065 | 0.063 | 0.277 | 0.226 | 0.821 |
1 | age8 | 0.618 | -0.482 | 0.301 | -1.604 | 0.109 |
1 | ageunder_18 | 0.983 | -0.017 | 0.306 | -0.057 | 0.955 |
1 | gender1 | 1.073 | 0.070 | 0.127 | 0.553 | 0.580 |
1 | rent_own2 | 0.881 | -0.127 | 0.149 | -0.854 | 0.393 |
1 | live_timelong | 1.511* | 0.413 | 0.175 | 2.363 | 0.018 |
1 | live_timeshort | 1.362 | 0.309 | 0.160 | 1.935 | 0.053 |
3 | (Intercept) | 1.111 | 0.105 | 0.726 | 0.145 | 0.885 |
3 | pop_num | 0.869 | -0.140 | 0.086 | -1.641 | 0.101 |
3 | major_trans_2020bicycle | 12287884.951*** | 16.324 | 0.366 | 44.662 | 0.000 |
3 | major_trans_2020informal | 0*** | -12.633 | 0.000 | -6898216.756 | 0.000 |
3 | major_trans_2020motorcyle | 0.352 | -1.045 | 0.815 | -1.282 | 0.200 |
3 | major_trans_2020personal_veh | 0.841 | -0.173 | 0.473 | -0.366 | 0.714 |
3 | major_trans_2020public_tansit | 0.475* | -0.745 | 0.315 | -2.365 | 0.018 |
3 | major_trans_2020taxi | 0.286 | -1.252 | 0.833 | -1.503 | 0.133 |
3 | major_trans_2020walking | 0.347* | -1.059 | 0.532 | -1.990 | 0.047 |
3 | incomeHigh | 1.11 | 0.104 | 0.635 | 0.164 | 0.870 |
3 | incomeLow | 0.218** | -1.523 | 0.458 | -3.323 | 0.001 |
3 | incomelower-mid | 0.792 | -0.233 | 0.280 | -0.833 | 0.405 |
3 | incomeUpper-mid | 0.64 | -0.447 | 0.352 | -1.271 | 0.204 |
3 | edu_attLowerSecondary | 1.003 | 0.003 | 0.360 | 0.009 | 0.993 |
3 | edu_attNA | 0.437 | -0.828 | 1.094 | -0.757 | 0.449 |
3 | edu_attPrimary | 2.014 | 0.700 | 0.408 | 1.717 | 0.086 |
3 | edu_attTechnological | 1.465 | 0.382 | 0.392 | 0.974 | 0.330 |
3 | edu_attUniversity | 1.095 | 0.091 | 0.386 | 0.235 | 0.814 |
3 | occupationemployed | 0.646 | -0.437 | 0.382 | -1.144 | 0.253 |
3 | occupationinformal | 0.864 | -0.146 | 0.553 | -0.265 | 0.791 |
3 | occupationNA | 1.374 | 0.317 | 1.143 | 0.278 | 0.781 |
3 | occupationself-employed | 0.875 | -0.133 | 0.339 | -0.394 | 0.694 |
3 | occupationstudent | 1.094 | 0.090 | 0.552 | 0.164 | 0.870 |
3 | age4 | 0.941 | -0.061 | 0.464 | -0.131 | 0.896 |
3 | age5 | 0.707 | -0.346 | 0.496 | -0.698 | 0.485 |
3 | age6 | 0.891 | -0.115 | 0.506 | -0.228 | 0.820 |
3 | age7 | 1.595 | 0.467 | 0.480 | 0.973 | 0.331 |
3 | age8 | 0.628 | -0.466 | 0.554 | -0.842 | 0.400 |
3 | ageunder_18 | 0.769 | -0.263 | 0.575 | -0.457 | 0.648 |
3 | gender1 | 0.921 | -0.082 | 0.227 | -0.362 | 0.717 |
3 | rent_own2 | 1.101 | 0.096 | 0.267 | 0.361 | 0.718 |
3 | live_timelong | 1.152 | 0.141 | 0.298 | 0.474 | 0.636 |
3 | live_timeshort | 0.598 | -0.513 | 0.284 | -1.807 | 0.071 |
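Some rows in the table above have very large coefficients or standard errors of zero (for example the bicycle and informal rows). This pattern usually indicates (quasi-)complete separation or empty category/outcome cells rather than a genuine effect. The following is a minimal sketch for flagging such rows before interpretation. The helper name `flag_unstable` and both thresholds are assumptions chosen for illustration, not part of the original analysis.

```r
# Hypothetical helper (not from the original analysis): flag coefficient rows
# whose estimates look numerically unstable, a common symptom of
# (quasi-)complete separation or empty cells in a multinomial logit.
# The thresholds are illustrative assumptions, not established cut-offs.
flag_unstable <- function(coef, se, max_abs_coef = 10, min_se = 1e-3) {
  is.na(se) | se < min_se | abs(coef) > max_abs_coef
}

# A few rows copied from the table above
rows <- data.frame(
  Outcome   = c("1", "3", "3", "3"),
  Predictor = c("major_trans_2020bicycle", "major_trans_2020bicycle",
                "major_trans_2020informal", "major_trans_2020walking"),
  Coef      = c(16.638, 16.324, -12.633, -1.059),
  SE        = c(0.366, 0.366, 0.000, 0.532)
)
rows$unstable <- flag_unstable(rows$Coef, rows$SE)
print(rows)
```

Flagged rows are better left uninterpreted. A cross-tabulation such as `table(regressor$major_trans_2020, regressor$P92)` would show whether some combinations of category and outcome have few or no observations.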
- motorcycle (P42): People whose main transportation mode before 2020 was a motorcycle have 1.9 times higher odds (OR = 2.993) of choosing “increase” than “not change” compared to people who used other transportation modes, holding other variables constant.
- high income (P50): People with high income have 2.24 times higher odds (OR = 3.248) of choosing “increase” than “not change” compared to people who did not report their income, holding other variables constant.
- upper-mid income (P50): People with upper-mid income have 67.6% higher odds (OR = 1.676) of choosing “increase” than “not change” compared to people who did not report their income, holding other variables constant.
- lower-mid income (P50): People with lower-mid income have 36.6% higher odds (OR = 1.366) of choosing “increase” than “not change” compared to people who did not report their income, holding other variables constant. (Note: p = 0.077, which does not meet the 0.05 threshold but is close.)
- low income (P50): People with low income have 52.1% higher odds (OR = 1.521) of choosing “increase” than “not change” compared to people who did not report their income, holding other variables constant.
- lower secondary education (P12): People with lower secondary education have 59.8% higher odds (OR = 1.598) of choosing “increase” than “not change” compared to people with upper secondary education, holding other variables constant.
- primary education (P12): People with primary education have 56.6% higher odds (OR = 1.566) of choosing “increase” than “not change” compared to people with upper secondary education, holding other variables constant. (Note: p = 0.06, which does not meet the 0.05 threshold but is close.)
- university education (P12): People with university education have 49.9% higher odds (OR = 1.499) of choosing “increase” than “not change” compared to people with upper secondary education, holding other variables constant. (Note: p = 0.050, at the threshold and not starred in the table.)
- public transit (P42): People whose main transportation mode before 2020 was public transit have 52.5% lower odds (OR = 0.475) of choosing “decrease” than “not change” compared to people who used other transportation modes, holding other variables constant.
- walking (P42): People whose main transportation mode before 2020 was walking have 65.3% lower odds (OR = 0.347) of choosing “decrease” than “not change” compared to people who used other transportation modes, holding other variables constant.
- low income (P50): People with low income have 78.2% lower odds (OR = 0.218) of choosing “decrease” than “not change” compared to people who did not report their income, holding other variables constant.

Statement: Satisfaction with public transportation.
trips <- readRDS("data/008-24 BBDD Procesamiento Etapas.rds")
hog <- readRDS("data/008-24 BBDD Procesamiento Hogares.rds")
per <- readRDS("data/008-24 BBDD Procesamiento Personas.rds")
per_complt <- per %>%
left_join(hog,by="ID_Hogar")
dependent_variable<- "P95"
independent_variables <- c("P3", "P42", "P50", "P12", "P14",
"Edad", "P10", "P13", "P15", "P82", "P83")
regressor<- per_complt %>%
select(all_of(dependent_variable), all_of(independent_variables))
regressor <- regressor %>%
mutate(
across(
where(is.labelled), # pick all haven_labelled columns
~ zap_labels(.) # strip off the labels, leaving the underlying numeric
)
)
regressor$P95 <- as.factor(regressor$P95)
regressor$P95 <- relevel(regressor$P95, ref = "2") # Relevel to set the reference category
regressor<-regressor%>%
rename(pop_num=P3,
major_trans_2020=P42,
income= P50,
rent_own= P82,
live_time= P83
)
regressor<-regressor%>%
rename(edu_att= P12,
occupation= P14,
gender= P10,
age= Edad
)
regressor$rent_own<- as.factor(regressor$rent_own) # own =1, rent =2
regressor$rent_own <- relevel(regressor$rent_own, ref = "1") # Relevel to set the reference category
regressor$gender <- as.factor(regressor$gender) #female =1, male=2
regressor$gender <- relevel(regressor$gender, ref = "2") # Relevel to set the reference category
regressor$edu_att <- dplyr::case_when(
regressor$edu_att %in% c(1, 2, 3) ~ "Primary",
regressor$edu_att %in% c(4, 5) ~ "LowerSecondary",
regressor$edu_att %in% c(6, 7) ~ "UpperSecondary",
regressor$edu_att %in% c(8, 9) ~ "Technological",
regressor$edu_att %in% c(10, 11, 12, 13) ~ "University",
regressor$edu_att == 97 ~ "NA",
)
regressor$edu_att <- as.factor(regressor$edu_att)
regressor$edu_att <- relevel(regressor$edu_att, ref = "UpperSecondary") # Relevel to set the reference category
regressor<-regressor %>%
mutate(major_trans_2020= case_when(
major_trans_2020 %in% c(1,2,3,4,5,6,10,16) ~ "public_tansit",
major_trans_2020 %in% c(7,8,9) ~ "informal",
major_trans_2020 %in% c(11,12) ~ "taxi",
major_trans_2020 %in% c(22,23) ~ "personal_veh",
major_trans_2020 %in% c(24,25) ~"motorcyle",
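# NOTE: code 25 is already matched by the motorcycle branch above, so it never reaches this branch; check the codebook for the intended code
major_trans_2020 %in% c(25,27,28,17) ~ "bicycle",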
major_trans_2020==34 ~ "walking",
TRUE ~ "other"
))
regressor$major_trans_2020<-as.factor(regressor$major_trans_2020)
regressor$major_trans_2020<-relevel(regressor$major_trans_2020,ref = "other")
regressor <- regressor %>%
mutate(
# 1) if P13 not NA, take P13, otherwise keep original P14
occupation = if_else(!is.na(P13), as.character(P13), as.character(occupation)),
# 2) if P15 not NA, paste it to the (possibly updated) P14; else leave as is
occupation = if_else(
!is.na(P15),
paste(occupation, P15, sep = " / "), # use whatever separator you like
occupation
)
)
regressor$occupation <- str_remove_all(regressor$occupation, "(^NA\\s*/\\s*)|(\\s*/\\s*NA$)")
regressor<-regressor%>%
mutate(occupation= as.numeric(occupation)) %>%
select(-P13, -P15)
regressor<-regressor%>%
mutate(occupation= case_when(
occupation %in% c(1,2,3,4,5,22) ~ "student",
occupation %in% c(11,12) ~ "employed",
occupation %in% c(13,14,15,16) ~ "self-employed",
occupation %in% c(6,7,8,9,17) ~ "informal",
occupation == 97 ~ "NA",
TRUE ~ "Other-unemployed"
))
regressor$occupation <- as.factor(regressor$occupation)
regressor$occupation <- relevel(regressor$occupation, ref = "Other-unemployed") # Relevel to set the reference category
regressor<-regressor%>%
mutate(income= case_when(
income %in% c(1,2,3) ~ "Low",
income %in% c(4,5,6) ~ "lower-mid",
income %in% c(7,8) ~ "Upper-mid",
income %in% c(9,10,11) ~ "High",
TRUE ~ "Other"
))%>%
mutate(income = as.factor(income))
regressor$income <- relevel(regressor$income, ref = "Other") # Relevel to set the reference category
regressor<-regressor%>%
mutate(live_time= case_when(
live_time %in% c(1,2) ~ "short",
live_time %in% c(3,4) ~ "medium",
live_time %in% c(5,6) ~ "long",
TRUE ~ "NA"
)) %>%
mutate(live_time = as.factor(live_time))
regressor$live_time <- relevel(regressor$live_time, ref = "medium") # Relevel to set the reference category
regressor <- regressor %>%
mutate(
age = if_else(
age %in% c(1,2),
"under_18",
as.character(age) # keeps the original age for everyone else
)
)%>%
mutate(age = as.factor(age))
model_house<-multinom(P95~.,data=regressor)
## # weights: 102 (66 variable)
## initial value 1418.308465
## iter 10 value 1279.093946
## iter 20 value 1239.584340
## iter 30 value 1235.730436
## iter 40 value 1235.456147
## iter 50 value 1235.274705
## iter 60 value 1235.177883
## iter 70 value 1235.150669
## final value 1235.149798
## converged
z<-summary(model_house)$coefficients/summary(model_house)$standard.errors
p_values<- (1 - pnorm(abs(z), 0, 1)) * 2
# 1. grab raw summary
s <- summary(model_house)
coef_mat <- s$coefficients
se_mat <- s$standard.errors
# 2. compute z-scores & p-values
z_mat <- coef_mat / se_mat
p_mat <- 2 * pnorm(-abs(z_mat))
# 3. pivot to long form
df_coef <- as.data.frame(coef_mat) %>%
  rownames_to_column("Outcome") %>%
  pivot_longer(-Outcome, names_to = "Predictor", values_to = "Coef")
df_se <- as.data.frame(se_mat) %>%
  rownames_to_column("Outcome") %>%
  pivot_longer(-Outcome, names_to = "Predictor", values_to = "SE")
df_z <- as.data.frame(z_mat) %>%
  rownames_to_column("Outcome") %>%
  pivot_longer(-Outcome, names_to = "Predictor", values_to = "z")
df_p <- as.data.frame(p_mat) %>%
  rownames_to_column("Outcome") %>%
  pivot_longer(-Outcome, names_to = "Predictor", values_to = "p.value")
# 4. join and format, adding stars
results <- df_coef %>%
  left_join(df_se, by = c("Outcome","Predictor")) %>%
  left_join(df_z, by = c("Outcome","Predictor")) %>%
  left_join(df_p, by = c("Outcome","Predictor")) %>%
  mutate(
    OR = exp(Coef),
    across(c(Coef, SE, z, OR, p.value), ~ round(., 3)),
    # Note: stars are assigned from the rounded p-value, so a p-value just
    # below a cutoff (e.g. 0.0496, displayed as 0.050) receives no star
    stars = case_when(
      p.value < 0.001 ~ "***",
      p.value < 0.01 ~ "**",
      p.value < 0.05 ~ "*",
      TRUE ~ ""
    ),
    OR = paste0(OR, stars)
  ) %>%
  select(Outcome, Predictor, OR, Coef, SE, z, p.value)
# 5. render as styled HTML
kable(
  results,
  format = "html",
  table.attr = 'class="table table-striped"',
  col.names = c("Outcome", "Predictor", "OR", "Coef", "SE", "z-score", "p-value"),
  caption = "Multinomial logit: Odds Ratios (with significance), Coefs, SEs, z-scores & p-values"
) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE
  )
Outcome | Predictor | OR | Coef | SE | z-score | p-value |
---|---|---|---|---|---|---|
1 | (Intercept) | 1.984 | 0.685 | 0.433 | 1.584 | 0.113 |
1 | pop_num | 0.933 | -0.069 | 0.046 | -1.508 | 0.132 |
1 | major_trans_2020bicycle | 1621098.969*** | 14.299 | 0.275 | 52.022 | 0.000 |
1 | major_trans_2020informal | 0.47 | -0.755 | 0.801 | -0.943 | 0.346 |
1 | major_trans_2020motorcyle | 1.084 | 0.081 | 0.324 | 0.249 | 0.803 |
1 | major_trans_2020personal_veh | 1.055 | 0.053 | 0.317 | 0.168 | 0.866 |
1 | major_trans_2020public_tansit | 0.582** | -0.542 | 0.199 | -2.724 | 0.006 |
1 | major_trans_2020taxi | 0.575 | -0.553 | 0.455 | -1.215 | 0.224 |
1 | major_trans_2020walking | 0.399** | -0.919 | 0.286 | -3.210 | 0.001 |
1 | incomeHigh | 3.432*** | 1.233 | 0.342 | 3.605 | 0.000 |
1 | incomeLow | 0.968 | -0.033 | 0.206 | -0.160 | 0.873 |
1 | incomelower-mid | 1.345 | 0.296 | 0.182 | 1.628 | 0.103 |
1 | incomeUpper-mid | 1.481 | 0.393 | 0.205 | 1.917 | 0.055 |
1 | edu_attLowerSecondary | 0.734 | -0.310 | 0.195 | -1.585 | 0.113 |
1 | edu_attNA | 1.28 | 0.247 | 0.485 | 0.510 | 0.610 |
1 | edu_attPrimary | 0.576* | -0.552 | 0.243 | -2.272 | 0.023 |
1 | edu_attTechnological | 1.14 | 0.131 | 0.237 | 0.554 | 0.580 |
1 | edu_attUniversity | 0.793 | -0.232 | 0.214 | -1.084 | 0.278 |
1 | occupationemployed | 0.993 | -0.007 | 0.222 | -0.033 | 0.974 |
1 | occupationinformal | 0.898 | -0.108 | 0.321 | -0.336 | 0.737 |
1 | occupationNA | 0.785 | -0.242 | 0.571 | -0.424 | 0.672 |
1 | occupationself-employed | 0.893 | -0.114 | 0.205 | -0.553 | 0.580 |
1 | occupationstudent | 1.043 | 0.042 | 0.316 | 0.132 | 0.895 |
1 | age4 | 0.596* | -0.517 | 0.259 | -1.998 | 0.046 |
1 | age5 | 0.956 | -0.045 | 0.268 | -0.168 | 0.867 |
1 | age6 | 1.03 | 0.029 | 0.279 | 0.104 | 0.917 |
1 | age7 | 0.79 | -0.236 | 0.291 | -0.811 | 0.417 |
1 | age8 | 0.877 | -0.131 | 0.318 | -0.413 | 0.679 |
1 | ageunder_18 | 1.196 | 0.179 | 0.313 | 0.572 | 0.568 |
1 | gender1 | 1.086 | 0.083 | 0.131 | 0.632 | 0.527 |
1 | rent_own2 | 1.153 | 0.143 | 0.152 | 0.937 | 0.349 |
1 | live_timelong | 1.247 | 0.220 | 0.180 | 1.225 | 0.221 |
1 | live_timeshort | 1.272 | 0.241 | 0.164 | 1.466 | 0.143 |
3 | (Intercept) | 0.833 | -0.183 | 0.561 | -0.326 | 0.744 |
3 | pop_num | 1.061 | 0.059 | 0.061 | 0.972 | 0.331 |
3 | major_trans_2020bicycle | 3158765.297*** | 14.966 | 0.275 | 54.449 | 0.000 |
3 | major_trans_2020informal | 9.335** | 2.234 | 0.687 | 3.253 | 0.001 |
3 | major_trans_2020motorcyle | 0.727 | -0.319 | 0.492 | -0.647 | 0.517 |
3 | major_trans_2020personal_veh | 1.576 | 0.455 | 0.414 | 1.099 | 0.272 |
3 | major_trans_2020public_tansit | 0.786 | -0.241 | 0.276 | -0.875 | 0.382 |
3 | major_trans_2020taxi | 2.501 | 0.917 | 0.485 | 1.890 | 0.059 |
3 | major_trans_2020walking | 0.512 | -0.670 | 0.415 | -1.614 | 0.106 |
3 | incomeHigh | 0.26 | -1.348 | 0.781 | -1.725 | 0.084 |
3 | incomeLow | 0.484* | -0.726 | 0.286 | -2.533 | 0.011 |
3 | incomelower-mid | 1.066 | 0.064 | 0.227 | 0.282 | 0.778 |
3 | incomeUpper-mid | 0.786 | -0.241 | 0.273 | -0.884 | 0.377 |
3 | edu_attLowerSecondary | 1.172 | 0.159 | 0.263 | 0.605 | 0.545 |
3 | edu_attNA | 1.83 | 0.604 | 0.622 | 0.972 | 0.331 |
3 | edu_attPrimary | 0.829 | -0.187 | 0.332 | -0.565 | 0.572 |
3 | edu_attTechnological | 1.359 | 0.307 | 0.319 | 0.963 | 0.335 |
3 | edu_attUniversity | 0.906 | -0.098 | 0.298 | -0.330 | 0.741 |
3 | occupationemployed | 0.568* | -0.565 | 0.288 | -1.965 | 0.049 |
3 | occupationinformal | 0.833 | -0.182 | 0.389 | -0.469 | 0.639 |
3 | occupationNA | 0*** | -14.859 | 0.000 | -32932964.338 | 0.000 |
3 | occupationself-employed | 0.628 | -0.466 | 0.261 | -1.783 | 0.075 |
3 | occupationstudent | 0.852 | -0.160 | 0.397 | -0.403 | 0.687 |
3 | age4 | 0.711 | -0.341 | 0.319 | -1.069 | 0.285 |
3 | age5 | 0.459* | -0.779 | 0.362 | -2.152 | 0.031 |
3 | age6 | 0.607 | -0.499 | 0.363 | -1.376 | 0.169 |
3 | age7 | 0.705 | -0.350 | 0.360 | -0.972 | 0.331 |
3 | age8 | 0.577 | -0.550 | 0.406 | -1.355 | 0.175 |
3 | ageunder_18 | 0.44 | -0.821 | 0.418 | -1.964 | 0.050 |
3 | gender1 | 1.289 | 0.254 | 0.174 | 1.460 | 0.144 |
3 | rent_own2 | 1.21 | 0.191 | 0.205 | 0.927 | 0.354 |
3 | live_timelong | 0.957 | -0.044 | 0.235 | -0.187 | 0.852 |
3 | live_timeshort | 0.888 | -0.118 | 0.220 | -0.539 | 0.590 |
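As a reading aid for the table above: an odds ratio below 1 means lower odds than the reference category, and an odds ratio above 1 means higher odds. The sketch below converts a few odds ratios copied from the Outcome = 1 rows into percent changes in the odds. The helper name `or_to_pct` is hypothetical and introduced only for this illustration; it is not part of the original analysis.

```r
# Hypothetical helper: percent change in the odds implied by an odds ratio.
# Negative values mean lower odds than the reference category.
or_to_pct <- function(or) {
  round((or - 1) * 100, 1)
}

# Odds ratios copied from the Outcome = 1 rows of the table above
or_values <- c(public_tansit = 0.582, walking = 0.399, incomeHigh = 3.432)
pct_change <- or_to_pct(or_values)
print(pct_change)
```

Under this convention, 0.582 corresponds to 41.8% lower odds, while 3.432 corresponds to odds 3.432 times as large, that is, 243.2% higher.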
- major_trans_2020public_tansit (P42): People whose major transportation mode before 2020 was public transit have 41.8% lower odds of choosing “increase” over “no change” than people who used other transportation modes, holding other variables constant.
- major_trans_2020walking (P42): People whose major transportation mode before 2020 was walking have 60.1% lower odds of choosing “increase” over “no change” than people who used other transportation modes, holding other variables constant.
- incomeHigh (P50): People with high income have 3.432 times the odds (243.2% higher odds) of choosing “increase” over “no change” compared to people who did not report their income, holding other variables constant.
- edu_attPrimary (P12): People with primary education have 42.4% lower odds of choosing “increase” over “no change” compared to people with upper secondary education, holding other variables constant.
- age4 (Edad): People in age category 4 have 40.4% lower odds of choosing “increase” over “no change” compared to people aged 18-24, holding other variables constant.
- major_trans_2020informal (P42): People whose major transportation mode before 2020 was informal transportation have 9.335 times the odds of choosing “decrease” over “no change” compared to people who used other transportation modes, holding other variables constant.
- incomeLow (P50): People with low income have 51.6% lower odds of choosing “decrease” over “no change” compared to people who did not report their income, holding other variables constant.
- occupationemployed (P14): People who are employed (formal job) have 43.2% lower odds of choosing “decrease” over “no change” compared to people who are unemployed, holding other variables constant.
- age5 (Edad): People aged 50-64 (age category 5) have 54.1% lower odds of choosing “decrease” over “no change” compared to people aged 18-24, holding other variables constant.

Statement: Travel time/commute to your most frequent travel destination.
trips <- readRDS("data/008-24 BBDD Procesamiento Etapas.rds")
hog <- readRDS("data/008-24 BBDD Procesamiento Hogares.rds")
per <- readRDS("data/008-24 BBDD Procesamiento Personas.rds")
per_complt <- per %>%
  left_join(hog, by = "ID_Hogar")
dependent_variable <- "P96"
# P13 and P15 are only used to fill in occupation (P14) and are dropped later
independent_variables <- c("P3", "P42", "P50", "P12", "P14",
                           "Edad", "P10", "P13", "P15", "P82", "P83")
regressor <- per_complt %>%
  select(all_of(dependent_variable), all_of(independent_variables))
regressor <- regressor %>%
  mutate(
    across(
      where(is.labelled), # pick all haven_labelled columns
      ~ zap_labels(.)     # strip off the labels, leaving the underlying numeric
    )
  )
regressor$P96 <- as.factor(regressor$P96)
regressor$P96 <- relevel(regressor$P96, ref = "2") # Relevel to set the reference category
regressor <- regressor %>%
  rename(pop_num = P3,
         major_trans_2020 = P42,
         income = P50,
         rent_own = P82,
         live_time = P83
  )
regressor <- regressor %>%
  rename(edu_att = P12,
         occupation = P14,
         gender = P10,
         age = Edad
  )
regressor$rent_own<- as.factor(regressor$rent_own) # own =1, rent =2
regressor$rent_own <- relevel(regressor$rent_own, ref = "1") # Relevel to set the reference category
regressor$gender <- as.factor(regressor$gender) #female =1, male=2
regressor$gender <- relevel(regressor$gender, ref = "2") # Relevel to set the reference category
regressor$edu_att <- dplyr::case_when(
  regressor$edu_att %in% c(1, 2, 3) ~ "Primary",
  regressor$edu_att %in% c(4, 5) ~ "LowerSecondary",
  regressor$edu_att %in% c(6, 7) ~ "UpperSecondary",
  regressor$edu_att %in% c(8, 9) ~ "Technological",
  regressor$edu_att %in% c(10, 11, 12, 13) ~ "University",
  regressor$edu_att == 97 ~ "NA"
)
regressor$edu_att <- as.factor(regressor$edu_att)
regressor$edu_att <- relevel(regressor$edu_att, ref = "UpperSecondary") # Relevel to set the reference category
regressor <- regressor %>%
  mutate(major_trans_2020 = case_when(
    major_trans_2020 %in% c(1,2,3,4,5,6,10,16) ~ "public_tansit",
    major_trans_2020 %in% c(7,8,9) ~ "informal",
    major_trans_2020 %in% c(11,12) ~ "taxi",
    major_trans_2020 %in% c(22,23) ~ "personal_veh",
    major_trans_2020 %in% c(24,25) ~ "motorcyle",
    # Note: code 25 is already matched by the motorcycle rule above (case_when
    # uses the first match), so listing it here has no effect; check the codebook
    major_trans_2020 %in% c(25,27,28,17) ~ "bicycle",
    major_trans_2020 == 34 ~ "walking",
    TRUE ~ "other"
  ))
regressor$major_trans_2020 <- as.factor(regressor$major_trans_2020)
regressor$major_trans_2020 <- relevel(regressor$major_trans_2020, ref = "other")
regressor <- regressor %>%
  mutate(
    # 1) if P13 is not NA, take P13; otherwise keep the original P14
    occupation = if_else(!is.na(P13), as.character(P13), as.character(occupation)),
    # 2) if P15 is not NA, paste it onto the (possibly updated) occupation; else leave as is
    occupation = if_else(
      !is.na(P15),
      paste(occupation, P15, sep = " / "),
      occupation
    )
  )
# Drop a leading "NA / " or trailing " / NA" left over from the paste step
regressor$occupation <- str_remove_all(regressor$occupation, "(^NA\\s*/\\s*)|(\\s*/\\s*NA$)")
# Note: a combined value such as "11 / 3" is not a number, so as.numeric() turns it
# into NA (with a warning) and it ends up in "Other-unemployed" below
regressor <- regressor %>%
  mutate(occupation = as.numeric(occupation)) %>%
  select(-P13, -P15)
regressor <- regressor %>%
  mutate(occupation = case_when(
    occupation %in% c(1,2,3,4,5,22) ~ "student",
    occupation %in% c(11,12) ~ "employed",
    occupation %in% c(13,14,15,16) ~ "self-employed",
    occupation %in% c(6,7,8,9,17) ~ "informal",
    occupation == 97 ~ "NA",
    TRUE ~ "Other-unemployed"
  ))
regressor$occupation <- as.factor(regressor$occupation)
regressor$occupation <- relevel(regressor$occupation, ref = "Other-unemployed") # Relevel to set the reference category
regressor <- regressor %>%
  mutate(income = case_when(
    income %in% c(1,2,3) ~ "Low",
    income %in% c(4,5,6) ~ "lower-mid",
    income %in% c(7,8) ~ "Upper-mid",
    income %in% c(9,10,11) ~ "High",
    TRUE ~ "Other"
  )) %>%
  mutate(income = as.factor(income))
regressor$income <- relevel(regressor$income, ref = "Other") # Relevel to set the reference category
regressor <- regressor %>%
  mutate(live_time = case_when(
    live_time %in% c(1,2) ~ "short",
    live_time %in% c(3,4) ~ "medium",
    live_time %in% c(5,6) ~ "long",
    TRUE ~ "NA"
  )) %>%
  mutate(live_time = as.factor(live_time))
regressor$live_time <- relevel(regressor$live_time, ref = "medium") # Relevel to set the reference category
regressor <- regressor %>%
  mutate(
    age = if_else(
      age %in% c(1,2),
      "under_18",
      as.character(age) # keeps the original age code for everyone else
    )
  ) %>%
  mutate(age = as.factor(age))
model_house<-multinom(P96~.,data=regressor)
## # weights: 102 (66 variable)
## initial value 1418.308465
## iter 10 value 1298.076800
## iter 20 value 1269.850620
## iter 30 value 1267.516267
## iter 40 value 1267.127998
## iter 50 value 1266.877008
## iter 60 value 1266.808287
## iter 70 value 1266.780560
## final value 1266.778376
## converged
z<-summary(model_house)$coefficients/summary(model_house)$standard.errors
p_values<- (1 - pnorm(abs(z), 0, 1)) * 2
# 1. grab raw summary
s <- summary(model_house)
coef_mat <- s$coefficients
se_mat <- s$standard.errors
# 2. compute z-scores & p-values
z_mat <- coef_mat / se_mat
p_mat <- 2 * pnorm(-abs(z_mat))
# 3. pivot to long form
df_coef <- as.data.frame(coef_mat) %>%
  rownames_to_column("Outcome") %>%
  pivot_longer(-Outcome, names_to = "Predictor", values_to = "Coef")
df_se <- as.data.frame(se_mat) %>%
  rownames_to_column("Outcome") %>%
  pivot_longer(-Outcome, names_to = "Predictor", values_to = "SE")
df_z <- as.data.frame(z_mat) %>%
  rownames_to_column("Outcome") %>%
  pivot_longer(-Outcome, names_to = "Predictor", values_to = "z")
df_p <- as.data.frame(p_mat) %>%
  rownames_to_column("Outcome") %>%
  pivot_longer(-Outcome, names_to = "Predictor", values_to = "p.value")
# 4. join and format, adding stars
results <- df_coef %>%
  left_join(df_se, by = c("Outcome","Predictor")) %>%
  left_join(df_z, by = c("Outcome","Predictor")) %>%
  left_join(df_p, by = c("Outcome","Predictor")) %>%
  mutate(
    OR = exp(Coef),
    across(c(Coef, SE, z, OR, p.value), ~ round(., 3)),
    # Note: stars are assigned from the rounded p-value, so a p-value just
    # below a cutoff (e.g. 0.0496, displayed as 0.050) receives no star
    stars = case_when(
      p.value < 0.001 ~ "***",
      p.value < 0.01 ~ "**",
      p.value < 0.05 ~ "*",
      TRUE ~ ""
    ),
    OR = paste0(OR, stars)
  ) %>%
  select(Outcome, Predictor, OR, Coef, SE, z, p.value)
# 5. render as styled HTML
kable(
  results,
  format = "html",
  table.attr = 'class="table table-striped"',
  col.names = c("Outcome", "Predictor", "OR", "Coef", "SE", "z-score", "p-value"),
  caption = "Multinomial logit: Odds Ratios (with significance), Coefs, SEs, z-scores & p-values"
) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE
  )
Outcome | Predictor | OR | Coef | SE | z-score | p-value |
---|---|---|---|---|---|---|
1 | (Intercept) | 0.477 | -0.741 | 0.545 | -1.361 | 0.173 |
1 | pop_num | 1.189** | 0.173 | 0.057 | 3.023 | 0.003 |
1 | major_trans_2020bicycle | 0.692 | -0.369 | 0.677 | -0.544 | 0.586 |
1 | major_trans_2020informal | 0*** | -14.158 | 0.000 | -19154853.704 | 0.000 |
1 | major_trans_2020motorcyle | 2.502* | 0.917 | 0.420 | 2.183 | 0.029 |
1 | major_trans_2020personal_veh | 1.515 | 0.415 | 0.390 | 1.065 | 0.287 |
1 | major_trans_2020public_tansit | 1.211 | 0.191 | 0.244 | 0.784 | 0.433 |
1 | major_trans_2020taxi | 0*** | -16.292 | 0.000 | -238764941.558 | 0.000 |
1 | major_trans_2020walking | 0.421* | -0.864 | 0.383 | -2.256 | 0.024 |
1 | incomeHigh | 0.666 | -0.406 | 0.411 | -0.989 | 0.323 |
1 | incomeLow | 1.239 | 0.214 | 0.244 | 0.877 | 0.380 |
1 | incomelower-mid | 1.173 | 0.159 | 0.230 | 0.694 | 0.488 |
1 | incomeUpper-mid | 0.525* | -0.645 | 0.272 | -2.369 | 0.018 |
1 | edu_attLowerSecondary | 0.628 | -0.465 | 0.243 | -1.909 | 0.056 |
1 | edu_attNA | 1.401 | 0.337 | 0.546 | 0.617 | 0.537 |
1 | edu_attPrimary | 0.602 | -0.508 | 0.302 | -1.685 | 0.092 |
1 | edu_attTechnological | 0.595 | -0.518 | 0.309 | -1.679 | 0.093 |
1 | edu_attUniversity | 0.943 | -0.059 | 0.267 | -0.220 | 0.826 |
1 | occupationemployed | 0.709 | -0.345 | 0.276 | -1.249 | 0.212 |
1 | occupationinformal | 0.307** | -1.182 | 0.419 | -2.822 | 0.005 |
1 | occupationNA | 0.929 | -0.074 | 0.683 | -0.108 | 0.914 |
1 | occupationself-employed | 0.469** | -0.758 | 0.261 | -2.903 | 0.004 |
1 | occupationstudent | 0.39* | -0.943 | 0.400 | -2.359 | 0.018 |
1 | age4 | 0.926 | -0.077 | 0.332 | -0.233 | 0.816 |
1 | age5 | 1.224 | 0.202 | 0.350 | 0.578 | 0.563 |
1 | age6 | 1.158 | 0.146 | 0.355 | 0.412 | 0.680 |
1 | age7 | 0.847 | -0.166 | 0.374 | -0.444 | 0.657 |
1 | age8 | 1.061 | 0.059 | 0.410 | 0.145 | 0.885 |
1 | ageunder_18 | 1.607 | 0.474 | 0.399 | 1.187 | 0.235 |
1 | gender1 | 1.168 | 0.155 | 0.166 | 0.934 | 0.350 |
1 | rent_own2 | 1.848** | 0.614 | 0.198 | 3.108 | 0.002 |
1 | live_timelong | 1.034 | 0.033 | 0.227 | 0.145 | 0.884 |
1 | live_timeshort | 0.796 | -0.228 | 0.207 | -1.099 | 0.272 |
3 | (Intercept) | 1.352 | 0.301 | 0.443 | 0.680 | 0.497 |
3 | pop_num | 1.003 | 0.003 | 0.048 | 0.066 | 0.947 |
3 | major_trans_2020bicycle | 0.803 | -0.220 | 0.564 | -0.390 | 0.697 |
3 | major_trans_2020informal | 1.964 | 0.675 | 0.537 | 1.257 | 0.209 |
3 | major_trans_2020motorcyle | 2.333* | 0.847 | 0.357 | 2.373 | 0.018 |
3 | major_trans_2020personal_veh | 1.543 | 0.433 | 0.314 | 1.379 | 0.168 |
3 | major_trans_2020public_tansit | 1.224 | 0.202 | 0.204 | 0.988 | 0.323 |
3 | major_trans_2020taxi | 1.828 | 0.603 | 0.410 | 1.470 | 0.141 |
3 | major_trans_2020walking | 0.557* | -0.585 | 0.293 | -1.999 | 0.046 |
3 | incomeHigh | 0.973 | -0.028 | 0.330 | -0.084 | 0.933 |
3 | incomeLow | 0.814 | -0.205 | 0.218 | -0.943 | 0.346 |
3 | incomelower-mid | 1.995*** | 0.691 | 0.187 | 3.693 | 0.000 |
3 | incomeUpper-mid | 1.153 | 0.143 | 0.205 | 0.695 | 0.487 |
3 | edu_attLowerSecondary | 0.97 | -0.031 | 0.205 | -0.149 | 0.882 |
3 | edu_attNA | 0.851 | -0.162 | 0.530 | -0.306 | 0.760 |
3 | edu_attPrimary | 0.768 | -0.264 | 0.256 | -1.032 | 0.302 |
3 | edu_attTechnological | 1.144 | 0.135 | 0.243 | 0.556 | 0.578 |
3 | edu_attUniversity | 1.179 | 0.165 | 0.226 | 0.729 | 0.466 |
3 | occupationemployed | 0.781 | -0.247 | 0.232 | -1.065 | 0.287 |
3 | occupationinformal | 0.636 | -0.452 | 0.320 | -1.411 | 0.158 |
3 | occupationNA | 0.424 | -0.857 | 0.710 | -1.208 | 0.227 |
3 | occupationself-employed | 0.649* | -0.433 | 0.211 | -2.048 | 0.041 |
3 | occupationstudent | 0.525* | -0.644 | 0.325 | -1.980 | 0.048 |
3 | age4 | 0.707 | -0.346 | 0.262 | -1.324 | 0.185 |
3 | age5 | 0.899 | -0.106 | 0.276 | -0.385 | 0.700 |
3 | age6 | 0.666 | -0.407 | 0.285 | -1.426 | 0.154 |
3 | age7 | 0.59 | -0.528 | 0.292 | -1.809 | 0.071 |
3 | age8 | 0.849 | -0.163 | 0.327 | -0.500 | 0.617 |
3 | ageunder_18 | 1.23 | 0.207 | 0.321 | 0.644 | 0.519 |
3 | gender1 | 0.985 | -0.015 | 0.134 | -0.114 | 0.909 |
3 | rent_own2 | 1.195 | 0.178 | 0.158 | 1.127 | 0.260 |
3 | live_timelong | 1.17 | 0.157 | 0.185 | 0.846 | 0.398 |
3 | live_timeshort | 1.095 | 0.091 | 0.170 | 0.535 | 0.593 |
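In the table above, the Outcome = 1 rows for `major_trans_2020informal` and `major_trans_2020taxi` show coefficients of about -14 and -16, a reported standard error of 0.000, and an odds ratio of 0. Estimates like these often indicate that some predictor level has no (or almost no) observations in one outcome category, so the odds ratio is not reliably estimated and the stars should not be taken at face value. The following is a minimal sketch of one way to check for such empty cells. The `toy` data frame is made up for illustration; on the survey data the analogous call would presumably be `table(regressor$major_trans_2020, regressor$P96)`.

```r
# Made-up toy data standing in for the survey (not real responses)
toy <- data.frame(
  outcome = factor(c("1", "3", "1", "2", "3", "2", "1", "3")),
  mode    = factor(c("bicycle", "bicycle", "taxi", "taxi", "taxi",
                     "other", "other", "other"))
)

# Cross-tabulate predictor level (rows) by outcome category (columns)
tab <- table(toy$mode, toy$outcome)
print(tab)

# Predictor levels that have at least one empty outcome cell
sparse_levels <- rownames(tab)[apply(tab == 0, 1, any)]
print(sparse_levels)
```

Levels flagged this way are candidates for merging with a neighbouring category or for dropping from the interpretation.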
- pop_num (P3): For each additional person in the household, the odds of choosing “increase” over “no change” increase by 18.9%, holding other variables constant.
- major_trans_2020motorcyle (P42): People whose major transportation mode before 2020 was motorcycle have 2.502 times the odds (150.2% higher odds) of choosing “increase” over “no change” compared to people who used other transportation modes, holding other variables constant.
- major_trans_2020walking (P42): People whose major transportation mode before 2020 was walking have 57.9% lower odds of choosing “increase” over “no change” compared to people who used other transportation modes, holding other variables constant.
- incomeUpper-mid (P50): People with upper-mid income have 47.5% lower odds of choosing “increase” over “no change” compared to people who did not report their income, holding other variables constant.
- edu_attLowerSecondary (P12): People with lower secondary education have 37.2% lower odds of choosing “increase” over “no change” compared to people with upper secondary education, holding other variables constant. (Note: p-value = 0.056, above the 0.05 threshold.)
- edu_attPrimary (P12): People with primary education have 39.8% lower odds of choosing “increase” over “no change” compared to people with upper secondary education, holding other variables constant. (Note: p-value = 0.092, above the 0.05 threshold.)
- edu_attTechnological (P12): People with technological education have 40.5% lower odds of choosing “increase” over “no change” compared to people with upper secondary education, holding other variables constant. (Note: p-value = 0.093, above the 0.05 threshold.)
- major_trans_2020motorcyle (P42): People whose major transportation mode before 2020 was motorcycle have 2.333 times the odds (133.3% higher odds) of choosing “decrease” over “no change” compared to people who used other transportation modes, holding other variables constant.
- major_trans_2020walking (P42): People whose major transportation mode before 2020 was walking have 44.3% lower odds of choosing “decrease” over “no change” compared to people who used other transportation modes, holding other variables constant.
- incomelower-mid (P50): People with lower-mid income have 99.5% higher odds of choosing “decrease” over “no change” compared to people who did not report their income, holding other variables constant.
- occupationself-employed (P14): People who are self-employed have 35.1% lower odds of choosing “decrease” over “no change” compared to people who are unemployed, holding other variables constant.
- occupationstudent (P14): People who are students have 47.5% lower odds of choosing “decrease” over “no change” compared to people who are unemployed, holding other variables constant. (Note: this category combines primary, high school, and university students, which may make the estimate unreliable.)

Statement: Noise pollution
trips <- readRDS("data/008-24 BBDD Procesamiento Etapas.rds")
hog <- readRDS("data/008-24 BBDD Procesamiento Hogares.rds")
per <- readRDS("data/008-24 BBDD Procesamiento Personas.rds")
per_complt <- per %>%
  left_join(hog, by = "ID_Hogar")
dependent_variable <- "P98"
# P13 and P15 are only used to fill in occupation (P14) and are dropped later
independent_variables <- c("P3", "P42", "P50", "P12", "P14",
                           "Edad", "P10", "P13", "P15", "P82", "P83")
regressor <- per_complt %>%
  select(all_of(dependent_variable), all_of(independent_variables))
regressor <- regressor %>%
  mutate(
    across(
      where(is.labelled), # pick all haven_labelled columns
      ~ zap_labels(.)     # strip off the labels, leaving the underlying numeric
    )
  )
regressor$P98 <- as.factor(regressor$P98)
regressor$P98 <- relevel(regressor$P98, ref = "2") # Relevel to set the reference category
regressor <- regressor %>%
  rename(pop_num = P3,
         major_trans_2020 = P42,
         income = P50,
         rent_own = P82,
         live_time = P83
  )
regressor <- regressor %>%
  rename(edu_att = P12,
         occupation = P14,
         gender = P10,
         age = Edad
  )
regressor$rent_own<- as.factor(regressor$rent_own) # own =1, rent =2
regressor$rent_own <- relevel(regressor$rent_own, ref = "1") # Relevel to set the reference category
regressor$gender <- as.factor(regressor$gender) #female =1, male=2
regressor$gender <- relevel(regressor$gender, ref = "2") # Relevel to set the reference category
regressor$edu_att <- dplyr::case_when(
  regressor$edu_att %in% c(1, 2, 3) ~ "Primary",
  regressor$edu_att %in% c(4, 5) ~ "LowerSecondary",
  regressor$edu_att %in% c(6, 7) ~ "UpperSecondary",
  regressor$edu_att %in% c(8, 9) ~ "Technological",
  regressor$edu_att %in% c(10, 11, 12, 13) ~ "University",
  regressor$edu_att == 97 ~ "NA"
)
regressor$edu_att <- as.factor(regressor$edu_att)
regressor$edu_att <- relevel(regressor$edu_att, ref = "UpperSecondary") # Relevel to set the reference category
regressor <- regressor %>%
  mutate(major_trans_2020 = case_when(
    major_trans_2020 %in% c(1,2,3,4,5,6,10,16) ~ "public_tansit",
    major_trans_2020 %in% c(7,8,9) ~ "informal",
    major_trans_2020 %in% c(11,12) ~ "taxi",
    major_trans_2020 %in% c(22,23) ~ "personal_veh",
    major_trans_2020 %in% c(24,25) ~ "motorcyle",
    # Note: code 25 is already matched by the motorcycle rule above (case_when
    # uses the first match), so listing it here has no effect; check the codebook
    major_trans_2020 %in% c(25,27,28,17) ~ "bicycle",
    major_trans_2020 == 34 ~ "walking",
    TRUE ~ "other"
  ))
regressor$major_trans_2020 <- as.factor(regressor$major_trans_2020)
regressor$major_trans_2020 <- relevel(regressor$major_trans_2020, ref = "other")
regressor <- regressor %>%
  mutate(
    # 1) if P13 is not NA, take P13; otherwise keep the original P14
    occupation = if_else(!is.na(P13), as.character(P13), as.character(occupation)),
    # 2) if P15 is not NA, paste it onto the (possibly updated) occupation; else leave as is
    occupation = if_else(
      !is.na(P15),
      paste(occupation, P15, sep = " / "),
      occupation
    )
  )
# Drop a leading "NA / " or trailing " / NA" left over from the paste step
regressor$occupation <- str_remove_all(regressor$occupation, "(^NA\\s*/\\s*)|(\\s*/\\s*NA$)")
# Note: a combined value such as "11 / 3" is not a number, so as.numeric() turns it
# into NA (with a warning) and it ends up in "Other-unemployed" below
regressor <- regressor %>%
  mutate(occupation = as.numeric(occupation)) %>%
  select(-P13, -P15)
regressor <- regressor %>%
  mutate(occupation = case_when(
    occupation %in% c(1,2,3,4,5,22) ~ "student",
    occupation %in% c(11,12) ~ "employed",
    occupation %in% c(13,14,15,16) ~ "self-employed",
    occupation %in% c(6,7,8,9,17) ~ "informal",
    occupation == 97 ~ "NA",
    TRUE ~ "Other-unemployed"
  ))
regressor$occupation <- as.factor(regressor$occupation)
regressor$occupation <- relevel(regressor$occupation, ref = "Other-unemployed") # Relevel to set the reference category
regressor <- regressor %>%
  mutate(income = case_when(
    income %in% c(1,2,3) ~ "Low",
    income %in% c(4,5,6) ~ "lower-mid",
    income %in% c(7,8) ~ "Upper-mid",
    income %in% c(9,10,11) ~ "High",
    TRUE ~ "Other"
  )) %>%
  mutate(income = as.factor(income))
regressor$income <- relevel(regressor$income, ref = "Other") # Relevel to set the reference category
regressor <- regressor %>%
  mutate(live_time = case_when(
    live_time %in% c(1,2) ~ "short",
    live_time %in% c(3,4) ~ "medium",
    live_time %in% c(5,6) ~ "long",
    TRUE ~ "NA"
  )) %>%
  mutate(live_time = as.factor(live_time))
regressor$live_time <- relevel(regressor$live_time, ref = "medium") # Relevel to set the reference category
regressor <- regressor %>%
  mutate(
    age = if_else(
      age %in% c(1,2),
      "under_18",
      as.character(age) # keeps the original age code for everyone else
    )
  ) %>%
  mutate(age = as.factor(age))
model_house<-multinom(P98~.,data=regressor)
## # weights: 102 (66 variable)
## initial value 1418.308465
## iter 10 value 1306.277112
## iter 20 value 1257.754774
## iter 30 value 1252.786776
## iter 40 value 1252.093043
## iter 50 value 1251.925320
## iter 60 value 1251.834443
## iter 70 value 1251.784621
## final value 1251.782907
## converged
z<-summary(model_house)$coefficients/summary(model_house)$standard.errors
p_values<- (1 - pnorm(abs(z), 0, 1)) * 2
# 1. grab raw summary
s <- summary(model_house)
coef_mat <- s$coefficients
se_mat <- s$standard.errors
# 2. compute z-scores & p-values
z_mat <- coef_mat / se_mat
p_mat <- 2 * pnorm(-abs(z_mat))
# 3. pivot to long form
df_coef <- as.data.frame(coef_mat) %>%
  rownames_to_column("Outcome") %>%
  pivot_longer(-Outcome, names_to = "Predictor", values_to = "Coef")
df_se <- as.data.frame(se_mat) %>%
  rownames_to_column("Outcome") %>%
  pivot_longer(-Outcome, names_to = "Predictor", values_to = "SE")
df_z <- as.data.frame(z_mat) %>%
  rownames_to_column("Outcome") %>%
  pivot_longer(-Outcome, names_to = "Predictor", values_to = "z")
df_p <- as.data.frame(p_mat) %>%
  rownames_to_column("Outcome") %>%
  pivot_longer(-Outcome, names_to = "Predictor", values_to = "p.value")
# 4. join and format, adding stars
results <- df_coef %>%
  left_join(df_se, by = c("Outcome","Predictor")) %>%
  left_join(df_z, by = c("Outcome","Predictor")) %>%
  left_join(df_p, by = c("Outcome","Predictor")) %>%
  mutate(
    OR = exp(Coef),
    across(c(Coef, SE, z, OR, p.value), ~ round(., 3)),
    # Note: stars are assigned from the rounded p-value, so a p-value just
    # below a cutoff (e.g. 0.0496, displayed as 0.050) receives no star
    stars = case_when(
      p.value < 0.001 ~ "***",
      p.value < 0.01 ~ "**",
      p.value < 0.05 ~ "*",
      TRUE ~ ""
    ),
    OR = paste0(OR, stars)
  ) %>%
  select(Outcome, Predictor, OR, Coef, SE, z, p.value)
# 5. render as styled HTML
kable(
  results,
  format = "html",
  table.attr = 'class="table table-striped"',
  col.names = c("Outcome", "Predictor", "OR", "Coef", "SE", "z-score", "p-value"),
  caption = "Multinomial logit: Odds Ratios (with significance), Coefs, SEs, z-scores & p-values"
) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE
  )
Outcome | Predictor | OR | Coef | SE | z-score | p-value |
---|---|---|---|---|---|---|
1 | (Intercept) | 0.449 | -0.802 | 0.446 | -1.798 | 0.072 |
1 | pop_num | 0.922 | -0.081 | 0.048 | -1.669 | 0.095 |
1 | major_trans_2020bicycle | 1190275.539*** | 13.990 | 0.260 | 53.713 | 0.000 |
1 | major_trans_2020informal | 0.911 | -0.093 | 0.559 | -0.166 | 0.868 |
1 | major_trans_2020motorcyle | 2.035* | 0.711 | 0.358 | 1.985 | 0.047 |
1 | major_trans_2020personal_veh | 1.788 | 0.581 | 0.340 | 1.708 | 0.088 |
1 | major_trans_2020public_tansit | 0.773 | -0.258 | 0.206 | -1.254 | 0.210 |
1 | major_trans_2020taxi | 0.435* | -0.833 | 0.424 | -1.967 | 0.049 |
1 | major_trans_2020walking | 1.284 | 0.250 | 0.302 | 0.827 | 0.408 |
1 | incomeHigh | 18.64*** | 2.925 | 0.751 | 3.896 | 0.000 |
1 | incomeLow | 1.758** | 0.564 | 0.216 | 2.616 | 0.009 |
1 | incomelower-mid | 2.187*** | 0.782 | 0.191 | 4.106 | 0.000 |
1 | incomeUpper-mid | 2.086*** | 0.735 | 0.210 | 3.500 | 0.000 |
1 | edu_attLowerSecondary | 2.039*** | 0.712 | 0.204 | 3.488 | 0.000 |
1 | edu_attNA | 0.788 | -0.239 | 0.456 | -0.524 | 0.601 |
1 | edu_attPrimary | 1.8* | 0.588 | 0.252 | 2.332 | 0.020 |
1 | edu_attTechnological | 2.533*** | 0.929 | 0.246 | 3.781 | 0.000 |
1 | edu_attUniversity | 2.837*** | 1.043 | 0.229 | 4.555 | 0.000 |
1 | occupationemployed | 0.862 | -0.149 | 0.233 | -0.638 | 0.523 |
1 | occupationinformal | 0.646 | -0.437 | 0.339 | -1.289 | 0.197 |
1 | occupationNA | 0.492 | -0.708 | 0.675 | -1.050 | 0.294 |
1 | occupationself-employed | 0.808 | -0.213 | 0.214 | -0.996 | 0.319 |
1 | occupationstudent | 1.168 | 0.156 | 0.323 | 0.482 | 0.629 |
1 | age4 | 0.987 | -0.013 | 0.265 | -0.050 | 0.960 |
1 | age5 | 1.259 | 0.230 | 0.277 | 0.832 | 0.405 |
1 | age6 | 1.144 | 0.134 | 0.291 | 0.462 | 0.644 |
1 | age7 | 1.218 | 0.197 | 0.299 | 0.658 | 0.510 |
1 | age8 | 0.906 | -0.099 | 0.331 | -0.299 | 0.765 |
1 | ageunder_18 | 1.022 | 0.021 | 0.320 | 0.067 | 0.947 |
1 | gender1 | 0.959 | -0.042 | 0.138 | -0.306 | 0.760 |
1 | rent_own2 | 1.2 | 0.182 | 0.161 | 1.131 | 0.258 |
1 | live_timelong | 1.532* | 0.427 | 0.189 | 2.263 | 0.024 |
1 | live_timeshort | 1.169 | 0.156 | 0.174 | 0.899 | 0.369 |
3 | (Intercept) | 0.326* | -1.121 | 0.532 | -2.106 | 0.035 |
3 | pop_num | 0.965 | -0.035 | 0.058 | -0.613 | 0.540 |
3 | major_trans_2020bicycle | 1771347.35*** | 14.387 | 0.260 | 55.239 | 0.000 |
3 | major_trans_2020informal | 0.857 | -0.154 | 0.750 | -0.205 | 0.837 |
3 | major_trans_2020motorcyle | 0.961 | -0.040 | 0.466 | -0.085 | 0.932 |
3 | major_trans_2020personal_veh | 1.012 | 0.012 | 0.424 | 0.027 | 0.978 |
3 | major_trans_2020public_tansit | 0.937 | -0.065 | 0.250 | -0.260 | 0.795 |
3 | major_trans_2020taxi | 0.431 | -0.842 | 0.525 | -1.603 | 0.109 |
3 | major_trans_2020walking | 0.773 | -0.258 | 0.397 | -0.650 | 0.516 |
3 | incomeHigh | 26.625*** | 3.282 | 0.763 | 4.301 | 0.000 |
3 | incomeLow | 1.091 | 0.087 | 0.255 | 0.343 | 0.732 |
3 | incomelower-mid | 1.637* | 0.493 | 0.217 | 2.268 | 0.023 |
3 | incomeUpper-mid | 0.75 | -0.287 | 0.277 | -1.038 | 0.299 |
3 | edu_attLowerSecondary | 1.694* | 0.527 | 0.236 | 2.235 | 0.025 |
3 | edu_attNA | 0.194* | -1.639 | 0.795 | -2.060 | 0.039 |
3 | edu_attPrimary | 1.175 | 0.162 | 0.302 | 0.536 | 0.592 |
3 | edu_attTechnological | 1.609 | 0.476 | 0.301 | 1.580 | 0.114 |
3 | edu_attUniversity | 1.921* | 0.653 | 0.273 | 2.394 | 0.017 |
3 | occupationemployed | 1 | 0.000 | 0.275 | 0.001 | 0.999 |
3 | occupationinformal | 1.336 | 0.290 | 0.368 | 0.788 | 0.431 |
3 | occupationNA | 0.947 | -0.054 | 0.696 | -0.078 | 0.938 |
3 | occupationself-employed | 0.773 | -0.258 | 0.255 | -1.010 | 0.312 |
3 | occupationstudent | 0.961 | -0.040 | 0.407 | -0.098 | 0.922 |
3 | age4 | 0.879 | -0.129 | 0.323 | -0.398 | 0.691 |
3 | age5 | 0.976 | -0.024 | 0.339 | -0.070 | 0.944 |
3 | age6 | 1.527 | 0.423 | 0.339 | 1.249 | 0.212 |
3 | age7 | 1.446 | 0.369 | 0.356 | 1.037 | 0.300 |
3 | age8 | 1.714 | 0.539 | 0.387 | 1.395 | 0.163 |
3 | ageunder_18 | 0.939 | -0.063 | 0.410 | -0.153 | 0.879 |
3 | gender1 | 1.088 | 0.085 | 0.166 | 0.509 | 0.610 |
3 | rent_own2 | 0.99 | -0.010 | 0.194 | -0.049 | 0.961 |
3 | live_timelong | 1.141 | 0.132 | 0.229 | 0.575 | 0.565 |
3 | live_timeshort | 1.256 | 0.228 | 0.207 | 1.098 | 0.272 |
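The `incomeHigh` odds ratios in the table above are very large (18.64 and 26.625). One way to gauge how precisely such a ratio is estimated is an approximate 95% Wald confidence interval, exp(Coef ± 1.96 × SE). The sketch below applies this to the Outcome = 1 row, using the coefficient and standard error copied from the table. The variable names are illustrative and not part of the original analysis.

```r
# Coefficient and standard error for incomeHigh, Outcome = 1 (copied from the table)
coef_hat <- 2.925
se_hat   <- 0.751

# Approximate 95% Wald interval on the log-odds scale, then exponentiated
z_crit <- qnorm(0.975)
ci_or <- exp(coef_hat + c(-1, 1) * z_crit * se_hat)
names(ci_or) <- c("lower", "upper")
print(round(ci_or, 2))
```

The interval runs from roughly 4 to roughly 81, so the direction of the association is clear but its size is very uncertain, which is consistent with a small number of high-income respondents.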
motorcycle (P42): People who used a motorcycle as their main mode of transport before 2020 have 1.035 times higher odds of choosing “increase” than “not change” compared to people who used other transport modes, holding other variables constant.
taxi (P42): People who used a taxi as their main mode of transport before 2020 have 56.5% higher odds of choosing “increase” than “not change” compared to people who used other transport modes, holding other variables constant.
high income (P50): People with high income have 17.64 times higher odds of choosing “increase” than “not change” compared to people who did not report their income, holding other variables constant. (Highly significant; possibly caused by perfect separation in the sample.)
low income (P50): People with low income have 75.8% higher odds of choosing “increase” than “not change” compared to people who did not report their income, holding other variables constant.
lower-mid income (P50): People with lower-mid income have 1.187 times higher odds of choosing “increase” than “not change” compared to people who did not report their income, holding other variables constant.
upper-mid income (P50): People with upper-mid income have 1.086 times higher odds of choosing “increase” than “not change” compared to people who did not report their income, holding other variables constant.
primary education (P12): People with primary education have 80% higher odds of choosing “increase” than “not change” compared to people with upper secondary education, holding other variables constant.
lower secondary education (P12): People with lower secondary education have 1.039 times higher odds of choosing “increase” than “not change” compared to people with upper secondary education, holding other variables constant.
technological education (P12): People with technological education have 1.533 times higher odds of choosing “increase” than “not change” compared to people with upper secondary education, holding other variables constant.
university education (P12): People with university education have 1.837 times higher odds of choosing “increase” than “not change” compared to people with upper secondary education, holding other variables constant.
long residence (P83): People who have lived in the same place for a long time have 53.2% higher odds of choosing “increase” than “not change” compared to people who have lived in the same place for a medium time, holding other variables constant.
high income (P50): People with high income have 25.625 times higher odds of choosing “decrease” than “not change” compared to people who did not report their income, holding other variables constant. (Highly significant; possibly caused by perfect separation in the sample.)
lower-mid income (P50): People with lower-mid income have 63.7% higher odds of choosing “decrease” than “not change” compared to people who did not report their income, holding other variables constant.
lower secondary education (P12): People with lower secondary education have 69.4% higher odds of choosing “decrease” than “not change” compared to people with upper secondary education, holding other variables constant.
university education (P12): People with university education have 92.1% higher odds of choosing “decrease” than “not change” compared to people with upper secondary education, holding other variables constant.
Statement: Public spaces (sidewalks, green areas, parks)
trips <- readRDS("data/008-24 BBDD Procesamiento Etapas.rds")
hog <- readRDS("data/008-24 BBDD Procesamiento Hogares.rds")
per <- readRDS("data/008-24 BBDD Procesamiento Personas.rds")
per_complt <- per %>%
left_join(hog,by="ID_Hogar")
dependent_variable<- "P100"
# Each survey item is listed once; the order is kept because it fixes the coefficient order in the model.
independent_variables <- c("P3", "P42",
"P50", "P12", "P14",
"Edad", "P10", "P13", "P15", "P82", "P83")
regressor<- per_complt %>%
select(all_of(dependent_variable), all_of(independent_variables))
regressor <- regressor %>%
mutate(
across(
where(is.labelled), # pick all haven_labelled columns
~ zap_labels(.) # strip off the labels, leaving the underlying numeric
)
)
regressor$P100 <- as.factor(regressor$P100)
regressor$P100 <- relevel(regressor$P100, ref = "2") # Relevel to set the reference category
regressor<-regressor%>%
rename(
pop_num=P3,
major_trans_2020=P42,
income= P50,
rent_own= P82,
live_time= P83
)
regressor<-regressor%>%
rename(edu_att= P12,
occupation= P14,
gender= P10,
age= Edad
)
regressor$rent_own<- as.factor(regressor$rent_own) # own =1, rent =2
regressor$rent_own <- relevel(regressor$rent_own, ref = "1") # Relevel to set the reference category
regressor$gender <- as.factor(regressor$gender) #female =1, male=2
regressor$gender <- relevel(regressor$gender, ref = "2") # Relevel to set the reference category
regressor$edu_att <- dplyr::case_when(
regressor$edu_att %in% c(1, 2, 3) ~ "Primary",
regressor$edu_att %in% c(4, 5) ~ "LowerSecondary",
regressor$edu_att %in% c(6, 7) ~ "UpperSecondary",
regressor$edu_att %in% c(8, 9) ~ "Technological",
regressor$edu_att %in% c(10, 11, 12, 13) ~ "University",
regressor$edu_att == 97 ~ "NA",
)
regressor$edu_att <- as.factor(regressor$edu_att)
regressor$edu_att <- relevel(regressor$edu_att, ref = "UpperSecondary") # Relevel to set the reference category
regressor<-regressor %>%
mutate(major_trans_2020= case_when(
major_trans_2020 %in% c(1,2,3,4,5,6,10,16) ~ "public_tansit",
major_trans_2020 %in% c(7,8,9) ~ "informal",
major_trans_2020 %in% c(11,12) ~ "taxi",
major_trans_2020 %in% c(22,23) ~ "personal_veh",
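# NOTE: code 25 is listed for both motorcycle and bicycle; case_when() uses the
# first match, so 25 is always coded as "motorcyle" and never as "bicycle".
major_trans_2020 %in% c(24,25) ~"motorcyle",
major_trans_2020 %in% c(25,27,28,17) ~ "bicycle",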
major_trans_2020==34 ~ "walking",
TRUE ~ "other"
))
regressor$major_trans_2020<-as.factor(regressor$major_trans_2020)
regressor$major_trans_2020<-relevel(regressor$major_trans_2020,ref = "other")
regressor <- regressor %>%
mutate(
# 1) if P13 not NA, take P13, otherwise keep original P14
occupation = if_else(!is.na(P13), as.character(P13), as.character(occupation)),
# 2) if P15 not NA, paste it to the (possibly updated) P14; else leave as is
occupation = if_else(
!is.na(P15),
paste(occupation, P15, sep = " / "), # use whatever separator you like
occupation
)
)
regressor$occupation <- str_remove_all(regressor$occupation, "(^NA\\s*/\\s*)|(\\s*/\\s*NA$)")
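# NOTE: combined strings such as "11 / 3" are not numeric, so as.numeric() returns
# NA (with a warning) and those rows fall into "Other-unemployed" in the recode below.
regressor<-regressor%>%
mutate(occupation= as.numeric(occupation)) %>%
select(-P13, -P15)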
regressor<-regressor%>%
mutate(occupation= case_when(
occupation %in% c(1,2,3,4,5,22) ~ "student",
occupation %in% c(11,12) ~ "employed",
occupation %in% c(13,14,15,16) ~ "self-employed",
occupation %in% c(6,7,8,9,17) ~ "informal",
occupation == 97 ~ "NA",
TRUE ~ "Other-unemployed"
))
regressor$occupation <- as.factor(regressor$occupation)
regressor$occupation <- relevel(regressor$occupation, ref = "Other-unemployed") # Relevel to set the reference category
regressor<-regressor%>%
mutate(income= case_when(
income %in% c(1,2,3) ~ "Low",
income %in% c(4,5,6) ~ "lower-mid",
income %in% c(7,8) ~ "Upper-mid",
income %in% c(9,10,11) ~ "High",
TRUE ~ "Other"
))%>%
mutate(income = as.factor(income))
regressor$income <- relevel(regressor$income, ref = "Other") # Relevel to set the reference category
regressor<-regressor%>%
mutate(live_time= case_when(
live_time %in% c(1,2) ~ "short",
live_time %in% c(3,4) ~ "medium",
live_time %in% c(5,6) ~ "long",
TRUE ~ "NA"
)) %>%
mutate(live_time = as.factor(live_time))
regressor$live_time <- relevel(regressor$live_time, ref = "medium") # Relevel to set the reference category
regressor <- regressor %>%
mutate(
age = if_else(
age %in% c(1,2),
"under_18",
as.character(age) # keeps the original age for everyone else
)
)%>%
mutate(age = as.factor(age))
model_house<-multinom(P100~.,data=regressor)
## # weights: 102 (66 variable)
## initial value 1418.308465
## iter 10 value 1320.048476
## iter 20 value 1302.323119
## iter 30 value 1300.429399
## iter 40 value 1300.290302
## iter 50 value 1300.193184
## iter 60 value 1300.155461
## iter 70 value 1300.143468
## final value 1300.142476
## converged
z<-summary(model_house)$coefficients/summary(model_house)$standard.errors
p_values<- (1 - pnorm(abs(z), 0, 1)) * 2
# 1. grab raw summary
s <- summary(model_house)
coef_mat<- s$coefficients
se_mat <- s$standard.errors
# 2. compute z-scores & p-values
z_mat <- coef_mat / se_mat
p_mat <- 2 * pnorm(-abs(z_mat))
# 3. pivot to long form
df_coef <- as.data.frame(coef_mat) %>%
rownames_to_column("Outcome") %>%
pivot_longer(-Outcome, names_to="Predictor", values_to="Coef")
df_se <- as.data.frame(se_mat) %>%
rownames_to_column("Outcome") %>%
pivot_longer(-Outcome, names_to="Predictor", values_to="SE")
df_z <- as.data.frame(z_mat) %>%
rownames_to_column("Outcome") %>%
pivot_longer(-Outcome, names_to="Predictor", values_to="z")
df_p <- as.data.frame(p_mat) %>%
rownames_to_column("Outcome") %>%
pivot_longer(-Outcome, names_to="Predictor", values_to="p.value")
# 4. join and format, adding stars
results <- df_coef %>%
left_join(df_se, by=c("Outcome","Predictor")) %>%
left_join(df_z, by=c("Outcome","Predictor")) %>%
left_join(df_p, by=c("Outcome","Predictor")) %>%
mutate(
OR = exp(Coef),
across(c(Coef, SE, z, OR, p.value), ~ round(., 3)),
stars = case_when(
p.value < 0.001 ~ "***",
p.value < 0.01 ~ "**",
p.value < 0.05 ~ "*",
TRUE ~ ""
),
OR = paste0(OR, stars)
) %>%
select(Outcome, Predictor, OR, Coef, SE, z, p.value)
# 5. render as styled HTML
kable(
results,
format = "html",
table.attr = 'class="table table-striped"',
col.names = c("Outcome", "Predictor", "OR", "Coef", "SE", "z-score", "p-value"),
caption = "Multinomial logit: Odds Ratios (with significance), Coefs, SEs, z-scores & p-values"
) %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE
)
Outcome | Predictor | OR | Coef | SE | z-score | p-value |
---|---|---|---|---|---|---|
1 | (Intercept) | 0.747 | -0.292 | 0.472 | -0.619 | 0.536 |
1 | pop_num | 0.969 | -0.032 | 0.050 | -0.634 | 0.526 |
1 | major_trans_2020bicycle | 7.65* | 2.035 | 0.809 | 2.514 | 0.012 |
1 | major_trans_2020informal | 0.88 | -0.128 | 0.562 | -0.228 | 0.820 |
1 | major_trans_2020motorcyle | 0.726 | -0.320 | 0.340 | -0.942 | 0.346 |
1 | major_trans_2020personal_veh | 0.869 | -0.141 | 0.333 | -0.423 | 0.672 |
1 | major_trans_2020public_tansit | 0.75 | -0.287 | 0.221 | -1.300 | 0.193 |
1 | major_trans_2020taxi | 0.077*** | -2.569 | 0.651 | -3.944 | 0.000 |
1 | major_trans_2020walking | 0.682 | -0.383 | 0.315 | -1.215 | 0.225 |
1 | incomeHigh | 1.754 | 0.562 | 0.364 | 1.541 | 0.123 |
1 | incomeLow | 0.771 | -0.260 | 0.239 | -1.087 | 0.277 |
1 | incomelower-mid | 1.676* | 0.516 | 0.200 | 2.575 | 0.010 |
1 | incomeUpper-mid | 0.978 | -0.022 | 0.232 | -0.096 | 0.924 |
1 | edu_attLowerSecondary | 1.257 | 0.229 | 0.217 | 1.056 | 0.291 |
1 | edu_attNA | 1.97 | 0.678 | 0.453 | 1.498 | 0.134 |
1 | edu_attPrimary | 1.108 | 0.103 | 0.268 | 0.383 | 0.701 |
1 | edu_attTechnological | 1.478 | 0.391 | 0.259 | 1.508 | 0.132 |
1 | edu_attUniversity | 1.809* | 0.593 | 0.238 | 2.488 | 0.013 |
1 | occupationemployed | 0.896 | -0.110 | 0.234 | -0.470 | 0.638 |
1 | occupationinformal | 0.861 | -0.150 | 0.354 | -0.424 | 0.671 |
1 | occupationNA | 0.759 | -0.276 | 0.652 | -0.423 | 0.673 |
1 | occupationself-employed | 0.874 | -0.135 | 0.218 | -0.619 | 0.536 |
1 | occupationstudent | 0.741 | -0.299 | 0.344 | -0.869 | 0.385 |
1 | age4 | 0.948 | -0.054 | 0.288 | -0.187 | 0.852 |
1 | age5 | 0.957 | -0.044 | 0.296 | -0.149 | 0.882 |
1 | age6 | 0.765 | -0.268 | 0.308 | -0.868 | 0.386 |
1 | age7 | 1.168 | 0.156 | 0.313 | 0.497 | 0.619 |
1 | age8 | 0.913 | -0.091 | 0.341 | -0.268 | 0.789 |
1 | ageunder_18 | 0.854 | -0.158 | 0.345 | -0.459 | 0.646 |
1 | gender1 | 1.057 | 0.056 | 0.144 | 0.388 | 0.698 |
1 | rent_own2 | 0.766 | -0.266 | 0.167 | -1.596 | 0.111 |
1 | live_timelong | 1.402 | 0.338 | 0.193 | 1.750 | 0.080 |
1 | live_timeshort | 0.866 | -0.144 | 0.184 | -0.779 | 0.436 |
3 | (Intercept) | 0.872 | -0.137 | 0.470 | -0.291 | 0.771 |
3 | pop_num | 0.941 | -0.060 | 0.051 | -1.186 | 0.236 |
3 | major_trans_2020bicycle | 3.371 | 1.215 | 0.833 | 1.458 | 0.145 |
3 | major_trans_2020informal | 0*** | -13.733 | 0.000 | -16068365.145 | 0.000 |
3 | major_trans_2020motorcyle | 0.473* | -0.748 | 0.354 | -2.112 | 0.035 |
3 | major_trans_2020personal_veh | 0.852 | -0.160 | 0.331 | -0.484 | 0.628 |
3 | major_trans_2020public_tansit | 0.733 | -0.311 | 0.212 | -1.464 | 0.143 |
3 | major_trans_2020taxi | 0.162** | -1.820 | 0.534 | -3.406 | 0.001 |
3 | major_trans_2020walking | 0.473* | -0.748 | 0.328 | -2.278 | 0.023 |
3 | incomeHigh | 1.479 | 0.391 | 0.360 | 1.087 | 0.277 |
3 | incomeLow | 0.632* | -0.459 | 0.226 | -2.033 | 0.042 |
3 | incomelower-mid | 0.822 | -0.196 | 0.198 | -0.994 | 0.320 |
3 | incomeUpper-mid | 0.806 | -0.216 | 0.220 | -0.983 | 0.326 |
3 | edu_attLowerSecondary | 1.715* | 0.539 | 0.220 | 2.451 | 0.014 |
3 | edu_attNA | 0.684 | -0.380 | 0.679 | -0.559 | 0.576 |
3 | edu_attPrimary | 1.47 | 0.385 | 0.279 | 1.382 | 0.167 |
3 | edu_attTechnological | 1.99** | 0.688 | 0.258 | 2.667 | 0.008 |
3 | edu_attUniversity | 2.246** | 0.809 | 0.240 | 3.375 | 0.001 |
3 | occupationemployed | 0.823 | -0.195 | 0.246 | -0.793 | 0.428 |
3 | occupationinformal | 1.196 | 0.179 | 0.343 | 0.523 | 0.601 |
3 | occupationNA | 1.017 | 0.017 | 0.735 | 0.023 | 0.981 |
3 | occupationself-employed | 1.198 | 0.180 | 0.227 | 0.793 | 0.427 |
3 | occupationstudent | 1.451 | 0.372 | 0.342 | 1.088 | 0.277 |
3 | age4 | 1.083 | 0.079 | 0.269 | 0.296 | 0.767 |
3 | age5 | 0.774 | -0.256 | 0.286 | -0.895 | 0.371 |
3 | age6 | 0.768 | -0.264 | 0.296 | -0.890 | 0.373 |
3 | age7 | 0.964 | -0.036 | 0.310 | -0.117 | 0.907 |
3 | age8 | 0.626 | -0.468 | 0.352 | -1.330 | 0.184 |
3 | ageunder_18 | 0.375** | -0.981 | 0.349 | -2.811 | 0.005 |
3 | gender1 | 0.867 | -0.143 | 0.143 | -0.996 | 0.319 |
3 | rent_own2 | 1.243 | 0.218 | 0.170 | 1.282 | 0.200 |
3 | live_timelong | 1.039 | 0.038 | 0.201 | 0.190 | 0.849 |
3 | live_timeshort | 1.009 | 0.009 | 0.176 | 0.052 | 0.959 |
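The `major_trans_2020informal` row for outcome 3 (coefficient -13.733, standard error 0.000, z-score in the millions) looks like the usual signature of an empty or nearly empty cell rather than a real effect. A minimal sketch of how this could be checked with a cross-tabulation is shown here; the data frame `toy` and its counts are invented for illustration and stand in for `regressor`.

```r
# Toy data standing in for `regressor`; all counts are made up for illustration.
toy <- data.frame(
  mode     = c(rep("informal", 6), rep("other", 9)),
  response = c(rep("1", 2), rep("2", 4),            # no "informal" respondent chose "3"
               rep("1", 3), rep("2", 3), rep("3", 3))
)

# Cross-tabulate predictor level by outcome category.
cell_counts <- table(toy$mode, toy$response)
print(cell_counts)

# Predictor levels that have at least one empty outcome cell.
sparse_levels <- rownames(cell_counts)[rowSums(cell_counts == 0) > 0]
print(sparse_levels)
```

If the real data show the same pattern, the corresponding odds ratio and its stars should probably not be interpreted; merging the sparse level into a neighbouring category is one possible remedy.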
bicycle (P42): People who used a bicycle as their main mode of transport before 2020 have 7.65 times the odds (665% higher) of choosing “increase” than “not change” compared to people who used other transport modes, holding other variables constant.
taxi (P42): People who used a taxi as their main mode of transport before 2020 have 92.3% lower odds of choosing “increase” than “not change” compared to people who used other transport modes, holding other variables constant.
lower-mid income (P50): People with lower-mid income have 67.6% higher odds of choosing “increase” than “not change” compared to people who did not report their income, holding other variables constant.
university education (P12): People with university education have 80.9% higher odds of choosing “increase” than “not change” compared to people with upper secondary education, holding other variables constant.
motorcycle (P42): People who used a motorcycle as their main mode of transport before 2020 have 52.7% lower odds of choosing “decrease” than “not change” compared to people who used other transport modes, holding other variables constant.
taxi (P42): People who used a taxi as their main mode of transport before 2020 have 83.8% lower odds of choosing “decrease” than “not change” compared to people who used other transport modes, holding other variables constant.
walking (P42): People who walked as their main mode of transport before 2020 have 52.7% lower odds of choosing “decrease” than “not change” compared to people who used other transport modes, holding other variables constant.
low income (P50): People with low income have 36.8% lower odds of choosing “decrease” than “not change” compared to people who did not report their income, holding other variables constant.
lower secondary education (P12): People with lower secondary education have 71.5% higher odds of choosing “decrease” than “not change” compared to people with upper secondary education, holding other variables constant.
technological education (P12): People with technological education have 99% higher odds of choosing “decrease” than “not change” compared to people with upper secondary education, holding other variables constant.
university education (P12): People with university education have 2.246 times the odds (124.6% higher) of choosing “decrease” than “not change” compared to people with upper secondary education, holding other variables constant.
under 18 (Edad): People under 18 years old have 62.5% lower odds of choosing “decrease” than “not change” compared to people who are 18-24 years old, holding other variables constant. (Note: same issues)
Statement: New housing projects
trips <- readRDS("data/008-24 BBDD Procesamiento Etapas.rds")
hog <- readRDS("data/008-24 BBDD Procesamiento Hogares.rds")
per <- readRDS("data/008-24 BBDD Procesamiento Personas.rds")
per_complt <- per %>%
left_join(hog,by="ID_Hogar")
dependent_variable<- "P101"
# Each survey item is listed once; the order is kept because it fixes the coefficient order in the model.
independent_variables <- c("P3", "P42",
"P50", "P12", "P14",
"Edad", "P10", "P13", "P15", "P82", "P83")
regressor<- per_complt %>%
select(all_of(dependent_variable), all_of(independent_variables))
regressor <- regressor %>%
mutate(
across(
where(is.labelled), # pick all haven_labelled columns
~ zap_labels(.) # strip off the labels, leaving the underlying numeric
)
)
regressor$P101 <- as.factor(regressor$P101)
regressor$P101 <- relevel(regressor$P101, ref = "2") # Relevel to set the reference category
regressor<-regressor%>%
rename(pop_num=P3,
major_trans_2020=P42,
income= P50,
rent_own= P82,
live_time= P83
)
regressor<-regressor%>%
rename(edu_att= P12,
occupation= P14,
gender= P10,
age= Edad
)
regressor$rent_own<- as.factor(regressor$rent_own) # own =1, rent =2
regressor$rent_own <- relevel(regressor$rent_own, ref = "1") # Relevel to set the reference category
regressor$gender <- as.factor(regressor$gender) #female =1, male=2
regressor$gender <- relevel(regressor$gender, ref = "2") # Relevel to set the reference category
regressor$edu_att <- dplyr::case_when(
regressor$edu_att %in% c(1, 2, 3) ~ "Primary",
regressor$edu_att %in% c(4, 5) ~ "LowerSecondary",
regressor$edu_att %in% c(6, 7) ~ "UpperSecondary",
regressor$edu_att %in% c(8, 9) ~ "Technological",
regressor$edu_att %in% c(10, 11, 12, 13) ~ "University",
regressor$edu_att == 97 ~ "NA",
)
regressor$edu_att <- as.factor(regressor$edu_att)
regressor$edu_att <- relevel(regressor$edu_att, ref = "UpperSecondary") # Relevel to set the reference category
regressor<-regressor %>%
mutate(major_trans_2020= case_when(
major_trans_2020 %in% c(1,2,3,4,5,6,10,16) ~ "public_tansit",
major_trans_2020 %in% c(7,8,9) ~ "informal",
major_trans_2020 %in% c(11,12) ~ "taxi",
major_trans_2020 %in% c(22,23) ~ "personal_veh",
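# NOTE: code 25 is listed for both motorcycle and bicycle; case_when() uses the
# first match, so 25 is always coded as "motorcyle" and never as "bicycle".
major_trans_2020 %in% c(24,25) ~"motorcyle",
major_trans_2020 %in% c(25,27,28,17) ~ "bicycle",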
major_trans_2020==34 ~ "walking",
TRUE ~ "other"
))
regressor$major_trans_2020<-as.factor(regressor$major_trans_2020)
regressor$major_trans_2020<-relevel(regressor$major_trans_2020,ref = "other")
regressor <- regressor %>%
mutate(
# 1) if P13 not NA, take P13, otherwise keep original P14
occupation = if_else(!is.na(P13), as.character(P13), as.character(occupation)),
# 2) if P15 not NA, paste it to the (possibly updated) P14; else leave as is
occupation = if_else(
!is.na(P15),
paste(occupation, P15, sep = " / "), # use whatever separator you like
occupation
)
)
regressor$occupation <- str_remove_all(regressor$occupation, "(^NA\\s*/\\s*)|(\\s*/\\s*NA$)")
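# NOTE: combined strings such as "11 / 3" are not numeric, so as.numeric() returns
# NA (with a warning) and those rows fall into "Other-unemployed" in the recode below.
regressor<-regressor%>%
mutate(occupation= as.numeric(occupation)) %>%
select(-P13, -P15)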
regressor<-regressor%>%
mutate(occupation= case_when(
occupation %in% c(1,2,3,4,5,22) ~ "student",
occupation %in% c(11,12) ~ "employed",
occupation %in% c(13,14,15,16) ~ "self-employed",
occupation %in% c(6,7,8,9,17) ~ "informal",
occupation == 97 ~ "NA",
TRUE ~ "Other-unemployed"
))
regressor$occupation <- as.factor(regressor$occupation)
regressor$occupation <- relevel(regressor$occupation, ref = "Other-unemployed") # Relevel to set the reference category
regressor<-regressor%>%
mutate(income= case_when(
income %in% c(1,2,3) ~ "Low",
income %in% c(4,5,6) ~ "lower-mid",
income %in% c(7,8) ~ "Upper-mid",
income %in% c(9,10,11) ~ "High",
TRUE ~ "Other"
))%>%
mutate(income = as.factor(income))
regressor$income <- relevel(regressor$income, ref = "Other") # Relevel to set the reference category
regressor<-regressor%>%
mutate(live_time= case_when(
live_time %in% c(1,2) ~ "short",
live_time %in% c(3,4) ~ "medium",
live_time %in% c(5,6) ~ "long",
TRUE ~ "NA"
)) %>%
mutate(live_time = as.factor(live_time))
regressor$live_time <- relevel(regressor$live_time, ref = "medium") # Relevel to set the reference category
regressor <- regressor %>%
mutate(
age = if_else(
age %in% c(1,2),
"under_18",
as.character(age) # keeps the original age for everyone else
)
)%>%
mutate(age = as.factor(age))
model_house<-multinom(P101~.,data=regressor)
## # weights: 102 (66 variable)
## initial value 1418.308465
## iter 10 value 1197.424483
## iter 20 value 1136.881800
## iter 30 value 1131.476923
## iter 40 value 1131.125872
## iter 50 value 1131.088156
## iter 60 value 1131.043201
## iter 70 value 1131.035149
## final value 1131.034791
## converged
z<-summary(model_house)$coefficients/summary(model_house)$standard.errors
p_values<- (1 - pnorm(abs(z), 0, 1)) * 2
# 1. grab raw summary
s <- summary(model_house)
coef_mat<- s$coefficients
se_mat <- s$standard.errors
# 2. compute z-scores & p-values
z_mat <- coef_mat / se_mat
p_mat <- 2 * pnorm(-abs(z_mat))
# 3. pivot to long form
df_coef <- as.data.frame(coef_mat) %>%
rownames_to_column("Outcome") %>%
pivot_longer(-Outcome, names_to="Predictor", values_to="Coef")
df_se <- as.data.frame(se_mat) %>%
rownames_to_column("Outcome") %>%
pivot_longer(-Outcome, names_to="Predictor", values_to="SE")
df_z <- as.data.frame(z_mat) %>%
rownames_to_column("Outcome") %>%
pivot_longer(-Outcome, names_to="Predictor", values_to="z")
df_p <- as.data.frame(p_mat) %>%
rownames_to_column("Outcome") %>%
pivot_longer(-Outcome, names_to="Predictor", values_to="p.value")
# 4. join and format, adding stars
results <- df_coef %>%
left_join(df_se, by=c("Outcome","Predictor")) %>%
left_join(df_z, by=c("Outcome","Predictor")) %>%
left_join(df_p, by=c("Outcome","Predictor")) %>%
mutate(
OR = exp(Coef),
across(c(Coef, SE, z, OR, p.value), ~ round(., 3)),
stars = case_when(
p.value < 0.001 ~ "***",
p.value < 0.01 ~ "**",
p.value < 0.05 ~ "*",
TRUE ~ ""
),
OR = paste0(OR, stars)
) %>%
select(Outcome, Predictor, OR, Coef, SE, z, p.value)
# 5. render as styled HTML
kable(
results,
format = "html",
table.attr = 'class="table table-striped"',
col.names = c("Outcome", "Predictor", "OR", "Coef", "SE", "z-score", "p-value"),
caption = "Multinomial logit: Odds Ratios (with significance), Coefs, SEs, z-scores & p-values"
) %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE
)
Outcome | Predictor | OR | Coef | SE | z-score | p-value |
---|---|---|---|---|---|---|
1 | (Intercept) | 0.971 | -0.029 | 0.424 | -0.069 | 0.945 |
1 | pop_num | 0.905* | -0.100 | 0.046 | -2.178 | 0.029 |
1 | major_trans_2020bicycle | 0.748 | -0.290 | 0.565 | -0.513 | 0.608 |
1 | major_trans_2020informal | 1.452 | 0.373 | 0.612 | 0.609 | 0.542 |
1 | major_trans_2020motorcyle | 0.978 | -0.022 | 0.317 | -0.069 | 0.945 |
1 | major_trans_2020personal_veh | 0.472* | -0.751 | 0.305 | -2.461 | 0.014 |
1 | major_trans_2020public_tansit | 0.662* | -0.413 | 0.200 | -2.065 | 0.039 |
1 | major_trans_2020taxi | 0.336** | -1.090 | 0.412 | -2.646 | 0.008 |
1 | major_trans_2020walking | 0.629 | -0.464 | 0.286 | -1.622 | 0.105 |
1 | incomeHigh | 2.16* | 0.770 | 0.323 | 2.384 | 0.017 |
1 | incomeLow | 1.977** | 0.682 | 0.208 | 3.276 | 0.001 |
1 | incomelower-mid | 2.332*** | 0.847 | 0.175 | 4.835 | 0.000 |
1 | incomeUpper-mid | 2.033*** | 0.709 | 0.199 | 3.562 | 0.000 |
1 | edu_attLowerSecondary | 1.5* | 0.406 | 0.192 | 2.114 | 0.034 |
1 | edu_attNA | 1.323 | 0.280 | 0.456 | 0.613 | 0.540 |
1 | edu_attPrimary | 1.081 | 0.078 | 0.241 | 0.324 | 0.746 |
1 | edu_attTechnological | 1.93** | 0.658 | 0.234 | 2.805 | 0.005 |
1 | edu_attUniversity | 1.754** | 0.562 | 0.212 | 2.654 | 0.008 |
1 | occupationemployed | 0.551** | -0.596 | 0.222 | -2.682 | 0.007 |
1 | occupationinformal | 0.677 | -0.389 | 0.319 | -1.221 | 0.222 |
1 | occupationNA | 0.435 | -0.834 | 0.610 | -1.366 | 0.172 |
1 | occupationself-employed | 0.493** | -0.707 | 0.207 | -3.416 | 0.001 |
1 | occupationstudent | 0.932 | -0.071 | 0.313 | -0.227 | 0.821 |
1 | age4 | 1.366 | 0.312 | 0.246 | 1.270 | 0.204 |
1 | age5 | 1.713* | 0.538 | 0.259 | 2.081 | 0.037 |
1 | age6 | 2.152** | 0.767 | 0.271 | 2.831 | 0.005 |
1 | age7 | 1.251 | 0.224 | 0.278 | 0.806 | 0.420 |
1 | age8 | 1.333 | 0.287 | 0.309 | 0.930 | 0.353 |
1 | ageunder_18 | 1.317 | 0.275 | 0.309 | 0.892 | 0.373 |
1 | gender1 | 1.165 | 0.152 | 0.130 | 1.176 | 0.240 |
1 | rent_own2 | 1.15 | 0.140 | 0.152 | 0.921 | 0.357 |
1 | live_timelong | 1.819** | 0.598 | 0.182 | 3.294 | 0.001 |
1 | live_timeshort | 1.029 | 0.029 | 0.160 | 0.181 | 0.856 |
3 | (Intercept) | 0.12** | -2.121 | 0.714 | -2.971 | 0.003 |
3 | pop_num | 0.931 | -0.071 | 0.072 | -0.984 | 0.325 |
3 | major_trans_2020bicycle | 2.837 | 1.043 | 0.808 | 1.290 | 0.197 |
3 | major_trans_2020informal | 0*** | -11.810 | 0.000 | -2463238.940 | 0.000 |
3 | major_trans_2020motorcyle | 0.483 | -0.728 | 0.821 | -0.887 | 0.375 |
3 | major_trans_2020personal_veh | 1.606 | 0.474 | 0.492 | 0.964 | 0.335 |
3 | major_trans_2020public_tansit | 1.59 | 0.464 | 0.367 | 1.262 | 0.207 |
3 | major_trans_2020taxi | 1.166 | 0.154 | 0.619 | 0.249 | 0.803 |
3 | major_trans_2020walking | 0.953 | -0.048 | 0.525 | -0.091 | 0.927 |
3 | incomeHigh | 1.527 | 0.423 | 0.540 | 0.783 | 0.434 |
3 | incomeLow | 2.906*** | 1.067 | 0.304 | 3.508 | 0.000 |
3 | incomelower-mid | 1.589 | 0.463 | 0.289 | 1.605 | 0.109 |
3 | incomeUpper-mid | 1.226 | 0.204 | 0.343 | 0.595 | 0.552 |
3 | edu_attLowerSecondary | 1.046 | 0.045 | 0.299 | 0.151 | 0.880 |
3 | edu_attNA | 0.583 | -0.540 | 0.831 | -0.649 | 0.516 |
3 | edu_attPrimary | 1.01 | 0.010 | 0.363 | 0.029 | 0.977 |
3 | edu_attTechnological | 1.348 | 0.298 | 0.370 | 0.807 | 0.420 |
3 | edu_attUniversity | 1.174 | 0.160 | 0.338 | 0.475 | 0.635 |
3 | occupationemployed | 0.409* | -0.894 | 0.360 | -2.483 | 0.013 |
3 | occupationinformal | 0.799 | -0.225 | 0.468 | -0.480 | 0.631 |
3 | occupationNA | 0.681 | -0.385 | 0.888 | -0.433 | 0.665 |
3 | occupationself-employed | 0.521* | -0.652 | 0.316 | -2.061 | 0.039 |
3 | occupationstudent | 1.398 | 0.335 | 0.476 | 0.703 | 0.482 |
3 | age4 | 1.173 | 0.160 | 0.422 | 0.379 | 0.705 |
3 | age5 | 2.561* | 0.941 | 0.428 | 2.200 | 0.028 |
3 | age6 | 1.85 | 0.615 | 0.457 | 1.348 | 0.178 |
3 | age7 | 1.629 | 0.488 | 0.457 | 1.069 | 0.285 |
3 | age8 | 1.086 | 0.083 | 0.511 | 0.162 | 0.871 |
3 | ageunder_18 | 1.28 | 0.247 | 0.458 | 0.538 | 0.591 |
3 | gender1 | 0.969 | -0.031 | 0.206 | -0.152 | 0.880 |
3 | rent_own2 | 1.073 | 0.070 | 0.245 | 0.286 | 0.775 |
3 | live_timelong | 2.216** | 0.796 | 0.284 | 2.802 | 0.005 |
3 | live_timeshort | 1.134 | 0.126 | 0.268 | 0.471 | 0.638 |
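When reading the table, an odds ratio can be stated either as "OR times the odds" or as a percent change of (OR - 1) × 100; the two wordings are easy to mix up. A small sketch of the conversion follows. The helper name `or_to_percent` is hypothetical, and the three odds ratios are copied from the outcome 1 rows of the table above.

```r
# Convert an odds ratio to a percent change in odds: (OR - 1) * 100.
or_to_percent <- function(or) round((or - 1) * 100, 1)

# Odds ratios copied from the outcome 1 rows of the table above.
or_values <- c(pop_num = 0.905, incomeHigh = 2.16, incomeLow = 1.977)

pct <- or_to_percent(or_values)
print(pct)  # negative values mean lower odds, positive values mean higher odds
```

Under this convention an odds ratio of 2.16 reads as "2.16 times the odds" or "116% higher odds", and an odds ratio below 1, such as 0.905, reads as "9.5% lower odds".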
household size (P3): For each additional person in the household, the odds of responding “increase” rather than “not change” decrease by 9.5%, holding other variables constant.
personal vehicle (P42): People who used a personal vehicle as their main mode of transport before 2020 have 52.8% lower odds of choosing “increase” than “not change” compared to people who used other transport modes, holding other variables constant.
public transit (P42): People who used public transit as their main mode of transport before 2020 have 33.8% lower odds of choosing “increase” than “not change” compared to people who used other transport modes, holding other variables constant.
taxi (P42): People who used a taxi as their main mode of transport before 2020 have 66.4% lower odds of choosing “increase” than “not change” compared to people who used other transport modes, holding other variables constant.
high income (P50): People with high income have 2.16 times the odds (116% higher) of choosing “increase” than “not change” compared to people who did not report their income, holding other variables constant.
upper-mid income (P50): People with upper-mid income have 2.033 times the odds (103.3% higher) of choosing “increase” than “not change” compared to people who did not report their income, holding other variables constant.
lower-mid income (P50): People with lower-mid income have 2.332 times the odds (133.2% higher) of choosing “increase” than “not change” compared to people who did not report their income, holding other variables constant.
low income (P50): People with low income have 97.7% higher odds of choosing “increase” than “not change” compared to people who did not report their income, holding other variables constant.
lower secondary education (P12): People with lower secondary education have 50% higher odds of choosing “increase” than “not change” compared to people with upper secondary education, holding other variables constant.
low income (P50): People with low income have 2.906 times the odds (190.6% higher) of choosing “decrease” than “not change” compared to people who did not report their income, holding other variables constant.
employed (P14): People who are employed have 59.1% lower odds of choosing “decrease” than “not change” compared to people in the other/unemployed reference group, holding other variables constant.
age5 (Edad): People in the age5 group have 2.561 times the odds (156.1% higher) of choosing “decrease” than “not change” compared to people who are 18-25 years old, holding other variables constant.
long residence (P83): People who have lived in their current residence for a long time have 2.216 times the odds (121.6% higher) of choosing “decrease” than “not change” compared to people who have lived in their current residence for a medium time, holding other variables constant.
If most respondents give the same answer to a question regardless of which group they belong to, a predictor will tend not to be statistically significant. (For P87, which asks whether housing values or rents would increase: if most people expect housing prices to rise regardless of the type of house they live in, then housing type is not a useful predictor.) Here, I want to get a sense of whether this question is more subjective or objective. That said, I did identify some key factors that affect people’s responses. Understanding the nature of the question would help determine whether I should further explore the unreliable predictors or leave them as they are.
The second problem arises because I treated all predictor categories as unordered factors. Statistically, the results can only be read as showing that one specific group (e.g., people with a university degree or higher) tends to be more optimistic than the reference group (here, people with upper secondary education). They do not show that optimism rises steadily with education, because the coefficients for intermediate groups are estimated separately and need not follow the same direction. To identify a trend, we may want to handle the predictors differently. Please let me know what you think. Thanks.
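One way to look for a trend could be to code an ordered predictor as a numeric score, so that each outcome gets a single slope instead of one coefficient per level. The sketch below uses simulated data only: `edu_score`, the 1 to 5 scoring, and the data frame `toy` are assumptions for illustration, not the survey coding.

```r
library(nnet)

set.seed(1)
n <- 300
# Hypothetical ordered predictor: 1 = Primary ... 5 = University (the scores are an assumption).
edu_score <- sample(1:5, n, replace = TRUE)
# Simulated three-level response; "2" plays the role of the "not change" baseline.
response <- factor(sample(c("1", "2", "3"), n, replace = TRUE),
                   levels = c("2", "1", "3"))
toy <- data.frame(response, edu_score)

# Unordered coding: one coefficient per education level for each outcome.
fit_factor <- multinom(response ~ factor(edu_score), data = toy, trace = FALSE)
# Linear-trend coding: a single slope per outcome.
fit_trend <- multinom(response ~ edu_score, data = toy, trace = FALSE)

# The trend model is nested in the factor model, so a likelihood-ratio test
# indicates whether the per-level coefficients add anything beyond a straight line.
lr_stat <- fit_trend$deviance - fit_factor$deviance
df_diff <- length(coef(fit_factor)) - length(coef(fit_trend))
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
print(c(lr_stat = lr_stat, df = df_diff, p = p_value))
```

A large p-value would suggest the linear trend describes the data about as well as the unordered coding; a small one would suggest the relationship is not a simple trend. Equal spacing between levels is itself an assumption of this coding.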
The last question is whether multinomial regression is the most appropriate approach. Multinomial regression is suited to predicting an unordered categorical variable rather than an ordered one, because it compares each category with a baseline (in this case, “not change”). The coefficient is interpreted as: “the log-odds of responding ‘increase’ (or ‘decrease’) rather than ‘not change’ are β higher for a given category than for the reference category.” Initially, I treated “increase”, “not change”, and “decrease” as unordered categories, that is, as separate categories with no regard to their sequence.
An ordered categorical variable is like students’ letter grades: A (90-100) is the highest, followed by B (80-90), C (70-80), and D (60-70). The grades are ordered but not numeric. Eugene questioned my treating “increase”, “not change”, and “decrease” as unrelated categories. If we want to recover a trend rather than separate log-odds against a baseline, we will need a different model: ordinal regression. However, the multinomial regression results remain valid and may be helpful in the future.
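A minimal sketch of what an ordinal (proportional-odds) model might look like, using `polr()` from the MASS package on simulated data. The data frame `toy`, the three income levels, and the ordering decrease < not_change < increase are assumptions for illustration; the function is called as `MASS::polr()` so that attaching MASS does not mask `dplyr::select()`.

```r
set.seed(2)
n <- 300
income <- factor(sample(c("Low", "Mid", "High"), n, replace = TRUE))
# Ordered response: decrease < not_change < increase (this ordering is an assumption).
response <- factor(
  sample(c("decrease", "not_change", "increase"), n, replace = TRUE),
  levels = c("decrease", "not_change", "increase"),
  ordered = TRUE
)
toy <- data.frame(response, income)

# Proportional-odds model; Hess = TRUE keeps the Hessian so summary() can report standard errors.
fit_ord <- MASS::polr(response ~ income, data = toy, Hess = TRUE)

# One coefficient per predictor level, shared across both cut-points.
odds_ratios <- exp(coef(fit_ord))  # odds of answering in a higher category
thresholds  <- fit_ord$zeta        # the two estimated cut-points
print(odds_ratios)
print(thresholds)
```

Unlike the multinomial model, this gives one odds ratio per predictor level rather than one per outcome contrast, at the cost of assuming the effect is the same at both cut-points.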
Consider changing the reference category: a middle level, closer to the average for Bogotá, may give more useful coefficients.
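Changing the reference only requires a different `ref` in `relevel()` before refitting. A small sketch with a toy income factor follows; picking "lower-mid" as the middle level is an assumption for illustration, not a recommendation from the data.

```r
# Toy income factor using the same level names as the recode above.
income <- factor(c("Low", "lower-mid", "Upper-mid", "High", "Other", "lower-mid"))

# Make a middle income group the reference instead of "Other".
income_mid <- relevel(income, ref = "lower-mid")

# The reference level is always the first level of the factor.
print(levels(income_mid))
```

Releveling changes which group every coefficient is compared with, but not the fitted probabilities of the model.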
Other thoughts to improve the model?