A Reproducible Pipeline for Generating Hexagonal Grids of Brazilian Municipalities

Author

Flávio Soares, Clara Penz, Daniel Vartanian, Camila Nastari Fernandes & Mariana Abrantes Giannotti

Published

October 26, 2025

Project Status: Active (the project has reached a stable, usable state and is being actively developed). License: GPLv3 (code) and CC BY-NC-SA 4.0 (report).

This pipeline is a Work In Progress (WIP) and is under active development. It may not yet be stable or suitable for public use. Please use it with caution and report any issues you encounter.

Overview

This report presents a reproducible pipeline for generating hexagonal grids of Brazilian municipalities. The pipeline was developed in the R programming language by Flávio Soares and Clara Penz, with further adaptations by Daniel Vartanian.

For instructions on how to run the pipeline, see the repository README.

Problem

Data Availability

The processed data are available in both rds and parquet formats through a dedicated repository on the Open Science Framework (OSF). A metadata file is included alongside the validated datasets.

Because the raw data are not publicly available, access to the processed files is restricted to authorized personnel. The files are encrypted with a 4096-bit RSA key (via OpenSSL) and protected by a 32-byte password.

If you already have access to the OSF repository and the project keys, you can download the data from the OSF project. A backup copy of the raw data is also stored on OSF. You can also retrieve these files directly from R using the osfr package.

Methods

Source of Data

The data used in this report come from the following sources:

  • Brazilian Institute of Geography and Statistics (IBGE): Data from the 2022 Brazilian Census.
  • Mapzen Terrain Tiles: Data on terrain and elevation.
  • OpenStreetMap (OSM): Geospatial data on highways, roads, and other infrastructure for Brazilian municipalities.
  • OpenTopography: Data on topography and elevation.

Data Munging

The data munging follows the data science workflow outlined by Wickham et al. (2023), as illustrated in Figure 1. All processing was done using the Quarto publishing system (Allaire et al., n.d.), the R programming language (R Core Team, n.d.), and several R packages.

For data manipulation and workflow, priority was given to packages from the tidyverse, rOpenSci and r-spatial ecosystems, as well as other packages adhering to the tidy tools manifesto (Wickham, 2023).

Figure 1: Data science workflow created by Wickham, Çetinkaya-Rundel, and Grolemund.

Source: Reproduced from Wickham et al. (2023).

Code Style

The Tidyverse code style guide and design principles were followed to ensure consistency and enhance readability.

Reproducibility

The pipeline is fully reproducible and can be run again at any time. To ensure consistent results, the renv package (Ushey & Wickham, n.d.) is used to manage and restore the R environment. See the README file in the code repository to learn how to run it.

Set the Environment

Load Packages

Set Keys

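# Authenticate with OSF using a personal access token (PAT) stored in an
# environment variable (`askpass()` is an interactive alternative).
osf_pat <- Sys.getenv("OSF_PAT") # askpass()
osf_auth(osf_pat)

# RSA key pair used to unlock the encrypted data files.
public_key <- here("_ssh", "id_rsa.pub")
private_key <- here("_ssh", "id_rsa")

# Password used together with the private key (`askpass()` also works here).
password <- Sys.getenv("ACESSOSAN_PASSWORD") # askpass()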
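`Sys.getenv()` returns an empty string when a variable is not set, so a missing token or password would only surface later, as an authentication or decryption error. A small fail-fast check can catch this earlier. The sketch below uses base R only; `get_secret()` is a hypothetical helper, not part of the pipeline.

```r
# Hypothetical helper: read a required secret from an environment variable
# and stop with a clear message if it is missing or empty.
get_secret <- function(var) {
  value <- Sys.getenv(var, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable `", var, "` is not set.", call. = FALSE)
  }
  value
}

# Possible usage with the variables read above:
# osf_pat <- get_secret("OSF_PAT")
# password <- get_secret("ACESSOSAN_PASSWORD")
```

Failing at this step keeps the error close to its cause instead of inside `osf_auth()` or the unlocking step.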

Set Input and Output Paths

# Create the input and intermediate (partial) output directories, if needed.
dir_inputs <- here("1-inputs")
dir_parcial <- here("2-parcial")
for (i in c(dir_inputs, dir_parcial)) {
  if (!dir_exists(i)) {
    dir_create(i, recurse = TRUE)
  }
}

Set Municipality Data

municipios <- c(
  3550308, # São Paulo
  2507507, # João Pessoa
  3106200, # Belo Horizonte
  4314902, # Porto Alegre
  1721000, # Palmas
  5300108, # Brasília
  5208707  # Goiânia
)
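The municipalities are identified by 7-digit IBGE codes, whose first two digits are the code of the state (UF). A quick sanity check on this list can therefore recover each municipality's state. The sketch below uses base R only; the lookup table is an illustration that covers just the states of the seven municipalities above.

```r
# IBGE municipality codes used in the pipeline.
municipios <- c(3550308, 2507507, 3106200, 4314902, 1721000, 5300108, 5208707)

# Basic checks: every code has 7 digits and none is repeated.
stopifnot(
  all(municipios >= 1000000 & municipios < 10000000),
  !anyDuplicated(municipios)
)

# The first two digits of a municipality code are the state (UF) code.
uf_code <- municipios %/% 100000

# Illustrative lookup limited to the states in this list.
uf_abbr <- c(
  `17` = "TO", `25` = "PB", `31` = "MG", `35` = "SP",
  `43` = "RS", `52` = "GO", `53` = "DF"
)
unname(uf_abbr[as.character(uf_code)])
```

Checks like these are cheap and catch typos in hand-maintained code lists before any data is downloaded.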

Set Initial Variables

set.seed(2025)

Download IBGE Census Data

Download File

# List the 2022 Census hexagon file(s) in the OSF raw-data node.
osf_raw_data_id <- "zuy4s"
osf_raw_data_files <-
  osf_raw_data_id |>
  osf_retrieve_node() |>
  osf_ls_files(
    type = "file",
    pattern = "censo2022_hex",
    n_max = Inf
  )

osf_raw_data_files

# Download the file(s) to the inputs directory and keep the local path.
ibge_2022_census_hex_file <-
  osf_raw_data_files |>
  osf_download(path = dir_inputs, conflicts = "overwrite") |>
  extract2("local_path")

Unlock File

# Decrypt the downloaded file with the private key and password.
ibge_2022_census_hex_file <-
  ibge_2022_census_hex_file |>
  unlock_file(
    private_key = private_key,
    suffix = ".lockr",
    remove_file = TRUE,
    password = password
  )

Download Brazil OSM Data

# Download the latest OpenStreetMap extract for Brazil from Geofabrik.
osm_brazil_latest_file <- here(dir_inputs, "brazil-latest.osm.pbf")

file.path(
  "https://download.geofabrik.de",
  "south-america",
  "brazil-latest.osm.pbf"
) |>
  curl_download(
    destfile = osm_brazil_latest_file,
    quiet = FALSE
  )

01.01-criar_malha_hexagonal_areas_total_urbana.R

Creates hexagonal grids for the municipalities under analysis.

Two steps that belong here are still missing from the script:

  1. Process IBGE's urbanized-area data.
  2. Generate hexagons for the urbanized area.

Read Brazil's Urbanized Hexagons with 2022 Census Data

hexurb <-
  ibge_2022_census_hex_file |>
  read_delim(delim = ",") |>
  # Drop rows where every variable except `h3_address` is 0.
  filter(!if_all(-h3_address, \(x) x == 0))

hexurb |> glimpse()
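The filter above drops hexagons whose census variables are all zero. For readers less familiar with `if_all()`, a base R sketch of the same idea on a toy table (hypothetical values and cell IDs) is shown below; here `na.rm = TRUE` makes missing values count as zero.

```r
# Toy stand-in for the census hexagon table (hypothetical values).
hexurb_toy <- data.frame(
  h3_address = c("hex_a", "hex_b", "hex_c"),
  pop = c(0, 12, 0),
  households = c(0, 4, 3),
  stringsAsFactors = FALSE
)

# Keep rows where at least one variable other than `h3_address` is non-zero.
drop_all_zero_rows <- function(df, id_col = "h3_address") {
  value_cols <- setdiff(names(df), id_col)
  has_nonzero <- rowSums(df[value_cols] != 0, na.rm = TRUE) > 0
  df[has_nonzero, , drop = FALSE]
}

hexurb_kept <- drop_all_zero_rows(hexurb_toy)
hexurb_kept$h3_address
```

Only `hex_a`, whose values are all zero, is removed.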

Create Hexagonal Grid and Separate Urban from Non-Urban Areas

for (cod in municipios) {
  # Download the municipality boundary.
  municipio_geom <- read_municipality(code_muni = cod, year = 2020)

  # Convert the polygon to H3 cells (resolution 9). `polygon_to_cells()`
  # returns a list with one vector of cells per polygon, so flatten it.
  hex <-
    polygon_to_cells(geometry = municipio_geom$geom, res = 9) |>
    unlist(use.names = FALSE)

  # Convert the H3 cells back to polygons (sf object with `h3_address`).
  hexgrid <- cell_to_polygon(input = hex, simple = FALSE)

  print(paste("Hexagonal grid created for", cod))

  # Keep only urbanized hexagons, i.e., those with census data.
  hex_urb_mun <-
    hexgrid |>
    left_join(hexurb, by = "h3_address") |>
    filter(if_all(-h3_address, ~ !is.na(.x))) |>
    mutate(across(where(is.numeric) & !any_of("h3_address"), abs))

  print("Filtering done")

  # Create the output directory.
  dir_hex <- file.path(dir_parcial, cod, "hex")
  dir.create(dir_hex, showWarnings = FALSE, recursive = TRUE)

  # Save the files.
  # Total grid
  st_write_parquet(hexgrid, file.path(dir_hex, "hex.parquet"))
  # Urbanized grid
  st_write_parquet(hex_urb_mun, file.path(dir_hex, "hex_urbanizado.parquet"))
}
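The loop above writes the full grid and the urbanized hexagons. The left join followed by the missing-value filter behaves like an inner join with the census table (assuming that table has no missing values), so the urban/non-urban split reduces to a set-membership test on `h3_address`. A minimal base R sketch with hypothetical cell IDs:

```r
# Hypothetical H3 cell IDs covering a municipality (the full grid).
grid_cells <- c("hex_a", "hex_b", "hex_c", "hex_d", "hex_e")

# Hypothetical cells present in the filtered census table (urbanized).
census_cells <- c("hex_b", "hex_d", "hex_z")

# A grid cell is urbanized if it appears in the census table; all other
# grid cells are non-urban. Census cells outside the grid ("hex_z") are
# ignored, as in a join that starts from the grid.
is_urban <- grid_cells %in% census_cells
urban_cells <- grid_cells[is_urban]
non_urban_cells <- grid_cells[!is_urban]
```

Because the two subsets are complements within the grid, together they always add up to the full grid.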

01.02-processar_elevation.R

Create a .tiff File for the Urbanized Area of Each Municipality

for (cod in municipios) {
  # Define the path of the output `.tiff` file.
  elevation_path <- file.path(dir_parcial, cod, "elevation.tiff")

  # Read the municipality's urbanized hexagonal grid.
  hexgrid <-
    file.path(
      dir_parcial,
      cod,
      "hex",
      "hex_urbanizado.parquet"
    ) |>
    st_read_parquet()

  # Download the elevation raster covering the grid (zoom level `z = 13`).
  # `override_size_check = TRUE` skips elevatr's download-size checks.
  elev_raster <-
    hexgrid |>
    get_elev_raster(
      z = 13,
      override_size_check = TRUE
    )

  # Save the raster as a `.tiff` file.
  writeRaster(elev_raster, elevation_path, overwrite = TRUE)

  print(paste("Created .tiff file for municipality", cod))
}
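The `z` argument of `get_elev_raster()` sets the zoom level of the terrain tiles, which determines the raster's pixel size. As a rough guide, the nominal ground resolution of a Web Mercator tile, assuming 256-pixel tiles, can be computed as below. This is the standard Web Mercator formula, not something taken from the pipeline, and the real level of detail is also limited by the source data behind the tiles.

```r
# Nominal ground resolution (metres per pixel) of a Web Mercator tile at
# zoom level `z` and latitude `lat` (degrees), assuming 256-pixel tiles.
ground_resolution <- function(lat, z, tile_size = 256) {
  earth_circumference <- 2 * pi * 6378137 # WGS 84 equatorial circumference (m)
  earth_circumference * cos(lat * pi / 180) / (tile_size * 2^z)
}

ground_resolution(lat = 0, z = 13)      # about 19.1 m at the equator
ground_resolution(lat = -23.55, z = 13) # about 17.5 m near São Paulo
```

Each additional zoom level halves the pixel size and roughly quadruples the number of tiles to download, which is why large municipalities may need `override_size_check = TRUE` at `z = 13`.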

01.03-processar_osm.R

Create the Transport Network for the Urbanized Area of Each Municipality

for (cod in municipios) {
  print(paste("Processing", cod))

  # Define the output directory.
  dir_mun <- file.path(dir_parcial, cod)
  dir.create(dir_mun, showWarnings = FALSE, recursive = TRUE)

  # Read the urbanized hexagons and compute their bounding box.
  mun_hex <-
    file.path(
      dir_mun,
      "hex",
      "hex_urbanizado.parquet"
    ) |>
    st_read_parquet()

  mun_bbox <- st_bbox(mun_hex)
  br_pbf <- osm_brazil_latest_file
  mun_pbf <- file.path(dir_mun, "redeviaria.osm.pbf")

  # Run Osmosis to clip the Brazil extract to the bounding box.
  # `system2()` does not quote `args`, so file paths are quoted explicitly.
  tic(msg = paste("Extracting OSM network for", cod))
  system2(
    "osmosis",
    args = c(
      "--read-pbf", shQuote(br_pbf),
      "--bounding-box",
      paste0("left=", mun_bbox[["xmin"]]),
      paste0("bottom=", mun_bbox[["ymin"]]),
      paste0("right=", mun_bbox[["xmax"]]),
      paste0("top=", mun_bbox[["ymax"]]),
      "--write-pbf", shQuote(mun_pbf)
    )
  )
  toc()
}
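The Osmosis call maps the `xmin`, `ymin`, `xmax` and `ymax` values of `st_bbox()` onto the `left`, `bottom`, `right` and `top` parameters of `--bounding-box`. The sketch below isolates that mapping in a hypothetical helper, `osmosis_bbox_args()`, that only builds the argument vector (it does not run Osmosis), using an illustrative bounding box rather than a real municipality.

```r
# Hypothetical helper: build the Osmosis arguments that clip a PBF file to a
# bounding box. `bbox` is a named numeric vector with the names returned by
# `sf::st_bbox()` (xmin, ymin, xmax, ymax), in degrees (WGS 84).
osmosis_bbox_args <- function(bbox, in_pbf, out_pbf) {
  stopifnot(
    all(c("xmin", "ymin", "xmax", "ymax") %in% names(bbox)),
    bbox[["xmin"]] < bbox[["xmax"]],
    bbox[["ymin"]] < bbox[["ymax"]]
  )
  c(
    "--read-pbf", shQuote(in_pbf),
    "--bounding-box",
    paste0("left=", bbox[["xmin"]]),
    paste0("bottom=", bbox[["ymin"]]),
    paste0("right=", bbox[["xmax"]]),
    paste0("top=", bbox[["ymax"]]),
    "--write-pbf", shQuote(out_pbf)
  )
}

# Illustrative bounding box (hypothetical values).
bbox <- c(xmin = -46.8, ymin = -24.0, xmax = -46.4, ymax = -23.4)
args <- osmosis_bbox_args(bbox, "brazil-latest.osm.pbf", "out dir/redeviaria.osm.pbf")
print(args)

# To actually run it: system2("osmosis", args = args)
```

Validating the bounding box before calling Osmosis avoids a long read of the full Brazil extract only to produce an empty or malformed output.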

Citation

When using this data, you must also cite the original data sources.

To cite this work, please use the following format:

Soares, F., Penz, C., Vartanian, D., Fernandes, C. N., & Giannotti, M. A. (2025). A reproducible pipeline for generating hexagonal grids of Brazilian municipalities [Computer software]. Center for Metropolitan Studies of the University of São Paulo. https://cem-usp.github.io/brazil-hexagonal-grid

A BibTeX entry for LaTeX users is

@software{soares2025,
  title = {A reproducible pipeline for generating hexagonal grids of Brazilian municipalities},
  author = {{Flávio Soares} and {Clara Penz} and {Daniel Vartanian} and {Camila Nastari Fernandes} and {Mariana Abrantes Giannotti}},
  year = {2025},
  address = {São Paulo},
  institution = {Center for Metropolitan Studies of the University of São Paulo},
  langid = {en},
  url = {https://cem-usp.github.io/brazil-hexagonal-grid}
}

License


The original data sources may be subject to their own licensing terms and conditions.

The code in this report is licensed under the GNU General Public License Version 3, while the report is available under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.

Copyright (C) 2025 Center for Metropolitan Studies

The code in this report is free software: you can redistribute it and/or
modify it under the terms of the GNU General Public License as published by the
Free Software Foundation, either version 3 of the License, or (at your option)
any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program. If not, see <https://www.gnu.org/licenses/>.

Acknowledgments

This work is part of a research project by the Polytechnic School (Poli) of the University of São Paulo (USP), in partnership with the Secretariat for Food and Nutrition Security (SESAN) of the Ministry of Social Development, Family, and the Fight Against Hunger (MDS), titled: AcessoSAN: Mapping Food Access to Support Public Policies on Food and Nutrition Security and Hunger Reduction in Brazilian Cities.
This work was developed with support from the Center for Metropolitan Studies (CEM) based at the School of Philosophy, Letters and Human Sciences (FFLCH) of the University of São Paulo (USP) and at the Brazilian Center for Analysis and Planning (CEBRAP).
This study was financed, in part, by the São Paulo Research Foundation (FAPESP), Brazil. Process Number 2025/17879-2.

References

Allaire, J. J., Teague, C., Xie, Y., & Dervieux, C. (n.d.). Quarto [Computer software]. Zenodo. https://doi.org/10.5281/ZENODO.5960048
R Core Team. (n.d.). R: A language and environment for statistical computing [Computer software]. R Foundation for Statistical Computing. https://www.R-project.org
Ushey, K., & Wickham, H. (n.d.). renv: Project environments [Computer software]. https://doi.org/10.32614/CRAN.package.renv
Wickham, H. (2023). The tidy tools manifesto. Tidyverse. https://tidyverse.tidyverse.org/articles/manifesto.html
Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for data science: Import, tidy, transform, visualize, and model data (2nd ed.). O’Reilly Media. https://r4ds.hadley.nz