Help for package EconGeo

Title:

Computing Key Indicators of the Spatial Distribution of Economic Activities

Version:

2.0

Date:

2023-06-22

Description:

Computes a series of indices commonly used in the fields of economic geography, economic complexity, and evolutionary economics to describe the location, distribution, spatial organization, structure, and complexity of economic activities. Functions include basic spatial indicators such as the location quotient, the Krugman specialization index, the Herfindahl or the Shannon entropy indices but also more advanced functions to compute different forms of normalized relatedness between economic activities or network-based measures of economic complexity. Most of the functions use matrix calculus and are based on bipartite (incidence) matrices consisting of region - industry pairs. These are described in Balland (2017) http://econ.geo.uu.nl/peeg/peeg1709.pdf.

URL:

https://github.com/PABalland/EconGeo

Depends:

R (≥ 3.3.1)

Imports:

Matrix, reshape

License:

GPL-2 | GPL-3

Encoding:

UTF-8

RoxygenNote:

7.2.3

BugReports:

https://github.com/PABalland/EconGeo/issues

NeedsCompilation:

Packaged:

2023-06-24 12:02:47 UTC; admin

Author:

Pierre-Alexandre Balland [aut, cre, cph]

Maintainer:

Pierre-Alexandre Balland <p.balland@uu.nl>

Repository:

CRAN

Date/Publication:

2023-06-26 12:00:05 UTC

Compute the number of co-occurrences between industry pairs from an incidence (industry - event) matrix

Description

This function computes the number of co-occurrences between industry pairs from an incidence (industry - event) matrix

Usage

co_occurrence(mat, diagonal = FALSE, list = FALSE)

Arguments

mat

An incidence matrix with industries in rows and events in columns

diagonal

Logical; shall the values in the diagonal of the co-occurrence matrix be included in the output? Defaults to FALSE (values in the diagonal are set to 0), but can be set to TRUE (values in the diagonal reflect in how many events a single industry can be found)

list

Logical; is the input a list? Defaults to FALSE (input = adjacency matrix), but can be set to TRUE if the input is an edge list

Value

The co-occurrence matrix as an R matrix object.

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Examples

## generate an industry - event matrix
set.seed(31)
mat <- matrix(sample(0:1, 20, replace = TRUE), ncol = 5)
rownames(mat) <- c("I1", "I2", "I3", "I4")
colnames(mat) <- c("US1", "US2", "US3", "US4", "US5")

## run the function
co_occurrence(mat)
co_occurrence(mat, diagonal = TRUE)

## generate a regular data frame (list)
my_list <- get_list(mat)

## run the function
co_occurrence(my_list, list = TRUE)
co_occurrence(my_list, list = TRUE, diagonal = TRUE)
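
As a rough illustration of what the function counts, the co-occurrence matrix of a binary incidence matrix can be sketched in base R as the product of the matrix with its transpose. This is a minimal sketch under that assumption; the objects toy and cooc are illustrative names, and the package's internal implementation may differ.

```r
## toy industry - event incidence matrix (illustrative values)
toy <- matrix(
  c(
    1, 1, 0, 0,
    1, 0, 1, 1,
    0, 1, 1, 1
  ),
  ncol = 4, byrow = TRUE
)
rownames(toy) <- c("I1", "I2", "I3")
colnames(toy) <- c("US1", "US2", "US3", "US4")

## cell (i, j) counts the events in which industries i and j both appear
cooc <- tcrossprod(toy)

## the diagonal holds the number of events per industry;
## set it to 0 to mimic diagonal = FALSE
diag(cooc) <- 0
cooc
```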

Compute a simple measure of diversity of regions

Description

This function computes a simple measure of diversity of regions by counting the number of industries in which a region has a relative comparative advantage (location quotient > 1) from regions - industries (incidence) matrices

Usage

diversity(mat, rca = FALSE)

Arguments

mat

An incidence matrix with regions in rows and industries in columns

rca

Logical; should the index of relative comparative advantage (RCA - also referred to as location quotient) first be computed? Defaults to FALSE (a binary matrix - 0/1 - is expected as an input), but can be set to TRUE if the index of relative comparative advantage first needs to be computed

Value

A numeric vector giving, for each region, the number of industries in which the region has a relative comparative advantage

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Balland, P.A. and Rigby, D. (2017) The Geography of Complex Knowledge, Economic Geography 93 (1): 1-23.

Examples

## generate a region - industry matrix with full count
set.seed(31)
mat <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
diversity(mat, rca = TRUE)

## generate a region - industry matrix in which cells represent the presence/absence of a RCA
set.seed(31)
mat <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
diversity(mat)
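
The counting logic described above can be sketched in base R. This is a minimal sketch, assuming the location quotient is the share of an industry in a region divided by the share of that industry in all regions combined; the object names (toy, lq, div) are illustrative and not part of the package.

```r
## toy region - industry count matrix (illustrative values)
toy <- matrix(
  c(
    6, 4, 0,
    0, 4, 6,
    9, 1, 0
  ),
  ncol = 3, byrow = TRUE
)
rownames(toy) <- c("R1", "R2", "R3")
colnames(toy) <- c("I1", "I2", "I3")

## share of each industry within each region (rows sum to 1)
share_in_region <- toy / rowSums(toy)

## share of each industry in the total of all regions
share_overall <- colSums(toy) / sum(toy)

## location quotient: regional share relative to overall share
lq <- sweep(share_in_region, 2, share_overall, "/")

## diversity: number of industries with a location quotient above 1
div <- rowSums(lq > 1)
div
```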

Compute the ease of recombination of a given technological class

Description

This function computes the ease of recombination of a given technological class from technological classes - patents (incidence) matrices

Usage

ease_recombination(mat, sparse = FALSE, list = FALSE)

Arguments

mat

A bipartite adjacency matrix (can be a sparse matrix)

sparse

Logical; is the input matrix a sparse matrix? Defaults to FALSE, but can be set to TRUE if the input matrix is a sparse matrix

list

Logical; is the input a list? Defaults to FALSE, but can be set to TRUE if the input is a list (regular data frame) rather than a matrix

Value

A data frame with two columns: "tech" representing the technological class and "eor" representing the ease of recombination of the technological class

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Fleming, L. and Sorenson, O. (2001) Technology as a complex adaptive system: evidence from patent data, Research Policy 30: 1019-1039

Examples

## generate a technology - patent matrix
set.seed(31)
mat <- matrix(sample(0:1, 30, replace = TRUE), ncol = 5)
rownames(mat) <- c("T1", "T2", "T3", "T4", "T5", "T6")
colnames(mat) <- c("US1", "US2", "US3", "US4", "US5")

## generate a technology - patent sparse matrix
library(Matrix)
smat <- Matrix(mat, sparse = TRUE)

## run the function
ease_recombination(mat)
ease_recombination(smat, sparse = TRUE)

## generate a regular data frame (list)
my_list <- get_list(mat)

## run the function
ease_recombination(my_list, list = TRUE)
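
For intuition, the measure can be sketched in base R following the definition in Fleming and Sorenson (2001): the number of distinct classes a class has been combined with, divided by the number of patents in that class. This is a sketch under that reading of the reference, not necessarily the package's exact computation; toy, n_partners, n_patents and res are illustrative names.

```r
## toy technology - patent incidence matrix (illustrative values)
toy <- matrix(
  c(
    1, 1, 0,
    1, 0, 0,
    0, 1, 1
  ),
  ncol = 3, byrow = TRUE
)
rownames(toy) <- c("T1", "T2", "T3")
colnames(toy) <- c("US1", "US2", "US3")

## co-occurrences between classes, ignoring the diagonal
cooc <- tcrossprod(toy)
diag(cooc) <- 0

## number of distinct classes each class has been combined with
n_partners <- rowSums(cooc > 0)

## number of patents in each class
n_patents <- rowSums(toy)

## assumed definition: distinct partners per patent
res <- data.frame(tech = rownames(toy), eor = n_partners / n_patents)
res
```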

Compute the Shannon entropy index from regions - industries matrices

Description

This function computes the Shannon entropy index from (incidence) regions - industries matrices

Usage

entropy(mat)

Arguments

mat

An incidence matrix with regions in rows and industries in columns

Value

A numeric vector representing the Shannon entropy index computed from the regions - industries matrix

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Shannon, C.E., Weaver, W. (1949) The Mathematical Theory of Communication. Univ of Illinois Press.

Frenken, K., Van Oort, F. and Verburg, T. (2007) Related variety, unrelated variety and regional economic growth, Regional studies 41 (5): 685-697.

Examples

## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
entropy(mat)
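
The index can be sketched by hand in base R. This is a minimal sketch that assumes base-2 logarithms and treats empty cells as contributing zero; the helper shannon_sketch is hypothetical, and the package may use a different logarithm base.

```r
## toy region - industry count matrix (illustrative values)
toy <- matrix(
  c(
    1, 1, 1, 1,
    4, 0, 0, 0,
    2, 2, 0, 0
  ),
  ncol = 4, byrow = TRUE
)
rownames(toy) <- c("R1", "R2", "R3")

## hypothetical helper: Shannon entropy (in bits) of one vector of counts
shannon_sketch <- function(x) {
  p <- x / sum(x)   # industry shares
  p <- p[p > 0]     # drop empty cells, since 0 * log(0) is taken as 0
  -sum(p * log2(p))
}

## one value per region: evenly spread regions score highest
ent <- apply(toy, 1, shannon_sketch)
ent
```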

Generate a data frame of entry events from multiple regions - industries matrices (same matrix composition for the different periods)

Description

This function generates a data frame of entry events from multiple regions - industries matrices (different matrix compositions are allowed). In this function, the maximum number of periods is limited to 20.

Usage

entry_list(...)

Arguments

...

Incidence matrices with regions in rows and industries in columns (period ... - optional)

Value

A data frame representing the entry events from multiple regions - industries matrices, with columns "region" (representing the region), "industry" (representing the industry), "entry" (representing the entry event), and "period" (representing the period)

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl
Wolf-Hendrik Uhlbach w.p.uhlbach@students.uu.nl

References

Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250

Boschma, R., Heimeriks, G. and Balland, P.A. (2014) Scientific Knowledge Dynamics and Relatedness in Bio-Tech Cities, Research Policy 43 (1): 107-114

Examples

## generate a first region - industry matrix in which cells represent the presence/absence
## of a RCA (period 1)
set.seed(31)
mat1 <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat1) <- c("I1", "I2", "I3", "I4")

## generate a second region - industry matrix in which cells represent the presence/absence
## of a RCA (period 2)
mat2 <- mat1
mat2[3, 1] <- 1

## run the function
entry_list(mat1, mat2)

## generate a third region - industry matrix in which cells represent the presence/absence
## of a RCA (period 3)
mat3 <- mat2
mat3[5, 2] <- 1

## run the function
entry_list(mat1, mat2, mat3)

## generate a fourth region - industry matrix in which cells represent the presence/absence
## of a RCA (period 4)
mat4 <- mat3
mat4[5, 4] <- 1

## run the function
entry_list(mat1, mat2, mat3, mat4)

Generate a matrix of entry events from two regions - industries matrices (same matrix composition from two different periods)

Description

This function generates a matrix of entry events from two regions - industries matrices (different matrix compositions are allowed)

Usage

entry_mat(mat1, mat2)

Arguments

mat1

An incidence matrix with regions in rows and industries in columns (period 1)

mat2

An incidence matrix with regions in rows and industries in columns (period 2)

Value

A matrix representing the entry events from two regions - industries matrices, with rows representing regions and columns representing industries

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl
Wolf-Hendrik Uhlbach w.p.uhlbach@students.uu.nl

References

Examples

## generate a first region - industry matrix in which cells represent the presence/absence
## of a RCA (period 1)
set.seed(31)
mat1 <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat1) <- c("I1", "I2", "I3", "I4")

## generate a second region - industry matrix in which cells represent the presence/absence
## of a RCA (period 2)
mat2 <- mat1
mat2[3, 1] <- 1


## run the function
entry_mat(mat1, mat2)
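
Conceptually, an entry is a region - industry cell that is absent in period 1 and present in period 2. The base R sketch below illustrates that idea on two matrices with identical composition; p1, p2 and entry are illustrative names, and the function itself may code the remaining cells differently (for instance as missing values rather than 0).

```r
## period 1: presence/absence of an RCA (illustrative values)
p1 <- matrix(
  c(
    1, 0,
    0, 1
  ),
  ncol = 2, byrow = TRUE
)
rownames(p1) <- c("R1", "R2")
colnames(p1) <- c("I1", "I2")

## period 2: region R2 gains industry I1
p2 <- p1
p2["R2", "I1"] <- 1

## 1 where the industry was absent in period 1 and present in period 2
entry <- (p1 == 0 & p2 == 1) * 1
entry
```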

Generate a data frame of exit events from multiple regions - industries matrices (same matrix composition for the different periods)

Description

This function generates a data frame of exit events from multiple regions - industries matrices (different matrix compositions are allowed). In this function, the maximum number of periods is limited to 20.

Usage

exit_list(...)

Arguments

...

Incidence matrices with regions in rows and industries in columns (period ... - optional)

Value

A data frame representing the exit events from multiple regions - industries matrices, with columns "region" (representing the region), "industry" (representing the industry), "exit" (representing the exit event), and "period" (representing the period)

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl
Wolf-Hendrik Uhlbach w.p.uhlbach@students.uu.nl

References

Examples

## generate a first region - industry matrix in which cells represent the presence/absence
## of a RCA (period 1)
set.seed(31)
mat1 <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat1) <- c("I1", "I2", "I3", "I4")

## generate a second region - industry matrix in which cells represent the presence/absence
## of a RCA (period 2)
mat2 <- mat1
mat2[2, 1] <- 0

## run the function
exit_list(mat1, mat2)

## generate a third region - industry matrix in which cells represent the presence/absence
## of a RCA (period 3)
mat3 <- mat2
mat3[5, 1] <- 0

## run the function
exit_list(mat1, mat2, mat3)

## generate a fourth region - industry matrix in which cells represent the presence/absence
## of a RCA (period 4)
mat4 <- mat3
mat4[5, 3] <- 0

## run the function
exit_list(mat1, mat2, mat3, mat4)

Generate a matrix of exit events from two regions - industries matrices (same matrix composition from two different periods)

Description

This function generates a matrix of exit events from two regions - industries matrices (different matrix compositions are allowed)

Usage

exit_mat(mat1, mat2)

Arguments

mat1

An incidence matrix with regions in rows and industries in columns (period 1)

mat2

An incidence matrix with regions in rows and industries in columns (period 2)

Value

A matrix representing the exit events from two regions - industries matrices, with rows representing regions and columns representing industries

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl
Wolf-Hendrik Uhlbach w.p.uhlbach@students.uu.nl

References

Examples

## generate a first region - industry matrix in which cells represent the presence/absence
## of a RCA (period 1)
set.seed(31)
mat1 <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat1) <- c("I1", "I2", "I3", "I4")

## generate a second region - industry matrix in which cells represent the presence/absence
## of a RCA (period 2)
mat2 <- mat1
mat2[2, 1] <- 0


## run the function
exit_mat(mat1, mat2)

Compute the expy index of regions from regions - industries matrices

Description

This function computes the expy index of regions from (incidence) regions - industries matrices, as proposed by Hausmann, Hwang & Rodrik (2007). The index is a measure of the productivity level associated with a region's specialization pattern.

Usage

expy(mat, vec)

Arguments

mat

An incidence matrix with regions in rows and industries in columns

vec

A vector that gives GDP, R&D, education or any other relevant regional attribute that will be used to compute the weighted average for each industry

Value

A numeric vector representing the expy index of regions computed from the regions - industries matrix

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Balassa, B. (1965) Trade Liberalization and Revealed Comparative Advantage, The Manchester School 33: 99-123

Hausmann, R., Hwang, J. & Rodrik, D. (2007) What you export matters, Journal of economic growth 12: 1-25.

Examples

## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## a vector of GDP of regions
vec <- c(5, 10, 15, 25, 50)
## run the function
expy(mat, vec)
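
The two-step logic of Hausmann, Hwang and Rodrik (2007) can be sketched in base R: first a productivity level per industry (a weighted average of the regional attribute), then a share-weighted average of these levels per region. This is a sketch under that reading of the reference; toy, gdp, prody and expy_sketch are illustrative names and the package's exact weighting may differ.

```r
## toy region - industry count matrix and regional attribute (illustrative)
toy <- matrix(
  c(
    1, 1,
    0, 2
  ),
  ncol = 2, byrow = TRUE
)
rownames(toy) <- c("R1", "R2")
colnames(toy) <- c("I1", "I2")
gdp <- c(10, 20)

## share of each industry in each region's portfolio (rows sum to 1)
shares <- toy / rowSums(toy)

## weights: regional shares normalised so that each industry column sums to 1
weights <- sweep(shares, 2, colSums(shares), "/")

## productivity level of each industry: weighted average of regional GDP
prody <- colSums(weights * gdp)

## expy: share-weighted average of industry productivity levels, per region
expy_sketch <- as.vector(shares %*% prody)
names(expy_sketch) <- rownames(toy)
expy_sketch
```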

Create regular data frames from regions - industries matrices

Description

This function creates regular data frames with three columns (regions, industries, count) from (incidence) matrices (wide to long format) using the reshape2 package

Usage

get_list(mat)

Arguments

mat

An incidence matrix with regions in rows and industries in columns (or the other way around)

Value

A data frame with three columns: "Region" (representing the region), "Industry" (representing the industry), and "Count" (representing the count of occurrences)

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

Examples

## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
get_list(mat)

Create regions - industries matrices from regular data frames

Description

This function creates regions - industries (incidence) matrices from regular data frames (long to wide format) using the reshape2 package or the Matrix package

Usage

get_matrix(my_data, sparse = FALSE)

Arguments

my_data

A data frame with three columns (regions, industries, count)

sparse

Logical; shall the returned output be a sparse matrix? Defaults to FALSE, but can be set to TRUE if the dataset is very large

Value

A regions - industries matrix in either dense or sparse format, depending on the value of the "sparse" parameter

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

Examples

## generate a region - industry data frame
set.seed(31)
region <- c("R1", "R1", "R1", "R1", "R2", "R2", "R3", "R4", "R5", "R5")
industry <- c("I1", "I2", "I3", "I4", "I1", "I2", "I1", "I1", "I3", "I3")
my_data <- data.frame(region, industry)
my_data$count <- 1

## run the function
get_matrix(my_data)
get_matrix(my_data, sparse = TRUE)
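
For readers who want to see the reshaping itself, the long to wide conversion (and back) can be approximated with base R only. This is an illustrative sketch using xtabs() and as.data.frame(); it is not how the package implements get_matrix() or get_list(), and the object names are hypothetical.

```r
## toy long-format data: one row per region - industry occurrence
long_in <- data.frame(
  region = c("R1", "R1", "R2", "R3", "R3"),
  industry = c("I1", "I2", "I1", "I2", "I2"),
  count = 1
)

## long to wide: sum the counts for each region - industry pair
tab <- xtabs(count ~ region + industry, data = long_in)
wide <- matrix(tab, nrow = nrow(tab), dimnames = dimnames(tab))
wide

## wide to long: one row per cell of the matrix
long_out <- as.data.frame(as.table(wide), responseName = "count")
long_out
```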

Compute the Gini coefficient

Description

This function computes the Gini coefficient. The Gini index measures spatial inequality. It ranges from 0 (perfect income equality) to 1 (perfect income inequality) and is derived from the Lorenz curve. The Gini coefficient is defined as a ratio of two surfaces derived from the Lorenz curve. The numerator is given by the area between the Lorenz curve of the distribution and the uniform distribution line (45 degrees line). The denominator is the area under the uniform distribution line (the lower triangle). This index gives an indication of the unequal distribution of an industry across n regions. Maximum inequality in the sample occurs when n-1 regions have a score of zero and one region has a positive score. The maximum value of the Gini coefficient is (n-1)/n and approaches 1 (theoretical maximum limit) as the number of observations (regions) increases.

Usage

gini(mat)

Arguments

mat

A region-industry count matrix

Value

The Gini coefficient or a data frame with the Gini coefficient for each industry (if the input is a matrix with multiple columns)

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Gini, C. (1921) Measurement of Inequality of Incomes, The Economic Journal 31: 124-126

Examples

## generate vectors of industrial count
ind <- c(0, 10, 10, 30, 50)

## run the function
gini(ind)

## generate a region - industry matrix
mat <- matrix(
  c(
    0, 1, 0, 0,
    0, 1, 0, 0,
    0, 1, 0, 0,
    0, 1, 0, 1,
    0, 1, 1, 1
  ),
  ncol = 4, byrow = TRUE
)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
gini(mat)

## run the function by aggregating all industries
gini(rowSums(mat))

## run the function for industry #1 only (perfect equality)
gini(mat[, 1])

## run the function for industry #2 only (perfect equality)
gini(mat[, 2])

## run the function for industry #3 only (perfect inequality: max gini = (5-1)/5)
gini(mat[, 3])

## run the function for industry #4 only (top 40% produces 100% of the output)
gini(mat[, 4])
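
The bound (n-1)/n mentioned in the description can be checked with a hand-written version of the coefficient. The sketch below uses a standard closed-form expression on sorted values; gini_sketch is a hypothetical helper and is only assumed to agree with the package's gini() on the sample maximum stated above.

```r
## hypothetical helper: Gini coefficient of a vector of counts
gini_sketch <- function(x) {
  x <- sort(x)        # ascending order
  n <- length(x)
  i <- seq_len(n)
  sum((2 * i - n - 1) * x) / (n * sum(x))
}

## one region holds everything: the sample maximum (5 - 1) / 5
gini_sketch(c(0, 0, 0, 0, 1))

## identical counts in every region: perfect equality
gini_sketch(c(5, 5, 5, 5))

## an intermediate case
gini_sketch(c(0, 10, 10, 30, 50))
```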

Generate a matrix of industrial growth by industries from two regions - industries matrices (same matrix composition from two different periods)

Description

This function generates a matrix of industrial growth by industries from two regions - industries matrices (same matrix composition from two different periods)

Usage

growth_ind(mat1, mat2)

Arguments

mat1

An incidence matrix with regions in rows and industries in columns (period 1)

mat2

An incidence matrix with regions in rows and industries in columns (period 2)

Value

A matrix of industrial growth by industries

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Examples

## generate a first region - industry matrix with full count (period 1)
set.seed(31)
mat1 <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat1) <- c("I1", "I2", "I3", "I4")

## generate a second region - industry matrix with full count (period 2)
mat2 <- mat1
mat2[3, 1] <- 8


## run the function
growth_ind(mat1, mat2)

Generate a data frame of industrial growth in regions from multiple regions - industries matrices (same matrix composition for the different periods)

Description

This function generates a data frame of industrial growth in regions from multiple regions - industries matrices (same matrix composition for the different periods). In this function, the maximum number of periods is limited to 20.

Usage

growth_list(...)

Arguments

...

Incidence matrices with regions in rows and industries in columns (period ... - optional)

Value

A data frame of industrial growth in regions

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Examples

## generate a first region - industry matrix with full count (period 1)
set.seed(31)
mat1 <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat1) <- c("I1", "I2", "I3", "I4")

## generate a second region - industry matrix with full count (period 2)
mat2 <- mat1
mat2[3, 1] <- 8

## run the function
growth_list(mat1, mat2)

## generate a third region - industry matrix with full count (period 3)
mat3 <- mat2
mat3[5, 2] <- 1

## run the function
growth_list(mat1, mat2, mat3)

## generate a fourth region - industry matrix with full count (period 4)
mat4 <- mat3
mat4[5, 4] <- 1

## run the function
growth_list(mat1, mat2, mat3, mat4)

Generate a data frame of industrial growth in regions from multiple regions - industries matrices (same matrix composition for the different periods)

Description

Usage

growth_list_ind(...)

Arguments

...

Incidence matrices with regions in rows and industries in columns (period ... - optional)

Value

A data frame of industrial growth in regions

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Examples

## generate a first region - industry matrix with full count (period 1)
set.seed(31)
mat1 <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat1) <- c("I1", "I2", "I3", "I4")

## generate a second region - industry matrix with full count (period 2)
mat2 <- mat1
mat2[3, 1] <- 8

## run the function
growth_list_ind(mat1, mat2)

## generate a third region - industry matrix with full count (period 3)
mat3 <- mat2
mat3[5, 2] <- 1

## run the function
growth_list_ind(mat1, mat2, mat3)

## generate a fourth region - industry matrix with full count (period 4)
mat4 <- mat3
mat4[5, 4] <- 1

## run the function
growth_list_ind(mat1, mat2, mat3, mat4)

Generate a data frame of region growth from multiple regions - industries matrices (same matrix composition for the different periods)

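Description

This function generates a data frame of region growth from multiple regions - industries matrices (same matrix composition for the different periods)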

Usage

growth_list_reg(...)

Arguments

...

Incidence matrices with regions in rows and industries in columns (period ... - optional)

Value

A data frame of region growth from multiple regions - industries matrices

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Examples

## generate a first region - industry matrix with full count (period 1)
set.seed(31)
mat1 <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat1) <- c("I1", "I2", "I3", "I4")

## generate a second region - industry matrix with full count (period 2)
mat2 <- mat1
mat2[3, 1] <- 8

## run the function
growth_list_reg(mat1, mat2)

## generate a third region - industry matrix with full count (period 3)
mat3 <- mat2
mat3[5, 2] <- 1

## run the function
growth_list_reg(mat1, mat2, mat3)

## generate a fourth region - industry matrix with full count (period 4)
mat4 <- mat3
mat4[5, 4] <- 1

## run the function
growth_list_reg(mat1, mat2, mat3, mat4)

Generate a matrix of industrial growth in regions from two regions - industries matrices (same matrix composition from two different periods)

Description

This function generates a matrix of industrial growth in regions from two regions - industries matrices (same matrix composition from two different periods)

Usage

growth_mat(mat1, mat2)

Arguments

mat1

An incidence matrix with regions in rows and industries in columns (period 1)

mat2

An incidence matrix with regions in rows and industries in columns (period 2)

Value

A matrix of industrial growth in regions from two regions - industries matrices

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Examples

## generate a first region - industry matrix with full count (period 1)
set.seed(31)
mat1 <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat1) <- c("I1", "I2", "I3", "I4")

## generate a second region - industry matrix with full count (period 2)
mat2 <- mat1
mat2[3, 1] <- 8


## run the function
growth_mat(mat1, mat2)

Generate a matrix of industrial growth by regions from two regions - industries matrices (same matrix composition from two different periods)

Description

This function generates a matrix of industrial growth by regions from two regions - industries matrices (same matrix composition from two different periods)

Usage

growth_reg(mat1, mat2)

Arguments

mat1

An incidence matrix with regions in rows and industries in columns (period 1)

mat2

An incidence matrix with regions in rows and industries in columns (period 2)

Value

A vector of industrial growth by regions from two regions - industries matrices

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Examples

## generate a first region - industry matrix with full count (period 1)
set.seed(31)
mat1 <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat1) <- c("I1", "I2", "I3", "I4")

## generate a second region - industry matrix with full count (period 2)
mat2 <- mat1
mat2[3, 1] <- 8


## run the function
growth_reg(mat1, mat2)

Compute the Hachman index from regions - industries matrices

Description

This function computes the Hachman index from regions - industries matrices. The Hachman index indicates how closely the industrial distribution of a region resembles the one of a more global economy (nation, world). The index varies between 0 (extreme dissimilarity between the region and the more global economy) and 1 (extreme similarity between the region and the more global economy)

Usage

hachman(mat)

Arguments

mat

An incidence matrix with regions in rows and industries in columns

Value

A vector of Hachman index values indicating the similarity between the industrial distribution of a region and a more global economy

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

Examples

## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
hachman(mat)
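
The index can be sketched in base R using a commonly cited definition: the reciprocal of the sum, over industries, of the location quotient multiplied by the regional share. The sketch assumes the reference economy is the aggregate of all regions in the matrix; toy and h are illustrative names and the package's implementation may differ.

```r
## toy region - industry count matrix (illustrative values)
toy <- matrix(
  c(
    2, 0,
    0, 2,
    1, 1
  ),
  ncol = 2, byrow = TRUE
)
rownames(toy) <- c("R1", "R2", "R3")
colnames(toy) <- c("I1", "I2")

## industry shares in each region and in the reference economy
share_region <- toy / rowSums(toy)
share_ref <- colSums(toy) / sum(toy)

## location quotients relative to the reference economy
lq <- sweep(share_region, 2, share_ref, "/")

## assumed definition: 1 / sum(LQ * regional share)
h <- 1 / rowSums(lq * share_region)
h
```

In this toy case R3 mirrors the reference economy exactly and scores 1, while the fully specialized regions R1 and R2 score lower.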

Compute the Herfindahl index from regions - industries matrices

Description

This function computes the Herfindahl index from (incidence) regions - industries matrices. This index is also known as the Herfindahl-Hirschman index.

Usage

herfindahl(mat)

Arguments

mat

An incidence matrix with regions in rows and industries in columns

Value

A vector of Herfindahl index values indicating the concentration of industries within regions

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Herfindahl, O.C. (1959) Copper Costs and Prices: 1870-1957. Baltimore: The Johns Hopkins Press.

Hirschman, A.O. (1945) National Power and the Structure of Foreign Trade, Berkeley and Los Angeles: University of California Press.

Examples

## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
herfindahl(mat)
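
The index is the sum of squared industry shares within each region. A minimal base R sketch on toy data (the object names are illustrative, and any rescaling applied by the package is not reproduced here):

```r
## toy region - industry count matrix (illustrative values)
toy <- matrix(
  c(
    1, 1, 1, 1,
    4, 0, 0, 0,
    2, 2, 0, 0
  ),
  ncol = 4, byrow = TRUE
)
rownames(toy) <- c("R1", "R2", "R3")

## industry shares within each region (rows sum to 1)
shares <- toy / rowSums(toy)

## sum of squared shares: 1 / (number of industries) for an even spread,
## 1 for a region with a single industry
hhi <- rowSums(shares^2)
hhi
```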

Plot a Hoover curve from regions - industries matrices

Description

This function plots a Hoover curve from regions - industries matrices.

Usage

hoover_curve(mat, pop, plot = TRUE, pdf = FALSE, pdf_location = NULL)

Arguments

mat

An incidence matrix with regions in rows and industries in columns. The input can also be a vector of industrial regional count (a matrix with n regions in rows and a single column).

pop

A vector of population regional count

plot

Logical; shall the curve be automatically plotted? Defaults to TRUE. If set to FALSE, the function returns the x and y coordinates, which you can later use to plot and customize the curve.

pdf

Logical; shall a pdf be saved? Defaults to FALSE. If set to TRUE, a pdf is compiled and saved to R's temporary directory if no 'pdf_location' is specified.

pdf_location

Output location of pdf file

Value

If 'plot = FALSE', a list containing the cumulative distribution of population shares ('cum.reg') and industry shares ('cum.out') is returned. If 'plot = TRUE', no return value is specified.

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Hoover, E.M. (1936) The Measurement of Industrial Localization, The Review of Economics and Statistics 18 (1): 162-171

Examples

## generate vectors of industrial and population count
ind <- c(0, 10, 10, 30, 50)
pop <- c(10, 15, 20, 25, 30)

## run the function (30% of the population produces 50% of the industrial output)
hoover_curve(ind, pop)
hoover_curve(ind, pop, pdf = FALSE)
hoover_curve(ind, pop, plot = FALSE)

## generate a region - industry matrix
mat <- matrix(
  c(
    0, 10, 0, 0,
    0, 15, 0, 0,
    0, 20, 0, 0,
    0, 25, 0, 1,
    0, 30, 1, 1
  ),
  ncol = 4, byrow = TRUE
)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
hoover_curve(mat, pop)
hoover_curve(mat, pop, plot = FALSE)

## run the function by aggregating all industries
hoover_curve(rowSums(mat), pop)
hoover_curve(rowSums(mat), pop, plot = FALSE)

## run the function for industry #1 only
hoover_curve(mat[, 1], pop)
hoover_curve(mat[, 1], pop, plot = FALSE)

## run the function for industry #2 only (perfectly proportional to population)
hoover_curve(mat[, 2], pop)
hoover_curve(mat[, 2], pop, plot = FALSE)

## run the function for industry #3 only (30% of the pop. produces 100% of the output)
hoover_curve(mat[, 3], pop)
hoover_curve(mat[, 3], pop, plot = FALSE)

## run the function for industry #4 only (55% of the pop. produces 100% of the output)
hoover_curve(mat[, 4], pop)
hoover_curve(mat[, 4], pop, plot = FALSE)

## compare the distribution of the four industries
oldpar <- par(mfrow = c(2, 2))  # Save the current graphical parameter settings
hoover_curve(mat[, 1], pop)
hoover_curve(mat[, 2], pop)
hoover_curve(mat[, 3], pop)
hoover_curve(mat[, 4], pop)
par(oldpar)  # Reset the graphical parameters to their original values

## Save output as pdf
hoover_curve(mat, pop, pdf = TRUE)

## To specify an output directory for the pdf,
## specify 'pdf_location', for instance as '/Users/jones/hoover_curve.pdf'
## hoover_curve(mat, pop, pdf = TRUE, pdf_location = '/Users/jones/hoover_curve.pdf')
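
To make the returned coordinates easier to interpret, the sketch below builds them by hand in base R. It assumes the usual construction of a Hoover curve: regions are sorted by the ratio of their industry share to their population share, and both shares are then cumulated. The names ord, cum_pop and cum_ind are illustrative; the function's own output is the list described under Value.

```r
## industrial and population counts per region (same values as above)
ind <- c(0, 10, 10, 30, 50)
pop <- c(10, 15, 20, 25, 30)

## sort regions from the least to the most over-represented
ord <- order((ind / sum(ind)) / (pop / sum(pop)))

## cumulative population shares (x axis) and industry shares (y axis)
cum_pop <- cumsum(pop[ord]) / sum(pop)
cum_ind <- cumsum(ind[ord]) / sum(ind)

## the first 70% of the population account for 50% of the output,
## i.e. the remaining 30% of the population produce the other 50%
cbind(cum_pop, cum_ind)
```

These vectors can be passed to plot() with type = "l" to draw and customize the curve manually.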

Compute the Hoover Gini

Description

This function computes the Hoover Gini, named after Edgar Hoover. The Hoover Gini is a measure of spatial inequality. It ranges from 0 (perfect equality) to 1 (perfect inequality) and is calculated from the Hoover curve associated with a given distribution of population, industries or technologies and a reference category. In this sense, it is closely related to the Gini coefficient and the Hoover index. It is defined as a ratio of two areas: the numerator is given by the area between the Hoover curve of the distribution and the uniform distribution line (45 degrees line), and the denominator is the area under the uniform distribution line (the lower triangle).

Usage

hoover_gini(mat, pop)

Arguments

mat

An incidence matrix with regions in rows and industries in columns. The input can also be a vector of industrial regional count (a matrix with n regions in rows and a single column).

pop

A vector of population regional count

Value

The Hoover Gini value(s). If the input matrix has a single column, the function returns a numeric value representing the Hoover Gini index. If the input matrix has multiple columns, the function returns a data frame with two columns: "Industry" (names of the industries) and "hoover_gini" (corresponding Hoover Gini values).

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Hoover, E.M. (1936) The Measurement of Industrial Localization, The Review of Economics and Statistics 18 (1): 162-171

Examples

## generate vectors of industrial and population count
ind <- c(0, 10, 10, 30, 50)
pop <- c(10, 15, 20, 25, 30)

## run the function (30% of the population produces 50% of the industrial output)
hoover_gini(ind, pop)

## generate a region - industry matrix
mat <- matrix(
  c(
    0, 10, 0, 0,
    0, 15, 0, 0,
    0, 20, 0, 0,
    0, 25, 0, 1,
    0, 30, 1, 1
  ),
  ncol = 4, byrow = TRUE
)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
hoover_gini(mat, pop)

## run the function by aggregating all industries
hoover_gini(rowSums(mat), pop)

## run the function for industry #1 only
hoover_gini(mat[, 1], pop)

## run the function for industry #2 only (perfectly proportional to population)
hoover_gini(mat[, 2], pop)

## run the function for industry #3 only (30% of the pop. produces 100% of the output)
hoover_gini(mat[, 3], pop)

## run the function for industry #4 only (55% of the pop. produces 100% of the output)
hoover_gini(mat[, 4], pop)
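
The area ratio given in the Description can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name hoover_gini_sketch, the ordering of regions by their industry-to-population ratio, and the trapezoid rule used for the area are all assumptions, so its output is not guaranteed to match hoover_gini().

```r
## Hypothetical helper, not part of EconGeo: Gini-type ratio from a Hoover curve.
## Assumes regions are ordered by ind / pop and areas are taken by trapezoids.
hoover_gini_sketch <- function(ind, pop) {
  ord <- order(ind / pop)                   # least to most over-represented region
  x <- c(0, cumsum(pop[ord]) / sum(pop))    # cumulative population share
  y <- c(0, cumsum(ind[ord]) / sum(ind))    # cumulative industry share
  under_curve <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
  (0.5 - under_curve) / 0.5                 # area between curve and 45 degree line,
                                            # over the area of the lower triangle
}

ind <- c(0, 10, 10, 30, 50)
pop <- c(10, 15, 20, 25, 30)
gini_unequal <- hoover_gini_sketch(ind, pop)   # 0.31 under these assumptions
gini_equal <- hoover_gini_sketch(pop, pop)     # 0: industry proportional to population
```

The second call shows the lower bound: when industry counts are proportional to population, the curve coincides with the 45 degree line and the ratio is 0.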

Compute the Hoover index

Description

This function computes the Hoover index, named after Edgar M. Hoover. The Hoover index is a measure of spatial inequality. It ranges from 0 (perfect equality) to 100 (perfect inequality) and is calculated from the Lorenz curve associated with a given distribution of population, industries or technologies. In this sense, it is closely related to the Gini coefficient. The Hoover index represents the maximum vertical distance between the Lorenz curve and the 45 degree line of perfect spatial equality. It indicates the proportion of industries, jobs, or population needed to be transferred from the top to the bottom of the distribution to achieve perfect spatial equality. The Hoover index is also known as the Robin Hood index in studies of income inequality.

Computation of the Hoover index: H = \frac{1}{2} \sum_{i=1}^{N} \left| \frac{E_i}{E_{total}} - \frac{A_i}{A_{total}} \right|

Usage

hoover_index(mat, pop)

Arguments

mat

An incidence matrix with regions in rows and industries in columns. The input can also be a vector of industrial regional count (a matrix with n regions in rows and a single column).

pop

A vector of population regional count; if this argument is missing an equal distribution of the reference group will be assumed.

Value

The Hoover index value(s) as either a numeric value or a data frame with two columns: "Industry" (names of the industries) and "hoover_index" (corresponding Hoover index values).

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Hoover, E.M. (1936) The Measurement of Industrial Localization, The Review of Economics and Statistics 18 (1): 162-171

Examples

## generate vectors of industrial and population count
ind <- c(0, 10, 10, 30, 50)
pop <- c(10, 15, 20, 25, 30)

## run the function (30% of the population produces 50% of the industrial output)
hoover_index(ind, pop)

## generate a region - industry matrix
mat <- matrix(
  c(
    0, 10, 0, 0,
    0, 15, 0, 0,
    0, 20, 0, 0,
    0, 25, 0, 1,
    0, 30, 1, 1
  ),
  ncol = 4, byrow = TRUE
)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
hoover_index(mat, pop)

## run the function by aggregating all industries
hoover_index(rowSums(mat), pop)

## run the function for industry #1 only
hoover_index(mat[, 1], pop)

## run the function for industry #2 only (perfectly proportional to population)
hoover_index(mat[, 2], pop)

## run the function for industry #3 only (30% of the pop. produces 100% of the output)
hoover_index(mat[, 3], pop)

## run the function for industry #4 only (55% of the pop. produces 100% of the output)
hoover_index(mat[, 4], pop)
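
The formula in the Description can be evaluated directly in base R. The helper below is a hypothetical sketch, not the package's code: it assumes E is the industrial count and A the reference (population) count, and it returns the value on a 0 to 1 scale, so it would need to be multiplied by 100 to be read on the 0 to 100 scale mentioned above.

```r
## Hypothetical helper, not part of EconGeo: H = 1/2 * sum(|E_i/E_total - A_i/A_total|).
## When pop is omitted, an equal reference distribution is assumed.
hoover_index_sketch <- function(ind, pop = rep(1, length(ind))) {
  0.5 * sum(abs(ind / sum(ind) - pop / sum(pop)))
}

ind <- c(0, 10, 10, 30, 50)
pop <- c(10, 15, 20, 25, 30)
h_pop <- hoover_index_sketch(ind, pop)     # 0.25
h_equal <- hoover_index_sketch(ind)        # 0.4, equal reference distribution
h_zero <- hoover_index_sketch(pop, pop)    # 0, identical distributions
```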

Compute a measure of complexity from the inverse of the normalized ubiquity of industries

Description

This function computes a measure of complexity from the inverse of the normalized ubiquity of industries. We divide the logarithm of the total count (employment, number of firms, number of patents, ...) in an industry by its ubiquity. Ubiquity is given by the number of regions in which an industry can be found (location quotient > 1) from regions - industries (incidence) matrices

Usage

inv_norm_ubiquity(mat)

Arguments

mat

An incidence matrix with regions in rows and industries in columns

Value

A vector of complexity values computed from the inverse of the normalized ubiquity of industries.

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Balland, P.A. and Rigby, D. (2017) The Geography of Complex Knowledge, Economic Geography 93 (1): 1-23.

Examples

## generate a region - industry matrix with full count
set.seed(31)
mat <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
inv_norm_ubiquity(mat)
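
The two ingredients named in the Description (ubiquity and the logarithm of the total count) can be sketched in base R. This is an illustration under assumptions, not the package's implementation: the natural logarithm, the strict "> 1" threshold, and the variable names are choices made here, so the result may differ from inv_norm_ubiquity().

```r
## Hypothetical sketch, not part of EconGeo.
mat <- matrix(
  c(
    100, 0, 0, 0, 0,
    0, 15, 5, 70, 10,
    0, 20, 10, 20, 50,
    0, 25, 30, 5, 40,
    0, 40, 55, 5, 0
  ),
  ncol = 5, byrow = TRUE,
  dimnames = list(paste0("R", 1:5), paste0("I", 1:5))
)

## location quotient: regional share of an industry over its economy-wide share
lq <- sweep(mat / rowSums(mat), 2, colSums(mat) / sum(mat), "/")

## ubiquity: number of regions with a location quotient above 1
ubiquity <- colSums(lq > 1)

## logarithm of the total count of each industry divided by its ubiquity
complexity <- log(colSums(mat)) / ubiquity
complexity
```

An industry found in no region at all would have a ubiquity of 0 and produce a division by zero, a case this sketch does not handle.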

Compute an index of knowledge complexity of regions using the eigenvector method

Description

This function computes an index of knowledge complexity of regions using the eigenvector method from regions - industries (incidence) matrices. Technically, the function returns the eigenvector associated with the second largest eigenvalue of the projected region - region matrix.

Usage

kci(mat, rca = FALSE)

Arguments

mat

An incidence matrix with regions in rows and industries in columns

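rca

Logical; should the index of revealed comparative advantage (RCA, also referred to as the location quotient) first be computed? Defaults to FALSE (a binary 0/1 matrix is expected as input), but can be set to TRUE if the RCA first needs to be computed from a matrix of full counts.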

Value

A vector representing the index of knowledge complexity of regions computed using the eigenvector method.

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Hidalgo, C. and Hausmann, R. (2009) The building blocks of economic complexity, Proceedings of the National Academy of Sciences 106: 10570 - 10575.

Balland, P.A. and Rigby, D. (2017) The Geography of Complex Knowledge, Economic Geography 93 (1): 1-23.

Examples

## generate a region - industry matrix with full count
set.seed(31)
mat <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
kci(mat, rca = TRUE)

## generate a region - industry matrix in which cells represent the presence/absence of a RCA
set.seed(31)
mat <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
kci(mat)

## generate the simple network of Hidalgo and Hausmann (2009) presented p.11 (Fig. S4)
countries <- c("C1", "C1", "C1", "C1", "C2", "C3", "C3", "C4")
products <- c("P1", "P2", "P3", "P4", "P2", "P3", "P4", "P4")
my_data <- data.frame(countries, products)
my_data$freq <- 1
mat <- get_matrix(my_data)

## run the function
kci(mat)
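
The eigenvector step mentioned in the Description can be illustrated in base R on the same small network. This is a sketch under assumptions, not the package's code: the projected region - region matrix is built here with the normalization by diversity and ubiquity of Hidalgo and Hausmann (2009), and the names proj and kci_sketch are hypothetical. The sign and scale of an eigenvector are arbitrary, so any rescaling applied by kci() is not reproduced.

```r
## Hypothetical sketch, not part of EconGeo.
## Binary network of the last example, written directly as a matrix.
m <- matrix(
  c(
    1, 1, 1, 1,
    0, 1, 0, 0,
    0, 0, 1, 1,
    0, 0, 0, 1
  ),
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("C", 1:4), paste0("P", 1:4))
)
diversity <- rowSums(m)   # number of products per country
ubiquity <- colSums(m)    # number of countries per product

## projected region - region matrix: shared products weighted by 1 / ubiquity
## and divided by diversity, so that every row sums to 1
proj <- (m / diversity) %*% t(sweep(m, 2, ubiquity, "/"))

e <- eigen(proj)
leading <- Re(e$values[1])          # leading eigenvalue of a row-stochastic matrix
kci_sketch <- Re(e$vectors[, 2])    # eigenvector of the second largest eigenvalue
names(kci_sketch) <- rownames(m)
kci_sketch
```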

Compute the Krugman index from regions - industries matrices

Description

This function computes the Krugman index from regions - industries matrices. The higher the coefficient, the greater the regional specialization. This index is often referred to as the Krugman specialisation index and measures the distance between the distributions of industry shares in a region and at a more aggregated level (country for instance).

Usage

krugman_index(mat)

Arguments

mat

An incidence matrix with regions in rows and industries in columns

Value

A vector representing the Krugman index of regional specialization computed from the regions - industries matrix.

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Krugman P. (1991) Geography and Trade, MIT Press, Cambridge

Examples

## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
krugman_index(mat)
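
The distance between a region's industry shares and those of the aggregate economy can be sketched in base R. This is a minimal illustration with a hypothetical helper name; it assumes the reference distribution is the economy-wide share including the region itself and that the distance is a sum of absolute differences. Some formulations compare each region with all other regions instead, so the output may differ from krugman_index().

```r
## Hypothetical helper, not part of EconGeo.
krugman_sketch <- function(mat) {
  region_share <- mat / rowSums(mat)          # industry shares within each region
  overall_share <- colSums(mat) / sum(mat)    # industry shares in the whole economy
  rowSums(abs(sweep(region_share, 2, overall_share, "-")))
}

mat <- matrix(
  c(
    100, 0, 0, 0, 0,
    0, 15, 5, 70, 10,
    0, 20, 10, 20, 50,
    0, 25, 30, 5, 40,
    0, 40, 55, 5, 0
  ),
  ncol = 5, byrow = TRUE,
  dimnames = list(paste0("R", 1:5), paste0("I", 1:5))
)
k <- krugman_sketch(mat)
k   # R1 = 1.6, R2 = 1.0, R3 = 0.6, R4 = 0.7, R5 = 1.1
```

R1, which hosts a single industry, is the most specialized region under this definition, and R3, whose shares are closest to the overall distribution, is the least.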

Compute location quotients from regions - industries matrices

Description

This function computes location quotients from (incidence) regions - industries matrices. The numerator is the share of a given industry in a given region. The denominator is the share of this industry in a larger economy (overall country for instance). This index is also referred to as the index of Revealed Comparative Advantage (RCA) following Balassa (1965), or the Hoover-Balassa index.

Usage

location_quotient(mat, binary = FALSE)

Arguments

mat

An incidence matrix with regions in rows and industries in columns

binary

Logical; shall the returned output be a dichotomized version (0/1) of the location quotient? Defaults to FALSE (the full values of the location quotient will be returned), but can be set to TRUE (location quotient values above 1 will be set to 1 & location quotient values below 1 will be set to 0)

Value

A matrix of location quotients computed from the regions - industries matrix. If the 'binary' parameter is set to TRUE, the returned matrix will contain binary values (0/1) representing the location quotient. If 'binary' is set to FALSE, the full values of the location quotient will be returned.

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Balassa, B. (1965) Trade Liberalization and Revealed Comparative Advantage, The Manchester School 33: 99-123.

Examples

## generate a region - industry matrix
mat <- matrix(
  c(
    100, 0, 0, 0, 0,
    0, 15, 5, 70, 10,
    0, 20, 10, 20, 50,
    0, 25, 30, 5, 40,
    0, 40, 55, 5, 0
  ),
  ncol = 5, byrow = TRUE
)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4", "I5")

## run the function
location_quotient(mat)
location_quotient(mat, binary = TRUE)
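
The ratio of shares described above can be written in a few lines of base R. The helper below is a hypothetical sketch rather than the package's implementation; in particular, mapping a location quotient of exactly 1 to 0 in the binary case is an assumption.

```r
## Hypothetical helper, not part of EconGeo.
lq_sketch <- function(mat, binary = FALSE) {
  region_share <- mat / rowSums(mat)          # share of each industry in each region
  overall_share <- colSums(mat) / sum(mat)    # share of each industry in the economy
  lq <- sweep(region_share, 2, overall_share, "/")
  if (binary) lq <- (lq > 1) * 1              # assumption: exactly 1 is mapped to 0
  lq
}

mat <- matrix(
  c(
    100, 0, 0, 0, 0,
    0, 15, 5, 70, 10,
    0, 20, 10, 20, 50,
    0, 25, 30, 5, 40,
    0, 40, 55, 5, 0
  ),
  ncol = 5, byrow = TRUE,
  dimnames = list(paste0("R", 1:5), paste0("I", 1:5))
)
lq <- lq_sketch(mat)
lq["R2", "I4"]   # 3.5: I4 is 70% of R2 but 20% of the whole economy
lq_bin <- lq_sketch(mat, binary = TRUE)
```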

Compute average location quotients of regions from regions - industries matrices

Description

This function computes the average location quotients of regions from (incidence) regions - industries matrices. This index is also referred to as the coefficient of specialization (Hoover and Giarratani, 1985).

Usage

location_quotient_avg(mat)

Arguments

mat

An incidence matrix with regions in rows and industries in columns

Value

A vector of average location quotients computed for each region from the regions - industries matrix. The average location quotient represents the degree of specialization of each region in different industries.

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Hoover, E.M. and Giarratani, F. (1985) An Introduction to Regional Economics. 3rd edition. New York: Alfred A. Knopf

Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250

Examples

## generate a region - industry matrix
mat <- matrix(
  c(
    100, 0, 0, 0, 0,
    0, 15, 5, 70, 10,
    0, 20, 10, 20, 50,
    0, 25, 30, 5, 40,
    0, 40, 55, 5, 0
  ),
  ncol = 5, byrow = TRUE
)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4", "I5")

## run the function
location_quotient_avg(mat)

Compute the locational Gini coefficient from regions - industries matrices

Description

This function computes the locational Gini coefficient as proposed by Krugman from regions - industries matrices. The higher the coefficient (theoretical limit = 0.5), the greater the industrial concentration. The locational Gini of an industry that is not localized at all (perfectly spread out) in proportion to overall employment would be 0.

Usage

locational_gini(mat)

Arguments

mat

An incidence matrix with regions in rows and industries in columns

Value

A data frame with two columns: "Industry" and "Loc_gini". The "Industry" column contains the names of the industries, and the "Loc_gini" column contains the locational Gini coefficient computed for each industry from the regions - industries matrix.

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Krugman P. (1991) Geography and Trade, MIT Press, Cambridge (chapter 2 - p.56)

Examples

## generate a region - industry matrix
mat <- matrix(
  c(
    100, 0, 0, 0, 0,
    0, 15, 5, 70, 10,
    0, 20, 10, 20, 50,
    0, 25, 30, 5, 40,
    0, 40, 55, 5, 0
  ),
  ncol = 5, byrow = TRUE
)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4", "I5")

## run the function
locational_gini(mat)

Plot a locational Gini curve from regions - industries matrices

Description

This function plots a locational Gini curve following Krugman from regions - industries matrices.

Usage

locational_gini_curve(mat, pdf = FALSE, pdf_location = NULL)

Arguments

mat

An incidence matrix with regions in rows and industries in columns. The input can also be a vector of industrial regional count (a matrix with n regions in rows and a single column).

pdf

Logical; shall a pdf be saved? Defaults to FALSE. If set to TRUE, a pdf will be compiled and saved to R's temp dir if no 'pdf_location' is specified.

pdf_location

Output location of pdf file

Value

No return value, produces a plot or pdf.

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Krugman P. (1991) Geography and Trade, MIT Press, Cambridge (chapter 2 - p.56)

Examples


## generate a region - industry matrix
mat <- matrix(
  c(
    100, 0, 0, 0, 0,
    0, 15, 5, 70, 10,
    0, 20, 10, 20, 50,
    0, 25, 30, 5, 40,
    0, 40, 55, 5, 0
  ),
  ncol = 5, byrow = TRUE
)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4", "I5")

## run the function (shows industry #5)
locational_gini_curve(mat, pdf = FALSE)

## Save output as pdf
locational_gini_curve(mat, pdf = TRUE)

## To specify an output directory for the pdf,
## specify 'pdf_location', for instance as '/Users/jones/locational_gini_curve.pdf'
## locational_gini_curve(mat, pdf = TRUE, pdf_location = '/Users/jones/locational_gini_curve.pdf')

Plot a Lorenz curve from regional industrial counts

Description

This function plots a Lorenz curve from regional industrial counts. This curve gives an indication of the unequal distribution of an industry across regions.

Usage

lorenz_curve(mat, plot = TRUE, pdf = FALSE, pdf_location = NULL)

Arguments

mat

An incidence matrix with regions in rows and industries in columns. The input can also be a vector of industrial regional count (a matrix with n regions in rows and a single column).

plot

Logical; shall the curve be automatically plotted? Defaults to TRUE. If set to FALSE, the function will return x y coordinates that you can later use to plot and customize the curve.

pdf

Logical; shall a pdf be saved? Defaults to FALSE. If set to TRUE, a pdf will be compiled and saved to R's temp dir if no 'pdf_location' is specified.

pdf_location

Output location of pdf file

Value

If 'plot = FALSE', the function returns a list with two components: - 'cum.reg': A vector of cumulative proportions of regions. - 'cum.out': A vector of cumulative proportions of industrial output. If 'plot = TRUE', the function generates a plot of the Lorenz curve and does not return a value.

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Lorenz, M. O. (1905) Methods of measuring the concentration of wealth, Publications of the American Statistical Association 9: 209–219

Examples

## generate vectors of industrial count
ind <- c(0, 10, 10, 30, 50)

## run the function
lorenz_curve(ind)
lorenz_curve(ind, plot = FALSE)

## generate a region - industry matrix
mat <- matrix(
  c(
    0, 1, 0, 0,
    0, 1, 0, 0,
    0, 1, 0, 0,
    0, 1, 0, 1,
    0, 1, 1, 1
  ),
  ncol = 4, byrow = TRUE
)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
lorenz_curve(mat)
lorenz_curve(mat, plot = FALSE)

## run the function by aggregating all industries
lorenz_curve(rowSums(mat))
lorenz_curve(rowSums(mat), plot = FALSE)

## run the function for industry #1 only (perfect equality)
lorenz_curve(mat[, 1])
lorenz_curve(mat[, 1], plot = FALSE)

## run the function for industry #2 only (perfect equality)
lorenz_curve(mat[, 2])
lorenz_curve(mat[, 2], plot = FALSE)

## run the function for industry #3 only (perfect inequality)
lorenz_curve(mat[, 3])
lorenz_curve(mat[, 3], plot = FALSE)

## run the function for industry #4 only (top 40% produces 100% of the output)
lorenz_curve(mat[, 4])
lorenz_curve(mat[, 4], plot = FALSE)

## Compare the distribution of the four industries
oldpar <- par(mfrow = c(2, 2))  # Save the current graphical parameter settings
lorenz_curve(mat[, 1])
lorenz_curve(mat[, 2])
lorenz_curve(mat[, 3])
lorenz_curve(mat[, 4])
par(oldpar)  # Reset the graphical parameters to their original values

## Save output as pdf
lorenz_curve(mat, pdf = TRUE)

## To specify an output directory for the pdf,
## specify 'pdf_location', for instance as '/Users/jones/lorenz_curve.pdf'
## lorenz_curve(mat, pdf = TRUE, pdf_location = '/Users/jones/lorenz_curve.pdf')
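
The coordinates described under Value can be approximated in base R. The helper below is a hypothetical sketch: the component names 'cum.reg' and 'cum.out' follow the Value section, but starting both vectors at the origin is an assumption, so the exact vectors returned by lorenz_curve(..., plot = FALSE) may differ.

```r
## Hypothetical helper, not part of EconGeo.
## Assumes both coordinate vectors start at the origin (0, 0).
lorenz_coords_sketch <- function(ind) {
  ind <- sort(ind)                                  # smallest regional counts first
  list(
    cum.reg = c(0, seq_along(ind) / length(ind)),   # cumulative share of regions
    cum.out = c(0, cumsum(ind) / sum(ind))          # cumulative share of output
  )
}

xy <- lorenz_coords_sketch(c(0, 10, 10, 30, 50))
xy$cum.reg   # 0.0 0.2 0.4 0.6 0.8 1.0
xy$cum.out   # 0.0 0.0 0.1 0.2 0.5 1.0
```

These vectors can then be passed to plot(xy$cum.reg, xy$cum.out, type = "l") and styled with the usual graphical parameters.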

Re-arrange the dimension of a matrix based on the dimension of another matrix

Description

This function re-arranges the dimension of a matrix based on the dimension of another matrix

Usage

match_mat(fill, dim, missing = TRUE)

Arguments

fill

A matrix that will be used to populate the matrix output

dim

A matrix that will be used to determine the dimensions of the matrix output

missing

Logical; shall the cells of the non-matching rows/columns be set to NA? Defaults to TRUE, but can be set to FALSE to set the cells of the non-matching rows/columns to 0 instead.

Value

The matrix output with the dimensions rearranged based on the input 'dim' matrix.

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

Examples

## generate a first region - industry matrix
set.seed(31)
mat1 <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat1) <- c("I1", "I2", "I3", "I4")

## generate a second region - industry matrix
set.seed(31)
mat2 <- matrix(sample(0:1, 16, replace = TRUE), ncol = 4)
rownames(mat2) <- c("R1", "R2", "R3", "R5")
colnames(mat2) <- c("I1", "I2", "I3", "I4")

## run the function
match_mat(fill = mat1, dim = mat2)
match_mat(fill = mat2, dim = mat1)
match_mat(fill = mat2, dim = mat1, missing = FALSE)
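
The re-arrangement can be pictured as copying cells by row and column name into an empty matrix shaped like the second input. The helper below is a hypothetical base R sketch of that idea, with its own argument names, and is not the package's implementation; matching by dimnames is an assumption.

```r
## Hypothetical helper, not part of EconGeo.
## Assumes rows and columns are matched by name.
match_mat_sketch <- function(fill_mat, dim_mat, use_na = TRUE) {
  out <- matrix(
    if (use_na) NA_real_ else 0,
    nrow = nrow(dim_mat), ncol = ncol(dim_mat),
    dimnames = dimnames(dim_mat)
  )
  rows <- intersect(rownames(fill_mat), rownames(dim_mat))
  cols <- intersect(colnames(fill_mat), colnames(dim_mat))
  out[rows, cols] <- fill_mat[rows, cols]   # copy only the cells present in both
  out
}

small <- matrix(1:4, ncol = 2, dimnames = list(c("R1", "R3"), c("I1", "I2")))
large <- matrix(0, nrow = 3, ncol = 2,
                dimnames = list(c("R1", "R2", "R3"), c("I1", "I2")))
with_na <- match_mat_sketch(small, large)                     # row R2 is NA
with_zero <- match_mat_sketch(small, large, use_na = FALSE)   # row R2 is 0
```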

Compute a measure of modular complexity of patent documents

Description

This function computes a measure of modular complexity of patent documents from technological classes - patents (incidence) matrices

Usage

modular_complexity(mat, sparse = FALSE, list = FALSE)

Arguments

mat

A bipartite adjacency matrix (can be a sparse matrix)

sparse

Logical; is the input matrix a sparse matrix? Defaults to FALSE, but can be set to TRUE if the input matrix is a sparse matrix

list

Logical; is the input a list? Defaults to FALSE (input = adjacency matrix), but can be set to TRUE if the input is an edge list

Value

A data frame with columns "patent" and "mod.comp" representing the patents and their corresponding modular complexity values.

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Fleming, L. and Sorenson, O. (2001) Technology as a complex adaptive system: evidence from patent data, Research Policy 30: 1019-1039

Examples

## generate a technology - patent matrix
set.seed(31)
mat <- matrix(sample(0:1, 30, replace = TRUE), ncol = 5)
rownames(mat) <- c("T1", "T2", "T3", "T4", "T5", "T6")
colnames(mat) <- c("US1", "US2", "US3", "US4", "US5")

## run the function
modular_complexity(mat)

## generate a technology - patent sparse matrix
library(Matrix)
smat <- Matrix(mat, sparse = TRUE)

## run the function
modular_complexity(smat, sparse = TRUE)

## generate a regular data frame (list)
my_list <- get_list(mat)

## run the function
modular_complexity(my_list, list = TRUE)

Compute a measure of average modular complexity of technologies

Description

This function computes a measure of average modular complexity of technologies (average complexity of patent documents in a given technological class) from technological classes - patents (incidence) matrices

Usage

modular_complexity_avg(mat, sparse = FALSE, list = FALSE)

Arguments

mat

A bipartite adjacency matrix (can be a sparse matrix)

sparse

Logical; is the input matrix a sparse matrix? Defaults to FALSE, but can be set to TRUE if the input matrix is a sparse matrix

list

Logical; is the input a list? Defaults to FALSE (input = adjacency matrix), but can be set to TRUE if the input is an edge list

Value

A data frame with columns "tech" and "avg.mod.comp" representing the technologies and their corresponding average modular complexity values.

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Fleming, L. and Sorenson, O. (2001) Technology as a complex adaptive system: evidence from patent data, Research Policy 30: 1019-1039

Examples

## generate a technology - patent matrix
set.seed(31)
mat <- matrix(sample(0:1, 30, replace = TRUE), ncol = 5)
rownames(mat) <- c("T1", "T2", "T3", "T4", "T5", "T6")
colnames(mat) <- c("US1", "US2", "US3", "US4", "US5")

## run the function
modular_complexity_avg(mat)

## generate a technology - patent sparse matrix
library(Matrix)
smat <- Matrix(mat, sparse = TRUE)

## run the function
modular_complexity_avg(smat, sparse = TRUE)

## generate a regular data frame (list)
my_list <- get_list(mat)

## run the function
modular_complexity_avg(my_list, list = TRUE)

Compute an index of knowledge complexity of regions using the method of reflection

Description

This function computes an index of knowledge complexity of regions using the method of reflection from regions - industries (incidence) matrices. The index has been developed by Hidalgo and Hausmann (2009) for country - product matrices and adapted by Balland and Rigby (2017) to city - technology matrices.

Usage

morc(mat, rca = FALSE, steps = 20)

Arguments

mat

An incidence matrix with regions in rows and industries in columns

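rca

Logical; should the index of revealed comparative advantage (RCA, also referred to as the location quotient) first be computed? Defaults to FALSE (a binary 0/1 matrix is expected as input), but can be set to TRUE if the RCA first needs to be computed from a matrix of full counts.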

steps

Number of iteration steps. Defaults to 20, but can be set to 0 to give diversity (number of industries in which a region has an RCA), to 1 to give the average ubiquity of the industries in which a region has an RCA, to 2 to give the average diversity of regions that have similar industrial structures, or to any other number of steps less than or equal to 22. Note that above steps = 2 the index will be rescaled from 0 (minimum relative complexity) to 100 (maximum relative complexity).

Value

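If 'steps' is set to 0, the function returns a numeric vector representing the diversification of regions. Otherwise, it returns a numeric vector representing the index of knowledge complexity of regions based on the specified number of iteration steps.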

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

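References

Hidalgo, C. and Hausmann, R. (2009) The building blocks of economic complexity, Proceedings of the National Academy of Sciences 106: 10570 - 10575.

Balland, P.A. and Rigby, D. (2017) The Geography of Complex Knowledge, Economic Geography 93 (1): 1-23.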

Examples

## generate a region - industry matrix with full count
set.seed(31)
mat <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
morc(mat, rca = TRUE)
morc(mat, rca = TRUE, steps = 0)
morc(mat, rca = TRUE, steps = 1)
morc(mat, rca = TRUE, steps = 2)

## generate a region - industry matrix in which cells represent the presence/absence of an RCA
set.seed(32)
mat <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
morc(mat)
morc(mat, steps = 0)
morc(mat, steps = 1)
morc(mat, steps = 2)

## generate the simple network of Hidalgo and Hausmann (2009) presented p.11 (Fig. S4)
countries <- c("C1", "C1", "C1", "C1", "C2", "C3", "C3", "C4")
products <- c("P1", "P2", "P3", "P4", "P2", "P3", "P4", "P4")
my_data <- data.frame(countries, products)
my_data$freq <- 1
mat <- get_matrix(my_data)

## run the function
morc(mat)
morc(mat, steps = 0)
morc(mat, steps = 1)
morc(mat, steps = 2)
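
The first steps of the method of reflection, as described for the 'steps' argument, can be reproduced by hand in base R. The helper below is a hypothetical sketch of the unscaled iteration for both regions and industries; it is not the package's implementation and does not apply the 0 to 100 rescaling used above steps = 2.

```r
## Hypothetical helper, not part of EconGeo: unscaled method of reflection.
reflections_sketch <- function(m, steps) {
  diversity <- rowSums(m)   # step 0 for regions
  ubiquity <- colSums(m)    # step 0 for industries
  k_reg <- diversity
  k_ind <- ubiquity
  for (s in seq_len(steps)) {
    new_reg <- drop(m %*% k_ind) / diversity     # average over a region's industries
    new_ind <- drop(t(m) %*% k_reg) / ubiquity   # average over an industry's regions
    k_reg <- new_reg
    k_ind <- new_ind
  }
  list(regions = k_reg, industries = k_ind)
}

## Network of the last example, written directly as a binary matrix
m <- matrix(
  c(
    1, 1, 1, 1,
    0, 1, 0, 0,
    0, 0, 1, 1,
    0, 0, 0, 1
  ),
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("C", 1:4), paste0("P", 1:4))
)
step0 <- reflections_sketch(m, 0)
step1 <- reflections_sketch(m, 1)
step0$regions      # diversity: 4 1 2 1
step1$regions      # average ubiquity of their products: 2 2 2.5 3
step1$industries   # average diversity of their countries: 4 2.5 3 2.33
```

A region with no industry, or an industry present in no region, would lead to a division by zero, which this sketch does not guard against.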

Compute an index of knowledge complexity of industries using the method of reflection

Description

This function computes an index of knowledge complexity of industries using the method of reflection from regions - industries (incidence) matrices. The index has been developed by Hidalgo and Hausmann (2009) for country - product matrices and adapted by Balland and Rigby (2017) to city - technology matrices.

Usage

mort(mat, rca = FALSE, steps = 19)

Arguments

mat

An incidence matrix with regions in rows and industries in columns

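rca

Logical; should the index of revealed comparative advantage (RCA, also referred to as the location quotient) first be computed? Defaults to FALSE (a binary 0/1 matrix is expected as input), but can be set to TRUE if the RCA first needs to be computed from a matrix of full counts.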

steps

Number of iteration steps. Defaults to 19, but can be set to 0 to give ubiquity (number of regions that have an RCA in an industry), to 1 to give the average diversity of the regions that have an RCA in this industry, to 2 to give the average ubiquity of technologies developed in the same regions, or to any other number of steps less than or equal to 21. Note that above steps = 2 the index will be rescaled from 0 (minimum relative complexity) to 100 (maximum relative complexity).

Value

If 'steps' is set to 0, the function returns a numeric vector representing the ubiquity (number of regions that have a revealed comparative advantage) of industries. Otherwise, it returns a numeric vector representing the index of knowledge complexity of industries based on the specified number of iteration steps.

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

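References

Hidalgo, C. and Hausmann, R. (2009) The building blocks of economic complexity, Proceedings of the National Academy of Sciences 106: 10570 - 10575.

Balland, P.A. and Rigby, D. (2017) The Geography of Complex Knowledge, Economic Geography 93 (1): 1-23.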

Examples

## generate a region - industry matrix with full count
set.seed(31)
mat <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
mort(mat, rca = TRUE)
mort(mat, rca = TRUE, steps = 0)
mort(mat, rca = TRUE, steps = 1)
mort(mat, rca = TRUE, steps = 2)

## generate a region - industry matrix in which cells represent the presence/absence of a rca
set.seed(32)
mat <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
mort(mat)
mort(mat, steps = 0)
mort(mat, steps = 1)
mort(mat, steps = 2)

## generate the simple network of Hidalgo and Hausmann (2009) presented p.11 (Fig. S4)
countries <- c("C1", "C1", "C1", "C1", "C2", "C3", "C3", "C4")
products <- c("P1", "P2", "P3", "P4", "P2", "P3", "P4", "P4")
my_data <- data.frame(countries, products)
my_data$freq <- 1
mat <- get_matrix(my_data)

## run the function
mort(mat)
mort(mat, steps = 0)
mort(mat, steps = 1)
mort(mat, steps = 2)

Compute a measure of complexity by normalizing ubiquity of industries

Description

This function computes a measure of complexity by normalizing ubiquity of industries. We divide the share of the total count (employment, number of firms, number of patents, ...) in an industry by its share of ubiquity. Ubiquity is given by the number of regions in which an industry can be found (location quotient > 1) from regions - industries (incidence) matrices

Usage

norm_ubiquity(mat)

Arguments

mat

An incidence matrix with regions in rows and industries in columns

Value

A numeric vector representing the measure of complexity obtained by normalizing the ubiquity of industries. Each value in the vector corresponds to the normalized complexity score of an industry.

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Balland, P.A. and Rigby, D. (2017) The Geography of Complex Knowledge, Economic Geography 93 (1): 1-23.

Examples

## generate a region - industry matrix with full count
set.seed(31)
mat <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
norm_ubiquity(mat)

Compute the prody index of industries from regions - industries matrices

Description

This function computes the prody index of industries from (incidence) regions - industries matrices, as proposed by Hausmann, Hwang & Rodrik (2007). The index gives an associated income level for each industry. It represents a weighted average of per-capita GDPs (but GDP can be replaced by R&D, education...), where the weights correspond to the revealed comparative advantage of each region in a given industry (or sector, technology, ...).

Usage

prody(mat, vec)

Arguments

mat

An incidence matrix with regions in rows and industries in columns

vec

A vector that gives GDP, R&D, education or any other relevant regional attribute that will be used to compute the weighted average for each industry

Value

A numeric vector representing the prody index of industries. Each value in the vector corresponds to the associated income level for an industry.

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

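References

Hausmann, R., Hwang, J. and Rodrik, D. (2007) What you export matters, Journal of Economic Growth 12 (1): 1-25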

Examples

## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## a vector of GDP of regions
vec <- c(5, 10, 15, 25, 50)
## run the function
prody(mat, vec)
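
The weighted average described above can be sketched in base R. The helper below is hypothetical and not the package's implementation: it assumes the weight of a region for a given industry is its RCA divided by the sum of RCAs of all regions in that industry, so that the weights of each industry sum to 1.

```r
## Hypothetical helper, not part of EconGeo.
prody_sketch <- function(mat, vec) {
  ## revealed comparative advantage: regional share over economy-wide share
  rca <- sweep(mat / rowSums(mat), 2, colSums(mat) / sum(mat), "/")
  ## weights: RCA normalized so that each industry (column) sums to 1
  weights <- sweep(rca, 2, colSums(rca), "/")
  drop(t(weights) %*% vec)   # weighted average of the regional attribute
}

mat <- matrix(
  c(
    10, 0,
    10, 20
  ),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("R1", "R2"), c("I1", "I2"))
)
vec <- c(10, 50)   # regional attribute, for instance GDP per capita
p <- prody_sketch(mat, vec)
p   # I1 = 20, I2 = 50
```

I2 is only produced in R2, so it inherits that region's value of 50, while I1 is pulled toward R1 because R1 is fully specialized in it.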

Compute an index of revealed comparative advantage (RCA) from regions - industries matrices

Description

This function computes an index of revealed comparative advantage (RCA) from (incidence) regions - industries matrices. The numerator is the share of a given industry in a given region. The denominator is the share of this industry in a larger economy (overall country for instance). This index is also referred to as a location quotient, or the Hoover-Balassa index.

Usage

rca(mat, binary = FALSE)

Arguments

mat

An incidence matrix with regions in rows and industries in columns

binary

Logical; shall the returned output be a dichotomized version (0/1) of the RCA? Defaults to FALSE (the full values of the RCA will be returned), but can be set to TRUE (RCA above 1 will be set to 1 & RCA values below 1 will be set to 0)

Value

A matrix representing the index of revealed comparative advantage (RCA) or location quotient. Each cell in the matrix corresponds to the RCA value for a specific region and industry. If the 'binary' parameter is set to TRUE, the returned matrix will be dichotomized, with values above 1 set to 1 and values below 1 set to 0.

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Balassa, B. (1965) Trade Liberalization and Revealed Comparative Advantage, The Manchester School 33: 99-123.

Examples

## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
rca(mat)
rca(mat, binary = TRUE)

Compute the relatedness between entities (industries, technologies, ...) from their co-occurrence matrix

Description

This function computes the relatedness between entities (industries, technologies, ...) from their co-occurrence (adjacency) matrix. Different normalization procedures are proposed following van Eck and Waltman (2009): association strength, cosine, Jaccard, and an adapted version of the association strength that we refer to as probability index.

Usage

relatedness(mat, method = "prob")

Arguments

mat

An adjacency matrix of co-occurrences between entities (industries, technologies, cities...)

method

Which normalization method should be used to compute relatedness? Defaults to "prob", but it can be "association", "cosine" or "jaccard"

Value

A matrix representing the relatedness between entities (industries, technologies, etc.) based on their co-occurrence matrix. The specific method of normalization used is determined by the 'method' parameter, which can be "prob" (probability index), "association" (association strength), "cosine" (cosine similarity), or "jaccard" (Jaccard index).

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl
Joan Crespo J.Crespo@uu.nl
Mathieu Steijn M.P.A.Steijn@uu.nl

References

van Eck, N.J. and Waltman, L. (2009) How to normalize cooccurrence data? An analysis of some well-known similarity measures, Journal of the American Society for Information Science and Technology 60 (8): 1635-1651

Boschma, R., Heimeriks, G. and Balland, P.A. (2014) Scientific Knowledge Dynamics and Relatedness in Bio-Tech Cities, Research Policy 43 (1): 107-114

Hidalgo, C.A., Klinger, B., Barabasi, A. and Hausmann, R. (2007) The product space conditions the development of nations, Science 317: 482-487

Balland, P.A. (2016) Relatedness and the Geography of Innovation, in: R. Shearmur, C. Carrincazeaux and D. Doloreux (eds) Handbook on the Geographies of Innovation. Northampton, MA: Edward Elgar

Steijn, M.P.A. (2017) Improvement on the association strength: implementing probability measures based on combinations without repetition, Working Paper, Utrecht University

Examples

## generate an industry - industry matrix in which cells give the number of co-occurrences
## between two industries
set.seed(31)
mat <- matrix(sample(0:10, 36, replace = TRUE), ncol = 6)
mat[lower.tri(mat, diag = TRUE)] <- t(mat)[lower.tri(t(mat), diag = TRUE)]
rownames(mat) <- c("I1", "I2", "I3", "I4", "I5", "I6")
colnames(mat) <- c("I1", "I2", "I3", "I4", "I5", "I6")

## run the function
relatedness(mat)
relatedness(mat, method = "association")
relatedness(mat, method = "cosine")
relatedness(mat, method = "jaccard")
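
The normalizations named in the Description can be sketched in base R as follows. This is only an illustration of the textbook formulas discussed by van Eck and Waltman (2009), not the package's implementation: the helper name normalize_cooc is hypothetical, the totals s_i are assumed to be the row sums of the co-occurrence matrix after zeroing its diagonal, and the output need not match relatedness() exactly. The probability index ("prob") is not covered.

```r
## hypothetical helper, not part of EconGeo: textbook normalizations of a
## symmetric co-occurrence matrix
normalize_cooc <- function(cooc, method = c("association", "cosine", "jaccard")) {
  method <- match.arg(method)
  diag(cooc) <- 0 # assumption: self co-occurrences are ignored
  s <- rowSums(cooc) # assumption: total number of co-occurrences per entity
  denom <- switch(method,
    association = outer(s, s), # c_ij / (s_i * s_j)
    cosine = sqrt(outer(s, s)), # c_ij / sqrt(s_i * s_j)
    jaccard = outer(s, s, "+") - cooc # c_ij / (s_i + s_j - c_ij)
  )
  out <- cooc / denom
  out[!is.finite(out)] <- 0 # entities without any co-occurrence
  diag(out) <- 0
  out
}

## toy co-occurrence matrix (made-up values)
ids <- c("I1", "I2", "I3")
cooc <- matrix(c(
  0, 2, 1,
  2, 0, 3,
  1, 3, 0
), ncol = 3, byrow = TRUE, dimnames = list(ids, ids))

cos_rel <- normalize_cooc(cooc, "cosine")
jac_rel <- normalize_cooc(cooc, "jaccard")
cos_rel
jac_rel
```

With these toy values, the totals are 3, 5 and 4, so the cosine value for the pair I1 - I2 is 2 / sqrt(3 * 5) and the Jaccard value is 2 / (3 + 5 - 2).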

Compute the relatedness density between regions and industries from regions - industries matrices and industries - industries matrices

Description

This function computes the relatedness density between regions and industries from regions - industries (incidence) matrices and industries - industries (adjacency) matrices

Usage

relatedness_density(mat, relatedness)

Arguments

mat

An incidence matrix with regions in rows and industries in columns

relatedness

An adjacency industry - industry matrix indicating the degree of relatedness between industries

Value

A matrix representing the relatedness density between regions and industries. The values in the matrix indicate the share of industries related to each industry in each region, scaled from 0 to 100. Rows represent regions and columns represent industries.

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Examples

## generate a region - industry matrix in which cells represent the presence/absence of a RCA
set.seed(31)
mat <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## generate an industry - industry matrix in which cells indicate if two industries are
## related (1) or not (0)
relatedness <- matrix(sample(0:1, 16, replace = TRUE), ncol = 4)
relatedness[lower.tri(relatedness, diag = TRUE)] <- t(relatedness)[lower.tri(t(relatedness),
  diag = TRUE
)]
rownames(relatedness) <- c("I1", "I2", "I3", "I4")
colnames(relatedness) <- c("I1", "I2", "I3", "I4")

## run the function
relatedness_density(mat, relatedness)
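
The idea behind the Value above (the share of related industries that are present in a region, scaled from 0 to 100) can be sketched by hand. This is a minimal sketch under stated assumptions, not the package's code: the helper name density_sketch is hypothetical, an industry is assumed not to count as related to itself, and industries without any related industry are assigned 0.

```r
## hypothetical helper, not part of EconGeo
density_sketch <- function(mat, relatedness) {
  diag(relatedness) <- 0 # assumption: no self-relatedness
  ## for each region and industry: relatedness to industries present in the region
  related_present <- mat %*% relatedness
  ## for each industry: total relatedness to all other industries
  related_total <- colSums(relatedness)
  dens <- sweep(related_present, 2, related_total, "/") * 100
  dens[!is.finite(dens)] <- 0 # assumption: industries with no related industry
  dens
}

## toy inputs (made-up values)
toy_mat <- matrix(c(
  1, 1, 0,
  0, 0, 1
), nrow = 2, byrow = TRUE, dimnames = list(c("R1", "R2"), c("I1", "I2", "I3")))
toy_rel <- matrix(c(
  0, 1, 0,
  1, 0, 1,
  0, 1, 0
), nrow = 3, byrow = TRUE, dimnames = list(c("I1", "I2", "I3"), c("I1", "I2", "I3")))

dens <- density_sketch(toy_mat, toy_rel)
dens
```

In this toy case, I2 is related to I1 and I3; region R1 hosts only I1 of those two, so its density around I2 is 50.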

Compute the relatedness density between regions and industries that are not part of the regional portfolio from regions - industries matrices and industries - industries matrices

Description

This function computes the relatedness density between regions and industries that are not part of the regional portfolio from regions - industries (incidence) matrices and industries - industries (adjacency) matrices

Usage

relatedness_density_ext(mat, relatedness)

Arguments

mat

An incidence matrix with regions in rows and industries in columns

relatedness

An adjacency industry - industry matrix indicating the degree of relatedness between industries

Value

A matrix representing the relatedness density between regions and industries that are not part of the regional portfolio. The values in the matrix indicate the share of industries related to each industry in each region, scaled from 0 to 100. Rows represent regions and columns represent industries. Industries that are part of the regional portfolio are assigned NA.

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Examples

## generate a region - industry matrix in which cells represent the presence/absence of a RCA
set.seed(31)
mat <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## generate an industry - industry matrix in which cells indicate if two industries are
## related (1) or not (0)
relatedness <- matrix(sample(0:1, 16, replace = TRUE), ncol = 4)
relatedness[lower.tri(relatedness, diag = TRUE)] <- t(relatedness)[lower.tri(t(relatedness),
  diag = TRUE
)]
rownames(relatedness) <- c("I1", "I2", "I3", "I4")
colnames(relatedness) <- c("I1", "I2", "I3", "I4")

## run the function
relatedness_density_ext(mat, relatedness)

Compute the average relatedness density of regions to industries that are not part of the regional portfolio from regions - industries matrices and industries - industries matrices

Description

This function computes the average relatedness density of regions to industries that are not part of the regional portfolio from regions - industries (incidence) matrices and industries - industries (adjacency) matrices. This is the technological flexibility indicator proposed by Balland et al. (2015).

Usage

relatedness_density_ext_avg(mat, relatedness)

Arguments

mat

An incidence matrix with regions in rows and industries in columns

relatedness

An adjacency industry - industry matrix indicating the degree of relatedness between industries

Value

A vector representing the average relatedness density of regions to industries that are not part of the regional portfolio. The values in the vector indicate the average relatedness density for each region, rounded to the nearest integer.

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250

Balland, P.A., Rigby, D. and Boschma, R. (2015) The Technological Resilience of U.S. Cities, Cambridge Journal of Regions, Economy and Society 8 (2): 167-184

Examples

## generate a region - industry matrix in which cells represent the presence/absence
## of a RCA
set.seed(31)
mat <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## generate an industry - industry matrix in which cells indicate if two industries are
## related (1) or not (0)
relatedness <- matrix(sample(0:1, 16, replace = TRUE), ncol = 4)
relatedness[lower.tri(relatedness, diag = TRUE)] <- t(relatedness)[lower.tri(t(relatedness),
  diag = TRUE
)]
rownames(relatedness) <- c("I1", "I2", "I3", "I4")
colnames(relatedness) <- c("I1", "I2", "I3", "I4")

## run the function
relatedness_density_ext_avg(mat, relatedness)
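
The step from a relatedness density matrix to the average density outside the regional portfolio can be sketched as follows. This is an illustration under stated assumptions rather than the package's code: the density values are made up, industries already in the portfolio are masked with NA, and the remaining values are averaged by region and rounded.

```r
## toy region - industry matrix and a made-up relatedness density matrix
toy_mat <- matrix(c(
  1, 1, 0,
  0, 0, 1
), nrow = 2, byrow = TRUE, dimnames = list(c("R1", "R2"), c("I1", "I2", "I3")))
toy_dens <- matrix(c(
  100, 50, 100,
  0, 50, 0
), nrow = 2, byrow = TRUE, dimnames = dimnames(toy_mat))

## keep only the industries that are NOT part of the regional portfolio
dens_ext <- toy_dens
dens_ext[toy_mat == 1] <- NA

## average over the remaining industries, one value per region
flexibility <- round(rowMeans(dens_ext, na.rm = TRUE))
flexibility
```

Here R1 lacks only I3 (density 100), while R2 lacks I1 and I2 (densities 0 and 50), which gives averages of 100 and 25.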

Compute the relatedness density between regions and industries that are part of the regional portfolio from regions - industries matrices and industries - industries matrices

Description

This function computes the relatedness density between regions and industries that are part of the regional portfolio from regions - industries (incidence) matrices and industries - industries (adjacency) matrices

Usage

relatedness_density_int(mat, relatedness)

Arguments

mat

An incidence matrix with regions in rows and industries in columns

relatedness

An adjacency industry - industry matrix indicating the degree of relatedness between industries

Value

A matrix representing the relatedness density between regions and industries that are part of the regional portfolio. The values in the matrix indicate the relatedness density for each region and industry, scaled from 0 to 100.

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Examples

## generate a region - industry matrix in which cells represent the presence/absence
## of a RCA
set.seed(31)
mat <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## generate an industry - industry matrix in which cells indicate if two industries are
## related (1) or not (0)
relatedness <- matrix(sample(0:1, 16, replace = TRUE), ncol = 4)
relatedness[lower.tri(relatedness, diag = TRUE)] <- t(relatedness)[lower.tri(t(relatedness),
  diag = TRUE
)]
rownames(relatedness) <- c("I1", "I2", "I3", "I4")
colnames(relatedness) <- c("I1", "I2", "I3", "I4")

## run the function
relatedness_density_int(mat, relatedness)

Compute the average relatedness density within the regional portfolio from regions - industries matrices and industries - industries matrices

Description

This function computes the average relatedness density within the regional portfolio from regions - industries (incidence) matrices and industries - industries (adjacency) matrices. This is a measure of the technological coherence of the regional industrial structure.

Usage

relatedness_density_int_avg(mat, relatedness)

Arguments

mat

An incidence matrix with regions in rows and industries in columns

relatedness

An adjacency industry - industry matrix indicating the degree of relatedness between industries

Value

A vector representing the average relatedness density within the regional portfolio. The values in the vector indicate the average relatedness density for each region, scaled from 0 to 100.

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250

Balland, P.A., Rigby, D. and Boschma, R. (2015) The Technological Resilience of U.S. Cities, Cambridge Journal of Regions, Economy and Society 8 (2): 167-184

Examples

## generate a region - industry matrix in which cells represent the presence/absence
## of a RCA
set.seed(31)
mat <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## generate an industry - industry matrix in which cells indicate if two industries are
## related (1) or not (0)
relatedness <- matrix(sample(0:1, 16, replace = TRUE), ncol = 4)
relatedness[lower.tri(relatedness, diag = TRUE)] <- t(relatedness)[lower.tri(t(relatedness),
  diag = TRUE
)]
rownames(relatedness) <- c("I1", "I2", "I3", "I4")
colnames(relatedness) <- c("I1", "I2", "I3", "I4")

## run the function
relatedness_density_int_avg(mat, relatedness)
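
The coherence measure described above can be sketched by masking a relatedness density matrix the other way around: only industries that are part of the regional portfolio are kept, then averaged by region. The density values below are made up and the masking with NA is an assumption for illustration, not a description of the package internals.

```r
## toy region - industry matrix and a made-up relatedness density matrix
toy_mat <- matrix(c(
  1, 1, 0,
  0, 0, 1
), nrow = 2, byrow = TRUE, dimnames = list(c("R1", "R2"), c("I1", "I2", "I3")))
toy_dens <- matrix(c(
  100, 50, 100,
  0, 50, 0
), nrow = 2, byrow = TRUE, dimnames = dimnames(toy_mat))

## keep only the industries that ARE part of the regional portfolio
dens_int <- toy_dens
dens_int[toy_mat == 0] <- NA

## average over the portfolio, one value per region
coherence <- rowMeans(dens_int, na.rm = TRUE)
coherence
```

R1 hosts I1 and I2 (densities 100 and 50), so its average is 75; R2 hosts only I3 (density 0).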

Compute the Hoover coefficient of specialization from regions - industries matrices

Description

This function computes the Hoover coefficient of specialization from regions - industries matrices. The higher the coefficient, the greater the regional specialization. This index is closely related to the Krugman specialisation index.

Usage

spec_coeff(mat)

Arguments

mat

An incidence matrix with regions in rows and industries in columns

Value

A vector representing the Hoover coefficient of specialization for each region. The values in the vector indicate the degree of regional specialization, with higher values indicating greater specialization.

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Hoover, E.M. and Giarratani, F. (1985) An Introduction to Regional Economics. 3rd edition. New York: Alfred A. Knopf (see table 9-4 in particular)

Examples

## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
spec_coeff(mat)
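
A coefficient of specialization of this kind is commonly written as half the sum of absolute differences between a region's industry shares and the industry shares of the whole economy. The sketch below assumes that definition; the helper name hoover_spec_sketch is hypothetical and the result is not guaranteed to match spec_coeff() in every detail.

```r
## hypothetical helper, not part of EconGeo
hoover_spec_sketch <- function(mat) {
  region_shares <- mat / rowSums(mat) # industry shares within each region
  overall_shares <- colSums(mat) / sum(mat) # industry shares in the whole economy
  gaps <- sweep(region_shares, 2, overall_shares, "-")
  rowSums(abs(gaps)) / 2
}

## toy region - industry matrix (made-up values)
toy <- matrix(c(
  10, 0,
  0, 10,
  5, 5
), ncol = 2, byrow = TRUE, dimnames = list(c("R1", "R2", "R3"), c("I1", "I2")))

spec <- hoover_spec_sketch(toy)
spec
```

R3 mirrors the overall structure exactly and scores 0, whereas R1 and R2, each active in a single industry, score 0.5.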

Compute an index of knowledge complexity of industries using the eigenvector method

Description

This function computes an index of knowledge complexity of industries using the eigenvector method from regions - industries (incidence) matrices. Technically, the function returns the eigenvector associated with the second largest eigenvalue of the projected industry - industry matrix.

Usage

tci(mat, rca = FALSE)

Arguments

mat

An incidence matrix with regions in rows and industries in columns

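rca

Logical; should the index of relative comparative advantage (RCA, also referred to as location quotient) first be computed? Defaults to FALSE (a binary matrix of 0/1 values is expected as input), but can be set to TRUE if the RCA first needs to be computed from a matrix of counts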

Value

A numeric vector representing the index of knowledge complexity of industries. The vector contains the values of the eigenvector associated with the second largest eigenvalue of the projected industry - industry matrix.

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Examples

## generate a region - industry matrix with full count
set.seed(31)
mat <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
tci(mat, rca = TRUE)

## generate a region - industry matrix in which cells represent the presence/absence of a rca
set.seed(31)
mat <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
tci(mat)

## generate the simple network of Hidalgo and Hausmann (2009) presented p.11 (Fig. S4)
countries <- c("C1", "C1", "C1", "C1", "C2", "C3", "C3", "C4")
products <- c("P1", "P2", "P3", "P4", "P2", "P3", "P4", "P4")
my_data <- data.frame(countries, products)
my_data$freq <- 1
mat <- get_matrix(my_data)

## run the function
tci(mat)
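
The eigenvector method mentioned in the Description can be sketched in base R. The code below is an assumption-laden illustration, not the package's implementation: it builds a row-stochastic industry - industry projection by normalizing the binary matrix by ubiquity and diversity, and extracts the eigenvector of the second largest eigenvalue. The sign and scale of an eigenvector are arbitrary, so the values may differ from tci() by sign or by any rescaling the package applies.

```r
## same country - product links as in the last example above, typed by hand
toy <- matrix(c(
  1, 1, 1, 1,
  0, 1, 0, 0,
  0, 0, 1, 1,
  0, 0, 0, 1
), nrow = 4, byrow = TRUE, dimnames = list(paste0("C", 1:4), paste0("P", 1:4)))

diversity <- rowSums(toy) # number of products per country
ubiq <- colSums(toy) # number of countries per product

## projected product - product matrix; each row sums to 1
proj <- (t(toy) / ubiq) %*% (toy / diversity)

## eigen() sorts eigenvalues by decreasing modulus for non-symmetric input;
## Re() guards against a complex return type
eig <- eigen(proj)
complexity <- Re(eig$vectors[, 2])
names(complexity) <- colnames(toy)
complexity
```

The leading eigenvalue of such a matrix is 1 with a constant eigenvector, which carries no information; this is why the second eigenvector is used.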

Compute a simple measure of ubiquity of industries

Description

This function computes a simple measure of ubiquity of industries by counting the number of regions in which an industry can be found (location quotient > 1) from regions - industries (incidence) matrices

Usage

ubiquity(mat, rca = FALSE)

Arguments

mat

An incidence matrix with regions in rows and industries in columns

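rca

Logical; should the index of relative comparative advantage (RCA, also referred to as location quotient) first be computed? Defaults to FALSE (a binary matrix of 0/1 values is expected as input), but can be set to TRUE if the RCA first needs to be computed from a matrix of counts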

Value

A numeric vector representing the measure of ubiquity of industries. Each element of the vector corresponds to the number of regions in which an industry can be found (location quotient > 1).

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

References

Balland, P.A. and Rigby, D. (2017) The Geography of Complex Knowledge, Economic Geography 93 (1): 1-23.

Examples

## generate a region - industry matrix with full count
set.seed(31)
mat <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
ubiquity(mat, rca = TRUE)

## generate a region - industry matrix in which cells represent the presence/absence of a rca
set.seed(31)
mat <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
ubiquity(mat)
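
The counting described above can be reproduced by hand from a matrix of counts: compute the location quotient, flag the cells above 1, and count the flagged regions per industry. The sketch below uses made-up values and is meant as an illustration of the idea rather than the package's code.

```r
## toy region - industry matrix with counts (made-up values)
toy <- matrix(c(
  9, 1,
  8, 2,
  1, 9
), ncol = 2, byrow = TRUE, dimnames = list(c("R1", "R2", "R3"), c("I1", "I2")))

## location quotient: regional industry share relative to the overall industry share
lq <- sweep(toy / rowSums(toy), 2, colSums(toy) / sum(toy), "/")

## an industry "can be found" in a region when its location quotient exceeds 1
has_rca <- lq > 1

## ubiquity: number of regions in which each industry can be found
ubiq <- colSums(has_rca)
ubiq
```

In this toy case I1 is over-represented in R1 and R2, and I2 only in R3.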

Compute a weighted average of regions or industries from regions - industries matrices

Description

This function computes a weighted average of regions or industries from (incidence) regions - industries matrices.

Usage

weighted_avg(mat, vec, reg = TRUE)

Arguments

mat

An incidence matrix with regions in rows and industries in columns

vec

A vector that will be used to compute the weighted average for each industry/region

reg

Logical; shall the weighted average for regions be returned? Defaults to TRUE (requires a vector of industry values), but can be set to FALSE (requires a vector of region values) if the weighted average for industries should be returned

Value

A numeric vector representing the weighted average of regions or industries, depending on the value of the 'reg' argument. If 'reg = TRUE', the weighted average for regions is returned; if 'reg = FALSE', the weighted average for industries is returned.

Author(s)

Pierre-Alexandre Balland p.balland@uu.nl

Examples

## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## a vector for regions will be used to compute the weighted average of industries
vec <- c(5, 10, 15, 25, 50)
## run the function
weighted_avg(mat, vec, reg = FALSE)

## a vector for industries will be used to compute the weighted average of regions
vec <- c(5, 10, 15, 25)
## run the function
weighted_avg(mat, vec, reg = TRUE)
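
The role of the 'reg' argument can be illustrated with a plain weighted mean in which the cells of the matrix act as weights. This is a sketch under that assumption; the helper name weighted_avg_sketch is hypothetical and the package may normalize differently.

```r
## hypothetical helper, not part of EconGeo
weighted_avg_sketch <- function(mat, vec, reg = TRUE) {
  if (reg) {
    stopifnot(length(vec) == ncol(mat)) # one value per industry
    drop(mat %*% vec) / rowSums(mat) # one weighted mean per region
  } else {
    stopifnot(length(vec) == nrow(mat)) # one value per region
    drop(vec %*% mat) / colSums(mat) # one weighted mean per industry
  }
}

## toy region - industry matrix (made-up values)
toy <- matrix(c(
  1, 3,
  2, 2
), ncol = 2, byrow = TRUE, dimnames = list(c("R1", "R2"), c("I1", "I2")))

by_region <- weighted_avg_sketch(toy, c(10, 20), reg = TRUE)
by_industry <- weighted_avg_sketch(toy, c(100, 200), reg = FALSE)
by_region
by_industry
```

For R1 the result is (1 * 10 + 3 * 20) / 4 = 17.5; for I2 it is (3 * 100 + 2 * 200) / 5 = 140.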

Compute the z-score between technologies from an incidence matrix

Description

This function computes the z-score between pairs of technologies from a patent-technology incidence matrix. The z-score is a measure used to analyze the co-occurrence of technologies in patent documents (i.e. knowledge combination). It compares the observed number of co-occurrences to the number expected under the hypothesis that combination is random. A positive z-score indicates a typical co-occurrence, one that has been observed more often than expected. In contrast, a negative z-score indicates an atypical co-occurrence. The z-score has been used to estimate the degree of novelty of patents (Kim et al. 2016), scientific publications (Uzzi et al. 2013) or the relatedness between industries (Teece et al. 1994).

Usage

z_score(mat)

Arguments

mat

A patent-technology incidence matrix with patents in rows and technologies in columns

Value

A matrix of z-scores representing the co-occurrence of technologies in the input incidence matrix. The z-score measures the deviation of the observed co-occurrence from the expected co-occurrence under the assumption of random combination. Positive z-scores indicate typical co-occurrences, while negative z-scores indicate atypical co-occurrences.

Author(s)

Lars Mewes mewes@wigeo.uni-hannover.de

References

Kim, D., Cerigo, D. B., Jeong, H., and Youn, H. (2016). Technological novelty profile and invention's future impact. EPJ Data Science, 5 (1):1–15

Teece, D. J., Rumelt, R., Dosi, G., and Winter, S. (1994). Understanding corporate coherence. Theory and evidence. Journal of Economic Behavior and Organization, 23 (1):1–30

Uzzi, B., Mukherjee, S., Stringer, M., and Jones, B. (2013). Atypical Combinations and Scientific Impact. Science, 342 (6157):468–472

Examples


## Generate a toy incidence matrix
set.seed(2210)
techs <- paste0("T", seq(1, 5))
techs <- sample(techs, 50, replace = TRUE)
patents <- paste0("P", seq(1, 20))
patents <- sort(sample(patents, 50, replace = TRUE))
my_data <- data.frame(patents, techs)
## drop duplicated patent - technology pairs so that the incidence matrix is binary
my_data <- unique(my_data)
mat <- as.matrix(table(my_data$patents, my_data$techs))

## run the function
z_score(mat)
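
The comparison of observed and expected co-occurrences can be sketched as follows. The sketch assumes the hypergeometric expectation and variance used by Teece et al. (1994); the helper name z_score_sketch is hypothetical, and whether z_score() uses exactly these formulas is an assumption.

```r
## hypothetical helper, not part of EconGeo
z_score_sketch <- function(mat) {
  mat <- (mat > 0) * 1 # work on a binary incidence matrix
  n_pat <- nrow(mat) # number of patents
  observed <- crossprod(mat) # technology - technology co-occurrence counts
  n_tech <- diag(observed) # number of patents per technology
  expected <- outer(n_tech, n_tech) / n_pat
  variance <- expected *
    outer(1 - n_tech / n_pat, (n_pat - n_tech) / (n_pat - 1))
  z <- (observed - expected) / sqrt(variance)
  diag(z) <- NA # a technology is not compared with itself
  z
}

## toy patent - technology matrix (made-up values)
toy <- matrix(c(
  1, 1, 0,
  1, 1, 0,
  0, 0, 1,
  0, 0, 1
), ncol = 3, byrow = TRUE, dimnames = list(paste0("P", 1:4), paste0("T", 1:3)))

z <- z_score_sketch(toy)
z
```

T1 and T2 always appear together and get a positive score, while T1 and T3 never co-occur and get a negative one. A technology found in every patent has zero variance, so its scores are undefined in this sketch.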
