#' Argus Media: Home assignment — Data scientist, Curves
#'
#' @description
#'
#' This file presents a fully reproducible data pipeline developed as part of
#' a home assignment for the Data Scientist position at the
#' [Argus Media Group](https://www.argusmedia.com),
#' a leading independent provider of global energy and commodity market
#' intelligence.
#'
#' For a detailed report—including code, explanations, and outputs—visit:
#' https://danielvartan.github.io/argus/
#'
#' @author Daniel Vartanian <danvartan@gmail.com> <linktr.ee/danielvartan>
#' @date 2025-07-11
#'
#' @license CC0 1.0 Universal (Public Domain Dedication)
#'
#' @references
#'
#' Vartanian, D. (2025). *Argus Media: Home assignment — Data scientist,
#' Curves* [Report]. https://danielvartan.github.io/argus
## Load Required Packages and Functions -----
library(data.table)
library(openxlsx) # -> Neither `data.table` nor base R provide native
# support for reading `.xlsx` files.
accumulate_2 <- function(x, y, f, ...) {
inputs <- Map(\(xi, yi) list(xi, yi), x, y)
Reduce(
\(acc, pair) f(acc, pair[[1]], pair[[2]], ...),
inputs[-1],
init = f(inputs[[1]][[1]], inputs[[1]][[1]], inputs[[1]][[2]], ...),
accumulate = TRUE
)
}
factor_mult <- function(previous, current, factor) {
fifelse(!is.na(current), current, round(previous * factor, 2))
}
interpolate <- function(data, group) {
market <- group |> unlist() |> unname()
out <-
data |>
_[, reference_data[.SD, on = .(TERM, REF_MARKET)]] |>
_[, VALUE := accumulate_2(VALUE, FACTOR, factor_mult)]
if ((market %in% lookup$REF_MARKET) &&
(!market %in% reference_data$REF_MARKET)) {
reference_data <<-
list(
reference_data,
out |>
_[, !c("REF_MARKET", "FACTOR")] |>
_[, c("REF_MARKET", "VALUE") := .(market, VALUE / shift(VALUE))] |>
_[!is.na(VALUE)] |>
setnames("VALUE", "FACTOR") |>
_[, .(TERM, REF_MARKET, FACTOR)]
) |>
rbindlist()
}
out[, VALUE]
}
## Import, Tidy, and Transform the Data -----
raw_data_file <- file.path("data-raw", "data_P2_ds_test.xlsx")
lookup <- rowwiseDT(
MARKET=, REF_MARKET=,
"MARKET_1", "REF_MARKET_2",
"MARKET_2", "MARKET_1",
"MARKET_3", "REF_MARKET_2",
"MARKET_4", "REF_MARKET_1",
"MARKET_5", "REF_MARKET_3",
"MARKET_6", "MARKET_5",
"MARKET_7", "REF_MARKET_2"
)
market_data <-
raw_data_file |>
read.xlsx("MARKET") |>
as.data.table() |>
_[, TERM := as.IDate(TERM, origin = "1899-12-30")] |>
melt(
id.vars = "TERM",
variable.name = "MARKET",
value.name = "VALUE"
) |>
_[order(MARKET, TERM)]
reference_data <-
raw_data_file|>
read.xlsx("REFERENCE_MARKET") |>
as.data.table() |>
_[, TERM := as.IDate(TERM, origin = "1899-12-30")] |>
_[, VALUE := VALUE / shift(VALUE), by = REF_MARKET] |>
_[!is.na(VALUE)] |>
setnames("VALUE", "FACTOR") |>
_[order(REF_MARKET, TERM)]
## Interpolate the Data -----
market_data <-
market_data |>
_[, lookup[.SD, on = .(MARKET)]] |>
_[order(MARKET, TERM)] |>
_[, VALUE := interpolate(.SD, .BY), by = MARKET] # "in one go"
## Arrange, Clean and Pivot the Data -----
market_data <-
market_data |>
_[, .(TERM, MARKET, VALUE)] |>
dcast(TERM ~ MARKET, value.var = "VALUE") |>
_[!is.na(TERM)] |>
_[order(TERM)] |>
na.omit()
## Final Output -----
market_dataArgus Media: Home assignment — Data scientist, Curves
Overview
This report presents a fully reproducible data pipeline developed as part of a home assignment for the Data Scientist position (Curves) at the Argus Media Group, a leading independent provider of global energy and commodity market intelligence.
The second part of this assignment—a Shiny application—can be accessed here, with its code repository available here (see the note below).
This website is only visible for those with the link. It is not indexed by search engines and is not intended for public access.
Please note that the code repositories are private and require access permissions. If you would like access, please contact the author.
Assignment
This section presents the assignment instructions as provided by the Argus Media Group. For clarity, some links, emphasis, and minor corrections have been added. The original document is available here.
About this Test
This test is composed of two parts.
- In the first one, you need to show your abilities with
data.tableor base R. - In the second problem, you need to develop a Shiny dashboard for visualization.
Deliverables
Problem 1: A single R script with the code used to populate the missing values.
Problem 2: Create an interactive Shiny application that showcases the visualization from the previous exercise.
Files should be sent to: sueidi.souza@argusmedia.com with subject line: Test Results - Data Science, Latin America.
Problem 1
The exercise requires the candidate to reproduce a simple example in R.
The relevant data is in the Excel file data_P2_ds_test.xlsx.
The task involves data wrangling skills for populating one table with missing values. Please proceed with the instructions below:
- The objective is to populate the market’s missing values.
- In the tab
MARKET, we have incomplete data on six [correction: seven] markets in wide format.- As you may observe, the market data starts and ends on different terms.
- In the tab
REFERENCE, we have data on three reference markets in long format.- To populate the columns on the
MARKETtab we apply the factors obtained from the reference curve on the latest available market data.
- To populate the columns on the
- In column
Ion the tabMARKET, we provide an example (MARKET_8) of how these markets should be computed.
- In the tab
- Table below discloses which reference to apply respectively to each market.
| MARKET | REFERENCE |
|---|---|
| MARKET_1 | REF_MARKET_2 |
| MARKET_2 | MARKET_1 |
| MARKET_3 | REF_MARKET_2 |
| MARKET_4 | REF_MARKET_1 |
| MARKET_5 | REF_MARKET_3 |
| MARKET_6 | MARKET_5 |
| MARKET_7 | REF_MARKET_2 |
- You should provide an elegant solution that calculates all market values in one go using
data.table. - The final output table must have:
- Complete market data with 48 data points each, starting from 2024-04-01 [correction: 2025-04-01].
- No missing values.
Problem 2
Create a Shiny app that takes the data from Problem 1 and displays the dates and the data for the Market in an editable table. The user needs to be able to edit the data for the reference market on any date (for instance, changing March 2026 from 73.51460126 to 77.00), and the app will show a plot with the original Market values (Original) and the new Market values (New) calculated with the new data points.
Methods
Source of Data
The data used in this report were provided by the Argus Media Group as part of the home assignment for the Data Scientist position. The original dataset is available here.
Data Munging
The data munging followed the data science workflow outlined by Wickham et al. (2023), as illustrated in Figure 1. All processes were made using the Quarto publishing system (Allaire et al., n.d.), the R programming language (R Core Team, n.d.) and several R packages.
Source: Reproduced from Wickham et al. (2023).
Code Style
The Tidyverse code style guide and design principles were followed to ensure consistency and enhance readability.
Reproduction
The pipeline is fully reproducible and can be run again at any time. The renv package is used to manage the project dependencies, ensuring that the code runs with the same package versions as when it was developed.
Problem 1: Missing Data
Below is a unified pipeline that produces the required results using data.table and base R. The following sections break down each step of the data wrangling process.
Set the Environment
The openxlsx package is used to read Excel data, since neither data.table nor base R provide native support for reading .xlsx files.
While data.table offers the fread() function for fast data import, it only supports text-based formats such as CSV. To work with Excel files, you can either use a package like openxlsx or convert the file to CSV using a shell command such as in2csv before importing.
Code
library(data.table)
library(openxlsx) # See note above.Add the Lookup Table
Code
lookup <- rowwiseDT(
MARKET=, REF_MARKET=,
"MARKET_1", "REF_MARKET_2",
"MARKET_2", "MARKET_1",
"MARKET_3", "REF_MARKET_2",
"MARKET_4", "REF_MARKET_1",
"MARKET_5", "REF_MARKET_3",
"MARKET_6", "MARKET_5",
"MARKET_7", "REF_MARKET_2"
)Import the Data
Code
raw_data_file <- file.path("data-raw", "data_P2_ds_test.xlsx")Code
market_data <- raw_data_file |> read.xlsx("MARKET")Code
reference_data <- raw_data_file|> read.xlsx("REFERENCE_MARKET")Tidy the Data
Code
market_data <-
market_data |>
as.data.table() |>
_[, TERM := as.IDate(TERM, origin = "1899-12-30")]Code
reference_data <-
reference_data |>
as.data.table() |>
_[, TERM := as.IDate(TERM, origin = "1899-12-30")]Transform the Data
Interpolate the Data
Code
interpolate <- function(data, group) {
market <- group |> unlist() |> unname()
out <-
data |>
_[, reference_data[.SD, on = .(TERM, REF_MARKET)]] |>
_[, VALUE := accumulate_2(VALUE, FACTOR, factor_mult)]
if ((market %in% lookup$REF_MARKET) &&
(!market %in% reference_data$REF_MARKET)) {
reference_data <<-
list(
reference_data,
out |>
_[, !c("REF_MARKET", "FACTOR")] |>
_[, c("REF_MARKET", "VALUE") := .(market, VALUE / shift(VALUE))] |>
_[!is.na(VALUE)] |>
setnames("VALUE", "FACTOR") |>
_[, .(TERM, REF_MARKET, FACTOR)]
) |>
rbindlist()
}
out[, VALUE]
}Code
market_data <-
market_data |>
_[, lookup[.SD, on = .(MARKET)]] |>
_[order(MARKET, TERM)] |>
_[, VALUE := interpolate(.SD, .BY), by = MARKET]Arrange and Pivot the Data
Data Dictionary
Code
metadata <-
market_data |>
`var_label<-`(
list(
TERM = "Date of the market value",
MARKET_1 = "Market 1 value",
MARKET_2 = "Market 2 value",
MARKET_3 = "Market 3 value",
MARKET_4 = "Market 4 value",
MARKET_5 = "Market 5 value",
MARKET_6 = "Market 6 value",
MARKET_7 = "Market 7 value",
MARKET_8 = "Market 8 value"
)
) |>
generate_dictionary(details = "full") |>
convert_list_columns_to_character()Save the Valid Data
Data
Metadata
Visualize the Data
Code
market_data <-
market_data |>
_[, TERM := as.IDate(TERM)] |>
melt(
id.vars = "TERM",
variable.name = "MARKET",
value.name = "VALUE"
) |>
_[, MARKET := gsub("MARKET_", "", MARKET) |> as.integer()]Code
market_data |>
ggplot(
aes(
x = TERM,
y = VALUE,
color = as.factor(MARKET)
)
) +
geom_smooth(
method = "gam",
se = FALSE,
formula = y ~ poly(x, 12)
) +
scale_x_date(
breaks = c(
as.Date("2025-04-01"),
as.Date("2026-01-01"),
as.Date("2027-01-01"),
as.Date("2028-01-01"),
as.Date("2029-01-01")
),
date_labels = "%b %Y"
) +
scale_color_brand_d() +
labs(
x = "Term",
y = "Market Value",
linetype = "Market",
color = "Market",
)Problem 2: Shiny Application
The second part of this assignment—a Shiny application—can be accessed here, with its code repository available here (see the note below).
Please note that the code repositories are private and require access permissions. If you would like access, please contact the author.
How to Cite
To cite this work, please use the following format:
Vartanian, D. (2025). Argus Media: Home assignment — Data scientist, Curves [Report]. https://danielvartan.github.io/argus
A BibTeX entry for LaTeX users is
@techreport{vartanian2025,
title = {Argus Media: Home assignment — Data scientist, Curves},
author = {{Daniel Vartanian}},
year = {2025},
address = {São Paulo},
langid = {en},
url = {https://danielvartan.github.io/argus}
}
License
This content is licensed under CC0 1.0 Universal, placing these materials in the public domain. You may freely copy, modify, distribute, and use this work, even for commercial purposes, without permission or attribution.

