Merge pull request #15 from lsms-worldbank/flag-possibly-truncated-var-lbls

Create commands to check variable label length
kbjarkefur authored Jan 16, 2024
2 parents 5e2e61a + 9933a86 commit ce11672
Showing 11 changed files with 514 additions and 27 deletions.
3 changes: 3 additions & 0 deletions run-adodown-util.do
@@ -6,13 +6,16 @@
global clone "C:\Users\wb393438\stata_funs\labeller"
}


/*
ad_setup, adf("${clone}") ///
name("labeller") ///
description("A package with utility commands related to labels. Particularly, but not exclusively, in relation to data sets collected using SurveySolutions.") ///
author("LSMS Worldbank") ///
contact("[email protected]") ///
url("https://github.com/lsms-worldbank/labeller") ///
github
*/

ad_sthlp , adf("${clone}")

28 changes: 28 additions & 0 deletions src/ado/lbl_assert_no_long_varlbl.ado
@@ -0,0 +1,28 @@
cap program drop lbl_assert_no_long_varlbl
program define lbl_assert_no_long_varlbl, rclass

    version 14

    syntax [varlist], [MAXlen(integer 80)]

    qui {

        * look for variables whose labels >= max length
        lbl_list_long_varlbl `varlist', maxlen(`maxlen')
        local n_matches = `r(count_matches)'
        local any_max_len = (`n_matches' > 0)
        local which_max_len "`r(varlist)'"

        * return the flagged variables and the number of matches
        return local varlist "`which_max_len'"
        return local count_matches "`n_matches'"

        * if any variables with long labels found, message and error
        if (`any_max_len' == 1) {
            di as error "{pstd}Variables found whose labels are >= `maxlen' characters:{p_end}"
            di as error "{phang}`which_max_len'{p_end}"
            error 9
        }

    }

end
48 changes: 48 additions & 0 deletions src/ado/lbl_list_long_varlbl.ado
@@ -0,0 +1,48 @@
*! version XX XXXXXXXXX ADAUTHORNAME ADCONTACTINFO

cap program drop lbl_list_long_varlbl
program define lbl_list_long_varlbl, rclass

    version 14

    syntax [varlist], [MAXlen(integer 80)]

    qui {

        * get list of all variables that have a variable label
        * (copy r(varlist) as a macro so an empty list stays empty)
        ds `varlist', has(varlabel)
        local vars "`r(varlist)'"

        * initialize list of variables with labels that are too long
        local vars_lbl_too_long ""

        * populate list of variables
        foreach var of local vars {

            * extract variable label
            local var_lbl : variable label `var'

            * if length is greater than or equal to max, put in list
            if (`: ustrlen local var_lbl' >= `maxlen') {
                local vars_lbl_too_long "`vars_lbl_too_long' `var'"
            }
        }

        * compute the number of matches
        local n_matches : list sizeof vars_lbl_too_long

        * return the varlist and count of matches
        return local varlist "`vars_lbl_too_long'"
        return local count_matches "`n_matches'"

        * message about outcome
        if (`n_matches' >= 1) {
            noi di as result "{pstd}Variables whose labels have at least `maxlen' characters (`n_matches' variables):{p_end}"
            noi di as result "{phang}`vars_lbl_too_long'{p_end}"
        }
        else if (`n_matches' == 0) {
            noi di as result "{pstd}No variables found with a label >= `maxlen' characters{p_end}"
        }
    }

end
16 changes: 9 additions & 7 deletions src/dev/run-adodown-util.do
@@ -1,10 +1,12 @@
-* Kristoffer's root path
-if "`c(username)'" == "wb462869" {
-global clone "C:/Users/wb462869/github/labeller"
-}
-else if "`c(username)'" == "wb393438" {
-global clone "C:\Users\wb393438\stata_funs\labeller"
-}
+* Kristoffer's root path
+if "`c(username)'" == "wb462869" {
+global clone "C:/Users/wb462869/github/labeller"
+}
+* Fill in your root path here
+if "`c(username)'" == "wb393438" {
+global clone "C:\Users\wb393438\stata_funs\labeller"
+}
+

 // ad_setup, adf("${clone}") ///
 // name("labeller") ///
4 changes: 4 additions & 0 deletions src/labeller.pkg
@@ -20,9 +20,13 @@ d
 d Distribution-Date: 20231109
 d
 *** adofiles
+f ado/lbl_assert_no_long_varlbl.ado
+f ado/lbl_list_long_varlbl.ado
 f ado/labeller.ado

 *** helpfiles
+f sthlp/lbl_assert_no_long_varlbl.sthlp
+f sthlp/lbl_list_long_varlbl.sthlp
 f sthlp/labeller.sthlp

 *** ancillaryfiles
59 changes: 59 additions & 0 deletions src/mdhlp/lbl_assert_no_long_varlbl.md
@@ -0,0 +1,59 @@
# Title

__lbl_assert_no_long_varlbl__ - Assert that there is no variable in memory whose variable label length reaches or exceeds the desired character length.

# Syntax

__lbl_assert_no_long_varlbl__ [_varlist_] [, __**max**len__(_integer_)]

| _options_ | Description |
|-----------|-------------|
| __**max**len__(_integer_) | Maximum character length allowed. |

# Description

This command asserts that there is no variable in memory whose variable label length reaches or exceeds the desired character length.

By default, the command takes the maximum length to be Stata's maximum length for labels: 80 characters. If desired, the user can specify an alternative length through the __**max**len__(_integer_) option.

If there is at least one variable whose variable label length reaches or exceeds the maximum length, the command will return an error and list the variables whose variable labels are too long.

# Options

__**max**len__(_integer_) sets the character length at or above which a variable label is treated as too long. The default is 80.

# Examples

```
* create set of variables
gen var1 = .
gen var2 = .
gen var3 = .
gen var4 = .
gen var5 = .
* apply variable labels
label variable var1 "Short label"
label variable var2 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
label variable var3 "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
label variable var4 "Another short label"
label variable var5 "你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好"
* assert no variables with labels longer than default max length (80 characters)
lbl_assert_no_long_varlbl
* assert no variables with labels at or above the user-specified max length (12 characters)
lbl_assert_no_long_varlbl, maxlen(12)
```
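
Because the ado file in this commit ends a failed assertion with `error 9`, a do-file that should keep running after a failure can wrap the call in `capture`. The block below is a minimal sketch under that assumption; the variable names `short_var` and `long_var` and their labels are hypothetical and serve only as an illustration.

```stata
* hypothetical two-variable dataset, for illustration only
clear
gen short_var = .
gen long_var = .
label variable short_var "Short label"
label variable long_var "A label that is clearly longer than twelve characters"

* expected to fail because of long_var; capture lets the do-file continue
capture noisily lbl_assert_no_long_varlbl, maxlen(12)
if (_rc == 9) {
    display as text "At least one label has 12 or more characters."
}

* restricting the check to short_var (an 11-character label) should pass
lbl_assert_no_long_varlbl short_var, maxlen(12)
```

Passing a varlist, as in the last call, relies on the `syntax [varlist]` declaration in the ado file.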

# Feedback, bug reports and contributions

Read more about the commands in this package at https://github.com/lsms-worldbank/labeller.

Please provide any feedback by opening an issue at https://github.com/lsms-worldbank/labeller/issues.

PRs with suggestions for improvements are also greatly appreciated.

# Authors

LSMS Team, The World Bank [email protected]
63 changes: 63 additions & 0 deletions src/mdhlp/lbl_list_long_varlbl.md
@@ -0,0 +1,63 @@
# Title

__lbl_list_long_varlbl__ - List variables whose variable label length reaches or exceeds the desired character length.

# Syntax

__lbl_list_long_varlbl__ [_varlist_] [, __**max**len__(_integer_)]

| _options_ | Description |
|-----------|-------------|
| __**max**len__(_integer_) | Maximum character length allowed. |

# Description

When variable labels are too long, Stata truncates them to the first 80 characters of the string provided. This situation might arise for data exported from Survey Solutions. When a variable label is provided, Survey Solutions uses the Variable label field in Designer, whose length is capped at 80 characters (in line with Stata's limits). If no label is specified in that field, Survey Solutions uses the Question text field, whose maximum length is 2,000 characters. In the latter case, Survey Solutions uses the first 80 characters of the question text as the variable label.

To detect possible cases of truncation, data producers can check the length of each variable label individually (e.g., `local var_lbl : variable label my_var; local lbl_len : ustrlen local var_lbl`).

However, there is no base Stata operation for doing so in batch.

This command provides just such a tool.

By default, the command takes the maximum length to be Stata's maximum length for labels: 80 characters. If desired, the user can specify an alternative length through the __**max**len__(_integer_) option.
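
The length is counted in characters rather than bytes (the ado file in this commit uses the `ustrlen` extended macro function). As a minimal illustration of the difference, the sketch below compares the two measures for a short non-ASCII string. It assumes Stata 14 or later, where both `ustrlen()` and `strlen()` are available; the macro name `greeting` is only illustrative.

```stata
* Unicode characters versus bytes for the same string
local greeting "你好"
display ustrlen("`greeting'")   // number of Unicode characters
display strlen("`greeting'")    // number of bytes in UTF-8
```

The first line should display 2 and the second 6, since each of the two characters takes three bytes in UTF-8. A byte-based check would therefore flag labels with non-ASCII text well before they reach 80 characters.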

# Options

__**max**len__(_integer_) sets the character length at or above which a variable's label is listed by this command. The default is 80.

# Examples

```
* create set of variables
gen var1 = .
gen var2 = .
gen var3 = .
gen var4 = .
gen var5 = .
* apply variable labels
label variable var1 "Short label"
label variable var2 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
label variable var3 "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
label variable var4 "Another short label"
label variable var5 "你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好"
* list variables whose labels reach or exceed the default max length (80 characters)
lbl_list_long_varlbl
* list variables whose labels reach or exceed the user-specified max length (12 characters)
lbl_list_long_varlbl, maxlen(12)
```
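
The ado file in this commit also returns its findings in `r(varlist)` and `r(count_matches)`, so the flagged variables can be processed further. The block below is a minimal sketch of one way to do that; the dataset, the variable names `q1` and `q2`, and the macro names `flagged` and `n_flagged` are hypothetical.

```stata
* hypothetical dataset, for illustration only
clear
gen q1 = .
gen q2 = .
label variable q1 "Age"
label variable q2 "Highest level of education completed"

lbl_list_long_varlbl, maxlen(12)

* copy the results before another r-class command overwrites them
local flagged "`r(varlist)'"
local n_flagged "`r(count_matches)'"
display as text "`n_flagged' variable(s) flagged"

* show each flagged variable with the character length of its label
foreach var of local flagged {
    local var_lbl : variable label `var'
    display as text "`var': " ustrlen(`"`var_lbl'"') " characters"
}
```

Copying the returned values into local macros first is a precaution, since any later r-class command replaces the contents of `r()`.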

# Feedback, bug reports and contributions

Read more about the commands in this package at https://github.com/lsms-worldbank/labeller.

Please provide any feedback by opening an issue at https://github.com/lsms-worldbank/labeller/issues.

PRs with suggestions for improvements are also greatly appreciated.

# Authors

LSMS Team, The World Bank [email protected]
32 changes: 12 additions & 20 deletions src/sthlp/labeller.sthlp
@@ -6,40 +6,32 @@

 {title:Title}

-{phang}{bf:labeller} - Package command with utilities for the rest of the package
+{phang}{bf:labeller} - This command is used for short description.
 {p_end}

 {title:Syntax}

-{phang}{bf:labeller}
+{phang}{bf:labeller} , {bf:{ul:opt}ion1}({it:string})
 {p_end}

-{title:Description}
+{synoptset 15}{...}
+{synopthdr:options}
+{synoptline}
+{synopt: {bf:{ul:opt}ion1}({it:string})}Short description of option1{p_end}
+{synoptline}

-{pstd}This command only returns the version number and version data to the user.
-This command has little application for the user.
-For packages installed on SSC it is important that a there is a command
-in the package that has the same name as the package.
-That is the main purpose of this command.
-{p_end}
+{title:Description}

 {title:Options}

-{pstd}This command has no options.
-{p_end}
-
-{title:Feedback, bug reports and contributions}
-
-{pstd}Read more about the commands in this package at https://github.com/lsms-worldbank/labeller.
+{pstd}{bf:{ul:opt}ion1}({it:string}) is used for xyz. Longer description (paragraph length) of all options, their intended use case and best practices related to them.
 {p_end}

-{pstd}Please provide any feed back by opening and issue at https://github.com/lsms-worldbank/labeller/issues.
-{p_end}
+{title:Examples}

-{pstd}PRs with suggestions for improvements are also greatly appreciated.
-{p_end}
+{title:Feedback, bug reports and contributions}

 {title:Authors}

-{pstd}LSMS Team, The World Bank [email protected]
+{pstd}TODO: Populate this field from .pkg file
 {p_end}
75 changes: 75 additions & 0 deletions src/sthlp/lbl_assert_no_long_varlbl.sthlp
@@ -0,0 +1,75 @@
{smcl}
{* 01 Jan 1960}{...}
{hline}
{pstd}help file for {hi:lbl_assert_no_long_varlbl}{p_end}
{hline}

{title:Title}

{phang}{bf:lbl_assert_no_long_varlbl} - Assert that there is no variable in memory whose variable label length reaches or exceeds the desired character length.
{p_end}

{title:Syntax}

{phang}{bf:lbl_assert_no_long_varlbl} [{it:varlist}] [, {bf:{ul:max}len}({it:integer})]
{p_end}

{synoptset 15}{...}
{synopthdr:options}
{synoptline}
{synopt: {bf:{ul:max}len}({it:integer})}Maximum character length allowed.{p_end}
{synoptline}

{title:Description}

{pstd}This command asserts that there is no variable in memory whose variable label length reaches or exceeds the desired character length.
{p_end}

{pstd}By default, the command takes the maximum length to be Stata{c 39}s maximum length for labels: 80 characters. If desired, the user can specify an alternative length through the {bf:{ul:max}len}({it:integer}) option.
{p_end}

{pstd}If there is at least one variable whose variable label length reaches or exceeds the maximum length, the command will return an error and list the variables whose variable labels are too long.
{p_end}

{title:Options}

{pstd}{bf:{ul:max}len}({it:integer}) sets the character length at or above which a variable label is treated as too long. The default is 80.
{p_end}

{title:Examples}

{input}{space 8}* create set of variables
{space 8}gen var1 = .
{space 8}gen var2 = .
{space 8}gen var3 = .
{space 8}gen var4 = .
{space 8}gen var5 = .
{space 8}
{space 8}* apply variable labels
{space 8}label variable var1 "Short label"
{space 8}label variable var2 "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
{space 8}label variable var3 "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
{space 8}label variable var4 "Another short label"
{space 8}label variable var5 "你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好你好"
{space 8}
{space 8}* assert no variables with labels longer than default max length (80 characters)
{space 8}lbl_assert_no_long_varlbl
{space 8}
{space 8}* assert no variables with labels at or above the user-specified max length (12 characters)
{space 8}lbl_assert_no_long_varlbl, maxlen(12)
{text}
{title:Feedback, bug reports and contributions}

{pstd}Read more about the commands in this package at https://github.com/lsms-worldbank/labeller.
{p_end}

{pstd}Please provide any feedback by opening an issue at https://github.com/lsms-worldbank/labeller/issues.
{p_end}

{pstd}PRs with suggestions for improvements are also greatly appreciated.
{p_end}

{title:Authors}

{pstd}LSMS Team, The World Bank [email protected]
{p_end}