-
Gert (Guest)
Great, important, and very challenging work. Congratulations!
I am giving it a test ride at the moment and will post if I run into any troubles. This way this thread might be used as a troubleshooter for others who want to use the code.
-
Gert (Guest)
#1
Progress:
I ran 1_Folder_setup.do and copied the data (that I have) to the respective folders. Instructions on the PSID files are a bit unclear at the moment.
-
Gert (Guest)
#2 Running: 2_CPF_Main_Fill_and_run.do
line 74: global psid_ind_er "${psid_in}\pack\IND2017ER.txt"
Problem: the global "psid_in" is not yet defined
My temporary solution: replace it with the full file path of IND2017ER.txt (as I mentioned in the previous post, it was unclear that this file needed to be unpacked already)
line 95: do "${your_dir}\11_CPF_in_syntax\00_master\_10_Directories_global.do"
file C:\CPF\11_CPF_in_syntax\00_master\_10_Directories_global.do not found
Minor problem: it was a bit unclear that the CPF-Code-main structure needed to be copied into the 11_CPF_in_syntax folder. (It says so on the website, but the folder is only created after running 1_Folder_setup.do)
line 112: do "${your_dir}\11_CPF_in_syntax\00_master\_11_Run_cntr_do_files.do"
Code stops running somewhere in 11_CPF_in_syntax.do, see next post.
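A minimal sketch of a guard that makes this kind of failure easier to spot, assuming it is placed just before the line that uses the global. The error messages and the forward-slash path are illustrative choices, not part of the CPF code; only the names psid_in and IND2017ER.txt come from the lines quoted above.

```stata
* Stop with a clear message if the global has not been defined yet
if "${psid_in}" == "" {
    display as error "global psid_in is empty; define the directory globals first"
    exit 198
}

* Check that the unpacked file is where the code expects it
capture confirm file "${psid_in}/pack/IND2017ER.txt"
if _rc {
    display as error "IND2017ER.txt not found under ${psid_in}/pack; unpack it first"
    exit 601
}
```

Run as written in a fresh session, the first check fires and the do-file stops, which is the intended behaviour when the directory globals have not been set.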
-
Gert (Guest)
#3 HILDA
Stata output:
### CPF: 01_HILDA ###########################################
Preparing CPF datafile for 01_HILDA based on do-files in:
C:\CPF\11_CPF_in_syntax\01_HILDA
->> Running do-files:
au_01_Prepare_data.do
au_02_1_Harmonize (p1- cnef).do
au_02_2_Harmonize (p2- combined).do
au_02_3_Combine_p1p2.do
au_03_Sample_selection.do
->> CPF version 1.0 of 01_HILDA saved
no variables defined
----> I ran _10_directories_global.do separately, which fixed the problem
-
Gert (Guest)
*******************************
Intermediary update
*******************************
It appears that the troubles I’m having so far all have to do with the file structure. For both HILDA and KLIPS, the data files need to be placed in a nonexistent sub-folder of the \data folder (which I found by running the separate syntax files). I created these by hand, and everything appears to run smoothly for now.
-
Konrad Turek (Keymaster)
Dear Gert, thank you for the feedback! I will take a look at this and update the files.
-
Gert (Guest)
********
UPDATE: I have started over (and finished). What follows is a list of the problems I ran into while running every do-file separately. When running 2_CPF_Main__Fill_and_run.do, it is at times unclear from the output window why or where the code stops, and it is harder to check that everything is going smoothly (this is also mentioned in the do-files themselves, so this is not a criticism, just an observation).
Some of the problems I encountered may be due to the syntax, while others are just my own manual errors, but I list them here anyway in case someone encounters the same problem.
****
Progression:
2_CPF_Main__Fill_and_run.do (only ran step A)
_10_directories_global.do
**********
**********
HILDA:
**********
**********
au_01_Prepare_data.do
***error*** "no room to add more variables"
had to manually set maxvar to 1000 (because I’m running the files separately)
----> no further problems
au_02_1_Harmonize(p1-cnef).do
----> no problems
au_02_2_Harmonize(p2-combined).do
----> no problems
au_02_3_Combine_p1p2.do
----> no problems
au_03_Sample_selection.do
----> no problems
********
********
KLIPS
********
********
ko_01_Prepare_data.do
***error*** file not found
had to run both 2_CPF_Main__Fill_and_run.do (only step A) and _10_directories_global.do again
----> no further problems
ko_02_Harmonize.do
----> no problems
ko_03_Sample_selection.do
----> no problems
********
********
PSID
********
********
us_01_1_Create_psid_crossy_ind.do
----> no problems
us_01_2_Create_waves_psidtools.do
----> no problems
us_01_3_Get_vars.do
***error*** file C:\CPF\02_Country_Data_Origin\03_PSID\data\Cross-year Individual 1968-2017\psid_crossy_ind.dta not found
This indicates that there was a problem with running us_01_1_Create_psid_crossy_ind.do after all.
As it turns out, us_01_1_Create_psid_crossy_ind.do does not execute its final two commands:
gen pid=(ER30001*1000)+ER30002
&
save "${psid_in}\psid_crossy_ind.dta", replace
Presumably the file is still in semicolon-delimited mode at that point, so these two lines, which do not end in a semicolon, are never run.
From the Stata manual I gathered the following:
Stata has an optional ability to allow its lines to continue up to semicolons. In Stata, you code
#delimit ;
and the delimiter is changed to semicolon until your do-file or ado-file ends, or until you code
#delimit cr
----> solution: I inserted #delimit cr in line 3074
ran us_01_1_Create_psid_crossy_ind.do again without problems
ran us_01_3_Get_vars.do (had to run "program drop combvars" first)
----> no more problems
us_02_Harmonize.do (this also runs the additional us_02add* do-files)
----> no problems
us_03_Sample_selection.do
----> no problems
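The delimiter behaviour described above can be sketched on toy data. This is only an illustration, assuming it is run as a do-file (#delimit has no effect when typed interactively); the two input rows and the variable label are made up, and only the pid formula and the program name combvars are taken from the notes above.

```stata
* Toy data: two rows standing in for the PSID cross-year individual file
clear
input ER30001 ER30002
1 1
1 2
end

* Semicolon mode: a command may span several lines and ends at the semicolon
#delimit ;
label variable ER30001
    "toy family identifier" ;
#delimit cr

* Carriage-return mode again: each line is executed as its own command
generate long pid = (ER30001*1000) + ER30002
list

* Avoids an "already defined" error when re-running a file that defines combvars
capture program drop combvars
```

Without the `#delimit cr` line, the `generate` command would be read as the start of a command that never reaches a semicolon.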
********
********
RLMS
********
********
+++
tangentially related (not CPF): I probably missed the instructions somewhere, but I wasn’t able to open the RLMS data files after downloading them from Dataverse.
It turns out they were zip files with the extension missing, so you have to rename them to *.zip by hand.
+++
ru_01_Prepare_data.do
----> no problems
ru_02_Harmonize.do
----> no problems
ru_03_Sample_selection.do
----> no problems
********
********
SHP
********
********
no issues
********
********
SOEP
********
********
ge_01_Prepare_data.do
***error*** file C:\CPF\02_Country_Data_Origin\06_SOEP\Data\soep.v34\ppathl.dta not found
additional folder soep.v34 had to be created (in my case I have v34 instead of the latest v35)
***error*** op. sys. refuses to provide memory
I’ve had this problem before: pl.dta is a humongous file and my laptop simply won’t open it.
I made an ad hoc workaround. Two variable names are different in v34, though,
so the "keep" command needs to be adjusted somewhat (plb0282_h --> plb0282; plg0269 --> plg0269_v1; deleted plg0269_v2 from syntax)
ge_02_harmonize.do
***error***
label e11105 not found
---> might be due to my specific data file; removed it from the syntax and ran it again without problems
***error***
variable plg0269_v2 not found
---> see previous do-file. The variable is not in the dataset --> deleted it from the syntax along with temp_train2
----> no further problems (probably runs fine with v.35)
ge_03_Sample_selection.do
----> no problems
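One way to cope with variable names that differ between data versions is to keep only the candidates that actually exist, rather than editing the keep list by hand. The following is a rough sketch on toy data, not the CPF code itself: the dataset is generated on the spot, and only the variable names are borrowed from the notes above.

```stata
* Toy data standing in for a SOEP extract; plb0282_h and plg0269_v2 are deliberately absent
clear
set obs 3
generate plb0282 = _n
generate plg0269_v1 = _n

* Collect only those candidate variables that exist in this data version
local keepvars
foreach v in plb0282 plb0282_h plg0269_v1 plg0269_v2 {
    capture confirm variable `v', exact
    if !_rc local keepvars `keepvars' `v'
}

keep `keepvars'
describe
```

The `exact` option stops `confirm variable` from accepting an abbreviation of a longer name. Later lines that use a missing variable would still need their own adjustment.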
********
********
BHPS-UKHLS
********
********
The BHPS-UKHLS part is by far the most challenging one, not least because it requires a lot of CPU power to run.
uk_02_Harmonize.do
***error*** 'age' already defined
----> age was already included as a variable in the BHPS, so a recode with ,gen() doesn’t work.
I solved it by replacing this code with:
replace age=age_dv if age_dv>=0 & age==.
replace age=-1 if age_dv==-9 & age==.
---> no more problems (with the syntax, at least)
*************
_12_Append.do
----> no issues
_13_Labels.do
----> no issues
*********************
Final thoughts,
Given the size of the undertaking and the sheer number of lines of syntax that need to be run, there are remarkably few actual problems with the syntax itself. A very impressive feat indeed, and my congratulations and much appreciation to the authors. I believe the main obstacle to widespread use of this syntax is, however, the processing power required to run some of it (especially when appending SOEP and BHPS/UKHLS). This could possibly be avoided if one wrote a “keep”-syntax removing most of the unnecessary variables from the original datasets.
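A minimal sketch of that idea, assuming the needed variables are known in advance: Stata can read a subset of variables directly from disk with `use varlist using`, so the full file never has to fit in memory. The auto dataset and the temporary file below are stand-ins for a large file such as pl.dta, and the variable list is purely illustrative.

```stata
* Toy stand-in for a very large source file
sysuse auto, clear
tempfile big
save `big'

* Load only the variables that are needed instead of the whole file
use make price mpg using `big', clear
describe
```

For a real source file, the temporary file would be replaced by the path to the original dataset and the variable list by the variables the harmonization actually uses.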
-
Konrad Turek (Keymaster)
Thank you Gert, this is a great contribution! I’ll incorporate your suggestions into the code and improve the instructions. This will be posted as an updated release of the code, with credit to you, Gert.
I am also very happy about the feedback, because this is the best way to really evaluate and correct such complex syntax.
-
Konrad Turek (Keymaster)
I checked it all and introduced several minor changes. You figured out well how it all works! Doing it all step by step is the best idea. I included the changes in version 1.1 of the code, which seems to be running smoothly. The Manual was also updated with some additional instructions (ver. 1.2).
--About SOEP version--
Indeed, the CPF v.1.0 was prepared for SOEP ver. 35. So if you have an older version, you first need to change:
"2_CPF_Main__Fill_and_run.do" --> A.1. global soep_w "35" --> change to 34 or older.
However, as you noticed, the names of variables have been changing from wave to wave, and you need to take care of this yourself.
Fortunately, for SOEP it’s not a major concern.
--About the size of files and computer-power limitations--
Yes, UKHLS and SOEP have some very large files. I decided not to drop too many variables at the beginning, to leave room for adding variables to the CPF. However, when your computer says “NO”, it’s good to keep only the necessary variables at the end of “01_Prepare_data.do”.
Thanks once more.
-
Michael (Guest)
Dear Konrad and colleagues,
thanks so much for providing this invaluable data harmonization to the research community!
I’m using a non-Windows machine and there is an issue I’ve encountered before for file paths. Using \ works only in Stata run on Windows. So whenever there is reference to a path or file there’s an error (starting already when setting up the file structure).
From what I know, however, using a / instead works on both Windows and non-Windows machines. So I would offer to exchange the respective \ for / in all files and send them to you.
Of course, if there is a simpler way of fixing this issue, I’m eager to learn about it.
Best,
Michael
-
Konrad Turek (Keymaster)
Hi Michael,
great idea! Thank you for noticing it. Indeed, a small but nice improvement. As this is a minor technical issue, it will be easy for me to include it in the updated version myself without replacing files now. However, if you want to do it, I suggest using GitHub and editing the existing code. But as far as I can see, GitHub does not support automatic replacement; thus it may be much faster for me to do it in Stata with automatic replacement (and not miss any “\”!).
Good luck with the analysis!
Konrad
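For readers who want to try the slash change on their own paths, here is a small sketch. The value given to your_dir and the local macro names are hypothetical examples; only the folder and file names come from the thread above.

```stata
* Hypothetical project root written with forward slashes
global your_dir "C:/CPF"

* Paths built from it work the same way on Windows, macOS and Linux
display "${your_dir}/11_CPF_in_syntax/00_master/_10_Directories_global.do"

* Converting an existing backslash path held in a string
local winpath "C:\CPF\11_CPF_in_syntax"
local anypath = subinstr("`winpath'", "\", "/", .)
display "`anypath'"
```

A side benefit of forward slashes is that Stata treats a backslash placed directly before a macro reference as an escape, which can silently break paths built from macros.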
-
Varun Satish (Guest)
Hi Konrad,
I encountered this issue too. I am doing what you suggested (using automatic replacement in Stata) to get my code working.
If I go ahead and fix all of these, assuming I fix the issue properly, what is the best way for me to pay it forward? Should I fork + commit + push and then initiate a pull request on GitHub?
-
Konrad Turek (Keymaster)
Hi Varun Satish,
The issue has been solved. There was another pull request on this (changing slashes) and I have just merged it with the core file. However, thank you for the communication and willingness to engage! And indeed, the fork-and-push route should be the best in such cases.
Good luck with the work!
Konrad