Re: [Question] cross-sectional and time-series da …

Board: Economics | Author: cewjlhwj (嗨你好) | Time: 2008/05/17 22:43 | Votes: 4 (4 up, 0 down, 0 neutral)

4 comments, 3 participants | Thread 4/6

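The issues discussed below all concern the iid assumption and the OLS model.

Because panel data usually have large N and small T, the random sampling assumption does allow for temporal correlation: we assume random sampling in the cross-section dimension, and the dependence in the time-series dimension can be entirely unrestricted. But this presupposes large N and small T. If the sample is not large-N, small-T, things get troublesome, because the assumption above can no longer be justified.

When N and T are of similar size, there is not much discussion so far. Wooldridge (2002), Econometric Analysis of Cross Section and Panel Data, p. 7, points to the relevant papers.

As far as I know, panel data can have the following problems.

1. Multicollinearity
Test for it in panel data as usual and drop the offending variables. Regression routines usually drop (test for) perfectly correlated variables automatically, but in my experience that is not guaranteed for nearly collinear ones. In Stata you can use coldiag2.
Reference: Belsley (1991), Conditioning Diagnostics: Collinearity and Weak Data in Regression.

2. Time-series issues
As discussed above, the model usually allows for temporal correlation.
Wooldridge (2002) proposes a test for serial correlation in panel data. Its assumptions are simple and it is easy to use. Drukker, D. M. (2003) shows that the test performs well under a variety of model assumptions. Someone has already written it for Stata (xtserial).
Data with autocorrelation can be estimated with an estimator that is robust to arbitrary autocorrelation. But for kernel estimators (the pure time-series approach, e.g. Newey-West), the asymptotics rely on the number of periods going off to infinity. Since most data today are large-N, small-T, these estimators must be used with care and are not common.

3. Cross-sectional issues
Contemporaneous correlation, i.e. correlation across different i within the same period t (E(u_it u_jt) ≠ 0 for i ≠ j).
Stata has the following two tests:
xtcsd (for small T, large N)
xttest2 (for large T, small N)
If a problem shows up, it may be one of the following.

3.1 Within-group correlation
Different i may belong to the same group and therefore be correlated. For example, if several people (i) are employees of the same company, their behavior may be influenced by that company's particular benefits, policies, and unobservable cultural factors.
In practice this can be handled with clustering, where the asymptotics rely on the number of clusters going off to infinity. It is easy to do, but you need enough clusters. I saw someone on the Stata forum say that 50 or more is a good rule of thumb.
Alternatively, restructure the data: for example, aggregate the individual data within each firm and use each firm as a separate i. This removes the within-group correlation, but may run into the next problem.

3.2 Spatial correlation
If each i is data for a large region, the variables of different i may affect one another. For example, if i indexes U.S. states, a tax cut in one state affects variables in another state. This is hard to handle in practice and is usually ignored.

4. Groupwise heteroskedasticity (whether the variance is the same across groups)
In Stata, xttest3 tests whether a fixed-effects model satisfies H0: homoskedasticity. If H0 is rejected, use the robust option to get valid standard errors.

In principle, heteroskedasticity, multicollinearity, autocorrelation, and contemporaneous correlation all have dedicated tests for panel data. The theory is much the same as in the cross-section or time-series case, but panel data sometimes require handling more assumptions. Of course, different models (e.g. fixed effects or random effects) make different assumptions about the data and may call for different tests before the estimates can be trusted.

My approach is to first find the dedicated tests in the statistical software and check whether the data have the problems above. If a problem is found, I then use a robust estimator to obtain consistent estimates with valid standard errors.

I recommend reading Wooldridge (2002); it is a very good book on panel data. Otherwise it is easy to misread the tests you find and not know which model to use to fix the problem, and with the wrong one the estimates will be incorrect.

Most of the content and main claims above are translated directly from Wooldridge (2002), Econometric Analysis of Cross Section and Panel Data, plus what I have gathered from the Stata forum. I am still unsure about some parts (especially the time-series side), so apologies if anything is wrong.

--
※ Origin: 批踢踢實業坊 (ptt.cc) ◆ From: 140.109.230.252
※ Edited: cewjlhwj, from 140.109.230.252 (05/17 22:53)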
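
As a rough illustration of the clustering idea in 3.1, and of leaving the time-series dependence within each i unrestricted as in point 2, here is a minimal Python sketch. It is not the Stata implementation and not code from Wooldridge (2002). It assumes a single regressor, a fixed-effects (within) estimator, clustering on the unit i, and no finite-sample correction, so a statistical package that applies such a correction will report different standard errors. The function name fe_cluster_robust and the toy data are hypothetical.

```python
from math import sqrt


def fe_cluster_robust(panel):
    """Within (fixed-effects) slope with a cluster-robust standard error.

    panel: dict mapping a unit id to a list of (x, y) observations over time.
    Clusters are the units themselves, so errors may be arbitrarily
    correlated over time within a unit. No finite-sample correction
    is applied (an assumption of this sketch).
    """
    sxx = 0.0
    sxy = 0.0
    demeaned = {}
    for unit, obs in panel.items():
        # Remove the unit-specific mean: this sweeps out the fixed effect.
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        rows = [(x - x_bar, y - y_bar) for x, y in obs]
        demeaned[unit] = rows
        sxx += sum(xd * xd for xd, _ in rows)
        sxy += sum(xd * yd for xd, yd in rows)

    beta = sxy / sxx

    # Sandwich "meat": square the summed score of each cluster, so that
    # within-cluster cross products of the residuals are kept.
    meat = 0.0
    for rows in demeaned.values():
        score = sum(xd * (yd - beta * xd) for xd, yd in rows)
        meat += score ** 2

    se = sqrt(meat) / sxx
    return beta, se


# Hypothetical toy panel: two units, two periods each.
toy = {"a": [(0, 0), (2, 2)], "b": [(0, 0), (2, 6)]}
beta_hat, se_hat = fe_cluster_robust(toy)
print(beta_hat, se_hat)
```

With only two clusters the toy panel is far too small for the standard error to mean anything. As noted in 3.1, the justification is asymptotic in the number of clusters.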