Tuesday, October 27, 2020

AVM is a Market Solution, Comparable Sales Analysis isn't (Part 1 of 2)

 


In developing the analysis, the same sales population, derived from a single Zip Code, has been used across all three graphs (of course, one may use other fixed locations like Census Tract, School District, etc.). Because all sales originated in the same Zip Code, the impact of location is minimized (of course, one can never make location totally irrelevant, as each block has a different appeal).

The above graph shows the noisy relationship between the uncorrected (raw) Sale Price and Bldg Size (Heated Living Area). The reason is straightforward: each sale reflects an individual buyer's judgment, which introduces a high level of subjectivity. For instance, buyers are paying between $100K and $250K for a 1,500 SF home. While investors would target the lower end of the range, informed buyers would be in the middle, and uninformed buyers (someone bent on buying a pink house!) would succumb to the higher end of the range. As a result, the R-squared is extremely low (0.189), meaning Bldg Size alone explains very little of the variation in sale prices.
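To make the R-squared figure concrete, here is a minimal sketch of how it can be computed for a one-variable fit of Sale Price on Bldg Size. The function name `r_squared` and the six toy sales are hypothetical, made up purely for illustration; they are not the sales behind the graph.

```python
def r_squared(x, y):
    """R-squared of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Unexplained variation (residuals) vs. total variation around the mean
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical toy sales: note the wide price spread at 1,500 SF
bldg_sf = [1200, 1500, 1500, 1500, 1800, 2200]
sale_price = [150_000, 100_000, 175_000, 250_000, 160_000, 240_000]

r2_raw = r_squared(bldg_sf, sale_price)
print(f"R-squared of raw Sale Price vs Bldg SF: {r2_raw:.3f}")
```

The wider the price spread at any given size, the larger the residual sum of squares and the lower the R-squared, which is the pattern described above.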
  



The Regression Value-1 graph shows that even a rudimentary regression model (with only three independent variables: Land SF, Bldg SF, and Bldg Age) can produce a decent market solution. The fit is significantly tighter, especially at the long end of the curve. The R-squared jumps from 0.19 to 0.91, so the model accounts for roughly 91% of the variation in sale prices. But this model has a bi-modal issue between 1,000 and 2,200 SF, where the regression values are forked. Such stacked values must be investigated for the underlying reasons. One simple way to identify the issue is to scatter the normalized regression values against the other independent variables and look for possible explanations.
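A three-variable model of this kind can be sketched with ordinary least squares in NumPy. Everything in the sketch is an assumption made for illustration: the data are synthetic (generated from made-up coefficients, not the Zip Code sales), the variable names are hypothetical, and "normalized regression value" is taken to mean regression value per SF of building area, which is what the $/SF figures in this post suggest.

```python
import numpy as np

# Synthetic, hypothetical sales (NOT the post's data)
rng = np.random.default_rng(0)
n = 200
land_sf = rng.uniform(4_000, 12_000, n)
bldg_sf = rng.uniform(900, 3_500, n)
bldg_age = rng.integers(1, 80, n).astype(float)
sale_price = (20_000 + 3 * land_sf + 90 * bldg_sf - 400 * bldg_age
              + rng.normal(0, 15_000, n))

# Design matrix: intercept + the three independent variables
X = np.column_stack([np.ones(n), land_sf, bldg_sf, bldg_age])
coef, *_ = np.linalg.lstsq(X, sale_price, rcond=None)

# Regression values (fitted prices) and the model's R-squared
regression_value = X @ coef
ss_res = np.sum((sale_price - regression_value) ** 2)
ss_tot = np.sum((sale_price - sale_price.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

# Assumed normalization: regression value per SF of building area
value_per_sf = regression_value / bldg_sf

print(f"R-squared: {r2:.3f}")
```

Scattering `value_per_sf` against `land_sf`, `bldg_sf`, and `bldg_age` in turn (with any plotting tool) is one way to look for the variable behind a fork.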





The above investigation points to the solution. When the normalized regression values from the first model were scattered against the Bldg Age variable (above graph), it was evident that many buyers were paying a premium for the younger homes, causing the stack. A sizeable portion of those buyers was willing to pay over $130/SF for the younger homes, while very few offered such a premium for the older stock. More precisely, none paid over $160/SF for the older stock.
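The same comparison can be made numerically, as the share of each age group whose value exceeds a given $/SF level. This is only a sketch: the six (age, $/SF) pairs are hypothetical, and the 15-year cutoff separating "younger" homes is an assumption, since the post does not state where the line was drawn.

```python
def share_above(values_per_sf, threshold):
    """Fraction of values strictly above the threshold (0.0 for an empty list)."""
    if not values_per_sf:
        return 0.0
    return sum(v > threshold for v in values_per_sf) / len(values_per_sf)

# Hypothetical (Bldg Age in years, normalized regression value in $/SF) pairs
homes = [(5, 145.0), (8, 138.0), (12, 128.0),
         (40, 118.0), (55, 131.0), (70, 105.0)]

AGE_CUTOFF = 15  # assumed cutoff for "younger" homes; not given in the post

younger = [v for age, v in homes if age <= AGE_CUTOFF]
older = [v for age, v in homes if age > AGE_CUTOFF]

share_younger = share_above(younger, 130)
share_older = share_above(older, 130)
print(f"Share above $130/SF: younger {share_younger:.2f}, older {share_older:.2f}")
```

A large gap between the two shares is the numeric counterpart of the stack seen in the scatter.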



So the Bldg Age variable had to be transformed from continuous to binary (younger homes vs. the rest). Re-running the regression model with the transformed Bldg Age produced the above (Regression Value-2) graph. The value fork has disappeared, translating to a much tighter fit, with an improved R-squared, a lower intercept, and a steeper slope approaching 45 degrees.
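The continuous-to-binary transformation and the re-run might look like the sketch below. All of it is assumed for illustration: the data are synthetic and deliberately built with a flat premium for younger homes to mimic the pattern described, the 15-year cutoff is hypothetical, and `fit_r_squared` is a made-up helper, not the model behind the graphs.

```python
import numpy as np

def fit_r_squared(X, y):
    """Fit OLS with an intercept and return the R-squared."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return 1 - residuals @ residuals / np.sum((y - y.mean()) ** 2)

# Synthetic, hypothetical sales (NOT the post's data)
rng = np.random.default_rng(1)
n = 300
land_sf = rng.uniform(4_000, 12_000, n)
bldg_sf = rng.uniform(900, 3_500, n)
bldg_age = rng.integers(1, 80, n).astype(float)

AGE_CUTOFF = 15  # assumed cutoff for "younger" homes; not given in the post

# Binary transform: 1.0 for younger homes, 0.0 for the rest
is_younger = (bldg_age <= AGE_CUTOFF).astype(float)

# Toy prices built with a step premium for younger homes
sale_price = (30_000 + 3 * land_sf + 90 * bldg_sf
              + 40_000 * is_younger + rng.normal(0, 5_000, n))

r2_continuous = fit_r_squared(np.column_stack([land_sf, bldg_sf, bldg_age]), sale_price)
r2_binary = fit_r_squared(np.column_stack([land_sf, bldg_sf, is_younger]), sale_price)

print(f"R-squared, continuous age: {r2_continuous:.3f}")
print(f"R-squared, binary age:     {r2_binary:.3f}")
```

When the premium behaves like a step rather than a steady decline with age, a straight-line age term cannot follow it, so the binary form fits better on data shaped this way. On real sales, the cutoff itself would have to come from the scatter investigation.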

Stay safe!

-Sid Som
homequant@gmail.com

