Sunday, September 6, 2020

How to Build a Better Automated Valuation Model (AVM)

Most residential AVMs are market models: a representative sales sample is used to develop the model, which is then applied to the unsold population to generate model values for all unsold properties (assuming non-missing data, etc.). While it sounds like a simple process, developing such a model requires significant econometric knowledge. A smart modeler figures out how to take the adventure out of the modeling process and standardize it, thus stabilizing values from year to year. For example, once the process is standardized and producing efficient values, smart modelers continue to introduce new sales but not new data variables; by doing so, they preserve the two most essential features of valuation modeling – explainability and decomposability. To standardize the modeling process and bring stability to values, the following sequential steps are needed:

1. Sales Sample: Assuming it's an "Application" AVM (meaning the model will be applied to the population the sample is derived from), the modeler starts with a representative sales sample. While testing the sampling properties, all three variable categories – continuous (Bldg SF, Lot SF, Age, etc.), categorical (Bldg Style, Exterior Wall, Grade, Condition, etc.) and fixed location (Town, School District, Assessing District, etc.) – are meaningfully evaluated. A good representative sample is a must; without one, the model would be inaccurate and produce faulty values. That is why smart modelers consider this a make-or-break step.
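A minimal sketch of such a representativeness check, comparing one continuous attribute in the sample against the population (the field name, data, and 5% tolerance here are illustrative assumptions, not a standard):

```python
import statistics

def representative(sample, population, field, tol=0.05):
    """Return True if the sample's mean and median for `field`
    are within `tol` (fractional) of the population's."""
    s = [row[field] for row in sample]
    p = [row[field] for row in population]
    mean_ok = abs(statistics.mean(s) - statistics.mean(p)) <= tol * statistics.mean(p)
    med_ok = abs(statistics.median(s) - statistics.median(p)) <= tol * statistics.median(p)
    return mean_ok and med_ok

# Illustrative data: building square footage
population = [{"bldg_sf": v} for v in (1400, 1600, 1800, 2000, 2200, 2400)]
sample = [{"bldg_sf": v} for v in (1500, 1900, 2300)]
print(representative(sample, population, "bldg_sf"))  # → True
```

In practice the same check would be run for every continuous variable, with frequency comparisons for the categorical and location variables.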

2. Time Adjustment: Depending on sales liquidity, a market AVM requires 12 to 24 months' worth of arm's-length sales. Since price levels go up and down, monthly time adjustments are needed. When the time series is extended (24+ months), quarterly adjustments are more useful as they are smoother and more market-efficient (reducing inconsistencies that arise from using the "closing" dates rather than the contract dates). Unlike sales analysis, time in an AVM is a surface correction, so it's better done at the "outer" level; for instance, while modeling a county, it's better to keep time adjustments at the county level, without drilling down to the sub-markets. Also, price-related (FHA, Jumbo, etc.) corrections must be avoided.
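One simple way to implement the monthly adjustment is a median-price index re-based to the valuation month; a sketch under that assumption (the record layout and sample prices are illustrative):

```python
import statistics

def time_adjust(sales, valuation_month):
    """Adjust each sale price to the valuation month using a
    monthly median-price index (valuation month = factor 1.0)."""
    by_month = {}
    for s in sales:
        by_month.setdefault(s["month"], []).append(s["price"])
    index = {m: statistics.median(p) for m, p in by_month.items()}
    base = index[valuation_month]
    return [dict(s, adj_price=s["price"] * base / index[s["month"]]) for s in sales]

sales = [
    {"month": "2020-01", "price": 200_000},
    {"month": "2020-01", "price": 210_000},
    {"month": "2020-06", "price": 220_000},
    {"month": "2020-06", "price": 230_000},
]
adjusted = time_adjust(sales, "2020-06")  # Jan sales scaled up ~9.8%
```

The same mechanics work for quarterly adjustments by keying on quarter instead of month; a production index would of course be fitted at the county ("outer") level, per the step above.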

3. Hold-out Sample: Once the sales are time adjusted, the sales sample must be split into a modeling sample (e.g., 80%) and a hold-out sample (20%). Both sub-samples – modeling and hold-out – must have attributes very similar to the original sales sample representing the population. It's good to use one of the software-provided sampling procedures, which help reduce judgment. The model is developed with the modeling sample and then tested on the hold-out. While the results will not be exact, they must be similar (very close).
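A sketch of the split using a seeded random shuffle (the 80/20 fraction and seed are assumptions for illustration; dedicated sampling procedures in modeling software would replace this):

```python
import random

def split_sample(sales, holdout_frac=0.20, seed=42):
    """Randomly split time-adjusted sales into modeling and
    hold-out sub-samples (seeded so the split is reproducible)."""
    rng = random.Random(seed)
    shuffled = sales[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_frac))
    return shuffled[:cut], shuffled[cut:]

sales = list(range(100))            # stand-in for sale records
modeling, holdout = split_sample(sales)
print(len(modeling), len(holdout))  # → 80 20
```

After splitting, the representativeness check from step 1 should be re-run on both sub-samples against the original sample.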

4. Multi-stage Regression: Since the sample comprises three different types of variables, it's prudent to develop a three-stage regression model, with each stage piggybacking on the output of the prior stage. Considering that the contribution of each variable is generally non-linear, a log-linear model is more effective. If the dataset is comprehensive, with many categorical variables, a correlation matrix is needed to detect multi-collinearity (whether specific variables are highly correlated), which often leads to a reduced number of variables. Conversely, if the number of variables is limited, the t-stat is generally an excellent metric to control the variables' significance levels. If a variable's t-stat is less than 2, it is typically non-contributing.
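A minimal sketch of one such stage – a log-linear fit with t-stats – using NumPy (the synthetic data, the single predictor, and the coefficient values are illustrative assumptions):

```python
import numpy as np

def loglinear_ols(X, y):
    """Fit ln(price) = b0 + b1*x1 + ... and return the
    coefficients and their t-stats (coef / standard error)."""
    X = np.column_stack([np.ones(len(X)), X])   # add intercept
    ln_y = np.log(y)
    beta, *_ = np.linalg.lstsq(X, ln_y, rcond=None)
    resid = ln_y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof                # residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, beta / se

# Illustrative: price driven by building SF (in 1000s) plus noise
rng = np.random.default_rng(0)
sf = rng.uniform(1.0, 3.0, 200)
price = np.exp(11.5 + 0.40 * sf + rng.normal(0, 0.05, 200))
beta, tstats = loglinear_ols(sf.reshape(-1, 1), price)
```

Per the rule above, any variable whose t-stat comes back below 2 would be dropped before moving to the next stage.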

5. Multi-cycle Regression: To make the model efficient, it is good practice to develop it in three cycles. The sales ratios (1st-cycle AVM values to adjusted sale prices) from the first cycle help define and remove outliers. The outlier-free sample then feeds cycle two, which generates pre-residual AVM values and ratios. The end of cycle two is where the residual corrections are made, a time-consuming and iterative process. Once all of the residual revisions are finalized, the third and final cycle is run, generating the model and producing the final values. Since a smart modeler knows how to develop a model systematically and methodically, the result is generally far more efficient than models created ad hoc or in one single step.
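The cycle-one outlier trim can be sketched as follows (the 70-140 ratio band echoes the ranges mentioned in the residual discussion below, but the band, field names, and data are illustrative assumptions):

```python
def trim_outliers(sales, lo=70, hi=140):
    """Keep only sales whose cycle-1 sales ratio
    (AVM value / adjusted sale price * 100) falls in [lo, hi]."""
    kept = []
    for s in sales:
        ratio = 100 * s["avm"] / s["adj_price"]
        if lo <= ratio <= hi:
            kept.append(s)
    return kept

sales = [
    {"avm": 200_000, "adj_price": 205_000},  # ratio ~97.6 -> kept
    {"avm": 120_000, "adj_price": 200_000},  # ratio 60    -> dropped
    {"avm": 300_000, "adj_price": 200_000},  # ratio 150   -> dropped
]
clean = trim_outliers(sales)
```

The surviving records would then be re-fit in cycle two.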

6. Residual Analysis: At the end of the second cycle, the residuals need to be worked on. The fact that some sales ratios cluster around 70 while others cluster around 140 does not mean the AVM values are wrong or inefficient, because the AVM values are being compared with sales, which, individually, are all judgment calls. For instance, a prospective homebuyer bent on purchasing a pink house would in all likelihood overpay, while an SFR rental company or an aggressive flipper or investor would buy a group of properties, perhaps paying well below the market (although some of those properties would be coded as arm's-length). In other words, the model is essentially fixing those anomalies. Nonetheless, as indicated before, residual analysis and correction is an arduous but necessary optimization task.
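One common form of residual correction is a multiplicative factor per location group that re-centers the median sales ratio at 100; a sketch under that assumption (the grouping field and data are illustrative, and real corrections are iterated, not applied once):

```python
import statistics

def location_factors(sales):
    """For each neighborhood, derive the factor that re-centers
    the median sales ratio (AVM / adjusted price * 100) at 100."""
    by_nbhd = {}
    for s in sales:
        by_nbhd.setdefault(s["nbhd"], []).append(100 * s["avm"] / s["adj_price"])
    return {n: 100 / statistics.median(r) for n, r in by_nbhd.items()}

sales = [
    {"nbhd": "A", "avm": 110_000, "adj_price": 100_000},
    {"nbhd": "A", "avm": 115_000, "adj_price": 100_000},
    {"nbhd": "A", "avm": 120_000, "adj_price": 100_000},
]
factors = location_factors(sales)  # nbhd "A" runs ~15% high
```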

7. Independent Validation: Once the draft model is ready, it's good to have a comparables-based sample (where the sales ratios are either below 80 or above 120) worked up by an experienced appraiser or a self-directed comps-based valuation site. In working up the sample, the adjustment matrix needs to be set up in line with the model coefficients; for example, if a coastal town or county is being modeled, the living-area size adjustment factors would be significantly higher than their Midwest counterparts. Similarly, the time and valuation dates must adequately follow the model as well. If the model shows a 12% annual appreciation in the area and the values are being forecasted for a future date, the sample must be set up accordingly; otherwise, the AVM and the validation sample would be apples to oranges. Ideally, the model values should be within 10-15% of the comps-based values.
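The final within-10-15% comparison can be sketched as (the 15% tolerance and the paired values are illustrative assumptions):

```python
def validates(avm_values, comp_values, tol=0.15):
    """Check that every AVM value is within `tol` (fractional)
    of the corresponding comps-based value."""
    return all(abs(a - c) / c <= tol for a, c in zip(avm_values, comp_values))

# Illustrative AVM values vs. appraiser's comps-based values
avm   = [310_000, 455_000, 198_000]
comps = [300_000, 480_000, 210_000]
print(validates(avm, comps))  # → True
```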

8. Hold-out Sample Testing: Once the draft model is ready, it needs to be tested on the hold-out sample kept aside at the beginning of the modeling process. Once the model is applied to the hold-out and the first set of hold-out sales ratios is generated, the outliers need to be removed using the same ratio range as in the modeling sample. The hold-out application results (hold-out sales ratio stats – percentile distribution, COD/COV, PRD, etc.) must be very similar to those of the primary model. If they are at variance, the modeler must investigate, and here is where the investigation should start: immediately ensure that the final version from the 3rd cycle is being applied. Newcomers are often confused by the various cycles and apply a model from an interim cycle.
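The comparison stats can be sketched as below, using the standard ratio-study definitions (COD = average absolute deviation from the median ratio, as a percent of the median; PRD = mean ratio divided by the sale-weighted mean ratio); the data are illustrative:

```python
import statistics

def ratio_stats(avm, prices):
    """Standard ratio-study stats: median ratio, COD, PRD."""
    ratios = [100 * a / p for a, p in zip(avm, prices)]
    med = statistics.median(ratios)
    cod = 100 * statistics.mean(abs(r - med) for r in ratios) / med
    weighted = 100 * sum(avm) / sum(prices)     # sale-weighted mean ratio
    prd = statistics.mean(ratios) / weighted
    return {"median": med, "cod": cod, "prd": prd}

avm    = [95_000, 100_000, 105_000, 210_000]
prices = [100_000, 100_000, 100_000, 200_000]
stats = ratio_stats(avm, prices)
```

Running this on both the modeling sample and the hold-out, and comparing the two sets of numbers, is exactly the similarity test the step above calls for.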

9. Applying on to Population: One must remember that the whole AVM exercise is to develop a model from the sold population (on average, 5% of homes sell annually) to value the mutually exclusive 95% unsold population. Of course, when the model is applied, it is used on the universe (meaning both sold and unsold properties). Here is why: since the sold population is a subset of the universe, those model values will be regenerated along with the unsold, thereby forming a reasonable basis for a successful test application. In other words, as soon as the model is applied and values are generated, a small matching extract from the modeling sample should be compared; if the values match, subject to some rounding differences, the application is on the right track.
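That matching check can be sketched as (the parcel IDs, values, and rounding tolerance are illustrative assumptions):

```python
def application_matches(model_values, applied_values, tol=1.0):
    """Compare values from the modeling run against the values
    regenerated for the same (sold) parcels during application;
    they should agree up to rounding (`tol` dollars)."""
    return all(abs(model_values[pid] - applied_values[pid]) <= tol
               for pid in model_values)

model_values   = {"P1": 312_450.0, "P2": 198_760.0}
applied_values = {"P1": 312_450.0, "P2": 198_760.4}  # rounding drift
print(application_matches(model_values, applied_values))  # → True
```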


The need for AVMs is growing by leaps and bounds. Assessors, Banks, Mortgage Companies, Mortgage Servicers, REITs, Hedge Funds, SFR Rental companies, Tax Appeal houses, and Law Firms are all big users of certified AVM values.

-Sid Som, MBA, MIM
homequant@gmail.com

