Thursday, November 7, 2019

9 Issues that make an AVM Inefficient, often Ineffective

The difference between a smart modeler and an average modeler is that the former quickly figures out, and intelligently avoids, the valuation modeling practices that tend to make the process inefficient, if not ineffective. That is why establishing a quality control or modeling review process is essential to avoid having to deal with significant damage control down the line. A handful of such poor practices may raise serious concerns for quality control reviewers. Here are some of them.

1. Time Adjustment: Some practitioners use the number of months since sale (NMSS) as an independent variable to ascertain the rate of growth (+/-) in the targeted price level, paving the way for the time coefficient to adjust the modeling sale prices, which then serve as the dependent variable in the regression equation. While this is acceptable as a lead-up regression to generate the primary dependent variable, NMSS is unacceptable as an independent variable in the main regression equation. The reason is simple: NMSS is missing for the unsold population the model would eventually be applied to, so the application would fail.
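
The acceptable use of NMSS, as a lead-up regression only, can be sketched as follows. This is a minimal numpy illustration with made-up sale prices; the variable names and the log-trend form are assumptions, not the author's exact procedure:

```python
import numpy as np

# Hypothetical sales sample: sale price and number of months since sale (NMSS).
prices = np.array([300_000, 310_000, 295_000, 320_000, 305_000, 330_000], dtype=float)
nmss = np.array([11.0, 8.0, 10.0, 3.0, 6.0, 1.0])

# Lead-up regression: log(price) on NMSS yields a monthly trend coefficient.
X = np.column_stack([np.ones_like(nmss), nmss])
beta, *_ = np.linalg.lstsq(X, np.log(prices), rcond=None)
monthly_trend = beta[1]  # negative here: older sales carry lower prices

# Time-adjust each sale to the valuation date (NMSS = 0); the adjusted
# price becomes the dependent variable of the main AVM regression.
# NMSS itself never enters that equation -- it is undefined for the
# unsold population the model will ultimately be applied to.
adjusted_prices = prices * np.exp(-monthly_trend * nmss)
```

Note that the main model's right-hand side is built from property attributes only, so it can be evaluated for unsold parcels.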

2. Sales GIS: Testing the Sales GIS representativeness is not an easy proposition, forcing many practitioners to skip this sampling test. Sales GIS is often a function of the market dynamics, deviating from the Population GIS. Therefore, an untested Sales GIS paves the way for an inefficient AVM. That is why many practitioners tend to use "fixed neighborhoods" in the modeling process as they are more stable and well-accepted, meaning they do not succumb to the short-term market swings and are generally liquid enough to help test the representativeness of the modeling sample.

3. Chasing Trophies: An AVM is not meant for the entire population. Targeting at most a two-sigma solution (95%) is more meaningful than chasing the whole population, leaving out the utterly unattainable 5%, including the limited number of trophy properties and large mansions. The reason is apparent: the paucity of such sales. Therefore, smart modelers remove them from the modeling spectrum altogether, i.e., they do not let any of those sales enter the modeling sample, nor do they apply the model to that subset.
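
One simple way to keep trophy sales out of the modeling sample is a price screen. The five-times-median cutoff below is a hypothetical threshold for illustration, not a prescribed rule:

```python
import numpy as np

# Hypothetical single-family sale prices; the last two are trophy sales.
prices = np.array([250_000, 300_000, 320_000, 350_000, 400_000,
                   410_000, 450_000, 500_000, 8_000_000, 12_000_000], float)

# Hypothetical screen: drop any sale above five times the median price,
# so trophy sales never enter the modeling sample at all.
cutoff = 5 * np.median(prices)
modeling_sample = prices[prices <= cutoff]
```

In practice the cutoff would be tuned to the local market; the point is that the exclusion happens before modeling, not after.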

4. Chasing Tiny Bungalows: This is the flip side of dealing with trophy properties. There are tiny waterfront bungalows all around the country (in fact, many on prime oceanfront, e.g., Point Lookout on the Atlantic shore of Long Island, NY), but this subset primarily represents land value. Once sold, they generally take a completely different form, often as multifamily properties after re-zoning. Therefore, even when the existing improvement is sound, they must be hand-worked by appraisers. Letting them into the modeling process would be a red flag for quality control reviewers.

5. Combining 2, 3, and 4-Family with SFRs: Considering that this sub-class of residential properties is mostly income-producing, smart modelers know it's imprudent to group and model them with single-family residences (SFRs). New modelers often make the mistake of combining them with SFRs because they belong to the same tax class in individual states or share the same mortgage category. Of course, the properties within this sub-group mostly transact on comparable sales and can be market-modeled, but as an independent group, mutually exclusive of SFRs. A Mother-and-Daughter home, a relatively common setup in big cities, is not technically a 2-family and may be modeled with the SFRs.

6. Synthetic Variables: Many modelers who come from a non-quantitative background become obsessed with better modeling stats, allowing irrational or unexplainable synthetic variables like (X * Y) ^ Z into the equation. Granted, such variables may enhance the model's favorable stats, but they reduce its explainability and decomposability, and therefore its overall utility! On the other hand, modelers with a sound quantitative background realize from the get-go that the use of such variables is nothing but sowing the seeds of inefficient modeling, knowing very well they could never explain the underlying market economics.

7. Untested Models: As explained in the previous chapter, it's always a prudent practice to test the draft model on a mutually exclusive hold-out sample before applying it to the population. The hold-out sample test must produce very similar results, both before and after outlier removal, as the draft model, ensuring that the real model (not an interim version) is being applied. Caution: Since the hold-out sample is part of the original sales sample and therefore comprises sale dates, it will NOT detect the NMSS issue (#1 above). Again, this is a critical step, the absence of which is tantamount to adventuresome modeling and could prove costly in the end.

8. Sales Complex: No part of the sales complex, directly or indirectly (e.g., ASP/SF, etc.), may be used as an independent variable in a regression equation, as that violates a basic assumption of multiple regression analysis. NMSS is a classic case of such a violation of standard modeling practice. To reiterate, when any part of the sales complex is introduced on the independent side of the regression equation, the model cannot be applied to the unsold population, which lacks sales attributes. Of course, this is one of the first rule violations any qualified quality control reviewer would look for.

9. Lack of Value Optimization: Since the final quality of the vast majority of CAMA models is judged primarily on error statistics (usually the final model's Coefficient of Dispersion/COD, Price-Related Differential/PRD, etc.), there is hardly any practice of optimizing the final values from the regression models. In other words, those models are not subjected to a real test of the optimality of the solution; for example, the CAMA modeler can hardly answer whether their model COD of 8 is better than a model COD of 10. In some instances, the COD of 8 could be a post-optimal solution despite being the better error stat, while the COD of 10 could have been the optimal solution. Therefore, the final regression values need to be optimized via Linear/Non-linear Programming.
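
For readers unfamiliar with the two error statistics named above, here is a minimal sketch of how COD and PRD are conventionally computed from a ratio study (the three-sale example is made up for illustration):

```python
import numpy as np

def cod(avm_values, sale_prices):
    """Coefficient of Dispersion: average absolute deviation of the sales
    ratios from their median, expressed as a percentage of the median ratio."""
    ratios = np.asarray(avm_values, float) / np.asarray(sale_prices, float)
    med = np.median(ratios)
    return 100.0 * np.mean(np.abs(ratios - med)) / med

def prd(avm_values, sale_prices):
    """Price-Related Differential: mean ratio divided by the sale-price-
    weighted mean ratio; values well above 1 suggest regressivity."""
    avm = np.asarray(avm_values, float)
    sp = np.asarray(sale_prices, float)
    return np.mean(avm / sp) / (avm.sum() / sp.sum())

# Hypothetical ratio study: three model values vs. adjusted sale prices.
cod_val = cod([100_000, 110_000, 90_000], [100_000, 100_000, 100_000])
prd_val = prd([100_000, 110_000, 90_000], [100_000, 100_000, 100_000])
```

These statistics measure dispersion and vertical equity; as the passage argues, a low COD alone does not prove the solution is optimal.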

To conclude, modelers must get into the habit of producing industry-standard models, without getting carried away by the quality stats. Case in point: a model with a conforming COD of 12 could receive a much better quality control score than its counterpart with a COD of 9.

-Sid Som, MBA, MIM
President, Homequant, Inc.


How to Build a Better Automated Valuation Model (AVM)

Most residential AVMs are market models. Therefore, a representative sales sample is used to develop the model, which is then applied to the unsold population to generate model values for all unsold properties (assuming non-missing data, etc.). While it sounds like a simple process, it does require significant econometric knowledge to develop such a model. A smart modeler figures out how to take the adventure out of the modeling process and standardize it, thus stabilizing values from year to year. For example, once the process is standardized, leading to efficient values, smart modelers continue to introduce new sales but not new data variables, and by doing so, they preserve the two most essential features of valuation modeling – explainability and decomposability. Therefore, to standardize the modeling process and bring stability to values, the following sequential steps are needed:

1. Sales Sample: Assuming it's an "Application" AVM (meaning the model will be applied to the population the sample is derived from), the modeler starts with a representative sales sample. While testing the sampling properties, all three variable categories – continuous (Bldg SF, Lot SF, Age, etc.), categorical (Bldg Style, Exterior Wall, Grade, Condition, etc.), and fixed location (Town, School District, Assessing District, etc.) – are meaningfully evaluated. A good representative sample is a must, without which the model would be inaccurate, producing faulty values. That is why smart modelers consider this a make-or-break step.

2. Time Adjustment: Depending on sales liquidity, a market AVM requires 12 to 24 months' worth of arms-length sales. Since price levels go up and down, monthly time adjustments are needed. When the time series is extended (24+ months), quarterly adjustments are more useful as they are smoother and more market-efficient (reducing inconsistencies arising from using closing dates rather than contract dates). Unlike sales analysis, time in an AVM is a surface correction, so it's better done at the "outer" level; for instance, while modeling a county, it's better to keep time adjustments at the county level, without drilling down to the sub-markets. Also, price-related (FHA, Jumbo, etc.) corrections must be avoided.
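
A county-level ("outer level") monthly adjustment can be as simple as indexing each month's median price to the valuation month. The tiny dataset below is hypothetical, purely to show the mechanics:

```python
import numpy as np

# Hypothetical county-wide sales: months since valuation date, and prices.
months = np.array([0, 0, 1, 1, 2, 2])
prices = np.array([330_000, 326_000, 315_000, 317_000, 300_000, 304_000], float)

# Monthly factor: ratio of the valuation-month median to each month's median.
base = np.median(prices[months == 0])
factors = {m: base / np.median(prices[months == m]) for m in np.unique(months)}

# One county-wide adjustment per month -- no sub-market drill-down.
adjusted = prices * np.array([factors[m] for m in months])
```

After adjustment, each month's median sits at the valuation-month level, so the time effect is removed before the attribute model is fit.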

3. Hold-out Sample: Once the sales are time-adjusted, the sales sample must be split between the modeling sample (e.g., 80%) and the hold-out sample (20%). Both sub-samples – modeling and hold-out – must have attributes very similar to the original sales sample representing the population. It's good to use one of the software-provided sampling procedures, which help reduce judgment. The model is developed with the modeling sample and then tested on the hold-out. While the results will not be exact, they must be similar (very close).
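
The 80/20 split can be sketched with a seeded random permutation; the sample size and seed below are arbitrary assumptions:

```python
import numpy as np

# Hypothetical time-adjusted sales sample of 1,000 records.
n_sales = 1_000
rng = np.random.default_rng(42)          # fixed seed keeps the split auditable
shuffled = rng.permutation(n_sales)

cut = int(0.8 * n_sales)                 # 80% modeling, 20% hold-out
model_idx, holdout_idx = shuffled[:cut], shuffled[cut:]

# Both sub-samples should mirror the original sample's attributes;
# comparing summary stats of key variables (Bldg SF, Age, etc.) across
# the two index sets is a quick first check.
```

A software-provided stratified sampler, where available, would serve the same purpose with less judgment.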

4. Multi-stage Regression: Since the sample comprises three different types of variables, it's prudent to develop a three-stage regression model, piggy-backing on the output from the prior stage. Considering that the contributions of each variable are generally non-linear, a log-linear model is more effective. If the dataset is comprehensive, with many categorical variables, a correlation matrix is needed to detect multi-collinearity (i.e., whether specific variables are highly correlated), which often leads to a reduced number of variables. Conversely, if the number of variables is limited, the t-stat is generally an excellent metric to control the variables' significance levels. If a variable's t-stat is less than 2, it is typically non-contributing.
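
The t-stat screen can be illustrated with a plain OLS fit on synthetic log-linear data. Everything below (the data-generating process, the variable names) is an assumption made for the sketch, not the author's specification:

```python
import numpy as np

def ols_with_tstats(X, y):
    """Plain OLS fit returning coefficients and their t-statistics."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)                 # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se

# Synthetic log-linear setup: log price driven by x1 only; x2 is noise.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)                  # e.g., a centered log(Bldg SF) proxy
x2 = rng.normal(size=n)                  # an irrelevant candidate variable
log_price = 12.0 + 0.6 * x1 + rng.normal(scale=0.2, size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta, tstats = ols_with_tstats(X, log_price)

# Screen: variables with |t| < 2 are typically non-contributing.
keep = np.abs(tstats) >= 2.0
```

The genuine driver survives the |t| >= 2 screen with a large margin, while the irrelevant variable's coefficient stays near zero.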

5. Multi-cycle Regression: To make the model efficient, it is good practice to develop it in three cycles. The sales ratios (1st-cycle AVM values to adjusted sale prices) from the first cycle help define and remove outliers. The outlier-free sample then feeds cycle two, which generates pre-residual AVM values and ratios. The end of cycle two enables residual corrections, which is a very time-consuming and iterative process. Once all of the residual revisions are finalized, the third and final cycle is run, which generates the model and produces the final values. Since a smart modeler knows how to systematically and methodically develop a model, it's generally far more efficient than those created ad hoc or in one single step.
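
The cycle-1 outlier screen amounts to banding the sales ratios. The [0.7, 1.4] band below is a hypothetical stand-in for whatever range the modeler adopts:

```python
import numpy as np

def cycle_one_screen(avm_values, adj_sale_prices, lo=0.7, hi=1.4):
    """Flag sales whose cycle-1 ratio falls inside the acceptable band;
    the [0.7, 1.4] defaults are illustrative, not prescriptive."""
    ratios = np.asarray(avm_values, float) / np.asarray(adj_sale_prices, float)
    return (ratios >= lo) & (ratios <= hi)

# Hypothetical cycle-1 output: the 1.5 and 0.6 ratios are outliers.
keep = cycle_one_screen([70_000, 100_000, 150_000, 60_000],
                        [100_000, 100_000, 100_000, 100_000])
```

Only the surviving sales feed cycle two, which is what makes the three-cycle sequence more disciplined than a single ad hoc fit.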

6. Residual Analysis: At the end of the second cycle, the residuals need to be worked on. The fact that some sales ratios cluster around 70 while others cluster around 140 does not mean the AVM values are wrong or inefficient, because the AVM values are being compared with sales, which, individually, are all judgment calls. For instance, a prospective homebuyer bent on purchasing a pink house would in all likelihood overpay, while an SFR rental company or an aggressive flipper or investor might buy a group of properties, perhaps paying well below market (although some of those properties would be coded as arms-length), etc. In other words, the model is essentially fixing those anomalies. Nonetheless, as indicated before, residual analysis and correction is an arduous but necessary optimization task.

7. Independent Validation: Once the draft model is ready, it's good to have a comparables-based sample (where the sales ratios are either below 80 or above 120) worked up by an experienced appraiser or a self-directed comps-based valuation site. In working up the sample, the adjustment matrix needs to be set up in line with the model coefficients; for example, if a coastal town or county is being modeled, the living-area size adjustment factors would be significantly higher than their Midwest counterparts. Similarly, the time and valuation dates must adequately follow the model as well. If the model shows a 12% annual appreciation in the area and the values are being forecast for a future date, the sample must be set up accordingly; otherwise, the AVM and the validation sample would be apples to oranges. Ideally, the model values should be within 10-15% of the comps'.

8. Hold-out Sample Testing: Once the draft model is ready, it needs to be tested on the hold-out sample kept aside at the beginning of the modeling process. Once the model is applied to the hold-out and the first set of hold-out sales ratios is generated, the outliers need to be removed using the same ratio range as in the modeling sample. The hold-out application results (hold-out sales ratio stats – percentile distribution, COD/COV, PRD, etc.) must be very similar to those of the primary model. If they are at variance, the modeler must start investigating, beginning with an immediate check that the final version from the 3rd cycle is being applied; newcomers are often confused by the various cycles and mistakenly use a model from an interim cycle.
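
Comparing the two ratio distributions can be done with a small summary helper. The two ratio vectors below are made up; in practice they would come from the final-cycle model and the hold-out application:

```python
import numpy as np

def ratio_summary(ratios):
    """Percentile distribution plus COD for a vector of sales ratios."""
    med = np.median(ratios)
    return {
        "p25": float(np.percentile(ratios, 25)),
        "median": float(med),
        "p75": float(np.percentile(ratios, 75)),
        "cod": float(100.0 * np.mean(np.abs(ratios - med)) / med),
    }

# Hypothetical final-cycle vs. hold-out ratios: summaries should be close.
model_stats = ratio_summary(np.array([0.95, 1.00, 1.02, 1.05, 0.98]))
holdout_stats = ratio_summary(np.array([0.94, 1.01, 1.03, 1.04, 0.99]))
drift = abs(model_stats["cod"] - holdout_stats["cod"])
```

A large drift between the two summaries is the signal to check whether an interim-cycle model was applied by mistake.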

9. Applying to the Population: One must remember that the whole AVM exercise is to develop a model from the sold population (on average, 5% of homes sell annually) to value the mutually exclusive 95% unsold population. Of course, when the model is applied, it is used on the universe (meaning both sold and unsold properties). Here is why: since the sold population is a subset of the universe, those model values will be regenerated along with the unsold ones, thereby forming a reasonable basis for a successful test application. In other words, as soon as the model is applied and values are generated, a small matching extract from the modeling sample should be compared; if the values match, subject to some rounding differences, the application is in the right direction.


The need for AVMs is growing by leaps and bounds. Assessors, banks, mortgage companies, mortgage servicers, REITs, hedge funds, SFR rental companies, tax appeal houses, law firms, etc., are all big users of certified AVM values.
-Sid Som, MBA, MIM
President, Homequant, Inc.