Thursday, November 7, 2019

9 Issues that make an AVM Inefficient, often Ineffective

The difference between a smart modeler and an average modeler is that the former quickly identifies and intelligently avoids the (valuation) modeling practices that tend to make the process inefficient, if not ineffective. That is why establishing a quality control or modeling review process is essential: it avoids significant damage control down the line. A handful of poor practices tend to raise serious concerns for quality control reviewers. Here are some of them.

1. Time Adjustment: Some practitioners use the number of months since sale (NMSS) as an independent variable to ascertain the rate of growth (+/-) in the targeted price level, letting the time coefficient adjust the modeling sale prices, which then serve as the dependent variable in the regression equation. While NMSS is acceptable in a lead-up regression that generates the time-adjusted dependent variable, it is unacceptable as an independent variable in the main regression equation. The reason is simple: NMSS is missing from the unsold population the model will eventually be applied to, so the application would fail.
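The lead-up use of NMSS can be sketched as below. The sale data, the simple linear trend, and the variable names are illustrative assumptions, not the author's actual specification; the point is only that NMSS is consumed before the main model and never enters it as a predictor.

```python
from statistics import mean

# Hypothetical (NMSS, sale_price) pairs; data are illustrative only.
sales = [(1, 310_000), (3, 300_000), (6, 285_000), (9, 275_000), (12, 262_000)]

# Lead-up regression: fit price = a + b * NMSS to estimate the time trend.
x = [nmss for nmss, _ in sales]
y = [price for _, price in sales]
xb, yb = mean(x), mean(y)
b = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / sum((xi - xb) ** 2 for xi in x)
a = yb - b * xb

# Use the time coefficient b to restate each sale in current (NMSS = 0) terms;
# the adjusted prices become the dependent variable of the main model.
adjusted = [price - b * nmss for nmss, price in sales]

# NMSS is consumed here; it must NOT reappear as a predictor in the main
# equation, since the unsold population has no sale dates.
```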

2. Sales GIS: Testing the representativeness of the Sales GIS is not an easy proposition, which leads many practitioners to skip this sampling test. The Sales GIS is often a function of market dynamics and deviates from the Population GIS, so an untested Sales GIS paves the way for an inefficient AVM. That is why many practitioners use "fixed neighborhoods" in the modeling process: they are more stable and well accepted, meaning they do not succumb to short-term market swings and are generally liquid enough to help test the representativeness of the modeling sample.
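One way to test whether the sales sample tracks the population across fixed neighborhoods is a chi-square goodness-of-fit check, sketched below. The neighborhood labels and counts are hypothetical, and this is only one of several reasonable representativeness tests.

```python
# Hypothetical parcel counts by fixed neighborhood (illustrative data).
population = {"A": 4000, "B": 3000, "C": 2000, "D": 1000}
sample = {"A": 110, "B": 70, "C": 55, "D": 25}

n_pop = sum(population.values())
n_smp = sum(sample.values())

# Chi-square goodness-of-fit: expected sample counts follow population shares.
chi2 = 0.0
for hood, pop_count in population.items():
    expected = n_smp * pop_count / n_pop
    chi2 += (sample[hood] - expected) ** 2 / expected

# With 4 neighborhoods (3 degrees of freedom), chi2 above ~7.81 (5% level)
# flags a non-representative sample that may need reweighting or re-drawing.
representative = chi2 < 7.81
```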

3. Chasing Trophies: An AVM is not meant for the entire population. Aiming for, at most, a two-sigma solution (roughly 95%) is more meaningful than chasing the whole population, leaving out the utterly unattainable 5%, including the limited number of trophy properties and large mansions. The reason is apparent: the paucity of such sales. Smart modelers therefore remove them from the modeling spectrum altogether, i.e., they neither let those sales enter the modeling sample nor apply the model to that subset.
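A minimal sketch of the two-sigma screen, assuming a simple price-based trim (the prices, including the trophy outlier, are made up; in practice the screen may run on ratios or per-square-foot prices instead):

```python
from statistics import mean, stdev

# Hypothetical sale prices including one trophy property (illustrative data).
prices = [250_000, 270_000, 300_000, 310_000, 280_000, 260_000, 5_000_000]

mu, sigma = mean(prices), stdev(prices)

# Keep only sales within two sigma of the mean; trophy outliers never
# enter the modeling sample, and the model is never applied to them.
modeling_sample = [p for p in prices if abs(p - mu) <= 2 * sigma]
```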

4. Chasing Tiny Bungalows: This is the flip side of dealing with trophy properties. There are tiny waterfront bungalows (many of them on prime oceanfront, e.g., Point Lookout on the Atlantic shore of Long Island, NY) all around the country, but this subset primarily represents land value. Once sold, they generally take a completely different form, often as multifamily properties after re-zoning. Therefore, as long as the existing improvement is sound, they must be hand-worked by appraisers. Letting them into the modeling process is a red flag for quality control reviewers.

5. Combining 2, 3, and 4-Family with SFRs: Since this sub-class of residential properties is mostly income-producing, smart modelers know it is imprudent to group and model them with single-family residences (SFRs). New modelers often make the mistake of combining them with SFRs because they fall in the same tax class in individual states or share the same mortgage category. Of course, properties within this sub-group are mostly transacted using comparable sales and can be market-modeled, but as an independent group, mutually exclusive of SFRs. A Mother-and-Daughter, a relatively common setup in big cities, is not a technical 2-family to be modeled with the SFRs.

6. Synthetic Variables: Many modelers who come from non-quantitative backgrounds become obsessed with better modeling stats, allowing irrational or unexplainable synthetic variables like (X * Y) ^ Z into the equation. Granted, such variables may enhance the model's headline stats, but they reduce its explainability and decomposability, and therefore its overall utility! Modelers with a sound quantitative background, on the other hand, realize from the get-go that such variables sow the seeds of inefficient modeling, knowing full well they could never explain the underlying market economics.

7. Untested Models: As explained in the previous chapter, it is always prudent to test the draft model on a mutually exclusive hold-out sample before applying it to the population. The hold-out test must produce results very similar to the draft model's, both before and after outlier removal, ensuring that the real model (not an interim version) is being applied. Caution: since the hold-out sample is part of the original sales sample and therefore contains sale dates, it will NOT detect the NMSS issue (#1 above). Again, this is a critical step; skipping it is tantamount to adventuresome modeling and could be costly in the end.
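The hold-out check can be sketched as follows. The synthetic square-footage data, the one-variable linear model, and the COD comparison threshold are all illustrative assumptions; the point is only that the model is fit on the training sample alone and that its error stat on the hold-out should land close to the training figure.

```python
import random
from statistics import mean, median

# Hypothetical (sqft, price) sales; data and model form are illustrative.
random.seed(7)
sales = [(sf, 150 * sf + random.gauss(0, 8000)) for sf in range(900, 2900, 20)]

random.shuffle(sales)
cut = int(0.8 * len(sales))
train, holdout = sales[:cut], sales[cut:]  # mutually exclusive hold-out

# Fit price = a + b * sqft on the training sample only.
x = [sf for sf, _ in train]
y = [price for _, price in train]
xb, yb = mean(x), mean(y)
b = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / sum((xi - xb) ** 2 for xi in x)
a = yb - b * xb

def cod(data):
    """Coefficient of dispersion of predicted/actual ratios, in percent."""
    ratios = [(a + b * sf) / price for sf, price in data]
    med = median(ratios)
    return 100 * mean(abs(r - med) for r in ratios) / med

# A hold-out COD far from the training COD signals an over-fit or
# mis-applied draft model.
gap = abs(cod(train) - cod(holdout))
```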

8. Sales Complex: No part of the sales complex, direct or derived (e.g., ASP/SF), may be used as an independent variable in a regression equation, as doing so violates the basic assumptions of multiple regression analysis. NMSS is a classic case of such a violation of standard modeling practice. To reiterate: when any part of the sales complex is introduced on the independent side of the regression equation, the model cannot be applied to the unsold population, which lacks those sales attributes. Of course, this is one of the first rule violations any qualified quality control reviewer looks for.

9. Lack of Value Optimization: Since the final quality of the vast majority of CAMA models is judged primarily on error stats (usually the last model's Coefficient of Dispersion (COD), Price-Related Differential (PRD), etc.), there is hardly any practice of optimizing the final values coming out of the regression models. In other words, those models are not subjected to a real test of optimality; for example, a CAMA modeler can hardly answer whether a model COD of 8 is better than a model COD of 10. In individual cases, the COD of 8 could be a post-optimal solution despite being the better error stat, while the COD of 10 could have been the optimal solution. Therefore, the final regression values need to be optimized via Linear/Non-linear Programming.
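For reference, the two error stats named above can be computed as below. The assessed values and sale prices are hypothetical; the formulas follow the standard ratio-study definitions (COD as average absolute deviation from the median ratio, PRD as the mean ratio over the sale-weighted mean ratio).

```python
from statistics import mean, median

# Hypothetical model values and sale prices (illustrative data).
assessed = [190_000, 210_000, 285_000, 310_000, 140_000]
prices = [200_000, 220_000, 300_000, 320_000, 150_000]

ratios = [a / p for a, p in zip(assessed, prices)]
med = median(ratios)

# COD: average absolute deviation from the median ratio, as a percent.
cod = 100 * mean(abs(r - med) for r in ratios) / med

# PRD: mean ratio over the sale-weighted mean ratio; values well above 1
# indicate regressivity (low-value properties over-valued relative to high).
prd = mean(ratios) / (sum(assessed) / sum(prices))
```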

To conclude, modelers must get into the habit of producing industry-standard models without getting carried away by the quality stats. Case in point: a model with a conforming COD of 12 could receive a much better quality control score than its counterpart with a COD of 9.

-Sid Som, MBA, MIM
President, Homequant, Inc.

