

Exploring a Pricing Model for Urban Rental Houses from a Geographical Perspective

Hang Shen, Lin Li *, Haihong Zhu, Yu Liu and Zhenwei Luo

1 School of Resource and Environment Sciences, Wuhan University, Wuhan 430079, China; shenhang@whu.edu.cn (H.S.); hhzhu@whu.edu.cn (H.Z.); liuyu0201@whu.edu.cn (Y.L.); luowei1993@whu.edu.cn (Z.L.)
2 Institute of Smart Perception and Intelligent Computing, SRES, Wuhan University, 129 Luoyu Road, Wuhan 430079, China
3 Institute of Environment and Development, Guangdong Academy of Social Sciences, Guangzhou 510635, China
* Correspondence: lilin@whu.edu.cn; Tel.: +86-138-7150-4963

Abstract: Models for estimating urban rental house prices in the real estate market continue to pose a challenging problem due to the insufficiency of algorithms and comprehensive perspectives. Existing rental house price models based on either geographically weighted regression (GWR) or deep-learning methods struggle to predict prices satisfactorily, since rental house prices involve both complicated nonlinear characteristics and spatial heterogeneity. The linear-based GWR model cannot characterize the nonlinear complexity of rental house prices, while existing deep-learning methods cannot explicitly model the spatial heterogeneity. This paper proposes a fully connected neural network–geographically weighted regression (FCNN–GWR) model that combines deep learning with GWR and can handle both of the problems above. In addition, when calculating the geographical location of a house, we propose a set of locational and neighborhood variables based on the quantities of nearby points of interest (POIs). Compared with traditional locational and neighborhood variables, the proposed “quantity-based” locational and neighborhood variables can cover more geographic objects and reflect the locational characteristics of a house from a comprehensive geographical perspective. Taking four major Chinese cities (Wuhan, Nanjing, Beijing, and Xi’an) as study areas, we compare the proposed method with other commonly used methods, and this paper presents a more precise estimation model for rental house prices. The method proposed in this paper may serve as a useful reference for individuals and enterprises in their transactions relevant to rental houses, and for the government in terms of the policies and positions of public rental housing.

Keywords: house rental price; geographically weighted regression; spatial heterogeneity; deep learning

Citation: Shen, H.; Li, L.; Zhu, H.; Liu, Y.; Luo, Z. Exploring a Pricing Model for Urban Rental Houses from a Geographical Perspective. Land 2022, 11, 4. https://doi.org/10.3390/land11010004

Academic Editor: Monika Kuffer

Received: 23 November 2021. Accepted: 16 December 2021. Published: 21 December 2021.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Prices in the real estate market may be one of the most important issues that people are concerned with. We usually consult real estate websites or agents to find a reference for the price of a house before conducting the final transaction of buying or renting it. In addition, real estate valuation may indicate the economic situation or urban vibrancy of related regions [1]. Businesses are inclined to invest in a location by referring to an assessment of the relevant real estate market, and renters usually need to evaluate the cost of living and expenditures based on the rental house prices in a certain place to determine the positions of their jobs and lives. Therefore, real estate valuation is inextricably linked with people’s lives and the economy, and as a result, estimating housing and rental housing prices is presently a popular issue.

Housing price estimation may serve as a benchmark for buyers/tenants and sellers/lessors. By estimating the selling or rental price of a property, purchasers and tenants may assess whether the transaction is reasonable, and sellers and lessors can calculate the price of a house in a certain location and condition. Financial applications that require a reliable system for mortgage or lease calculations also demand the estimation of housing prices [2].

To obtain a dependable and accurate estimation model, it is important for the algorithm to handle the features and relationships in housing prices. The formation of housing prices is dependent on many factors, and the relationships among them are nonlinear and complex, with spatial heterogeneity [3,4]. Recent studies have concentrated on the characteristics of spatial heterogeneity and nonlinear relationships. In recent years, the geographically weighted regression (GWR) model and deep-learning models have usually been adopted for estimating housing prices [5,6]. The GWR model incorporates the influence of spatial heterogeneity on housing prices, which means that the model can take into account the impact of characteristics of the surrounding houses. However, as a linear-based model, GWR cannot represent the nonlinear and complex relationships among the housing prices and their factors. A clear disadvantage of the GWR model has been observed in terms of out-of-sample forecasts [7].
In the era of big data and machine intelligence, deep-learning methods have been more frequently utilized in research and engineering problems, due to their superior fitting abilities and powerful generalization performance. House selling and rental prices can also be modeled effectively by deep learning, which allows prices in the housing market to be assessed automatically with higher accuracy and reliability [8,9]. The currently adopted deep-learning models for housing prices include multilayer perceptron regressors [10–12], convolutional neural networks (CNN) [2,9,13,14], and their variants. These methods can capture the nonlinear and complex relationships but do not explicitly consider the spatial heterogeneity of the houses in an area. In general, if only GWR is applied, the nonlinear relationship will not be represented in the model, while if only a nonlinear model is adopted, the spatial heterogeneity will not be considered. Both of these issues may lead to a loss of precision in the housing price model.

In terms of rental housing, many people currently have to rent a house before purchasing their own living space, and rental housing has become a significant component of many people’s lives [15–17]. To date, many models have been used to simulate rental house prices [7,10,18,19]; these models are often applied as supplements to research on housing (selling) prices, and usually present relatively lower price precision. In China, some public rental housing provided by the government does not sell well or is poorly received because of unreasonable, distorted rental prices [20]. The problem is that the spatial heterogeneity or nonlinear relationship existing in rental house prices is absent in the models. Moreover, there is enormous complexity in the formation of house selling prices, in that housing sales happen not only through the movement of use value but also through fluctuations in transaction values [21,22].
The equilibrium rate of the utilization of the housing stock by renters is higher than that by buyers [21], and the demand for houses to rent may be more sensitive to geographical factors than the demand for houses to purchase [22]. Rental house prices might be less affected by market fluctuations and more closely affected by people’s consumption demands and abilities. Therefore, it is meaningful and necessary to explore and find a reliable pricing model with higher accuracy.

In addition, to express the geographical location of a house, a series of locational and neighborhood variables [23–25], such as the distance to a bus station, distance to a school, and distance to a park, are often used in housing price models. These are based on distance and consider only the nearest geographic object of a house (the nearest bus station, school, park, etc.), which may lose the locational information generated by other neighboring or nearby objects. As a result, housing price models may have limited accuracy.

In this paper we make efforts to improve the accuracy of the rental house-pricing model, and the model that we propose achieves higher precision and may be more practical (e.g., adjusted R² = 0.9192, Pearson R = 0.9534 in Wuhan). A fully connected neural network–geographically weighted regression (FCNN–GWR) model that combines deep learning and the GWR model is presented. The proposed model is based on an FCNN, which is a basic type of neural network, and it incorporates the parameters and principles of GWR. It characterizes the nonlinear complexity of rental house prices and considers the influence of neighboring rental houses. Numerous rental cases available from real estate websites and points of interest (POIs) provide considerable samples for model training.
In addition, to express the locational characteristics of a house, we present a series of locational and neighborhood variables based on the quantities of surrounding POIs. These quantity-based variables can better reflect the comprehensive locational characteristics of a house than the traditionally used locational and neighborhood variables, and they can help improve the accuracy of the pricing model. To evaluate the accuracy of the proposed method, several principal models of rental house prices, such as the hedonic price model (HPM) and GWR, are compared in the study.

This paper is organized as follows: Section 2 reviews the relevant research on housing price and rental housing price models. Section 3 introduces the study area and the data used in the research. Section 4 introduces the models and methods adopted in the research. Section 5 analyzes the various methods and experiments and compares their results. Finally, the model with best fitting and strong predictive ability for rental house prices is obtained, and Section 6 presents the conclusions and future work.

2. Related Works

Methods of housing price modeling include the HPM [23], the spatial lag model (SLM) [26], the spatial error model (SEM) [27], the generalized additive model (GAM) [28], GWR [5], deep-learning models, and their related methods. These methods of housing price modeling have been used in many cases and have been proven to be effective. However, they need to be improved to estimate housing selling and rental prices.

2.1. HPM and Spatial-Based Housing Price Models

The HPM [23], which is a basic method for explaining housing prices, has been widely used since its introduction [24] and is the basis of other housing price models.
The HPM relies on the assumption that housing price can be divided into several factors, including structural variables (the characteristics of the building itself), locational variables (such as the distance to the central business district (CBD)) and neighborhood variables (such as the distance to a nearby park). The HPM method typically uses multiple linear regression (MLR) to fit the relationships among housing prices and their factors. A number of studies have explored housing prices based on the HPM [6,19,24,25,29], proving it to be an effective approach, but the fitting accuracy of MLR is not high. To improve the accuracy of the simple linear model, different methods have been applied to housing price modeling by considering the spatial differences. The SLM [26] and SEM [27] methods focus on the spatial autocorrelation in housing prices. The SLM takes into account the impact of the dependent variable, while the SEM assumes that the spatial-autocorrelation issue can be handled by considering the spatial dependencies in the errors. The accuracies of the SLM and SEM methods are higher than that of the HPM, confirming the existence of spatial variances in housing prices. However, the improvement is not very remarkable since the models are still linear. For example, an SLM and SEM were used by Won [18] to model rental house prices in Seoul, South Korea, and the results were not very accurate. It is known that housing prices are complex and contain spatial heterogeneity [3,4]. To explore and obtain a price model, the GWR model proposed by Fotheringham [5] was applied to analyze housing prices. GWR is based on local smoothing, which can explain the spatial heterogeneity. In recent years, GWR has become the commonly used approach in housing price studies [4,6,30,31]. However, GWR cannot capture the complicated nonlinear characteristics in the housing prices because of its linear form. 
In recent research [7], the HPM and GWR models were employed to estimate the price of 570,000 rental flats. The results suggest that the HPM alternately performs better in out-of-sample forecasts than GWR, which is evidence for the disadvantage of GWR in accuracy and robustness of price forecasts.

In summary, it is generally difficult for linear models to achieve satisfactory accuracy for housing price estimation, although their forms usually have good explanatory abilities for the price and spatial factors.

2.2. Nonlinear and Complex Housing Price Models

Housing prices are complex and have nonlinear relationships, and nonlinear methods can clearly improve the precision of housing price models [2,32,33]. The GAM is a nonlinear model [28] first adopted for housing prices. However, it is essentially an extension of the linear model, and the R² improvement in the price evaluation in the GAM-based studies is usually less than 5% compared with MLR [3,7], revealing its limitations in improving the performance of estimation results.

Over the years, machine-learning approaches have been adopted for the housing price problem with the hedonic model. Yoo [34] first applied machine learning to the hedonic model and showed that random forests may be practical for selecting the important variables for the hedonic model and enhancing the performance. Hu [35] monitored rental house prices with social media data, revealed the determinants and relative importance of rental house prices based on machine-learning approaches, and demonstrated the ability to integrate machine learning with the hedonic model to map spatial patterns.
Rico-Juan [36] found that ordinary least squares hedonic regression, quantile hedonic regression, and machine learning each have their own strengths in explaining housing prices, and that the analysis of the Shapley values [37] based on random-forest machine learning is profound since it can identify the nonlinear and synergistic relationships from a three-dimensional perspective. These machine-learning approaches clearly have better accuracies than linear housing price models, and they also have a certain explanatory ability for the dependent variable. Nonetheless, the performance of machine-learning methods to predict or estimate housing prices can still be improved.

In the era of big data and machine intelligence, deep-learning methods have been more frequently utilized in research and engineering problems for their powerful fitting and automation abilities. For housing selling and rental prices, deep-learning evaluation methods tend to be provided to automatically and intelligently assess the housing market values with higher accuracy and reliability [8,9]. Bency [10], Yao [13], Yu [14], and Wang P. [2] used CNNs to model housing prices with remote sensing images and general housing price factors. The CNN model of Yu [14] treats the housing price variables as an image and can extract the complexity of the relationships among the variables. However, it is questionable whether the arrangement of the variables is dependable and whether the pooling layers are necessary. Wang J. [32] used neural networks based on synaptic memristors to predict housing prices. Some researchers have used street view images [1,9,38] or indoor pictures [39] to help improve deep learning for housing price models. The multisource data-fusion and attention mechanism utilized by Bin [9] has performed efficiently in property value assessment.
The above studies have achieved improved accuracy for housing price prediction since nonlinear and complex characteristics can be extracted from the deep-learning models. However, when using these methods, the spatial heterogeneity, which is a significant and nonnegligible factor in housing prices, is still absent.

Deng [31] combined the GWR approach with the extreme learning machine (ELM) and generated a “geographically weighted ELM (GWELM)”. It has been proven to be effective in revealing both spatial heterogeneity and nonlinear aspects, but its accuracy may be unstable or degraded in several cases. This model has not been applied in housing-price modeling, but it can be inferred that the combination of deep learning and GWR may yield satisfactory results. The geographically weighted artificial neural network (GWANN) developed by Hagenauer and Helbich [40] can combine the nonlinearity and spatial heterogeneity in housing prices. Unfortunately, their study did not consider the detailed locational and neighborhood variables or compare their method with other deep-learning models. A specific method of how to incorporate both nonlinearity and spatial heterogeneity into the estimation of housing (selling and rental) prices, and the effect of the approach, still needs to be explored.

Housing selling prices are complex because they are influenced by both the movement of use value and fluctuations in transaction values, while rental house prices may be more closely related to people’s consumption demands, abilities, and preferences for locations [21,22]. In general, rental house-pricing models usually follow those used to evaluate selling house prices, while their accuracy is usually lower. Examples include Cajias [7], Bency [10], and Liebelt [19].
The issue of rental house prices has also been substantially discussed in socioeconomics in various dimensions, such as the market [33], population movement [41], personal and communal situations [17] or comprehensive social economic factors [42]. Such research generally presents macro and statistical views of rental house prices, without considering the spatial heterogeneity of the samples from a geographical perspective. Rental house pricing based on evaluations of fluctuating accuracy may misguide renters and lessors in their transactions of rental houses, as well as the government’s policymaking and administration of public rental housing. In China, for example, public rental residences do not sell well and are not well received in some areas [20]. It has been revealed that the lack of information and the transaction process are important factors for the emotion of regret in people’s rental house transitions [43]. A reliable estimation of the rental house prices may provide people with more dependable information and more convenient transactions. Therefore, it is both worthwhile and necessary to explore and develop a more effective and dependable rental house-pricing model.

It is clear from current studies that to improve the accuracy of the rental house-pricing model, the proposed model should be characterized by both nonlinearity and spatial heterogeneity. Such characterization is the main target of this study. In addition, to improve the accuracy of the model, we present new kinds of locational and neighborhood variables that can cover more geographic objects and reflect the locational characteristics of a house from a multiscale and comprehensive geographical perspective.

3. Data

3.1. Study Areas

The study areas include some major cities in China. Considering data availability, the rental houses in four cities are included in this study: Wuhan, Nanjing, Beijing, and Xi’an (Figure 1).

Wuhan is a megacity in Central China.
It is an important industrial, science, and education base and a comprehensive transportation center. Wuhan (29°58′–31°22′ N, 113°41′–115°05′ E) has 13 municipal districts with a total area of 8569.15 km², and in 2020, it had a resident population of 12.448 million people.

Nanjing is an important megacity in Southeast China. It is a science and education base, and a comprehensive transportation center. Nanjing (31°14′–32°37′ N, 118°22′–119°14′ E) has 11 municipal districts with a total area of 6587.02 km², and in 2020, it had a resident population of 9.320 million people.

Beijing is the capital city of China and the largest city in Northern China. Beijing (39°24′–41°36′ N, 115°42′–117°24′ E) has 16 municipal districts with a total area of 16,410.54 km², and in 2020, it had a resident population of 21.890 million people.

Xi’an is a megacity in Northwest China. It is an important industrial, science, and education base of China. Xi’an (33°25′–34°27′ N, 107°24′–109°29′ E) has 13 municipal districts with a total area of 10,752 km², and in 2020, it had a resident population of 12.953 million people.

Figure 1. The four cities constituting the study areas: Wuhan, Nanjing, Beijing, and Xi’an.

The experiments involving these four different cities are intended to test the availability and generality of our method for different cities. These cities are located in different zones of China (Wuhan: Central China; Nanjing: Southeast China; Beijing: Northern China; Xi’an: Northwest China) and may represent the geographical locational and sociocultural diversity of China as a vast country [44]. There are rising numbers of floating populations in these cities, which means that they have considerable demands for rental housing and flourishing rental housing markets. Thus, we chose them as study areas.

3.2. POI Data

The POI data in this study were captured from the Baidu Map website, which is the largest map service provider in China. The POIs of Baidu Map are classified into 21 primary categories [45], as depicted in Table 1. Administrative landmarks and addresses are not considered in our study because they are map features rather than entities. Therefore, 19 primary categories and 134 secondary categories are taken into account. We collected the POIs of Baidu Map in February to March 2020, and finally, more than 1.7 million POI data were obtained for the four cities.

Table 1. The POI data and categories of Baidu Map.

| Primary Category | Secondary Category |
|---|---|
| Food | Chinese restaurants, foreign restaurants, snack shops, cake and dessert shops, coffee shops, tea shops, bars, etc. |
| Hotel | Star hotels, fast hotels, apartment hotels, etc. |
| Shopping | Shopping malls, supermarkets, convenience stores, household building materials, digital appliances, shops, markets, etc. |
| Life and services | Communication business halls, post offices, logistics companies, ticket offices, laundromats, photo shops, real estate intermediaries, public utilities, maintenance points, housekeeping services, funeral services, lottery sales points, pet services, newspaper booths, public toilets, etc. |
| Beauty | Beauty, hairdressing, manicure, body beautification, etc. |
| Scenic spot | Parks, zoos, botanical gardens, amusement parks, museums, aquariums, beach baths, cultural relics, churches, scenic spots, etc. |
| Recreation and entertainment | Holiday villages, farmhouses, cinemas, KTV, theatres, song and dance halls, internet cafes, playgrounds, bath massages, leisure squares, etc. |
| Sports fitness | Stadiums, extreme sports venues, fitness centers, etc. |
| Education and training | Colleges and universities, secondary schools, primary schools, kindergartens, adult education, parent–child education, special education schools, scientific research institutions, training institutions, libraries, science and technology museums, etc. |
| Cultural media | Press and publishing, radio and television, art groups, galleries, exhibitions, cultural palaces, etc. |
| Medical care | General hospitals, specialized hospitals, clinics, pharmacies, medical institutions, sanatoriums, emergency centers, CDC, etc. |
| Automobile service | Automobile sales, automobile maintenance, automobile detailing, automobile parts, car rental, automobile testing grounds, etc. |
| Transport facilities | Railway stations, long-distance bus stations, ports, parking lots, gas stations, service areas, toll stations, bridges, etc. |
| Finance | Banks, ATMs, credit cooperatives, investment and financing, pawnbrokers, etc. |
| Real estate | Office buildings, residential areas, dormitories, etc. |
| Company and business | Companies, parks, agriculture, forestry, horticulture, factories and mines, etc. |
| Government organization | Governments of all levels, administrative units, public prosecution and law institutions, foreign-related institutions, party groups, welfare institutions, political and educational institutions, etc. |
| Entrance and exit | Expressway exits, expressway entrances, airport exits, airport entrances, station exits, station entrances, parking lot entrances, bicycle high-speed exits, bicycle high-speed entrances, etc. |
| Natural feature | Islands, mountains, water systems, etc. |
| Administrative landmark | Provinces, provincial cities, prefecture-level cities, districts, counties, business districts, towns, villages, etc. |
| Address | Address points, etc. |

3.3. Rental House Data

The rental house data in the study are from the commonly used real estate website, Lianjia [46]. Recent studies have demonstrated that the data on this website are effective for housing price analysis [47,48]. All of the rental house samples were captured from Lianjia, and data collection occurred between February and July 2020. The attributes of the houses (the rental price, the area, the information of the community, etc.) can be obtained via data collected from the website.
Due to the data accessibility, only the whole rental houses were considered in this research. The data were screened for houses with civil electricity and water, and extreme values were excluded. To control the residential density of the houses, for each city, the samples with extremely high plot ratios (of the communities) in the top 5% were eliminated, since residential density is negatively related to rental prices and a dense environment is more likely to result in congestion and the invasion of privacy. Approximately 239,000 rental samples were obtained (Wuhan: 54,466, Nanjing: 32,076, Beijing: 109,683, Xi’an: 43,289). The rental house price is the dependent variable in this paper, and the rental house prices of the samples in this research range from 6.0 to 148.8 RMB/m²/month (calculated as the unit price). The descriptions of the rental prices and relevant attributes are shown in Table 2, and the statistics of these data are depicted in Table A1 of Appendix A.

Table 2. The description of the rental house price and relevant attributes.

| Type | Variable | Explanation and measurement method | Expected sign |
|---|---|---|---|
| Structural variables | Area | Square meters of the living area (m²) | - |
| | TotalFloor | The total number of floors of the building | Unknown |
| | Level | The level of the floor on which the unit is situated (1: “low level”, <1/3 of the total floor value of the building; 2: “middle level”, between 1/3 and 2/3 of the total floor value; 3: “high level”, >2/3 of the total floor value) | Unknown |
| | Age | The age of the building (the transaction month minus the construction month, divided by 12, so it is in years) | - |
| | Age-squared | The squared value of “Age” [49] | + |
| | Month | The (monthly) time trend variable. The earliest rental date in the dataset of this paper is February 2020, so a house rented then takes the value 1; a house rented in March 2020 takes the value 2, etc. | Unknown |
| | Spring | A dummy variable for whether the house is rented in the first quarter of the year (1: the transaction date is January, February, or March; 0: otherwise) | Unknown |
| | Summer | A dummy variable for whether the house is rented in the second quarter of the year (1: the transaction date is April, May, or June; 0: otherwise) | Unknown |
| | Autumn | A dummy variable for whether the house is rented in the third quarter of the year (1: the transaction date is July, August, or September; 0: otherwise) | Unknown |
| | Room | The number of bedrooms | + |
| | Hall | The number of halls | - |
| | Toilet | The number of toilets | Unknown |
| | South | Whether the room faces south (1: when the direction description of the house contains “south”; 0: otherwise) | + |
| | North | Whether the room faces north (1: when the direction description of the house contains “north”; 0: otherwise) | + |
| | East | Whether the room faces east (1: when the direction description of the house contains “east”; 0: otherwise) | + |
| | West | Whether the room faces west (1: when the direction description of the house contains “west”; 0: otherwise) | - |
| | PlotRatio | The gross floor area of the building divided by the area of the community | Unknown |
| | Green | The greening rate of the community | + |
| | ParkSpace | The number of parking spaces of the community | + |
| | Fee | The property management fee of the house (RMB/m²/month) | + |
| Locational variables | DCBD | The distance to the CBD (km) | - |
| | Dbus | The distance to the nearest bus station (km) | - |
| | Dsub | The distance to the nearest metro station (km) | - |
| | DshopCen | The distance to the nearest shopping center (km) | - |
| Neighborhood variables | Dpark | The distance to the nearest park (km) | - |
| | DpriSch | The distance to the nearest primary school (km) | - |
| | DsecSch | The distance to the nearest middle school (km) | - |
| | Dnurs | The distance to the nearest nursery school (km) | - |
| | Dhosp | The distance to the nearest hospital (km) | - |
| Dependent variable | Price | The rental price of the house (RMB/m²/month) | |

In this research, all of the geographic information was transformed into the Baidu metric coordinate system since the POI data of this research are collected from Baidu Map.

4. Method

4.1. Principal Housing Price Models

4.1.1. HPM

The HPM is a principal pricing model and is the basis of other housing price models [6]. It is introduced and considered as a control group in this study. The HPM assumes that a house or rental house can be considered a commodity and that its price is made up of many different factors. The factors of rental houses are usually considered to follow the factors of the selling houses [10,19], including 3 types: structural variables, locational variables, and neighborhood variables.
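As a rough illustration of how some of the structural variables in Table 2 could be derived from raw listing fields, the following Python sketch encodes Level, Age, Age-squared, Month, the seasonal dummies, and the orientation dummies. It is only a sketch under stated assumptions: the function names and input layout are hypothetical (they are not from the paper), "high level" is read as above 2/3 of the total floors, and the handling of the exact 1/3 and 2/3 boundaries is an assumption.

```python
# Hypothetical helpers illustrating the Table 2 encodings; names are not from the paper.

FIRST_MONTH = (2020, 2)  # earliest rental date in the dataset (February 2020)


def months_between(start, end):
    """Whole months from `start` to `end`, each given as (year, month)."""
    return (end[0] - start[0]) * 12 + (end[1] - start[1])


def level_code(floor, total_floors):
    """1 = low (< 1/3), 2 = middle (1/3 to 2/3), 3 = high (> 2/3).

    Which side the exact boundaries fall on is an assumption.
    Integer arithmetic avoids floating-point issues at the boundaries.
    """
    if 3 * floor < total_floors:
        return 1
    if 3 * floor <= 2 * total_floors:
        return 2
    return 3


def encode_structural(floor, total_floors, built, rented, orientation):
    """Return a dict of Table 2 style variables for one listing."""
    age = months_between(built, rented) / 12  # in years
    month = rented[1]
    features = {
        "TotalFloor": total_floors,
        "Level": level_code(floor, total_floors),
        "Age": age,
        "Age-squared": age ** 2,
        "Month": months_between(FIRST_MONTH, rented) + 1,
        "Spring": int(1 <= month <= 3),
        "Summer": int(4 <= month <= 6),
        "Autumn": int(7 <= month <= 9),
    }
    # Orientation dummies: 1 when the description contains the direction word.
    text = orientation.lower()
    for direction in ("South", "North", "East", "West"):
        features[direction] = int(direction.lower() in text)
    return features


example = encode_structural(floor=28, total_floors=30, built=(2010, 3),
                            rented=(2020, 3), orientation="South and North")
```

Note that the winter quarter has no dummy of its own, so it acts as the reference category for Spring, Summer, and Autumn.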
Based on previous studies [4,25,47] and our available data, the factors considered in this research are as follows: the structural variables include the area of the house (Area), the number of total floors (TotalFloor), floor level of the house (Level), number of rooms (Room), halls (Hall), toilets (Toilet), plot ratio of the community (PlotRatio), greening rate of the community (Green), number of parking spaces in the community (ParkSpace), the property management fee (Fee), the age of the building (Age) and its squared value (Age-squared) [49], the monthly trend variable (Month), the seasonality dummy variables (Spring, Summer and Autumn), and house orientation dummy variables (South, North, East, West); locational variables include the distance to the CBD (DCBD) and distances to the nearest subway station (Dsub), bus stop (Dbus), and shopping center (DshopCen); neighborhood variables include the distance to the nearest park (Dpark), hospital (Dhosp), middle school (DsecSch), primary school (DpriSch), and nursery (Dnurs). Since studies have shown that the road network distance can provide additional and useful insights into the housing price dataset and improve the accuracy of relevant models [16,50], the distances to POIs in this study are all measured as road network distances. The definitions, statistical values, and expected effects on rental price are listed in Table 2.

The basic model of the HPM can be written as $Y = f(X, \beta, \varepsilon)$, where $Y$ is the rental price, $X$ is the characteristic vector consisting of each variable of the house, $\beta$ is the coefficient in front of each factor, and $\varepsilon$ is the residual term. In practice, the model is generally implemented by means of MLR, as follows:

$$y = \beta_0 + \sum_{j=1}^{m} \beta_j x_j + \varepsilon \qquad (1)$$

where $\beta_j$ represents the MLR parameter for the jth explanatory variable $x_j$, and $m$ is the number of explanatory variables.

4.1.2. GWR

The HPM is essentially a global ordinary least squares (OLS) model.
Although the model can explain the affecting factors, the fitting accuracy is often insufficient. This insufficiency arises because a house is not only correlated with the various factors in the HPM, but also closely related to the geographical location: the rental price of a house will be affected by the characteristics and prices of the neighboring houses. The spatial heterogeneity and regional diversity may cause a discrepancy in the parameter $\beta$ of the factors. Thus, the rental price is more suitable for a local regression model. Fotheringham introduced the GWR model [5], which is a geographical extension of OLS. The attribute coefficients of GWR can be viewed as a semi-logarithmic function of the change of the explanatory variable [24]. GWR considers geographical heterogeneity and allows the local parameters to vary, which is formulated as:

$$y_i = \beta_{i0}(u_i, v_i) + \sum_{k=1}^{m} \beta_{ik}(u_i, v_i)\, x_{ik} + \varepsilon_i, \quad i = 1, 2, \ldots, n \qquad (2)$$

where $(u_i, v_i)$ represents the coordinates of sample i, $\beta_{ik}(u_i, v_i)$ represents the local parameter of the kth variable of sample i, which varies for different locations, $\beta_{i0}(u_i, v_i)$ is the intercept value, and $\varepsilon_i$ is the error term. The GWR approach is superior for its ability to reveal the spatial heterogeneity. The local parameters can be estimated as follows:

$$\hat{\beta}(u_i, v_i) = \left(X^{T} W(u_i, v_i) X\right)^{-1} X^{T} W(u_i, v_i) Y \qquad (3)$$

where the weighting matrix $W$ is a diagonal matrix whose off-diagonal elements are all zero, $W(u_i, v_i) = \mathrm{diag}(W_{i1}, W_{i2}, \ldots, W_{ij}, \ldots, W_{in})$. As declared above, the locations of all rental houses have been transformed into the Baidu metric coordinate system. The geographical weight between sample i and sample j is represented by $W_{ij}$.
In this study, we obtain the weighting matrix with the fixed Gaussian kernel function:

$$W_{ij} = \exp\left(-d_{ij}^{2} / b^{2}\right) \qquad (4)$$

where $d_{ij}$ is the distance between houses i and j, and $b$ is a non-negative parameter (bandwidth) that represents the degree of decay with distance. The bandwidth $b$ is a very important parameter, and an appropriate bandwidth can be selected by minimizing the corrected Akaike information criterion (AICc) for the GWR model [51]. For data with some geographical correlations, the fitting accuracy of the GWR model is substantially greater than that of the global regression method (HPM) because spatial heterogeneity and locational discrepancy are taken into account.

4.1.3. FCNN: A Deep Learning Model

Deep learning handles big data well because of its powerful generalization and automation capability and has been widely used in recent years [13]. In this study, we designed a 5-layer fully connected neural network (FCNN) for the house rental price and its factors. As shown in Figure 2, the input layer is the vector of the factors, including structural variables, locational variables, and neighborhood variables; there are 4 hidden layers, and the numbers of neurons for them are 200, 120, 100, and 20, respectively; the output layer is the predicted value of the rental house price. Deep learning can address the nonlinear and complex relationships [3] implied by the variables, which is crucial for the fitting of housing prices.
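As a rough illustration of the architecture described above, the following sketch builds the 4 hidden layers (200, 120, 100, and 20 neurons) plus a single output neuron in plain Python and runs one forward pass. ReLU is the activation reported for the training setup in this section; the random weight initialisation, the input width of 29, and the names `init_fcnn` and `forward` are assumptions made only for this example and are not the authors' implementation.

```python
import random

HIDDEN_SIZES = [200, 120, 100, 20]  # hidden-layer widths given in the text


def init_fcnn(n_inputs, seed=0):
    """Create (weights, biases) per layer; the init scheme is an assumption."""
    rng = random.Random(seed)
    sizes = [n_inputs] + HIDDEN_SIZES + [1]  # last layer: one predicted price
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        scale = (2.0 / fan_in) ** 0.5
        weights = [[rng.gauss(0.0, scale) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        biases = [0.0] * fan_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Forward pass: ReLU on the hidden layers, linear output neuron."""
    activation = list(x)
    for index, (weights, biases) in enumerate(layers):
        pre = [sum(w * a for w, a in zip(row, activation)) + b
               for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        activation = pre if is_output else [max(0.0, v) for v in pre]
    return activation[0]


# Hypothetical input: one house with 29 features already scaled to [0, 1].
layers = init_fcnn(n_inputs=29)
features = [0.5] * 29
prediction = forward(layers, features)
```

The untrained network returns an arbitrary number; the point is only the shape of the computation, in which each hidden layer applies an affine map followed by ReLU and the last layer returns a single price estimate.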
The deep-learning setup in this study is as follows: the learning aim is to minimize the sum of the squared residuals between the predicted and actual values; the activation function is ReLU [52]; the back propagation algorithm is the gradient descent algorithm [53]; for each step, the number of training samples (batch size) is 64; the initial learning rate, the decay of the learning rate, and the decay of the moving average are set to 0.8, 0.99, and 0.99, respectively; L2 regularization [54] is included in the network to reduce overfitting. The loss function of the FCNN can be formulated as:

$$\mathrm{loss} = \sum \left(y^{*} - y\right)^{2} + 0.0001 \sum_{i=1}^{l} w_i^{2} \qquad (5)$$

where $y^{*}$ represents the predicted value, $y$ represents the true value (the first sum runs over the training samples), $l$ is the number of hidden layers, and $w_i$ is the neuron parameters in each hidden layer. For deep-learning methods, all of the data should be divided into 2 parts: the training set and the test set. The training process is carried out on the training set to minimize the value of “loss”. For every sample, the values of each variable have been normalized to the range 0 to 1 to avoid divergence of the model. After the training is completed, the model is run on the test set to estimate the model’s fitting accuracy and predictive power for unknown samples.

Figure 2. The structure of the layers and neurons in the FCNN model.

4.2. FCNN–GWR: The Combination of Deep Learning and GWR

GWR cannot characterize the nonlinear complex characteristics of price, and existing deep-learning methods cannot explicitly process the spatial heterogeneity. Therefore, we propose the FCNN–GWR model, combining deep learning with GWR, which can handle both aspects of the problem. The general idea of the FCNN–GWR model is that the FCNN model can provide an acceptable prediction value for the house rental price through deep learning, and the implementation of GWR on this value can optimize it. As the $\beta$ parameters of GWR contain the spatial heterogeneity and spatial discrepancy of the house rental price, including them in the deep-learning model may explicitly help to optimize the fitting value. We can compose a matrix M, which combines the GWR $\beta$ parameters with the structural, locational, and neighborhood variables, and then deep learning can be carried out on the matrix M, which may obtain more accurate prediction results. The matrix $M = [\beta_0, \beta_1, \beta_2, \ldots, \beta_m, x_1, x_2, \ldots, x_m]$, where $\beta_0, \beta_1, \ldots$ represent the $\beta$ parameters in the GWR model as in Equation (2), and $x_1, x_2, \ldots$ are the structural, locational, and neighborhood variables, the same as in Equation (1).

As has already been verified, GWR has a clear disadvantage in out-of-sample forecasts, which means that the prediction value of GWR may not be sufficiently reliable when there are not enough samples near the point concerned. Therefore, we only adopt the GWR-informed predictions when there is a relatively large number of samples nearby; when there are fewer samples nearby, just the previous FCNN prediction values are adopted as the final prediction value, which means:

$$y_i = \begin{cases} \mathrm{FCNN}(\beta_0, \beta_1, \beta_2, \ldots, \beta_m, x_1, x_2, \ldots, x_m), & \text{Condition 1} \\ \mathrm{FCNN}(x_1, x_2, \ldots, x_m), & \text{Condition 2} \end{cases} \qquad (6)$$

where for the ith house, $y_i$ denotes its prediction value; other variables are the same as in Equations (1) and (2). For a certain house, Condition 1 means that the number of its nearby house samples (within the distance of the bandwidth of GWR) is larger than the average level among all houses ($= \sum_{i=1}^{n} nears_i / n$, where $nears_i$ represents the number of neighboring samples within the GWR bandwidth for the ith house, and $n$ represents the total number of houses in the dataset); Condition 2 means that the number of nearby houses around it (within the distance of the bandwidth) is smaller than the average level among all house samples. Since the bandwidth is a decisive parameter in GWR, and only samples within the distance of the bandwidth play a relatively important role in the calculation, we divided the quantities of nearby samples into the 2 conditions by the number of samples within the distance of the bandwidth. When the quantity of nearby samples is smaller than the average, the prediction by GWR may not be sufficiently credible. In these cases the FCNN is more reliable, whereas geographical weighting may reduce precision. The negative accuracy increment of the GWELM in Table 1 of Deng et al. [31] may be attributed to this phenomenon.

Recent studies have shown that the attention mechanism can be effective for neural networks of the housing price [1,2,9,55]. In our research, the house variables $[x_1, x_2, \ldots, x_m]$ and the GWR $\beta$ parameters $[\beta_0, \beta_1, \beta_2, \ldots, \beta_m]$ were each assigned an attention block [1] in front of the first fully connected layer.
The attention block can convert the original input characteristics into attended characteristics, in order to identify the important features that influence the rental prices. The attention block can be described as a Softmax-activated fully connected layer, and the algorithm is:

$$y_k = F(h_k)\, x_k = \frac{\exp(h_k)}{\sum_{j} \exp(h_j)}\, x_k, \qquad h_k = \sum_{i} w_{ki} x_i + b_k$$

where $F(\cdot)$ is the Softmax function, $x$ is the input features, $y$ is the output features, $h$ is the neurons of this fully connected layer, $w$ is the weights of the input $x$, and $b$ is the bias of the layer. $F(h)$ is the soft attention-weighted vector, which can signify the importance of the features of the house variables and GWR $\beta$ parameters. The variation among attended features $y$ would be substantially larger than the variance among the original features $x$ as a result of the attention block, suggesting that the important characteristics for the house rental price are emphasized in the network, which would benefit the convergence and performance of the model.
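To make the attention block concrete, here is a minimal sketch of the Softmax-activated layer described above: it computes $h = Wx + b$, turns $h$ into attention scores with Softmax, and rescales each input feature by its score. The weight matrix and bias below are hand-picked for illustration only (in the network they would be learned), and the function names are assumptions of this example.

```python
import math


def softmax(values):
    """Numerically stable Softmax over a list of numbers."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_block(x, weights, biases):
    """Return (attended features y, attention scores F(h))."""
    # h_k = sum_i w_ki * x_i + b_k  (one neuron per input feature)
    h = [sum(w * xi for w, xi in zip(row, x)) + b
         for row, b in zip(weights, biases)]
    scores = softmax(h)
    # y_k = F(h_k) * x_k: each feature is scaled by its attention score
    attended = [s * xi for s, xi in zip(scores, x)]
    return attended, scores


# Illustrative values only: three features, the second one emphasised.
x = [0.2, 0.9, 0.4]
weights = [[1.0, 0.0, 0.0],
           [0.0, 3.0, 0.0],
           [0.0, 0.0, 1.0]]
biases = [0.0, 0.0, 0.0]
attended, scores = attention_block(x, weights, biases)
```

Because the scores sum to 1, a feature with a large $h_k$ keeps most of its magnitude while the others are damped, which is the sense in which the block emphasises important characteristics.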
In the FCNN–GWR model, the data should be divided into a training set and a test set. The process of FCNN–GWR is shown in Figure 3 and can be described as follows:

Step 1: Train the FCNN model on the training set with the structural, locational, and neighborhood variables of the house.

Step 2: Execute the GWR model on the training set with the structural, locational, and neighborhood variables. Then, the $\beta$ parameters of GWR can be calculated for each house via GWR fitting.

Step 3: Put the GWR $\beta$ parameters and the structural, locational, and neighborhood variables together to make up the matrix M. Through deep-learning training on the matrix M wrapped with the attention blocks, the FCNN model including the GWR $\beta$ parameters (and structural, locational, and neighborhood variables) can be obtained.

Step 4: Predict the price value on the test set with the FCNN model including the GWR $\beta$ parameters (obtained in Step 3). The prediction value is referred to as $f_1$.

Step 5: Predict the price value on the test set with the ordinary FCNN model (obtained in Step 1, only with the structural, locational, and neighborhood variables, without the GWR $\beta$ parameters). The prediction value is referred to as $f_0$.

Step 6: On the test set, the final predicted results of FCNN–GWR are obtained according to Equation (6): if there are relatively more samples nearby (Condition 1), the final prediction value would be $f_1$; if there are relatively fewer samples nearby (Condition 2), the final prediction value would be $f_0$.

Figure 3. The procedure of the FCNN–GWR model.

Through this method of synthetic training, the FCNN–GWR model not only has the ability to explain the nonlinear complexity of the price but also addresses the spatial heterogeneity explicitly, since the method considers the influence of surrounding rental houses. In this paper, FCNN–GWR and other models were used and compared in the study areas to demonstrate the superiority of the proposed model.

4.3. Quantity-Based Locational and Neighborhood Variables

In traditional housing price models, the locational and neighborhood variables include DCBD, Dpark, and so on (Table 2). These factors can reflect the location of the house, but only a limited number of variables are allowed in order to avoid the multicollinearity problem [56]. In this way, although the main factors of the price can be effectively explained, problems still exist. Firstly, these locational and neighborhood variables are distance-based, and expressing the location of houses with the distance may be somewhat inaccurate, which leads to a loss of precision in the house-pricing model. Seo et al. [16], Li et al. [29], and Bency et al. [10] have provided evidence of this. As shown in Figure 4, there are many POIs of the type “school” in this area. When calculating the locational and neighborhood variables of the houses in this area, only the information of the blue points is actually used, which are the “nearest” school POIs to the houses; the information of the neighboring yellow points is not included, just because they are not the “nearest” ones to the houses. In other words, the locational information formed by these yellow points is discarded rather than exploited, which may influence the accuracy of the price model. Secondly, because some variables are excluded from the model on the grounds that they are similar to other variables, the model may lose a certain amount of information. These variables can also contribute to the housing price to a certain extent.

To solve this, we propose another method to measure the locational characteristics of the house: the quantity-based locational and neighborhood variables. In our perspective, the number and the combination of the various kinds of POIs surrounding a house can better reflect its locational characteristics. For example, Figure 5a is the place near a gate of a school, with very dense POIs around it.
It is not reasonable to consider only the selected “nearest” POI since other neighboring POIs also contribute to the locational characteristics. To consider the influences of other POIs, a better way is to calculate the number of every type of POI nearby. The number and the combination of every type of POI can better reflect the location characteristics of houses locally. For example, in Figure 5a there are 1301 commercial POIs, 68 traffic POIs, 20 stadium POIs, and 123 school POIs. The number of commercial and school POIs is very large, which implies that this place may be the intersection between the school and the commercial district. For another example, Figure 5b is a newly built venue in Wuhan. There are 104 business POIs, 53 transportation POIs, 6 stadium POIs, and no school POIs nearby. The number of subway, bus, parking, and other transportation facilities is very large, but the number of commercial and school POIs is very small, which demonstrates the characteristics of this place as a new infrastructure and new venue. Therefore, the locational characteristics of a place can be reflected in the form described above.

Figure 4. The used and not-used POIs in the models with distance-based locational and neighborhood variables.

Figure 5. POIs in the neighboring area of: (a) the gate of a university; (b) a newly built venue.

The amount and distribution of different types of POIs near a house can be measured to express the quantity characteristics of POIs described above. In fact, the Kernel Density Estimation (KDE) [48] is a practical way to measure the number and density of the points near a certain place, and it is a robust analytical tool in GIS for model discovery and for spatial statistical and spatiotemporal data mining. In this research, KDE is adopted, and the estimated density value of KDE for different types of POIs can be used as “quantity-based variables” for expressing the characteristics of the rental houses, that is:

$$\eta(s_j) = \frac{1}{h^{2}} \sum_{k=1}^{N_j} K\!\left(\frac{dist(s_j, x_{j,k})}{h}\right) \qquad (7)$$

where $\eta(s_j)$ is the estimated density value of the jth type of POIs for a house sample, $N_j$ is the total number of the jth type of POIs, $dist(s_j, x_{j,k})$ is the distance between the location of the house and the location of the kth POI in the jth type of POIs, $K(\cdot)$ is the penalty function (also called the kernel function in KDE), and $h$ is the bandwidth of the kernel function, which represents the smoothing effect of the kernel function.
If we put $[\eta(s_1), \eta(s_2), \ldots, \eta(s_j), \ldots]$, the estimated density values for all types of POIs, together, they can represent comprehensive locational and neighborhood characteristics for a rental house. The combination $[\eta(s_1), \eta(s_2), \ldots, \eta(s_j), \ldots, \eta(s_N)]$ is labeled as “quantity-based locational and neighborhood variables” in this paper.

From the formula above, we learn that the kernel function $K(\cdot)$ and the bandwidth $h$ are the 2 parameters that KDE requires. In this research, 4 types of commonly used kernel functions are tested to find a suitable kernel and bandwidth for the quantity-based locational and neighborhood variables: the Triangular Kernel, the Gaussian Kernel, and the Laplacian Kernel. We put the structural variables and the “quantity-based locational and neighborhood variables” together to construct the vector F, which represents the overall factors of the rental house price:

$$F = [sv_1, sv_2, \ldots, sv_Q, \eta(s_1), \eta(s_2), \ldots, \eta(s_N)] \qquad (8)$$

where $sv$ represents a structural variable of the house (in Table 2); $Q$ and $N$ represent the number of the structural variables and of the locational and neighborhood variables, respectively. The different types of kernel functions and bandwidths are tested and optimized to find the combination with which the factors F attain the highest $R^2$ in the OLS model of the rental house price:

$$R^{2} = \left( \frac{n \sum_{i=1}^{n} r_{i,o}\, r_{i,s} - \sum_{i=1}^{n} r_{i,o} \sum_{i=1}^{n} r_{i,s}}{\sqrt{n \sum_{i=1}^{n} r_{i,o}^{2} - \left(\sum_{i=1}^{n} r_{i,o}\right)^{2}}\; \sqrt{n \sum_{i=1}^{n} r_{i,s}^{2} - \left(\sum_{i=1}^{n} r_{i,s}\right)^{2}}} \right)^{2}$$

where $r_{i,o}$ and $r_{i,s}$ are the observed (actual) and simulated (calculated by the model) rental prices (unit: RMB/m²/month) for the ith house, and $n$ is the number of rental house samples in each dataset. By testing, we find that the Gaussian Kernel performs the best for all the 4 cities, and it is chosen as the KDE kernel for generating the quantity-based locational and neighborhood variables in this study. After optimizing, the KDE bandwidth $h$ is determined as 12,657.4 m, 18,495.5 m, 11,549.4 m, and 14,386.9 m for Wuhan, Beijing, Nanjing, and Xi’an, respectively.
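The selection criterion above can be sketched as follows: compute $R^2$ (the squared correlation between observed and simulated prices) for each candidate kernel and bandwidth, then keep the candidate with the highest value. The prices and candidate settings below are toy numbers invented purely to exercise the code, not results from the study, and `pick_best` is a hypothetical helper; fitting the OLS model that would produce the simulated prices is left out.

```python
import math


def r_squared(observed, simulated):
    """Squared Pearson correlation between observed and simulated prices."""
    n = len(observed)
    sum_o, sum_s = sum(observed), sum(simulated)
    sum_os = sum(o * s for o, s in zip(observed, simulated))
    sum_oo = sum(o * o for o in observed)
    sum_ss = sum(s * s for s in simulated)
    numerator = n * sum_os - sum_o * sum_s
    denominator = (math.sqrt(n * sum_oo - sum_o ** 2)
                   * math.sqrt(n * sum_ss - sum_s ** 2))
    return (numerator / denominator) ** 2


def pick_best(observed, fits_by_setting):
    """Return the (kernel, bandwidth) key whose fit has the highest R^2."""
    return max(fits_by_setting,
               key=lambda setting: r_squared(observed,
                                             fits_by_setting[setting]))


# Toy numbers for illustration only (not data from the paper).
observed = [30.0, 42.0, 55.0, 61.0]
fits_by_setting = {
    ("gaussian", 5000.0): [33.0, 40.0, 50.0, 65.0],
    ("gaussian", 12000.0): [31.0, 41.0, 56.0, 60.0],
    ("triangular", 12000.0): [45.0, 38.0, 52.0, 47.0],
}
best = pick_best(observed, fits_by_setting)
```

In a real search, each entry of `fits_by_setting` would come from refitting the OLS model with the quantity-based variables generated under that kernel and bandwidth.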
The quantity-based locational and neighborhood variables are more comprehensive than the traditionally used locational and neighborhood variables in Table 2 and better reflect the multiscale and comprehensive geographical characteristics of the location. For comparison with the “quantity-based” variables, the distances to the 134 types of POIs can also serve as locational and neighborhood variables, and they are introduced and labeled as “distance-based locational and neighborhood variables” in this paper. In contrast to the distance-based and quantity-based variables, the “traditional locational and neighborhood variables” refer to the locational and neighborhood variables in Table 2, which include 9 frequently used variables.

It should be noted that the traditional, distance-based, and quantity-based locational and neighborhood variables all depend on the distance to the POIs. The distances to POIs in this study are measured as road network distances, which can be obtained through GIS network analysis. Each kind of locational and neighborhood variable can then be generated: the traditional variables and distance-based variables are generated from the nearest distance to a certain kind of POI; the quantity-based variables are generated by the KDE kernel functions and bandwidths.

The “distance-based” and “quantity-based” locational and neighborhood variables contain large, complex, similar, and multicollinear factors, which is the situation where deep-learning methods perform well. Hundreds of locational variables provide a sufficient number of vectors for learning and make the models more accurate. However, it should also be noted that multicollinearity inevitably exists among such a large number of variables. If these similar and multicollinear factors are employed in the HPM and GWR models, we can no longer evaluate their impacts on the price through the model.
The main role of quantity-based variables is to improve the fitting accuracy. That is, the parameter $\beta$ in the HPM and $\beta(u_i, v_i)$ in the GWR are no longer of economic significance with the employment of distance-based and quantity-based locational and neighborhood variables, but the fitting accuracy of the model will be greatly improved. Therefore, the meaning of the parameters of the variables (the impact on the price) will not be discussed in this paper. In addition, when calculating $\beta(u_i, v_i)$ in Equation (3) of the GWR model, the inverse matrix should be replaced by its pseudoinverse in case there is no inverse matrix.

In this study, traditional locational and neighborhood variables, distance-based locational and neighborhood variables, and quantity-based locational and neighborhood variables will be respectively employed in the HPM, GWR, FCNN, and FCNN–GWR models. When it comes to the distance-based and quantity-based locational and neighborhood variables for each model, the meaning of the parameters in the model will not be discussed, as they are not interpretable, and only the fitting accuracy and predictive ability will be discussed.

4.4. Accuracy Assessment of the Models

The basic rental price models in this study include 4 types: HPM, GWR, FCNN, and the proposed FCNN–GWR model. The locational and neighborhood variables of the house include 3 kinds: traditional variables, distance-based variables, and quantity-based variables. For these 4 basic models, all 3 kinds of locational and neighborhood variables will be respectively employed, and the fitting results will be compared. The corresponding experimental groups are labeled as traditional HPM, distance-based HPM, quantity-based HPM, traditional GWR, distance-based GWR, quantity-based GWR, traditional FCNN, distance-based FCNN, quantity-based FCNN, traditional FCNN–GWR, distance-based FCNN–GWR, and quantity-based FCNN–GWR.
For the traditional HPM, GWR, FCNN, and FCNN–GWR experiments, the explanatory variables are 20 structural variables and 9 traditional locational and neighborhood variables with no multicollinearity. For the distance-based HPM, GWR, FCNN, and FCNN–GWR experiments, the explanatory variables are the structural variables and 134 distance-based locational and neighborhood variables. For the quantity-based HPM, GWR, FCNN, and FCNN–GWR experiments, the explanatory variables are the structural variables and 134 quantity-based locational and neighborhood variables.

In all of the experiments, the data are divided into a 70% training set and a 30% test set. First, each model is fitted or trained on the training set; then, the model is run on the test set to assess its fitting accuracy and predictive power for unseen samples. To enhance the reliability of the experiment, shuffle-and-split cross-validation is carried out: the training set and test set are reshuffled 4 times and the results are averaged, so that the conclusions do not hinge on a single, possibly unrepresentative partition.
Several accuracy assessment indicators are calculated to appraise the performance of the above models, following existing studies [13,31] and commonly adopted practice. They include the Pearson correlation coefficient (Pearson R), the adjusted coefficient of determination (adj $R^2$), the root mean square error (RMSE) and its percentage (%RMSE), and the mean absolute error (MAE) and its percentage (%MAE):

$$\text{Pearson } R = \frac{\sum_{i=1}^{n} (r_{i,o} - \bar{r}_o)(r_{i,s} - \bar{r}_s)}{\sqrt{\sum_{i=1}^{n} (r_{i,o} - \bar{r}_o)^2 \, \sum_{i=1}^{n} (r_{i,s} - \bar{r}_s)^2}}$$

$$R^2 = \left( \frac{n \sum_{i=1}^{n} r_{i,o} r_{i,s} - \sum_{i=1}^{n} r_{i,o} \sum_{i=1}^{n} r_{i,s}}{\sqrt{n \sum_{i=1}^{n} r_{i,o}^2 - \left( \sum_{i=1}^{n} r_{i,o} \right)^2} \, \sqrt{n \sum_{i=1}^{n} r_{i,s}^2 - \left( \sum_{i=1}^{n} r_{i,s} \right)^2}} \right)^2$$

$$\text{adj } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - m - 1}$$

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (r_{i,o} - r_{i,s})^2}$$

$$\%\text{RMSE} = \frac{1}{\bar{r}_o} \sqrt{\frac{1}{n} \sum_{i=1}^{n} (r_{i,o} - r_{i,s})^2} \times 100\%$$

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| r_{i,o} - r_{i,s} \right|$$

$$\%\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| r_{i,o} - r_{i,s} \right|}{r_{i,o}} \times 100\%$$

where $r_{i,o}$ and $r_{i,s}$ are the observed and simulated rental prices for the $i$th house (unit: RMB/m²/month), $\bar{r}_o$ and $\bar{r}_s$ are their means, $n$ is the number of rental houses in each dataset, and $m$ is the number of explanatory variables.

5. Results and Discussions

In this section, the above-mentioned models are applied and analyzed. Because this research involves a large amount of data, not all results can be displayed within the limited space of this paper. In some cases, the results for Wuhan are presented first as the example, and the data of the other cities (Nanjing, Beijing, Xi’an) are used to strengthen and verify our models and results.

5.1. HPM Results

The HPM model was implemented with MLR, and the result is shown in Table 3. For the traditional HPM on the Wuhan dataset, the adjusted $R^2$ is 0.520, suggesting that 52.0% of the variance in the house rental price is explained by this model. The results for the other cities are similar: the adjusted $R^2$ is 0.509, 0.663, and 0.342 in Nanjing, Beijing, and Xi’an, respectively. The HPM can abstract and generalize the factors, and global patterns of the rental prices can be comprehended through it [24,25].
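The accuracy indicators of Section 4.4, which produce the values reported in the tables of this section, can be sketched in a few lines of NumPy. This is an illustrative reading rather than the authors' code: it assumes that $R^2$ is the squared Pearson correlation, that %RMSE is the RMSE divided by the mean observed price, and that %MAE is the mean of the per-house absolute errors divided by the observed price. The %RMSE reading is consistent with Tables 3 and A1 (for the traditional HPM in Wuhan, 7.2032 / 33.32 ≈ 21.62%). The function name `accuracy_indicators` and the argument `n_predictors` (the number of explanatory variables) are hypothetical.

```python
import numpy as np


def accuracy_indicators(observed, simulated, n_predictors):
    """Compute Pearson R, adj R^2, RMSE, %RMSE, MAE and %MAE for one dataset."""
    r_o = np.asarray(observed, dtype=float)   # observed rental prices
    r_s = np.asarray(simulated, dtype=float)  # simulated (fitted or predicted) prices
    n = r_o.size
    err = r_o - r_s

    pearson_r = float(np.corrcoef(r_o, r_s)[0, 1])
    r2 = pearson_r ** 2  # assumption: R^2 taken as the squared correlation
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))

    return {
        "pearson_r": pearson_r,
        "adj_r2": adj_r2,
        "rmse": rmse,
        "pct_rmse": 100.0 * rmse / float(np.mean(r_o)),        # RMSE relative to mean price
        "mae": mae,
        "pct_mae": 100.0 * float(np.mean(np.abs(err) / r_o)),  # mean relative absolute error
    }


# Usage on four made-up prices (RMB/m²/month) with one explanatory variable.
for name, value in accuracy_indicators([10, 20, 30, 40], [12, 18, 33, 39], 1).items():
    print(f"{name}: {value:.4f}")
```

Because %MAE divides each error by its own observed price while %RMSE divides by the mean price, the two percentages are not related by a fixed factor, which is why they are reported separately.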
However, the accuracy of the traditional HPM is not high and cannot meet the requirements for estimating and forecasting the price. The results in Table 3 show that the adjusted $R^2$ values of the distance-based HPM and quantity-based HPM reach 0.623 and 0.746, respectively, in Wuhan, which means that they explain 62.3% and 74.6% of the variance in the rental price. The results for the other cities also show that the distance-based and quantity-based HPMs clearly fit better than the traditional HPM. Because the quantity-based variables take more geographic objects into account and reflect the characteristics of the house from a more comprehensive geographical perspective, they can greatly increase the accuracy of the MLR fit of its rental price.

Table 3. Accuracy assessment results of HPM with different locational and neighborhood variables: traditional, distance-based, and quantity-based.

| Indicator | Wuhan: traditional HPM | Wuhan: distance-based HPM | Wuhan: quantity-based HPM | Nanjing: traditional HPM | Nanjing: distance-based HPM | Nanjing: quantity-based HPM |
|---|---|---|---|---|---|---|
| Pearson R | 0.7187 | 0.7857 | 0.8601 | 0.6954 | 0.7505 | 0.7673 |
| adj $R^2$ | 0.5197 | 0.6225 | 0.7463 | 0.5088 | 0.5951 | 0.6216 |
| RMSE | 7.2032 | 6.3765 | 5.1945 | 11.4868 | 10.8076 | 10.4635 |
| %RMSE | 21.62% | 19.17% | 15.61% | 19.14% | 18.04% | 17.44% |
| MAE | 5.2871 | 4.6981 | 3.5077 | 8.507 | 7.8434 | 7.7032 |
| %MAE | 16.70% | 13.99% | 10.70% | 15.45% | 14.26% | 14.09% |
| t-test | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 |

| Indicator | Beijing: traditional HPM | Beijing: distance-based HPM | Beijing: quantity-based HPM | Xi’an: traditional HPM | Xi’an: distance-based HPM | Xi’an: quantity-based HPM |
|---|---|---|---|---|---|---|
| Pearson R | 0.7964 | 0.8266 | 0.8316 | 0.5878 | 0.7494 | 0.8024 |
| adj $R^2$ | 0.6633 | 0.7162 | 0.7249 | 0.3423 | 0.5583 | 0.6393 |
| RMSE | 19.2098 | 17.9573 | 17.7323 | 7.7096 | 6.4836 | 6.115 |
| %RMSE | 20.01% | 18.71% | 18.48% | 25.73% | 21.77% | 20.53% |
| MAE | 13.4011 | 12.2154 | 11.9717 | 5.0645 | 4.3013 | 4.0390 |
| %MAE | 15.78% | 14.10% | 13.75% | 17.57% | 14.73% | 13.91% |
| t-test | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 |

5.2. GWR Results

5.2.1. Traditional GWR

The results of GWR with the traditional variables are labeled as traditional GWR and are summarized in Table 4, where they are compared with the traditional HPM. With this method, the adjusted $R^2$ values in the test sets are 0.756, 0.533, 0.709, and 0.615 for Wuhan, Nanjing, Beijing, and Xi’an, respectively, which are clearly larger than those of the traditional HPM. The AICc values are smaller than those of the traditional HPM (all of the values are averaged over the shuffled datasets). The results show that traditional GWR has a much higher explanatory ability for the house rental price than traditional HPM. As mentioned before, this is because GWR takes into account the spatial heterogeneity in the rental house prices, which is not a negligible property for rental house price modeling.

Table 4. The results of traditional GWR.

| | Wuhan | Nanjing | Beijing | Xi’an |
|---|---|---|---|---|
| Bandwidth | 868.5 | 1042.7 | 960.2 | 823.9 |
| adj $R^2$ of traditional GWR (training set) | 0.8416 | 0.6306 | 0.7665 | 0.7062 |
| adj $R^2$ of traditional GWR (test set) | 0.7558 | 0.5328 | 0.7087 | 0.6148 |
| adj $R^2$ of traditional HPM (test set) | 0.5197 | 0.5088 | 0.6633 | 0.3423 |
| AICc of traditional GWR (test set) | 8.683 × 10^4 | 2.594 × 10^5 | 1.140 × 10^6 | 1.295 × 10^5 |
| AICc of traditional HPM (test set) | 2.119 × 10^5 | 2.940 × 10^5 | 1.273 × 10^6 | 2.679 × 10^5 |

5.2.2. Distance-Based GWR and Quantity-Based GWR

Because of possible multicollinearity, when calculating $\beta(u_i, v_i)$ in the test set for the distance-based and quantity-based GWR methods, the inverse matrix in Equation (3) should be replaced by the pseudoinverse. GWR prediction is then carried out on the test set [57]. With the Wuhan dataset as an example, the performance of each method in the training set and test set is shown in Table 5, which suggests that quantity-based GWR performs better than distance-based GWR. Therefore, the quantity-based locational and neighborhood variables are also effective in the GWR model.
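The pseudoinverse substitution can be illustrated with a minimal sketch of the local estimate at a single location. Equation (3) is not reproduced in this section, so the sketch assumes the standard GWR estimator $\beta(u_i, v_i) = (X^{T} W_i X)^{-1} X^{T} W_i y$; the Gaussian distance-decay kernel, the Euclidean distances, the `rcond` cutoff, and the function name `gwr_local_coefficients` are likewise assumptions made for the example and are not taken from the paper.

```python
import numpy as np


def gwr_local_coefficients(X, y, coords, target, bandwidth, rcond=1e-10):
    """Weighted least-squares estimate of the local coefficients at `target`.

    X: (n, k) design matrix, y: (n,) prices, coords: (n, 2) sample locations,
    target: (2,) location (u, v) at which the coefficients are estimated.
    """
    dist = np.linalg.norm(coords - target, axis=1)
    weights = np.exp(-0.5 * (dist / bandwidth) ** 2)  # assumed Gaussian kernel
    Xw = X * weights[:, None]                         # rows of X scaled by W_i
    # pinv(X^T W X) equals the ordinary inverse whenever that inverse exists;
    # with collinear columns it returns the minimum-norm solution instead.
    return np.linalg.pinv(X.T @ Xw, rcond=rcond) @ (Xw.T @ y)


# Usage: a duplicated column makes X^T W X singular, yet the fit is still defined.
rng = np.random.default_rng(0)
coords_demo = rng.uniform(0, 1000, size=(60, 2))
X_demo = np.column_stack([np.ones(60), rng.normal(size=(60, 2))])
y_demo = X_demo @ np.array([1.0, 2.0, -3.0])
X_collinear = np.column_stack([X_demo, X_demo[:, 1]])

beta_local = gwr_local_coefficients(X_collinear, y_demo, coords_demo, coords_demo[0], 500.0)
print("local coefficients:", beta_local)
print("max fit error:", np.max(np.abs(X_collinear @ beta_local - y_demo)))
```

With collinear columns the individual coefficients are not unique, so only the fitted values are meaningful. This matches the decision in Section 4.3 not to interpret the parameters of the distance-based and quantity-based models.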
All GWR methods are superior to the HPM in accuracy because they account for spatial heterogeneity.

Table 5. The performances of GWR and FCNN in the training set and the test set of Wuhan.

Distance-based variables:

| Indicator | GWR: training set | GWR: test set | GWR: difference between training and test sets | FCNN: training set | FCNN: test set | FCNN: difference between training and test sets |
|---|---|---|---|---|---|---|
| Pearson R | 0.9237 | 0.8883 | 0.0354 | 0.9211 | 0.9141 | 0.0070 |
| adj $R^2$ | 0.9072 | 0.8047 | 0.1025 | 0.8813 | 0.8682 | 0.0131 |
| RMSE | 3.092 | 4.5450 | 1.4530 | 3.3609 | 3.551 | 0.1901 |
| %RMSE | 9.25% | 13.61% | 4.35% | 10.06% | 10.63% | 0.57% |
| MAE | 2.2676 | 2.6683 | 0.4007 | 2.3721 | 2.4827 | 0.1106 |
| %MAE | 6.59% | 7.77% | 1.18% | 6.79% | 7.11% | 0.32% |

Quantity-based variables:

| Indicator | GWR: training set | GWR: test set | GWR: difference between training and test sets | FCNN: training set | FCNN: test set | FCNN: difference between training and test sets |
|---|---|---|---|---|---|---|
| Pearson R | 0.9498 | 0.8882 | 0.0616 | 0.9439 | 0.9344 | 0.0095 |
| adj $R^2$ | 0.9479 | 0.8379 | 0.1100 | 0.9104 | 0.8925 | 0.0179 |
| RMSE | 2.0271 | 4.5271 | 2.500 | 2.9762 | 3.2923 | 0.3161 |
| %RMSE | 6.09% | 13.56% | 7.47% | 8.92% | 9.86% | 0.95% |
| MAE | 1.4830 | 2.4363 | 0.9533 | 1.8410 | 2.0179 | 0.1769 |
| %MAE | 4.44% | 7.30% | 2.86% | 5.65% | 6.19% | 0.54% |

Moreover, the GWR columns of Table 5 show that the accuracy of the GWR models is good in the training set but decreases significantly in the test set for every indicator. For example, the adjusted $R^2$ decreased by 0.10 in distance-based GWR and by 0.11 in quantity-based GWR from the training set to the test set. As noted by Cajias et al. [7], the GWR model may have overfitting problems and might be at a disadvantage in out-of-sample forecasts. To address this possible overfitting defect of the GWR model, we tested the FCNN deep-learning method to improve the predictive power for the house rental price.

5.3. Combining the FCNN and GWR

5.3.1. FCNN Results

Similarly, traditional, distance-based, and quantity-based locational and neighborhood variables are employed in the FCNN model, and the resulting models are labeled as traditional FCNN, distance-based FCNN, and quantity-based FCNN, respectively. Taking Wuhan as an example, the traditional FCNN becomes stable after approximately 50,000 training steps, the distance-based FCNN becomes stable after approximately 300,000 steps, and the quantity-based FCNN also becomes stable after approximately 300,000 steps. The results in Table A2 (in Appendix A) show that the fitting and prediction accuracy of the quantity-based FCNN is higher than that of the traditional and the distance-based FCNN. In addition, the quantity-based FCNN method is the best in terms of the accuracy indicators among the above HPM, GWR, and FCNN methods.

5.3.2. Comparison of GWR and FCNN

Overall, the GWR model is a linear-based model that reflects the influence of the neighboring house samples; deep learning captures complex nonlinear characteristics but does not directly handle the influence of the neighboring houses. When there are abundant locational and neighborhood variables, the characteristic vectors of neighboring houses, consisting of the structural, locational, and neighborhood variables (the x in Equations (1) and (2)), are very similar. These similar vectors contribute strongly to the fitted values of the neighboring houses, which to some extent can be seen as an implicit reflection of the neighboring influences. As a result, the FCNN method generally has better results than GWR with both the distance-based and quantity-based variables, as shown in Table 5 with the example of Wuhan. Moreover, the FCNN models result in much smaller discrepancies between the training sets and test sets than the GWR models.
For example, in Table 5, for the quantity-based GWR the adjusted $R^2$ differs by 0.11 between the training and test sets, and the RMSE differs by 2.5, while for the quantity-based FCNN, the adjusted $R^2$ differs by only 0.02, and the RMSE differs by only 0.32 between the training and test sets. Therefore, this study also provides evidence for the possible disadvantage of the GWR model in out-of-sample forecasts, and prediction with the FCNN is more stable and robust than prediction with GWR.

5.3.3. FCNN–GWR Results and Discussion

Although the prediction accuracy of the FCNN is clearly higher than that of GWR, the FCNN model does not explicitly handle the influence of neighboring houses; thus, it can be improved. The FCNN–GWR model is proposed by combining deep learning with GWR, so that it can not only reflect the complicated nonlinear characteristics of the rental house prices but also explicitly process spatial heterogeneity. The FCNN–GWR results are shown in Table A2 (in Appendix A) and can be compared with those of the FCNN model and the GWR model. In addition to the indicators introduced in Section 4.4, four indicators are adopted to evaluate the stability across different shuffles of training sets and test sets. They are as follows: (1) Pearson R range: the maximum deviation of the Pearson R values across different shuffles of the datasets; (2) $R^2$ range: the maximum deviation of the $R^2$ values across different shuffles of the datasets; (3) Pearson R std.: the standard deviation of the Pearson R across different shuffles of the datasets; and (4) $R^2$ std.: the standard deviation of $R^2$ across different shuffles of the datasets. Finally, the stability with respect to the partition of the training sets is discussed.

When viewed horizontally, Table A2 shows that for each city, quantity-based FCNN–GWR is clearly the best method for predicting rental house prices: all 6 accuracy indicators obtained the best values among all of the experiments.
For models with traditional, distance-based, and quantity-based locational and neighborhood variables, the fitting accuracy of the HPM methods is relatively low. The GWR methods have higher fitting precision, but their stability and robustness are not very good. The FCNN methods have higher accuracy than GWR, and FCNN–GWR has the highest accuracy. FCNN–GWR includes both the complex nonlinear characteristics and the spatial heterogeneity of the rental house prices and reduces the instability of GWR in areas with relatively sparse samples via Equation (6). Therefore, it has fine precision and stability for the fitting and forecasting of rental house prices.

For the four HPM, GWR, FCNN, and FCNN–GWR models with the quantity-based locational and neighborhood variables, we computed the percentage of the mean absolute prediction error (namely, the %MAE in Section 4.4) of each method for the communities in the Wuhan dataset, and the errors (averaged over all experiments) are visualized in Figure 6. Cold colors (blue and green) mean that the average prediction accuracies of these communities are relatively high, and warm colors (red and yellow) mean that the prediction accuracies are relatively low. As shown in the figure, the accuracy of the HPM is relatively low (Figure 6c). For GWR, the number of warm-colored dots (Figure 6a) is greater than that for the FCNN (Figure 6b), and most of the red dots for GWR are located in places where the nearby neighborhoods are relatively sparse, which means there are relatively few samples of houses nearby. This phenomenon suggests that GWR is likely to perform unstably in areas where the dots are sparse or near the margin of the clustered dots. Therefore, it can be inferred that the reliability of GWR prediction may be insufficient when the number of nearby samples around the prediction point is small. When there are many samples nearby (inside the clustered dots), both GWR and FCNN achieve relatively good accuracy, since GWR and FCNN consider the spatial heterogeneity aspect and the nonlinearity aspect of rental house prices, respectively. FCNN–GWR combines the FCNN and GWR, and it takes into account the nonlinearity and spatial heterogeneity when there are many samples nearby; additionally, only the FCNN is adopted when there are relatively few nearby samples, to avoid the possible instability of GWR in these cases. Therefore, FCNN–GWR can better predict rental house prices.

Figure 6. The difference of %MAE in predicted rental house prices of Wuhan when using the models of (a) GWR; (b) FCNN; (c) HPM; (d) FCNN–GWR. (Some of the districts: ① Jiangan, ② Jianghan, ③ Qiaokou, ④ Qingshan, ⑤ Wuchang, ⑥ Hanyang, and ⑦ Hongshan.)

When Table A2 is viewed vertically, the traditional, distance-based, and quantity-based locational and neighborhood variables can be compared. According to the results, the quantity-based variables perform better than the traditional and distance-based variables under all of the HPM, GWR, FCNN, and FCNN–GWR methods. Therefore, it can be verified that the quantity-based locational and neighborhood variables take into account more geographic information from a comprehensive perspective, thus giving the rental price model better predictive power than the traditional and the distance-based locational and neighborhood variables.

Regarding the stability across different partitions of training sets, in almost all cases, the ranges of $R^2$ and Pearson R for different partitions of the datasets are less than 0.008, and the standard deviations of $R^2$ and Pearson R are less than 0.004, all of which are low. The result shows that there is no apparent discrepancy between different partitions of training sets, and different shuffles do not noticeably affect the precision and generalization ability of our model.

Additionally, the FCNN–GWR proposed in this paper is compared with other previously published models on the Wuhan dataset, including the CNN model [14], the model of Bin [9], and GWELM [31]. (We do not consider the image part of these models since there are no image data in this study.) Quantity-based locational and neighborhood variables are adopted for these models, and the results are shown in Table 6. Among the methods above, FCNN–GWR performs the best in the indicators of the estimation or prediction of rental house prices. The CNN model of Yu [14] treats housing price variables as an image and can extract the complexity of the relationships among the variables. However, the model does not consider spatial heterogeneity, which is a significant and non-negligible factor, and this results in limited performance in estimating rental house prices. It is also questionable whether the pooling layers of the CNN are necessary for the regression of rental house prices. Regarding Bin’s approach [9], although the boosted regression trees in the model can improve the accuracy of the prediction efficiently, spatial heterogeneity is also absent from the architecture of the model. Thus, the accuracy of the model still has room for improvement. The GWELM proposed by Deng [31] incorporates GWR and ELM, and in principle, it might be able to reveal both the spatial heterogeneity and nonlinear characteristics in rental house prices. However, as discussed above, when the quantity of nearby samples is small, the prediction by a GWR-like method may not be sufficiently dependable. GWELM would have unstable, reduced accuracy in such cases, which lowers the final accuracy of rental house price estimation.

Table 6. Comparison of FCNN–GWR with some previously published models (in Wuhan).

| Indicator | FCNN–GWR | CNN [14] | Bin [9] | GWELM [31] |
|---|---|---|---|---|
| Pearson R | 0.9534 | 0.9285 | 0.9364 | 0.9214 |
| adj $R^2$ | 0.9192 | 0.8823 | 0.8902 | 0.8549 |
| RMSE | 3.2285 | 3.3678 | 3.3261 | 3.7874 |
| %RMSE | 9.69% | 10.11% | 9.96% | 11.34% |
| MAE | 1.9748 | 2.0599 | 2.0387 | 2.2237 |
| %MAE | 6.07% | 6.21% | 6.12% | 6.66% |

In summary, by combining spatial heterogeneity, nonlinear modeling, and geographical scaling, the proposed quantity-based FCNN–GWR can steadily improve the performance of rental house price modeling. However, at present, the locational and neighborhood variables of this study were derived only from POI data. The model may lose some precision in cases with complex characteristics, since other multitype data, such as remote sensing images [2], street view images [58], or landscape amenities [59], are not considered. In the future, experiments involving neural networks with the above multitype data will be conducted to further improve the accuracy of the rental house price model. Simplifying the procedure of the model and verifying the models in other cities are also worthwhile directions for future work.
6. Conclusions In this research, we make efforts to improve the accuracy of the rental house-pricing model. Taking four cities in China (Wuhan, Nanjing, Beijing, and Xi’an) as study areas, we combine deep learning and GWR to grasp both the nonlinear characteristics and spatial heterogeneity and propose the FCNN–GWR model to evaluate rental house prices. In this paper, the results of the HPM, GWR, the FCNN, and the proposed FCNN–GWR are compared in terms of accuracy. The results show that the quantity-based FCNN–GWR model has the highest accuracy. Compared with GWR, the proposed model shows the ability to include the nonlinear complexity of rental house prices, and it presents stable and more-accurate forecasts. Compared with the FCNN deep-learning method, the proposed model explicitly addresses spatial heterogeneity because it considers nearby influences. The work performed in this research verifies that deep learning and GWR explain rental house prices from different perspectives, and the combination of both can improve the evaluation accuracy of rental prices. Moreover, the method proposed in this paper may provide a useful reference for individuals and businesses in their transactions related to rental houses and assist the government in making appropriate policies for the price levels and positions of public rental housing. The quantity-based locational and neighborhood variables proposed in this paper offer a more comprehensive geographical perspective of locational and neighborhood character- istics. They can express the locational information of houses from a perspective involving more types (134 types of POIs in this study), and more comprehensively (the KDE method used in this study), with more geographic objects taken into account. 
Our experiments show that quantity-based variables better reflect the location of a rental house than the traditionally used and distance-based locational and neighborhood variables, and they help improve the accuracy of the pricing model.

However, the locational and neighborhood variables in this paper were derived only from POI data, which is a limitation of this study. The model may lose some accuracy in cases with complex characteristics that might be captured by remote sensing images, street view images, or texts. In the future, experiments involving neural networks with such multitype data will be conducted to further increase the accuracy of rental house price estimation. In addition, there are many parameters and procedures in the architecture of the proposed FCNN–GWR. More kinds of neural networks can be tested to reduce the complexity of the procedures of this work and to improve the performance of the deep-learning model of rental house prices.

Author Contributions: Conceptualization, H.S. and L.L.; methodology, L.L.; software, Y.L. and Z.L.; validation, H.Z.; data curation, Z.L. and Y.L.; writing (original draft preparation), H.S. and H.Z.; project administration, L.L. All authors have read and agreed to the published version of the manuscript.

Funding: This study is supported by the National Key Research and Development Program of China (2017YFB0503701).

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: The data used in this paper mainly come from lianjia.com (accessed on 20 November 2021).

Acknowledgments: The authors thank the editors and reviewers for providing insightful suggestions and comments.

Conflicts of Interest: The authors declare no conflict of interest.

Appendix A

Table A1 supplements Table 2 and presents the statistical data of the rental house price data and relevant variables in Wuhan, Nanjing, Beijing, and Xi’an.
Table A1. The statistics of the rental house price data and relevant attributes of the study areas.

| Type | Variable | Mean: Wuhan | Mean: Nanjing | Mean: Beijing | Mean: Xi’an | Std.: Wuhan | Std.: Nanjing | Std.: Beijing | Std.: Xi’an |
|---|---|---|---|---|---|---|---|---|---|
| Structural variables | Area | 91.89 | 66.58 | 55.04 | 86.24 | 33.96 | 32.16 | 31.58 | 42.60 |
| | TotalFloor | 22.65 | 16.24 | 14.70 | 27.58 | 11.97 | 13.56 | 8.01 | 7.60 |
| | Level | 2.11 | 1.95 | 1.96 | 2.00 | 0.69 | 0.68 | 0.77 | 0.68 |
| | Age | 11.07 | 15.03 | 17.66 | 8.17 | 6.18 | 7.32 | 9.37 | 4.16 |
| | Age-squared | 152.13 | 279.08 | 406.07 | 85.32 | 161.57 | 274.56 | 416.80 | 88.12 |
| | Month | 4.14 | 4.30 | 4.43 | 4.16 | 1.03 | 1.74 | 1.65 | 1.68 |
| | Spring | * | * | * | * | * | * | * | * |
| | Summer | * | * | * | * | * | * | * | * |
| | Autumn | * | * | * | * | * | * | * | * |
| | Room | 2.22 | 1.97 | 2.14 | 2.00 | 0.81 | 0.87 | 0.90 | 0.89 |
| | Hall | 1.60 | 1.28 | 1.07 | 1.43 | 0.53 | 0.53 | 0.44 | 0.75 |
| | Toilet | 1.21 | 1.03 | 1.14 | 1.19 | 0.44 | 0.37 | 0.44 | 0.47 |
| | South | * | * | * | * | * | * | * | * |
| | North | * | * | * | * | * | * | * | * |
| | East | * | * | * | * | * | * | * | * |
| | West | * | * | * | * | * | * | * | * |
| | PlotRatio | 3.27 | 1.38 | 2.23 | 4.03 | 1.63 | 0.98 | 0.90 | 1.48 |
| | Green | 0.33 | 35.36 | 31.01 | 36.40 | 0.07 | 9.92 | 7.09 | 7.70 |
| | ParkSpace | 791.32 | 569.48 | 618.57 | 1080.70 | 1056.91 | 666.48 | 2714.80 | 1400.44 |
| | Fee | 1.79 | 0.92 | 1.06 | 1.33 | 0.83 | 0.69 | 1.52 | 0.64 |
| Locational variables | DCBD | 7.06 | 11.33 | 18.86 | 17.32 | 5.98 | 6.69 | 12.21 | 7.96 |
| | Dbus | 2.09 | 0.21 | 0.27 | 0.25 | 1.67 | 0.12 | 0.14 | 0.70 |
| | Dsub | 4.21 | 1.28 | 2.37 | 2.04 | 3.94 | 1.29 | 5.13 | 2.35 |
| | DshopCen | 1.43 | 1.74 | 3.38 | 1.29 | 1.35 | 1.14 | 5.89 | 0.93 |
| Neighborhood variables | Dpark | 1.57 | 1.06 | 6.38 | 1.70 | 1.10 | 0.68 | 8.75 | 1.00 |
| | DpriSch | 0.88 | 10.47 | 2.43 | 0.83 | 0.90 | 5.00 | 5.23 | 0.46 |
| | DsecSch | 1.00 | 8.41 | 2.39 | 1.38 | 0.85 | 5.57 | 4.66 | 0.81 |
| | Dnurs | 0.43 | 4.09 | 1.63 | 0.42 | 0.43 | 2.88 | 4.24 | 0.35 |
| | Dhosp | 0.28 | 0.79 | 2.79 | 0.53 | 0.28 | 0.58 | 5.53 | 0.55 |
| | Price | 33.32 | 60.03 | 95.94 | 29.94 | 11.24 | 24.37 | 44.28 | 10.25 |

*: not applicable for dummy variables.

Table A2 presents the accuracy assessment results of the methods compared in this paper. The table includes 4 basic types of models: HPM, GWR, FCNN, and FCNN–GWR.
Additionally, it includes 3 kinds of locational and neighborhood variables: traditional variables, distance-based variables, and quantity-based variables. It also includes 4 cities: Wuhan, Nanjing, Beijing, and Xi’an.

Table A2. Accuracy assessment results of each method.

Wuhan and Nanjing, traditional variables:

| Indicator | Wuhan HPM | Wuhan GWR | Wuhan FCNN | Wuhan FCNN–GWR | Nanjing HPM | Nanjing GWR | Nanjing FCNN | Nanjing FCNN–GWR |
|---|---|---|---|---|---|---|---|---|
| Pearson R | 0.7187 | 0.7985 | 0.8831 | 0.9158 | 0.6954 | 0.7348 | 0.8573 | 0.8783 |
| adj $R^2$ | 0.5197 | 0.7558 | 0.7901 | 0.8268 | 0.5088 | 0.5328 | 0.7788 | 0.8202 |
| RMSE | 7.2032 | 5.5664 | 4.7929 | 4.5955 | 11.4868 | 11.3654 | 9.3097 | 8.8566 |
| %RMSE | 21.62% | 16.77% | 14.40% | 13.79% | 19.14% | 18.91% | 15.50% | 14.73% |
| MAE | 5.2871 | 3.0454 | 3.3635 | 3.2788 | 8.507 | 7.1967 | 6.4888 | 6.0519 |
| %MAE | 16.70% | 9.40% | 10.57% | 10.35% | 15.45% | 13.20% | 11.68% | 10.91% |
| Pearson R range | 0.0035 | 0.0036 | 0.0029 | 0.0025 | 0.0031 | 0.0032 | 0.0041 | 0.004 |
| $R^2$ range | 0.0037 | 0.0032 | 0.0035 | 0.0032 | 0.0066 | 0.0053 | 0.0069 | 0.0062 |
| Pearson R std. | 0.0016 | 0.0014 | 0.0015 | 0.0011 | 0.0017 | 0.0015 | 0.0021 | 0.002 |
| $R^2$ std. | 0.0018 | 0.0016 | 0.0019 | 0.0017 | 0.004 | 0.0028 | 0.0033 | 0.0031 |

Wuhan and Nanjing, distance-based variables:

| Indicator | Wuhan HPM | Wuhan GWR | Wuhan FCNN | Wuhan FCNN–GWR | Nanjing HPM | Nanjing GWR | Nanjing FCNN | Nanjing FCNN–GWR |
|---|---|---|---|---|---|---|---|---|
| Pearson R | 0.7857 | 0.8883 | 0.9141 | 0.9288 | 0.7505 | 0.8254 | 0.8901 | 0.9153 |
| adj $R^2$ | 0.6225 | 0.8047 | 0.8682 | 0.8922 | 0.5951 | 0.6548 | 0.8419 | 0.8641 |
| RMSE | 6.3765 | 4.545 | 3.551 | 3.466 | 10.8076 | 9.8284 | 8.4661 | 8.2322 |
| %RMSE | 19.17% | 13.61% | 10.63% | 10.41% | 18.04% | 16.39% | 14.09% | 13.70% |
| MAE | 4.6981 | 2.6683 | 2.4827 | 2.2425 | 7.8434 | 7.1285 | 5.6652 | 5.5031 |
| %MAE | 13.99% | 7.77% | 7.11% | 6.97% | 14.26% | 12.96% | 10.35% | 10.05% |
| Pearson R range | 0.0042 | 0.0044 | 0.0054 | 0.0055 | 0.0054 | 0.004 | 0.0038 | 0.0044 |
| $R^2$ range | 0.0058 | 0.0052 | 0.0061 | 0.0049 | 0.0075 | 0.0051 | 0.0078 | 0.0057 |
| Pearson R std. | 0.0021 | 0.002 | 0.0025 | 0.0024 | 0.0023 | 0.0021 | 0.0022 | 0.0015 |
| $R^2$ std. | 0.0031 | 0.003 | 0.0034 | 0.003 | 0.0044 | 0.002 | 0.0035 | 0.0031 |

Wuhan and Nanjing, quantity-based variables:

| Indicator | Wuhan HPM | Wuhan GWR | Wuhan FCNN | Wuhan FCNN–GWR | Nanjing HPM | Nanjing GWR | Nanjing FCNN | Nanjing FCNN–GWR |
|---|---|---|---|---|---|---|---|---|
| Pearson R | 0.8601 | 0.8882 | 0.9344 | 0.9534 | 0.7673 | 0.9023 | 0.8919 | 0.9209 |
| adj $R^2$ | 0.7463 | 0.8379 | 0.8925 | 0.9192 | 0.6216 | 0.6686 | 0.8455 | 0.8715 |
| RMSE | 5.1945 | 4.5271 | 3.2923 | 3.2285 | 10.4635 | 11.1837 | 8.4302 | 8.1608 |
| %RMSE | 15.61% | 13.56% | 9.86% | 9.69% | 17.44% | 18.66% | 14.06% | 13.56% |
| MAE | 3.5077 | 2.4363 | 2.0179 | 1.9748 | 7.7032 | 7.032 | 5.7471 | 5.5475 |
| %MAE | 10.70% | 7.30% | 6.19% | 6.07% | 14.09% | 12.75% | 10.46% | 10.12% |
| Pearson R range | 0.0038 | 0.0047 | 0.0033 | 0.0035 | 0.0027 | 0.0029 | 0.0037 | 0.0048 |
| $R^2$ range | 0.0043 | 0.0061 | 0.006 | 0.004 | 0.0049 | 0.003 | 0.0076 | 0.0055 |
| Pearson R std. | 0.002 | 0.0019 | 0.002 | 0.0015 | 0.0011 | 0.0013 | 0.0019 | 0.002 |
| $R^2$ std. | 0.0028 | 0.0032 | 0.0024 | 0.0024 | 0.0017 | 0.0014 | 0.0036 | 0.0027 |

Beijing and Xi’an, traditional variables:

| Indicator | Beijing HPM | Beijing GWR | Beijing FCNN | Beijing FCNN–GWR | Xi’an HPM | Xi’an GWR | Xi’an FCNN | Xi’an FCNN–GWR |
|---|---|---|---|---|---|---|---|---|
| Pearson R | 0.7964 | 0.8172 | 0.9019 | 0.9205 | 0.5878 | 0.8123 | 0.7546 | 0.8253 |
| adj $R^2$ | 0.6633 | 0.7087 | 0.8616 | 0.8839 | 0.3423 | 0.6148 | 0.5701 | 0.6798 |
| RMSE | 19.2098 | 17.8532 | 13.983 | 13.2301 | 7.7096 | 6.3072 | 6.4261 | 5.9846 |
| %RMSE | 20.01% | 18.60% | 14.65% | 13.81% | 25.73% | 21.16% | 21.61% | 20.10% |
| MAE | 13.4011 | 11.4688 | 9.2452 | 8.515 | 5.0645 | 3.7569 | 4.3707 | 3.8722 |
| %MAE | 15.78% | 12.74% | 10.23% | 9.37% | 17.57% | 12.94% | 15.25% | 13.27% |
| Pearson R range | 0.0022 | 0.0029 | 0.002 | 0.0021 | 0.0036 | 0.0031 | 0.0026 | 0.0026 |
| $R^2$ range | 0.003 | 0.0035 | 0.0074 | 0.0037 | 0.0041 | 0.0088 | 0.0036 | 0.0035 |
| Pearson R std. | 0.001 | 0.0012 | 0.0008 | 0.001 | 0.0016 | 0.0013 | 0.0011 | 0.0012 |
| $R^2$ std. | 0.0014 | 0.0018 | 0.0038 | 0.0013 | 0.0015 | 0.0033 | 0.0014 | 0.0016 |

Beijing and Xi’an, distance-based variables:

| Indicator | Beijing HPM | Beijing GWR | Beijing FCNN | Beijing FCNN–GWR | Xi’an HPM | Xi’an GWR | Xi’an FCNN | Xi’an FCNN–GWR |
|---|---|---|---|---|---|---|---|---|
| Pearson R | 0.8266 | 0.8541 | 0.8966 | 0.9093 | 0.7494 | 0.8253 | 0.874 | 0.8934 |
| adj $R^2$ | 0.7162 | 0.7643 | 0.8469 | 0.8673 | 0.5583 | 0.6723 | 0.7631 | 0.7856 |
| RMSE | 17.9573 | 16.7231 | 14.2703 | 13.9321 | 6.4836 | 5.9612 | 5.6395 | 5.4804 |
| %RMSE | 18.71% | 17.42% | 14.91% | 14.51% | 21.77% | 20.00% | 18.78% | 18.26% |
| MAE | 12.2154 | 10.6983 | 8.8871 | 8.6769 | 4.3013 | 3.7842 | 3.4775 | 3.3791 |
| %MAE | 14.10% | 11.89% | 9.66% | 9.44% | 14.73% | 12.89% | 11.71% | 11.77% |
| Pearson R range | 0.0016 | 0.0014 | 0.0027 | 0.004 | 0.0042 | 0.003 | 0.0067 | 0.0033 |
| $R^2$ range | 0.0031 | 0.0017 | 0.0044 | 0.004 | 0.005 | 0.004 | 0.007 | 0.0045 |
| Pearson R std. | 0.001 | 0.0005 | 0.001 | 0.0015 | 0.002 | 0.0017 | 0.003 | 0.0015 |
| $R^2$ std. | 0.0016 | 0.0008 | 0.0025 | 0.0018 | 0.0021 | 0.0025 | 0.0028 | 0.0023 |

Beijing and Xi’an, quantity-based variables:

| Indicator | Beijing HPM | Beijing GWR | Beijing FCNN | Beijing FCNN–GWR | Xi’an HPM | Xi’an GWR | Xi’an FCNN | Xi’an FCNN–GWR |
|---|---|---|---|---|---|---|---|---|
| Pearson R | 0.8316 | 0.8624 | 0.915 | 0.9251 | 0.8024 | 0.8373 | 0.8872 | 0.9042 |
| adj $R^2$ | 0.7249 | 0.7723 | 0.8822 | 0.8981 | 0.6393 | 0.6849 | 0.7818 | 0.8051 |
| RMSE | 17.7323 | 16.5705 | 13.1021 | 12.8735 | 6.115 | 5.8374 | 5.5555 | 5.3923 |
| %RMSE | 18.48% | 17.26% | 13.67% | 13.39% | 20.53% | 19.59% | 18.50% | 17.95% |
| MAE | 11.9717 | 10.5897 | 8.3726 | 8.2273 | 4.039 | 3.7053 | 3.3941 | 3.2896 |
| %MAE | 13.75% | 11.77% | 9.21% | 9.05% | 13.91% | 12.62% | 11.42% | 11.07% |
| Pearson R range | 0.0016 | 0.0023 | 0.0023 | 0.0042 | 0.0021 | 0.0019 | 0.0028 | 0.0051 |
| $R^2$ range | 0.0025 | 0.0028 | 0.0042 | 0.0057 | 0.0029 | 0.0023 | 0.0067 | 0.0064 |
| Pearson R std. | 0.0008 | 0.0011 | 0.0012 | 0.0018 | 0.001 | 0.0011 | 0.0014 | 0.0026 |
| $R^2$ std. | 0.001 | 0.0014 | 0.0018 | 0.0022 | 0.0012 | 0.0013 | 0.0034 | 0.0034 |

References

1. Bin, J.; Gardiner, B.; Liu, Z.; Li, E. Attention-Based Multi-Modal Fusion for Improved Real Estate Appraisal: A Case Study in Los Angeles. Multimed. Tools Appl. 2019, 78, 31163–31184.
2. Wang, P.-Y.; Chen, C.-T.; Su, J.-W.; Wang, T.-Y.; Huang, S.-H. Deep Learning Model for House Price Prediction Using Heterogeneous Data Analysis Along with Joint Self-Attention Mechanism. IEEE Access 2021, 9, 55244–55259.
3. Shimizu, C.; Karato, K.; Nishimura, K. Nonlinearity of Housing Price Structure: Assessment of Three Approaches to Nonlinearity in the Previously Owned Condominium Market of Tokyo. Int. J. Hous. Mark. Anal. 2014, 7, 459–488.
4. Liang, X.; Liu, Y.; Qiu, T.; Jing, Y.; Fang, F. The Effects of Locational Factors on the Housing Prices of Residential Communities: The Case of Ningbo, China. Habitat Int. 2018, 81, 1–11.
5. Fotheringham, A.S.; Charlton, M.E.; Brunsdon, C. Geographically Weighted Regression: A Natural Evolution of the Expansion Method for Spatial Data Analysis. Environ. Plan. A 1998, 30, 1905–1927.
6. Wu, C.; Ye, X.; Ren, F.; Wan, Y.; Ning, P.; Du, Q. Spatial and Social Media Data Analytics of Housing Prices in Shenzhen, China. PLoS ONE 2016, 11, e0164553.
7. Cajias, M.; Ertl, S. Spatial Effects and Non-Linearity in Hedonic Modeling: Will Large Data Sets Change Our Assumptions? J. Prop. Invest. Financ. 2018, 36, 32–49.
8. Bellotti, A. Reliable Region Predictions for Automated Valuation Models. Ann. Math. Artif. Intell. 2017, 81, 71–84.
9. Bin, J.; Gardiner, B.; Li, E.; Liu, Z. Multi-Source Urban Data Fusion for Property Value Assessment: A Case Study in Philadelphia. Neurocomputing 2020, 404, 70–83.
10. Bency, A.J.; Rallapalli, S.; Ganti, R.K.; Srivatsa, M.; Manjunath, B.S. Beyond Spatial Auto-Regressive Models: Predicting Housing Prices with Satellite Imagery. In Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision (WACV 2017), Santa Rosa, CA, USA, 24–31 March 2017; pp. 320–329.
11. Jiang, Z.; Shen, G. Prediction of House Price Based on the Back Propagation Neural Network in the Keras Deep Learning Framework.
In Proceedings of the 2019 6th International Conference on Systems and Informatics (ICSAI), Shanghai, China, 8–10 November 2019; pp. 1408–1412. 12. Xu, J. A Novel Deep Neural Network Based Method for House Price Prediction. In Proceedings of the 2021 International Conference of Social Computing and Digital Economy (ICSCDE), Chongqing, China, 28–29 August 2021; pp. 12–16. 13. Yao, Y.; Zhang, J.; Hong, Y.; Liang, H.; He, J. Mapping Fine-Scale Urban Housing Prices by Fusing Remotely Sensed Imagery and Social Media Data. Trans. GIS 2018, 22, 561–581. [CrossRef] 14. Yu, L.; Jiao, C.; Xin, H.; Wang, Y.; Wang, K. Prediction on Housing Price Based on Deep Learning. Int. J. Comput. Inf. Eng. 2018, 12, 90–99. 15. Andrew, M.; Haurin, D.; Munasib, A. Explaining the Route to Owner-Occupation: A Transatlantic Comparison. J. Hous. Econ. 2006, 15, 189–216. [CrossRef] 16. Seo, K.; Golub, A.; Kuby, M. Combined Impacts of Highways and Light Rail Transit on Residential Property Values: A Spatial Hedonic Price Model for Phoenix, Arizona. J. Transp. Geogr. 2014, 41, 53–62. [CrossRef] 17. Dong, H. The Impact of Income Inequality on Rental Affordability: An Empirical Study in Large American Metropolitan Areas. Urban Stud. 2018, 55, 2106–2122. [CrossRef] 18. Won, J.; Lee, J.-S. Investigating How the Rents of Small Urban Houses Are Determined: Using Spatial Hedonic Modeling for Urban Residential Housing in Seoul. Sustainability 2018, 10, 31. [CrossRef] 19. Liebelt, V.; Bartke, S.; Schwarz, N. Hedonic Pricing Analysis of the Influence of Urban Green Spaces onto Residential Prices: The Case of Leipzig, Germany. Eur. Plan. Stud. 2018, 26, 133–157. [CrossRef] 20. Yu, T.; Song, Y. Solving the Problem of ‘Cold Weather ’ of Public Rental Houses—Based on the Analysis of Government’s Purchase of Public Service. China Econ. Trade Guide (Middle) 2018, 35, 74–76. 21. Henderson, J.V.; Ioannides, Y.M. A Model of Housing Tenure Choice. Am. Econ. Rev. 1983, 73, 98–113. 22. Ioannides, Y.M.; Rosenthal, S.S. 
Estimating the Consumption and Investment Demands for Housing and Their Effect on Housing Tenure Status. Rev. Econ. Stat. 1994, 76, 127–141. [CrossRef] 23. Rosen, S. Hedonic Prices and Implicit Markets-Product Differentiation in Pure Competition. J. Political Econ. 1974, 82, 34–55. [CrossRef] 24. Malpezzi, S. Hedonic Pricing Models: A Selective and Applied Review. Hous. Econ. Public Policy 2002, 1, 67–89. 25. Wu, J.; Wang, M.; Li, W.; Peng, J.; Huang, L. Impact of Urban Green Space on Residential Housing Prices: Case Study in Shenzhen. J. Urban Plan. Dev. 2015, 141, 05014023. [CrossRef] 26. Anselin, L.; Bera, A.K. Spatial Dependence in Linear Regression Models with an Introduction to Spatial Econometrics. Stat.: Textb. Monogr 1998, 155, 237–289. 27. Anselin, L. Exploring Spatial Data with Geodatm: A Workbook; Center for Spatially Integrated Social Science: Urbana, IL, USA, 2004; Volume 51, pp. 213–223. 28. Hastie, T.J.; Tibshirani, R.J. Generalized Additive Models. Stat. Sci. 1986, 1, 297–310. [CrossRef] 29. Li, S.; Ye, X.; Lee, J.; Gong, J.; Qin, C. Spatiotemporal Analysis of Housing Prices in China: A Big Data Perspective. Appl. Spat. Anal. Policy 2017, 10, 421–433. [CrossRef] 30. Huang, B.; Wu, B.; Barry, M. Geographically and Temporally Weighted Regression for Modeling Spatio-Temporal Variation in House Prices. Int. J. Geogr. Inf. Sci. 2010, 24, 383–401. [CrossRef] 31. Deng, M.; Yang, W.; Liu, Q. Geographically Weighted Extreme Learning Machine: A Method for Space-Time Prediction. Geogr. Anal. 2017, 49, 433–450. [CrossRef] 32. Wang, J.J.; Hu, S.G.; Zhan, X.T.; Luo, Q.; Yu, Q.; Liu, Z.; Chen, T.P.; Yin, Y.; Hosaka, S.; Liu, Y. Predicting House Price with a Memristor-Based Artificial Neural Network. IEEE Access 2018, 6, 16523–16528. [CrossRef] 33. Fan, G.Z.; Pu, M.; Deng, X.; Ong, S.E. Optimal Portfolio Choices and the Determination of Housing Rents under Housing Market Uncertainty. J. Hous. Econ. 2018, 41, 200–217. [CrossRef] 34. Yoo, S.; Im, J.; Wagner, J.E. 
Variable Selection for Hedonic Model Using Machine Learning Approaches: A Case Study in Onondaga County, NY. Landsc. Urban Plan. 2012, 107, 293–306. [CrossRef] Land 2022, 11, 4 28 of 28 35. Hu, L.; He, S.; Han, Z.; Xiao, H.; Su, S.; Weng, M.; Cai, Z. Monitoring Housing Rental Prices Based on Social Media: An Integrated Approach of Machine-Learning Algorithms and Hedonic Modeling to Inform Equitable Housing Policies. Land Use Policy 2019, 82, 657–673. [CrossRef] 36. Rico-Juan, J.R.; de La Paz, P.T. Machine Learning with Explainability or Spatial Hedonics Tools? An Analysis of the Asking Prices in the Housing Market in Alicante, Spain. Expert Syst. Appl. 2021, 171, 114590. [CrossRef] 37. Roth, A.E. The Shapley Value: Essays in Honor of Lloyd S. Shapley; Cambridge University Press: Cambridge, NY, USA, 1988. 38. Fu, X.; Jia, T.; Zhang, X.; Li, S.; Zhang, Y. Do Street-Level Scene Perceptions Affect Housing Prices in Chinese Megacities? An Analysis Using Open Access Datasets and Deep Learning. PLoS ONE 2019, 14, e0217505. [CrossRef] [PubMed] 39. Zhao, Y.; Chetty, G.; Tran, D. Deep Learning with Xgboost for Real Estate Appraisal. In Proceedings of the 2019 IEEE Symposium Series on Computational Intelligence (SSCI), Xiamen, China, 6–9 December 2019. 40. Hagenauer, J.; Helbich, M. A Geographically Weighted Artificial Neural Network. Int. J. Geogr. Inf. Sci. 2021, 35, 1–21. [CrossRef] 41. Steentoft, A.A.; Poorthuis, A.; Lee, B.S.; Schlapfer, M. The Canary in the City: Indicator Groups as Predictors of Local Rent Increases. EPJ Data Sci. 2018, 7, 1–15. [CrossRef] 42. Huang, D.J.; Leung, C.K.Y.; Tse, C.Y. What Accounts for the Differences in Rent-Price Ratio and Turnover Rate? A Search-and- Matching Approach. J. Real Estate Financ. Econ. 2018, 57, 431–475. [CrossRef] 43. Ullah, F.; Sepasgozar, S.M.E. Key Factors Influencing Purchase or Rent Decisions in Smart Real Estate Investments: A System Dynamics Approach Using Online Forum Thread Data. Sustainability 2020, 12, 4382. 
[CrossRef] 44. Su, S.; Zhang, J.; He, S.; Zhang, H.; Hu, L.; Kang, M. Unraveling the Impact of Tod on Housing Rental Prices and Implications on Spatial Planning: A Comparative Analysis of Five Chinese Megacities. Habitat Int. 2021, 107, 102309. [CrossRef] 45. Baidu. Baidu Map Open Platform. Available online: https://lbsyun.baidu.com/index.php?title=lbscloud/poitags (accessed on 20 November 2021). 46. Lianjia. Lianjia Flagship Website. Available online: https://wh.lianjia.com/ (accessed on 20 November 2021). 47. Wu, H.; Jiao, H.; Yu, Y.; Li, Z.; Peng, Z.; Liu, L.; Zeng, Z. Influence Factors and Regression Model of Urban Housing Prices Based on Internet Open Access Data. Sustainability 2018, 10, 1676. [CrossRef] 48. Li, H.; Wei, Y.D.; Wu, Y.; Tian, G. Analyzing Housing Prices in Shanghai with Open Data: Amenity, Accessibility and Urban Structure. Cities 2019, 91, 165–179. [CrossRef] 49. Bokhari, S.; Geltner, D. Characteristics of Depreciation in Commercial and Multifamily Property: An Investment Perspective. Real Estate Econ. 2018, 46, 745–782. [CrossRef] 50. Lu, B.; Charlton, M.; Harris, P.; Fotheringham, A.S. Geographically Weighted Regression with a Non-Euclidean Distance Metric: A Case Study Using Hedonic House Price Data. Int. J. Geogr. Inf. Sci. 2014, 28, 660–681. [CrossRef] 51. Hurvich, C.M.; Simonoff, J.S.; Tsai, C.L. Smoothing Parameter Selection in Nonparametric Regression Using an Improved Akaike Information Criterion. J. R. Stat. Soc. Ser. B-Stat. Methodol. 1998, 60, 271–293. [CrossRef] 52. Hahnloser, R.H.R.; Sarpeshkar, R.; Mahowald, M.A.; Douglas, R.J.; Seung, H.S. Digital Selection and Analogue Amplification Coexist in a Cortex-Inspired Silicon Circuit. Nature 2000, 405, 947–951. [CrossRef] 53. Barzilai, J.; Borwein, J.M. Two-Point Step Size Gradient Methods. IMA J. Numer. Anal. 1988, 8, 141–148. [CrossRef] 54. Schmidhuber, J. Deep Learning in Neural Networks: An Overview. Neural Netw. 2015, 61, 85–117. [CrossRef] [PubMed] 55. 
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. 56. Farrar, D.E.; Glauber, R.R. Multicollinearity in Regression Analysis: The Problem Revisited. Rev. Econ. Stat. 1967, 49, 92–107. [CrossRef] 57. Harris, P.; Fotheringham, A.S.; Crespo, R.; Charlton, M. The Use of Geographically Weighted Regression for Spatial Prediction: An Evaluation of Models Using Simulated Data Sets. Math. Geosci. 2010, 42, 657–680. [CrossRef] 58. Li, S.; Jiang, Y.; Ke, S.; Nie, K.; Wu, C. Understanding the Effects of Influential Factors on Housing Prices by Combining Extreme Gradient Boosting and a Hedonic Price Model (Xgboost-Hpm). Land 2021, 10, 533. [CrossRef] 59. Su, S.; He, S.; Sun, C.; Zhang, H.; Hu, L.; Kang, M. Do Landscape Amenities Impact Private Housing Rental Prices? A Hierarchical Hedonic Modeling Approach Based on Semantic and Sentimental Analysis of Online Housing Advertisements across Five Chinese Megacities. Urban For. Urban Green. 2021, 58, 126968. [CrossRef] http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Land Multidisciplinary Digital Publishing Institute

Exploring a Pricing Model for Urban Rental Houses from a Geographical Perspective

Land, Volume 11 (1) – Dec 21, 2021


Publisher
Multidisciplinary Digital Publishing Institute
Copyright
© 1996-2022 MDPI (Basel, Switzerland) unless otherwise stated. Disclaimer: The statements, opinions and data contained in the journals are solely those of the individual authors and contributors and not of the publisher and the editor(s). MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
ISSN
2073-445X
DOI
10.3390/land11010004

Abstract

Land
Article
Exploring a Pricing Model for Urban Rental Houses from a Geographical Perspective
Hang Shen 1, Lin Li 1,2,*, Haihong Zhu 1, Yu Liu 1,3 and Zhenwei Luo 1
1 School of Resource and Environment Sciences, Wuhan University, Wuhan 430079, China; shenhang@whu.edu.cn (H.S.); hhzhu@whu.edu.cn (H.Z.); liuyu0201@whu.edu.cn (Y.L.); luowei1993@whu.edu.cn (Z.L.)
2 Institute of Smart Perception and Intelligent Computing, SRES, Wuhan University, 129 Luoyu Road, Wuhan 430079, China
3 Institute of Environment and Development, Guangdong Academy of Social Sciences, Guangzhou 510635, China
* Correspondence: lilin@whu.edu.cn; Tel.: +86-138-7150-4963
Abstract: Models for estimating urban rental house prices in the real estate market continue to pose a challenging problem due to the insufficiency of algorithms and comprehensive perspectives. Existing rental house price models based on either the geographically weighted regression (GWR) or deep-learning methods can hardly predict very satisfactory prices, since the rental house prices involve both complicated nonlinear characteristics and spatial heterogeneity. The linear-based GWR model cannot characterize the nonlinear complexity of rental house prices, while existing deep-learning methods cannot explicitly model the spatial heterogeneity. This paper proposes a fully connected neural network–geographically weighted regression (FCNN–GWR) model that combines deep learning with GWR and can handle both of the problems above. In addition, when calculating the geographical location of a house, we propose a set of locational and neighborhood variables based on the quantities of nearby points of interests (POIs). Compared with traditional locational and neighborhood variables, the proposed “quantity-based” locational and neighborhood variables can cover more geographic objects and reflect the locational characteristics of a house from a comprehensive geographical perspective.
Taking four major Chinese cities (Wuhan, Nanjing, Beijing, and Xi’an) as study areas, we compare the proposed method with other commonly used methods, and this paper presents a more precise estimation model for rental house prices. The method proposed in this paper may serve as a useful reference for individuals and enterprises in their transactions relevant to rental houses, and for the government in terms of the policies and positions of public rental housing.
Keywords: house rental price; geographically weighted regression; spatial heterogeneity; deep learning
Citation: Shen, H.; Li, L.; Zhu, H.; Liu, Y.; Luo, Z. Exploring a Pricing Model for Urban Rental Houses from a Geographical Perspective. Land 2022, 11, 4. https://doi.org/10.3390/land11010004
Academic Editor: Monika Kuffer
Received: 23 November 2021
Accepted: 16 December 2021
Published: 21 December 2021
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
Prices in the real estate market may be one of the most important issues that people are concerned with. We usually consult real estate websites or agents to find a reference for the price of a house before conducting the final transaction of buying or renting it. In addition, real estate valuation may indicate the economic situation or urban vibrancy of related regions [1]. Businesses are inclined to invest in a location by referring to an assessment of the relevant real estate market, and renters usually need to evaluate the cost of living and expenditures based on the rental house prices in a certain place to determine the positions of their jobs and lives. Therefore, real estate valuation is inextricably linked with people’s lives and the economy, and as a result, estimating housing and rental housing prices is presently a popular issue. Housing price estimation may serve as a benchmark for buyers/tenants and sellers/lessors. By estimating the selling or rental price of a property, purchasers and tenants may assess whether the transaction is reasonable, and sellers and lessors can calculate the price of a house in a certain location and condition. Financial applications that require a reliable system for mortgage or lease calculations also demand the estimation of housing prices [2].
To obtain a dependable and accurate estimation model, it is important for the algorithm to handle the features and relationships in housing prices. The formation of housing prices is dependent on many factors, and the relationships among them are nonlinear and complex, with spatial heterogeneity [3,4]. Recent studies have concentrated on the characteristics of spatial heterogeneity and nonlinear relationships. In recent years, the geographically weighted regression (GWR) model and deep-learning models have usually been adopted for estimating housing prices [5,6]. The GWR model incorporates the influence of spatial heterogeneity on housing prices, which means that the model can take into account the impact of characteristics of the surrounding houses. However, as a linear-based model, GWR cannot present the nonlinear and complex relationships among the housing prices and their factors. A clear disadvantage of the GWR model has been observed in terms of out-of-sample forecasts [7].
In the era of big data and machine intelligence, deep-learning methods have been more frequently utilized in research and engineering problems, due to their superior fitting abilities and powerful generalization performance. House selling and rental prices can also be modeled effectively by deep learning, which allows prices in the housing market to be assessed automatically with higher accuracy and reliability [8,9]. The currently adopted deep-learning models for housing prices include multilayer perceptron regressors [10–12], convolutional neural networks (CNN) [2,9,13,14], and their variants. These methods can capture the nonlinear and complex relationships but do not explicitly consider the spatial heterogeneity of the houses in an area. In general, if only GWR is applied, the nonlinear relationship will not be represented in the model, while if only a nonlinear model is adopted, the spatial heterogeneity will not be considered. Both of these issues may lead to a loss of precision in the housing price model.
In terms of rental housing, many people currently have to rent a house before purchasing their own living space, and rental housing has become a significant component of many people’s lives [15–17]. To date, many models have been used to simulate rental house prices [7,10,18,19], but they are often applied as supplements to research on housing (selling) prices and usually show relatively lower precision. In China, some government public rental housing attracts few tenants or is poorly received because its rental prices are unreasonable [20]. The problem is that the spatial heterogeneity or nonlinear relationship existing in rental house prices is absent in the models. Moreover, there is enormous complexity in the formation of house selling prices, in that housing sales happen not only through the movement of use value but also through fluctuations in transaction values [21,22].
The equilibrium rate of the utilization of the housing stock by renters is higher than that by buyers [21], and the demand for houses to rent may be more sensitive to geographical factors than the demand for houses to purchase [22]. Rental house prices might therefore be less affected by market fluctuations and more closely tied to people’s consumption demands and abilities. Therefore, it is meaningful and necessary to explore and find a reliable pricing model with higher accuracy.
In addition, to express the geographical location of a house, a series of locational and neighborhood variables [23–25], such as the distance to a bus station, distance to a school, and distance to a park, are often used in housing price models. These are based on distance and consider only the nearest geographic object of a house (the nearest bus station, school, park, etc.), which may lose the locational information generated by other neighboring or nearby objects. As a result, housing price models may have limited accuracy.
In this paper we make efforts to improve the accuracy of the rental house-pricing model, and the model that we propose will result in higher precision and may be more practical (e.g., adjusted R² = 0.9192, Pearson R = 0.9534 in Wuhan). A fully connected neural network–geographically weighted regression (FCNN–GWR) model that combines deep learning and the GWR model is presented. The proposed model is based on an FCNN, which is a basic type of neural network, and it incorporates the parameters and principles of GWR. It characterizes the nonlinear complexity of rental house prices and considers the influence of neighboring rental houses. Numerous rental cases available from real estate websites and points of interest (POIs) provide considerable samples for model training.
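The contrast between a distance-based variable, which keeps only the nearest object, and a count of all POIs within a radius can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the 500 m radius, the planar coordinates in metres, and the toy bus-stop layout are all assumptions made for the example.

```python
import math

def nearest_distance(house, pois):
    """Distance-based variable: distance to the single nearest POI."""
    return min(math.dist(house, p) for p in pois)

def count_within(house, pois, radius):
    """Quantity-based variable: number of POIs within `radius` of the house."""
    return sum(1 for p in pois if math.dist(house, p) <= radius)

# Hypothetical projected coordinates in metres (not real data).
house_a = (0.0, 0.0)
house_b = (2000.0, 0.0)
bus_stops = [
    (100.0, 0.0),      # the only stop near house A
    (2100.0, 0.0),     # four stops around house B
    (2000.0, 300.0),
    (1700.0, 0.0),
    (2000.0, -400.0),
]

radius = 500.0  # assumed search radius
for name, house in (("A", house_a), ("B", house_b)):
    print(name,
          nearest_distance(house, bus_stops),
          count_within(house, bus_stops, radius))
```

In this toy layout both houses are 100 m from their nearest stop, so the distance-based variable cannot tell them apart, while the count within 500 m separates them (one stop for house A, four for house B).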
In addition, to express the locational characteristics of a house, we present a series of locational and neighborhood variables based on the quantities of surrounding POIs. These quantity-based variables can better reflect the comprehensive locational characteristics of a house than the traditionally used locational and neighborhood variables, and they can help improve the accuracy of the pricing model. To evaluate the accuracy of the proposed method, it is compared in the study with several principal models of rental house prices, such as the hedonic price model (HPM) and GWR.
This paper is organized as follows: Section 2 reviews the relevant research on housing price and rental housing price models. Section 3 introduces the study area and the data used in the research. Section 4 introduces the models and methods adopted in the research. Section 5 analyzes the various methods and experiments and compares their results. Finally, the model with the best fit and strong predictive ability for rental house prices is obtained, and Section 6 presents the conclusions and future work.
2. Related Works
Methods of housing price modeling include the HPM [23], the spatial lag model (SLM) [26], the spatial error model (SEM) [27], the generalized additive model (GAM) [28], GWR [5], deep-learning models, and their related methods. These methods of housing price modeling have been used in many cases and have been proven to be effective. However, they need to be improved to estimate housing selling and rental prices.
2.1. HPM and Spatial-Based Housing Price Models
The HPM [23], which is a basic method for explaining housing prices, has been widely used since its introduction [24] and is the basis of other housing price models.
The HPM relies on the assumption that housing price can be divided into several factors, including structural variables (the characteristics of the building itself), locational variables (such as the distance to the central business district (CBD)) and neighborhood variables (such as the distance to a nearby park). The HPM method typically uses multiple linear regression (MLR) to fit the relationships among housing prices and their factors. A number of studies have explored housing prices based on the HPM [6,19,24,25,29], proving it to be an effective approach, but the fitting accuracy of MLR is not high. To improve the accuracy of the simple linear model, different methods have been applied to housing price modeling by considering the spatial differences. The SLM [26] and SEM [27] methods focus on the spatial autocorrelation in housing prices. The SLM takes into account the impact of the dependent variable, while the SEM assumes that the spatial-autocorrelation issue can be handled by considering the spatial dependencies in the errors. The accuracies of the SLM and SEM methods are higher than that of the HPM, confirming the existence of spatial variances in housing prices. However, the improvement is not very remarkable since the models are still linear. For example, an SLM and SEM were used by Won [18] to model rental house prices in Seoul, South Korea, and the results were not very accurate. It is known that housing prices are complex and contain spatial heterogeneity [3,4]. To explore and obtain a price model, the GWR model proposed by Fotheringham [5] was applied to analyze housing prices. GWR is based on local smoothing, which can explain the spatial heterogeneity. In recent years, GWR has become the commonly used approach in housing price studies [4,6,30,31]. However, GWR cannot capture the complicated nonlinear characteristics in the housing prices because of its linear form. 
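The difference between a single global linear fit, as in the HPM, and the locally weighted fits used by GWR can be sketched on toy data. This is a minimal sketch under stated assumptions, not the configuration used in the paper: the Gaussian kernel, the fixed bandwidth, the single predictor (floor area), and all data values are hypothetical.

```python
import math

def weighted_line_fit(xs, ys, ws):
    """Weighted least squares for y = b0 + b1 * x; returns (b0, b1)."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def global_fit(xs, ys):
    """Global linear fit (HPM-style): every sample gets the same weight."""
    return weighted_line_fit(xs, ys, [1.0] * len(xs))

def local_fit(target, coords, xs, ys, bandwidth):
    """GWR-style fit at `target`: Gaussian weights decay with distance."""
    ws = [math.exp(-0.5 * (math.dist(target, c) / bandwidth) ** 2)
          for c in coords]
    return weighted_line_fit(xs, ys, ws)

# Hypothetical data: rent rises by 2 per unit of area in the west cluster
# and by 5 per unit of area in the east cluster.
coords = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
areas = [1, 2, 3, 4, 1, 2, 3, 4]
rents = [12, 14, 16, 18, 25, 30, 35, 40]

global_b0, global_b1 = global_fit(areas, rents)
_, west_slope = local_fit((0.5, 0.5), coords, areas, rents, bandwidth=2.0)
_, east_slope = local_fit((10.5, 0.5), coords, areas, rents, bandwidth=2.0)
print(global_b1, west_slope, east_slope)
```

The global fit returns one averaged slope (3.5 here), whereas the local fits recover slopes close to 2 in the west and close to 5 in the east. Each local model is still linear, which is the limitation noted above.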
In recent research [7], the HPM and GWR models were employed to estimate the price of 570,000 rental flats. The results suggest that the HPM can perform better than GWR in out-of-sample forecasts, which is evidence of the disadvantage of GWR in the accuracy and robustness of price forecasts.
In summary, it is generally difficult for linear models to achieve satisfactory accuracy for housing price estimation, although their forms usually have good explanatory abilities for the price and spatial factors.
2.2. Nonlinear and Complex Housing Price Models
Housing prices are complex and have nonlinear relationships, and nonlinear methods can clearly improve the precision of housing price models [2,32,33]. The GAM is a nonlinear model [28] first adopted for housing prices. However, it is essentially an extension of the linear model, and the R² improvement in price evaluation in GAM-based studies is usually less than 5% compared with MLR [3,7], which reveals its limited ability to improve estimation results. Over the years, machine-learning approaches have been adopted for the housing price problem with the hedonic model. Yoo [34] first applied machine learning to the hedonic model and showed that random forests can be practical for selecting the important variables for the hedonic model and enhancing its performance. Hu [35] monitored rental house prices with social media data, revealed the determinants and relative importance of rental house prices based on machine-learning approaches, and demonstrated the ability to integrate machine learning with the hedonic model to map spatial patterns.
Rico-Juan [36] found that ordinary least squares hedonic regression, quantile hedonic regression, and machine learning each have their own strengths in explaining housing prices, and that an analysis of Shapley values [37] based on random-forest machine learning is particularly informative because it can identify nonlinear and synergistic relationships from a three-dimensional perspective. These machine-learning approaches clearly have better accuracies than linear housing price models, and they also have a certain explanatory ability for the dependent variable. Nonetheless, the performance of machine-learning methods in predicting or estimating housing prices can still be improved.
In the era of big data and machine intelligence, deep-learning methods have been more frequently utilized in research and engineering problems for their powerful fitting and automation abilities. For housing selling and rental prices, deep-learning evaluation methods tend to be provided to automatically and intelligently assess housing market values with higher accuracy and reliability [8,9]. Bency [10], Yao [13], Yu [14], and Wang P. [2] used CNNs to model housing prices with remote sensing images and general housing price factors. The CNN model of Yu [14] treats the housing price variables as an image and can extract the complexity of the relationships among the variables. However, it is questionable whether the arrangement of the variables is dependable and whether the pooling layers are necessary. Wang J. [32] used a memristor-based artificial neural network to predict housing prices. Some researchers have used street view images [1,9,38] or indoor pictures [39] to help improve deep learning for housing price models. The multisource data-fusion and attention mechanism utilized by Bin [9] has performed efficiently in property value assessment. The above studies have achieved improved accuracy for housing price prediction since nonlinear and complex characteristics can be extracted by the deep-learning models. However, when using these methods, the spatial heterogeneity, which is a significant and nonnegligible factor in housing prices, is still absent.
Deng [31] combined the GWR approach with the extreme learning machine (ELM) and generated a “geographically weighted ELM (GWELM)”. It has been proven to be effective in revealing both spatial heterogeneity and nonlinear aspects, but its accuracy may be unstable and poor in several cases. This model has not been applied in housing-price modeling, but it can be inferred that the combination of deep learning and GWR may yield satisfactory results. The geographically weighted artificial neural network (GWANN) developed by Hagenauer and Helbich [40] can combine the nonlinearity and spatial heterogeneity in housing prices. Unfortunately, their study did not consider the detailed locational and neighborhood variables or compare their method with other deep-learning models. A specific method of how to incorporate both nonlinearity and spatial heterogeneity into the estimation of housing (selling and rental) prices and the effect of the approach still needs to be explored.
Housing selling prices are complex because they are influenced by both the movement of use value and fluctuations in transaction values, while rental house prices may be more closely related to people’s consumption demands, abilities, and preferences for locations [21,22]. In general, rental house-pricing models usually follow those used to evaluate selling house prices, while their accuracy is usually lower. Examples include Cajias [7], Bency [10], and Liebelt [19].
The issue of rental house prices has also been discussed extensively in socioeconomics along various dimensions, such as the market [33], population movement [41], personal and communal situations [17], or comprehensive socioeconomic factors [42]. Such research generally presents macro and statistical views of rental house prices, without considering the spatial heterogeneity of the samples from a geographical perspective. Rental house pricing based on evaluations of fluctuating accuracy may misguide lessors and lessees in their rental transactions, as well as the government’s policymaking and administration of public rental housing. In China, for example, public rental residences do not sell well and are not well received in some areas [20]. It has been revealed that the lack of information and the transaction process are important factors for the emotion of regret in people’s rental house transitions [43]. A reliable estimation of rental house prices may provide people with more dependable information and more convenient transactions. Therefore, it is both worthwhile and necessary to explore and develop a more effective and dependable rental house-pricing model.

It is clear from current studies that, to improve the accuracy of the rental house-pricing model, the proposed model should capture both nonlinearity and spatial heterogeneity. Capturing both is the main target of this study. In addition, to improve the accuracy of the model, we present new kinds of locational and neighborhood variables that can cover more geographic objects and reflect the locational characteristics of a house from a multiscale and comprehensive geographical perspective.

3. Data

3.1. Study Areas

The study areas include some major cities in China. Considering data availability, the rental houses in four cities are included in this study: Wuhan, Nanjing, Beijing, and Xi’an (Figure 1).

Wuhan is a megacity in Central China.
It is an important industrial, science, and education base and a comprehensive transportation center. Wuhan (29°58′–31°22′ N, 113°41′–115°05′ E) has 13 municipal districts with a total area of 8569.15 km², and in 2020, it had a resident population of 12.448 million people.

Nanjing is an important megacity in Southeast China. It is a science and education base, and a comprehensive transportation center. Nanjing (31°14′–32°37′ N, 118°22′–119°14′ E) has 11 municipal districts with a total area of 6587.02 km², and in 2020, it had a resident population of 9.320 million people.

Beijing is the capital city of China and the largest city in Northern China. Beijing (39°24′–41°36′ N, 115°42′–117°24′ E) has 16 municipal districts with a total area of 16,410.54 km², and in 2020, it had a resident population of 21.890 million people.

Xi’an is a megacity in Northwest China. It is an important industrial, science, and education base of China. Xi’an (33°25′–34°27′ N, 107°24′–109°29′ E) has 13 municipal districts with a total area of 10,752 km², and in 2020, it had a resident population of 12.953 million people.

Figure 1. The four cities constituting the study areas: Wuhan, Nanjing, Beijing, and Xi’an.

The experiments involving these four different cities are intended to test the applicability and generality of our method for different cities. These cities are located in different zones of China (Wuhan: Central China; Nanjing: Southeast China; Beijing: Northern China; Xi’an: Northwest China) and may represent the geographical locational and sociocultural diversity of China as a vast country [44]. There are rising numbers of floating populations in these cities, which means that they have considerable demands for rental housing and flourishing rental housing markets. Thus, we chose them as study areas.

3.2. POI Data

The POI data in this study were captured from the Baidu Map website, which is the largest map service provider in China. The POIs of Baidu Map are classified into 21 primary categories [45], as depicted in Table 1. Administrative landmarks and addresses are not considered in our study because they are map features rather than entities. Therefore, 19 primary categories and 134 secondary categories are taken into account.
We collected the POIs of Baidu Map in February to March 2020, and finally, more than 1.7 million POI data were obtained for the four cities.

Table 1. The POI data and categories of Baidu Map.

| Primary Category | Secondary Category |
|---|---|
| Food | Chinese restaurants, foreign restaurants, snack shops, cake and dessert shops, coffee shops, tea shops, bars, etc. |
| Hotel | Star hotels, fast hotels, apartment hotels, etc. |
| Shopping | Shopping malls, supermarkets, convenience stores, household building materials, digital appliances, shops, markets, etc. |
| Life and services | Communication business halls, post offices, logistics companies, ticket offices, laundromats, photo shops, real estate intermediaries, public utilities, maintenance points, housekeeping services, funeral services, lottery sales points, pet services, newspaper booths, public toilets, etc. |
| Beauty | Beauty, hairdressing, manicure, body beautification, etc. |
| Scenic spot | Parks, zoos, botanical gardens, amusement parks, museums, aquariums, beach baths, cultural relics, churches, scenic spots, etc. |
| Recreation and entertainment | Holiday villages, farmhouses, cinemas, KTV, theatres, song and dance halls, internet cafes, playgrounds, bath massages, leisure squares, etc. |
| Sports fitness | Stadiums, extreme sports venues, fitness centers, etc. |
| Education and training | Colleges and universities, secondary schools, primary schools, kindergartens, adult education, parent–child education, special education schools, scientific research institutions, training institutions, libraries, science and technology museums, etc. |
| Cultural media | Press and publishing, radio and television, art groups, galleries, exhibitions, cultural palaces, etc. |
| Medical care | General hospitals, specialized hospitals, clinics, pharmacies, medical institutions, sanatoriums, emergency centers, CDC, etc. |
| Automobile service | Automobile sales, automobile maintenance, automobile detailing, automobile parts, car rental, automobile testing grounds, etc. |
| Transport facilities | Railway stations, long-distance bus stations, ports, parking lots, gas stations, service areas, toll stations, bridges, etc. |
| Finance | Banks, ATMs, credit cooperatives, investment and financing, pawnbrokers, etc. |
| Real estate | Office buildings, residential areas, dormitories, etc. |
| Company and business | Companies, parks, agriculture, forestry, horticulture, factories and mines, etc. |
| Government organization | Governments of all levels, administrative units, public prosecution and law institutions, foreign-related institutions, party groups, welfare institutions, political and educational institutions, etc. |
| Entrance and exit | Expressway exits, expressway entrances, airport exits, airport entrances, station exits, station entrances, parking lot entrances, bicycle high-speed exits, bicycle high-speed entrances, etc. |
| Natural feature | Islands, mountains, water systems, etc. |
| Administrative landmark | Provinces, provincial cities, prefecture-level cities, districts, counties, business districts, towns, villages, etc. |
| Address | Address points, etc. |

3.3. Rental House Data

The rental house data in the study are from the commonly used real estate website, Lianjia [46]. Recent studies have demonstrated that the data on this website are effective for housing price analysis [47,48]. All of the rental house samples were captured from Lianjia, and data collection occurred between February and July 2020. The attributes of the houses (the rental price, the area, the information of the community, etc.) can be obtained via data collected from the website.
Owing to data accessibility, only whole-unit rental houses were considered in this research. The data were screened for houses with civil electricity and water, and extreme values were excluded. To control the residential density of the houses, for each city, the samples whose community plot ratios were in the top 5% were eliminated, since residential density is negatively related to rental prices and a dense environment is more likely to result in congestion and the invasion of privacy. Approximately 239,000 rental samples were obtained (Wuhan: 54,466, Nanjing: 32,076, Beijing: 109,683, Xi’an: 43,289). The rental house price is the dependent variable in this paper, and the rental house prices of the samples in this research range from 6.0 to 148.8 RMB/m²/month (calculated as the unit price). The descriptions of the rental prices and relevant attributes are shown in Table 2, and the statistics of these data are depicted in Table A1 of Appendix A.

Table 2. The description of the rental house price and relevant attributes.

| Type | Variable | Explanation and measurement method | Expected sign |
|---|---|---|---|
| Structural variables | Area | Square meters of the living area (m²) | - |
| Structural variables | TotalFloor | The total number of floors of the building | Unknown |
| Structural variables | Level | The level of the floor on which the unit is situated (1: “low level”, <1/3 of the total floor value of the building; 2: “middle level”, between 1/3 and 2/3 of the total floor value; 3: “high level”, >2/3 of the total floor value) | Unknown |
| Structural variables | Age | The age of the building (calculated by subtracting the construction month from the transaction month and dividing by 12, so it is in years) | - |
| Structural variables | Age-squared | The squared value of “Age” [49] | + |
| Structural variables | Month | The (monthly) time trend variable. The earliest rental date in the dataset of this paper is February 2020. Thus, the observation of this sample takes the value of 1 for the “Month” variable; then, it is 2 for the houses rented in March 2020, etc. | Unknown |
| Structural variables | Spring | A dummy variable for whether the house is rented in the first quarter of the year (1: the transaction date is January, February, or March; 0: otherwise) | Unknown |
| Structural variables | Summer | A dummy variable for whether the house is rented in the second quarter of the year (1: the transaction date is April, May, or June; 0: otherwise) | Unknown |
| Structural variables | Autumn | A dummy variable for whether the house is rented in the third quarter of the year (1: the transaction date is July, August, or September; 0: otherwise) | Unknown |
| Structural variables | Room | The number of bedrooms | + |
| Structural variables | Hall | The number of halls | - |
| Structural variables | Toilet | The number of toilets | Unknown |
| Structural variables | South | Whether the room faces south (1: when the direction description of the house contains “south”; 0: otherwise) | + |
| Structural variables | North | Whether the room faces north (1: when the direction description of the house contains “north”; 0: otherwise) | + |
| Structural variables | East | Whether the room faces east (1: when the direction description of the house contains “east”; 0: otherwise) | + |
| Structural variables | West | Whether the room faces west (1: when the direction description of the house contains “west”; 0: otherwise) | - |
| Structural variables | PlotRatio | The gross floor area of the building divided by the area of the community | Unknown |
| Structural variables | Green | The greening rate of the community | + |
| Structural variables | ParkSpace | The number of parking spaces of the community | + |
| Structural variables | Fee | The property management fee of the house (RMB/m²/month) | + |
| Locational variables | DCBD | The distance to the CBD (km) | - |
| Locational variables | Dbus | The distance to the nearest bus station (km) | - |
| Locational variables | Dsub | The distance to the nearest metro station (km) | - |
| Locational variables | DshopCen | The distance to the nearest shopping center (km) | - |
| Neighborhood variables | Dpark | The distance to the nearest park (km) | - |
| Neighborhood variables | DpriSch | The distance to the nearest primary school (km) | - |
| Neighborhood variables | DsecSch | The distance to the nearest middle school (km) | - |
| Neighborhood variables | Dnurs | The distance to the nearest nursery school (km) | - |
| Neighborhood variables | Dhosp | The distance to the nearest hospital (km) | - |
| Dependent variable | Price | The rental price of the house (RMB/m²/month) | |

In this research, all of the geographic information was transformed into the Baidu metric coordinate system since the POI data of this research are collected from Baidu Map.

4. Method

4.1. Principal Housing Price Models

4.1.1. HPM

The HPM is a principal pricing model and is the basis of other housing price models [6]. It is introduced and used as a control group in this study. The HPM treats a house or rental house as a commodity whose price is made up of many different factors. The factors of rental houses are usually considered to follow the factors of the selling houses [10,19], including 3 types: structural variables, locational variables, and neighborhood variables.
Based on the previous studies [4,25,47] and the situation of our available data, the factors considered in this research are listed as follows: the structural variables include the area of the house (Area), the number of total floors (TotalFloor), floor level of the house (Level), number of rooms (Room), halls (Hall), toilets (Toilet), plot ratio of the community (PlotRatio), greening rate of the community (Green), number of parking spaces in the community (ParkSpace), the property management fee (Fee), the age of the building (Age) and its squared value (Age-squared) [49], the monthly trend variable (Month), the seasonality dummy variables (Spring, Summer and Autumn), and house orientation dummy variables (South, North, East, West); locational variables include the distance to the CBD (DCBD) and distances to the nearest subway station (Dsub), bus stop (Dbus), and shopping center (DshopCen); neighborhood variables include the distance to the nearest park (Dpark), hospital (Dhosp), middle school (DsecSch), primary school (DpriSch), and nursery (Dnurs). Since studies have shown that the road network distance can provide additional and useful insights into the housing price dataset and improve the accuracy of relevant models [16,50], the distances correlated with POIs in this study are all measured as the road network distance. The definitions, statistical values, and expected effects on rental price are listed in Table 2.

The basic model of the HPM can be $Y = f(X, \beta, \varepsilon)$, where $Y$ is the rental price, $X$ is the characteristic vector consisting of each variable of the house, $\beta$ is the coefficient in front of each factor, and $\varepsilon$ is the residual term. Practically, the model is generally implemented by means of MLR, as follows:

$$y = \beta_0 + \sum_{j=1}^{m} \beta_j x_j + \varepsilon \tag{1}$$

where $\beta_j$ represents the MLR parameter for the $j$th explanatory variable $x_j$, and $m$ is the number of explanatory variables.

4.1.2. GWR

The HPM is essentially a global ordinary least squares (OLS) model.
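As a minimal illustration of how Equation (1) can be estimated by OLS, the sketch below solves the normal equations $(X^T X)\beta = X^T y$ in plain Python. It is not the authors' implementation: the function names `solve_linear_system` and `fit_hedonic_ols`, the choice of two example variables (Area and Dsub), and the toy coefficients are all assumptions made for the example.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear variables?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x


def fit_hedonic_ols(features, prices):
    """Estimate [beta_0, beta_1, ..., beta_m] of Equation (1) by least squares."""
    # Prepend a constant 1 to every sample so that beta_0 acts as the intercept.
    design = [[1.0] + list(row) for row in features]
    p = len(design[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, prices)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Toy data (hypothetical): columns are Area (m^2) and Dsub (km).
toy_features = [[50.0, 0.5], [80.0, 1.0], [60.0, 2.0], [100.0, 0.2], [70.0, 1.5]]
# Noise-free prices built from assumed coefficients 60, -0.2 and -5.
toy_prices = [60.0 - 0.2 * area - 5.0 * dsub for area, dsub in toy_features]
beta = fit_hedonic_ols(toy_features, toy_prices)
print([round(value, 6) for value in beta])
```

Because the toy prices contain no noise, the fitted coefficients should match the assumed ones up to floating-point error; with real data the residual term $\varepsilon$ absorbs what the linear form cannot explain.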
Although the model can explain the affecting factors, the fitting accuracy is often insufficient. This insufficiency is because a house is not only correlated with the various factors in the HPM, but also closely related to the geographical location: the rental price of a house will be affected by the characteristics and prices of the neighboring houses. The spatial heterogeneity and regional diversity may cause a discrepancy in the parameter $\beta$ of the factors. Thus, the rental price is more suitable for a local regression model. Fotheringham introduced the GWR model [5], which is a geographical extension of OLS. The attribute coefficients of GWR can be viewed as a semi-logarithmic function of the change of the explanatory variable [24]. GWR considers geographical heterogeneity and allows the variations of local parameters, which is formulated as:

$$y_i = \beta_{i0}(u_i, v_i) + \sum_{k=1}^{m} \beta_{ik}(u_i, v_i)\, x_{ik} + \varepsilon_i, \quad i = 1, 2, \ldots, n \tag{2}$$

where $(u_i, v_i)$ represents the coordinates of sample $i$, $\beta_{ik}(u_i, v_i)$ represents the local parameter of the $k$th variable of the sample $i$, which varies for different locations, $\beta_{i0}(u_i, v_i)$ is the intercept value, and $\varepsilon_i$ is the error term. The GWR approach is superior for its ability to reveal the spatial heterogeneity. $\beta_k(u_i, v_i)$ can be estimated as follows:

$$\hat{\beta}(u_i, v_i) = \left(X^T W(u_i, v_i) X\right)^{-1} X^T W(u_i, v_i)\, Y \tag{3}$$

where the weighting matrix $W$ is a diagonal matrix and the off-diagonal elements are all zero, $W(u_i, v_i) = \mathrm{diag}(W_{i1}, W_{i2}, \ldots, W_{ij}, \ldots, W_{in})$. As declared above, the locations of all rental houses have been transformed into the Baidu metric coordinate system. The geographical weight of the sample $i$ and sample $j$ is represented by $W_{ij}$. In this study, we obtain the weighting matrix with the fixed Gaussian kernel function:

$$W_{ij} = \exp\left(-d_{ij}^2 / b^2\right) \tag{4}$$

where $d_{ij}$ is the distance between houses $i$ and $j$, and $b$ is a non-negative parameter (bandwidth) that represents the decay degree with the distance. The bandwidth ($b$) is a very important parameter, and the appropriate bandwidth can be selected based on the minimum corrected Akaike information criterion (AICc) for the GWR model [51]. For data with some geographical correlations, the fitting accuracy of the GWR model is substantially greater than that of the global regression method (HPM) because spatial heterogeneity and locational discrepancy are taken into account.

4.1.3. FCNN: A Deep Learning Model

Deep learning commendably supports big data because of its powerful generalization and automation capability and has been widely used in recent years [13]. In this study, we designed a 5-layer fully connected neural network (FCNN) for house rental price and its factors. As shown in Figure 2, the input layer is the vector of the factors, including structural variables, locational variables, and neighborhood variables; there are 4 hidden layers, and the numbers of neurons for them are 200, 120, 100, and 20, respectively; the output layer is the predicted value of the rental house price. Deep learning can address the nonlinear and complex relationships [3] implied by the variables, which is crucial for the fitting of housing prices.
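The layer layout described above can be made concrete with a short forward-pass sketch in plain Python. It only illustrates the stated architecture (an input vector, hidden layers of 200, 120, 100, and 20 neurons with ReLU activations, and a single output) and is not the authors' code: the helper names `init_layers` and `forward`, the uniform weight initializer, the zero biases, and the linear output neuron are assumptions. The input size of 29 is the count of explanatory variables listed in Table 2.

```python
import random


def relu(values):
    """Element-wise ReLU activation."""
    return [v if v > 0.0 else 0.0 for v in values]


def init_layers(n_inputs, hidden_sizes=(200, 120, 100, 20), seed=0):
    """Create (weights, biases) per layer; the last layer has one output neuron."""
    rng = random.Random(seed)
    sizes = [n_inputs, *hidden_sizes, 1]
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        # Small uniform initialization: an assumption, not taken from the paper.
        bound = (1.0 / fan_in) ** 0.5
        weights = [[rng.uniform(-bound, bound) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        biases = [0.0] * fan_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Propagate one normalized feature vector through the network."""
    activation = list(x)
    last = len(layers) - 1
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        # ReLU on the hidden layers; the output neuron is kept linear (assumption).
        activation = z if index == last else relu(z)
    return activation[0]


n_features = 29  # explanatory variables in Table 2
layers = init_layers(n_features)
sample = [0.5] * n_features  # hypothetical house with all variables scaled to 0.5
prediction = forward(layers, sample)
print(type(prediction).__name__, [len(w) for w, _ in layers])
```

The untrained network returns an arbitrary number; the point of the sketch is only the shape of the computation that training would then tune.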
The deep-learning algorithm in this study is configured as follows: the learning aim is to minimize the sum of the squared residuals between the predicted and actual values; the activation function is ReLU [52]; the back propagation algorithm is the gradient descent algorithm [53]; for each step, the number of training samples (batch size) is 64; the initial learning rate, the decay of the learning rate, and the decay of the moving average are set to 0.8, 0.99, and 0.99, respectively; and L2 regularization [54] is included in the network to reduce overfitting. The loss function of the FCNN can be formulated as:

$$\mathrm{loss} = \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2 + 0.0001 \sum_{j=1}^{l} \lVert w_j \rVert^2 \tag{5}$$

where $\hat{y}_i$ represents the predicted value, $y_i$ represents the true value, $l$ is the number of hidden layers, and $w_j$ is the neuron parameters in the $j$th hidden layer. For deep-learning methods, all of the data should be divided into 2 parts: the training set and the test set. The training process is carried out on the training set to minimize the value of “loss”. For every sample in deep learning, the values of each variable have been normalized to 0~1 to avoid divergence of the model. After the training is completed, the model is run on the test set to estimate the model’s fitting accuracy and predictive power for unknown samples.

Figure 2. The structure of the layers and neurons in the FCNN model.

4.2. FCNN–GWR: The Combination of Deep Learning and GWR

GWR cannot characterize the nonlinear complex characteristics of price, and existing deep-learning methods cannot explicitly process the spatial heterogeneity. Therefore, we propose the FCNN–GWR model, combining deep learning with GWR, which can handle both aspects of the problem. The general idea of the FCNN–GWR model is that the FCNN model can provide an acceptable prediction value for house rental price through deep learning, and the implementation of GWR on this value can optimize it. As the $\beta$ parameters of GWR contain the spatial heterogeneity and spatial discrepancy of the house rental price, including them in the deep-learning model may explicitly help to optimize the fitting value. We can compose a matrix $M$, which combines the GWR $\beta$ parameters with the structural, locational, and neighborhood variables, and then deep learning can be carried out on the matrix $M$, which may obtain more accurate prediction results. The matrix is $M = [\beta_0, \beta_1, \beta_2, \ldots, \beta_m, x_1, x_2, \ldots, x_m]$, where $\beta_0, \beta_1, \ldots$ represent the $\beta$ parameters in the GWR model as in Equation (2), and $x_1, x_2, \ldots$ are the structural, locational, and neighborhood variables, the same as in Equation (1).

As has already been verified, GWR has a clear disadvantage in out-of-sample forecasts, which means that the prediction value of GWR may not be sufficiently reliable when there are not enough samples near the point concerned. Therefore, we only adopt the GWR predictions when there is a relatively large number of samples nearby; when there are fewer samples nearby, just the previous FCNN prediction values are adopted as the final prediction value, which means:

$$\hat{y}_i = \begin{cases} \mathrm{FCNN}(\beta_0, \beta_1, \beta_2, \ldots, \beta_m, x_1, x_2, \ldots, x_m), & \text{Condition 1} \\ \mathrm{FCNN}(x_1, x_2, \ldots, x_m), & \text{Condition 2} \end{cases} \tag{6}$$

where for the $i$th house, $\hat{y}_i$ denotes its prediction value; other variables are the same as in Equations (1) and (2). For a certain house, Condition 1 means that the number of its nearby house samples (within the distance of the bandwidth of GWR) is larger than the average level among all houses ($= \sum_{i=1}^{n} \mathit{nears}_i / n$, where $\mathit{nears}_i$ represents the number of neighboring samples within the GWR bandwidth for the $i$th house, and $n$ represents the total number of houses in the dataset); Condition 2 means that the number of nearby (within the distance of bandwidth) houses around it is smaller than the average level among all house samples. Since the bandwidth is a decisive parameter in GWR, and only samples within the distance of the bandwidth of the GWR play a relatively important role in the calculation, we divided the quantities of nearby samples into the 2 conditions by the number of samples within the distance of the bandwidth. When the quantity of nearby samples is smaller than the average, the prediction by GWR may not be sufficiently credible. In these cases, FCNN is more reliable, while geographic weighting may conversely reduce precision. The negative accuracy increments of the GWELM in Table 1 of Deng et al. [31] may be attributed to this phenomenon.

Recent studies have proved that the attention mechanism can be effective for the neural networks of the housing price [1,2,9,55]. In our research, the house variables $[x_1, x_2, \ldots, x_m]$ and the GWR $\beta$ parameters $[\beta_0, \beta_1, \beta_2, \ldots, \beta_m]$ were each assigned an attention block [1] in front of the first fully connected layer.
The attention block can convert the original input characteristics into attended characteristics, in order to identify the important features that influence the rental prices. The attention block can be described as a Softmax-activated fully connected layer, and the algorithm is:

$$y_k = F(h_k)\, x_k = \frac{\exp(h_k)}{\sum_{j} \exp(h_j)}\, x_k, \qquad h_k = \sum_{i} w_{ki} x_i + b_k$$

where $F(\cdot)$ is the Softmax function, $x$ is the input features, $y$ is the output features, $h$ is the neurons of this fully connected layer, $w$ is the weights of the input $x$, and $b$ is the bias. $F(h)$ is the soft attention-weighted vector, which can signify the importance of the features of the house variables and GWR $\beta$ parameters. The variation among attended features $y$ would be substantially larger than the variance among the original features $x$ as a result of the attention block, suggesting that the important characteristics for the house rental price are emphasized in the network, and it would benefit the convergence and performance of the model.
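A minimal sketch of the attention block described above follows. It computes $h = Wx + b$, turns $h$ into Softmax scores, and multiplies each input feature by its score. The function names, the example weights, and the subtraction of the maximum inside `softmax` (a standard numerical-stability step that does not change the result) are assumptions; the weight matrix is taken to be square so that there is one score per input feature.

```python
import math


def softmax(values):
    """Softmax with the maximum subtracted first for numerical stability."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_block(x, weights, biases):
    """Return (attended features, scores) with y_k = softmax(W x + b)_k * x_k."""
    h = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    scores = softmax(h)
    attended = [s * xi for s, xi in zip(scores, x)]
    return attended, scores


# Hypothetical normalized features and layer parameters for three inputs.
x = [0.2, 0.9, 0.5]
weights = [[1.0, 0.0, 0.0],
           [0.0, 2.0, 0.0],
           [0.0, 0.0, 1.0]]
biases = [0.0, 0.0, 0.0]
attended, scores = attention_block(x, weights, biases)
print([round(s, 3) for s in scores])
```

The scores sum to one, so a feature that receives a larger $h_k$ keeps more of its value relative to the others, which is the emphasis effect the paragraph above describes.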
In the FCNN–GWR model, the data should be divided into the training set and the test set. The process of FCNN–GWR is shown in Figure 3 and can be described as follows:

Step 1: Train the FCNN model on the training set with the structural, locational, and neighborhood variables of the house.

Step 2: Execute the GWR model on the training set with the structural, locational, and neighborhood variables. Then, the $\beta$ parameters of GWR can be calculated for each house via GWR fitting.

Step 3: Put the GWR $\beta$ parameters, and the structural, locational, and neighborhood variables together to make up the matrix $M$. Through a deep-learning training with the matrix $M$ wrapped with the attention blocks, the FCNN model including the GWR $\beta$ parameters (and structural, locational, and neighborhood variables) can be obtained.

Step 4: Predict the price value on the test set with the FCNN model including the GWR $\beta$ parameters (obtained in Step 3). The prediction value is referenced as $f_1$.

Step 5: Predict the price value on the test set with the ordinary FCNN model (obtained in Step 1, only with the structural, locational, and neighborhood variables, without the GWR $\beta$ parameters). The prediction value is referenced as $f_0$.

Step 6: On the test set, the final predicted results of FCNN–GWR are obtained according to Equation (6): if there are relatively more samples nearby (Condition 1), the final prediction value would be $f_1$; if there are relatively fewer samples nearby (Condition 2), the final prediction value would be $f_0$.

Figure 3. The procedure of the FCNN–GWR model.

Through this method of synthetic training, the FCNN–GWR model not only has the ability to explain the nonlinear complexity of the price but also addresses the spatial heterogeneity explicitly since the method considers the influence of surrounding rental houses. In this paper, FCNN–GWR and other models were used and compared in the study areas to demonstrate the superiority of the proposed model.

4.3.
Quantity-Based Locational and Neighborhood Variables In traditional housing price models, the locational and neighborhood variables include DCBD, Dpark, and so on (Table 2). These factors can reflect the location of the house, but a limited number of variables are allowed in order to avoid the multicollinear problem [56]. In this way, although main factors of the price can be effectively explained, problems still exist. Firstly, these locational and neighborhood variables are distance-based, while expressing the location of houses with the distance may be somewhat inaccurate, which leads to the loss of precision in the house-pricing model. Seo et al. [16], Li et al. [29], and Bency et al. [10] have given the evidence. As shown in Figure 4, there are many POIs typed “school” in this area. When calculating the locational and neighborhood variables of the houses in this area, only the information of the blue points is actually used, which are the “nearest” school POIs to the houses; the information of the neighboring yellow points is not included, just because they are not the “nearest” ones to the houses. In other words, the locational information formed by these yellow points is discarded rather than exploited, which may influence the accuracy of the price model. Secondly, some variables are excluded from the model since they are similar to other variables, the model may lose a certain amount of information. These variables can also contribute to the housing price to a certain extent. To solve this, we propose another method to measure the locational characteristics of the house: the quantity-based locational and neighborhood variables. In our perspective, the number and the combination of the various kinds of POIs surrounding a house can better reflect its locational characteristics. For example, Figure 5a is the place near a gate of a school, with very dense POIs around it. 
It is not reasonable to consider only the selected “nearest” POI, since other neighboring POIs also contribute to the locational characteristics. To account for the influence of these other POIs, a better way is to calculate the number of every type of POI nearby. The number and the combination of every type of POI can better reflect the local locational characteristics of houses. For example, in Figure 5a there are 1301 commercial POIs, 68 traffic POIs, 20 stadium POIs, and 123 school POIs. The numbers of commercial and school POIs are very large, which implies that this place may be the intersection between the school and the commercial district. As another example, Figure 5b is a newly built venue in Wuhan. There are 104 business POIs, 53 transportation POIs, 6 stadium POIs, and no school POIs nearby. The number of subway, bus, parking, and other transportation facilities is very large, but the number of commercial and school POIs is very small, which reflects the character of this place as new infrastructure and a new venue. Therefore, the locational characteristics of a place can be reflected by the numbers and combination of nearby POIs, as in the examples above.

Figure 4. The used and not-used POIs in the models with distance-based locational and neighborhood variables.
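The idea of counting every type of POI near a house, rather than keeping only the nearest one, can be illustrated with a minimal sketch. The function name `count_pois_by_type`, the search radius, and the use of straight-line distance in projected metres are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter
from math import hypot


def count_pois_by_type(house_xy, pois, radius_m):
    """Count the POIs of each type lying within radius_m of a house.

    house_xy: (x, y) of the house in projected metres (assumed).
    pois: iterable of (poi_type, x, y) tuples (hypothetical layout).
    Straight-line distance is a simplifying assumption of this sketch.
    """
    hx, hy = house_xy
    counts = Counter()
    for poi_type, x, y in pois:
        if hypot(x - hx, y - hy) <= radius_m:
            counts[poi_type] += 1
    return counts


# Toy data: two schools and one commercial POI nearby, one distant traffic POI.
pois = [
    ("school", 100.0, 0.0),
    ("school", 300.0, 400.0),
    ("commercial", 50.0, 50.0),
    ("traffic", 2000.0, 0.0),
]
counts = count_pois_by_type((0.0, 0.0), pois, radius_m=600.0)
```

In this toy case both schools are counted, whereas a nearest-distance variable would only record the school at 100 m.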
Figure 5. POIs in the neighboring area of: (a) the gate of a university; (b) a newly built venue.

The amount and distribution of different types of POIs near a house can be measured to express the quantity characteristics of POIs described above. Kernel density estimation (KDE) [48], a robust analytical tool in GIS for model discovery and for spatial statistical and spatiotemporal data mining, is a practical way to measure the number and density of the points near a certain place. In this research, KDE is adopted, and the estimated density values of KDE for the different types of POIs are used as “quantity-based variables” for expressing the characteristics of the rental houses, that is:

$$\eta(s_j) = \frac{1}{h^2} \sum_{k=1}^{N_j} K\!\left(\frac{\mathrm{dist}(s_j, x_{j,k})}{h}\right) \qquad (7)$$

where $\eta(s_j)$ is the estimated density value of the jth type of POIs for a house sample, $N_j$ is the total number of the jth type of POIs, $\mathrm{dist}(s_j, x_{j,k})$ is the distance between the location of the house and the location of the kth POI in the jth type of POIs, $K(\cdot)$ is the penalty function (also called the kernel function in KDE), and $h$ is the bandwidth of the kernel function, which represents the smoothing effect of the kernel function. If we put the estimated density values for all types of POIs, $[\eta(s_1), \eta(s_2), \ldots, \eta(s_j), \ldots]$, together, they can represent comprehensive locational and neighborhood characteristics for a rental house. The combination $[\eta(s_1), \eta(s_2), \ldots, \eta(s_j), \ldots, \eta(s_N)]$ is labeled as the “quantity-based locational and neighborhood variables” in this paper.

From the formula above, we learn that the kernel function $K(\cdot)$ and the bandwidth $h$ are the 2 parameters that KDE requires. In this research, 4 types of commonly used kernel functions are tested to find a suitable kernel and bandwidth for the quantity-based locational and neighborhood variables, including the Triangular Kernel, the Gaussian Kernel, and the Laplacian Kernel. We put the structural variables and the “quantity-based locational and neighborhood variables” together to construct the vector F, which represents the overall factors of the rental house price:

$$F = [sv_1, sv_2, \ldots, sv_Q, \eta(s_1), \eta(s_2), \ldots, \eta(s_N)] \qquad (8)$$

where $sv$ represents a structural variable of the house (in Table 2), and Q and N represent the number of structural variables and the number of locational and neighborhood variables, respectively. The different types of kernel functions and bandwidths are tested and optimized to find the combination with which the factors F obtain the highest $R^2$ in the OLS model of the rental house price:

$$R^2 = \left( \frac{n \sum_{i=1}^{n} r_{i,o}\, r_{i,s} - \sum_{i=1}^{n} r_{i,o} \sum_{i=1}^{n} r_{i,s}}{\sqrt{n \sum_{i=1}^{n} r_{i,o}^2 - \left(\sum_{i=1}^{n} r_{i,o}\right)^2}\; \sqrt{n \sum_{i=1}^{n} r_{i,s}^2 - \left(\sum_{i=1}^{n} r_{i,s}\right)^2}} \right)^2$$

where $r_{i,o}$ and $r_{i,s}$ are the observed (actual) and simulated (calculated by the model) rental prices (unit: RMB/m²/month) for the ith house, and n is the number of rental house samples in each dataset. By testing, we find that the Gaussian Kernel performs the best for all 4 cities, and it is chosen as the KDE kernel for generating the quantity-based locational and neighborhood variables in this study. After optimization, the KDE bandwidth h of the 4 cities is determined as 12,657.4 m, 18,495.5 m, 11,549.4 m, and 14,386.9 m for Wuhan, Beijing, Nanjing, and Xi’an, respectively.
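As a rough illustration of Equation (7), the following sketch computes one density value per POI type for a single house. The exact form of the Gaussian kernel is not given in the text, so the 2-D standard Gaussian used here is an assumption, as are the function names, the dictionary layout of the POIs, and the use of straight-line distance.

```python
from math import exp, hypot, pi


def gaussian_kernel(u):
    """2-D standard Gaussian kernel (assumed form; the text does not spell out K)."""
    return exp(-0.5 * u * u) / (2.0 * pi)


def kde_density(house_xy, poi_xys, h, kernel=gaussian_kernel):
    """Equation (7): (1 / h^2) * sum over k of K(dist(s_j, x_jk) / h), for one POI type."""
    hx, hy = house_xy
    return sum(kernel(hypot(x - hx, y - hy) / h) for x, y in poi_xys) / (h * h)


def quantity_based_variables(house_xy, pois_by_type, h):
    """Return one density value per POI type, in sorted type order."""
    return [kde_density(house_xy, pois_by_type[t], h) for t in sorted(pois_by_type)]


# Hypothetical POIs (projected metres); h is the bandwidth reported for Wuhan.
pois_by_type = {
    "school": [(0.0, 0.0), (500.0, 0.0)],
    "traffic": [(20000.0, 0.0)],
    "stadium": [],
}
features = quantity_based_variables((0.0, 0.0), pois_by_type, h=12657.4)
```

A POI type with no points contributes a density of zero, so every house receives a feature vector of the same length regardless of which POI types happen to be nearby.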
The quantity-based locational and neighborhood variables are more comprehensive than the traditionally used locational and neighborhood variables in Table 2 and better reflect the multiscale and comprehensive geographical characteristics of the location. For comparison with the “quantity-based” variables, the distances to the 134 types of POIs can also serve as locational and neighborhood variables; they are introduced and labeled as “distance-based locational and neighborhood variables” in this paper. In contrast to the distance-based and quantity-based variables, the “traditional locational and neighborhood variables” refer to the locational and neighborhood variables in Table 2, which include 9 frequently used variables. It should be noted that the traditional, distance-based, and quantity-based locational and neighborhood variables all depend on the distance to the POIs. The distances to POIs in this study are measured as road network distances, which can be obtained through GIS network analysis. Each kind of locational and neighborhood variable can then be generated: the traditional variables and distance-based variables are generated from the nearest distance to a certain kind of POI, and the quantity-based variables are generated with the KDE kernel functions and bandwidths.

The “distance-based” and “quantity-based” locational and neighborhood variables contain numerous complex, similar, and multicollinear factors, which is the situation where deep-learning methods perform well. Hundreds of locational variables provide a sufficient number of vectors for learning and make the models more accurate. However, it should also be noted that multicollinearity inevitably exists among such a large number of variables. Clearly, if these similar and multicollinear factors are employed in the HPM and GWR models, we can no longer evaluate their impacts on the price through the model. The main role of the quantity-based variables is to improve the fitting accuracy. That is, the parameter β in the HPM and β(u_i, v_i) in the GWR are no longer of economic significance when the distance-based and quantity-based locational and neighborhood variables are employed, but the fitting accuracy of the model is greatly improved. Therefore, the meaning of the parameters of the variables (the impact on the price) will not be discussed in this paper. In addition, when calculating β(u_i, v_i) in Equation (3) of the GWR model, the inverse matrix should be replaced by its pseudoinverse in case no inverse matrix exists.

In this study, the traditional, distance-based, and quantity-based locational and neighborhood variables are each employed in the HPM, GWR, FCNN, and FCNN–GWR models. For the distance-based and quantity-based locational and neighborhood variables, the meaning of the parameters in each model is not discussed, since it carries no interpretable significance; only the fitting accuracy and predictive ability are discussed.

4.4. Accuracy Assessment of the Models

The basic rental price models in this study include 4 types: HPM, GWR, FCNN, and the proposed FCNN–GWR model. The locational and neighborhood variables of the house include 3 kinds: traditional variables, distance-based variables, and quantity-based variables. For these 4 basic models, all 3 kinds of locational and neighborhood variables are employed in turn, and the fitting results are compared. The corresponding experimental groups are labeled as traditional HPM, distance-based HPM, quantity-based HPM, traditional GWR, distance-based GWR, quantity-based GWR, traditional FCNN, distance-based FCNN, quantity-based FCNN, traditional FCNN–GWR, distance-based FCNN–GWR, and quantity-based FCNN–GWR.
For the traditional HPM, GWR, FCNN, and FCNN–GWR experiments, the explanatory variables are 20 structural variables and 9 traditional locational and neighborhood variables with no multicollinearity. For the distance-based HPM, GWR, FCNN, and FCNN–GWR experiments, the explanatory variables are the structural variables and 134 distance-based locational and neighborhood variables. For the quantity-based HPM, GWR, FCNN, and FCNN–GWR experiments, the explanatory variables are the structural variables and 134 quantity-based locational and neighborhood variables.

In all of the experiments, the data are divided into a 70% training set and a 30% test set. First, each model is fitted or trained on the training set; then, the model is executed on the test set to assess the fitting accuracy and the predictive power for unknown samples. To enhance the reliability of the experiment, a shuffle-and-split cross-validation is carried out: the training set and test set are shuffled 4 times, and the results are averaged to reduce the chance that they are driven by a single unrepresentative partition.
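The shuffle-and-split protocol described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the random seed, and the `fit`/`predict`/`score` callables are hypothetical, and any model and accuracy indicator could be plugged in.

```python
import random
from statistics import mean


def shuffle_split(n_samples, test_fraction=0.3, n_splits=4, seed=0):
    """Yield (train_indices, test_indices) for repeated random 70/30 partitions."""
    rng = random.Random(seed)
    n_test = round(n_samples * test_fraction)
    for _ in range(n_splits):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield indices[n_test:], indices[:n_test]


def averaged_score(X, y, fit, predict, score, **split_kwargs):
    """Fit on each training partition, score on its test partition, and average."""
    scores = []
    for train, test in shuffle_split(len(y), **split_kwargs):
        model = fit([X[i] for i in train], [y[i] for i in train])
        predictions = predict(model, [X[i] for i in test])
        scores.append(score([y[i] for i in test], predictions))
    return mean(scores)


# Toy usage: a constant-mean "model" scored by mean absolute error.
X = [[float(i)] for i in range(10)]
y = [5.0] * 10
result = averaged_score(
    X,
    y,
    fit=lambda X_train, y_train: mean(y_train),
    predict=lambda model, X_test: [model] * len(X_test),
    score=lambda y_true, y_pred: mean(abs(a - b) for a, b in zip(y_true, y_pred)),
)
```

Each partition is drawn independently, so a sample may appear in the test set of more than one shuffle; this differs from k-fold cross-validation, where the test folds are disjoint.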
Several accuracy assessment indicators are calculated to appraise the performance of the above models, referring to existing studies [13,31] and commonly adopted indicators, including the Pearson’s correlation coefficient (Pearson R), the adjusted coefficient of determination (adj R²), the root mean square error (RMSE) and its percentage (%RMSE), and the mean absolute error (MAE) and its percentage (%MAE):

$$\text{Pearson } R = \frac{\sum_{i=1}^{n} (r_{i,o} - \bar{r}_o)(r_{i,s} - \bar{r}_s)}{\sqrt{\sum_{i=1}^{n} (r_{i,o} - \bar{r}_o)^2}\; \sqrt{\sum_{i=1}^{n} (r_{i,s} - \bar{r}_s)^2}}$$

$$R^2 = \left( \frac{n \sum_{i=1}^{n} r_{i,o}\, r_{i,s} - \sum_{i=1}^{n} r_{i,o} \sum_{i=1}^{n} r_{i,s}}{\sqrt{n \sum_{i=1}^{n} r_{i,o}^2 - \left(\sum_{i=1}^{n} r_{i,o}\right)^2}\; \sqrt{n \sum_{i=1}^{n} r_{i,s}^2 - \left(\sum_{i=1}^{n} r_{i,s}\right)^2}} \right)^2$$

$$\text{adj } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - m - 1}$$

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (r_{i,o} - r_{i,s})^2}$$

$$\%\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (r_{i,o} - r_{i,s})^2} \Big/ \bar{r}_o$$

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |r_{i,o} - r_{i,s}|$$

$$\%\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|r_{i,o} - r_{i,s}|}{r_{i,o}}$$

where $r_{i,o}$ and $r_{i,s}$ are the observed and simulated rental prices for the ith house (unit: RMB/m²/month), $\bar{r}_o$ and $\bar{r}_s$ are their means, n is the number of rental houses in each dataset, and m is the number of explanatory variables.

5. Results and Discussions

In this section, the above-mentioned models are applied and analyzed. Since a large amount of data is involved in this research, not all of it can be displayed within the limits of this paper. In some cases, the results of Wuhan are shown as the primary example, and the data of the other cities (Nanjing, Beijing, and Xi’an) are employed to strengthen and verify our models and results.

5.1. HPM Results

The HPM model was implemented with MLR, and the result is shown in Table 3. For the traditional HPM on the Wuhan dataset, the value of R² is 0.520, suggesting that 52.0% of the variance in the house rental price is explained by this model. The results of the other cities are similar: the value of R² is 0.509, 0.663, and 0.342 in Nanjing, Beijing, and Xi’an, respectively. The HPM can abstract and generalize the factors, and global patterns of the rental prices can be comprehended through it [24,25].
However, the accuracy of the traditional HPM is not high and cannot meet the requirements for estimating and forecasting the price. The results in Table 3 show that the R² values of the distance-based HPM and the quantity-based HPM reach 0.623 and 0.746, respectively, in Wuhan, which means they explain 62.3% and 74.6% of the variance in the rental price. The results of the other cities also show that the distance-based and quantity-based HPM fit clearly better than the traditional HPM. As the quantity-based variables reflect the characteristics of the house more comprehensively, with more geographic objects considered and from a comprehensive geographical perspective, they can greatly increase the accuracy of the MLR fitting of the rental price.

Table 3. Accuracy assessment results of HPM with different locational and neighborhood variables: the traditional, distance-based, and quantity-based.

| Indicator | Wuhan: Traditional HPM | Wuhan: Distance-Based HPM | Wuhan: Quantity-Based HPM | Nanjing: Traditional HPM | Nanjing: Distance-Based HPM | Nanjing: Quantity-Based HPM |
|---|---|---|---|---|---|---|
| Pearson R | 0.7187 | 0.7857 | 0.8601 | 0.6954 | 0.7505 | 0.7673 |
| adj R² | 0.5197 | 0.6225 | 0.7463 | 0.5088 | 0.5951 | 0.6216 |
| RMSE | 7.2032 | 6.3765 | 5.1945 | 11.4868 | 10.8076 | 10.4635 |
| %RMSE | 21.62% | 19.17% | 15.61% | 19.14% | 18.04% | 17.44% |
| MAE | 5.2871 | 4.6981 | 3.5077 | 8.507 | 7.8434 | 7.7032 |
| %MAE | 16.70% | 13.99% | 10.70% | 15.45% | 14.26% | 14.09% |
| t-test | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 |

| Indicator | Beijing: Traditional HPM | Beijing: Distance-Based HPM | Beijing: Quantity-Based HPM | Xi’an: Traditional HPM | Xi’an: Distance-Based HPM | Xi’an: Quantity-Based HPM |
|---|---|---|---|---|---|---|
| Pearson R | 0.7964 | 0.8266 | 0.8316 | 0.5878 | 0.7494 | 0.8024 |
| adj R² | 0.6633 | 0.7162 | 0.7249 | 0.3423 | 0.5583 | 0.6393 |
| RMSE | 19.2098 | 17.9573 | 17.7323 | 7.7096 | 6.4836 | 6.115 |
| %RMSE | 20.01% | 18.71% | 18.48% | 25.73% | 21.77% | 20.53% |
| MAE | 13.4011 | 12.2154 | 11.9717 | 5.0645 | 4.3013 | 4.0390 |
| %MAE | 15.78% | 14.10% | 13.75% | 17.57% | 14.73% | 13.91% |
| t-test | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 |

5.2. GWR Results

5.2.1. Traditional GWR

The results of GWR on the traditionally adopted variables are labeled as traditional GWR and are summarized in Table 4 (compared with traditional HPM). With this method, the R² values in the test sets are 0.756, 0.533, 0.709, and 0.615 for Wuhan, Nanjing, Beijing, and Xi’an, respectively, which are clearly larger than those of traditional HPM. The AICc values are smaller than those of traditional HPM (all of the values are averaged over the shuffled datasets). It can be seen from the results that traditional GWR has a much higher explanatory ability for the house rental price than traditional HPM. As mentioned before, this is because GWR takes into account the spatial heterogeneity in the rental house prices, which is not a negligible property for rental house price modeling.

Table 4. The results of traditional GWR.

| | Wuhan | Nanjing | Beijing | Xi’an |
|---|---|---|---|---|
| Bandwidth | 868.5 | 1042.7 | 960.2 | 823.9 |
| adj R² of traditional GWR (training set) | 0.8416 | 0.6306 | 0.7665 | 0.7062 |
| adj R² of traditional GWR (test set) | 0.7558 | 0.5328 | 0.7087 | 0.6148 |
| adj R² of traditional HPM (test set) | 0.5197 | 0.5088 | 0.6633 | 0.3423 |
| AICc of traditional GWR (test set) | 8.683 × 10⁴ | 2.594 × 10⁵ | 1.140 × 10⁶ | 1.295 × 10⁵ |
| AICc of traditional HPM (test set) | 2.119 × 10⁵ | 2.940 × 10⁵ | 1.273 × 10⁶ | 2.679 × 10⁵ |

5.2.2. Distance-Based GWR and Quantity-Based GWR

To cope with multicollinearity, when calculating β(u_i, v_i) in the test set of the distance-based and quantity-based GWR methods, the inverse matrix in Equation (3) should be replaced by the pseudoinverse. Then, GWR prediction is executed in the test set [57]. With the Wuhan dataset as an example, the performance of each method in the training set and test set can be seen in Table 5, which suggests that quantity-based GWR performs better than distance-based GWR. Therefore, the quantity-based locational and neighborhood variables are also effective in the GWR model.
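The role of the pseudoinverse can be illustrated with a small sketch of a local GWR fit. It assumes the standard GWR estimator β(u, v) = (XᵀWX)⁻¹XᵀWy with the inverse replaced by the Moore–Penrose pseudoinverse; the Gaussian distance-decay weights, the function name, and the `rcond` cutoff are assumptions of this sketch rather than details taken from the paper.

```python
import numpy as np


def gwr_local_coefficients(X, y, coords, target_xy, bandwidth, rcond=1e-10):
    """Estimate local GWR coefficients at target_xy using a pseudoinverse.

    X: (n, p) explanatory variables; y: (n,) prices; coords: (n, 2) locations.
    Gaussian distance-decay weights are an assumed weighting scheme.
    """
    distances = np.linalg.norm(coords - np.asarray(target_xy), axis=1)
    weights = np.exp(-0.5 * (distances / bandwidth) ** 2)
    X_design = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    XtW = X_design.T * weights  # same as X_design.T @ diag(weights)
    # pinv still returns a (minimum-norm) solution when XtW @ X_design is singular.
    return np.linalg.pinv(XtW @ X_design, rcond=rcond) @ (XtW @ y)


# Toy data with two perfectly collinear columns, so the ordinary inverse does not exist.
x = np.arange(10.0)
X = np.column_stack([x, x])
y = 2.0 + 3.0 * x
coords = np.column_stack([x, np.zeros(10)])
beta = gwr_local_coefficients(X, y, coords, target_xy=(4.0, 0.0), bandwidth=3.0)
prediction = beta[0] + (beta[1] + beta[2]) * 4.0
```

With collinear columns the individual coefficients are not uniquely determined, which mirrors the point that they lose their economic meaning, while the fitted value at the target location remains well defined.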
All GWR methods are superior to the HPM in accuracy because they account for spatial heterogeneity.

Table 5. The performances of GWR and FCNN in the training set and the test set of Wuhan.

| Indicator | Distance-Based GWR: Training Set | Distance-Based GWR: Test Set | Distance-Based GWR: Difference between Training and Test Sets | Distance-Based FCNN: Training Set | Distance-Based FCNN: Test Set | Distance-Based FCNN: Difference between Training and Test Sets |
|---|---|---|---|---|---|---|
| Pearson R | 0.9237 | 0.8883 | 0.0354 | 0.9211 | 0.9141 | 0.0070 |
| adj R² | 0.9072 | 0.8047 | 0.1025 | 0.8813 | 0.8682 | 0.0131 |
| RMSE | 3.092 | 4.5450 | 1.4530 | 3.3609 | 3.551 | 0.1901 |
| %RMSE | 9.25% | 13.61% | 4.35% | 10.06% | 10.63% | 0.57% |
| MAE | 2.2676 | 2.6683 | 0.4007 | 2.3721 | 2.4827 | 0.1106 |
| %MAE | 6.59% | 7.77% | 1.18% | 6.79% | 7.11% | 0.32% |

| Indicator | Quantity-Based GWR: Training Set | Quantity-Based GWR: Test Set | Quantity-Based GWR: Difference between Training and Test Sets | Quantity-Based FCNN: Training Set | Quantity-Based FCNN: Test Set | Quantity-Based FCNN: Difference between Training and Test Sets |
|---|---|---|---|---|---|---|
| Pearson R | 0.9498 | 0.8882 | 0.0616 | 0.9439 | 0.9344 | 0.0095 |
| adj R² | 0.9479 | 0.8379 | 0.1100 | 0.9104 | 0.8925 | 0.0179 |
| RMSE | 2.0271 | 4.5271 | 2.500 | 2.9762 | 3.2923 | 0.3161 |
| %RMSE | 6.09% | 13.56% | 7.47% | 8.92% | 9.86% | 0.95% |
| MAE | 1.4830 | 2.4363 | 0.9533 | 1.8410 | 2.0179 | 0.1769 |
| %MAE | 4.44% | 7.30% | 2.86% | 5.65% | 6.19% | 0.54% |

Moreover, from the left part of Table 5 it can be seen that the accuracy of the GWR models is good in the training set but decreases significantly in the test set for each indicator. For example, the value of R² decreased by 0.10 in distance-based GWR and by 0.11 in quantity-based GWR from the training set to the test set. As noted by Cajias et al. [7], the GWR model may have overfitting problems and might be at a disadvantage in out-of-sample forecasts. Given this possible overfitting defect of the GWR model, we tested the FCNN deep-learning method to improve the predictive power for the house rental price.

5.3. Combining the FCNN and GWR

5.3.1. FCNN Results

Similarly, the traditional, distance-based, and quantity-based locational and neighborhood variables are employed in the FCNN model, and the resulting methods are labeled as traditional FCNN, distance-based FCNN, and quantity-based FCNN, respectively. Taking Wuhan as an example, the traditional FCNN becomes stable after approximately 50,000 steps of training, the distance-based FCNN becomes stable after approximately 300,000 steps, and the quantity-based FCNN becomes stable after approximately 300,000 steps. The results in Table A2 (in Appendix A) show that the fitting and prediction accuracy of the quantity-based FCNN is higher than that of the traditional and the distance-based FCNN. Moreover, the quantity-based FCNN method is the best in terms of the accuracy indicators among the above HPM, GWR, and FCNN methods.

5.3.2. Comparison of GWR and FCNN

Overall, the GWR model is a linear-based model that reflects the influence of the neighboring house samples, whereas deep learning captures complex nonlinear characteristics but does not directly handle the influence of the neighboring houses. When there are abundant locational and neighborhood variables, the characteristic vectors of neighboring houses, which consist of the structural, locational, and neighborhood variables (the x in Equations (1) and (2)), are very similar. These similar vectors contribute greatly to the fitted values of the neighboring houses, which to some extent can be seen as an implicit reflection of the neighboring influences. As a result, the FCNN method generally has better results than GWR with both the distance-based and the quantity-based variables, as shown in Table 5 with the example of Wuhan. Moreover, the FCNN models result in much smaller discrepancies between the training sets and test sets than the GWR models.
For example, in Table 5, for the quantity-based GWR the R² differs by 0.11 between the training and test sets, and the RMSE differs by 2.5, while for the quantity-based FCNN the R² differs by only 0.02 and the RMSE by only 0.32 between the training and test sets. Therefore, this study also provides evidence for the possible disadvantage of the GWR model in out-of-sample forecasts, and prediction with the FCNN is more stable and robust than prediction with GWR.

5.3.3. FCNN–GWR Results and Discussion

Although the prediction accuracy of the FCNN is clearly higher than that of GWR, the FCNN model does not explicitly handle the influence of neighboring houses; thus, it can be improved. The FCNN–GWR model is proposed by combining deep learning with GWR, so that it can not only reflect the nonlinear, complicated characteristics of the rental house prices but also explicitly process spatial heterogeneity. The FCNN–GWR results are shown in Table A2 (in Appendix A) and can be compared with those of the FCNN model and the GWR model. In addition to the indicators introduced in Section 4.4, four indicators are adopted to evaluate the stability across different shuffles of training sets and test sets. They are as follows: (1) Pearson R range: the maximum deviation of the Pearson R values in different shuffles of datasets; (2) R² range: the maximum deviation of the R² values in different shuffles of datasets; (3) Pearson R std.: the adjusted deviation of the Pearson R in different shuffles of datasets; and (4) R² std.: the adjusted deviation of R² in different shuffles of datasets. Finally, the stability of the partition of the training sets is discussed.

When viewed horizontally, Table A2 shows that for each city, quantity-based FCNN–GWR is clearly the best method for predicting rental house prices: all 6 accuracy indicators obtained the best values among all of the experiments.
For models with tra- ditional, distance-based, and quantity-based locational and neighborhood variables, the fitting accuracy of the HPM methods are relatively low. The GWR methods have higher Land 2021, 10, x FOR PEER REVIEW 21 of 28 fitting precisions, but the stability and robustness are not very good. The FCNN methods have higher accuracy than GWR, and FCNN–GWR has the highest accuracy. FCNN–GWR includes both the complex nonlinear characteristics and the spatial heterogeneity of the rental house prices and reduces the instability of GWR in areas with relatively sparse relatively sparse samples via Equation (6). Therefore, it has fine precision and stability for samples via Equation (6). Therefore, it has fine precision and stability for the fitting and the fitting and forecasting of rental house price. forecasting of rental house price. For the four HPM, GWR, FCNN, and FCNN–GWR models, with the quantity-based For the four HPM, GWR, FCNN, and FCNN–GWR models, with the quantity-based locational and neighborhood variables, we counted the percentage of the mean absolute locational and neighborhood variables, we counted the percentage of the mean absolute prediction error (namely, the %MAE in Section 4.4) of each method for the communities prediction error (namely, the %MAE in Section 4.4) of each method for the communities in in the Wuhan dataset, and the errors (averaged by all experiments) is visualized in Figure the Wuhan dataset, and the errors (averaged by all experiments) is visualized in Figure 6. 6. Cold colors (blue and green) mean the average prediction accuracies of these Cold colors (blue and green) mean the average prediction accuracies of these communities communities are relatively high, and warm colors (red and yellow) mean the prediction are relatively high, and warm colors (red and yellow) mean the prediction accuracies accuracies are relatively low. As shown in the figure, the accuracy of the HPM is relatively are relatively low. 
As shown in the figure, the accuracy of the HPM is relatively low low (Figure 6c). For GWR, the number of warm-colored dots (Figure 6a) is greater than (Figure 6c). For GWR, the number of warm-colored dots (Figure 6a) is greater than that for that for the FCNN (Figure 6b); and for GWR, most of the red dots are located in places the FCNN (Figure 6b); and for GWR, most of the red dots are located in places where the where the nearby neighborhoods are relatively sparse, which means there are relatively nearby neighborhoods are relatively sparse, which means there are relatively few samples few samples of houses nearby. This phenomenon suggests GWR is possibly to perform of houses nearby. This phenomenon suggests GWR is possibly to perform fluctuated in fluctuated in the area where the dots are sparse, or near the margin of the gathered dots. the area where the dots are sparse, or near the margin of the gathered dots. Therefore, Therefore, it can be inferred that the reliability of GWR prediction may be insufficient it can be inferred that the reliability of GWR prediction may be insufficient when the when the number of nearby samples around the prediction point is small. When there are number of nearby samples around the prediction point is small. When there are many many samples nearby (inside the gathered dots), both GWR and FCNN have relatively samples nearby (inside the gathered dots), both GWR and FCNN have relatively good good performance of accuracy, since GWR and FCNN consider the spatial heterogeneity performance of accuracy, since GWR and FCNN consider the spatial heterogeneity aspect aspect and the nonlinearity aspect of rental house prices, respectively. FCNN–GWR and the nonlinearity aspect of rental house prices, respectively. 
FCNN–GWR combines combines the FCNN and GWR, and it takes into account the nonlinearity and spatial the FCNN and GWR, and it takes into account the nonlinearity and spatial heterogeneity heterogeneity when there are many samples nearby; additionally, only the FCNN is when there are many samples nearby; additionally, only the FCNN is adopted when there adopted when there are relatively few nearby samples, to avoid the possible instability of are relatively few nearby samples, to avoid the possible instability of GWR in these cases. GWR in these cases. Therefore, FCNN–GWR can better predict rental house prices. Therefore, FCNN–GWR can better predict rental house prices. (a) (b) Figure 6. Cont. Land 2021, 10, x FOR PEER REVIEW 22 of 28 Land 2022, 11, 4 21 of 28 (c) (d) Figure 6. The difference of %MAE in predicted rental house prices of Wuhan when using the models of (a) GWR; (b) Figure 6. The difference of %MAE in predicted rental house prices of Wuhan when using the models FCNN; (c) HPM; (d) FCNN-GWR. (Some of the districts: ① Jiangan, ② Jianghan, ③ Qiaokou, ④ Qingshan, ⑤ Wuchang, of (a) GWR; (b) FCNN; (c) HPM; (d) FCNN-GWR. (Some of the districts: 1 Jiangan, 2 Jianghan, 3 ⑥ Hanyang, and ⑦ Hongshan). Qiaokou, 4 Qingshan, 5 Wuchang, 6 Hanyang, and 7 Hongshan). When Table A2 is viewed vertically, the traditional, distance-based, and quantity- When Table A2 is viewed vertically, the traditional, distance-based, and quantity-based based locational and neighborhood variables can be compared. According to the results, locational and neighborhood variables can be compared. According to the results, the the quantity-based variables perform better than the traditional and distance-based quantity-based variables perform better than the traditional and distance-based variables variables under all of the HPM, GWR, FCNN, and FCNN–GWR methods. Therefore, it under all of the HPM, GWR, FCNN, and FCNN–GWR methods. 
Therefore, it can be verified that the quantity-based locational and neighborhood variables take into account more geographic information from a comprehensive perspective, thus giving the rental price model better predictive power than the traditional and distance-based locational and neighborhood variables.

Regarding stability across different partitions of the training sets, in almost all cases the ranges of R² and Pearson R for different partitions of the datasets are less than 0.008, and the standard deviations of R² and Pearson R are less than 0.004, both of which are low. The result shows that there is no apparent discrepancy under different partitions of the training sets, and different shuffles do not noticeably affect the precision and generalization ability of our model.

Additionally, the FCNN–GWR proposed in this paper is compared with other previously published models on the Wuhan dataset, including the CNN model [14], the model of Bin [9], and GWELM [31].
(We do not consider the image part of these models since there are no image data in this study.) Quantity-based locational and neighborhood variables are adopted for these models, and the results are shown in Table 6. Among the methods above, FCNN–GWR performs the best on the indicators of the estimation or prediction of rental house prices. The CNN model of Yu [14] treats housing price variables as an image and can extract the complexity of the relationships among the variables. However, the model does not consider spatial heterogeneity, which is a significant and non-negligible factor, and this results in limited performance in estimating rental house prices. It is also questionable whether the pooling layers of the CNN are necessary for the regression of rental house prices. Regarding Bin's approach [9], although the boosted regression trees in the model can improve the accuracy of the prediction efficiently, spatial heterogeneity is also absent from the architecture of the model.
Thus, the accuracy of the model still has room for improvement. The GWELM proposed by Deng [31] incorporates GWR and ELM, and in principle it might be able to reveal both the spatial heterogeneity and the nonlinear characteristics in rental house prices. However, as discussed above, when the quantity of nearby samples is small, the prediction by the GWR-like method may not be sufficiently dependable. GWELM would have unstable, reduced accuracy in such cases and would consequently lower the final accuracy of rental house price estimation.

Table 6. Comparison of FCNN–GWR with some previously published models (in Wuhan).

| Metric | FCNN–GWR | CNN [14] | Bin [9] | GWELM [31] |
|---|---|---|---|---|
| Pearson R | 0.9534 | 0.9285 | 0.9364 | 0.9214 |
| adj R² | 0.9192 | 0.8823 | 0.8902 | 0.8549 |
| RMSE | 3.2285 | 3.3678 | 3.3261 | 3.7874 |
| %RMSE | 9.69% | 10.11% | 9.96% | 11.34% |
| MAE | 1.9748 | 2.0599 | 2.0387 | 2.2237 |
| %MAE | 6.07% | 6.21% | 6.12% | 6.66% |

In summary, by combining spatial heterogeneity, a nonlinear model, and geographical scaling, the proposed quantity-based FCNN–GWR can steadily improve the performance of rental house price modeling. However, at present, the locational and neighborhood variables of this study were derived only from POI data. This model may lose some precision in some cases with complex characteristics, since other multitype data, such as remote sensing images [2], street view images [58], or landscape amenities [59], are not considered. In the future, experiments involving neural networks with the above multitype data will be conducted to further improve the accuracy of the rental house price model. Simplifying the procedure of the model and verifying the models in other cities are also worthwhile directions for future work.
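The sample-density behaviour discussed in this section (GWR-type predictions becoming unreliable where few houses lie nearby, so that only the FCNN is used there) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the neighbor radius, the `min_neighbors` threshold, the planar coordinates, and the function names are all assumptions introduced for illustration, and the actual switching criterion is not specified in this section.

```python
import numpy as np

def count_neighbors(train_xy, query_xy, radius):
    """Count training samples within `radius` of each query point.

    Coordinates are assumed to be planar (projected), so Euclidean
    distance is meaningful. `radius` is a hypothetical parameter.
    """
    diff = query_xy[:, None, :] - train_xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return (dist <= radius).sum(axis=1)

def select_prediction(fcnn_pred, combined_pred, n_neighbors, min_neighbors):
    """Keep the combined prediction only where enough samples are nearby.

    Where fewer than `min_neighbors` samples are nearby (a hypothetical
    threshold), fall back to the FCNN-only prediction.
    """
    return np.where(n_neighbors >= min_neighbors, combined_pred, fcnn_pred)

# Toy data: three clustered houses and one isolated house.
train_xy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
query_xy = np.array([[0.05, 0.05], [5.1, 5.0]])
fcnn_pred = np.array([30.0, 28.0])      # illustrative FCNN-only prices
combined_pred = np.array([32.0, 40.0])  # illustrative combined prices

n = count_neighbors(train_xy, query_xy, radius=0.5)
final_pred = select_prediction(fcnn_pred, combined_pred, n, min_neighbors=3)
```

In this toy setting the first query point sits inside the cluster and keeps the combined prediction, while the second has a single nearby sample and falls back to the FCNN value.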
6. Conclusions

In this research, we make efforts to improve the accuracy of the rental house-pricing model. Taking four cities in China (Wuhan, Nanjing, Beijing, and Xi’an) as study areas, we combine deep learning and GWR to grasp both the nonlinear characteristics and spatial heterogeneity and propose the FCNN–GWR model to evaluate rental house prices. In this paper, the results of the HPM, GWR, the FCNN, and the proposed FCNN–GWR are compared in terms of accuracy. The results show that the quantity-based FCNN–GWR model has the highest accuracy. Compared with GWR, the proposed model shows the ability to include the nonlinear complexity of rental house prices, and it presents stable and more accurate forecasts. Compared with the FCNN deep-learning method, the proposed model explicitly addresses spatial heterogeneity because it considers nearby influences. The work performed in this research verifies that deep learning and GWR explain rental house prices from different perspectives, and the combination of both can improve the evaluation accuracy of rental prices. Moreover, the method proposed in this paper may provide a useful reference for individuals and businesses in their transactions related to rental houses and assist the government in making appropriate policies for the price levels and positions of public rental housing.

The quantity-based locational and neighborhood variables proposed in this paper offer a more comprehensive geographical perspective of locational and neighborhood characteristics. They can express the locational information of houses from a perspective involving more types (134 types of POIs in this study), and more comprehensively (the KDE method used in this study), with more geographic objects taken into account.
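To make the idea of a quantity-based variable concrete, the following Python sketch computes a kernel-weighted count of POIs around each house, one feature column per POI type. It is only an illustration under stated assumptions: the unnormalized Gaussian kernel, the single shared `bandwidth`, the planar coordinates, and the names `kernel_quantity` and `quantity_features` are hypothetical and are not taken from the paper, whose KDE settings are not given in this section.

```python
import numpy as np

def kernel_quantity(house_xy, poi_xy, bandwidth):
    """Gaussian-kernel-weighted count of POIs around each house.

    Nearby POIs contribute close to 1, distant POIs close to 0.
    The kernel and `bandwidth` are assumptions for illustration.
    """
    diff = house_xy[:, None, :] - poi_xy[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / bandwidth ** 2).sum(axis=1)

def quantity_features(house_xy, pois_by_type, bandwidth):
    """Build one feature column per POI type, in sorted type-name order."""
    names = sorted(pois_by_type)
    cols = [kernel_quantity(house_xy, pois_by_type[n], bandwidth) for n in names]
    return names, np.column_stack(cols)

# Toy data: two houses and two hypothetical POI types.
houses = np.array([[0.0, 0.0], [10.0, 10.0]])
pois_by_type = {
    "bus_stop": np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]]),
    "school": np.array([[10.0, 11.0]]),
}
names, features = quantity_features(houses, pois_by_type, bandwidth=1.0)
```

Unlike a distance-to-nearest-facility variable, each column here reflects all POIs of a type at once, which is the sense in which more geographic objects are taken into account.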
Our experiments show that quantity-based variables reflect the location of a rental house better than the traditionally used and distance-based locational and neighborhood variables, and they help improve the accuracy of the pricing model.

However, the locational and neighborhood variables in this paper were derived only from POI data, which is a limitation of this study. The model may lose some accuracy in cases with complex characteristics that might be captured by remote sensing images, street view images, or texts. In the future, experiments involving neural networks with such multitype data will be conducted to further increase the accuracy of rental house price estimation. In addition, there are many parameters and procedures in the architecture of the proposed FCNN–GWR. More kinds of neural networks can be tested to reduce the complexity of the procedures of this work and to improve the performance of the deep-learning model of rental house prices.

Author Contributions: Conceptualization, H.S. and L.L.; methodology, L.L.; software, Y.L. and Z.L.; validation, H.Z.; data curation, Z.L. and Y.L.; writing (original draft preparation), H.S. and H.Z.; project administration, L.L. All authors have read and agreed to the published version of the manuscript.

Funding: This study is supported by the National Key Research and Development Program of China (2017YFB0503701).

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: The data used in this paper mainly come from lianjia.com (accessed on 20 November 2021).

Acknowledgments: The authors thank the editors and reviewers for providing insightful suggestions and comments.

Conflicts of Interest: The authors declare no conflict of interest.

Appendix A

Table A1 supplements Table 2 and presents the statistical data of the rental house price data and relevant variables in Wuhan, Nanjing, Beijing, and Xi’an.
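Table A1. The statistics of the rental house price data and relevant attributes of the study areas.

| Type | Variable | Mean Wuhan | Mean Nanjing | Mean Beijing | Mean Xi’an | Std. Wuhan | Std. Nanjing | Std. Beijing | Std. Xi’an |
|---|---|---|---|---|---|---|---|---|---|
| Structural variables | Area | 91.89 | 66.58 | 55.04 | 86.24 | 33.96 | 32.16 | 31.58 | 42.60 |
| Structural variables | TotalFloor | 22.65 | 16.24 | 14.70 | 27.58 | 11.97 | 13.56 | 8.01 | 7.60 |
| Structural variables | Level | 2.11 | 1.95 | 1.96 | 2.00 | 0.69 | 0.68 | 0.77 | 0.68 |
| Structural variables | Age | 11.07 | 15.03 | 17.66 | 8.17 | 6.18 | 7.32 | 9.37 | 4.16 |
| Structural variables | Age-squared | 152.13 | 279.08 | 406.07 | 85.32 | 161.57 | 274.56 | 416.80 | 88.12 |
| Structural variables | Month | 4.14 | 4.30 | 4.43 | 4.16 | 1.03 | 1.74 | 1.65 | 1.68 |
| Structural variables | Spring | * | * | * | * | * | * | * | * |
| Structural variables | Summer | * | * | * | * | * | * | * | * |
| Structural variables | Autumn | * | * | * | * | * | * | * | * |
| Structural variables | Room | 2.22 | 1.97 | 2.14 | 2.00 | 0.81 | 0.87 | 0.90 | 0.89 |
| Structural variables | Hall | 1.60 | 1.28 | 1.07 | 1.43 | 0.53 | 0.53 | 0.44 | 0.75 |
| Structural variables | Toilet | 1.21 | 1.03 | 1.14 | 1.19 | 0.44 | 0.37 | 0.44 | 0.47 |
| Structural variables | South | * | * | * | * | * | * | * | * |
| Structural variables | North | * | * | * | * | * | * | * | * |
| Structural variables | East | * | * | * | * | * | * | * | * |
| Structural variables | West | * | * | * | * | * | * | * | * |
| Structural variables | PlotRatio | 3.27 | 1.38 | 2.23 | 4.03 | 1.63 | 0.98 | 0.90 | 1.48 |
| Structural variables | Green | 0.33 | 35.36 | 31.01 | 36.40 | 0.07 | 9.92 | 7.09 | 7.70 |
| Structural variables | ParkSpace | 791.32 | 569.48 | 618.57 | 1080.70 | 1056.91 | 666.48 | 2714.80 | 1400.44 |
| Structural variables | Fee | 1.79 | 0.92 | 1.06 | 1.33 | 0.83 | 0.69 | 1.52 | 0.64 |
| Locational variables | DCBD | 7.06 | 11.33 | 18.86 | 17.32 | 5.98 | 6.69 | 12.21 | 7.96 |
| Locational variables | Dbus | 2.09 | 0.21 | 0.27 | 0.25 | 1.67 | 0.12 | 0.14 | 0.70 |
| Locational variables | Dsub | 4.21 | 1.28 | 2.37 | 2.04 | 3.94 | 1.29 | 5.13 | 2.35 |
| Locational variables | DshopCen | 1.43 | 1.74 | 3.38 | 1.29 | 1.35 | 1.14 | 5.89 | 0.93 |
| Neighborhood variables | Dpark | 1.57 | 1.06 | 6.38 | 1.70 | 1.10 | 0.68 | 8.75 | 1.00 |
| Neighborhood variables | DpriSch | 0.88 | 10.47 | 2.43 | 0.83 | 0.90 | 5.00 | 5.23 | 0.46 |
| Neighborhood variables | DsecSch | 1.00 | 8.41 | 2.39 | 1.38 | 0.85 | 5.57 | 4.66 | 0.81 |
| Neighborhood variables | Dnurs | 0.43 | 4.09 | 1.63 | 0.42 | 0.43 | 2.88 | 4.24 | 0.35 |
| Neighborhood variables | Dhosp | 0.28 | 0.79 | 2.79 | 0.53 | 0.28 | 0.58 | 5.53 | 0.55 |
| | Price | 33.32 | 60.03 | 95.94 | 29.94 | 11.24 | 24.37 | 44.28 | 10.25 |

*: not applicable for dummy variables.

Table A2 presents the accuracy assessment results of the methods compared in this paper. The table includes 4 basic types of models: HPM, GWR, FCNN and FCNN–GWR.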
Additionally, it includes 3 kinds of locational and neighborhood variables: traditional variables, distance-based variables and quantity-based variables. It also includes 4 cities: Wuhan, Nanjing, Beijing and Xi’an.

Table A2. Accuracy assessment results of each method.

Traditional variables, Wuhan and Nanjing:

| Metric | Wuhan HPM | Wuhan GWR | Wuhan FCNN | Wuhan FCNN–GWR | Nanjing HPM | Nanjing GWR | Nanjing FCNN | Nanjing FCNN–GWR |
|---|---|---|---|---|---|---|---|---|
| Pearson R | 0.7187 | 0.7985 | 0.8831 | 0.9158 | 0.6954 | 0.7348 | 0.8573 | 0.8783 |
| adj R² | 0.5197 | 0.7558 | 0.7901 | 0.8268 | 0.5088 | 0.5328 | 0.7788 | 0.8202 |
| RMSE | 7.2032 | 5.5664 | 4.7929 | 4.5955 | 11.4868 | 11.3654 | 9.3097 | 8.8566 |
| %RMSE | 21.62% | 16.77% | 14.40% | 13.79% | 19.14% | 18.91% | 15.50% | 14.73% |
| MAE | 5.2871 | 3.0454 | 3.3635 | 3.2788 | 8.507 | 7.1967 | 6.4888 | 6.0519 |
| %MAE | 16.70% | 9.40% | 10.57% | 10.35% | 15.45% | 13.20% | 11.68% | 10.91% |
| Pearson R range | 0.0035 | 0.0036 | 0.0029 | 0.0025 | 0.0031 | 0.0032 | 0.0041 | 0.004 |
| R² range | 0.0037 | 0.0032 | 0.0035 | 0.0032 | 0.0066 | 0.0053 | 0.0069 | 0.0062 |
| Pearson R std. | 0.0016 | 0.0014 | 0.0015 | 0.0011 | 0.0017 | 0.0015 | 0.0021 | 0.002 |
| R² std. | 0.0018 | 0.0016 | 0.0019 | 0.0017 | 0.004 | 0.0028 | 0.0033 | 0.0031 |

Distance-based variables, Wuhan and Nanjing:

| Metric | Wuhan HPM | Wuhan GWR | Wuhan FCNN | Wuhan FCNN–GWR | Nanjing HPM | Nanjing GWR | Nanjing FCNN | Nanjing FCNN–GWR |
|---|---|---|---|---|---|---|---|---|
| Pearson R | 0.7857 | 0.8883 | 0.9141 | 0.9288 | 0.7505 | 0.8254 | 0.8901 | 0.9153 |
| adj R² | 0.6225 | 0.8047 | 0.8682 | 0.8922 | 0.5951 | 0.6548 | 0.8419 | 0.8641 |
| RMSE | 6.3765 | 4.545 | 3.551 | 3.466 | 10.8076 | 9.8284 | 8.4661 | 8.2322 |
| %RMSE | 19.17% | 13.61% | 10.63% | 10.41% | 18.04% | 16.39% | 14.09% | 13.70% |
| MAE | 4.6981 | 2.6683 | 2.4827 | 2.2425 | 7.8434 | 7.1285 | 5.6652 | 5.5031 |
| %MAE | 13.99% | 7.77% | 7.11% | 6.97% | 14.26% | 12.96% | 10.35% | 10.05% |
| Pearson R range | 0.0042 | 0.0044 | 0.0054 | 0.0055 | 0.0054 | 0.004 | 0.0038 | 0.0044 |
| R² range | 0.0058 | 0.0052 | 0.0061 | 0.0049 | 0.0075 | 0.0051 | 0.0078 | 0.0057 |
| Pearson R std. | 0.0021 | 0.002 | 0.0025 | 0.0024 | 0.0023 | 0.0021 | 0.0022 | 0.0015 |
| R² std. | 0.0031 | 0.003 | 0.0034 | 0.003 | 0.0044 | 0.002 | 0.0035 | 0.0031 |

Quantity-based variables, Wuhan and Nanjing:

| Metric | Wuhan HPM | Wuhan GWR | Wuhan FCNN | Wuhan FCNN–GWR | Nanjing HPM | Nanjing GWR | Nanjing FCNN | Nanjing FCNN–GWR |
|---|---|---|---|---|---|---|---|---|
| Pearson R | 0.8601 | 0.8882 | 0.9344 | 0.9534 | 0.7673 | 0.9023 | 0.8919 | 0.9209 |
| adj R² | 0.7463 | 0.8379 | 0.8925 | 0.9192 | 0.6216 | 0.6686 | 0.8455 | 0.8715 |
| RMSE | 5.1945 | 4.5271 | 3.2923 | 3.2285 | 10.4635 | 11.1837 | 8.4302 | 8.1608 |
| %RMSE | 15.61% | 13.56% | 9.86% | 9.69% | 17.44% | 18.66% | 14.06% | 13.56% |
| MAE | 3.5077 | 2.4363 | 2.0179 | 1.9748 | 7.7032 | 7.032 | 5.7471 | 5.5475 |
| %MAE | 10.70% | 7.30% | 6.19% | 6.07% | 14.09% | 12.75% | 10.46% | 10.12% |
| Pearson R range | 0.0038 | 0.0047 | 0.0033 | 0.0035 | 0.0027 | 0.0029 | 0.0037 | 0.0048 |
| R² range | 0.0043 | 0.0061 | 0.006 | 0.004 | 0.0049 | 0.003 | 0.0076 | 0.0055 |
| Pearson R std. | 0.002 | 0.0019 | 0.002 | 0.0015 | 0.0011 | 0.0013 | 0.0019 | 0.002 |
| R² std. | 0.0028 | 0.0032 | 0.0024 | 0.0024 | 0.0017 | 0.0014 | 0.0036 | 0.0027 |

Traditional variables, Beijing and Xi’an:

| Metric | Beijing HPM | Beijing GWR | Beijing FCNN | Beijing FCNN–GWR | Xi’an HPM | Xi’an GWR | Xi’an FCNN | Xi’an FCNN–GWR |
|---|---|---|---|---|---|---|---|---|
| Pearson R | 0.7964 | 0.8172 | 0.9019 | 0.9205 | 0.5878 | 0.8123 | 0.7546 | 0.8253 |
| adj R² | 0.6633 | 0.7087 | 0.8616 | 0.8839 | 0.3423 | 0.6148 | 0.5701 | 0.6798 |
| RMSE | 19.2098 | 17.8532 | 13.983 | 13.2301 | 7.7096 | 6.3072 | 6.4261 | 5.9846 |
| %RMSE | 20.01% | 18.60% | 14.65% | 13.81% | 25.73% | 21.16% | 21.61% | 20.10% |
| MAE | 13.4011 | 11.4688 | 9.2452 | 8.515 | 5.0645 | 3.7569 | 4.3707 | 3.8722 |
| %MAE | 15.78% | 12.74% | 10.23% | 9.37% | 17.57% | 12.94% | 15.25% | 13.27% |
| Pearson R range | 0.0022 | 0.0029 | 0.002 | 0.0021 | 0.0036 | 0.0031 | 0.0026 | 0.0026 |
| R² range | 0.003 | 0.0035 | 0.0074 | 0.0037 | 0.0041 | 0.0088 | 0.0036 | 0.0035 |
| Pearson R std. | 0.001 | 0.0012 | 0.0008 | 0.001 | 0.0016 | 0.0013 | 0.0011 | 0.0012 |
| R² std. | 0.0014 | 0.0018 | 0.0038 | 0.0013 | 0.0015 | 0.0033 | 0.0014 | 0.0016 |

Distance-based variables, Beijing and Xi’an:

| Metric | Beijing HPM | Beijing GWR | Beijing FCNN | Beijing FCNN–GWR | Xi’an HPM | Xi’an GWR | Xi’an FCNN | Xi’an FCNN–GWR |
|---|---|---|---|---|---|---|---|---|
| Pearson R | 0.8266 | 0.8541 | 0.8966 | 0.9093 | 0.7494 | 0.8253 | 0.874 | 0.8934 |
| adj R² | 0.7162 | 0.7643 | 0.8469 | 0.8673 | 0.5583 | 0.6723 | 0.7631 | 0.7856 |
| RMSE | 17.9573 | 16.7231 | 14.2703 | 13.9321 | 6.4836 | 5.9612 | 5.6395 | 5.4804 |
| %RMSE | 18.71% | 17.42% | 14.91% | 14.51% | 21.77% | 20.00% | 18.78% | 18.26% |
| MAE | 12.2154 | 10.6983 | 8.8871 | 8.6769 | 4.3013 | 3.7842 | 3.4775 | 3.3791 |
| %MAE | 14.10% | 11.89% | 9.66% | 9.44% | 14.73% | 12.89% | 11.71% | 11.77% |
| Pearson R range | 0.0016 | 0.0014 | 0.0027 | 0.004 | 0.0042 | 0.003 | 0.0067 | 0.0033 |
| R² range | 0.0031 | 0.0017 | 0.0044 | 0.004 | 0.005 | 0.004 | 0.007 | 0.0045 |
| Pearson R std. | 0.001 | 0.0005 | 0.001 | 0.0015 | 0.002 | 0.0017 | 0.003 | 0.0015 |
| R² std. | 0.0016 | 0.0008 | 0.0025 | 0.0018 | 0.0021 | 0.0025 | 0.0028 | 0.0023 |

Quantity-based variables, Beijing and Xi’an:

| Metric | Beijing HPM | Beijing GWR | Beijing FCNN | Beijing FCNN–GWR | Xi’an HPM | Xi’an GWR | Xi’an FCNN | Xi’an FCNN–GWR |
|---|---|---|---|---|---|---|---|---|
| Pearson R | 0.8316 | 0.8624 | 0.915 | 0.9251 | 0.8024 | 0.8373 | 0.8872 | 0.9042 |
| adj R² | 0.7249 | 0.7723 | 0.8822 | 0.8981 | 0.6393 | 0.6849 | 0.7818 | 0.8051 |
| RMSE | 17.7323 | 16.5705 | 13.1021 | 12.8735 | 6.115 | 5.8374 | 5.5555 | 5.3923 |
| %RMSE | 18.48% | 17.26% | 13.67% | 13.39% | 20.53% | 19.59% | 18.50% | 17.95% |
| MAE | 11.9717 | 10.5897 | 8.3726 | 8.2273 | 4.039 | 3.7053 | 3.3941 | 3.2896 |
| %MAE | 13.75% | 11.77% | 9.21% | 9.05% | 13.91% | 12.62% | 11.42% | 11.07% |
| Pearson R range | 0.0016 | 0.0023 | 0.0023 | 0.0042 | 0.0021 | 0.0019 | 0.0028 | 0.0051 |
| R² range | 0.0025 | 0.0028 | 0.0042 | 0.0057 | 0.0029 | 0.0023 | 0.0067 | 0.0064 |
| Pearson R std. | 0.0008 | 0.0011 | 0.0012 | 0.0018 | 0.001 | 0.0011 | 0.0014 | 0.0026 |
| R² std. | 0.001 | 0.0014 | 0.0018 | 0.0022 | 0.0012 | 0.0013 | 0.0034 | 0.0034 |

References

1. Bin, J.; Gardiner, B.; Liu, Z.; Li, E. Attention-Based Multi-Modal Fusion for Improved Real Estate Appraisal: A Case Study in Los Angeles. Multimed. Tools Appl. 2019, 78, 31163–31184.
2. Wang, P.-Y.; Chen, C.-T.; Su, J.-W.; Wang, T.-Y.; Huang, S.-H. Deep Learning Model for House Price Prediction Using Heterogeneous Data Analysis Along with Joint Self-Attention Mechanism. IEEE Access 2021, 9, 55244–55259.
3. Shimizu, C.; Karato, K.; Nishimura, K. Nonlinearity of Housing Price Structure: Assessment of Three Approaches to Nonlinearity in the Previously Owned Condominium Market of Tokyo. Int. J. Hous. Mark. Anal. 2014, 7, 459–488.
4. Liang, X.; Liu, Y.; Qiu, T.; Jing, Y.; Fang, F. The Effects of Locational Factors on the Housing Prices of Residential Communities: The Case of Ningbo, China. Habitat Int. 2018, 81, 1–11.
5. Fotheringham, A.S.; Charlton, M.E.; Brunsdon, C. Geographically Weighted Regression: A Natural Evolution of the Expansion Method for Spatial Data Analysis. Environ. Plan. A 1998, 30, 1905–1927.
6. Wu, C.; Ye, X.; Ren, F.; Wan, Y.; Ning, P.; Du, Q. Spatial and Social Media Data Analytics of Housing Prices in Shenzhen, China. PLoS ONE 2016, 11, e0164553.
7. Cajias, M.; Ertl, S. Spatial Effects and Non-Linearity in Hedonic Modeling: Will Large Data Sets Change Our Assumptions? J. Prop. Invest. Financ. 2018, 36, 32–49.
8. Bellotti, A. Reliable Region Predictions for Automated Valuation Models. Ann. Math. Artif. Intell. 2017, 81, 71–84.
9. Bin, J.; Gardiner, B.; Li, E.; Liu, Z. Multi-Source Urban Data Fusion for Property Value Assessment: A Case Study in Philadelphia. Neurocomputing 2020, 404, 70–83.
10. Bency, A.J.; Rallapalli, S.; Ganti, R.K.; Srivatsa, M.; Manjunath, B.S. Beyond Spatial Auto-Regressive Models: Predicting Housing Prices with Satellite Imagery. In Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision (WACV 2017), Santa Rosa, CA, USA, 24–31 March 2017; pp. 320–329.
11. Jiang, Z.; Shen, G. Prediction of House Price Based on the Back Propagation Neural Network in the Keras Deep Learning Framework. In Proceedings of the 2019 6th International Conference on Systems and Informatics (ICSAI), Shanghai, China, 8–10 November 2019; pp. 1408–1412.
12. Xu, J. A Novel Deep Neural Network Based Method for House Price Prediction. In Proceedings of the 2021 International Conference of Social Computing and Digital Economy (ICSCDE), Chongqing, China, 28–29 August 2021; pp. 12–16.
13. Yao, Y.; Zhang, J.; Hong, Y.; Liang, H.; He, J. Mapping Fine-Scale Urban Housing Prices by Fusing Remotely Sensed Imagery and Social Media Data. Trans. GIS 2018, 22, 561–581.
14. Yu, L.; Jiao, C.; Xin, H.; Wang, Y.; Wang, K. Prediction on Housing Price Based on Deep Learning. Int. J. Comput. Inf. Eng. 2018, 12, 90–99.
15. Andrew, M.; Haurin, D.; Munasib, A. Explaining the Route to Owner-Occupation: A Transatlantic Comparison. J. Hous. Econ. 2006, 15, 189–216.
16. Seo, K.; Golub, A.; Kuby, M. Combined Impacts of Highways and Light Rail Transit on Residential Property Values: A Spatial Hedonic Price Model for Phoenix, Arizona. J. Transp. Geogr. 2014, 41, 53–62.
17. Dong, H. The Impact of Income Inequality on Rental Affordability: An Empirical Study in Large American Metropolitan Areas. Urban Stud. 2018, 55, 2106–2122.
18. Won, J.; Lee, J.-S. Investigating How the Rents of Small Urban Houses Are Determined: Using Spatial Hedonic Modeling for Urban Residential Housing in Seoul. Sustainability 2018, 10, 31.
19. Liebelt, V.; Bartke, S.; Schwarz, N. Hedonic Pricing Analysis of the Influence of Urban Green Spaces onto Residential Prices: The Case of Leipzig, Germany. Eur. Plan. Stud. 2018, 26, 133–157.
20. Yu, T.; Song, Y. Solving the Problem of ‘Cold Weather’ of Public Rental Houses: Based on the Analysis of Government’s Purchase of Public Service. China Econ. Trade Guide (Middle) 2018, 35, 74–76.
21. Henderson, J.V.; Ioannides, Y.M. A Model of Housing Tenure Choice. Am. Econ. Rev. 1983, 73, 98–113.
22. Ioannides, Y.M.; Rosenthal, S.S. Estimating the Consumption and Investment Demands for Housing and Their Effect on Housing Tenure Status. Rev. Econ. Stat. 1994, 76, 127–141.
23. Rosen, S. Hedonic Prices and Implicit Markets: Product Differentiation in Pure Competition. J. Political Econ. 1974, 82, 34–55.
24. Malpezzi, S. Hedonic Pricing Models: A Selective and Applied Review. Hous. Econ. Public Policy 2002, 1, 67–89.
25. Wu, J.; Wang, M.; Li, W.; Peng, J.; Huang, L. Impact of Urban Green Space on Residential Housing Prices: Case Study in Shenzhen. J. Urban Plan. Dev. 2015, 141, 05014023.
26. Anselin, L.; Bera, A.K. Spatial Dependence in Linear Regression Models with an Introduction to Spatial Econometrics. Stat. Textb. Monogr. 1998, 155, 237–289.
27. Anselin, L. Exploring Spatial Data with GeoDa™: A Workbook; Center for Spatially Integrated Social Science: Urbana, IL, USA, 2004; Volume 51, pp. 213–223.
28. Hastie, T.J.; Tibshirani, R.J. Generalized Additive Models. Stat. Sci. 1986, 1, 297–310.
29. Li, S.; Ye, X.; Lee, J.; Gong, J.; Qin, C. Spatiotemporal Analysis of Housing Prices in China: A Big Data Perspective. Appl. Spat. Anal. Policy 2017, 10, 421–433.
30. Huang, B.; Wu, B.; Barry, M. Geographically and Temporally Weighted Regression for Modeling Spatio-Temporal Variation in House Prices. Int. J. Geogr. Inf. Sci. 2010, 24, 383–401.
31. Deng, M.; Yang, W.; Liu, Q. Geographically Weighted Extreme Learning Machine: A Method for Space-Time Prediction. Geogr. Anal. 2017, 49, 433–450.
32. Wang, J.J.; Hu, S.G.; Zhan, X.T.; Luo, Q.; Yu, Q.; Liu, Z.; Chen, T.P.; Yin, Y.; Hosaka, S.; Liu, Y. Predicting House Price with a Memristor-Based Artificial Neural Network. IEEE Access 2018, 6, 16523–16528.
33. Fan, G.Z.; Pu, M.; Deng, X.; Ong, S.E. Optimal Portfolio Choices and the Determination of Housing Rents under Housing Market Uncertainty. J. Hous. Econ. 2018, 41, 200–217.
34. Yoo, S.; Im, J.; Wagner, J.E. Variable Selection for Hedonic Model Using Machine Learning Approaches: A Case Study in Onondaga County, NY. Landsc. Urban Plan. 2012, 107, 293–306.
35. Hu, L.; He, S.; Han, Z.; Xiao, H.; Su, S.; Weng, M.; Cai, Z. Monitoring Housing Rental Prices Based on Social Media: An Integrated Approach of Machine-Learning Algorithms and Hedonic Modeling to Inform Equitable Housing Policies. Land Use Policy 2019, 82, 657–673.
36. Rico-Juan, J.R.; de La Paz, P.T. Machine Learning with Explainability or Spatial Hedonics Tools? An Analysis of the Asking Prices in the Housing Market in Alicante, Spain. Expert Syst. Appl. 2021, 171, 114590.
37. Roth, A.E. The Shapley Value: Essays in Honor of Lloyd S. Shapley; Cambridge University Press: Cambridge, NY, USA, 1988.
38. Fu, X.; Jia, T.; Zhang, X.; Li, S.; Zhang, Y. Do Street-Level Scene Perceptions Affect Housing Prices in Chinese Megacities? An Analysis Using Open Access Datasets and Deep Learning. PLoS ONE 2019, 14, e0217505.
39. Zhao, Y.; Chetty, G.; Tran, D. Deep Learning with XGBoost for Real Estate Appraisal. In Proceedings of the 2019 IEEE Symposium Series on Computational Intelligence (SSCI), Xiamen, China, 6–9 December 2019.
40. Hagenauer, J.; Helbich, M. A Geographically Weighted Artificial Neural Network. Int. J. Geogr. Inf. Sci. 2021, 35, 1–21.
41. Steentoft, A.A.; Poorthuis, A.; Lee, B.S.; Schlapfer, M. The Canary in the City: Indicator Groups as Predictors of Local Rent Increases. EPJ Data Sci. 2018, 7, 1–15.
42. Huang, D.J.; Leung, C.K.Y.; Tse, C.Y. What Accounts for the Differences in Rent-Price Ratio and Turnover Rate? A Search-and-Matching Approach. J. Real Estate Financ. Econ. 2018, 57, 431–475.
43. Ullah, F.; Sepasgozar, S.M.E. Key Factors Influencing Purchase or Rent Decisions in Smart Real Estate Investments: A System Dynamics Approach Using Online Forum Thread Data. Sustainability 2020, 12, 4382.
44. Su, S.; Zhang, J.; He, S.; Zhang, H.; Hu, L.; Kang, M. Unraveling the Impact of TOD on Housing Rental Prices and Implications on Spatial Planning: A Comparative Analysis of Five Chinese Megacities. Habitat Int. 2021, 107, 102309.
45. Baidu. Baidu Map Open Platform. Available online: https://lbsyun.baidu.com/index.php?title=lbscloud/poitags (accessed on 20 November 2021).
46. Lianjia. Lianjia Flagship Website. Available online: https://wh.lianjia.com/ (accessed on 20 November 2021).
47. Wu, H.; Jiao, H.; Yu, Y.; Li, Z.; Peng, Z.; Liu, L.; Zeng, Z. Influence Factors and Regression Model of Urban Housing Prices Based on Internet Open Access Data. Sustainability 2018, 10, 1676.
48. Li, H.; Wei, Y.D.; Wu, Y.; Tian, G. Analyzing Housing Prices in Shanghai with Open Data: Amenity, Accessibility and Urban Structure. Cities 2019, 91, 165–179.
49. Bokhari, S.; Geltner, D. Characteristics of Depreciation in Commercial and Multifamily Property: An Investment Perspective. Real Estate Econ. 2018, 46, 745–782.
50. Lu, B.; Charlton, M.; Harris, P.; Fotheringham, A.S. Geographically Weighted Regression with a Non-Euclidean Distance Metric: A Case Study Using Hedonic House Price Data. Int. J. Geogr. Inf. Sci. 2014, 28, 660–681.
51. Hurvich, C.M.; Simonoff, J.S.; Tsai, C.L. Smoothing Parameter Selection in Nonparametric Regression Using an Improved Akaike Information Criterion. J. R. Stat. Soc. Ser. B-Stat. Methodol. 1998, 60, 271–293.
52. Hahnloser, R.H.R.; Sarpeshkar, R.; Mahowald, M.A.; Douglas, R.J.; Seung, H.S. Digital Selection and Analogue Amplification Coexist in a Cortex-Inspired Silicon Circuit. Nature 2000, 405, 947–951.
53. Barzilai, J.; Borwein, J.M. Two-Point Step Size Gradient Methods. IMA J. Numer. Anal. 1988, 8, 141–148.
54. Schmidhuber, J. Deep Learning in Neural Networks: An Overview. Neural Netw. 2015, 61, 85–117.
55. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
56. Farrar, D.E.; Glauber, R.R. Multicollinearity in Regression Analysis: The Problem Revisited. Rev. Econ. Stat. 1967, 49, 92–107.
57. Harris, P.; Fotheringham, A.S.; Crespo, R.; Charlton, M. The Use of Geographically Weighted Regression for Spatial Prediction: An Evaluation of Models Using Simulated Data Sets. Math. Geosci. 2010, 42, 657–680.
58. Li, S.; Jiang, Y.; Ke, S.; Nie, K.; Wu, C. Understanding the Effects of Influential Factors on Housing Prices by Combining Extreme Gradient Boosting and a Hedonic Price Model (XGBoost-HPM). Land 2021, 10, 533.
59. Su, S.; He, S.; Sun, C.; Zhang, H.; Hu, L.; Kang, M. Do Landscape Amenities Impact Private Housing Rental Prices? A Hierarchical Hedonic Modeling Approach Based on Semantic and Sentimental Analysis of Online Housing Advertisements across Five Chinese Megacities. Urban For. Urban Green. 2021, 58, 126968.

Journal

Land, Multidisciplinary Digital Publishing Institute

Published: Dec 21, 2021

Keywords: house rental price; geographically weighted regression; spatial heterogeneity; deep learning
